Our study has some limitations. First, the external test set was sourced from a single institution. The AI algorithm should be tested on multiple datasets from different geographical regions to confirm that the model's results generalize across cohorts and geographies. Second, we used RT-PCR as the gold standard for diagnosing COVID-19 infection. However, RT-PCR has a limited sensitivity of approximately 71%, so some patients with COVID-19 findings on chest radiographs may test negative on RT-PCR and would therefore be labelled negative under our reference standard. Third, our study does not incorporate clinical parameters and does not attempt to categorise patients by COVID-19 severity score. We avoided assigning COVID-19 severity scores from chest radiographs because, unlike chest CT, radiography is prone to interobserver disagreement on the extent of lung involvement, owing to differing acquisition protocols, variable image quality, and differences in radiologist interpretation. An AI system built on the consensus scores of a few radiologists from a single geographical location may not reflect the consensus of radiologists globally. Some researchers have nevertheless attempted to build such a system, for example Borghesi & Maroldi (22), using a small dataset of 100 patients. Monaco et al. (23), with a dataset of 295 patients, and Orsi et al. (24), with a dataset of 155 patients, also attempted to produce scoring systems for chest radiographs and link them...

