Data

Training and Validation: Unenhanced chest CTs from 199 and 50 patients, respectively, with positive RT-PCR for SARS-CoV-2 and ground truth annotations of COVID-19 lesions in the lung.

Testing: An additional 46 unseen patients with positive RT-PCR for SARS-CoV-2 and ground truth annotations of COVID-19 lesions in the lung CT. The test cases come from a variety of sources, including sources not used for training and validation.
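As a quick sanity check on the split sizes stated above, the following minimal sketch tallies the cohort and the share of patients in each split. Only the three patient counts come from this page; the variable names and dictionary keys are illustrative and do not correspond to any file or folder naming in the data set.

```python
# Patient counts as stated in the text above.
# The keys are illustrative labels, not names used by the data set itself.
split_counts = {"training": 199, "validation": 50, "testing": 46}

total_patients = sum(split_counts.values())

# Share of the full cohort in each split, as a fraction between 0 and 1.
split_fractions = {
    name: count / total_patients for name, count in split_counts.items()
}

for name, count in split_counts.items():
    print(f"{name:<10} {count:>3} patients  {split_fractions[name]:.1%}")
print(f"{'total':<10} {total_patients:>3} patients")
```

This gives 295 patients in total, split roughly 67% / 17% / 16% across training, validation and testing.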

The CT data were provided by The Multi-national NIH Consortium for CT AI in COVID-19 via the NCI TCIA public website. Users of the CT data must abide by the Creative Commons Attribution 4.0 International License under which the data have been published. Attribution should include references to the following citations.

Data Citation
An P, Xu S, Harmon SA, Turkbey EB, Sanford TH, Amalou A, Kassin M, Varble N, Blain M, Anderson V, Patella F, Carrafiello G, Turkbey BT, Wood BJ (2020). CT Images in Covid-19 [Data set]. The Cancer Imaging Archive. DOI: https://doi.org/10.7937/tcia.2020.gqry-nc81


TCIA Citation
Clark K, Vendt B, Smith K, Freymann J, Kirby J, Koppel P, Moore S, Phillips S, Maffitt D, Pringle M, Tarbox L, Prior F. The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository, Journal of Digital Imaging, Volume 26, Number 6, December, 2013, pp 1045-1057. DOI: 10.1007/s10278-013-9622-7.

The annotation of the data set was made possible through the joint work of Children's National Hospital, NVIDIA and the National Institutes of Health. NVIDIA and NIH have jointly developed a COVID-19 segmentation model that takes a full CT chest volume and produces pixel-wise segmentations of COVID lesions (ngc.nvidia.com/catalog/models/nvidia:clara_train_covid19_ct_lesion_seg). All lung lesions related to COVID-19 were included. These segmentations were subsequently used as a starting point for board-certified radiologists, who manually adjudicated and corrected them using ITK-SNAP (http://www.itksnap.org/pmwiki/pmwiki.php) to create the ground truth annotations for the data set with 3D consistency. A limitation of this approach is the potential noise in the annotations, which would be similar in the training, validation and test sets.

COVID-19-20 annotation data are available under CC0 license.

Annotated data must be acknowledged as below:
"The annotation of the dataset was made possible through the joint work of Children's National Hospital, NVIDIA and National Institutes of Health for the COVID-19-20 Lung CT Lesion Segmentation Grand Challenge."