CheXpert Dataset

CheXpert is a large-scale medical imaging dataset developed at Stanford. It contains more than 220,000 chest x-rays whose labels are extracted automatically from radiology reports, with a subset annotated by radiologists for validation. It is used to train and evaluate artificial intelligence models for the detection of thoracic pathologies.

Size

Over 224,000 DICOM chest x-rays

Licence

Free access on request, restricted to academic and non-commercial research (Stanford University-specific license)

Description

The dataset includes:

224,316 chest x-rays of over 65,000 patients
Images in DICOM format, from a university hospital
Annotations covering 14 observations, including pneumonia, edema, fracture and cardiomegaly
Uncertainty labels for some observations, which can be used when training probabilistic models
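
The uncertainty labels can be handled in several ways during training. The sketch below assumes the encoding used in the CheXpert label files (1.0 for positive, 0.0 for negative, -1.0 for uncertain, blank for not mentioned) and illustrates three common policies: mapping uncertain labels to positive, mapping them to negative, or masking them out of the loss. The function name `apply_uncertainty_policy` and the choice to treat unmentioned observations as negative are illustrative assumptions, not part of the dataset.

```python
from typing import List, Optional, Tuple

# Encoding assumed from the CheXpert label files:
#   1.0 = positive, 0.0 = negative, -1.0 = uncertain, None = not mentioned
UNCERTAIN = -1.0


def apply_uncertainty_policy(
    labels: List[Optional[float]], policy: str = "ignore"
) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: turn one row of raw labels into (targets, mask).

    mask[i] == 0.0 means label i should be excluded from the training loss.
    """
    if policy not in ("ones", "zeros", "ignore"):
        raise ValueError(f"unknown policy: {policy}")

    targets: List[float] = []
    mask: List[float] = []
    for value in labels:
        if value is None:
            # Assumption: an unmentioned observation is treated as negative.
            targets.append(0.0)
            mask.append(1.0)
        elif value == UNCERTAIN:
            if policy == "ones":
                targets.append(1.0)   # uncertain counted as positive
                mask.append(1.0)
            elif policy == "zeros":
                targets.append(0.0)   # uncertain counted as negative
                mask.append(1.0)
            else:
                targets.append(0.0)   # placeholder value, masked out below
                mask.append(0.0)
        else:
            targets.append(float(value))
            mask.append(1.0)
    return targets, mask


# Toy row for illustration only: positive, uncertain, not mentioned, negative
row = [1.0, -1.0, None, 0.0]
print(apply_uncertainty_policy(row, "ones"))
print(apply_uncertainty_policy(row, "ignore"))
```

With the "ignore" policy, the mask can be multiplied into a per-label loss so that uncertain entries contribute nothing to the gradient.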

CheXpert is one of the most widely used benchmarks for the automated classification of medical images in radiology.

What is this dataset for?

CheXpert supports several use cases:

Training models to classify and detect pulmonary pathologies on chest radiographs
Benchmarking medical image analysis algorithms (CNNs, ViTs, multimodal models...)
Developing diagnostic support tools for radiologists
Assessing the accuracy of AI systems when medical annotations are uncertain
Researching the reliability and robustness of healthcare models
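
For the benchmarking use case, multi-label chest x-ray classifiers are commonly compared with a per-observation AUROC (area under the ROC curve). The sketch below is a minimal pure-Python version that computes AUROC as the fraction of positive/negative pairs ranked correctly. The helper names `auroc` and `per_label_auroc` and the toy scores are illustrative assumptions, not real model results.

```python
from typing import Dict, List


def auroc(y_true: List[int], y_score: List[float]) -> float:
    """AUROC as the probability that a positive case outscores a negative one.

    Ties count as half a correct ranking. Runs in O(n_pos * n_neg), which is
    fine for a small evaluation set but not tuned for large ones.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")

    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


def per_label_auroc(
    truth: Dict[str, List[int]], scores: Dict[str, List[float]]
) -> Dict[str, float]:
    """Hypothetical helper: compute AUROC separately for each observation."""
    return {name: auroc(truth[name], scores[name]) for name in truth}


# Toy data for illustration only (four images, two observations)
truth = {"Cardiomegaly": [1, 0, 1, 0], "Edema": [0, 0, 1, 1]}
scores = {"Cardiomegaly": [0.9, 0.2, 0.4, 0.6], "Edema": [0.1, 0.3, 0.8, 0.7]}
print(per_label_auroc(truth, scores))  # {'Cardiomegaly': 0.75, 'Edema': 1.0}
```

Reporting one score per observation, rather than a single pooled number, keeps rare findings from being hidden behind frequent ones.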
