CheXpert Dataset
CheXpert is a large-scale medical imaging dataset developed at Stanford. It contains more than 220,000 chest x-rays whose labels were extracted automatically from radiology reports and then validated on a subset by doctors. It is used to train and evaluate artificial intelligence models for detecting chest pathologies.
Over 224,000 DICOM chest x-rays
Free access on request, reserved for academic and non-commercial research (Stanford University specific license)
Description
The dataset includes:
- 224,316 chest x-rays from 65,240 patients
- Images in DICOM format, from a university hospital
- Labels for 14 observations, including pneumonia, edema, fracture, and cardiomegaly
- An explicit "uncertain" value for labels that the report leaves ambiguous, which can be taken into account when training probabilistic models
CheXpert is one of the most widely used benchmarks for the automated classification of medical images in radiology.
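To make the uncertainty labels concrete: in the label CSV files distributed with CheXpert, each observation is encoded as 1.0 (positive), 0.0 (negative), -1.0 (uncertain), or left blank (not mentioned in the report). The Python sketch below shows three simple ways of handling the uncertain value before training, loosely following the "U-Zeros", "U-Ones" and "U-Ignore" baselines described in the CheXpert paper. The function names and the choice to treat blank cells as negative are assumptions made for illustration, not part of any official tooling.

```python
from typing import Optional, Tuple

UNCERTAIN = -1.0  # value used for "uncertain" in the CheXpert label files


def parse_cell(cell: str) -> Optional[float]:
    """Turn a raw CSV cell into a float, or None when the cell is blank."""
    cell = cell.strip()
    return float(cell) if cell else None


def apply_uncertainty_policy(raw: Optional[float], policy: str = "zeros") -> Tuple[float, float]:
    """Return (target, weight) for one label value.

    A weight of 0.0 means the label should be left out of the loss.
    Blank cells are treated as negative here (an assumption of this sketch).
    """
    if policy not in {"zeros", "ones", "ignore"}:
        raise ValueError(f"unknown policy: {policy}")
    if raw is None:
        return 0.0, 1.0
    if raw == UNCERTAIN:
        if policy == "zeros":   # count uncertain as negative
            return 0.0, 1.0
        if policy == "ones":    # count uncertain as positive
            return 1.0, 1.0
        return 0.0, 0.0         # "ignore": mask the label out of the loss
    return float(raw), 1.0


# Made-up example row: positive, blank, uncertain, negative
row = ["1.0", "", "-1.0", "0.0"]
for policy in ("zeros", "ones", "ignore"):
    print(policy, [apply_uncertainty_policy(parse_cell(c), policy) for c in row])
```

The returned weight can be multiplied into a per-label loss so that ignored labels contribute nothing. Which policy works best may differ from one observation to another, so it is worth comparing them on a validation set.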
What is this dataset for?
CheXpert is used for:
- Training models to classify and detect chest pathologies on radiographs
- Benchmarking medical image analysis algorithms (CNNs, ViTs, multimodal models, etc.)
- Developing diagnostic support tools for radiologists
- Assessing how accurate AI systems remain when medical annotations are uncertain
- Researching the reliability and robustness of healthcare AI models
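For the benchmarking use case, multi-label chest x-ray classifiers are commonly compared using the area under the ROC curve (AUROC), computed separately for each label and then averaged. The following is a minimal Python sketch of that computation using the pairwise (Mann-Whitney) definition of AUROC. The helper names and the toy scores are made up for illustration; for real experiments an optimized routine such as `sklearn.metrics.roc_auc_score` would be a more practical choice.

```python
from typing import Dict, List, Sequence


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This is O(n_pos * n_neg), which is
    fine for a sketch but slow for large test sets.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC is undefined unless both classes are present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def per_label_auroc(y_true: Dict[str, List[int]],
                    y_score: Dict[str, List[float]]) -> Dict[str, float]:
    """Compute AUROC independently for every label."""
    return {label: auroc(y_true[label], y_score[label]) for label in y_true}


# Toy ground truth and model scores for four studies (invented values)
truth = {"Cardiomegaly": [1, 0, 1, 0], "Edema": [0, 0, 1, 1]}
scores = {"Cardiomegaly": [0.9, 0.2, 0.4, 0.6], "Edema": [0.1, 0.3, 0.8, 0.7]}

results = per_label_auroc(truth, scores)
mean_auroc = sum(results.values()) / len(results)
print(results)     # {'Cardiomegaly': 0.75, 'Edema': 1.0}
print(mean_auroc)  # 0.875
```

Reporting the per-label values alongside the mean helps reveal labels on which a model is weak, which an average alone can hide.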
Can it be enriched or improved?
Yes, several approaches are possible:
- Add additional clinical annotations or definitive diagnoses
- Merge with other datasets (MIMIC-CXR, NIH ChestX-ray14) to enhance diversity
- Integrate metadata (age, gender, medical history) for context-aware models
- Use semi-supervised or uncertainty-aware learning approaches to exploit weak labels
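Merging with another dataset requires mapping its labels onto a common schema first. As a rough Python sketch, the code below maps NIH ChestX-ray14 findings onto the seven labels that dataset shares with CheXpert. It assumes the NIH findings arrive as a pipe-separated string (for example `"Cardiomegaly|Effusion"`); the mapping table and function name are illustrative choices, and label definitions are not guaranteed to match exactly across datasets.

```python
from typing import Dict

# Labels present in both datasets, written with CheXpert naming
SHARED_LABELS = [
    "Atelectasis", "Cardiomegaly", "Consolidation", "Edema",
    "Pleural Effusion", "Pneumonia", "Pneumothorax",
]

# NIH ChestX-ray14 name -> CheXpert name (illustrative mapping)
NIH_TO_CHEXPERT = {
    "Atelectasis": "Atelectasis",
    "Cardiomegaly": "Cardiomegaly",
    "Consolidation": "Consolidation",
    "Edema": "Edema",
    "Effusion": "Pleural Effusion",
    "Pneumonia": "Pneumonia",
    "Pneumothorax": "Pneumothorax",
}


def nih_to_shared(finding_labels: str) -> Dict[str, float]:
    """Convert a pipe-separated NIH finding string into shared-label targets.

    Findings with no CheXpert counterpart in the table (e.g. "Mass") are dropped.
    """
    names = {name.strip() for name in finding_labels.split("|")}
    mapped = {NIH_TO_CHEXPERT[n] for n in names if n in NIH_TO_CHEXPERT}
    return {label: 1.0 if label in mapped else 0.0 for label in SHARED_LABELS}


example = nih_to_shared("Cardiomegaly|Effusion|Mass")
print(example)
```

Restricting the merged dataset to shared labels keeps the targets comparable, at the cost of discarding findings that only one dataset annotates.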
🔗 Source: CheXpert Dataset
Frequently Asked Questions
Are the annotations made by radiologists?
Not directly. The initial annotations are generated automatically from the radiology reports, then validated on a subset by doctors to assess the quality of the automatic labels.
Does CheXpert cover pediatric cases?
No, the dataset is based on adult patients. For pediatric cases, other datasets like PadChest or PedchestXray are more appropriate.
Is there a leaderboard for CheXpert?
Yes, Stanford provides a standardized evaluation that compares model performance on a held-out test set that is not publicly released.