CheXpert Dataset

CheXpert is a large-scale chest radiograph dataset developed at Stanford. It contains more than 220,000 chest x-rays whose labels are generated automatically and then validated on a subset by doctors, and it is used to train and evaluate artificial intelligence models for the detection of lung pathologies.

Size

Over 224,000 DICOM chest x-rays

Licence

Free access on request, reserved for academic and non-commercial research (Stanford University specific license)

Description


The dataset includes:

  • 224,316 chest x-rays of over 65,000 patients
  • Images in DICOM format, from a university hospital
  • Labels for 14 observations, including pneumonia, edema, fracture, and cardiomegaly
  • An explicit "uncertain" value for labels that the report leaves ambiguous, which can be incorporated into training (for example, of probabilistic models)

CheXpert is one of the most widely used benchmarks for the automated classification of medical images in radiology.
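The uncertainty labels described above are often handled by mapping each uncertain entry to a fixed training target, or by excluding it from the loss. The sketch below illustrates three such policies (treat uncertain as positive, as negative, or ignore it). It assumes an encoding in which 1 means positive, 0 negative, -1 uncertain, and a missing value means the observation was not mentioned; check this against the label files you actually receive. The function name and the choice to treat unmentioned observations as negative are illustrative assumptions, not part of the dataset.

```python
# Assumed label encoding (verify against the real label files):
#   1.0 = positive, 0.0 = negative, -1.0 = uncertain, None = not mentioned
UNCERTAIN = -1.0


def apply_uncertainty_policy(label, policy="ones"):
    """Map one raw label to a (target, weight) pair for training.

    policy: "ones"   -> uncertain is treated as positive
            "zeros"  -> uncertain is treated as negative
            "ignore" -> uncertain gets weight 0 so it does not affect the loss
    """
    if label is None:
        # Assumption: an unmentioned observation is treated as negative.
        return 0.0, 1.0
    if label == UNCERTAIN:
        if policy == "ones":
            return 1.0, 1.0
        if policy == "zeros":
            return 0.0, 1.0
        if policy == "ignore":
            return 0.0, 0.0
        raise ValueError(f"unknown policy: {policy!r}")
    # Positive and negative labels pass through unchanged.
    return float(label), 1.0


# Hypothetical labels for one image across four observations.
raw = [1.0, -1.0, 0.0, None]
pairs = [apply_uncertainty_policy(v, policy="ignore") for v in raw]
```

Which policy works best can differ from one observation to another, so it is worth comparing them on a validation set rather than fixing one in advance.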

What is this dataset for?


CheXpert supports several use cases:

  • Training models to classify and detect pulmonary pathologies on chest radiographs
  • Benchmarking medical image analysis algorithms (CNNs, ViTs, multimodal models, etc.)
  • Developing diagnostic support tools for radiologists
  • Assessing how accurate AI systems remain when medical annotations are uncertain
  • Researching the reliability and robustness of healthcare models

Can it be enriched or improved?


Yes, several approaches are possible:

  • Add additional clinical annotations or definitive diagnoses
  • Merge with other datasets (MIMIC-CXR, NIH ChestX-ray14) to enhance diversity
  • Integrate metadata (age, gender, medical history) for context-aware models
  • Use semi-supervised or uncertainty-aware learning approaches to exploit weak labels
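As one illustration of the last approach, a loss function can simply skip labels that are uncertain or missing, so that only confirmed positives and negatives drive the gradient. The following is a minimal sketch in plain Python, assuming uncertain labels are encoded as -1 and unmentioned ones as None; the function name `masked_bce` is hypothetical, and a real training loop would use the equivalent masking in a deep learning framework.

```python
import math


def masked_bce(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over confirmed labels only.

    probs  : predicted probabilities in [0, 1]
    labels : 1 (positive), 0 (negative); any other value
             (e.g. -1 for uncertain, None for unmentioned) is skipped
    """
    losses = []
    for p, y in zip(probs, labels):
        if y not in (0, 1):
            continue  # uncertain or missing: contributes nothing to the loss
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        losses.append(-(y * math.log(p) + (1 - y) * math.log(1.0 - p)))
    # With no confirmed labels there is nothing to learn from.
    return sum(losses) / len(losses) if losses else 0.0


# Only the first entry is a confirmed label; the other two are masked out.
loss = masked_bce([0.5, 0.9, 0.2], [1, -1, None])
```

Masking discards information, which is why it is usually treated as a baseline against which policies that assign a target to uncertain labels are compared.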

🔗 Source: CheXpert Dataset

Frequently Asked Questions

Are the annotations made by radiologists?

Not entirely. The initial annotations are generated automatically from the radiology reports, and a subset is then validated by doctors to assess how well the automatic labeling performs.

Does CheXpert cover pediatric cases?

No, the dataset is based on adult patients. For pediatric cases, other datasets like PadChest or PedchestXray are more appropriate.

Is there a leaderboard for CheXpert?

Yes, Stanford provides a standardized evaluation that compares model performance on a closed test set, meaning one that is not released to participants.
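Before submitting to a standardized evaluation, it helps to score a model locally on a validation split. Results on CheXpert are commonly reported as the area under the ROC curve (AUROC) per observation; the exact leaderboard protocol is not described here, so treat the following as a generic sketch. It computes AUROC directly from its definition, as the fraction of positive/negative pairs that the model ranks correctly, with ties counted as half.

```python
def auroc(scores, labels):
    """AUROC as the probability that a random positive outranks a random negative.

    scores : model outputs, higher means more likely positive
    labels : 1 for positive, 0 for negative
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # correctly ranked pair
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical scores for one observation on four validation images.
value = auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])  # 3 of 4 pairs ranked correctly
```

The pairwise loop is quadratic in the number of images, which is fine for a small validation set; for larger sets a rank-based implementation from an established library is the more practical choice.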
