NIH Chest X-rays

The NIH Chest X-rays Dataset is one of the most widely used medical datasets in AI applied to radiology. It contains more than 100,000 chest X-rays with automatically generated annotations covering 14 pathologies, including pneumonia, pleural effusion, emphysema and pulmonary nodules.

Size

112,120 chest X-rays of 30,805 patients, PNG format (converted from DICOM)

Licence

Free for academic research, under terms of use specified by the National Institutes of Health (NIH). Data is anonymized and publicly accessible

Description


The dataset includes:

  • 112,120 frontal-view chest X-ray images (postero-anterior and antero-posterior)
  • Up to 14 pathology labels per image (multi-label: one image can carry several labels)
  • Related metadata: age, gender, patient ID, patient position
  • Annotations generated automatically from radiological reports
  • Data from the NIH Clinical Center

Although the annotations are generated automatically and therefore contain some noise, the dataset remains a reference in medical computer vision, and is often used to fine-tune pre-trained models.
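Because an image can carry several labels at once, training code usually converts each image's labels into a fixed-length 0/1 vector. The sketch below illustrates one way to do that. It assumes the labels arrive as a single string separated by "|" and uses the label names commonly seen in the distributed metadata file; both are assumptions to check against the copy of the dataset you download.

```python
# Assumed label names for the 14 pathologies; verify against your metadata file.
PATHOLOGIES = [
    "Atelectasis", "Cardiomegaly", "Effusion", "Infiltration", "Mass",
    "Nodule", "Pneumonia", "Pneumothorax", "Consolidation", "Edema",
    "Emphysema", "Fibrosis", "Pleural_Thickening", "Hernia",
]


def encode_labels(finding_labels: str) -> list[int]:
    """Turn a label string such as "Effusion|Nodule" into a 14-element 0/1 vector."""
    # Assumption: multiple labels are joined with "|" in one field.
    present = {name.strip() for name in finding_labels.split("|")}
    # Names outside PATHOLOGIES (for example "No Finding") yield an all-zero vector.
    return [1 if name in present else 0 for name in PATHOLOGIES]


vector = encode_labels("Effusion|Nodule")
no_finding = encode_labels("No Finding")
```

An image without any of the 14 pathologies maps to an all-zero vector, which fits the usual multi-label setup with one sigmoid output per pathology.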

What is this dataset for?


NIH Chest X-rays is used for:

  • Training models for the classification or localization of pulmonary pathologies
  • Pre-training CNN networks for medical vision (DenseNet, EfficientNet, etc.)
  • Benchmarking AI detection models for clinical settings
  • Developing automated triage or pre-diagnostic alert systems
  • Validating models on varied clinical cases, drawing on a large base of 30,805 patients
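When training or validating on this dataset, note that it holds 112,120 images for 30,805 patients, so many patients appear more than once. A common precaution is to split by patient ID rather than by image, so that no patient ends up in both the training and the test set. The following is a minimal sketch of that idea; the function name, the split fractions and the sample records are hypothetical.

```python
import hashlib


def assign_split(patient_id: str, val_fraction: float = 0.1,
                 test_fraction: float = 0.2) -> str:
    """Deterministically map a patient ID to 'train', 'val' or 'test'."""
    digest = hashlib.sha256(patient_id.encode("utf-8")).hexdigest()
    # Map the first 8 hex digits to a number in [0, 1).
    bucket = int(digest[:8], 16) / 16**8
    if bucket < test_fraction:
        return "test"
    if bucket < test_fraction + val_fraction:
        return "val"
    return "train"


# Hypothetical (image file, patient ID) records.
records = [
    ("img_0001.png", "P001"),
    ("img_0002.png", "P001"),
    ("img_0003.png", "P002"),
    ("img_0004.png", "P003"),
]
splits = {image: assign_split(patient) for image, patient in records}
```

Hashing the patient ID keeps the assignment stable across runs and guarantees that all images of one patient land in the same split.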

Can it be enriched or improved?


Yes, in particular by:

  • Adding manual annotations validated by radiologists (e.g. as was done for the CheXpert validation set; ChestX-ray14 is simply another name for this dataset)
  • Using segmentation algorithms to locate abnormalities in the lungs
  • Combining it with clinical databases that pair images with text (e.g. MIMIC-CXR)
  • Extending it for multi-view models (adding lateral radiographs or CT)

🔗 Source: NIH Chest X-rays Dataset

Frequently Asked Questions

Are the annotations reliable for clinical use?

Not on their own. The labels were extracted automatically from radiology reports, so some of them are wrong. They are useful for training, but manual validation is recommended for sensitive applications.
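One way to carry out such a validation is to have a radiologist re-label a sample of images and compare those labels with the automatic ones, pathology by pathology. The sketch below computes precision and recall for a single pathology; the function name and the six-image example are made up for illustration.

```python
def precision_recall(auto: list[int], manual: list[int]) -> tuple[float, float]:
    """Precision and recall of automatic 0/1 labels, treating manual labels as truth."""
    tp = sum(1 for a, m in zip(auto, manual) if a == 1 and m == 1)
    fp = sum(1 for a, m in zip(auto, manual) if a == 1 and m == 0)
    fn = sum(1 for a, m in zip(auto, manual) if a == 0 and m == 1)
    # Return 0.0 when a ratio is undefined (no positives on one side).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labels for one pathology on six images.
auto_labels = [1, 1, 0, 1, 0, 0]
manual_labels = [1, 0, 0, 1, 1, 0]
precision, recall = precision_recall(auto_labels, manual_labels)
```

Low precision for a pathology means the automatic labels often flag it where the radiologist does not; low recall means they often miss it.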

Are X-rays in DICOM?

The files available are in PNG format converted from DICOM. Some modified versions allow access to the original DICOM structure.

Are there associated benchmarks?

Yes, several research studies have used this dataset to establish benchmarks in multi-label classification, localization, and weakly supervised detection.
