LUNA16
LUNA16 (LUng Nodule Analysis 2016) is a reference dataset for developing pulmonary nodule detection algorithms. It provides anonymized, annotated chest CT scans for training and evaluating lung cancer screening support systems.
888 3D chest CT scans, MetaImage format (.mhd/.raw)
Free access for academic use, subject to registration and acceptance of the LUNA16 challenge conditions
Description
The dataset is derived from the LIDC-IDRI database (Lung Image Database Consortium and Image Database Resource Initiative) and contains:
- 888 high-resolution CT scans (LIDC-IDRI scans with a slice thickness above 2.5 mm are excluded)
- Manual annotations of pulmonary nodules by up to four radiologists
- Nodule metadata (location and diameter; malignancy ratings are available in the source LIDC-IDRI annotations)
- Complete 3D volumes suitable for volumetric deep learning approaches
It is structured to facilitate performance comparisons between medical image analysis models.
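As an illustration of how the nodule annotations relate to the 3D volumes, the sketch below parses a small annotation table and converts a nodule position from world coordinates (mm) to voxel indices. The column names (`seriesuid`, `coordX`, `coordY`, `coordZ`, `diameter_mm`), the sample values, and the origin and spacing are assumptions for illustration; in practice the origin and spacing would be read from each scan's header, and the conversion shown assumes an axis-aligned volume.

```python
import csv
import io

# Hypothetical excerpt in the style of an annotations CSV; all values are made up.
SAMPLE_CSV = """seriesuid,coordX,coordY,coordZ,diameter_mm
scan-001,-100.0,50.0,-200.0,6.0
scan-001,-20.5,10.0,-150.0,4.2
"""


def read_annotations(text):
    """Parse annotation rows into dicts with float world coordinates (mm)."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "seriesuid": row["seriesuid"],
            "world_xyz": (
                float(row["coordX"]),
                float(row["coordY"]),
                float(row["coordZ"]),
            ),
            "diameter_mm": float(row["diameter_mm"]),
        })
    return rows


def world_to_voxel(world_xyz, origin_xyz, spacing_xyz):
    """Convert a world position (mm) to integer voxel indices.

    Assumes an axis-aligned volume (identity direction matrix).
    """
    return tuple(
        int(round((w - o) / s))
        for w, o, s in zip(world_xyz, origin_xyz, spacing_xyz)
    )


annotations = read_annotations(SAMPLE_CSV)

# Illustrative origin (mm) and voxel spacing (mm); real values come from the scan header.
origin = (-200.0, -150.0, -300.0)
spacing = (0.5, 0.5, 1.0)

voxel = world_to_voxel(annotations[0]["world_xyz"], origin, spacing)
print(voxel)
```

Keeping the conversion in a separate function makes it easy to replace with one that also applies a direction matrix if a scan is not axis-aligned.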
What is this dataset for?
LUNA16 is used in various contexts:
- Training models for the detection and classification of pulmonary nodules
- Validation of 3D segmentation approaches in radiology
- Development of diagnostic support systems in thoracic oncology
- Participation in scientific competitions on lung cancer screening
- Research in medical AI, image processing and predictive medicine
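For the training and validation uses listed above, models are often compared with cross-validation over the dataset's subsets. The following is a minimal sketch, assuming the ten-subset layout (`subset0` to `subset9`) of the public release; the function name and the fold rotation are illustrative and not the challenge's official tooling.

```python
def cross_validation_folds(n_subsets=10):
    """Yield (train_subsets, test_subset) pairs, holding out one subset per fold."""
    names = [f"subset{i}" for i in range(n_subsets)]
    for held_out in names:
        train = [name for name in names if name != held_out]
        yield train, held_out


folds = list(cross_validation_folds())

# Show the first two folds only.
for train, test in folds[:2]:
    print(f"test on {test}, train on {len(train)} subsets")
```

Each scan then appears in exactly one test fold, so results from different models can be compared on the same splits.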
Can it be enriched or improved?
Yes, for example:
- Cross-reference the data with clinical follow-ups to predict how nodules evolve
- Enrich the annotations with more granular labels (shape, texture, vascularization, etc.)
- Combine it with other collections such as NSCLC Radiogenomics, hosted on The Cancer Imaging Archive (TCIA)
- Use multimodal models combining image, text, and patient history
🔗 Source: LUNA16 Dataset
Frequently Asked Questions
What is the difference between LUNA16 and LIDC-IDRI?
LUNA16 is a filtered and preformatted subset of LIDC-IDRI, specifically structured for the automatic analysis of lung nodules in a competitive framework.
Are the annotations reliable?
Yes. Each scan was annotated by up to 4 radiologists, and only nodules of at least 3 mm accepted by at least 3 of the 4 radiologists are included in the final evaluation.
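The inclusion rule can be sketched as a simple filter over annotated findings. The `Finding` record, its field names, and the sample values are hypothetical; the default thresholds (an inclusive 3 mm minimum diameter and agreement among at least 3 readers) follow the commonly cited challenge rule and should be checked against the official evaluation description.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """Hypothetical record for one annotated finding."""
    finding_id: str
    diameter_mm: float
    n_readers: int  # number of radiologists who marked it as a nodule


def in_reference_standard(finding, min_diameter_mm=3.0, min_readers=3):
    """Return True if the finding meets both the size and the agreement threshold."""
    return (
        finding.diameter_mm >= min_diameter_mm
        and finding.n_readers >= min_readers
    )


findings = [
    Finding("a", 6.1, 4),
    Finding("b", 2.4, 4),  # below the size threshold
    Finding("c", 5.0, 2),  # too few readers agree
    Finding("d", 3.0, 3),  # exactly at both thresholds
]

kept = [f.finding_id for f in findings if in_reference_standard(f)]
print(kept)
```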
Does the dataset contain clinical information?
No. Only the CT images and nodule annotations are included. For clinical data, see databases such as MIMIC-CXR or TCGA-LUAD.