TCIA Dataset (The Cancer Imaging Archive)
TCIA is one of the largest medical imaging archives dedicated to cancer research. It brings together several imaging modalities (MRI, CT, PET, X-ray) from real clinical cohorts, often with expert annotations. It is a key resource for training artificial intelligence algorithms in oncology.
Several terabytes of medical images (MRI, CT, X-rays), DICOM format
Free access for research; license and usage terms vary by collection. Data is de-identified and open to the scientific community.
Description
The TCIA corpus includes:
- Dozens of collections with thousands of patients
- Imaging exams: MRI, CT, PET, X-rays
- Manual annotations (segmentations, tumor contours, diagnoses)
- Associated data (biomarkers, genomics, and clinical outcomes in some cases)
- Specialized subsets: lung, brain, prostate, breast, etc.
Each collection is documented with clinical metadata and structured according to DICOM standards, facilitating its integration into research workflows.
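As a minimal sketch of what DICOM structure means at the file level, the snippet below checks whether a downloaded file carries the DICOM Part 10 signature: a 128-byte preamble followed by the four bytes `DICM`. It uses only the Python standard library. The function names `is_dicom_part10` and `find_dicom_files` are hypothetical, and a real workflow would typically rely on a dedicated DICOM library to read the metadata itself.

```python
from pathlib import Path

DICOM_PREAMBLE_LENGTH = 128   # bytes reserved before the signature
DICOM_MAGIC = b"DICM"         # four-byte signature of a DICOM Part 10 file


def is_dicom_part10(path):
    """Return True if the file at `path` has the DICOM Part 10 signature."""
    with open(path, "rb") as handle:
        header = handle.read(DICOM_PREAMBLE_LENGTH + len(DICOM_MAGIC))
    # Files shorter than 132 bytes yield a truncated slice and fail the check.
    return header[DICOM_PREAMBLE_LENGTH:] == DICOM_MAGIC


def find_dicom_files(root):
    """List files under `root` (recursively) that carry the DICOM signature."""
    return sorted(
        p for p in Path(root).rglob("*") if p.is_file() and is_dicom_part10(p)
    )
```

Calling `find_dicom_files` on a download folder returns a sorted list of paths, which can be a quick sanity check before handing the data to a pre-processing pipeline.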
What is this dataset for?
TCIA is used for:
- Training AI models for tumor detection and segmentation
- Developing diagnostic support systems in radiology
- Analyzing multi-modal imaging for translational research
- Validating cancer prediction or treatment response algorithms
- Cross-referencing with omics data (radiogenomics)
Can it be enriched or improved?
Yes, for example:
- Add custom clinical annotations (grading, stages, scores)
- Merge with databases like The Cancer Genome Atlas (TCGA) for cross-analyses
- Supplement the image series with synthetic data (GAN-generated images, 3D augmentation)
- Use tools like 3D Slicer, MONAI, or nnU-Net for pre-processing and training
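Merging with TCGA usually comes down to joining on a shared patient identifier. The following is a minimal sketch, assuming the imaging records carry a `PatientID` that matches the TCGA participant barcode (the first three dash-separated fields, for example `TCGA-02-0001`). The record layout, the field names `barcode` and `tumor_grade`, and the sample values are illustrative assumptions, not taken from any real collection.

```python
def participant_id(barcode):
    """Reduce a TCGA barcode to its participant part (first three fields)."""
    return "-".join(barcode.split("-")[:3])


def merge_imaging_with_clinical(imaging_rows, clinical_rows):
    """Inner-join imaging rows with clinical rows on the participant ID.

    Both inputs are lists of dicts; imaging rows need a 'PatientID' key and
    clinical rows a 'barcode' key (hypothetical field names).
    """
    clinical_by_patient = {
        participant_id(row["barcode"]): row for row in clinical_rows
    }
    merged = []
    for row in imaging_rows:
        clinical = clinical_by_patient.get(participant_id(row["PatientID"]))
        if clinical is not None:  # keep only patients present in both sources
            merged.append({**clinical, **row})
    return merged


# Illustrative, made-up records
imaging = [
    {"PatientID": "TCGA-02-0001", "Modality": "MR"},
    {"PatientID": "TCGA-02-9999", "Modality": "CT"},
]
clinical = [{"barcode": "TCGA-02-0001-01A", "tumor_grade": "G4"}]
merged = merge_imaging_with_clinical(imaging, clinical)
```

The inner join silently drops patients that appear in only one source, so it is worth comparing the row counts before and after the merge.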
🔗 Source: The Cancer Imaging Archive (TCIA)
Frequently Asked Questions
Does TCIA contain only de-identified images?
Yes, all data is strictly de-identified according to HIPAA standards before publication.
Can TCIA be used for clinical studies?
Yes, as long as the study stays within academic or institutional research. Some collections are restricted and require a specific access request.
What are the differences between the collections available?
Each collection corresponds to a clinical study or a particular type of cancer. Collections vary in imaging modalities, number of patients, type of annotations, and the presence of associated data (follow-up, genomics, etc.).