PubMedVision
PubMedVision is a large-scale multimodal medical dataset containing more than one million question-answer pairs associated with medical images from PubMed. The data is enriched with GPT-4V to ensure its quality and consistent formatting.
Approximately 1.3 million medical VQA pairs, 902 MB, Parquet format
Apache 2.0
Description
PubMedVision contains over 1.3 million medical Visual Question Answering (VQA) examples. Each example pairs a medical image with a question and its answer, making it possible to train models that can understand and answer complex questions about medical imaging.
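As a quick way to explore the data, here is a minimal loading sketch using the Hugging Face `datasets` library. The repository id below is an assumption based on the public release; verify the exact id and configuration name on the dataset's Hub page.

```python
from datasets import load_dataset

# Assumed Hub repository id; check the dataset's page for the
# exact id and configuration name before use.
ds = load_dataset("FreedomIntelligence/PubMedVision", split="train")

print(ds)            # number of rows and column names
print(ds[0].keys())  # fields of a single VQA example
```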
What is this dataset for?
- Training AI models for multimodal medical VQA (see the sketch after this list)
- Improving the understanding of medical images and their contextual interpretation
- Developing assistants that help healthcare professionals analyze clinical images
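To make the first use case concrete, below is a minimal PyTorch `Dataset` wrapper for PubMedVision-style records. The column names (`image`, `question`, `answer`) and the processor interface are assumptions; map them to the actual fields of the release you download.

```python
from torch.utils.data import Dataset

class MedicalVQADataset(Dataset):
    """Wraps PubMedVision-style records for VQA fine-tuning.

    The field names 'image', 'question', and 'answer' are
    hypothetical placeholders; adjust them to the actual
    column names in the release.
    """

    def __init__(self, hf_dataset, processor):
        self.ds = hf_dataset
        self.processor = processor  # e.g. a Transformers image+text processor

    def __len__(self):
        return len(self.ds)

    def __getitem__(self, idx):
        rec = self.ds[idx]
        inputs = self.processor(
            images=rec["image"],
            text=rec["question"],
            return_tensors="pt",
        )
        # Drop the batch dimension the processor adds to each tensor.
        inputs = {k: v.squeeze(0) for k, v in inputs.items()}
        inputs["answer_text"] = rec["answer"]
        return inputs
```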
Can it be enriched or improved?
Yes. It is possible to add annotations for specific pathologies or modalities, to integrate additional data for medical sub-fields, or to enrich the per-image metadata.
🔎 In summary
🧠 Recommended for
- Medical imaging researchers
- VQA model developers
- Digital health experts
🔧 Compatible tools
- Hugging Face Transformers
- PyTorch
- VQA tools
- Multimodal frameworks
💡 Tip
Use annotations on body parts and modalities to refine models for specific tasks.
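For example, a `datasets.filter` call can restrict training to a single modality and body region. The column names `modality` and `body_part` below are assumptions; adapt them to the annotation fields actually present in the release.

```python
from datasets import load_dataset

# Repository id is an assumption (see the loading sketch above).
ds = load_dataset("FreedomIntelligence/PubMedVision", split="train")

# Hypothetical annotation columns; substitute the real field names.
chest_xrays = ds.filter(
    lambda rec: rec.get("modality") == "X-ray"
    and rec.get("body_part") == "chest"
)
print(f"{len(chest_xrays)} chest X-ray examples retained")
```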
Frequently Asked Questions
Does this dataset contain additional image annotations?
Yes, it includes annotations on body parts and imaging modalities.
Can this dataset be used to train a medical assistant?
Yes, it is designed to improve models' understanding of medical images and their ability to answer questions about them.
Is this dataset suitable for beginners in medical AI?
No, its volume and complexity make it more suitable for advanced users with significant resources.