PubMedVision
PubMedVision is a large-scale multimodal medical dataset containing more than one million question-answer pairs associated with medical images from PubMed. The data is enriched with GPT-4V to ensure its quality and consistent formatting.
Approximately 1.3 million medical VQA pairs, 902 MB, Parquet format
Apache 2.0
Description
PubMedVision contains over 1.3 million medical Visual Question Answering (VQA) examples. Each example pairs a medical image with a question and its answer, making it possible to train models that can understand and answer complex questions about medical imaging.
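As a quick way to explore the data, here is a minimal loading sketch using the Hugging Face `datasets` library. The repository id below is an assumption based on the public release; verify the exact id and configuration name on the dataset's Hub page.

```python
from datasets import load_dataset

# Assumed Hub repository id; check the dataset's page for the
# exact id and configuration name before use.
ds = load_dataset("FreedomIntelligence/PubMedVision", split="train")

print(ds)            # number of rows and column names
print(ds[0].keys())  # fields of a single VQA example
```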
What is this dataset for?
- Training AI models for multimodal medical VQA (see the sketch after this list)
- Improving the understanding of medical images and their contextual interpretation
- Developing assistants that help healthcare professionals analyze clinical images
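To make the first use case concrete, below is a minimal PyTorch `Dataset` wrapper for PubMedVision-style records. The column names (`image`, `question`, `answer`) and the processor interface are assumptions; map them to the actual fields of the release you download.

```python
from torch.utils.data import Dataset

class MedicalVQADataset(Dataset):
    """Wraps PubMedVision-style records for VQA fine-tuning.

    The field names 'image', 'question', and 'answer' are
    hypothetical placeholders; adjust them to the actual
    column names in the release.
    """

    def __init__(self, hf_dataset, processor):
        self.ds = hf_dataset
        self.processor = processor  # e.g. a Transformers image+text processor

    def __len__(self):
        return len(self.ds)

    def __getitem__(self, idx):
        rec = self.ds[idx]
        inputs = self.processor(
            images=rec["image"],
            text=rec["question"],
            return_tensors="pt",
        )
        # Drop the batch dimension the processor adds to each tensor.
        inputs = {k: v.squeeze(0) for k, v in inputs.items()}
        inputs["answer_text"] = rec["answer"]
        return inputs
```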
Can it be enriched or improved?
Yes. It is possible to add annotations for specific pathologies or modalities, to integrate additional data for medical sub-fields, or to enrich the per-image metadata.
🔎 In summary
🧠 Recommended for
- Medical imaging researchers
- VQA model developers
- Digital health experts
🔧 Compatible tools
- Hugging Face Transformers
- PyTorch
- VQA tools
- Multimodal frameworks
💡 Tip
Use annotations on body parts and modalities to refine models for specific tasks.
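For example, a `datasets.filter` call can restrict training to a single modality and body region. The column names `modality` and `body_part` below are assumptions; adapt them to the annotation fields actually present in the release.

```python
from datasets import load_dataset

# Repository id is an assumption (see the loading sketch above).
ds = load_dataset("FreedomIntelligence/PubMedVision", split="train")

# Hypothetical annotation columns; substitute the real field names.
chest_xrays = ds.filter(
    lambda rec: rec.get("modality") == "X-ray"
    and rec.get("body_part") == "chest"
)
print(f"{len(chest_xrays)} chest X-ray examples retained")
```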
Frequently Asked Questions
Does this dataset contain additional image annotations?
Yes, it includes annotations on body parts and imaging modalities.
Can this dataset be used to train a medical assistant?
Yes, it is designed to improve models' understanding of medical images and their ability to answer questions about them.
Is this dataset suitable for beginners in medical AI?
No, its volume and complexity make it more suitable for advanced users with significant resources.