AyavisionBench

AyavisionBench is a benchmark for testing vision-language models in 23 languages across 9 task categories, ranging from graph comprehension to OCR and transcription.

Size

3,105 JPG image-question pairs, 23 languages, total size ~1.34 GB
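
As a quick sanity check on these figures, 3,105 pairs across 23 languages works out to exactly 135 pairs per language, assuming the pairs are spread evenly (the page does not state this). The sketch below does that arithmetic and flags languages whose count deviates from the even split. The `language` field name and the helper names are hypothetical, chosen for illustration only.

```python
from collections import Counter

TOTAL_PAIRS = 3105    # from the size figure above
NUM_LANGUAGES = 23    # from the size figure above


def expected_pairs_per_language(total=TOTAL_PAIRS, languages=NUM_LANGUAGES):
    """Pairs per language if the split is perfectly even (an assumption)."""
    per_language, remainder = divmod(total, languages)
    if remainder:
        raise ValueError("total is not evenly divisible by the language count")
    return per_language


def count_by_language(records, language_key="language"):
    """Count records per language; `language_key` is a hypothetical field name."""
    return Counter(record[language_key] for record in records)


def unbalanced_languages(counts, expected):
    """Return {language: count} for languages that deviate from `expected`."""
    return {lang: n for lang, n in counts.items() if n != expected}


print(expected_pairs_per_language())  # 135
```

Given an iterable of records, `unbalanced_languages(count_by_language(records), 135)` returns an empty dict when every language has the same number of pairs.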

Licence

Apache 2.0

Description

AyavisionBench is a multilingual dataset designed to assess models that combine vision and natural language. It contains JPG images paired with questions that can only be answered using the visual context, in 23 major languages spoken by approximately half of the world's population. Tasks include image description, graph comprehension, optical character recognition (OCR), and more.

What is this dataset for?

  • Assess the multimodal and multilingual understanding of AI models
  • Test robustness on various visual tasks such as OCR, transcription, and visual reasoning
  • Train models capable of generalizing to multiple languages and scripts
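
The evaluation use case above can be sketched as a per-language scoring loop. This is a minimal illustration, not the benchmark's official protocol: the page does not say whether reference answers ship with the dataset or how responses are scored, so the record fields (`language`, `image`, `question`, `reference`), the `predict` callable, and the exact-match scorer are all assumptions.

```python
from collections import defaultdict


def exact_match(prediction, reference):
    """Stand-in scorer (an assumption): case-insensitive exact match."""
    return prediction.strip().casefold() == reference.strip().casefold()


def per_language_scores(records, predict, score=exact_match):
    """Average score per language.

    `records` is an iterable of dicts with hypothetical keys
    "language", "image", "question", and "reference".
    `predict(image, question)` is any callable wrapping the model under test.
    """
    hits = defaultdict(float)
    totals = defaultdict(int)
    for record in records:
        lang = record["language"]
        prediction = predict(record["image"], record["question"])
        hits[lang] += float(score(prediction, record["reference"]))
        totals[lang] += 1
    return {lang: hits[lang] / totals[lang] for lang in totals}
```

Reporting one score per language, rather than a single global average, makes it easier to see when a model is strong in a few languages and weak in the rest. Swapping `score` for a different metric leaves the loop unchanged.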

Can it be enriched or improved?

Yes. The dataset could be extended with more languages, a wider variety of image types, or human annotations on the questions, which would improve answer quality and broaden the range of cases covered.

🔎 In summary

Criterion: Evaluation
🧩 Ease of use: ⭐⭐⭐⭐✩ (Clear dataset, requires multilingual handling)
🧼 Need for cleaning: ⭐⭐⭐⭐⭐ (Low – data well verified)
🏷️ Annotation richness: ⭐⭐⭐⭐✩ (Good – varied questions per image)
📜 Commercial license: ✅ Yes (Apache 2.0)
👨‍💻 Beginner friendly: ⚠️ Limited – best suited to advanced multimodal projects
🔁 Fine-tuning ready: ✅ Well suited to multilingual multimodal fine-tuning
🌍 Cultural diversity: 🌐 Very high – 23 languages across diverse families and scripts

🧠 Recommended for

  • Multimodal AI researchers
  • Multilingual projects
  • Evaluation of vision-language models

🔧 Compatible tools

  • Hugging Face Datasets
  • Transformers
  • PyTorch
  • TensorFlow
  • PIL

💡 Tip

Validate that each question, and each model response, is actually in the expected language, so that results for one language are not skewed by text in another.
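
One possible reading of this tip is a cheap script check: confirm that the letters in a question or response belong to the writing system expected for its language. The sketch below uses only Python's standard `unicodedata` module. The language codes and the `EXPECTED_SCRIPT` mapping are hypothetical and partial, not the benchmark's actual language list, and a script check cannot tell apart languages that share a script (French and English, for example), so it complements rather than replaces a real language identifier.

```python
import unicodedata
from collections import Counter

# Hypothetical, partial mapping from language code to the first word of the
# Unicode character names used by that language's script.
EXPECTED_SCRIPT = {
    "en": "LATIN",
    "fr": "LATIN",
    "ru": "CYRILLIC",
    "ar": "ARABIC",
    "hi": "DEVANAGARI",
    "ko": "HANGUL",
    "zh": "CJK",
}


def dominant_script(text):
    """Return the most common script among the letters in `text`, or None."""
    scripts = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                # e.g. "CYRILLIC SMALL LETTER PE" -> "CYRILLIC"
                scripts[name.split()[0]] += 1
    if not scripts:
        return None
    return scripts.most_common(1)[0][0]


def matches_expected_script(text, language_code):
    """True if the dominant script of `text` is the one expected for the language."""
    expected = EXPECTED_SCRIPT[language_code]  # KeyError for unmapped codes
    return dominant_script(text) == expected
```

Items that fail the check can be set aside for manual review instead of being silently scored.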

Frequently Asked Questions

How many languages does AyavisionBench cover?

The dataset includes 23 languages, spanning diverse language families and writing systems.

What types of tasks are included in this dataset?

Tasks include image description, OCR, graph comprehension, transcription, visual recognition, and reasoning.

Does the license allow commercial use?

Yes. The Apache 2.0 license permits commercial use free of charge, provided you comply with its terms, such as retaining the license text and attribution notices when redistributing.
