AyavisionBench
AyavisionBench is a benchmark designed to test vision-language models in 23 languages, covering 9 task categories, ranging from graph comprehension to OCR and transcription.
3,105 JPG image-question pairs, 23 languages, total size ~1.34 GB
Apache 2.0
Description
AyavisionBench is a multilingual dataset designed to assess the capabilities of models combining vision and natural language. It contains images in JPG format associated with questions that require visual context to be answered, in 23 major languages covering approximately half of the world's population. Tasks include describing images, understanding graphics, optical character recognition, and more.
What is this dataset for?
- Assess the multimodal and multilingual understanding of AI models
- Test robustness on various visual tasks like OCR, transcription, visual reasoning
- Train models capable of generalizing to multiple languages and scripts
Can it be enriched or improved?
Yes, it is possible to add more languages, to diversify the types of images, or to enrich the questions with human annotations to increase the quality of the answers and the diversity of the cases.
🔎 In summary
🧠 Recommended for
- Multimodal AI researchers
- Multilingual projects
- Evaluation of vision-language models
🔧 Compatible tools
- Hugging Face Datasets
- Transformers
- PyTorch
- TensorFlow
- PIL
💡 Tip
Use language validation to maximize quality in each language.
Frequently Asked Questions
How many languages does AyavisionBench cover?
The dataset includes 23 different languages, covering a great deal of linguistic and scriptural diversity.
What types of tasks are included in this dataset?
Tasks include image description, OCR, graph comprehension, transcription, visual recognition, and reasoning.
Does the license allow commercial use?
Yes, the Apache 2.0 license allows free commercial use subject to compliance with the terms.




