FLORES+: Multilingual Translation Benchmark
A multilingual benchmark for evaluating translation quality across more than 200 languages, built from sources such as Wikinews and Wikivoyage.
Approximately 2,000 sentences per language × 222 languages, structured text format
CC-BY-SA 4.0
Description
FLORES+ is a multilingual benchmark for evaluating machine translation quality across 222 languages. It contains sentences drawn from Wikinews, Wikivoyage, and Wikijunior, translated from English into a wide range of languages. The corpus is divided into standardized splits (dev, devtest), making comparisons between models straightforward.
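For quick orientation, here is a minimal loading sketch using the Hugging Face `datasets` library. The repository id `openlanguagedata/flores_plus`, the per-language config name `eng_Latn`, and the `text` field are assumptions based on common FLORES naming conventions; check the dataset card on the Hub for the exact identifiers.

```python
# Minimal sketch: load the FLORES+ dev/devtest splits for one language.
# The repo id, config name, and "text" field are assumptions; verify them
# against the actual dataset card on the Hugging Face Hub.
from datasets import load_dataset

flores_en = load_dataset("openlanguagedata/flores_plus", "eng_Latn")
print(flores_en)                    # DatasetDict with "dev" and "devtest" splits
print(flores_en["dev"][0]["text"])  # first English dev sentence
```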
What is this dataset for?
- Evaluating translation model performance on low- and high-resource languages (see the sketch after this list)
- Testing multilingual systems in a controlled setting
- Exploring the language coverage of LLMs or NMT systems
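To make the first use case concrete, here is a hedged end-to-end sketch: translate the English dev sentences with a MarianMT checkpoint and score them against the French references with corpus BLEU. The dataset and config ids, the `text` field, and the `Helsinki-NLP/opus-mt-en-fr` checkpoint are assumptions to verify before use.

```python
# Hedged sketch: evaluate an English→French MarianMT model on FLORES+ dev.
# Dataset/config ids, the "text" field, and the checkpoint are assumptions.
from datasets import load_dataset
from transformers import MarianMTModel, MarianTokenizer
import sacrebleu

src = load_dataset("openlanguagedata/flores_plus", "eng_Latn", split="dev")
ref = load_dataset("openlanguagedata/flores_plus", "fra_Latn", split="dev")

name = "Helsinki-NLP/opus-mt-en-fr"
tokenizer = MarianTokenizer.from_pretrained(name)
model = MarianMTModel.from_pretrained(name)

hypotheses = []
for i in range(0, len(src), 16):  # translate in small batches
    batch = tokenizer(src["text"][i:i + 16], return_tensors="pt",
                      padding=True, truncation=True)
    out = model.generate(**batch)
    hypotheses += tokenizer.batch_decode(out, skip_special_tokens=True)

# Score against the French references (one reference stream).
print(sacrebleu.corpus_bleu(hypotheses, [ref["text"]]).score)
```

The same loop generalizes to any language pair: swap the source and reference configs and pick a matching checkpoint.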
Can it be enriched or improved?
Yes. You can add new language pairs, extend the set with additional human translations, or enrich the per-language metadata (language family, typology). It can also serve as a basis for building domain-specific benchmarks (legal, medical, etc.).
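As a small illustration of the metadata idea above, the sketch below attaches a language-family tag to each record. The mapping is a hypothetical stub; a real one could be derived from a resource such as Glottolog.

```python
# Hedged sketch: enrich FLORES+ records with a language-family column.
# The FAMILY mapping is a hypothetical stub, not part of the dataset.
from datasets import load_dataset

FAMILY = {"eng_Latn": "Indo-European", "swh_Latn": "Niger-Congo"}  # stub

def tag(example, code="eng_Latn"):
    example["family"] = FAMILY.get(code, "unknown")
    return example

ds = load_dataset("openlanguagedata/flores_plus", "eng_Latn", split="dev")
ds = ds.map(tag)  # adds a "family" column to every record
```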
🔎 In summary
🧠 Recommended for
- Translation researchers
- Low-resource language specialists
- Multilingual model developers
🔧 Compatible tools
- MarianMT
- Fairseq
- Hugging Face Transformers
- BLEU/METEOR
💡 Tip
Use complementary metrics (BLEU, COMET, chrF) suited to each language for a finer-grained evaluation.
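A minimal sketch of this tip using sacrebleu, which implements both BLEU and chrF (COMET requires a separate neural model, e.g. via the `unbabel-comet` package, and is omitted here); the strings are placeholders:

```python
# Hedged sketch: score the same hypotheses with BLEU and chrF via sacrebleu.
import sacrebleu

hypotheses = ["Das Haus ist klein."]     # placeholder model output
references = [["Das Haus ist winzig."]]  # one reference stream

print(sacrebleu.corpus_bleu(hypotheses, references).score)
print(sacrebleu.corpus_chrf(hypotheses, references).score)
```

chrF operates on character n-grams, which tends to make it more robust than word-level BLEU for morphologically rich or low-resource languages.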
Frequently Asked Questions
Can FLORES+ be used to evaluate models on rare languages?
Yes, that is one of its main strengths: its coverage includes many low-resource languages.
Does the dataset contain parallel texts for training?
No: it is designed for evaluation. Each source sentence is translated into many languages, but the dev/devtest splits are not intended as a training corpus.
Is this benchmark compatible with fine-tuned translation models?
Absolutely, it is frequently used to validate the quality of trained or adapted models.