
FLORES+: Multilingual Translation Benchmark

A multilingual benchmark for evaluating translation quality in over 200 languages, drawn from sources such as Wikinews and Wikivoyage.

Size

Approximately 2,000 sentences per language × 222 languages, structured text format

License

CC-BY-SA 4.0

Description

FLORES+ is a multilingual benchmark used to test the accuracy of machine translation across 222 languages. It contains sentences from various sources (Wikinews, Wikivoyage, Wikijunior), translated from English into a wide range of languages. The corpus is divided into standardized splits (dev, devtest), facilitating comparisons between models.
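The splits are distributed as line-aligned files: line i of every language file is a translation of the same source sentence. The sketch below illustrates that alignment with invented placeholder sentences (not real FLORES+ data) and a hypothetical `load_parallel` helper:

```python
# Sketch of FLORES-style alignment: every language file has the same
# number of lines, and line i in each file translates the same sentence.
# The sentences below are illustrative placeholders, not FLORES+ data.

def load_parallel(files: dict[str, list[str]]) -> list[dict[str, str]]:
    """Zip per-language sentence lists into aligned records."""
    langs = list(files)
    n = len(files[langs[0]])
    assert all(len(files[lang]) == n for lang in langs), "files must be line-aligned"
    return [{lang: files[lang][i] for lang in langs} for i in range(n)]

devtest = {
    "eng_Latn": ["The cat sleeps.", "It is raining."],
    "fra_Latn": ["Le chat dort.", "Il pleut."],
}

records = load_parallel(devtest)
print(records[0])  # {'eng_Latn': 'The cat sleeps.', 'fra_Latn': 'Le chat dort.'}
```

Because every record carries the same sentence in all languages, any language pair can be evaluated from the same files.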

What is this dataset for?

  • Evaluating the performance of translation models on low- and high-resource languages
  • Testing multilingual systems in a controlled setting
  • Exploring the language coverage of LLMs or NMT systems

Can it be enriched or improved?

Yes. You can add new language pairs, extend the dataset with additional human translations, or enrich the per-language metadata (language family, typology). It can also serve as a basis for building domain-specific benchmarks (legal, medical, etc.).
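As a sketch of the metadata-enrichment idea, the hypothetical helper below attaches a language-family tag to records keyed by FLORES+-style language codes (the family assignments are standard linguistic facts; the record layout and helper name are invented for illustration):

```python
# Hypothetical enrichment: map FLORES+-style language codes to a
# language family. Codes follow the ISO 639-3 + script convention.
LANGUAGE_FAMILY = {
    "eng_Latn": "Indo-European",
    "fra_Latn": "Indo-European",
    "swh_Latn": "Niger-Congo",
    "jpn_Jpan": "Japonic",
}

def enrich(record: dict) -> dict:
    """Return a copy of the record with a 'family' field added."""
    return {**record, "family": LANGUAGE_FAMILY.get(record["lang"], "unknown")}

print(enrich({"lang": "swh_Latn", "sentence": "Paka analala."}))
```

Grouping evaluation scores by such a family tag makes it easy to see whether a model underperforms on an entire branch of languages rather than on isolated ones.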

🔎 In summary

🧩 Ease of Use: ⭐⭐⭐⭐⭐ (simple structure, well documented)
🧼 Need for Cleaning: ⭐⭐⭐⭐⭐ (none – data ready to use)
🏷️ Annotation Richness: ⭐⭐⭐⭐☆ (multilingual, well segmented)
📜 Commercial License: ✅ Yes (CC-BY-SA 4.0)
👨‍💻 Beginner Friendly: 👩‍🎓 Yes, easy to handle
🔁 Reusable for Fine-Tuning: 🔥 Perfect for adapting or evaluating NMT models
🌍 Cultural Diversity: 🌐 Very high – 222 languages covered

🧠 Recommended for

  • Translation researchers
  • Low-resource language specialists
  • Multilingual model developers

🔧 Compatible tools

  • MarianMT
  • Fairseq
  • Hugging Face Transformers
  • BLEU / METEOR

💡 Tip

Use complementary metrics (BLEU, chrF, COMET) suited to each language for a more detailed evaluation.
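To illustrate what chrF measures, here is a minimal from-scratch sketch of the character n-gram F-score (with the usual β = 2 weighting of recall). For real evaluations, prefer the reference implementation in sacreBLEU:

```python
from collections import Counter

def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring spaces as in chrF."""
    text = text.replace(" ", "")
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF: average n-gram precision/recall, F-beta combined."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        if sum(hyp.values()) and sum(ref.values()):
            precisions.append(overlap / sum(hyp.values()))
            recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta**2) * p * r / (beta**2 * p + r)

print(chrf("Le chat dort.", "Le chat dort."))  # identical strings score 1.0
```

Because chrF works at the character level, it is often more robust than BLEU for morphologically rich or low-resource languages, which is why reporting both is common practice.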

Frequently Asked Questions

Can FLORES+ be used to evaluate models on rare languages?

Yes, that is one of its main strengths: its coverage includes many low-resource languages.

Does the dataset contain parallel texts for training?

No: although each source sentence is translated into multiple languages, the corpus is designed for evaluation, not as a training set.

Is this benchmark compatible with fine-tuned translation models?

Absolutely, it is frequently used to validate the quality of trained or adapted models.
