MultiNLI (Multi-Genre Natural Language Inference Corpus)

MultiNLI (Multi-Genre Natural Language Inference) is a benchmark dataset for evaluating how well NLP models reason about language. It tests whether a model can determine the relationship between two sentences: entailment, contradiction, or neutral.

Size

Approximately 400,000 sentence pairs, TSV format

Licence

Free for academic use. Restrictions may apply to commercial use.

Description


The MultiNLI dataset includes:

  • Approximately 400,000 pairs of manually annotated sentences
  • Three logical relationships: entailment, contradiction, neutral
  • A diversity of textual sources covering formal and informal contexts
  • A TSV format that is easy to integrate into traditional NLP pipelines
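As a minimal sketch of how such a TSV file might be read, the snippet below parses tab-separated rows with Python's standard `csv` module and keeps only rows carrying one of the three labels. The column names (`gold_label`, `sentence1`, `sentence2`) and the `-` placeholder for unlabeled rows are assumptions about the file layout, and the sample rows are invented for illustration; check the header of the file you actually download.

```python
import csv
import io
from collections import Counter

VALID_LABELS = {"entailment", "contradiction", "neutral"}


def read_nli_tsv(handle, premise_col="sentence1",
                 hypothesis_col="sentence2", label_col="gold_label"):
    """Yield (premise, hypothesis, label) tuples from a tab-separated file.

    The default column names are assumptions; pass your own if the
    header of your file differs.
    """
    # QUOTE_NONE keeps quotation marks inside sentences as literal text.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        label = row.get(label_col)
        if label not in VALID_LABELS:
            continue  # skip rows without one of the three labels
        yield row[premise_col], row[hypothesis_col], label


# Invented sample rows, standing in for a real file opened with open(path).
sample = (
    "gold_label\tsentence1\tsentence2\n"
    "entailment\tA man is playing a guitar on stage.\tA person is making music.\n"
    "contradiction\tThe store opens at nine.\tThe store never opens.\n"
    "-\tShe ordered tea.\tShe was thirsty.\n"
)

pairs = list(read_nli_tsv(io.StringIO(sample)))
label_counts = Counter(label for _, _, label in pairs)
```

With a real file, replace `io.StringIO(sample)` with `open(path, encoding="utf-8", newline="")`.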

What is this dataset for?


MultiNLI is mainly used for:

  • Training textual entailment recognition models
  • Assessing the ability of models to detect logical relationships between sentences
  • Fine-tuning language models on contextual comprehension tasks
  • Analyzing the robustness and logical coherence of model-generated responses
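Assessing a model on this task usually comes down to comparing predicted labels with the annotated ones. The following is a rough sketch of overall and per-label accuracy; the function name `accuracy_by_label` and the example labels are hypothetical, and the predictions would normally come from your own model.

```python
from collections import defaultdict


def accuracy_by_label(gold, predicted):
    """Return (overall accuracy, {label: accuracy}) for two label lists."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for g, p in zip(gold, predicted):
        total[g] += 1
        if g == p:
            correct[g] += 1
    overall = sum(correct.values()) / len(gold) if gold else 0.0
    per_label = {label: correct[label] / total[label] for label in total}
    return overall, per_label


# Invented labels for illustration only.
gold = ["entailment", "neutral", "contradiction", "neutral"]
predicted = ["entailment", "contradiction", "contradiction", "neutral"]

overall, per_label = accuracy_by_label(gold, predicted)
```

The per-label breakdown helps show whether a model is weak on one relationship in particular, which a single overall score can hide.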

Can it be enriched or improved?


Yes, MultiNLI can be enriched or adapted by:

  • Creating multilingual versions to evaluate models in other languages
  • Adding metadata about genres or domains for finer filtering
  • Combining it with SNLI (Stanford NLI) for wider coverage
  • Automatically generating new pairs with paraphrase or contradiction models
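Combining two NLI collections can be sketched as a simple merge that records where each pair came from and drops exact duplicates. This is one possible approach, not a prescribed procedure: the helper `merge_nli_sets`, the `source` field, and the example pairs are all hypothetical, and it assumes both sets use the same three label names.

```python
def merge_nli_sets(**named_sets):
    """Merge several lists of (premise, hypothesis, label) tuples.

    Each keyword argument names a source; duplicates (same premise and
    hypothesis, ignoring case and surrounding whitespace) are kept once.
    """
    seen = set()
    merged = []
    for source, examples in named_sets.items():
        for premise, hypothesis, label in examples:
            key = (premise.strip().lower(), hypothesis.strip().lower())
            if key in seen:
                continue
            seen.add(key)
            merged.append({
                "premise": premise,
                "hypothesis": hypothesis,
                "label": label,
                "source": source,  # lets you filter by origin later
            })
    return merged


# Invented pairs for illustration only.
multinli_pairs = [
    ("The report was filed on Monday.", "A report was filed.", "entailment"),
]
snli_pairs = [
    ("A dog runs through a field.", "An animal is outside.", "entailment"),
    ("The report was filed on Monday.", "A report was filed.", "entailment"),
]

combined = merge_nli_sets(multinli=multinli_pairs, snli=snli_pairs)
```

Keeping the `source` field makes it possible to report results separately for each origin after training on the combined set.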

🔗 Source: MultiNli Dataset

Frequently Asked Questions

What is the difference between MultiNLI and SNLI?

SNLI is focused on a single domain (image descriptions), while MultiNLI covers multiple text genres, making it possible to better test the generalization of models across different language styles.

Can MultiNLI be used for both training and evaluation?

Yes, it is frequently used both for fine-tuning and for evaluating the logical inference quality of a model.

Why is MultiNLI important for generation models?

Even though it's not a generation dataset, MultiNLI helps train models to maintain logical consistency in their responses, which is critical for applications like chatbots or voice assistants.
