GLUE Benchmark

GLUE (General Language Understanding Evaluation) is a reference NLP benchmark designed to assess, in a standardized way, how well models understand language. It brings together several fundamental tasks such as text classification, semantic similarity, and natural language inference.

Size

A set of several datasets in TSV and JSON formats

Licence

Free for academic use. For commercial use, check the licence of each sub-dataset.

Description


The GLUE benchmark includes:

  • 9 datasets covering varied tasks: textual entailment, paraphrase detection, sentiment analysis, linguistic acceptability, etc.
  • Standard formats (TSV, JSON) that ease integration into training pipelines (see the loading sketch below)
  • A public leaderboard for comparing model performance
  • An overall score (the GLUE score) summarizing results across the tasks
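
Because the tasks are also mirrored on the Hugging Face Hub, a common way to pull one of them is through the `datasets` library. Below is a minimal loading sketch; the library and the `"glue"` identifier are ecosystem conventions, not part of GLUE itself.

```python
# Minimal sketch: load one GLUE task via the Hugging Face `datasets`
# library (an ecosystem convention, not something GLUE itself ships).
from datasets import load_dataset

# "sst2" is one of the nine tasks (binary sentiment classification);
# other configuration names include "cola", "mrpc", "rte", "mnli", ...
sst2 = load_dataset("glue", "sst2")

print(sst2)              # DatasetDict with train / validation / test splits
print(sst2["train"][0])  # e.g. {'sentence': ..., 'label': 0 or 1, 'idx': ...}
```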

What is this dataset for?


GLUE is used for:

  • Detailed evaluation of natural language processing models across diverse tasks (a scoring sketch follows this list)
  • Comparing performance between different architectures or training approaches
  • Improving NLP models through structured feedback on their strengths and weaknesses
  • Developing more general and robust NLP models
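
To make "evaluation" concrete, here is a minimal scoring sketch using the Hugging Face `evaluate` library (an assumption: GLUE defines the metrics, but this library is not part of the benchmark). The predictions and references are toy values for illustration only.

```python
# Sketch: compute the official GLUE metrics for one task with the
# Hugging Face `evaluate` library (an ecosystem assumption).
import evaluate

metric = evaluate.load("glue", "mrpc")  # MRPC is scored with accuracy and F1

# Toy predictions and references, purely illustrative.
results = metric.compute(predictions=[0, 1, 1, 0], references=[0, 1, 0, 0])
print(results)  # {'accuracy': ..., 'f1': ...}
```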

Can it be enriched or improved?


Yes. Although already quite comprehensive, GLUE has inspired several extensions:

  • SuperGLUE: a harder version with more complex tasks
  • Translation and adaptation into other languages to evaluate non-English models
  • Added dimensions such as fairness, bias, or robustness to adversarial perturbations
  • Integration into fine-tuning frameworks such as Hugging Face Transformers (a condensed sketch follows this list)
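
As a concrete illustration of that last point, here is a condensed fine-tuning sketch built on Transformers and Datasets. The model name and hyperparameters are illustrative placeholders, not recommendations from the GLUE authors.

```python
# Condensed sketch: fine-tune a small model on one GLUE task with
# Hugging Face Transformers + Datasets (ecosystem assumptions; the
# model name and hyperparameters are illustrative placeholders).
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

dataset = load_dataset("glue", "sst2")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    # SST-2 examples have a single "sentence" field; pad to a fixed
    # length so the default collator can batch them.
    return tokenizer(batch["sentence"], truncation=True,
                     padding="max_length", max_length=128)

encoded = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sst2-finetune",
                           num_train_epochs=1,
                           per_device_train_batch_size=32),
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
)
trainer.train()
print(trainer.evaluate())  # eval loss; add compute_metrics for accuracy
```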

🔗 Source: GLUE Benchmark

Frequently Asked Questions

What is the difference between GLUE and SuperGLUE?

SuperGLUE follows the same principle as GLUE but adds more complex and demanding tasks to better differentiate newer-generation models. It is considered a more selective benchmark.

Can GLUE be used for training, or only for evaluation?

GLUE is primarily designed for evaluation, but its sub-datasets can be used for fine-tuning or cross-validation where their licences permit (see the split sketch below).
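
In practice, the predefined splits make that separation explicit. A small sketch, again assuming the Hugging Face `datasets` library:

```python
# Sketch: each GLUE task ships predefined train / validation / test
# splits, so a sub-dataset can back fine-tuning (train) and local
# evaluation (validation) separately.
from datasets import load_dataset

rte = load_dataset("glue", "rte")
print({split: rte[split].num_rows for split in rte})

# Test labels are withheld (label == -1); official test scoring goes
# through the GLUE leaderboard, not local computation.
print(set(rte["test"]["label"]))  # {-1}
```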

Is GLUE still relevant today?

Yes. Despite the emergence of newer benchmarks, GLUE remains a reference for evaluating core language understanding, and it is often used as an intermediate step before more complex benchmarks.
