GLUE Benchmark

GLUE (General Language Understanding Evaluation) is a reference NLP benchmark designed to assess, in a standardized way, how well models understand language. It brings together several fundamental tasks such as text classification, semantic similarity, and natural language inference.

Size

A set of several datasets in TSV and JSON formats

Licence

Free for academic use. For commercial use, check the licence of each sub-dataset.

Description


The GLUE benchmark includes:

  • 9 datasets covering varied tasks: textual entailment, paraphrase detection, sentiment analysis, linguistic acceptability, etc.
  • Standard formats (TSV, JSON) that ease integration into training pipelines (see the loading sketch below)
  • A public leaderboard for comparing model performance
  • An overall score (the GLUE score) summarizing results across the tasks
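
Because the tasks are also mirrored on the Hugging Face Hub, a common way to pull one of them is through the `datasets` library. Below is a minimal loading sketch; the library and the `"glue"` identifier are ecosystem conventions, not part of GLUE itself.

```python
# Minimal sketch: load one GLUE task via the Hugging Face `datasets`
# library (an ecosystem convention, not something GLUE itself ships).
from datasets import load_dataset

# "sst2" is one of the nine tasks (binary sentiment classification);
# other configuration names include "cola", "mrpc", "rte", "mnli", ...
sst2 = load_dataset("glue", "sst2")

print(sst2)              # DatasetDict with train / validation / test splits
print(sst2["train"][0])  # e.g. {'sentence': ..., 'label': 0 or 1, 'idx': ...}
```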

What is this dataset for?


GLUE is used for:

  • Detailed evaluation of natural language processing models across diverse tasks (a scoring sketch follows this list)
  • Comparing performance between different architectures or training approaches
  • Improving NLP models through structured feedback on their strengths and weaknesses
  • Developing more general and robust NLP models
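
To make "evaluation" concrete, here is a minimal scoring sketch using the Hugging Face `evaluate` library (an assumption: GLUE defines the metrics, but this library is not part of the benchmark). The predictions and references are toy values for illustration only.

```python
# Sketch: compute the official GLUE metrics for one task with the
# Hugging Face `evaluate` library (an ecosystem assumption).
import evaluate

metric = evaluate.load("glue", "mrpc")  # MRPC is scored with accuracy and F1

# Toy predictions and references, purely illustrative.
results = metric.compute(predictions=[0, 1, 1, 0], references=[0, 1, 0, 0])
print(results)  # {'accuracy': ..., 'f1': ...}
```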

Can it be enriched or improved?


Yes. Although already quite comprehensive, GLUE has inspired several extensions:

  • SuperGLUE: a harder version with more complex tasks
  • Translation and adaptation into other languages to evaluate non-English models
  • Added dimensions such as fairness, bias, or robustness to adversarial perturbations
  • Integration into fine-tuning frameworks such as Hugging Face Transformers (a condensed sketch follows this list)
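
As a concrete illustration of that last point, here is a condensed fine-tuning sketch built on Transformers and Datasets. The model name and hyperparameters are illustrative placeholders, not recommendations from the GLUE authors.

```python
# Condensed sketch: fine-tune a small model on one GLUE task with
# Hugging Face Transformers + Datasets (ecosystem assumptions; the
# model name and hyperparameters are illustrative placeholders).
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

dataset = load_dataset("glue", "sst2")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    # SST-2 examples have a single "sentence" field; pad to a fixed
    # length so the default collator can batch them.
    return tokenizer(batch["sentence"], truncation=True,
                     padding="max_length", max_length=128)

encoded = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sst2-finetune",
                           num_train_epochs=1,
                           per_device_train_batch_size=32),
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
)
trainer.train()
print(trainer.evaluate())  # eval loss; add compute_metrics for accuracy
```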

🔗 Source: GLUE Benchmark

Frequently Asked Questions

What is the difference between GLUE and SuperGLUE?

SuperGLUE follows the same principle as GLUE but adds more complex and demanding tasks to better differentiate newer-generation models. It is considered a more selective benchmark.

Can GLUE be used for training, or only for evaluation?

GLUE is primarily designed for evaluation, but its sub-datasets can be used for fine-tuning or cross-validation where their licences permit (see the split sketch below).
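
In practice, the predefined splits make that separation explicit. A small sketch, again assuming the Hugging Face `datasets` library:

```python
# Sketch: each GLUE task ships predefined train / validation / test
# splits, so a sub-dataset can back fine-tuning (train) and local
# evaluation (validation) separately.
from datasets import load_dataset

rte = load_dataset("glue", "rte")
print({split: rte[split].num_rows for split in rte})

# Test labels are withheld (label == -1); official test scoring goes
# through the GLUE leaderboard, not local computation.
print(set(rte["test"]["label"]))  # {-1}
```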

Is GLUE still relevant today?

Yes. Despite the emergence of newer benchmarks, GLUE remains a reference for evaluating core language understanding, and it is often used as an intermediate step before more complex benchmarks.
