Text

LexGLUE

LexGLUE is an NLP benchmark dedicated to the legal domain. It is designed to assess how models perform on tasks such as classifying court decisions, predicting violated articles, and answering legal multiple-choice questions (MCQs). It combines seven sub-datasets, each with its own objective, to encourage the development of models that perform well across multiple legal tasks.

Size

7 sub-datasets (classification and multiple-choice QA), JSON files, thousands of annotated legal documents

Licence

CC-BY 4.0

Description

LexGLUE is a legal NLP benchmark that combines seven sub-datasets covering different jurisdictions (EU law, the European Court of Human Rights, and US law) and task types (multi-label and multi-class classification, multiple-choice QA, prediction of violated articles). Modeled on general-purpose benchmarks such as GLUE and SuperGLUE but dedicated to the legal domain, it makes it possible to evaluate “foundation” models across a range of legal tasks. Each dataset has been pre-processed so that legal AI researchers and practitioners can use it directly.
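The seven sub-datasets fall into three task types. The Python sketch below records that breakdown. The configuration names are assumed to match the Hugging Face Hub copy of the benchmark (`coastalcph/lex_glue`), so check them against the dataset card. `to_multi_hot` is a hypothetical helper showing how a multi-label target, stored as a list of label ids per document, is commonly turned into a fixed-length 0/1 vector for training.

```python
from enum import Enum


class TaskType(Enum):
    MULTI_LABEL = "multi-label classification"
    MULTI_CLASS = "multi-class classification"
    MULTIPLE_CHOICE = "multiple-choice QA"


# Assumed Hub configuration names mapped to their task type.
LEXGLUE_TASKS = {
    "ecthr_a": TaskType.MULTI_LABEL,        # ECtHR: articles found violated
    "ecthr_b": TaskType.MULTI_LABEL,        # ECtHR: articles allegedly violated
    "scotus": TaskType.MULTI_CLASS,         # US Supreme Court: issue area
    "eurlex": TaskType.MULTI_LABEL,         # EU legislation: EuroVoc concepts
    "ledgar": TaskType.MULTI_CLASS,         # US contracts: provision type
    "unfair_tos": TaskType.MULTI_LABEL,     # Terms of service: unfair clause types
    "case_hold": TaskType.MULTIPLE_CHOICE,  # US case law: pick the correct holding
}


def to_multi_hot(label_ids: list[int], num_labels: int) -> list[int]:
    """Hypothetical helper: turn a list of label ids into a 0/1 vector."""
    vector = [0] * num_labels
    for label_id in label_ids:
        if not 0 <= label_id < num_labels:
            raise ValueError(f"label id {label_id} is out of range")
        vector[label_id] = 1
    return vector
```

For example, `to_multi_hot([1, 3], 5)` returns `[0, 1, 0, 1, 0]`. Multi-class and multiple-choice tasks keep a single integer label instead.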

What is this dataset for?

  • Testing the robustness of multi-task models in realistic legal settings
  • Training an LLM to understand, classify, or reason about legal documents
  • Developing LegalTech systems (contract analysis, decision prediction, etc.)

Can it be enriched or improved?

Yes. LexGLUE can be enriched by adding new jurisdictions or annotation formats (e.g. summaries of arguments, majority vs. dissenting opinions). Its modular format also makes it easy to merge with other legal corpora for broader training. It can also serve as a starting point for French-language or multilingual adaptations via controlled translation.
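Merging usually starts by mapping every corpus onto one shared record format. The sketch below uses a hypothetical schema with `source`, `text` and `labels` fields. It assumes LexGLUE-style classification fields: `text` is either a string or a list of paragraphs, as in the ECtHR subsets, and the label sits in either `labels` (multi-label) or `label` (single class). These field names are assumptions to verify. Multiple-choice records such as CaseHOLD do not fit this schema and are left out.

```python
def normalize_record(record: dict, source: str) -> dict:
    """Map one classification record onto a hypothetical shared schema."""
    text = record["text"]
    if isinstance(text, list):  # e.g. ECtHR facts stored as a list of paragraphs
        text = "\n".join(text)
    if "labels" in record:  # multi-label record
        labels = list(record["labels"])
    else:  # single-class record, wrapped in a list for a uniform schema
        labels = [record["label"]]
    return {"source": source, "text": text, "labels": labels}


def merge_corpora(corpora: dict[str, list[dict]]) -> list[dict]:
    """Concatenate several corpora after normalizing every record."""
    merged: list[dict] = []
    for source, records in corpora.items():
        merged.extend(normalize_record(record, source) for record in records)
    return merged
```

Label ids from different corpora refer to different label sets. Keeping the `source` field lets a training loop pick the right classification head or label vocabulary for each record.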

🔎 In summary

Criterion: Evaluation
🧩 Ease of use: ⭐⭐⭐⭐✩ (well structured, with provided scripts)
🧼 Need for cleaning: ⭐⭐⭐⭐⭐ (low, ready-to-use data)
🏷️ Annotation richness: ⭐⭐⭐⭐⭐ (excellent, with annotation types that vary by task)
📜 Commercial license: ✅ Yes (CC-BY 4.0)
👨‍💻 Beginner friendly: ⚠️ Moderate, better suited to structured projects
🔁 Fine-tuning ready: 🎯 Well suited to adapting a model to legal AI use cases
🌍 Cultural diversity: ⚡ Medium, focused on European and US law

🧠 Recommended for

  • Legal AI research labs
  • LegalTech software vendors
  • Comparative law researchers

🔧 Compatible tools

  • Hugging Face Transformers
  • PyTorch
  • DeBERTa
  • Legal-BERT
  • LoRA

💡 Tip

Start with a simpler task (e.g. LEDGAR) to test the robustness of your model before tackling harder ones such as CaseHOLD or ECtHR.
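The gradual approach in this tip can be planned as a staged curriculum. The Python sketch below is one way to do it. The tip only says that LEDGAR is simple and that CaseHOLD and ECtHR are harder, so the middle stage is a hypothetical assumption. Each stage adds new sub-datasets and keeps the earlier ones in the training mix. `load_stage` assumes the Hugging Face `datasets` library and the Hub repository id `coastalcph/lex_glue`, and it needs network access.

```python
from typing import Iterator

# Hypothetical curriculum: only the first and last stages follow the tip;
# the placement of the middle stage is an assumption.
CURRICULUM = [
    ["ledgar"],
    ["unfair_tos", "scotus", "eurlex"],
    ["case_hold", "ecthr_a", "ecthr_b"],
]


def iter_stages(curriculum: list[list[str]]) -> Iterator[tuple[int, list[str]]]:
    """Yield (stage number, cumulative config names) so earlier tasks stay in the mix."""
    seen: list[str] = []
    for stage, configs in enumerate(curriculum, start=1):
        seen.extend(configs)
        yield stage, list(seen)


def load_stage(configs: list[str], split: str = "train") -> dict:
    """Load the given configs from the Hub (assumed repo id; needs network access)."""
    from datasets import load_dataset  # imported lazily; planning works without it

    return {name: load_dataset("coastalcph/lex_glue", name, split=split) for name in configs}
```

A training loop would iterate `for stage, configs in iter_stages(CURRICULUM):`, call `load_stage(configs)`, and fine-tune on the returned splits before moving to the next stage.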

Frequently Asked Questions

Does LexGlue contain multilingual data?

No. All sub-datasets are in English, but some can be translated into other languages or adapted to other jurisdictions.

Can this benchmark be used for non-legal models?

Yes. LexGLUE can be used to assess how well general-purpose (non-legal) models adapt to technical legal text.

Is there a hierarchy between sub-datasets to structure the training?

Yes. Some sub-datasets are simpler (e.g. LEDGAR) and others more complex (e.g. CaseHOLD), so it is advisable to introduce them gradually.
