LexGLUE
LexGLUE is an NLP benchmark for the legal domain, designed to assess model performance on tasks such as classifying court decisions, predicting violated articles, and answering legal multiple-choice questions. It combines seven sub-datasets, each with a specific objective, to encourage the development of effective multi-task models for law.
7 sub-datasets (classification, QA), JSON files, thousands of annotated legal documents
CC-BY 4.0
Description
LexGLUE is a legal NLP benchmark combining seven sub-datasets covering different jurisdictions (EU, US) and tasks (multi-label classification, multiple-choice questions, prediction of legal articles, etc.). It allows “foundation” models to be evaluated on a variety of legal tasks, in the manner of GLUE or SuperGLUE but dedicated to the legal domain. Each dataset has been pre-processed to make it easier for legal AI researchers and practitioners to use.
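As a rough illustration of the seven sub-datasets and their task types, here is a minimal Python sketch. The configuration names and the Hub identifier `coastalcph/lex_glue` are assumptions based on the Hugging Face distribution of the benchmark and should be checked against the current dataset card; `LexGlueTask` and `load_task` are hypothetical helpers introduced only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexGlueTask:
    config: str     # assumed configuration name on the Hugging Face Hub
    task_type: str  # kind of prediction the sub-dataset asks for


# Assumed mapping of the seven sub-datasets to their task types.
LEXGLUE_TASKS = [
    LexGlueTask("ecthr_a", "multi-label classification"),
    LexGlueTask("ecthr_b", "multi-label classification"),
    LexGlueTask("scotus", "multi-class classification"),
    LexGlueTask("eurlex", "multi-label classification"),
    LexGlueTask("ledgar", "multi-class classification"),
    LexGlueTask("unfair_tos", "multi-label classification"),
    LexGlueTask("case_hold", "multiple-choice QA"),
]


def load_task(config: str, hub_id: str = "coastalcph/lex_glue"):
    """Load one sub-dataset; `hub_id` is an assumption, not confirmed by this page."""
    if config not in {task.config for task in LEXGLUE_TASKS}:
        raise ValueError(f"Unknown LexGLUE configuration: {config!r}")
    # Imported lazily so the registry above is usable without the library.
    from datasets import load_dataset  # requires `pip install datasets`
    return load_dataset(hub_id, config)
```

Calling `load_task("ledgar")` would then download that sub-dataset, provided the `datasets` library is installed and the identifier is still valid.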
What is this dataset for?
- Test the robustness of multi-task models in a realistic legal setting
- Train an LLM to understand, classify, or reason about legal documents
- Develop LegalTech systems (contract analysis, decision prediction, etc.)
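To make the testing use case more concrete, the sketch below scores multi-label predictions with micro- and macro-averaged F1, two common choices for multi-label classification. This page does not prescribe a metric, so the choice is an assumption, and `multilabel_f1` is a hypothetical helper written in plain Python for illustration.

```python
from typing import Sequence, Set, Tuple


def _f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; returns 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def multilabel_f1(
    gold: Sequence[Set[int]], pred: Sequence[Set[int]], num_labels: int
) -> Tuple[float, float]:
    """Return (micro_f1, macro_f1) for documents annotated with label sets."""
    tp = [0] * num_labels
    fp = [0] * num_labels
    fn = [0] * num_labels
    for gold_labels, pred_labels in zip(gold, pred):
        for label in range(num_labels):
            in_gold = label in gold_labels
            in_pred = label in pred_labels
            if in_gold and in_pred:
                tp[label] += 1
            elif in_pred:
                fp[label] += 1
            elif in_gold:
                fn[label] += 1
    # Micro: pool the counts over all labels, then compute a single F1.
    micro = _f1(sum(tp), sum(fp), sum(fn))
    # Macro: compute F1 per label, then take the unweighted mean.
    macro = sum(_f1(tp[i], fp[i], fn[i]) for i in range(num_labels)) / num_labels
    return micro, macro
```

Macro F1 weights rare labels as heavily as frequent ones, which matters for legal label sets where a few articles or categories dominate.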
Can it be enriched or improved?
Yes, LexGLUE can be enriched by adding new jurisdictions or annotation formats (e.g. summaries of arguments, majority vs. minority opinions). Its modular format also makes it easy to merge with other legal corpora for more comprehensive training. It can also serve as a basis for adaptation to French-language or multilingual contexts via controlled translation.
🔎 In summary
🧠 Recommended for
- Legal AI laboratories
- LegalTech software vendors
- Comparative law researchers
🔧 Compatible tools
- Hugging Face Transformers
- PyTorch
- DeBERTa
- Legal-BERT
- LoRA
💡 Tip
Start with a simpler task (e.g. LEDGAR) to test the robustness of your model before tackling more complex ones such as CaseHOLD or ECtHR.
Frequently Asked Questions
Does LexGLUE contain multilingual data?
No, all sub-datasets are in English, but some can be translated or adapted for other jurisdictions.
Can this benchmark be used for non-legal models?
Yes, LexGLUE can be used to assess how well general-purpose models adapt to technical or legal texts.
Is there a hierarchy between sub-datasets to structure the training?
Yes, some sub-datasets are simpler (LEDGAR) and others are more complex (CaseHOLD); it is recommended to combine them gradually.
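One way to picture this gradual combination is a staged schedule in which each stage adds one more sub-dataset to the training mix. The sketch below is only an illustration: the ordering in `CURRICULUM` is hypothetical (the text ranks LEDGAR as simpler and CaseHOLD as more complex, nothing more), and the configuration names are assumed.

```python
from typing import Iterator, List

# Hypothetical ordering from simpler to more complex; only the relative
# position of LEDGAR versus CaseHOLD comes from the text above.
CURRICULUM = ["ledgar", "ecthr_a", "case_hold"]


def cumulative_stages(order: List[str]) -> Iterator[List[str]]:
    """Yield the task mix for each stage, adding one sub-dataset at a time."""
    for end in range(1, len(order) + 1):
        yield order[:end]


for stage_number, tasks in enumerate(cumulative_stages(CURRICULUM), start=1):
    print(f"Stage {stage_number}: train on {', '.join(tasks)}")
```

Earlier tasks stay in the mix at later stages, which is a simple way to limit forgetting while harder tasks are introduced.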




