OpenMathReasoning
A comprehensive corpus for advanced mathematical problem solving, combining chain-of-thought reasoning, tool-integrated reasoning, and generation selection.
3.2M CoT solutions, 1.7M TIR solutions, 566K GenSelect samples, 193K problem statements without solutions; text data structured in JSON
CC-BY 4.0
Description
OpenMathReasoning is a large-scale mathematical reasoning dataset designed to train language models to solve complex problems drawn from the AoPS (Art of Problem Solving) forums. It includes more than 306,000 unique problems, with several million solutions generated using various strategies: chain of thought (CoT), tool-integrated reasoning (TIR), and automatic selection of the best answer among candidates (GenSelect). The dataset is structured, validated, and accompanied by rich metadata (generator model, success rate, etc.).
What is this dataset for?
- Train efficient mathematical reasoning models capable of solving olympiad-level problems
- Test various approaches: CoT, TIR, majority vote, etc.
- Optimize the training of LLMs specialized in STEM or educational applications
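Majority voting, one of the approaches listed above, can be sketched in a few lines. This is a minimal illustration, not the dataset authors' evaluation code: it assumes the final answers have already been extracted from each sampled solution as strings, and the `majority_vote` helper is a hypothetical name.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent final answer among sampled solutions.

    Ties go to the answer that was seen first.
    """
    if not answers:
        raise ValueError("need at least one answer")
    # Light normalization (assumption): surrounding whitespace is ignored,
    # so "42" and " 42" count as the same answer.
    counts = Counter(a.strip() for a in answers)
    return counts.most_common(1)[0][0]


# Toy example: five sampled answers to the same problem.
sampled = ["42", "41", " 42", "42", "7"]
winner = majority_vote(sampled)
print(winner)
```

Real answers usually need stronger normalization (equivalent fractions, LaTeX formatting) before counting; the whitespace stripping here is only a placeholder for that step.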
Can it be enriched or improved?
Yes. Human annotations can be added to the generated solutions, other mathematical corpora (e.g. MATH, miniF2F) can be integrated, and the problems can be organized by topic or level. The dataset can also serve as a basis for new benchmarks or, after suitable translation, for training models in other languages.
🔎 In summary
🧠 Recommended for
- Mathematical AI researchers
- Developers of STEM-focused LLMs
- Educational AI competitions
🔧 Compatible tools
- PyTorch
- Hugging Face
- DeepSpeed
- Transformers, vLLM
💡 Tip
Filter problems by difficulty or success rate to better tailor the training to the ability of the model.
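One way to apply this tip is to keep only problems whose success rate falls inside a target band. The sketch below works on plain Python dictionaries; the field name `pass_rate` and the possibility of missing or non-numeric values are assumptions made for illustration, so check the actual column names and types in the dataset before reusing it.

```python
def parse_rate(value):
    """Convert a success-rate field to float, or None if missing or unparseable."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def filter_by_rate(records, low, high, field="pass_rate"):
    """Keep records whose success rate lies in [low, high].

    `field` is a hypothetical column name; records without a usable
    rate are dropped.
    """
    kept = []
    for rec in records:
        rate = parse_rate(rec.get(field))
        if rate is not None and low <= rate <= high:
            kept.append(rec)
    return kept


# Toy records standing in for dataset rows.
rows = [
    {"problem": "P1", "pass_rate": "0.95"},  # too easy for the band below
    {"problem": "P2", "pass_rate": "0.40"},
    {"problem": "P3", "pass_rate": "n/a"},   # no usable rate
    {"problem": "P4", "pass_rate": 0.10},
]
medium = filter_by_rate(rows, 0.2, 0.8)
print([r["problem"] for r in medium])
```

A weaker model might start with a band of high success rates and move toward lower ones as training progresses; the thresholds above are arbitrary.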
Frequently Asked Questions
Does the dataset cover all types of math problems?
It covers a wide variety, but most problems come from the AoPS forums. Typical problems are competition-style and call for advanced reasoning.
Can we filter the data according to the type of reasoning used?
Yes, each example indicates the mode of inference: CoT (chain of thought), TIR (with tools) or GenSelect (response selection).
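As a rough sketch of such filtering, the records can be grouped by their mode label. The field name `inference_mode` and the lowercase values `cot`, `tir`, and `genselect` are assumptions made for this example and should be checked against the actual data.

```python
from collections import defaultdict


def split_by_mode(records, field="inference_mode"):
    """Group records by inference mode.

    `field` is an assumed column name; records lacking it are
    collected under "unknown".
    """
    groups = defaultdict(list)
    for rec in records:
        mode = str(rec.get(field, "unknown")).lower()
        groups[mode].append(rec)
    return dict(groups)


# Toy records standing in for dataset rows.
rows = [
    {"id": 1, "inference_mode": "cot"},
    {"id": 2, "inference_mode": "tir"},
    {"id": 3, "inference_mode": "cot"},
    {"id": 4, "inference_mode": "genselect"},
]
by_mode = split_by_mode(rows)
print({mode: len(recs) for mode, recs in by_mode.items()})
```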
Is it suitable for fine-tuning without high-end GPUs?
It is best used with powerful hardware, but some subsets can be used for fine-tuning on more modest GPUs with quantization or LoRA.