GSM8K Platinum
A cleaned version of the GSM8K test set containing 1,209 carefully reviewed elementary school math problems. Mislabeled or ambiguous examples have been corrected or removed to provide a reliable basis for evaluating the mathematical reasoning of language models.
Description
GSM8K-Platinum is a revised version of GSM8K, the widely used benchmark of elementary school math word problems. Each problem is paired with a worked solution that reasons step by step. Unlike in the original version, the examples have been carefully reviewed to eliminate ambiguities and annotation errors, so the dataset allows a finer-grained and more reliable evaluation of the mathematical reasoning abilities of language models.
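As an illustration of the solution format, the sketch below splits a GSM8K-style answer into its reasoning steps and its final answer. It assumes GSM8K-Platinum keeps the original GSM8K conventions (inline calculator annotations such as `<<3*12=36>>` and a closing `#### <answer>` line); the function name `split_solution` and the sample problem are hypothetical.

```python
import re

# Assumption: like the original GSM8K, each solution ends with "#### <answer>".
FINAL_ANSWER_RE = re.compile(r"####\s*(.+?)\s*$")

# Inline calculator annotations such as "<<3*12=36>>" used in GSM8K solutions.
CALC_ANNOTATION_RE = re.compile(r"<<[^>]*>>")


def split_solution(answer: str) -> tuple[list[str], str]:
    """Return (reasoning_steps, final_answer) for a GSM8K-style answer string."""
    lines = [line.strip() for line in answer.strip().splitlines() if line.strip()]
    match = FINAL_ANSWER_RE.match(lines[-1]) if lines else None
    if match is None:
        raise ValueError("no '#### <answer>' line found")
    # Drop calculator annotations so each step reads as plain text.
    steps = [CALC_ANNOTATION_RE.sub("", line) for line in lines[:-1]]
    # Remove thousands separators so "1,000" and "1000" compare equal later.
    final = match.group(1).replace(",", "")
    return steps, final


# Hypothetical example written in the GSM8K answer format.
steps, final = split_solution(
    "Sam buys 3 boxes of 12 pencils.\nHe has 3*12 = <<3*12=36>>36 pencils.\n#### 36"
)
print(steps, final)
```

Stripping thousands separators from the final answer is a design choice that keeps downstream scoring from treating `1,000` and `1000` as different answers.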
What is this dataset for?
- Accurate benchmarking of language models on arithmetic reasoning
- Training models specialized in math word problems
- Comparative evaluation of LLM families (GPT, Claude, Mistral...)
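For benchmarking, a common approach (not prescribed by the dataset itself) is exact-match accuracy on the final numeric answer. The sketch below assumes the model's prediction is the last number in its free-form output and the reference is the value after `####`; `last_number`, `gold_answer` and `accuracy` are hypothetical helpers.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import Optional

# Integers or decimals, optionally negative, possibly with thousands separators.
NUMBER_RE = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def last_number(text: str) -> Optional[Decimal]:
    """Heuristic: treat the last number in a model's output as its answer."""
    matches = NUMBER_RE.findall(text)
    if not matches:
        return None
    try:
        return Decimal(matches[-1].replace(",", ""))
    except InvalidOperation:
        return None


def gold_answer(solution: str) -> Decimal:
    """Read the number after '####' in a GSM8K-style reference solution."""
    return Decimal(solution.rsplit("####", 1)[1].strip().replace(",", ""))


def accuracy(predictions: list[str], solutions: list[str]) -> float:
    """Fraction of problems whose extracted prediction equals the gold answer."""
    if len(predictions) != len(solutions):
        raise ValueError("predictions and solutions must have the same length")
    correct = sum(
        last_number(pred) == gold_answer(sol)
        for pred, sol in zip(predictions, solutions)
    )
    return correct / len(solutions) if solutions else 0.0
```

Comparing `Decimal` values rather than strings makes `36` and `36.0` count as the same answer; the last-number heuristic can still misfire when a model writes extra numbers after its answer.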
Can it be enriched or improved?
Yes. Although the dataset is already filtered and cleaned, it can be extended with alternative phrasings or translations into other languages. Each question can also be enriched with additional annotations (difficulty, type of operation, number of steps).
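Some of these annotations could be bootstrapped automatically before manual review. This hypothetical sketch assumes the GSM8K answer format: it counts non-empty reasoning lines as steps, tallies the operators inside calculator annotations as operation types, and maps step counts to difficulty buckets using arbitrary thresholds chosen only for illustration.

```python
import re
from collections import Counter

# Captures the expression part of a calculator annotation, e.g. "3*12" in "<<3*12=36>>".
CALC_RE = re.compile(r"<<([^=>]+)=[^>]*>>")
OPERATORS = {"+": "addition", "-": "subtraction", "*": "multiplication", "/": "division"}


def annotate(solution: str) -> dict:
    """Derive hypothetical metadata from a GSM8K-style solution string."""
    reasoning = solution.rsplit("####", 1)[0]
    steps = [line for line in reasoning.splitlines() if line.strip()]
    # Note: a leading minus sign on a negative number is also counted as subtraction.
    ops = Counter(
        OPERATORS[ch]
        for expr in CALC_RE.findall(reasoning)
        for ch in expr
        if ch in OPERATORS
    )
    # Arbitrary illustrative thresholds, not part of the dataset.
    if len(steps) <= 2:
        difficulty = "easy"
    elif len(steps) <= 4:
        difficulty = "medium"
    else:
        difficulty = "hard"
    return {"num_steps": len(steps), "operations": dict(ops), "difficulty": difficulty}
```

Heuristic labels like these are best treated as a first pass that a human annotator confirms or corrects.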
🔎 In summary
🧠 Recommended for
- AI researchers
- Educational assistant developers
- Fine-tuning specialists
🔧 Compatible tools
- Hugging Face Datasets
- OpenLLM
- LangChain
- JSON parsers
- LoRA
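With a plain JSON parser, a local JSON Lines export can be read as sketched below. The field names `question` and `answer` follow the original GSM8K release and are assumed to carry over; the `Problem` class and `read_jsonl` are illustrative. With Hugging Face Datasets, the equivalent would be a call to `datasets.load_dataset(...)` with the dataset's Hub ID (presumably `madrylab/gsm8k-platinum`, config `main`, split `test`; check the dataset card before relying on this).

```python
import io
import json
from dataclasses import dataclass
from typing import Iterator, TextIO


@dataclass(frozen=True)
class Problem:
    question: str
    answer: str  # step-by-step solution, assumed to end in "#### <final answer>"


def read_jsonl(stream: TextIO) -> Iterator[Problem]:
    """Yield one Problem per non-empty JSON line, ignoring any extra fields."""
    for line_no, line in enumerate(stream, start=1):
        if not line.strip():
            continue
        record = json.loads(line)
        try:
            question, answer = record["question"], record["answer"]
        except KeyError as exc:
            raise ValueError(f"line {line_no}: missing field {exc}") from None
        yield Problem(question, answer)


if __name__ == "__main__":
    # Hypothetical one-line sample in the assumed format.
    sample = io.StringIO('{"question": "What is 2+3?", "answer": "2+3 = <<2+3=5>>5\\n#### 5"}\n')
    for problem in read_jsonl(sample):
        print(problem.question, "->", problem.answer.rsplit("####", 1)[1].strip())
```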
💡 Tip
To adapt it to a French-speaking context, you can translate the problems and then compare results on both versions to test the robustness of your models.
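The comparison between the two versions can start as simply as the sketch below, which assumes each problem has already been scored on both the English and the French version and is keyed by a shared problem index. `compare_versions` and `RobustnessReport` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class RobustnessReport:
    accuracy_en: float
    accuracy_fr: float
    lost_in_translation: list[int]  # solved in English, failed in French
    gained_in_translation: list[int]  # failed in English, solved in French


def compare_versions(correct_en: dict[int, bool], correct_fr: dict[int, bool]) -> RobustnessReport:
    """Compare per-problem correctness on the problems present in both versions."""
    shared = sorted(correct_en.keys() & correct_fr.keys())
    if not shared:
        raise ValueError("no problem IDs in common")
    acc_en = sum(correct_en[i] for i in shared) / len(shared)
    acc_fr = sum(correct_fr[i] for i in shared) / len(shared)
    lost = [i for i in shared if correct_en[i] and not correct_fr[i]]
    gained = [i for i in shared if correct_fr[i] and not correct_en[i]]
    return RobustnessReport(acc_en, acc_fr, lost, gained)
```

Two models can show the same accuracy in both languages while failing on different problems, so the per-problem lists are often more informative than the two accuracy figures alone.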
Frequently Asked Questions
What is the difference between GSM8K and GSM8K-Platinum?
GSM8K-Platinum is a cleaned version of the GSM8K test set: it fixes labeling errors, removes ambiguous problems, and improves overall data quality.
Can a model be trained only with GSM8K-Platinum?
This dataset is mainly intended for evaluation. For training, it is recommended to use it alongside larger datasets.
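When GSM8K-Platinum is combined with a larger training set, duplicated questions are worth filtering out, especially since GSM8K-Platinum is derived from the original GSM8K test split. The sketch below assumes records are dicts with a `question` field and matches questions after a simple normalization (Unicode NFKC, case folding, whitespace collapsing); it will not catch paraphrases, and `merge_without_duplicates` is a hypothetical helper.

```python
import re
import unicodedata


def normalize(question: str) -> str:
    """Apply NFKC, case-fold, and collapse whitespace so near-identical text matches."""
    text = unicodedata.normalize("NFKC", question).casefold()
    return re.sub(r"\s+", " ", text).strip()


def merge_without_duplicates(primary: list[dict], extra: list[dict]) -> list[dict]:
    """Append `extra` records to `primary`, skipping questions already present."""
    seen = {normalize(r["question"]) for r in primary}
    merged = list(primary)
    for record in extra:
        key = normalize(record["question"])
        if key not in seen:
            seen.add(key)
            merged.append(record)
    return merged
```

Keeping the first occurrence preserves the order of the larger dataset, which makes the merge easy to audit.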
Is the dataset suitable for teaching or pedagogy?
Yes, it can serve as a source of exercises or as a training base for educational assistants and machine learning platforms.




