RL Mixed Dataset — Math images and problems for reinforcement learning

A combined dataset drawn from geometry3k and math12k, pairing mathematical problems and their answers with associated images.

Size

Approximately 3,600 PNG images with problems and answers in text format

Licence

MIT

Description

RL Mixed Dataset combines two mathematical datasets that contain images, text problem statements, and their answers. The resulting corpus of around 3,600 examples is intended for training multimodal models, particularly with reinforcement learning.
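As an illustration of how one example might be represented in a training pipeline, here is a minimal sketch. It assumes each example is available as a record with a problem, a PNG image path and an answer; the field names (`problem`, `image`, `answer`) and the helper `parse_example` are hypothetical, not the dataset's documented schema.

```python
from dataclasses import dataclass
from pathlib import Path

# Hypothetical field names: the page only says each example has a problem,
# an image and an answer; check the actual files for the real schema.
REQUIRED_FIELDS = ("problem", "image", "answer")


@dataclass(frozen=True)
class MathExample:
    problem: str      # problem statement (text)
    image_path: Path  # associated PNG image
    answer: str       # reference answer (text)


def parse_example(raw: dict) -> MathExample:
    """Turn one raw record into a MathExample, rejecting incomplete records."""
    missing = [k for k in REQUIRED_FIELDS
               if raw.get(k) is None or str(raw[k]).strip() == ""]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    image_path = Path(str(raw["image"]))
    if image_path.suffix.lower() != ".png":
        raise ValueError(f"expected a PNG image, got {image_path.name}")
    # Answers are kept as text, even if stored as numbers upstream.
    return MathExample(
        problem=str(raw["problem"]).strip(),
        image_path=image_path,
        answer=str(raw["answer"]).strip(),
    )
```

Keeping answers as strings avoids losing forms such as fractions or radicals that do not parse as plain numbers.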

What is this dataset for?

Train multimodal models to solve mathematical problems with visual support
Develop and test reinforcement learning algorithms on multimodal (image and text) data
Evaluate visual-textual comprehension in educational or research contexts
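Reinforcement learning on math problems with known answers is often driven by a rule-based reward that checks the model's final answer against the reference. The sketch below is one such binary exact-match reward. The `\boxed{...}` answer convention and the function names are assumptions, not part of the dataset, and a real checker would also need to treat equivalent forms (e.g. `0.5` vs `1/2`) as matches.

```python
import re
from typing import Optional

# Assumption: the policy model is prompted to end with \boxed{...}.
# [^{}]* does not handle nested braces such as \boxed{\frac{1}{2}}.
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")


def normalize(answer: str) -> str:
    """Strip a trailing period, then drop whitespace and dollar signs."""
    answer = answer.strip().rstrip(".")
    return re.sub(r"[\s$]", "", answer)


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in the completion, if any."""
    matches = BOXED.findall(completion)
    return matches[-1] if matches else None


def exact_match_reward(completion: str, reference: str) -> float:
    """Binary reward: 1.0 if the boxed answer matches the reference, else 0.0."""
    predicted = extract_final_answer(completion)
    if predicted is None:
        return 0.0
    return 1.0 if normalize(predicted) == normalize(reference) else 0.0
```

A completion ending in `\boxed{12}` scored against the reference answer `12` receives a reward of 1.0; a completion with no boxed answer receives 0.0.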

Can it be enriched or improved?

Yes. Possible improvements include richer annotations for each problem, more varied image types, and linguistic variants of the problems and answers.

🔎 In summary

Criterion	Evaluation
🧩 Ease of use	⭐⭐⭐⭐⭐ (Well-structured and easily accessible data)
🧼 Need for cleaning	⭐⭐⭐⭐⭐ (Low – ready-to-use images and text)
🏷️ Annotation richness	⭐⭐✩✩✩ (Basic: problem, image, response)
📜 Commercial license	✅ Yes (MIT)
👨‍💻 Beginner friendly	⚠️ Moderate – requires mathematical knowledge
🔁 Fine-tuning ready	🎯 Suitable for multimodal learning and RL
🌍 Cultural diversity	⚠️ Primarily English, to be enriched