RL Mixed Dataset — Math images and problems for reinforcement learning
Combined dataset from geometry3k and math12k, including images associated with mathematical problems and their answers.
Approximately 3,600 PNG images with problems and answers in text format
MIT
Description
RL Mixed Dataset combines two mathematical datasets, geometry3k and math12k, each containing images, text problems, and their answers. The resulting corpus of around 3,600 examples is intended for training multimodal models, particularly with reinforcement learning.
What is this dataset for?
- Train multimodal models to solve mathematical problems with visual support
- Develop and test reinforcement learning algorithms on multimodal math problems with reference answers
- Evaluate visual-textual comprehension in educational or research contexts
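Because each problem comes with a reference answer, one common way to use such data in reinforcement learning is a rule-based reward that compares the model's answer to the reference. The sketch below is only an illustration under stated assumptions: the function names and the normalization rule (lowercasing and removing whitespace) are hypothetical and are not part of the dataset.

```python
def normalize_answer(text: str) -> str:
    """Lowercase the answer and remove all whitespace (assumed normalization)."""
    return "".join(text.strip().lower().split())


def exact_match_reward(predicted: str, reference: str) -> float:
    """Return 1.0 if the normalized answers are identical, otherwise 0.0."""
    same = normalize_answer(predicted) == normalize_answer(reference)
    return 1.0 if same else 0.0
```

Exact string matching is deliberately strict: it treats mathematically equivalent forms such as "0.5" and "1/2" as different, so a real setup would likely need a more tolerant comparison.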
Can it be enriched or improved?
Yes. Possible extensions include adding annotations to the problems, diversifying the types of images, and adding linguistic variants of the problems and answers.
🔎 In summary
🧠 Recommended for
- Researchers in AI / RL
- Multimodal AI developers
🔧 Compatible tools
- PyTorch
- TensorFlow
- RL frameworks
💡 Tip
Use train/test splits for a rigorous evaluation of models.
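As a minimal sketch of this tip, the helper below splits a list of examples into disjoint train and test sets with a fixed seed so the split can be reproduced. The function name, the default 10% test fraction, and the seed are assumptions chosen for illustration. The dataset page does not prescribe a split.

```python
import random


def train_test_split(examples, test_fraction=0.1, seed=0):
    """Split examples into disjoint (train, test) lists, reproducibly."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    indices = list(range(len(examples)))
    # A dedicated Random instance keeps the shuffle independent of global state.
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(examples) * test_fraction))
    test_indices = set(indices[:n_test])
    train = [ex for i, ex in enumerate(examples) if i not in test_indices]
    test = [ex for i, ex in enumerate(examples) if i in test_indices]
    return train, test
```

Since the corpus merges two sources, it may also be worth checking that both are represented in the test set.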
Frequently Asked Questions
Does this dataset contain additional annotations on math problems?
No, only images, problem statements, and answers are provided.
Can this dataset be used to train non-multimodal models?
Yes, the problem statements and answers can be extracted for language-only training, but the dataset is designed primarily for multimodal use.
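A rough sketch of that extraction, assuming each example has been loaded as a dictionary: the field names "image", "problem", and "answer" are hypothetical placeholders, since the actual column names are not given here.

```python
def to_text_only(records, image_key="image"):
    """Return copies of the records with the image field removed."""
    return [
        {key: value for key, value in record.items() if key != image_key}
        for record in records
    ]


# Hypothetical example record; the real field names may differ.
sample = [{"image": "0001.png", "problem": "Find x.", "answer": "3"}]
text_only = to_text_only(sample)
```

Problems that cannot be solved without the figure may need to be filtered out separately.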
Are there specific constraints related to the image format?
The images are in PNG format, so use a framework or image library that supports it.
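Before training, it can be useful to confirm that every file really is a PNG. The sketch below uses only the Python standard library: it checks the 8-byte PNG signature and reads the width and height from the IHDR chunk that follows it. The function name is an assumption for illustration, and the check does not validate the rest of the file.

```python
import struct

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes):
    """Return (width, height) from the IHDR chunk, or None if not a PNG."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE:
        return None
    # Bytes 8-11 hold the chunk length, bytes 12-15 the chunk type.
    if data[12:16] != b"IHDR":
        return None
    # Width and height are big-endian unsigned 32-bit integers.
    width, height = struct.unpack(">II", data[16:24])
    return width, height
```

In practice, the bytes would come from reading the first 24 bytes of each image file in binary mode.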




