Text-to-Image 2M
A very large, high-quality dataset designed for fine-tuning models that generate images from textual descriptions. It combines multiple sources to ensure diversity and quality.
Approximately 2 million examples, mostly 512x512 images, in JSON or a similar format
MIT
Description
The Text-to-Image 2M dataset contains approximately 2 million text-image pairs, mostly at 512x512 resolution. It is the result of careful selection and curation of multiple sources, optimized for training accurate and diverse text-to-image models.
What is this dataset for?
- Training and fine-tuning models that generate images from text
- Improving the quality and diversity of generated images
- Adapting models to higher resolutions using the 10,000-image 1024x1024 subset
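As an illustration of working with the high-resolution subset, here is a minimal sketch that filters examples by image size. The field names (`width`, `height`, `text`) are assumptions for illustration and may differ from the dataset's actual schema.

```python
# Sketch: select the high-resolution (1024x1024) subset from a list of
# example records. Field names `width`, `height`, and `text` are
# hypothetical; adjust them to the dataset's actual schema.

def select_high_res(examples, size=1024):
    """Return only the examples whose image is exactly size x size."""
    return [
        ex for ex in examples
        if ex["width"] == size and ex["height"] == size
    ]

# Toy records standing in for real dataset entries:
examples = [
    {"text": "a red bicycle", "width": 512, "height": 512},
    {"text": "a mountain lake at dawn", "width": 1024, "height": 1024},
]
high_res = select_high_res(examples)
```

With the Hugging Face `datasets` library, the same idea maps onto `dataset.filter(...)` with an equivalent predicate.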
Can it be enriched or improved?
Yes. Additional annotations on style, composition, or objects can be added; the dataset can be extended with high-resolution data for specialized models; and the captions can be rewritten for greater precision.
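One simple form of enrichment is augmenting captions with extra annotations. The sketch below appends a style tag to a caption; the record structure and the `style` annotation are illustrative assumptions, not part of the dataset's actual schema.

```python
# Sketch: enrich a text-image example with a style annotation by
# appending it to the caption. The `text` field and the style-suffix
# convention are hypothetical.

def enrich_caption(example, style=None):
    """Return a copy of the example with the style appended to its caption."""
    enriched = dict(example)  # copy, so the original record is untouched
    if style:
        enriched["text"] = f'{example["text"]}, {style} style'
    return enriched

sample = {"text": "a lighthouse on a cliff"}
enriched = enrich_caption(sample, style="watercolor")
```

Applied over the whole dataset (e.g. via `map` in the Hugging Face `datasets` library), this kind of pass can produce a styled or more precisely captioned variant without modifying the source data.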
🔎 In summary
🧠 Recommended for
- Generative AI researchers
- Digital artists
- Text-to-image model developers
🔧 Compatible tools
- Stable Diffusion
- DALL·E
- Imagen
- Hugging Face Diffusers
- PyTorch
💡 Tip
To optimize fine-tuning, start with a representative subset before scaling up to the full dataset.
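The tip above can be sketched as a reproducible random draw of a small subset. The dataset is represented here as a plain list; the subset size and seed are arbitrary choices for illustration.

```python
import random

# Sketch: draw a reproducible random subset of examples before
# committing to the full ~2M dataset. With the Hugging Face `datasets`
# library, the equivalent would be .shuffle(seed=...).select(range(k)).

def sample_subset(examples, k, seed=42):
    """Return k examples chosen at random, reproducibly for a given seed."""
    rng = random.Random(seed)
    return rng.sample(examples, k)

toy_dataset = [{"text": f"caption {i}"} for i in range(100)]
subset = sample_subset(toy_dataset, k=10)
```

Fixing the seed keeps early fine-tuning runs comparable: the same subset is drawn every time, so metric differences reflect the training setup rather than the sample.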
Frequently Asked Questions
Does this dataset contain high resolution images?
Yes, it includes a subset of 10,000 images at 1024x1024 resolution for high-resolution use cases.
Are the captions standardized?
They are descriptive and generated by advanced models, but they can be reworked for greater precision.
Can I use this dataset commercially?
Yes, the MIT license allows unrestricted commercial use.