Text-to-Image 2M
A very large, high-quality dataset designed for fine-tuning models that generate images from textual descriptions. It combines multiple sources to ensure diversity and quality.
Approximately 2 million examples, mostly 512x512 images, in JSON or a similar format
MIT
Description
The Text-to-Image 2M dataset contains approximately 2 million text-image pairs, mostly at 512x512 resolution. It is the result of careful selection and curation of multiple sources, optimized for training accurate and diverse text-to-image models.
What is this dataset for?
- Training and fine-tuning models that generate images from text
- Improving the quality and diversity of generated images
- Adapting models to higher resolutions using the 10,000-image 1024x1024 subset
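As an illustration of working with the high-resolution subset, here is a minimal sketch that filters examples by image size. The field names (`width`, `height`, `text`) are assumptions for illustration and may differ from the dataset's actual schema.

```python
# Sketch: select the high-resolution (1024x1024) subset from a list of
# example records. Field names `width`, `height`, and `text` are
# hypothetical; adjust them to the dataset's actual schema.

def select_high_res(examples, size=1024):
    """Return only the examples whose image is exactly size x size."""
    return [
        ex for ex in examples
        if ex["width"] == size and ex["height"] == size
    ]

# Toy records standing in for real dataset entries:
examples = [
    {"text": "a red bicycle", "width": 512, "height": 512},
    {"text": "a mountain lake at dawn", "width": 1024, "height": 1024},
]
high_res = select_high_res(examples)
```

With the Hugging Face `datasets` library, the same idea maps onto `dataset.filter(...)` with an equivalent predicate.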
Can it be enriched or improved?
Yes. Additional annotations on style, composition, or objects can be added; the dataset can be extended with high-resolution data for specialized models; and the captions can be rewritten for greater precision.
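One simple form of enrichment is augmenting captions with extra annotations. The sketch below appends a style tag to a caption; the record structure and the `style` annotation are illustrative assumptions, not part of the dataset's actual schema.

```python
# Sketch: enrich a text-image example with a style annotation by
# appending it to the caption. The `text` field and the style-suffix
# convention are hypothetical.

def enrich_caption(example, style=None):
    """Return a copy of the example with the style appended to its caption."""
    enriched = dict(example)  # copy, so the original record is untouched
    if style:
        enriched["text"] = f'{example["text"]}, {style} style'
    return enriched

sample = {"text": "a lighthouse on a cliff"}
enriched = enrich_caption(sample, style="watercolor")
```

Applied over the whole dataset (e.g. via `map` in the Hugging Face `datasets` library), this kind of pass can produce a styled or more precisely captioned variant without modifying the source data.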
🔎 In summary
🧠 Recommended for
- Generative AI researchers
- Digital artists
- Text-to-image model developers
🔧 Compatible tools
- Stable Diffusion
- DALL·E
- Imagen
- Hugging Face Diffusers
- PyTorch
💡 Tip
To optimize fine-tuning, start with a representative subset before scaling up to the full dataset.
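The tip above can be sketched as a reproducible random draw of a small subset. The dataset is represented here as a plain list; the subset size and seed are arbitrary choices for illustration.

```python
import random

# Sketch: draw a reproducible random subset of examples before
# committing to the full ~2M dataset. With the Hugging Face `datasets`
# library, the equivalent would be .shuffle(seed=...).select(range(k)).

def sample_subset(examples, k, seed=42):
    """Return k examples chosen at random, reproducibly for a given seed."""
    rng = random.Random(seed)
    return rng.sample(examples, k)

toy_dataset = [{"text": f"caption {i}"} for i in range(100)]
subset = sample_subset(toy_dataset, k=10)
```

Fixing the seed keeps early fine-tuning runs comparable: the same subset is drawn every time, so metric differences reflect the training setup rather than the sample.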
Frequently Asked Questions
Does this dataset contain high resolution images?
Yes, it includes a subset of 10,000 images at 1024x1024 resolution for high-resolution use cases.
Are the captions standardized?
They are descriptive and generated by advanced models, but they can be reworked for greater precision.
Can I use this dataset commercially?
Yes, the MIT license allows unrestricted commercial use.