Multimodal

Text-to-Image 2M

A very large, high-quality dataset designed for fine-tuning models that generate images from text descriptions. It combines multiple sources to ensure diversity and quality.

Size

Approximately 2 million examples, mostly 512x512 images, in JSON or a similar format
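You can turn the stated counts into a rough estimate of the decoded pixel footprint. This matters if you plan to cache decoded images or tensors during training.

This minimal sketch rests on two assumptions that this page does not state:

  • Images are 8-bit RGB.
  • All ~2 million images are 512x512, although that is only the majority resolution.

Files on disk are compressed, so the download itself will be smaller.

```python
# Rough decoded-size estimate. Assumptions (not stated on this page):
# 8-bit RGB images, and every image in the main set treated as 512x512.

BYTES_PER_CHANNEL = 1  # assumption: uint8 channels
CHANNELS = 3           # assumption: RGB, no alpha


def decoded_bytes(num_images: int, width: int, height: int) -> int:
    """Bytes needed to hold num_images decoded images of width x height."""
    return num_images * width * height * CHANNELS * BYTES_PER_CHANNEL


main_set = decoded_bytes(2_000_000, 512, 512)
hi_res_subset = decoded_bytes(10_000, 1024, 1024)

print(f"512x512 set:   {main_set / 1e12:.2f} TB decoded")       # 1.57 TB
print(f"1024x1024 set: {hi_res_subset / 1e9:.2f} GB decoded")  # 31.46 GB
```

An estimate of this size is one reason to stream the data or work from a subset rather than decoding everything up front.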

License

MIT

Description

The Text-to-Image 2M dataset contains approximately 2 million text-image pairs, mostly at 512x512 resolution. It was built by carefully selecting and refining data from multiple sources, with the goal of training accurate and diverse text-to-image models.

What is this dataset for?

  • Train and refine models that generate images from text
  • Improve the quality and diversity of the images these models produce
  • Adapt models to high resolutions using a subset of 10,000 images at 1024x1024
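One way to use the high-resolution subset is a two-stage schedule. You first fine-tune on the 512x512 majority, then continue on the 1024x1024 images. The sketch below covers only the data side: it splits records by resolution.

Two caveats apply. The `Pair` record and its `width`/`height` fields are hypothetical, since this page does not document the exact schema. The two-stage schedule is one common option, not a prescribed recipe.

```python
from dataclasses import dataclass


@dataclass
class Pair:
    # Hypothetical record layout; this page does not document the real schema.
    caption: str
    width: int
    height: int


def split_by_resolution(pairs, hi_res_side=1024):
    """Split pairs into a base set and a high-resolution set.

    A pair goes to the high-resolution set when both sides are at least
    hi_res_side pixels; everything else goes to the base set.
    """
    base, hi_res = [], []
    for p in pairs:
        if min(p.width, p.height) >= hi_res_side:
            hi_res.append(p)
        else:
            base.append(p)
    return base, hi_res


pairs = [
    Pair("a red fox in the snow", 512, 512),
    Pair("a lighthouse at dusk", 1024, 1024),
    Pair("a bowl of ramen, top view", 512, 512),
]
base, hi_res = split_by_resolution(pairs)
print(len(base), len(hi_res))  # 2 1
```

Stage one would iterate over `base` and stage two over `hi_res`, typically with the model's training resolution raised between stages.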

Can it be enriched or improved?

Yes. You can add annotations on style, composition, or objects, extend the dataset with high-resolution data for specialized models, or rewrite the captions for greater precision.
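As an illustration of enrichment, the sketch below attaches style, composition, and object annotations to a pair. It then folds those annotations into a longer caption. The `AnnotatedPair` class, its field names, and the `key: value` caption format are hypothetical choices for this example, not part of the dataset.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AnnotatedPair:
    # Hypothetical schema: `caption` stands for the dataset's existing text;
    # the remaining fields are extra annotations you would add yourself.
    caption: str
    style: Optional[str] = None
    composition: Optional[str] = None
    objects: List[str] = field(default_factory=list)

    def enriched_caption(self) -> str:
        """Append whichever annotations are present to the original caption."""
        parts = [self.caption.rstrip(".")]
        if self.style:
            parts.append(f"style: {self.style}")
        if self.composition:
            parts.append(f"composition: {self.composition}")
        if self.objects:
            parts.append("objects: " + ", ".join(self.objects))
        return "; ".join(parts)


pair = AnnotatedPair(
    caption="A cat sleeping on a windowsill.",
    style="watercolor",
    composition="close-up",
    objects=["cat", "window"],
)
print(pair.enriched_caption())
# A cat sleeping on a windowsill; style: watercolor; composition: close-up; objects: cat, window
```

Keeping annotations in separate fields, rather than editing the caption directly, lets you train with or without them.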

🔎 In summary

| Criterion | Evaluation |
|---|---|
| 🧩 Ease of use | ⭐⭐⭐⭐✩ (large volume, but standardized format) |
| 🧼 Need for cleaning | ⭐⭐⭐⭐✩ (moderate; filtering possible depending on desired quality) |
| 🏷️ Annotation richness | ⭐⭐⭐✩✩ (descriptive text captions, few additional annotations) |
| 📜 Commercial license | ✅ Yes (MIT) |
| 👨‍💻 Beginner friendly | ⚠️ Moderate (requires handling a large data volume) |
| 🔁 Fine-tuning ready | ✅ Excellent base for text-to-image |
| 🌍 Cultural diversity | 🌐 Wide diversity in content and image styles |
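The summary rates the need for cleaning as moderate, with filtering that depends on the quality you want. Below is a minimal sketch of such a filter. It drops images below a minimum side length, captions outside a word-count range, and exact duplicate captions.

The `Pair` record and every threshold are hypothetical, for illustration only. Tune them to your own quality bar.

```python
from dataclasses import dataclass


@dataclass
class Pair:
    # Hypothetical record layout; adapt the field names to the actual files.
    caption: str
    width: int
    height: int


def keep(pair, min_side=512, min_words=3, max_words=75):
    """Return True if a pair passes simple, adjustable quality checks."""
    n_words = len(pair.caption.split())
    return (
        min(pair.width, pair.height) >= min_side
        and min_words <= n_words <= max_words
    )


def clean(pairs, **thresholds):
    """Drop pairs that fail keep() and pairs whose caption was already kept."""
    seen = set()
    kept = []
    for p in pairs:
        # Case- and whitespace-insensitive key for duplicate detection.
        key = " ".join(p.caption.lower().split())
        if key in seen or not keep(p, **thresholds):
            continue
        seen.add(key)
        kept.append(p)
    return kept


pairs = [
    Pair("A mountain lake at sunrise", 512, 512),
    Pair("a mountain  lake at sunrise", 512, 512),        # duplicate caption
    Pair("Blurry", 512, 512),                             # caption too short
    Pair("A busy street market in the rain", 256, 256),  # image too small
]
cleaned = clean(pairs)
print(len(cleaned))  # 1
```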

🧠 Recommended for

  • Generative AI researchers
  • Digital artists
  • Text-to-image model developers

🔧 Compatible tools

  • Stable Diffusion
  • SLAB
  • Imagen
  • Hugging Face Diffusers
  • PyTorch

💡 Tip

To make fine-tuning more efficient, start with a representative subset before training on the entire dataset.
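A uniform random sample is one simple way to build a representative subset. The sketch below uses reservoir sampling (Algorithm R), a standard technique chosen here for illustration; the page does not prescribe it. It draws k items from a stream of unknown length in a single pass, so the full record list never has to be held in memory.

If you need balance across sources or styles, stratified sampling would fit better. That, however, requires metadata this page does not describe.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Uniformly sample k items from a stream of unknown length (Algorithm R).

    After n items have been seen, each one is in the sample with
    probability k / n. If the stream has fewer than k items, all are returned.
    """
    rng = random.Random(seed)  # seeded for reproducible subsets
    sample: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample


# Example: pick 1,000 record indices from a stream of 200,000.
subset = reservoir_sample(range(200_000), k=1_000)
print(len(subset))  # 1000
```

In practice, `stream` would be an iterator over the dataset's records, and the fixed seed keeps the subset identical across experiments.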

Frequently Asked Questions

Does this dataset contain high resolution images?

Yes, it includes a subset of 10,000 images at 1024x1024 for high-resolution use cases.

Are the captions standardized?

The captions are descriptive and were generated by advanced models; they can be reworked for greater precision.
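Before any model-based rewriting, a mechanical normalization pass can make captions more consistent. This minimal sketch does four things: it collapses whitespace, trims the caption, capitalizes the first letter, and flags captions above a word budget.

The 60-word default is a hypothetical threshold. For reference, the CLIP text encoder used by Stable Diffusion reads at most 77 tokens, and word counts only approximate token counts.

```python
import re


def normalize_caption(text: str) -> str:
    """Collapse runs of whitespace, trim, and capitalize the first character."""
    text = re.sub(r"\s+", " ", text).strip()
    return text[:1].upper() + text[1:]


def over_budget(text: str, max_words: int = 60) -> bool:
    """Flag captions whose word count exceeds a chosen budget.

    Word count is only a rough proxy for tokenizer length.
    """
    return len(text.split()) > max_words


raw = "  a  golden retriever\nplaying in   autumn leaves "
normalized = normalize_caption(raw)
print(normalized)               # A golden retriever playing in autumn leaves
print(over_budget(normalized))  # False
```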

Can I use this dataset for commercial use?

Yes. The MIT license permits commercial use, provided the copyright and license notice are included with copies of the dataset.
