MidJourney v5 Prompt Dataset

A large corpus of text prompts written for MidJourney v5 image generation. It can be used to study how prompts are formulated or to train models that generate prompts.

Size

4.2 million text prompts in tabular files (.csv, .json); a notebook is provided for optional cleaning
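
As a minimal sketch of reading the tabular files, the Python standard library is enough for either format. The column and key name `prompt`, and the JSON layout (an array of objects), are assumptions made for illustration and are not confirmed by the dataset; the inline sample strings stand in for the real files.

```python
import csv
import io
import json

# Hypothetical samples standing in for the real .csv and .json files.
# The "prompt" column/key name is an assumption; check the actual header.
SAMPLE_CSV = (
    "prompt\n"
    '"a lighthouse at dusk, oil painting --ar 16:9"\n'
    '"portrait of a fox, studio lighting --v 5"\n'
)
SAMPLE_JSON = '[{"prompt": "a lighthouse at dusk, oil painting --ar 16:9"}]'


def read_prompts_csv(handle, column="prompt"):
    """Yield non-empty prompt strings from a CSV file-like object, row by row."""
    for row in csv.DictReader(handle):
        text = (row.get(column) or "").strip()
        if text:
            yield text


def read_prompts_json(handle, key="prompt"):
    """Return prompt strings from a JSON array of objects."""
    return [item[key].strip() for item in json.load(handle) if item.get(key)]


csv_prompts = list(read_prompts_csv(io.StringIO(SAMPLE_CSV)))
json_prompts = read_prompts_json(io.StringIO(SAMPLE_JSON))
print(len(csv_prompts), len(json_prompts))
```

For a real file, the same functions would take a handle from `open(path, newline="", encoding="utf-8")`. The CSV reader is a generator so that millions of rows can be streamed rather than held in memory at once.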

License

Apache 2.0

Description

The MidJourney v5 Prompt Dataset contains over 4.2 million text prompts collected from interactions with the MidJourney Bot. The prompts describe artistic styles, detailed scenes, and imaginative compositions intended for AI image generation.

What is this dataset for?

Create or refine prompt generation models for tools like MidJourney, DALL·E, or Stable Diffusion
Analyze artistic or stylistic trends in visual generation queries
Train NLP models specialized in visual description or composition
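
The trend-analysis use case can be sketched with a simple frequency count. This is only one possible approach, assuming prompts are written as comma-separated phrases followed by optional `--` parameters such as `--ar` or `--v`; the sample prompts and the helper names `split_modifiers` and `top_modifiers` are hypothetical.

```python
from collections import Counter

# Hypothetical prompts for illustration; real ones would come from the dataset.
prompts = [
    "a lighthouse at dusk, oil painting, dramatic lighting --ar 16:9 --v 5",
    "portrait of a fox, studio lighting, oil painting --v 5",
    "cyberpunk street market, dramatic lighting, 35mm photo --ar 3:2",
]


def split_modifiers(prompt):
    """Return lowercase comma-separated phrases, ignoring trailing `--` parameters."""
    text = prompt.split(" --", 1)[0]  # drop everything from the first parameter on
    return [part.strip().lower() for part in text.split(",") if part.strip()]


def top_modifiers(prompts, n=3):
    """Count in how many prompts each phrase appears and return the n most common."""
    counts = Counter()
    for prompt in prompts:
        # set() so a phrase repeated inside one prompt is counted only once
        counts.update(set(split_modifiers(prompt)))
    return counts.most_common(n)


print(top_modifiers(prompts, n=2))
```

Counting each phrase once per prompt measures how widespread a modifier is rather than how often it is repeated; on the full corpus the same loop could be fed from a streaming reader.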

Can it be enriched or improved?

Yes. The dataset can be filtered, cleaned, or enriched using the provided notebooks. You can also add metadata (style, period, objects mentioned) or translate the prompts for multilingual use.
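
A cleaning and enrichment pass along these lines might look like the sketch below. It does not reproduce the provided notebooks, whose contents are not described here. The assumptions are that raw prompts may contain image URLs and `--name value` parameters, and the `STYLE_KEYWORDS` vocabulary is a hypothetical stand-in for a real style taxonomy.

```python
import re

URL_RE = re.compile(r"https?://\S+")
# Matches `--name value` pairs such as `--ar 16:9`; the value is optional.
PARAM_RE = re.compile(r"--(\w+)(?:\s+([^\s-]\S*))?")
# Hypothetical style vocabulary used only to illustrate metadata tagging.
STYLE_KEYWORDS = ("oil painting", "watercolor", "35mm photo")


def clean_prompt(raw):
    """Strip URLs and collapse runs of whitespace."""
    return " ".join(URL_RE.sub(" ", raw).split())


def extract_params(prompt):
    """Split a prompt into its descriptive text and a dict of `--` parameters."""
    params = {name: value for name, value in PARAM_RE.findall(prompt)}
    text = " ".join(PARAM_RE.sub(" ", prompt).split())
    return text, params


def clean_and_enrich(raw_prompts):
    """Return deduplicated records with parameter and style metadata attached."""
    seen = set()
    records = []
    for raw in raw_prompts:
        text, params = extract_params(clean_prompt(raw))
        key = text.lower()
        if not text or key in seen:
            continue  # skip empty prompts and case-insensitive duplicates
        seen.add(key)
        styles = [kw for kw in STYLE_KEYWORDS if kw in key]
        records.append({"text": text, "params": params, "styles": styles})
    return records


# Hypothetical raw prompts for illustration.
raw = [
    "https://example.com/ref.png  a lighthouse at dusk,   oil painting --ar 16:9",
    "A lighthouse at dusk, oil painting --ar 16:9 --v 5",
    "   ",
]
records = clean_and_enrich(raw)
print(records)
```

Duplicates are detected on the descriptive text alone, so two prompts that differ only in their parameters collapse into the first one seen; whether that is desirable depends on the task.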

🔎 In summary

Criterion	Evaluation
🧩Ease of use	⭐⭐⭐☆☆ (requires preprocessing for some tasks)
🧼Need for cleaning	⭐⭐⭐⭐☆ (moderate – cleaning tools are provided)
🏷️Richness of annotations	⭐⭐☆☆☆ (low, raw prompts without meta-info)
📜Commercial license	✅ Yes (Apache 2.0)
👨‍💻Beginner-friendly	👨‍🎨 Yes – good starting point for exploring prompting
🔁Reusable for fine-tuning	🔥 Very good for training prompt-generating models
🌍Cultural diversity	🌐 High diversity thanks to the open, public origin of the prompts