MidJourney v5 Prompt Dataset

A large corpus of text prompts written for MidJourney v5 AI image generation. Use it to study how prompts are phrased or to train models that generate prompts.

Size

4.2 million text prompts in tabular files (.csv, .json); a notebook is provided for cleaning

Licence

Apache 2.0

Description

MidJourney v5 Prompt Dataset contains more than 4.2 million text prompts collected from interactions with the MidJourney Bot. Each prompt describes an artistic style, a detailed scene, or an imaginative composition intended for AI image generation.
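With several million rows, it can help to read the file in chunks and apply basic cleaning as you go. The following is a minimal sketch, not the notebook shipped with the dataset: the column name "prompt", the function name load_clean_prompts, and the in-memory stand-in for the CSV file are all assumptions made for illustration.

```python
import io
import pandas as pd

# Stand-in for the real CSV file. The actual file name and the
# "prompt" column name are assumptions, not taken from the dataset.
csv_file = io.StringIO(
    "prompt\n"
    "a castle on a hill\n"
    "  a castle on a hill  \n"
    "neon city at night\n"
    "a castle on a hill\n"
)

def load_clean_prompts(source, column="prompt", chunksize=100_000):
    """Read prompts chunk by chunk, strip whitespace, drop empties and duplicates."""
    seen = set()
    kept = []
    for chunk in pd.read_csv(source, usecols=[column], chunksize=chunksize):
        for value in chunk[column].dropna():
            text = str(value).strip()
            if text and text not in seen:
                seen.add(text)
                kept.append(text)
    return pd.DataFrame({column: kept})

# A tiny chunk size is used here only to exercise the chunked path.
clean = load_clean_prompts(csv_file, chunksize=2)
```

To use this on the real data, pass the path of the downloaded CSV file instead of the in-memory sample and adjust the column name to match the file.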

What is this dataset for?

  • Create or refine prompt generation models for tools like MidJourney, DALL·E, or Stable Diffusion
  • Analyze artistic or stylistic trends in visual generation queries
  • Train NLP models specialized in visual description or composition
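As a rough illustration of the trend analysis mentioned above, the sketch below counts how many prompts mention each term from a small style vocabulary. The sample prompts, the STYLE_TERMS list, and the function name style_term_counts are hypothetical; a real analysis would run over the full corpus with a vocabulary of your choosing.

```python
import re
from collections import Counter

# Hypothetical sample prompts and style vocabulary, for illustration only.
sample_prompts = [
    "a realistic portrait of an old sailor, cinematic lighting",
    "futuristic city skyline, cinematic, neon lights",
    "watercolor fox in a forest",
    "realistic mountain lake, golden hour",
]
STYLE_TERMS = ["realistic", "futuristic", "cinematic", "watercolor"]

def style_term_counts(prompts, terms):
    """Count how many prompts mention each term at least once."""
    counts = Counter({term: 0 for term in terms})
    for prompt in prompts:
        # Lowercase and split into alphabetic words so matching is whole-word.
        words = set(re.findall(r"[a-z]+", prompt.lower()))
        counts.update(term for term in terms if term in words)
    return counts

counts = style_term_counts(sample_prompts, STYLE_TERMS)
```

Counting each prompt at most once per term keeps a single repetitive prompt from skewing the totals.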

Can it be enriched or improved?

Yes, the dataset can be filtered, cleaned, or enriched using the notebooks provided. It is possible to add metadata (style, period, objects mentioned) or to translate the prompts for multilingual uses.
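One possible way to add such metadata is sketched below with pandas. It assumes a "prompt" column and a hand-picked style vocabulary, both hypothetical, and also pulls out the aspect ratio when a prompt carries MidJourney's "--ar" parameter.

```python
import pandas as pd

# Hypothetical style vocabulary and sample rows; the real column name may differ.
STYLE_VOCAB = ["realistic", "futuristic", "watercolor"]

df = pd.DataFrame({"prompt": [
    "a realistic portrait of an old sailor --ar 2:3",
    "futuristic city skyline, watercolor style",
    "a cat sitting on a windowsill",
]})

def tag_styles(text, vocab=STYLE_VOCAB):
    """Return the vocabulary terms that occur in the prompt (substring match)."""
    lowered = text.lower()
    return [style for style in vocab if style in lowered]

# New metadata columns: matched style tags and the aspect ratio, if present.
df["styles"] = df["prompt"].apply(tag_styles)
df["aspect_ratio"] = df["prompt"].str.extract(r"--ar\s+(\d+:\d+)", expand=False)
```

Substring matching is deliberately loose here, so "photorealistic" would also be tagged as "realistic"; switch to whole-word matching if that is not what you want.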

🔎 In summary

Criterion | Evaluation
🧩 Ease of use | ⭐⭐⭐☆☆ (requires preprocessing for some tasks)
🧼 Need for cleaning | ⭐⭐⭐⭐☆ (moderate – cleaning tools are provided)
🏷️ Richness of annotations | ⭐⭐☆☆☆ (low, raw prompts without meta-info)
📜 Commercial license | ✅ Yes (Apache 2.0)
👨‍💻 Beginner-friendly | 👨‍🎨 Yes – good starting point for exploring prompting
🔁 Reusable for fine-tuning | 🔥 Very good for training prompt-generating models
🌍 Cultural diversity | 🌐 High diversity thanks to the open-source origin of prompts

🧠 Recommended for

  • AI artists
  • Prompting researchers
  • Visual text generator developers

🔧 Compatible tools

  • Python
  • Hugging Face Datasets
  • Pandas
  • Jupyter notebooks

💡 Tip

Filter prompts that contain specific styles (e.g., “realistic”, “futuristic”) to create targeted subdatasets.
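A minimal sketch of this kind of filtering with pandas follows. The "prompt" column name, the sample rows, and the function name filter_by_style are assumptions; the real files may use a different schema.

```python
import re
import pandas as pd

# Hypothetical sample; replace with the loaded dataset.
prompts = pd.DataFrame({"prompt": [
    "a realistic portrait of an old sailor, 85mm lens",
    "futuristic city skyline at dusk, neon lights",
    "watercolor painting of a fox in a forest",
    "Realistic photo of a mountain lake --ar 16:9",
]})

def filter_by_style(df, keywords, column="prompt"):
    """Keep rows whose prompt contains any keyword as a whole word, ignoring case."""
    pattern = r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b"
    mask = df[column].str.contains(pattern, case=False, regex=True, na=False)
    return df[mask].reset_index(drop=True)

realistic = filter_by_style(prompts, ["realistic"])
futuristic = filter_by_style(prompts, ["futuristic"])
```

Each resulting frame can be written out as its own subdataset, for example with DataFrame.to_csv.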

Frequently Asked Questions

Does this dataset contain images or only text?

It contains only text prompts; no generated images are included. It is intended for analyzing or generating text for image-generation tools.

Can this dataset be used to train a generative model?

Yes, it is well suited to training models that automatically generate creative prompts for image-generation tools.

Is it possible to use it in languages other than English?

Yes, although the prompts are mostly in English, you can translate them or add prompts in other languages to enrich the corpus.
