MidJourney v5 Prompt Dataset
A large corpus of text prompts written for MidJourney v5 image generation. Use it to study how prompts are phrased or to train models that generate prompts.
4.2 million text prompts in tabular files (.csv, .json); a notebook for cleaning is provided
Apache 2.0
Description
The MidJourney v5 Prompt Dataset contains over 4.2 million lines of text prompts collected from interactions with the MidJourney Bot. Each prompt describes an artistic style, a detailed scene, or an imaginative composition intended for AI-based image generation.
What is this dataset for?
- Create or refine prompt generation models for tools like MidJourney, DALL·E, or Stable Diffusion
- Analyze artistic or stylistic trends in visual generation queries
- Train NLP models specialized in visual description or composition
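As a rough illustration of the trend analysis mentioned above, the sketch below counts how often comma-separated fragments recur across prompts. It assumes that prompts are plain strings and that style modifiers tend to be separated by commas, which is common in MidJourney prompts but is not guaranteed by the dataset; `modifier_counts` and the sample prompts are hypothetical.

```python
from collections import Counter


def modifier_counts(prompts):
    """Count comma-separated fragments across prompts, case-insensitively."""
    counts = Counter()
    for prompt in prompts:
        # Use a set so each fragment is counted once per prompt;
        # a single repetitive prompt then cannot dominate the totals.
        fragments = {part.strip().lower() for part in prompt.split(",")}
        fragments.discard("")
        counts.update(fragments)
    return counts


# Invented sample prompts, for illustration only.
sample = [
    "a castle on a hill, oil painting, dramatic lighting",
    "portrait of a robot, dramatic lighting, 8k",
    "Oil Painting, a quiet harbor at dawn",
]
counts = modifier_counts(sample)
print(counts["oil painting"], counts["dramatic lighting"])
```

On the full corpus, `counts.most_common(50)` would give a first look at the most frequent modifiers.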
Can it be enriched or improved?
Yes, the dataset can be filtered, cleaned, or enriched using the notebooks provided. You can also add metadata (style, period, objects mentioned) or translate the prompts for multilingual use.
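The provided notebooks are not reproduced here, so the following is only a minimal sketch of what cleaning and metadata enrichment could look like. The keyword lists in `STYLE_KEYWORDS`, the helper names, and the output field names `prompt` and `styles` are all assumptions made for illustration.

```python
import re

# Hypothetical keyword lists; a real style taxonomy would be much larger.
STYLE_KEYWORDS = {
    "realistic": ["realistic", "photorealistic"],
    "futuristic": ["futuristic", "cyberpunk", "sci-fi"],
}


def clean_prompt(text):
    """Collapse runs of whitespace and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


def tag_styles(text):
    """Return the sorted style labels whose keywords appear in the text."""
    lowered = text.lower()
    return sorted(
        style
        for style, words in STYLE_KEYWORDS.items()
        if any(word in lowered for word in words)
    )


def enrich(prompts):
    """Clean prompts, drop empties and case-insensitive duplicates, add style tags."""
    seen = set()
    rows = []
    for raw in prompts:
        cleaned = clean_prompt(raw)
        key = cleaned.lower()
        if not cleaned or key in seen:
            continue
        seen.add(key)
        rows.append({"prompt": cleaned, "styles": tag_styles(cleaned)})
    return rows


# Invented sample prompts, for illustration only.
rows = enrich([
    "  a photorealistic   fox in snow ",
    "A photorealistic fox in snow",
    "cyberpunk city at night, realistic rain",
    "",
])
for row in rows:
    print(row)
```

Plain substring matching keeps the sketch short but also tags words such as "unrealistic" as realistic; word-boundary matching would be stricter.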
🔎 In summary
🧠 Recommended for
- AI artists
- Prompting researchers
- Visual text generator developers
🔧 Compatible tools
- Python
- Hugging Face Datasets
- Pandas
- Jupyter notebooks
💡 Tip
Filter prompts that contain specific styles (e.g., “realistic”, “futuristic”) to create targeted subdatasets.
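A minimal sketch of this tip with Pandas, assuming the prompts sit in a column named `prompt` (the actual column name in the .csv files may differ) and using a small invented DataFrame in place of the real file. `filter_by_styles` is a hypothetical helper.

```python
import re

import pandas as pd


def filter_by_styles(df, styles, column="prompt"):
    """Keep rows whose prompt contains any of the style terms as a whole word."""
    # Non-capturing group with word boundaries; terms are escaped so they
    # are matched literally rather than as regex syntax.
    pattern = r"\b(?:" + "|".join(re.escape(s) for s in styles) + r")\b"
    # na=False treats missing prompts as non-matching.
    mask = df[column].str.contains(pattern, case=False, regex=True, na=False)
    return df[mask].reset_index(drop=True)


# Invented rows standing in for the real file, e.g. loaded with pd.read_csv(...).
df = pd.DataFrame({
    "prompt": [
        "Realistic portrait of an old sailor",
        "futuristic skyline, neon lights",
        "watercolor garden in spring",
        None,
        "photorealistic cat on a sofa",
    ]
})
subset = filter_by_styles(df, ["realistic", "futuristic"])
print(subset)
```

Because the match is on whole words, "photorealistic" is not picked up by "realistic" here; add it to the list of terms if it belongs in the subdataset.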
Frequently Asked Questions
Does this dataset contain images or only text?
It contains only text prompts, with no generated images. It is intended for analyzing or generating text for visual tools.
Can this dataset be used to train a generative model?
Yes. It is well suited to training models that automatically generate creative prompts for image generation tools.
Is it possible to use it in languages other than English?
Yes, although the prompts are mostly in English, you can translate them or add prompts in other languages to enrich the corpus.