MM-IMDb (Multimodal IMDb Dataset)
MM-IMDb (Multimodal IMDb) is a multimodal dataset that combines textual information (movie plot summaries), images (movie posters), and genre labels. It is designed for training and evaluating models that handle several modalities in parallel, for classification, recommendation, or generation tasks.
Over 25,000 movies, each with textual metadata, a poster image, and multi-label genre annotations
Free to use for academic research, under the MIT license
Description
For each movie, the dataset includes:
- A textual summary (IMDb synopsis)
- A poster image (JPEG)
- A list of genre labels, drawn from 23 possible genres (drama, action, comedy, etc.)
- Metadata: title, release date, runtime, etc.
The dataset is structured to be used in multimodal approaches (text + image), with standardized splits for training, validation, and testing.
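As a minimal loading sketch, here is how such standardized splits might be consumed. The file names and directory layout below are assumptions for illustration, not the dataset's actual structure; adapt the paths to the distribution you download.

```python
import json
from pathlib import Path

# Assumed layout (illustrative only, not the official file structure):
#   data/split.json          -> {"train": [ids], "dev": [ids], "test": [ids]}
#   data/metadata/<id>.json  -> {"plot": "...", "genres": ["Drama", ...]}
#   data/posters/<id>.jpeg   -> the movie poster
DATA_DIR = Path("data")

def load_split(name):
    """Yield (plot_text, poster_path, genres) for one standardized split."""
    split_ids = json.loads((DATA_DIR / "split.json").read_text())[name]
    for movie_id in split_ids:
        meta = json.loads((DATA_DIR / "metadata" / f"{movie_id}.json").read_text())
        yield meta["plot"], DATA_DIR / "posters" / f"{movie_id}.jpeg", meta["genres"]

train_examples = list(load_split("train"))
```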
What is this dataset for?
MM-IMDb can be used for:
- Training multimodal classification models (poster + synopsis → genres), as in the fusion sketch after this list
- Developing film recommendation systems
- Fusing text and image representations (multi-embedding)
- Analyzing the respective contributions of text and image to classification
- Validating architectures such as CLIP, ViLT, or multimodal BERT
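As a minimal sketch of the classification and fusion use cases above, here is a late-fusion multi-label classifier in PyTorch. The embedding dimensions and hidden size are arbitrary assumptions, and the text/image encoders that would produce the embeddings are left out:

```python
import torch
import torch.nn as nn

NUM_GENRES = 23  # MM-IMDb defines 23 genre labels

class LateFusionClassifier(nn.Module):
    """Concatenate precomputed text and image embeddings, then predict genres.

    The dimensions are assumptions: any text encoder and image encoder
    producing fixed-size vectors would work here.
    """
    def __init__(self, text_dim=768, image_dim=512, hidden_dim=512):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(text_dim + image_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(hidden_dim, NUM_GENRES),  # one logit per genre
        )

    def forward(self, text_emb, image_emb):
        return self.head(torch.cat([text_emb, image_emb], dim=-1))

model = LateFusionClassifier()
text_emb, image_emb = torch.randn(4, 768), torch.randn(4, 512)  # dummy batch
logits = model(text_emb, image_emb)
# Multi-label targets: each genre is an independent binary decision.
targets = torch.randint(0, 2, (4, NUM_GENRES)).float()
loss = nn.BCEWithLogitsLoss()(logits, targets)
```

Using a sigmoid-based loss rather than softmax matters here: a movie can belong to several genres at once, so the 23 outputs must be scored independently.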
Can it be enriched or improved?
Yes:
- Add information about the cast, awards, or reviews
- Supplement the posters with scene captures (frames)
- Introduce audio features for tri-modal analysis
- Improve labels via crowdsourcing or more recent re-labeling models
🔗 Source: MM-IMDb Dataset on GitHub
Frequently Asked Questions
Can the dataset be used to test CLIP or BLIP?
Yes, it is an excellent benchmark for testing vision-language models on classification or semantic-alignment tasks.
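For example, here is a minimal zero-shot sketch with Hugging Face transformers. The checkpoint name, prompt template, and poster path are illustrative choices, not part of MM-IMDb:

```python
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

genres = ["drama", "action", "comedy"]  # subset of the 23 MM-IMDb genres
prompts = [f"a poster of a {g} movie" for g in genres]
image = Image.open("poster.jpeg")  # hypothetical poster path

inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # image-text similarity scores

# Genres are multi-label, so rank or threshold similarities instead of softmax.
scores = logits.squeeze(0)
print(sorted(zip(genres, scores.tolist()), key=lambda x: -x[1]))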
Are the images of consistent quality?
The posters are automatically extracted from IMDb. Their quality varies, but they are generally clean and usable.
Is the dataset multilingual?
No. Synopses are in English only.