Innovatiana's Cosmetics Retail Dataset (CRD)
This open-source dataset contains annotated images of cosmetics aisles in stores. It was created as part of a retail test project to structure the data needed to develop intelligent inventory and product detection algorithms. Despite a difficult collaboration context, Innovatiana decided to release this dataset in order to recognize the work of its annotators and to support AI projects in the retail sector.
4,820 annotated images, approximately 245,000 labels, annotations in XML format (CVAT)
For research and teaching purposes only. Product images remain the property of their respective owners. Users are responsible for ensuring that their use of the dataset complies with applicable law.
Description
The dataset contains:
- 4,820 frames extracted from videos shot in stores
- Approximately 245,000 manual annotations (bounding boxes, polygons)
- Up to 500 annotated objects per image
- Information on the layout of the shelves (planograms)
- Data structured in subsets to facilitate exploration
The annotations were created with CVAT and are provided in its XML export format, which suits retail computer vision projects.
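As an illustration of reading such annotations, here is a minimal sketch that parses bounding boxes and polygons from CVAT XML. It assumes the files follow the "CVAT for images" layout (`<image>` elements containing `<box>` and `<polygon>` children); if the export uses the video layout with `<track>` elements, the traversal would need to change. The sample XML, the file name, and the label names `product` and `shelf` are hypothetical and not taken from the dataset.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the "CVAT for images" layout (hypothetical content).
SAMPLE_XML = """<annotations>
  <image id="0" name="frame_000001.jpg" width="1920" height="1080">
    <box label="product" xtl="100.0" ytl="200.0" xbr="180.0" ybr="320.0" occluded="0"/>
    <polygon label="shelf" points="0.0,500.0;1920.0,500.0;1920.0,540.0;0.0,540.0" occluded="0"/>
  </image>
</annotations>"""


def parse_cvat_images(xml_text):
    """Return one dict per <image>, holding its boxes and polygons."""
    root = ET.fromstring(xml_text)
    images = []
    for image in root.iter("image"):
        boxes = [
            {
                "label": box.get("label"),
                "xtl": float(box.get("xtl")),
                "ytl": float(box.get("ytl")),
                "xbr": float(box.get("xbr")),
                "ybr": float(box.get("ybr")),
            }
            for box in image.iter("box")
        ]
        polygons = [
            {
                "label": poly.get("label"),
                # Points are stored as "x1,y1;x2,y2;..." in this layout.
                "points": [
                    tuple(float(v) for v in point.split(","))
                    for point in poly.get("points").split(";")
                ],
            }
            for poly in image.iter("polygon")
        ]
        images.append(
            {
                "name": image.get("name"),
                "width": int(image.get("width")),
                "height": int(image.get("height")),
                "boxes": boxes,
                "polygons": polygons,
            }
        )
    return images


records = parse_cvat_images(SAMPLE_XML)
print(records[0]["name"], len(records[0]["boxes"]), len(records[0]["polygons"]))
```

For a real file, the same function can be fed the text read from disk; with up to 500 objects per image, counting `len(record["boxes"])` per image is a quick first check of annotation density.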
What is this dataset for?
This dataset can be used to:
- Train object detection models (cosmetics, shelving)
- Automatically detect missing products on the shelves
- Track inventory and recognize products in store
- Analyze compliance with planograms
- Develop visual monitoring tools for mass retailers
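Many object detection training pipelines expect normalized box coordinates rather than the absolute pixel corners that CVAT stores. The sketch below converts one box to the common YOLO-style text line (class id, then center x, center y, width, and height, each divided by the image size). The target format is an assumption made for illustration, since the dataset does not prescribe a training framework, and the box values and class id are hypothetical.

```python
def cvat_box_to_yolo(box, image_width, image_height):
    """Convert absolute corner coordinates to normalized (xc, yc, w, h)."""
    x_center = (box["xtl"] + box["xbr"]) / 2.0 / image_width
    y_center = (box["ytl"] + box["ybr"]) / 2.0 / image_height
    width = (box["xbr"] - box["xtl"]) / image_width
    height = (box["ybr"] - box["ytl"]) / image_height
    return x_center, y_center, width, height


def format_yolo_line(class_id, normalized_box):
    """Render one annotation as a YOLO-style text line."""
    return f"{class_id} " + " ".join(f"{value:.6f}" for value in normalized_box)


# Hypothetical box on a 1920x1080 frame.
example_box = {"xtl": 100.0, "ytl": 200.0, "xbr": 180.0, "ybr": 320.0}
normalized = cvat_box_to_yolo(example_box, image_width=1920, image_height=1080)
yolo_line = format_yolo_line(0, normalized)
print(yolo_line)
```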
Can it be enriched or improved?
Yes. As this dataset comes from a test project, some annotations may be partial or inconsistent. We recommend that you:
- Clean or refine the annotated subsets
- Cross-reference the data with other sources (metadata, product catalogs)
- Adapt the annotations to internal classifications or business categories
- Extend the dataset with new images or business-specific labels
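Two of these recommendations, cleaning annotations and adapting them to internal categories, can be sketched in a few lines. The example below drops boxes that are degenerate (narrower or shorter than a threshold) or whose label has no mapping, then renames the remaining labels. The label names, the mapping, and the `min_side` threshold are all hypothetical; the actual label set should be read from the annotation files.

```python
# Hypothetical mapping from raw annotation labels to internal business categories.
LABEL_MAP = {"lipstick": "makeup", "mascara": "makeup", "shampoo": "haircare"}


def clean_and_remap(boxes, label_map, min_side=2.0):
    """Drop degenerate or unmapped boxes and rename the remaining labels."""
    cleaned = []
    for box in boxes:
        width = box["xbr"] - box["xtl"]
        height = box["ybr"] - box["ytl"]
        if width < min_side or height < min_side:
            continue  # too small to be a meaningful annotation
        if box["label"] not in label_map:
            continue  # no internal category defined for this label
        cleaned.append({**box, "label": label_map[box["label"]]})
    return cleaned


raw_boxes = [
    {"label": "lipstick", "xtl": 10.0, "ytl": 10.0, "xbr": 50.0, "ybr": 90.0},
    {"label": "mascara", "xtl": 5.0, "ytl": 5.0, "xbr": 5.5, "ybr": 40.0},
    {"label": "unknown", "xtl": 0.0, "ytl": 0.0, "xbr": 30.0, "ybr": 30.0},
]
cleaned_boxes = clean_and_remap(raw_boxes, LABEL_MAP)
print(cleaned_boxes)
```

Silently discarding unmapped labels is only one possible choice; logging them instead makes it easier to spot categories missing from the mapping.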
📄 An accompanying PDF (innv-cosmetics-dataset-for-retail.pdf) indicates which subsets were validated by our quality team.
🔗 Source: Hugging Face — Innovatiana Cosmetics Dataset
Frequently Asked Questions
Is the dataset ready to use for training?
Partially. Some subsets are clean and usable; others need to be revised. The accompanying PDF identifies the subsets recommended for initial training.
Why was this dataset made public?
We released it for the sake of transparency and to recognize the work done by our teams, in a context of litigation with the customer. By publishing it, we contribute to open science and highlight the realities of the AI data production chain.
Have the brands present validated this dataset?
No. The original customer was not affiliated with any of the brands represented. Innovatiana claims no rights to the visual content and acted only as a technical annotator.