Innovatiana's Cosmetics Retail Dataset (CRD)

This open-source dataset contains annotated images of cosmetics departments in stores. It was produced as part of a retail test project, with the aim of structuring the data needed to develop intelligent inventory-management and product-detection algorithms. Despite a difficult collaboration context, Innovatiana decided to release this dataset in order to recognize the work of its annotators and support AI projects in the retail sector.

Size

4,820 annotated images, approximately 245,000 labels, annotations in XML format (CVAT)

Licence

Use for research and teaching purposes only. Product images remain the property of their respective owners. Users are responsible for ensuring that their use complies with applicable law.

Description

The dataset contains:

4,820 frames extracted from videos shot in stores
Approximately 245,000 manual annotations (bounding boxes, polygons)
Up to 500 annotated objects per image
Information on the layout of the shelves (planograms)
Data structured in subsets to facilitate exploration

The annotations were created with CVAT, in a format compatible with retail computer vision projects.
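
Because the annotations are distributed as CVAT XML, a first step is usually to load them into plain Python structures. The sketch below assumes the files follow CVAT's "CVAT for images 1.1" export layout (one <image> element per frame, with <box> and <polygon> children); the dataset's actual files may differ, and the sample file name and labels are hypothetical. It uses only the standard library.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Shape:
    label: str
    kind: str  # "box" or "polygon"
    points: list[tuple[float, float]]  # box: [top-left, bottom-right]; polygon: vertices


@dataclass
class ImageAnnotation:
    name: str
    width: int
    height: int
    shapes: list[Shape] = field(default_factory=list)


def parse_cvat_xml(xml_text: str) -> list[ImageAnnotation]:
    """Parse a 'CVAT for images 1.1' XML export (assumed layout) into per-image shape lists."""
    root = ET.fromstring(xml_text)
    images: list[ImageAnnotation] = []
    # <image> elements are direct children of the <annotations> root.
    for img in root.findall("image"):
        ann = ImageAnnotation(
            name=img.get("name"),
            width=int(img.get("width")),
            height=int(img.get("height")),
        )
        for box in img.findall("box"):
            # Boxes store their top-left (xtl, ytl) and bottom-right (xbr, ybr) corners.
            corners = [
                (float(box.get("xtl")), float(box.get("ytl"))),
                (float(box.get("xbr")), float(box.get("ybr"))),
            ]
            ann.shapes.append(Shape(box.get("label"), "box", corners))
        for poly in img.findall("polygon"):
            # Polygon vertices are serialized as "x1,y1;x2,y2;...".
            vertices = [
                (float(x), float(y))
                for x, y in (pt.split(",") for pt in poly.get("points").split(";"))
            ]
            ann.shapes.append(Shape(poly.get("label"), "polygon", vertices))
        images.append(ann)
    return images


# Minimal hand-written sample; the file name and labels are hypothetical.
SAMPLE = """<annotations>
  <image id="0" name="frame_000001.jpg" width="1920" height="1080">
    <box label="lipstick" occluded="0" xtl="10.0" ytl="20.0" xbr="60.0" ybr="120.0"/>
    <polygon label="shelf" occluded="0" points="0.0,500.0;1920.0,500.0;1920.0,540.0;0.0,540.0"/>
  </image>
</annotations>"""

if __name__ == "__main__":
    for image in parse_cvat_xml(SAMPLE):
        print(image.name, [(s.kind, s.label, len(s.points)) for s in image.shapes])
```

Running it prints one line per image listing each shape's type, label, and number of points. Attributes such as occlusion flags are ignored here; extend the loops if you need them.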

What is this dataset for?

This dataset can be used to:

Train object detection models (cosmetics, shelving)
Automatically detect products missing from shelves
Track inventory and recognize products in store
Analyze compliance with planograms
Develop visual monitoring tools for mass retailers
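
As an illustration of the missing-product and planogram-compliance use cases, the sketch below compares expected planogram slots against detected boxes using intersection-over-union (IoU): a slot counts as filled only if a detection with the same label overlaps it above a threshold. The Slot structure, the label names, and the 0.5 threshold are illustrative assumptions; the dataset does not prescribe this method, and its planogram information may be encoded differently.

```python
from typing import NamedTuple

# Axis-aligned box as (xtl, ytl, xbr, ybr), matching CVAT's box corner convention.
Box = tuple[float, float, float, float]


class Slot(NamedTuple):
    label: str  # product reference, either expected (planogram) or detected
    box: Box


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def missing_slots(
    planogram: list[Slot], detections: list[Slot], threshold: float = 0.5
) -> list[Slot]:
    """Return planogram slots with no same-label detection at IoU >= threshold."""
    return [
        slot
        for slot in planogram
        if not any(
            d.label == slot.label and iou(d.box, slot.box) >= threshold
            for d in detections
        )
    ]


if __name__ == "__main__":
    # Hypothetical planogram and detections for a single shelf segment.
    planogram = [Slot("lipstick_red", (0, 0, 50, 100)), Slot("mascara", (60, 0, 110, 100))]
    detections = [Slot("lipstick_red", (2, 1, 51, 99))]
    print(missing_slots(planogram, detections))  # only the mascara slot is reported
```

This greedy rule lets a single detection satisfy several overlapping slots; a stricter compliance check would enforce one-to-one matching between slots and detections.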

Can it be enriched or improved?

Yes. Because this dataset comes from a test project, some annotations may be partial or inconsistent. We recommend:

Cleaning or refining the annotated subsets
Cross-referencing the data with other sources (metadata, product catalogs)
Adapting the annotations to internal classifications or business categories
Extending the dataset with new shots or business-specific labels
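
Adapting annotations to an internal classification often comes down to a lookup table from source labels to business categories. The sketch below assumes annotations have been loaded as dictionaries with a "label" key (for instance, after parsing the CVAT XML); the label names and categories are hypothetical. Labels missing from the table are set aside and counted rather than silently kept, so they can be reviewed by hand.

```python
from collections import Counter
from typing import Any


def remap_annotations(
    annotations: list[dict[str, Any]], mapping: dict[str, str]
) -> tuple[list[dict[str, Any]], Counter]:
    """Rename labels to internal categories.

    Annotations whose label has no entry in `mapping` are dropped from the
    output and tallied in a Counter for manual review.
    """
    kept: list[dict[str, Any]] = []
    unmapped: Counter = Counter()
    for ann in annotations:
        target = mapping.get(ann["label"])
        if target is None:
            unmapped[ann["label"]] += 1
            continue
        # Copy the annotation so the source data is left untouched.
        kept.append({**ann, "label": target})
    return kept, unmapped


if __name__ == "__main__":
    # Hypothetical source labels and internal categories.
    mapping = {"lipstick_red": "lip_makeup", "lipstick_nude": "lip_makeup", "mascara": "eye_makeup"}
    annotations = [
        {"id": 1, "label": "lipstick_red"},
        {"id": 2, "label": "mascara"},
        {"id": 3, "label": "tester_unit"},
    ]
    kept, unmapped = remap_annotations(annotations, mapping)
    print(kept)      # ids 1 and 2, relabeled to their internal categories
    print(unmapped)  # Counter({'tester_unit': 1})
```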

📄 An accompanying PDF (innv-cosmetics-dataset-for-retail.pdf) identifies the subsets validated by our quality team.

🔗 Source: Hugging Face — Innovatiana Cosmetics Dataset

Frequently Asked Questions

Is the dataset ready to use for training?

Partially. Some subsets are clean and usable, while others need to be revised. The accompanying PDF identifies the subsets recommended for initial training.

Why was this dataset made public?

It reflects a commitment to transparency and to recognizing the work done by our teams, in the context of litigation with a customer. By publishing it, we contribute to open science and highlight the realities of the AI data production chain.

Have the brands present validated this dataset?

No. The original customer was not affiliated with any of the brands represented. Innovatiana claims no rights to the visual content and acts solely as a technical annotator.

Similar datasets

SMS Spam Collection

Text-to-Image 2M

GSM8K Platinum