E-commerce Text Classification

This dataset contains more than 50,000 product descriptions from e-commerce sites, divided into 4 categories: Electronics, Books, Home, and Clothing & Accessories. It is well suited to supervised text classification tasks.

Size

50,425 text entries in CSV, 4 classes

License

Creative Commons Attribution 4.0 International (CC BY 4.0)

Description

The E-commerce Text Classification dataset is a corpus of 50,425 text entries, each assigned to one of four product categories: Electronics, Books, Home, and Clothing & Accessories. Each row pairs a product description with its target category, so the data can be used directly for supervised learning.
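
A minimal sketch for loading the rows with Python's standard `csv` module. It assumes a header-less CSV whose first column is the category and whose second column is the description; the column order and the helper names `load_rows` and `class_counts` are assumptions for illustration, so check them against the file you download.

```python
import csv
from collections import Counter
from typing import Iterable, List, Tuple


def load_rows(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse (description, category) pairs from CSV lines.

    Assumption: no header row, category in column 0, description in column 1.
    Adjust the indices if the downloaded file is laid out differently.
    """
    rows = []
    for record in csv.reader(lines):
        # Skip malformed rows and rows with an empty description.
        if len(record) < 2 or not record[1].strip():
            continue
        category, description = record[0].strip(), record[1].strip()
        rows.append((description, category))
    return rows


def class_counts(rows: List[Tuple[str, str]]) -> Counter:
    """Count how many descriptions fall into each category."""
    return Counter(category for _, category in rows)
```

For example, with a hypothetical local file name: `with open("ecommerce_dataset.csv", newline="", encoding="utf-8") as f: rows = load_rows(f)`. Opening the file with `newline=""` lets the `csv` reader handle quoted descriptions that contain commas or line breaks, and `class_counts(rows)` shows whether the four classes are balanced before training.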

What is this dataset for?

Train NLP models to classify products from their descriptions
Build an automatic categorization engine for an e-commerce platform
Benchmark supervised text classification algorithms
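
As a starting point for these use cases, here is a dependency-free sketch of a baseline classifier: multinomial Naive Bayes with add-one (Laplace) smoothing over lowercase word tokens. The tokenizer, class name and `accuracy` helper are illustrative choices, not part of the dataset; in practice a library implementation such as scikit-learn's `MultinomialNB` combined with `CountVectorizer` or `TfidfVectorizer` would be the usual choice.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep runs of letters and digits."""
    return TOKEN_RE.findall(text.lower())


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, rows: List[Tuple[str, str]]) -> "NaiveBayesClassifier":
        self.class_docs: Counter = Counter()  # documents per category
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)  # token counts per category
        self.vocab = set()
        for text, label in rows:
            self.class_docs[label] += 1
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = sum(self.class_docs.values())
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text: str) -> str:
        tokens = tokenize(text)
        v = len(self.vocab)
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.class_docs.items():
            # Log prior plus the sum of smoothed log likelihoods;
            # unseen tokens get count 0, i.e. probability 1 / denom.
            score = math.log(n_docs / self.total_docs)
            denom = self.total_words[label] + v
            counts = self.word_counts[label]
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model: NaiveBayesClassifier, rows: List[Tuple[str, str]]) -> float:
    """Fraction of (description, category) rows predicted correctly."""
    correct = sum(model.predict(text) == label for text, label in rows)
    return correct / len(rows)
```

Usage: call `fit` on the `(description, category)` pairs of a training split, then report `accuracy` on a held-out split. Working in log space avoids numerical underflow when long descriptions multiply many small probabilities.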

Can it be enriched or improved?

Yes. Possible improvements include adding sub-categories, integrating metadata (prices, reviews, etc.), and using paraphrasing techniques to increase the linguistic diversity of the corpus. Translating the data would also make it possible to test multilingual models.

🔎 In summary

Criterion	Evaluation
🧩 Ease of use	⭐⭐⭐⭐⭐ (CSV ready-to-use)
🧼 Need for cleaning	⭐⭐⭐⭐⭐ (Low: well-structured text)
🏷️ Annotation richness	⭐⭐⭐✩✩ (Medium: one category label per entry, 4 classes)
📜 Commercial license	✅ Yes (CC BY 4.0)
👨‍💻 Beginner friendly	🌟 Very suitable for supervised learning
🔁 Fine-tuning ready	🎯 Compatible with BERT, RoBERTa, etc.
🌍 Cultural diversity	⚠️ Limited: typical e-commerce descriptions
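
Fine-tuning a BERT-style model typically requires integer label ids and a held-out validation split. A minimal standard-library sketch, assuming the data has already been parsed into `(description, category)` pairs; the helper names, the 10% validation fraction and the seed are illustrative choices. The resulting `label2id` and `id2label` dictionaries match the argument names that Hugging Face Transformers model configurations accept.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, str]  # (description, category)


def build_label_maps(rows: List[Row]) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Map each category name to a stable integer id (alphabetical order)."""
    labels = sorted({label for _, label in rows})
    label2id = {label: i for i, label in enumerate(labels)}
    id2label = {i: label for label, i in label2id.items()}
    return label2id, id2label


def stratified_split(
    rows: List[Row], val_fraction: float = 0.1, seed: int = 0
) -> Tuple[List[Row], List[Row]]:
    """Split rows so each category keeps roughly the same share in both parts."""
    by_label: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        by_label[row[1]].append(row)
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train: List[Row] = []
    val: List[Row] = []
    for label in sorted(by_label):
        group = by_label[label][:]  # copy before shuffling
        rng.shuffle(group)
        n_val = round(len(group) * val_fraction)
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val
```

Splitting per category rather than over the whole list keeps a minority class from being under-represented in the validation set by chance, and sorting the label names makes the id assignment identical across runs.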