E-commerce Text Classification
This dataset contains more than 50,000 product descriptions from e-commerce sites, divided into four categories: Electronics, Books, Home, and Clothing & Accessories. It is well suited to automatic text classification tasks.
50,425 text entries in CSV, 4 classes
Attribution 4.0 International (CC BY 4.0)
Description
The E-commerce Text Classification dataset is a corpus of 50,425 text entries, each assigned to one of four main product categories: Electronics, Books, Home, and Clothing & Accessories. Each row contains a product description along with its target category, which makes the corpus directly usable for supervised learning.
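As a rough illustration of the "one description plus one category per row" layout, the sketch below reads such rows with Python's standard `csv` module and counts entries per class. The column order (category first, description second), the absence of a header row, and the sample rows are all assumptions made for the example, not facts about the actual file.

```python
import csv
import io
from collections import Counter


def load_rows(handle, label_col=0, text_col=1):
    """Read (category, description) pairs from an open CSV file handle.

    The default column positions are assumptions; adjust them to the real file.
    """
    rows = []
    for record in csv.reader(handle):
        # Skip blank or malformed lines that lack either column.
        if len(record) <= max(label_col, text_col):
            continue
        label = record[label_col].strip()
        text = record[text_col].strip()
        if label and text:
            rows.append((label, text))
    return rows


# Tiny in-memory stand-in for the real file (made-up rows, not dataset content).
sample = io.StringIO(
    'Books,"A paperback novel about a lighthouse keeper"\n'
    'Electronics,"USB-C charging cable, 1 m"\n'
    'Books,"Hardcover cookbook with 100 recipes"\n'
)
rows = load_rows(sample)

# Class distribution: useful for spotting imbalance before training.
class_counts = Counter(label for label, _ in rows)
print(class_counts.most_common())
```

To use it on the downloaded file, pass a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` is wherever you saved the CSV.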
What is this dataset for?
- Train NLP models to classify products according to their description
- Set up an automatic categorization engine in an e-commerce platform
- Test supervised text classification algorithms
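To make the use cases above concrete, here is a minimal, dependency-free sketch of a supervised baseline: a multinomial naive Bayes classifier over bag-of-words counts. It is one possible starting point, not a recommended or reference method for this dataset. The class name `NaiveBayesClassifier`, the tokenizer, and the training examples are hypothetical and invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)          # documents per class
        self.word_counts = defaultdict(Counter)    # token counts per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        self.total_docs = len(labels)
        return self

    def predict(self, text):
        # Tokens never seen in training carry no information and are dropped.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n_docs / self.total_docs)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training examples using the four category names from the description.
train = [
    ("Electronics", "wireless bluetooth headphones with charging cable"),
    ("Electronics", "usb charger for phone and laptop"),
    ("Books", "paperback novel by a bestselling author"),
    ("Books", "hardcover history book with illustrations"),
    ("Home", "cotton bed sheet and pillow cover set"),
    ("Home", "stainless steel kitchen knife set"),
    ("Clothing & Accessories", "men's cotton t-shirt with round neck"),
    ("Clothing & Accessories", "women's leather belt and wallet"),
]
model = NaiveBayesClassifier().fit(
    [text for _, text in train], [label for label, _ in train]
)
print(model.predict("usb charging cable for laptop"))
```

On the full corpus you would hold out a test split and report per-class metrics. Scikit-learn provides equivalent building blocks (`TfidfVectorizer`, `MultinomialNB`, `LogisticRegression`) if you prefer not to hand-roll the model.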
Can it be enriched or improved?
Yes. You can add sub-categories, integrate metadata (prices, reviews, etc.), or use paraphrasing techniques to increase the linguistic diversity of the corpus. Multilingual models can also be tested by translating the descriptions.
🔎 In summary
🧠 Recommended for
- NLP beginners
- E-commerce prototyping
- Text classification benchmarking
🔧 Compatible tools
- Scikit-learn
- spaCy
- Hugging Face Transformers
- fastText
💡 Tip
Use contextual embeddings (for example, from a pretrained Transformer model) in place of bag-of-words features to improve the performance of your classifier.
Frequently Asked Questions
Is this dataset suitable for multi-label classification?
No. Each description is associated with exactly one of the four categories, which makes it a single-label, four-class classification dataset.
Can this dataset be used to train a multilingual model?
Yes, by translating the descriptions into several languages, you can adapt the dataset to multilingual NLP tasks.
Does the dataset contain additional product metadata?
No, it only contains descriptions and associated categories. Other data can be added manually to enrich the corpus.




