E-commerce Text Classification

This dataset contains 50,425 product descriptions from e-commerce sites, divided into four categories: Electronics, Books, Home, and Clothing & Accessories. It is well suited to automatic text classification tasks.

Size

50,425 text entries in CSV, 4 classes

Licence

Attribution 4.0 International (CC BY 4.0)

Description

The E-commerce Text Classification dataset is a corpus of 50,425 text entries, each associated with one of four main product categories: Electronics, Books, Home, and Clothing & Accessories. Each row contains a product description along with its target category, which makes the corpus directly usable for supervised learning.
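
As a minimal sketch of reading such a file, the standard-library csv module is enough. The layout is an assumption: two columns with no header row, category first and description second. The source does not specify the column order, and the sample rows below are hypothetical stand-ins for the real data.

```python
import csv
import io
from collections import Counter


def load_rows(handle):
    """Read (category, description) pairs from an open CSV file handle.

    Assumes two columns and no header row: category, then description.
    Rows without exactly two fields, or with an empty field, are skipped.
    """
    rows = []
    for record in csv.reader(handle):
        if len(record) != 2:
            continue
        category, description = record[0].strip(), record[1].strip()
        if category and description:
            rows.append((category, description))
    return rows


# Tiny in-memory stand-in for the real file (hypothetical sample rows).
sample = io.StringIO(
    'Books,"A gripping mystery novel, set in Victorian London"\n'
    "Electronics,Wireless Bluetooth headphones with noise cancellation\n"
    "Home,Stainless steel frying pan with glass lid\n"
    "Books,Illustrated guide to garden birds\n"
)

rows = load_rows(sample)

# Count entries per category to check the class balance before training.
class_counts = Counter(category for category, _ in rows)
print(class_counts.most_common())
```

For the real file, the same function can be called on a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` is wherever the downloaded CSV was saved.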

What is this dataset for?

  • Train NLP models to classify products from their descriptions
  • Set up an automatic categorization engine for an e-commerce platform
  • Test supervised text classification algorithms
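
The first and third use cases above can be sketched with a dependency-free baseline. This is not a method prescribed by the dataset: it is a small multinomial Naive Bayes classifier with add-one smoothing, written here for illustration. The class name `NaiveBayesClassifier`, the tokenizer, and the eight training sentences are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocabulary = {
            word for counts in self.word_counts.values() for word in counts
        }
        return self

    def predict(self, text):
        # Tokens never seen during training carry no evidence, so drop them.
        tokens = [t for t in tokenize(text) if t in self.vocabulary]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + len(self.vocabulary)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            for token in tokens:
                score += math.log((counts[token] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training examples, two per category.
train_texts = [
    "Wireless bluetooth headphones with charging cable",
    "USB charging cable for smartphone",
    "Paperback novel by an award winning author",
    "Hardcover cookbook with illustrated recipes",
    "Cotton bed sheet set with pillow covers",
    "Stainless steel kitchen knife set",
    "Men's cotton shirt with long sleeves",
    "Leather belt and wallet combo",
]
train_labels = [
    "Electronics", "Electronics",
    "Books", "Books",
    "Home", "Home",
    "Clothing & Accessories", "Clothing & Accessories",
]

model = NaiveBayesClassifier().fit(train_texts, train_labels)
print(model.predict("cotton shirt with long sleeves"))
print(model.predict("usb cable for smartphone"))
```

On the full corpus, a library implementation such as scikit-learn's `MultinomialNB` combined with `TfidfVectorizer` would be the more practical choice; the hand-written version only makes the mechanics visible.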

Can it be enriched or improved?

Yes. You can add sub-categories, integrate metadata (prices, reviews, etc.), or use paraphrasing techniques to increase the linguistic diversity of the corpus. Multilingual models can also be tested by translating the data.

🔎 In summary

Criterion | Evaluation
🧩 Ease of use | ⭐⭐⭐⭐⭐ (CSV ready-to-use)
🧼 Need for cleaning | ⭐⭐⭐⭐⭐ (Low – well-structured text)
🏷️ Annotation richness | ⭐⭐⭐✩✩ (Medium – single-label classification over 4 classes)
📜 Commercial license | ✅ Yes (CC BY 4.0)
👨‍💻 Beginner friendly | 🌟 Very suitable for supervised learning
🔁 Fine-tuning ready | 🎯 Compatible with BERT, RoBERTa, etc.
🌍 Cultural diversity | ⚠️ Limited – typical e-commerce descriptions

🧠 Recommended for

  • NLP beginners
  • E-commerce prototyping
  • Text classification benchmarking

🔧 Compatible tools

  • Scikit-learn
  • spaCy
  • Hugging Face Transformers
  • FastText

💡 Tip

Use contextual embeddings (for example from BERT or RoBERTa) instead of bag-of-words features to improve the performance of your classifier.

Frequently Asked Questions

Is this dataset suitable for multi-label classification?

No. Each description is associated with exactly one of the four categories, so this is a single-label, multi-class classification dataset.
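
The difference between one category per description and several can be sketched in terms of target encoding. The category order and integer ids below are an arbitrary assumption, and `encode_multi_label` is shown only for contrast, since this dataset never assigns more than one category to a row.

```python
# Assumed category order; the integer ids are arbitrary but must stay fixed.
CATEGORIES = ["Electronics", "Books", "Home", "Clothing & Accessories"]
LABEL_TO_ID = {name: index for index, name in enumerate(CATEGORIES)}


def encode_single_label(category):
    """Return one integer class id, the target format this dataset supports."""
    return LABEL_TO_ID[category]


def encode_multi_label(categories):
    """Return a multi-hot vector, the target format a multi-label task would need."""
    vector = [0] * len(CATEGORIES)
    for category in categories:
        vector[LABEL_TO_ID[category]] = 1
    return vector


single_target = encode_single_label("Books")
multi_target = encode_multi_label(["Books", "Home"])
print(single_target, multi_target)
```

A single integer id per row is the format most multi-class classifiers expect, which is why the dataset works without any relabeling.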

Can this dataset be used to train a multilingual model?

Yes, by translating the descriptions into several languages, you can adapt the dataset to multilingual NLP tasks.

Does the dataset contain additional product metadata?

No, it only contains descriptions and associated categories. Other data can be added manually to enrich the corpus.
