Clothing Fit Dataset for Size Recommendation

Enriched customer feedback dataset for predicting whether a size is right, too small, or too big. Includes ratings, reviews, measurements, and categories.

Size

82,790 entries in JSON format (40 MB), structured customer-product data with text
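As a rough illustration of a first look at the data, the sketch below parses a few records and counts the fit labels. It assumes one JSON object per line, which this page does not confirm; if the file is a single JSON array, `json.load` would be the call to use instead. The field names (`item_id`, `size`, `fit`, `review_text`) and the label spellings are hypothetical and should be checked against the downloaded file.

```python
import io
import json
from collections import Counter

# Hypothetical sample standing in for the downloaded file; the real field
# names, label values and layout may differ.
SAMPLE = """\
{"item_id": "A1", "size": 8, "fit": "too small", "review_text": "Runs tight."}
{"item_id": "A1", "size": 10, "fit": "perfect", "review_text": "Great dress."}
{"item_id": "B7", "size": 12, "fit": "too big"}
"""


def load_records(handle):
    """Parse one JSON object per non-empty line of an open text handle."""
    return [json.loads(line) for line in handle if line.strip()]


records = load_records(io.StringIO(SAMPLE))

# Count fit labels and records without review text to gauge sparsity.
fit_counts = Counter(r.get("fit", "unknown") for r in records)
missing_text = sum(1 for r in records if not r.get("review_text"))

print(fit_counts)
print("records without review text:", missing_text)
```

With the real file, `io.StringIO(SAMPLE)` would be replaced by an `open(...)` call on the downloaded path.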

Licence

CC BY 4.0

Description

Clothing Fit Dataset for Size Recommendation brings together more than 82,000 customer reviews of clothing from two major e-commerce platforms. Each entry combines ratings, text comments, customer and product measurements, and feedback on the fit (too small, perfect, too big). This corpus makes it possible to train models that improve the customer experience in online fashion.

What is this dataset for?

  • Develop a customised recommendation system for e-commerce sites
  • Build a “fit” classification model based on textual reviews
  • Create automatic summarization or sentiment analysis models based on reviews
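The second use case, fit classification from review text, could be prototyped as follows. This is a minimal sketch and not a method prescribed by the dataset: it swaps in a TF-IDF representation with logistic regression from scikit-learn as a simple baseline, and the six toy reviews are invented for illustration. Only the three label names come from the description above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented toy reviews; real training data would come from the dataset.
texts = [
    "runs small, very tight in the chest",
    "tight at the hips, order a size up",
    "fits perfectly, true to size",
    "true to size and very comfortable",
    "way too loose, had to size down",
    "baggy and long, runs large",
]
labels = [
    "too small", "too small",
    "perfect", "perfect",
    "too big", "too big",
]

# Word and bigram TF-IDF features feeding a multiclass logistic regression.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)

prediction = model.predict(["a bit tight around the waist"])[0]
print(prediction)
```

On the full dataset, a held-out split and class-balanced metrics would be needed before trusting such a baseline, since fit labels in review data are often imbalanced.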

Can it be enriched or improved?

Yes. This dataset can be cross-referenced with demographic information or product images. Enrichments can also include sentiment analysis, linguistic normalization of reviews, or extension to other brands or regions. The JSON format allows easy handling and advanced preprocessing.
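One possible reading of linguistic normalization is a small cleaning function applied to each review before modeling. The sketch below uses only the Python standard library; the function name `normalize_review` and the specific steps (accent stripping, lowercasing, whitespace collapsing) are illustrative choices, not part of the dataset.

```python
import re
import unicodedata


def normalize_review(text):
    """Strip accents, lowercase, and collapse whitespace in a review."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    lowered = without_accents.lower()
    # Replace any run of whitespace (including newlines) with one space.
    return re.sub(r"\s+", " ", lowered).strip()


print(normalize_review("  Très  JOLIE robe,\n runs SMALL! "))
# prints: tres jolie robe, runs small!
```

Whether accents and case should be removed depends on the downstream model; pretrained transformer tokenizers, for example, often handle raw text directly.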

🔎 In summary

Criterion: Evaluation
🧩 Ease of Use: ⭐⭐⭐☆☆ (Requires some preprocessing of textual and numerical data)
🧼 Cleaning Required: ⭐☆☆☆☆ (High: sparsity, noise in text fields)
🏷️ Annotation Richness: ⭐⭐⭐☆☆ (Complete: feedback, ratings, categories, customer metrics)
📜 Commercial License: ✅ Yes (CC BY 4.0)
👨‍💻 Beginner Friendly: 👨‍🎓 Useful for capstone projects with guidance
🔁 Reusable for Fine-Tuning: 🔥 Excellent base for NLP applied to e-commerce
🌍 Cultural Diversity: 🌐 US-focused data, but extendable to other markets

🧠 Recommended for

  • Recommendation system projects
  • E-commerce startups
  • Marketing analysis

🔧 Compatible tools

  • Python (pandas, scikit-learn)
  • TensorFlow
  • Hugging Face
  • LightGBM

💡 Tip

Consider grouping similar products together to smooth out the effects of sparsity before training.
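A minimal sketch of this tip, assuming pandas and hypothetical column names (`item_id`, `category`, `fit`): items with fewer than a chosen number of reviews are pooled under their category, so that fit statistics are computed on larger groups. The threshold of 3 is arbitrary and only for illustration.

```python
import pandas as pd

# Toy rows with hypothetical column names; the real schema may differ.
reviews = pd.DataFrame({
    "item_id":  ["A1", "A1", "A1", "B7", "B8", "C3"],
    "category": ["dresses", "dresses", "dresses", "dresses", "dresses", "tops"],
    "fit":      ["too small", "too small", "perfect", "too big", "too small", "perfect"],
})

MIN_REVIEWS = 3  # arbitrary threshold chosen for illustration

# Number of reviews per item, aligned with the original rows.
item_counts = reviews.groupby("item_id")["fit"].transform("count")

# Keep well-reviewed items as their own group; pool sparse ones by category.
reviews["group_key"] = reviews["item_id"].where(
    item_counts >= MIN_REVIEWS, "category:" + reviews["category"]
)

# Share of each fit label within every group.
fit_share = (
    reviews.groupby("group_key")["fit"]
    .value_counts(normalize=True)
    .unstack(fill_value=0.0)
)
print(fit_share)
```

Here item `A1` keeps its own statistics, while `B7` and `B8` are merged into a single `category:dresses` group. Other notions of similarity, such as brand or embedding distance, could replace the category column.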

Frequently Asked Questions

Does the dataset contain images or only text?

This dataset contains only structured and textual data, without images.

Are the sizes standardized across the dataset?

Yes, the sizes have been converted to a unified numerical scale to facilitate modeling.

Can this dataset be used to create a virtual shopping assistant?

Absolutely, it is well suited to training a recommendation model based on user experience.
