
Chinese Sentiment Analyze

Chinese dataset combining e-commerce product reviews and social media posts (Weibo), suited to automatic sentiment detection (positive, neutral, negative).

Size

Text data in Chinese (reviews + social networks), JSON/CSV format, 182,762 examples
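As a rough illustration of working with the CSV form of the data, the sketch below reads rows with the Python standard library and counts the labels. The column names `text` and `label` and the inline sample rows are assumptions made for the example; the page does not document the actual schema.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for the real file. The column names
# "text" and "label" are assumptions, not the documented schema.
SAMPLE_CSV = """text,label
这个产品很好用,positive
质量太差了,negative
今天天气一般,neutral
物流很快,positive
"""


def load_examples(handle):
    """Read (text, label) pairs from a CSV file-like object, skipping rows with empty text."""
    reader = csv.DictReader(handle)
    return [
        (row["text"].strip(), row["label"].strip())
        for row in reader
        if row["text"].strip()
    ]


examples = load_examples(io.StringIO(SAMPLE_CSV))

# Count how many examples carry each sentiment label.
label_counts = Counter(label for _, label in examples)
print(label_counts)
```

For a file on disk, the same function can be called with `open(path, encoding="utf-8", newline="")`, where `path` points to the downloaded CSV.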

License

MIT

Description

Chinese Sentiment Analyze is a dataset combining two main sources: product reviews (Shopping Reviews) and posts from the Weibo platform. It is designed for sentiment analysis in Chinese, supporting classification into categories such as positive, neutral, or negative.

What is this dataset for?

Can it be enriched or improved?

Yes. The corpus can be extended with other opinion domains (politics, movies, public services), and the sentiment labels can be refined (intensity level, specific emotion). A parallel translation or a segmentation by theme would also increase the dataset's linguistic and practical value.

🔎 In summary

Criterion | Evaluation
🧩 Ease of Use | ⭐⭐⭐☆☆ (Data easy to load via Hugging Face)
🧼 Cleaning Required | ⭐⭐⭐☆☆ (Low: depends on splits, but data generally ready to use)
🏷️ Annotation Richness | ⭐⭐⭐☆☆ (Sentiment labeled: binary or ternary depending on version)
📜 Commercial License | ✅ Yes (MIT)
👨‍💻 Ideal for Beginners | 👩‍💻 Yes, great for getting started with sentiment analysis
🔁 Reusable for Fine-tuning | 🔥 Perfect for fine-tuning a Chinese BERT classifier
🌍 Cultural Diversity | 🌏 Good, data from authentic Chinese platforms

🧠 Recommended for

  • Chinese NLP projects
  • Opinion analysis on social networks
  • Multilingual models

🔧 Compatible tools

  • PyTorch
  • Hugging Face Transformers
  • spaCy
  • FastText

💡 Tip

If you combine this corpus with data in other languages, balance the proportion of each language so that fine-tuning is not biased toward the largest one.
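One simple way to balance a mixed corpus is to downsample every language to the size of the smallest one. The sketch below assumes each example is a `(language, text, label)` tuple; that layout, the function name `balance_by_language`, and the toy data are illustrative assumptions. Downsampling is only one option (upsampling or weighted sampling are alternatives), not a method prescribed by this page.

```python
import random
from collections import defaultdict


def balance_by_language(examples, seed=0):
    """Downsample every language to the size of the smallest one.

    `examples` is a list of (language, text, label) tuples; this layout
    is an assumption made for illustration.
    """
    by_lang = defaultdict(list)
    for example in examples:
        by_lang[example[0]].append(example)

    # The smallest language group sets the per-language quota.
    target = min(len(rows) for rows in by_lang.values())

    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    balanced = []
    for lang in sorted(by_lang):
        balanced.extend(rng.sample(by_lang[lang], target))
    rng.shuffle(balanced)
    return balanced


# Toy corpus: 6 Chinese examples and 2 English examples.
corpus = [("zh", f"评论{i}", "positive") for i in range(6)]
corpus += [("en", f"review {i}", "negative") for i in range(2)]

balanced = balance_by_language(corpus)
print(len(balanced))
```

Downsampling discards data from the larger language, so it fits best when the smaller corpus is still large enough to train on.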

Frequently Asked Questions

How many sentiment labels are available in this dataset?

It depends on the version: some annotations are binary (positive/negative), while others add a neutral class for ternary classification.
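When mixing a binary version with a ternary one, the label sets need to be harmonized first. The sketch below reduces ternary data to binary by dropping neutral examples and then maps labels to integer ids. The label strings (`"positive"`, `"negative"`, `"neutral"`), the id mapping, and the sample rows are assumptions for illustration; the actual label encoding may differ between versions.

```python
# Assumed integer ids for the binary label set (hypothetical mapping).
LABEL_TO_ID = {"negative": 0, "positive": 1}


def to_binary(examples, drop_label="neutral"):
    """Keep only examples whose label is not `drop_label`."""
    return [(text, label) for text, label in examples if label != drop_label]


def encode(examples, label_to_id=LABEL_TO_ID):
    """Replace string labels with integer ids; raises KeyError on unknown labels."""
    return [(text, label_to_id[label]) for text, label in examples]


# Toy ternary data standing in for a ternary version of the dataset.
ternary = [
    ("这个产品很好用", "positive"),
    ("今天天气一般", "neutral"),
    ("质量太差了", "negative"),
]

binary = to_binary(ternary)
encoded = encode(binary)
print(encoded)
```

Dropping neutral examples is the simplest choice. Keeping all three classes and training a ternary classifier is the alternative when the neutral class matters for the application.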

Can this dataset be used for long texts?

Most texts are short to medium length (reviews, posts), but the dataset can be supplemented with longer texts if necessary.

Can it be used to train a commercial model?

Yes, the MIT license permits commercial use, including in distributed products, provided the copyright and license notice is retained.
