Chinese Sentiment Analyze
Chinese dataset combining e-commerce product reviews and social media posts (Weibo), useful for automatic sentiment detection (positive, neutral, negative).
Chinese text data (reviews + social media), JSON/CSV format, 182,762 examples
MIT
Description
Chinese Sentiment Analyze is a dataset combining two main sources: product reviews (Shopping Reviews) and posts from the Weibo platform. It is designed for sentiment analysis in Chinese, allowing texts to be classified into categories such as positive, neutral, or negative.
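The files come in JSON or CSV format. Below is a minimal loading sketch using only the Python standard library. The field names `text` and `label`, the CSV header row, and a JSON file holding a list of objects are all assumptions about the layout, so check the actual files and adjust.

```python
import csv
import json
from pathlib import Path

# Hypothetical field names: adjust to match the actual files.
TEXT_FIELD = "text"
LABEL_FIELD = "label"


def load_records(path):
    """Load (text, label) pairs from a CSV or JSON file.

    Assumes the CSV has a header row and the JSON file holds a list of
    objects; both layouts are assumptions, not documented properties.
    """
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        # utf-8-sig also strips a byte-order mark if one is present.
        with path.open(encoding="utf-8-sig", newline="") as f:
            rows = list(csv.DictReader(f))
    elif suffix == ".json":
        with path.open(encoding="utf-8") as f:
            rows = json.load(f)
    else:
        raise ValueError(f"unsupported file type: {path.suffix}")
    return [(row[TEXT_FIELD], row[LABEL_FIELD]) for row in rows]
```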
What is this dataset for?
- Training NLP models for sentiment classification in Chinese
- Developing opinion analysis tools for commercial or social applications
- Testing the robustness of multilingual models on everyday Chinese text
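For the first use case, a character-bigram Naive Bayes classifier is a common, dependency-free baseline for Chinese, because it needs no word segmentation. The sketch below is a generic technique chosen for illustration, not a method documented for this dataset; `char_ngrams` and `NaiveBayes` are hypothetical names.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Character n-grams; texts shorter than n become a single feature."""
    text = text.strip()
    if len(text) < n:
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.feature_counts[label].update(char_ngrams(text))
        self.vocab = {g for counts in self.feature_counts.values() for g in counts}
        self.totals = {
            label: sum(counts.values())
            for label, counts in self.feature_counts.items()
        }
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each n-gram.
            score = math.log(count / n_docs)
            for g in char_ngrams(text):
                numerator = self.feature_counts[label][g] + 1
                score += math.log(numerator / (self.totals[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

A fine-tuned transformer from the compatible tools listed on this page will typically outperform this baseline; its main value is as a quick sanity check on the labels.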
Can it be enriched or improved?
Yes. The corpus could be extended with opinion data from other domains (politics, movies, public services), or its sentiment labels could be refined (intensity levels, specific emotions). A parallel translation or a topic-based segmentation would also increase the dataset's linguistic and practical value.
🔎 In summary
🧠 Recommended for
- Chinese NLP projects
- Opinion analysis on social networks
- Multilingual models
🔧 Compatible tools
- PyTorch
- Hugging Face Transformers
- spaCy
- fastText
💡 Tip
If you combine this corpus with data in other languages, balance the proportion of each language to avoid language bias during fine-tuning.
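One simple way to balance proportions is to downsample every language to the size of the smallest one. This is only one option (upsampling or weighted sampling are alternatives), and the `(text, label, lang)` record layout and the `balance_by_language` name are hypothetical.

```python
import random
from collections import defaultdict


def balance_by_language(records, seed=0):
    """Downsample so every language contributes the same number of examples.

    `records` is a list of (text, label, lang) tuples; the `lang` tag is a
    hypothetical field you would add when merging corpora.
    """
    if not records:
        return []
    by_lang = defaultdict(list)
    for rec in records:
        by_lang[rec[2]].append(rec)
    target = min(len(group) for group in by_lang.values())
    rng = random.Random(seed)  # fixed seed for a reproducible subset
    balanced = []
    for lang in sorted(by_lang):
        balanced.extend(rng.sample(by_lang[lang], target))
    rng.shuffle(balanced)
    return balanced
```

Downsampling discards data from the larger languages; with a corpus of this size, upsampling the smaller side may be preferable if little non-Chinese data is available.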
Frequently Asked Questions
How many sentiment labels are available in this dataset?
It depends on the version: some annotations are binary (positive/negative), while others include a neutral class for three-class classification.
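Because label schemes differ between versions, it helps to normalize them before training. A minimal sketch, assuming string labels `negative`, `neutral` and `positive` (the actual encoding may differ): the hypothetical `to_binary` drops neutral examples so three-class data can be merged with binary data, and `label_ids` maps labels to integers while rejecting unexpected values.

```python
# Hypothetical label strings: adjust to the encoding used in your copy.
THREE_CLASS = ("negative", "neutral", "positive")


def to_binary(records):
    """Drop neutral examples so three-class data matches the binary subset."""
    return [(text, label) for text, label in records if label != "neutral"]


def label_ids(records, classes=THREE_CLASS):
    """Map string labels to integer ids, failing loudly on unexpected labels."""
    index = {name: i for i, name in enumerate(classes)}
    unknown = {label for _, label in records} - set(index)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    return [(text, index[label]) for text, label in records]
```

For a binary-only setup, pass `classes=("negative", "positive")` so the ids stay contiguous (0 and 1).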
Can this dataset be used for long texts?
Most texts are short to medium length (reviews, posts), but the dataset can be supplemented with longer texts if needed.
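To check how short the texts actually are before choosing a maximum sequence length, a quick percentile summary of character lengths is usually enough. This sketch uses the nearest-rank method, and `length_percentiles` is a hypothetical helper; character counts are only a rough proxy for token counts, so confirm against the tokenizer you actually use.

```python
import math


def length_percentiles(texts, percentiles=(50, 90, 99)):
    """Nearest-rank percentiles of text length, measured in characters."""
    lengths = sorted(len(t) for t in texts)  # characters, not tokens
    if not lengths:
        return {}
    result = {}
    for p in percentiles:
        # Smallest length with at least p% of texts at or below it.
        rank = max(1, math.ceil(p * len(lengths) / 100))
        result[p] = lengths[rank - 1]
    return result
```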
Can it be used to train a business model?
Yes. The MIT license permits commercial use, including in distributed products, provided the original copyright notice and permission notice are retained.