Sentiment Analysis for Mental Health
Text dataset compiling statements from multiple sources (social networks, forums), annotated with 7 mental health states (normal, depression, suicidal, anxiety, stress, bipolar, personality disorder). Intended for training AI models for sentiment analysis and mental health chatbots.
Approximately 51,000 text statements annotated into 7 categories, CSV/JSON format
Open Database License (ODbL) or equivalent free license (please check before using)
Description
The Sentiment Analysis for Mental Health dataset brings together more than 51,000 textual statements from various platforms (Reddit, Twitter, etc.), annotated with 7 categories of mental states. It provides a rich and diverse corpus for studying how psychological disorders are expressed in text, using natural language processing.
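Before training anything, it usually helps to check how the statements are spread across the categories. The sketch below counts labels in a CSV using only the Python standard library. The column names `statement` and `status` and the sample rows are assumptions made for illustration; check the header of the actual file before reusing this.

```python
import csv
import io
from collections import Counter

# Hypothetical rows in an assumed layout; the real file's columns may differ.
SAMPLE_CSV = """statement,status
"I slept well and feel fine today",Normal
"I can't stop worrying about everything",Anxiety
"Nothing feels worth doing anymore",Depression
"Work deadlines are crushing me",Stress
"Had a nice walk with my dog",Normal
"""

def label_distribution(csv_file, label_column="status"):
    """Count how many rows carry each label in an open CSV file."""
    reader = csv.DictReader(csv_file)
    return Counter(row[label_column].strip() for row in reader)

# io.StringIO stands in for open("your_file.csv", newline="", encoding="utf-8").
counts = label_distribution(io.StringIO(SAMPLE_CSV))
for label, n in counts.most_common():
    print(f"{label}: {n}")
```

A strongly skewed distribution is a signal to consider class weighting or resampling before training.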
What is this dataset for?
- Train models for the classification of mental states from text.
- Develop intelligent chatbots for psychological support.
- Conduct emotional analyses to detect mental health trends and crises.
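As an illustration of the first use case, here is a minimal sketch of a text classifier: a multinomial naive Bayes model with Laplace smoothing, written with the standard library only. It is a stand-in chosen for brevity, not the method used by the dataset authors; in practice a scikit-learn pipeline or a fine-tuned Transformer would be the more likely choice. The training statements are made up for the example and are not taken from the dataset.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.label_counts = Counter(labels)      # label -> number of statements
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Tokens never seen in training carry no evidence, so skip them.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)  # smoothed likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Made-up toy statements, for illustration only.
train_texts = [
    "I feel calm and rested today",
    "Had a good lunch with friends",
    "I keep worrying and my heart is racing",
    "I am so nervous I cannot stop worrying",
    "Everything feels empty and hopeless",
    "I feel hopeless and tired of everything",
]
train_labels = ["Normal", "Normal", "Anxiety", "Anxiety", "Depression", "Depression"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("I cannot stop worrying about tomorrow"))  # Anxiety
```

Six statements are far too few to say anything about real performance; the point is only to show the shape of a train-then-predict workflow on labelled statements.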
Can it be enriched or improved?
Yes, it is possible to improve the granularity of annotations, to add contextual metadata, or to extend the corpus with other sources. Data cleaning and bias management are essential for optimal use.
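A minimal cleaning pass might look like the sketch below. It assumes the raw statements may contain URLs, @-mentions, stray whitespace, and near-identical reposts, which is typical of social media text but has not been verified against this dataset. The function names are hypothetical.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"(?<!\w)@\w+")  # @user, but not the middle of an email
WHITESPACE_RE = re.compile(r"\s+")

def clean_statement(text):
    """Remove URLs and @-mentions, then collapse runs of whitespace."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip()

def clean_and_deduplicate(statements):
    """Clean each statement; drop empties and case-insensitive duplicates."""
    seen = set()
    cleaned = []
    for raw in statements:
        text = clean_statement(raw)
        key = text.casefold()
        if text and key not in seen:
            seen.add(key)
            cleaned.append(text)
    return cleaned

# Made-up examples, for illustration only.
raw = [
    "Feeling anxious again   @someone https://example.com/post",
    "feeling anxious again",
    "   ",
    "Could not sleep last night",
]
result = clean_and_deduplicate(raw)
print(result)  # ['Feeling anxious again', 'Could not sleep last night']
```

Deduplicating before splitting into training and test sets also avoids the same statement leaking into both.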
🔎 In summary
🧠 Recommended for
- AI mental health researchers
- Chatbot developers
- Sentiment analysts
🔧 Compatible tools
- Hugging Face Transformers
- spaCy
- Scikit-learn
- Rasa
💡 Tip
Use textual data augmentation methods to improve the robustness of models.
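Two simple augmentation operations are random word deletion and random word swap. The sketch below implements both with the standard library; the function names, default probabilities, and the example sentence are illustrative assumptions, not settings recommended by the dataset authors.

```python
import random

def random_deletion(tokens, p, rng):
    """Drop each token with probability p, always keeping at least one."""
    kept = [t for t in tokens if rng.random() > p]
    return kept if kept else [rng.choice(tokens)]

def random_swap(tokens, n_swaps, rng):
    """Swap two randomly chosen positions, n_swaps times."""
    tokens = list(tokens)
    if len(tokens) < 2:
        return tokens
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

def augment(statement, n_variants=3, p_delete=0.1, n_swaps=1, seed=0):
    """Return n_variants noisy copies of a statement (reproducible via seed)."""
    rng = random.Random(seed)
    tokens = statement.split()
    if not tokens:
        return []
    variants = []
    for _ in range(n_variants):
        noisy = random_deletion(tokens, p_delete, rng)
        noisy = random_swap(noisy, n_swaps, rng)
        variants.append(" ".join(noisy))
    return variants

# Made-up example statement, for illustration only.
original = "I have been feeling restless all week"
variants = augment(original)
for v in variants:
    print(v)
```

Each variant keeps the label of the original statement, so the deletion rate should stay low: removing a single word such as "not" can reverse the meaning, which matters more than usual when the labels describe mental health states.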
Frequently Asked Questions
Does this dataset make it possible to detect suicide risks automatically?
Yes, it includes a specific “Suicidal” category to model the early detection of risks.
⚠️ Important Disclaimer: While this dataset can be used to help identify potential early warning signs of self-harm or suicidal ideation, it is not a substitute for professional evaluation or emergency services. Outputs of models trained on it are experimental and cannot guarantee accurate or comprehensive detection of all risks. If you or someone you know may be at risk, please seek immediate help from qualified mental health professionals or emergency services rather than relying on a model alone.
Does the diversity of sources impact the quality of the data?
Yes, the variety of platforms requires thorough cleaning to avoid biases related to specific contexts.
Is this dataset suitable for commercial use?
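The Open Database License (ODbL) allows commercial use, but it comes with conditions: you must attribute the source, and any adapted database you distribute must be shared under the same license. Since the applicable license may be ODbL or an equivalent, check the exact terms for your project before using the data.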