Twitter Sentiment Analysis Dataset
The Twitter Sentiment Analysis dataset is widely used in NLP for opinion analysis tasks. It contains over a million tweets, each annotated according to its emotional tone: positive, negative, or neutral.
Approximately 1.6 million annotated tweets, in CSV format
Use is subject to the Twitter API terms of use. Verification is required for commercial use.
Description
The Twitter Sentiment dataset includes:
- 1.6 million annotated English-language tweets
- Three classes: positive, negative, neutral
- A CSV format that can be easily used in NLP pipelines
- Optional metadata (depending on the version): ID, date, username, etc.
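Because the data is distributed as CSV, a typical first step is to load it and check the class balance. The following is a minimal sketch using only Python's standard library. It assumes a header row with columns named `text` and `label`, where the label is stored as a class name string; these column names are hypothetical, since the actual layout depends on the version you download (some distributions may have no header or may encode labels numerically).

```python
import csv
import io
from collections import Counter

# Hypothetical column names: adjust them to match the header of your copy.
TEXT_COL = "text"
LABEL_COL = "label"
VALID_LABELS = {"positive", "negative", "neutral"}


def load_tweets(csv_file):
    """Read (text, label) pairs from a CSV file object.

    Labels are lowercased; rows whose label is not one of the three
    expected classes are skipped.
    """
    rows = []
    for row in csv.DictReader(csv_file):
        label = row[LABEL_COL].strip().lower()
        if label in VALID_LABELS:
            rows.append((row[TEXT_COL], label))
    return rows


def class_distribution(rows):
    """Count how many tweets fall into each class."""
    return Counter(label for _, label in rows)


if __name__ == "__main__":
    # Small in-memory sample standing in for the real file.
    sample = io.StringIO(
        "id,text,label\n"
        '1,"I love this phone!",positive\n'
        '2,"Worst service ever",negative\n'
        '3,"Meeting at 3pm",neutral\n'
        '4,"Great game tonight",Positive\n'
    )
    tweets = load_tweets(sample)
    print(class_distribution(tweets))
```

On a real file, open it with `open("tweets.csv", newline="", encoding="utf-8")` (the file name and encoding are assumptions) and pass the file object to `load_tweets`.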
What is this dataset for?
This dataset is commonly used to:
- Train sentiment classification models for short texts
- Analyze trends and opinions on social networks
- Monitor online reputation (brand monitoring)
- Improve moderation systems, recommendation systems, or opinion summarization
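As an illustration of the first use case, here is a minimal baseline classifier: a multinomial Naive Bayes with add-one (Laplace) smoothing, written in plain Python. This is one simple technique among many and is not a method prescribed by the dataset; the tokenizer and the tiny training set are illustrative assumptions.

```python
import math
import re
from collections import Counter, defaultdict

# Words (with apostrophes), plus @mentions and #hashtags, after lowercasing.
TOKEN_RE = re.compile(r"[a-z']+|[@#]\w+")


def tokenize(text):
    """Lowercase a tweet and split it into word, mention and hashtag tokens."""
    return TOKEN_RE.findall(text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        self.n_docs = len(labels)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(count / self.n_docs)
            for tok in tokens:
                if tok not in self.vocab:
                    continue  # ignore tokens never seen during training
                num = self.word_counts[label][tok] + 1
                den = self.total_words[label] + v
                score += math.log(num / den)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    texts = ["I love this", "great day", "I hate this",
             "awful day", "meeting at noon", "the bus leaves at noon"]
    labels = ["positive", "positive", "negative",
              "negative", "neutral", "neutral"]
    model = NaiveBayes().fit(texts, labels)
    print(model.predict("love it"))
```

Naive Bayes is a common first baseline for short texts because it trains in a single pass over the data; stronger results usually come from fine-tuned transformer models.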
Can it be enriched or improved?
Yes. Despite its already large size, this dataset can be enriched in several ways:
- Addition of emotional subcategories (joy, anger, surprise, etc.)
- Integration of contextual data (hashtags, emojis, images)
- Creation of thematic filters (politics, sports, health, etc.)
- Translation or adaptation for multilingual analyses
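Several of these enrichments can start from simple pattern matching on the tweet text. The sketch below extracts hashtags and emojis and assigns coarse themes by keyword lookup. The emoji pattern only covers the main pictographic Unicode blocks, so it is an approximation, and the `THEMES` keyword lists are hypothetical placeholders that a real project would curate (or replace with a topic classifier).

```python
import re

HASHTAG_RE = re.compile(r"#(\w+)")
# Approximate emoji matcher: main pictograph blocks plus misc symbols/dingbats.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")

# Hypothetical keyword lists for thematic filtering.
THEMES = {
    "sport": {"match", "goal", "football", "nba", "game"},
    "health": {"doctor", "vaccine", "flu", "hospital"},
    "politics": {"election", "vote", "senate", "president"},
}


def enrich(tweet):
    """Return the hashtags, emojis and matched themes of a single tweet."""
    words = set(re.findall(r"\w+", tweet.lower()))
    return {
        "hashtags": [h.lower() for h in HASHTAG_RE.findall(tweet)],
        "emojis": EMOJI_RE.findall(tweet),
        # A theme matches if any of its keywords appears as a word.
        "themes": sorted(t for t, kw in THEMES.items() if words & kw),
    }


if __name__ == "__main__":
    print(enrich("What a goal! #WorldCup \u26bd\U0001F525"))
```

Keyword filters are cheap and transparent but miss synonyms and context; they are best treated as a first pass before manual review or a learned topic model.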
🔗 Source: Twitter Sentiment Dataset
Frequently Asked Questions
Are the tweets in the dataset still available?
Not necessarily. Some tweets may since have been deleted or made private, so it is advisable to check their availability before use.
Can this dataset be used in a commercial context?
It depends on the Twitter API terms of use. Consult the platform's policy before any commercial use.
Are there newer alternatives?
Yes. Other datasets, such as TweetEval or Sentiment140, offer variants for similar uses, some of them enriched or more recent.