Twitter Sentiment Analysis Dataset

The Twitter Sentiment Analysis dataset is widely used in NLP for opinion analysis tasks. It contains approximately 1.6 million tweets annotated by sentiment: positive, negative, or neutral.

Size

Approximately 1.6 million annotated tweets, in CSV format

Licence

Use is subject to the Twitter API terms of use. Check those terms before any commercial use.

Description

The Twitter Sentiment dataset includes:

1.6 million English-language tweets with sentiment annotations
Three classes: positive, negative, neutral
A CSV format that is easy to load into NLP pipelines
Optional metadata (depending on the version): ID, date, username, etc.
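
As a minimal sketch of how the CSV might be loaded, the example below reads text and label pairs with Python's standard csv module and counts the classes. The column names `text` and `sentiment`, the helper `load_tweets`, and the three sample rows are assumptions made for illustration; the actual header and columns depend on the version you download.

```python
import csv
import io
from collections import Counter

# Hypothetical column names; check the header of the CSV you actually download.
TEXT_COL = "text"
LABEL_COL = "sentiment"


def load_tweets(handle):
    """Read (text, label) pairs from an open CSV file that has a header row."""
    reader = csv.DictReader(handle)
    return [(row[TEXT_COL], row[LABEL_COL]) for row in reader]


# Tiny in-memory stand-in for the real file (invented rows, not dataset content).
sample = io.StringIO(
    "id,sentiment,text\n"
    '1,positive,"Loving the new update, great work!"\n'
    "2,negative,Worst service ever\n"
    "3,neutral,Meeting moved to 3pm\n"
)

tweets = load_tweets(sample)
label_counts = Counter(label for _, label in tweets)
print(label_counts)
```

For the real file, the same function could be called on a handle opened with `open(path, newline="", encoding=...)`, where the right encoding is something to verify against the downloaded file.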

What is this dataset for?

This dataset is commonly used for:

Training sentiment classification models on short texts
Analyzing trends and opinions on social networks
Monitoring online reputation (brand monitoring)
Improving moderation systems, recommendations, or opinion summaries
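
To make the first use case concrete, here is a rough sketch of a three-class text classifier written in plain Python. It uses multinomial Naive Bayes with add-one smoothing, which is one common baseline and not a method prescribed by the dataset. The function names (`tokenize`, `train`, `predict`) and the toy examples are invented for illustration; a real experiment would train on the dataset itself, typically with an established library.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(examples):
    """Collect the counts needed for multinomial Naive Bayes from (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> token -> count
    label_counts = Counter()            # label -> number of examples
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocab.add(token)
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability (add-one smoothing)."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n in label_counts.items():
        score = math.log(n / total)  # class prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in tokenize(text):
            if token in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy examples, not rows from the dataset.
toy = [
    ("I love this phone", "positive"),
    ("great day, so happy", "positive"),
    ("I hate waiting in line", "negative"),
    ("terrible service, so angry", "negative"),
    ("the store opens at nine", "neutral"),
    ("meeting is on monday", "neutral"),
]
model = train(toy)
prediction = predict(model, "so happy with this great phone")
print(prediction)
```

With six training sentences the model is only a demonstration of the mechanics; tweets contain slang, misspellings, and sarcasm, which a bag-of-words baseline like this handles poorly.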

Can it be enriched or improved?

Yes. Although it is already large, this dataset can be enriched by:

Adding emotional subcategories (joy, anger, surprise, etc.)
Integrating contextual data (hashtags, emojis, images)
Creating thematic filters (politics, sport, health, etc.)
Translating or adapting it for multilingual analyses
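
As an illustration of the contextual-data idea, the sketch below pulls hashtags, mentions, and emoji-like symbols out of a tweet's text using only the standard library. The function name `context_features` and the sample tweet are invented for this example. Treating every character in the Unicode "So" (Symbol, other) category as an emoji is a rough heuristic, not a complete emoji detector.

```python
import re
import unicodedata

HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@(\w+)")


def context_features(text):
    """Extract simple contextual signals from the raw text of a tweet."""
    return {
        # Hashtags are lowercased so that #Football and #football match.
        "hashtags": [tag.lower() for tag in HASHTAG_RE.findall(text)],
        "mentions": MENTION_RE.findall(text),
        # Heuristic: most emoji fall in the Unicode category "So".
        "symbols": [ch for ch in text if unicodedata.category(ch) == "So"],
    }


# Invented sample tweet, not taken from the dataset.
features = context_features("Great match today! #Football #win @coach 🎉")
print(features)
```

The extracted hashtags could also feed the thematic filters mentioned above, for example by mapping a hand-picked set of tags to a topic such as sport or politics.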

🔗 Source: Twitter Sentiment Dataset

Frequently Asked Questions

Are the tweets in the dataset still available?

Not necessarily. Some may have been deleted or made private since the dataset was compiled, so check their availability before use.

Can this dataset be used in a commercial context?

That depends on the Twitter API terms of use. Consult the platform's policy before any commercial use.

Are there newer alternatives?

Yes, other datasets like TweetEval or Sentiment140 offer variants, sometimes enriched or more recent, for similar uses.
