Video

UCF101

UCF101 is a publicly available dataset and a standard reference in video analysis. It contains more than 13,000 clips of human actions such as running, jumping, cooking or playing sports. It is one of the most widely used benchmarks for training and evaluating action recognition models.

Size

13,320 videos in 101 human action categories, AVI format

Licence

Free for academic use, licensed under Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0)

Description


The dataset contains:

  • 13,320 short videos (about 7 seconds on average)
  • 101 action classes (sports, daily actions, social interactions...)
  • Videos collected from YouTube, with realistic, uncontrolled backgrounds
  • 25 groups per class, used to build standardized training/testing splits
  • Video data in AVI format, 320×240 pixels at 25 fps

Each video shows a single main action, making the supervised classification task easier.
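
The group structure matters when building custom splits: clips from the same group should stay on the same side of the split. The sketch below assumes the commonly distributed file naming pattern `v_<ClassName>_g<group>_c<clip>.avi`. The helper names and the held-out groups are hypothetical, and the official split lists should be preferred for published results.

```python
import re
from typing import NamedTuple

# Assumed naming pattern: v_<ClassName>_g<group>_c<clip>.avi
CLIP_NAME = re.compile(
    r"^v_(?P<label>[A-Za-z0-9]+)_g(?P<group>\d{2})_c(?P<clip>\d{2})\.avi$"
)


class ClipInfo(NamedTuple):
    label: str   # action class, e.g. "ApplyEyeMakeup"
    group: int   # group index, 1 to 25
    clip: int    # clip index within the group


def parse_clip_name(filename: str) -> ClipInfo:
    """Extract class label, group and clip index from a clip file name."""
    match = CLIP_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected clip name: {filename!r}")
    return ClipInfo(match["label"], int(match["group"]), int(match["clip"]))


def split_by_group(filenames, test_groups):
    """Split clips so that each group lands entirely in train or in test."""
    train, test = [], []
    for name in filenames:
        target = test if parse_clip_name(name).group in test_groups else train
        target.append(name)
    return train, test


clips = [
    "v_ApplyEyeMakeup_g01_c01.avi",
    "v_ApplyEyeMakeup_g01_c02.avi",
    "v_ApplyEyeMakeup_g08_c01.avi",
]
# Hypothetical choice of held-out groups, for illustration only.
train_clips, test_clips = split_by_group(clips, test_groups={1})
```

Splitting by group rather than by individual clip keeps closely related clips from appearing in both the training and the test set.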

What is this dataset for?


UCF101 is used for:

  • Training human action recognition models (3D CNNs, RNNs, video transformers)
  • Validating embedded vision systems (robots, security cameras, etc.)
  • Pre-training video models that are later used for event detection
  • Research on spatiotemporal architectures (SlowFast, TimeSformer, VideoMAE)
  • Behavioral analysis in consumer or surveillance contexts
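
Video models generally take a fixed number of frames per clip, while a clip of about 7 seconds at 25 fps holds roughly 175 frames. A minimal sketch of uniform frame sampling follows. The function name and the choice of 16 frames are assumptions for illustration, not a setting prescribed by the dataset.

```python
def sample_frame_indices(num_frames: int, num_samples: int) -> list[int]:
    """Return num_samples frame indices spread evenly over the clip.

    The clip is cut into num_samples equal segments and the frame at the
    center of each segment is selected.
    """
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    segment = num_frames / num_samples
    return [
        min(num_frames - 1, int(segment * (i + 0.5)))
        for i in range(num_samples)
    ]


# About 7 seconds at 25 fps, reduced to 16 frames (illustrative values).
indices = sample_frame_indices(num_frames=7 * 25, num_samples=16)
```

If a clip has fewer frames than requested, some indices repeat, which pads short clips without extra handling.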

Can it be enriched or improved?


Yes, in particular via:

  • Adding finer annotations (multiple actions per clip, exact temporal boundaries)
  • Converting to HDF5 or TFRecord to speed up data loading
  • Training temporal segmentation or multi-label detection models
  • Cross-referencing with audio or text data for multimodal approaches
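
As an illustration of what finer annotations could look like, the sketch below stores several labeled segments per clip with frame-level boundaries. The schema, class names and labels are hypothetical and not part of UCF101. Only the frame rate of 25 fps comes from the dataset description.

```python
from dataclasses import dataclass, field

FPS = 25  # frame rate given in the dataset description


@dataclass(frozen=True)
class ActionSegment:
    label: str
    start_frame: int  # inclusive
    end_frame: int    # exclusive

    def __post_init__(self):
        if not 0 <= self.start_frame < self.end_frame:
            raise ValueError("expected 0 <= start_frame < end_frame")

    @property
    def duration_seconds(self) -> float:
        return (self.end_frame - self.start_frame) / FPS


@dataclass
class ClipAnnotation:
    filename: str
    segments: list[ActionSegment] = field(default_factory=list)

    def labels(self) -> set[str]:
        """All labels present in the clip (multi-label target)."""
        return {s.label for s in self.segments}

    def labels_at(self, frame: int) -> set[str]:
        """Labels active at a given frame (temporal segmentation target)."""
        return {
            s.label
            for s in self.segments
            if s.start_frame <= frame < s.end_frame
        }


# Hypothetical annotation with two overlapping actions.
annotation = ClipAnnotation(
    filename="v_Example_g01_c01.avi",
    segments=[
        ActionSegment("walking", 0, 100),
        ActionSegment("waving", 50, 175),
    ],
)
```

The same record serves both enrichment goals: `labels()` gives a multi-label target per clip, and `labels_at()` gives a per-frame target for temporal segmentation.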

🔗 Source: UCF101 Dataset (official)

Frequently Asked Questions

Does UCF101 contain sound?

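Only partly. According to the original UCF101 paper, audio is preserved for 51 of the 101 action classes, and the remaining clips are silent. For audio-visual work, combining it with a dataset that carries audio more consistently, such as Kinetics, is recommended.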

Is the dataset suitable for real-time detection?

Partially. The clips are short and trimmed to a single action, which suits classification. Real-time detection in continuous streams requires adaptations, or a dataset of untrimmed videos such as ActivityNet.

Is there a newer or extended version?

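Not an official one. UCF101 is itself an extension of the earlier UCF50 dataset. Among related benchmarks, HMDB51 is smaller and more difficult (fewer examples, more noise), and Kinetics-600/700 offers a much larger volume for similar tasks.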
