CameraBench
The CameraBench dataset aims to better understand camera movements in videos. It includes more than 1,000 manually annotated clips, making it possible to evaluate the performance of generative and discriminative models on multimodal video understanding.
Description
CameraBench is a collection of annotated videos for studying camera movements and evaluating multimodal models. Each clip carries expert labels and captions used to test how well different models capture both the geometry and the semantics of camera motion.
What is this dataset for?
- Evaluate the performance of vision-language models (VLMs) on video perception tasks (see the loading sketch after this list)
- Analyze and understand camera movements in video sequences
- Facilitate the fine-tuning and improvement of multimodal models for video tasks
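A minimal loading sketch using the Hugging Face `datasets` library. The repository id and the column names are assumptions; check the dataset card for the actual values before running.

```python
from datasets import load_dataset

# Hypothetical repository id and split name; replace with the values
# listed on the dataset card.
ds = load_dataset("CameraBench", split="test")

# Each row is expected to carry a video reference plus expert
# camera-motion labels and a caption.
example = ds[0]
print(example.keys())
print(example)
```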
Can it be enriched or improved?
Yes, the dataset can be enriched with additional annotations or new video clips to increase the diversity of movements and contexts; a merge sketch is shown below. Fine-tuning specific models on it is encouraged to improve multimodal perception.
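A minimal sketch of how extra clips could be appended to an existing split with `datasets.concatenate_datasets`. The repository id and the column names (`video`, `label`, `caption`) are placeholders and must match the dataset's real schema.

```python
from datasets import Dataset, concatenate_datasets, load_dataset

# Hypothetical repo id and schema; the new rows must use the same
# features as the base split for concatenation to succeed.
base = load_dataset("CameraBench", split="test")

new_clips = Dataset.from_dict({
    "video": ["clips/pan_left_001.mp4"],
    "label": ["pan-left"],
    "caption": ["The camera pans left across a crowded street."],
})

enriched = concatenate_datasets([base, new_clips])
print(len(base), "->", len(enriched))
```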
🔎 In summary
🧠 Recommended for
- Video vision researchers
- VLM developers
- Video comprehension projects
🔧 Compatible tools
- PyTorch
- TensorFlow
- Multimodal frameworks
- Video annotation notebooks
💡 Tip
Use this dataset to compare the performance of generative and discriminative models on visual perception.
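As a sketch of that comparison, the snippet below scores two hypothetical prediction lists (one from a generative VLM, one from a discriminative classifier) against ground-truth motion labels. The label values and prediction sources are placeholders; in practice they would come from the dataset and from your models.

```python
def accuracy(predictions, references):
    """Fraction of clips whose predicted camera-motion label matches the reference."""
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)

# Placeholder labels and predictions for illustration only.
references = ["pan-left", "zoom-in", "static", "tilt-up"]
generative_preds = ["pan-left", "zoom-out", "static", "tilt-up"]
discriminative_preds = ["pan-left", "zoom-in", "static", "pan-right"]

print("generative accuracy:", accuracy(generative_preds, references))
print("discriminative accuracy:", accuracy(discriminative_preds, references))
```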
Frequently Asked Questions
What is the size of the CameraBench dataset?
The dataset includes approximately 1,071 annotated video clips, and the annotation files are very lightweight (about 87 KB in Parquet format).
What type of tasks can you evaluate with CameraBench?
Mainly camera-movement analysis and evaluation of multimodal models' capabilities on video.
What license does this dataset cover?
The dataset is released under the MIT license, which permits free use, including commercial use.



