Ego4D Video — Embodied planning dataset
Dataset derived from Ego4D containing first-person videos associated with natural language instructions generated automatically and then verified manually. It is designed for embodied planning and multi-modal reasoning tasks.
Hundreds of hours of egocentric videos + text instructions, video formats + JSON
Apache 2.0
Description
Ego4D Video is a multimodal dataset combining egocentric (first-person) videos with detailed step-by-step instructions. It is built from the Ego4D dataset by selecting relevant sequences and enriching them with language descriptions that are generated automatically and then verified by humans. It is intended for training embodied planning, navigation, or comprehension models in real-world settings.
What is this dataset for?
- Train vision-language models to follow instructions in complex environments
- Test multimodal reasoning skills through embodied planning
- Develop autonomous agents capable of interacting with the real world by following instructions
Can it be enriched or improved?
Yes, it is possible to add new videos, expand the types of tasks represented, or include additional annotations (objects, actions, locations). The structure also allows the addition of multilingual translations or user feedback to refine the instructions.
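As a rough illustration of this kind of enrichment, the sketch below attaches object labels and a translation to a single instruction step. The record layout and the field names (`instruction`, `start_s`, `end_s`, `objects`, `translations`) are assumptions made for this example, not the dataset's documented schema.

```python
import copy

def enrich_step(step, objects=None, translations=None):
    """Return a copy of a step with extra annotations; the input is left untouched."""
    enriched = copy.deepcopy(step)
    if objects:
        # Hypothetical field: list of object labels visible in the step.
        enriched.setdefault("objects", []).extend(objects)
    if translations:
        # Hypothetical field: language code -> translated instruction.
        enriched.setdefault("translations", {}).update(translations)
    return enriched

# Hypothetical step record (field names are assumptions).
step = {"instruction": "Slice the tomato", "start_s": 4.0, "end_s": 9.5}

enriched = enrich_step(
    step,
    objects=["knife", "tomato"],
    translations={"fr": "Coupez la tomate en tranches"},
)
print(enriched)
```

Copying the record before modifying it keeps the verified original instruction separate from the added annotations, which helps if the additions need their own review.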
🔎 In summary
🧠 Recommended for
- Robotics researchers
- AI planning
- Embodied VLMs
🔧 Compatible tools
- PyTorch
- OpenCV
- Hugging Face Datasets
- CLIP
- VideoMAE
💡 Tip
Use the video-instruction correspondence to train a step-by-step planning model with fine-grained supervision.
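One possible way to turn that correspondence into training samples is sketched below: each instruction step becomes a frame range paired with its instruction and the instruction that follows it. The JSON layout, the field names (`video_id`, `fps`, `steps`, `start_s`, `end_s`, `instruction`), and the example values are all hypothetical; adapt them to the actual annotation files.

```python
import json

# Hypothetical annotation record; field names and values are assumptions,
# not the dataset's actual schema.
EXAMPLE_JSON = """
{
  "video_id": "clip_0001",
  "fps": 30,
  "steps": [
    {"start_s": 0.0, "end_s": 4.0, "instruction": "Pick up the knife"},
    {"start_s": 4.0, "end_s": 9.5, "instruction": "Slice the tomato"},
    {"start_s": 9.5, "end_s": 12.0, "instruction": "Put the slices on the plate"}
  ]
}
"""

def build_step_samples(record):
    """Convert one annotated clip into per-step supervision samples."""
    fps = record["fps"]
    # Sort by start time so "next instruction" follows the order of the video.
    steps = sorted(record["steps"], key=lambda s: s["start_s"])
    samples = []
    for i, step in enumerate(steps):
        has_next = i + 1 < len(steps)
        samples.append({
            "video_id": record["video_id"],
            # Convert seconds to inclusive frame indices.
            "first_frame": round(step["start_s"] * fps),
            "last_frame": round(step["end_s"] * fps) - 1,
            "instruction": step["instruction"],
            # Planning target: the step that follows, or None at the end.
            "next_instruction": steps[i + 1]["instruction"] if has_next else None,
        })
    return samples

record = json.loads(EXAMPLE_JSON)
samples = build_step_samples(record)
for sample in samples:
    print(sample)
```

The frame ranges can then be decoded with a video library such as OpenCV and fed to a video encoder, while `next_instruction` serves as the per-step planning target.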
Frequently Asked Questions
What is the difference between the original Ego4D and this dataset?
This dataset selects specific segments of Ego4D and enriches them with detailed and validated language instructions.
Can this dataset be used for autonomous navigation?
Yes, it is particularly suited to embodied navigation and instruction-following tasks in real-world settings.
Do you need advanced skills to use it?
A good command of video processing and multimodal models is recommended to use it effectively.