Optimizing autonomous vehicle perception through video annotation

Key metrics tracked in this project: precision in detecting pedestrians and moving objects, ADAS algorithm calibration time, and the volume of annotated data ready for training each day.
In the automotive industry, the race toward fully autonomous vehicles is one of the most ambitious technological challenges of our time. For a car to navigate safely without human intervention, it must be able to perceive and interpret its environment in real time. Every vehicle, pedestrian, traffic light, and road sign becomes a piece of critical information that must be detected, classified, and acted upon instantly.
Behind this capability lies not only advanced algorithms, but also enormous volumes of annotated video data. Perception systems in autonomous driving cannot function reliably without training on datasets that reflect the complexity of real-world traffic: changing weather, varying lighting, occlusions, and unpredictable human behavior. The quality of these datasets often makes the difference between a system that functions in the lab and one that performs safely on the road.
The Mission
The primary objective of Innovatiana’s project was to create a training dataset for the detection and classification of road objects—from cars and trucks to pedestrians, cyclists, traffic lights, and road signs—using continuous video streams captured in real driving conditions. Unlike static image datasets, videos offer the advantage of contextual understanding and motion tracking, but they also present additional annotation challenges.
To address these, the mission was structured around two key pillars:
- Frame-by-frame annotation with bounding boxes and polygons
Each object appearing in the video sequences had to be annotated individually, frame by frame. Bounding boxes were used for efficiency, while polygonal annotations were applied in cases requiring fine-grained accuracy (for example, irregular shapes like pedestrians in motion, cyclists with bikes, or complex traffic signs). This level of detail ensures that perception algorithms learn not only to recognize objects, but also to understand their contours and interactions.
- Rigorous quality control for temporal and spatial consistency
Annotating video sequences introduces unique challenges: an object must be tracked consistently across multiple frames, even if it partially disappears due to occlusion or changes in perspective. Innovatiana deployed a multi-step quality control process to ensure annotations remained temporally coherent (the same object kept the same ID throughout the video) and spatially precise (bounding boxes aligned accurately with object edges at every frame). This consistency is essential for training robust tracking and detection systems; a simplified sketch of such a check follows this list.
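To make these two pillars concrete, here is a minimal Python sketch of how per-frame annotations with persistent track IDs might be represented, together with a simple automated check for temporal coherence. The `FrameAnnotation` fields, the `0.3` IoU threshold, and the check logic are illustrative assumptions, not Innovatiana's actual schema or tooling.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FrameAnnotation:
    frame: int                      # index of the frame in the video sequence
    track_id: int                   # persistent ID: same physical object across frames
    label: str                      # e.g. "pedestrian", "cyclist", "traffic_light"
    box: tuple                      # (x_min, y_min, x_max, y_max) in pixels
    polygon: Optional[list] = None  # optional contour points for fine-grained shapes

def iou(a, b):
    """Intersection-over-union of two (x_min, y_min, x_max, y_max) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def temporal_consistency_issues(annotations, min_iou=0.3):
    """Flag tracks whose label changes, or whose box jumps implausibly
    between consecutive frames. Frame gaps are skipped, since they may
    be legitimate occlusions."""
    by_track = {}
    for ann in annotations:
        by_track.setdefault(ann.track_id, []).append(ann)
    issues = []
    for track_id, track in by_track.items():
        track.sort(key=lambda a: a.frame)
        for prev, curr in zip(track, track[1:]):
            if curr.label != prev.label:
                issues.append((track_id, curr.frame, "label changed"))
            # Only compare adjacent frames; a low IoU there suggests a
            # misaligned box or a track ID reassigned to another object.
            elif curr.frame == prev.frame + 1 and iou(prev.box, curr.box) < min_iou:
                issues.append((track_id, curr.frame, "box jump"))
    return issues
```

Checks like this cannot replace human review, but they surface the most common video-annotation failures (ID swaps and drifting boxes) cheaply, so reviewers can focus on the flagged frames.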
Innovatiana’s Approach
To execute this mission, Innovatiana mobilized a specialized team of annotators with expertise in computer vision and traffic scene understanding. Annotators received domain-specific training to recognize not only obvious categories like cars and pedestrians but also subtle elements such as partially hidden signs, damaged road markings, or traffic lights seen from oblique angles.
The process was supported by a custom annotation workflow tailored for large-scale video data:
- Automated pre-labeling was introduced using baseline object detection models, which provided initial bounding boxes. Annotators then refined these suggestions, significantly accelerating throughput while maintaining accuracy (see the first sketch after this list).
- Cross-validation between annotators ensured inter-annotator agreement, reducing subjectivity in ambiguous cases (e.g., when deciding whether a distant blurred object was a pedestrian or a lamppost); one way to quantify such agreement is sketched below.
- Systematic audits were built into the workflow, with random sampling of annotated frames subjected to secondary review, ensuring error detection and correction at scale (see the final sketch below).
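As an illustration of the pre-labeling step, the sketch below uses an off-the-shelf torchvision detector to propose candidate boxes that annotators then refine. The choice of Faster R-CNN and the `0.5` confidence threshold are assumptions made for the example; the source does not say which baseline models Innovatiana actually used.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms.functional import to_tensor

# Generic pretrained detector: it only proposes boxes; every suggestion
# is refined or rejected by a human annotator afterwards.
model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

@torch.no_grad()
def pre_label(frame_image, score_threshold=0.5):
    """Return candidate boxes, labels, and scores for one frame (PIL image)."""
    prediction = model([to_tensor(frame_image)])[0]
    keep = prediction["scores"] >= score_threshold
    return {
        "boxes": prediction["boxes"][keep].tolist(),    # [x_min, y_min, x_max, y_max]
        "labels": prediction["labels"][keep].tolist(),  # COCO category indices
        "scores": prediction["scores"][keep].tolist(),
    }
```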
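For the cross-validation step, agreement between two annotators on the same frame can be quantified by greedily matching their boxes by label and IoU. This is a simplified sketch, reusing the `iou` helper from the temporal-consistency example above; the `(label, box)` tuple format and the `0.5` matching threshold are assumptions.

```python
def agreement_rate(annots_a, annots_b, iou_threshold=0.5):
    """Fraction of annotator A's boxes that annotator B matched with a
    same-label box at IoU >= threshold. Each annotation is a
    (label, (x_min, y_min, x_max, y_max)) tuple; iou() is defined in
    the temporal-consistency sketch earlier in this article."""
    unmatched_b = list(annots_b)
    matched = 0
    for label_a, box_a in annots_a:
        # Greedy match: best remaining same-label box from annotator B.
        candidates = [b for b in unmatched_b if b[0] == label_a]
        best = max(candidates, key=lambda b: iou(box_a, b[1]), default=None)
        if best is not None and iou(box_a, best[1]) >= iou_threshold:
            matched += 1
            unmatched_b.remove(best)
    return matched / len(annots_a) if annots_a else 1.0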
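Finally, the systematic-audit step can be driven by a reproducible random sample of annotated frames routed to a secondary reviewer. The 5% sampling rate and fixed seed below are illustrative choices, not figures from the project.

```python
import random

def sample_frames_for_audit(frame_ids, rate=0.05, seed=42):
    """Draw a reproducible random sample of annotated frames for
    secondary review; rate and seed are illustrative assumptions."""
    if not frame_ids:
        return []
    rng = random.Random(seed)
    k = max(1, round(len(frame_ids) * rate))
    return rng.sample(sorted(frame_ids), k)
```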
This hybrid approach, combining human expertise with semi-automated tools, struck a balance between efficiency and precision.
👉 Read our article on ADAS annotation: learn how accurate video annotation enhances the intelligence of autonomous vehicles.