
Dataset for pedestrian detection: the best resources to train your AI

Written by Aïcha

Published on 2025-08-22

Pedestrian detection is a major challenge in artificial intelligence, especially in urban surveillance, autonomous vehicles and accident prevention. Computer vision models that can accurately identify pedestrians are trained on dedicated collections of data, known as datasets.

A pedestrian detection dataset is a collection of annotated images used to train and evaluate machine learning algorithms. This data is essential for teaching models to recognize human figures in different environments and under varied conditions (lighting, weather, pedestrian density), typically for safety and security use cases that must, of course, respect personal data.

In this article, we look at the best datasets available for training an AI to detect pedestrians accurately. We also explain how these datasets are structured and how to use them to improve the performance of your image recognition models.

Data and annotations: the key to a powerful dataset

The effectiveness of a pedestrian detection model depends directly on the quality of the data used for its training. A well-structured, accurately annotated dataset that covers a wide range of situations improves the reliability of the algorithm. Conversely, incomplete or incorrectly annotated data can cause detection errors and compromise model performance.

Why is data quality critical in AI?

A pedestrian detection model relies on accurately annotated data. Poor annotations or an unbalanced dataset cause recognition errors, increasing both false positives and missed detections. An effective dataset should include varied data covering different pedestrian poses, weather conditions and environments (urban areas, roads, sidewalks, pedestrian crossings).
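To make false positives and missed detections concrete, here is a minimal Python sketch that compares predicted boxes with annotated boxes using intersection over union (IoU). The function names, the `(x_min, y_min, x_max, y_max)` box layout and the 0.5 threshold are assumptions chosen for illustration, not part of any particular dataset's tooling.

```python
from typing import List, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def count_errors(predictions: List[Box], ground_truth: List[Box],
                 threshold: float = 0.5) -> Tuple[int, int, int]:
    """Greedily match each prediction to its best remaining annotation.

    Returns (true_positives, false_positives, missed_pedestrians).
    """
    unmatched = list(ground_truth)
    true_positives = 0
    for pred in predictions:
        best = max(unmatched, key=lambda gt: iou(pred, gt), default=None)
        if best is not None and iou(pred, best) >= threshold:
            unmatched.remove(best)  # each annotation can be matched only once
            true_positives += 1
    false_positives = len(predictions) - true_positives
    return true_positives, false_positives, len(unmatched)
```

If the annotations themselves are wrong, these counts become misleading: a correct detection next to a badly drawn box is scored as a false positive, and the badly drawn box as a missed pedestrian.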


Source: ResearchGate, https://www.researchgate.net/figure/Qualitative-Result-Comparison-on-ETH-and-UCY-dataset_fig3_351103717

Types of annotations used in datasets for pedestrian detection

Bounding boxes: Rectangles drawn around pedestrians to locate them in the image.
Semantic segmentation: Pixel-level labeling that identifies the contours of pedestrians more precisely.
Keypoints: Detection of body joints to analyze poses and movements.
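As an illustration, the three annotation types above could be held in a single record per pedestrian. This is a hypothetical Python sketch: the class names, field names and the choice of a polygon to represent the segmentation outline are assumptions (many datasets store pixel masks instead).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Keypoint:
    name: str            # hypothetical joint name, e.g. "left_knee"
    x: float
    y: float
    visible: bool = True


@dataclass
class PedestrianAnnotation:
    # Bounding box as (x_min, y_min, width, height) in pixels.
    bbox: Tuple[float, float, float, float]
    # Optional outline of the silhouette as a list of (x, y) vertices.
    polygon: Optional[List[Tuple[float, float]]] = None
    # Optional body joints for pose analysis.
    keypoints: List[Keypoint] = field(default_factory=list)

    def annotation_types(self) -> List[str]:
        """List which annotation types are present for this pedestrian."""
        types = ["bounding_box"]
        if self.polygon:
            types.append("segmentation")
        if self.keypoints:
            types.append("keypoints")
        return types
```

Keeping the box mandatory and the other two optional reflects the fact that most pedestrian datasets provide boxes, while outlines and joints are only available in some of them.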


Rigorous data annotation is essential for improving the robustness of models and reducing the biases caused by insufficient sampling.

Criteria for choosing a suitable dataset

Training a high-performance model requires a dataset that is fit for purpose. A good pedestrian detection dataset must contain diverse images, well-annotated data and broad coverage of possible scenarios. Some datasets specialize in urban environments, while others include night-time or thermal images.

Source: ResearchGate, https://www.researchgate.net/figure/YOLO-Object-detection-process_fig1_336622048

The choice of a dataset depends on several factors:

Size and diversity of images

The more images a dataset contains, in a variety of formats, the better the model can generalize its predictions. A balanced dataset should include varied scenes (roads, sidewalks, car parks, pedestrian crossings).
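One simple way to check scene balance, assuming each image carries a scene label, is to compute the share of each scene type and flag the ones that fall below a chosen threshold. The function names, label strings and the 10% default below are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List


def scene_distribution(scene_labels: Iterable[str],
                       expected_scenes: Iterable[str] = ()) -> Dict[str, float]:
    """Share of images per scene type; expected scenes with no image get 0.0."""
    counts = Counter(scene_labels)
    for scene in expected_scenes:
        counts.setdefault(scene, 0)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {scene: n / total for scene, n in counts.items()}


def underrepresented(distribution: Dict[str, float],
                     min_share: float = 0.1) -> List[str]:
    """Scene types whose share of the dataset is below min_share."""
    return sorted(scene for scene, share in distribution.items() if share < min_share)
```

Passing the scene types you expect to cover makes completely absent scenes show up in the report instead of silently disappearing.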


Accuracy of annotations

Annotations should be detailed and consistent, and ideally combine several annotation types (bounding boxes, segmentation).
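A basic consistency check can catch many annotation problems before training. The sketch below assumes boxes stored as `(x_min, y_min, width, height)` in pixels, which is a common but not universal convention; the function name is hypothetical.

```python
from typing import List, Tuple

# Assumed box layout: (x_min, y_min, width, height) in pixels.
Box = Tuple[float, float, float, float]


def find_invalid_boxes(boxes: List[Box], image_width: int,
                       image_height: int) -> List[int]:
    """Return the indices of boxes that are empty or extend outside the image."""
    invalid = []
    for i, (x, y, w, h) in enumerate(boxes):
        if w <= 0 or h <= 0:
            invalid.append(i)  # zero or negative size
        elif x < 0 or y < 0 or x + w > image_width or y + h > image_height:
            invalid.append(i)  # partly outside the image bounds
    return invalid
```

Whether out-of-bounds boxes should be rejected or clipped depends on the dataset: some annotation guidelines deliberately extend boxes past the image edge for truncated pedestrians.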


Accessibility

Some datasets are open access, while others require authorization or a subscription.

Specificity of the context

A dataset can be tailored to a particular type of detection (pedestrians in an urban environment, pedestrians seen from a drone, thermal detection, etc.).

Best datasets for pedestrian detection

Here are 10 of the best datasets available for pedestrian detection, described by their content, annotations and typical applications.

1. Caltech Pedestrian Dataset

Description: One of the most widely used datasets for training and evaluating pedestrian detection models. It was recorded with a camera mounted on a vehicle driving through the streets of Los Angeles.

Contents:

About 250,000 frames extracted from videos, with 350,000 pedestrian instances.
Bounding box annotations, labeled by level of occlusion (fully visible, partially occluded, heavily occluded).
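Occlusion labels make it possible to train or evaluate on a chosen difficulty level. The sketch below filters pedestrians by height and occlusion class; the record fields and label strings are hypothetical, and the defaults (at least 50 pixels tall, no or partial occlusion) are only in the spirit of the subsets commonly used when evaluating on Caltech.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Pedestrian:
    height_px: float
    occlusion: str  # hypothetical labels: "none", "partial" or "heavy"


def select_pedestrians(pedestrians: List[Pedestrian],
                       min_height_px: float = 50.0,
                       allowed_occlusion: Sequence[str] = ("none", "partial"),
                       ) -> List[Pedestrian]:
    """Keep pedestrians that are tall enough and not too occluded."""
    return [
        p for p in pedestrians
        if p.height_px >= min_height_px and p.occlusion in allowed_occlusion
    ]
```

Reporting results separately per occlusion level usually says more about a model than a single aggregate score.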


Applications: Used primarily to train vision models for autonomous vehicles and urban surveillance systems.

2. CityPersons Dataset

Description: Built on top of the Cityscapes dataset, CityPersons focuses specifically on pedestrian detection in urban areas.

Contents:

About 5,000 high-resolution images taken in several European cities.
About 35,000 person annotations with bounding boxes and detailed labels (pedestrian, rider, sitting person, other person).

Applications: Well suited to recognizing pedestrians in dense environments with a wide variety of urban situations.

3. EuroCity Persons Dataset

Description: This European dataset covers a wide variety of weather and environmental conditions, which supports robust model training.

Contents:

More than 47,000 images captured in various European cities.
Diverse scenes: rain, fog, snow, sun, night.
Detailed annotations including the position, size and visibility of pedestrians.

Applications: Suited to models that need to stay robust under changing weather and lighting.

4. KAIST Multispectral Pedestrian Dataset

Description: Designed for pedestrian detection in low-light conditions, this dataset combines visible and thermal images.

Contents:

About 95,000 image pairs captured simultaneously in the visible and thermal infrared spectra.
Detailed bounding box annotations aligned across both spectra.

Applications: Essential for autonomous vehicles and video surveillance at night or in low-visibility environments.

5. Tsinghua-Daimler Cyclist and Pedestrian Detection Dataset (TDC-PED)

Description: Captured in China and Germany, this dataset focuses on detecting pedestrians and cyclists for autonomous driving.

Contents:

100,000 images captured from vehicles in traffic.
Precise annotations of pedestrians and cyclists in various positions and from various angles.

Applications: Designed for recognizing pedestrians and cyclists in real traffic conditions.

6. INRIA Person Dataset

Description: One of the first datasets dedicated to pedestrian detection, often used to test computer vision algorithms.

Contents:

1,800 high-resolution images with detailed annotations.
Urban and leisure scenes, with a variety of pedestrian poses.

Applications: Used for the initial development of pedestrian recognition models.

7. Penn-Fudan Pedestrian Dataset

Description: A smaller but well-annotated dataset, useful for quick experiments and for segmentation work.

Contents:

170 high-resolution images with precise bounding box and segmentation mask annotations.
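Penn-Fudan's segmentation annotations are distributed as mask images in which each pedestrian has its own integer label and 0 marks the background, so bounding boxes can be derived directly from the masks. Here is a minimal pure-Python sketch of that derivation; the function name is hypothetical, the mask is assumed to be already loaded as a nested list of integers, and a real pipeline would typically use NumPy or a deep learning framework instead.

```python
from typing import Dict, List, Tuple


def boxes_from_instance_mask(mask: List[List[int]]) -> Dict[int, Tuple[int, int, int, int]]:
    """Map each instance id to its (x_min, y_min, x_max, y_max) pixel box.

    Pixels equal to 0 are treated as background.
    """
    boxes: Dict[int, Tuple[int, int, int, int]] = {}
    for y, row in enumerate(mask):
        for x, instance_id in enumerate(row):
            if instance_id == 0:
                continue
            if instance_id not in boxes:
                boxes[instance_id] = (x, y, x, y)
            else:
                x0, y0, x1, y1 = boxes[instance_id]
                boxes[instance_id] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
    return boxes
```

The returned coordinates are inclusive pixel indices, so a single-pixel instance yields a box whose minimum and maximum corners coincide.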


Applications: Well suited to quick tests or to models that require detailed segmentation of pedestrians.

8. MOT17 Dataset (Multiple Object Tracking)

Description: A dataset oriented toward tracking pedestrians across successive video frames.

Contents:

Video sequences captured in urban environments.
Detailed annotations for frame-by-frame pedestrian tracking.
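Frame-by-frame tracking annotations are typically stored as one comma-separated line per box. The sketch below assumes the MOTChallenge ground-truth layout, where the first six fields are frame number, track identity, left, top, width and height; the function name is hypothetical, and any remaining fields on the line are ignored here.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Tuple

# One observation of a track: (frame, left, top, width, height).
Observation = Tuple[int, float, float, float, float]


def load_tracks(text: str) -> Dict[int, List[Observation]]:
    """Group annotation lines by track id, sorted by frame number."""
    tracks: Dict[int, List[Observation]] = defaultdict(list)
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        frame, track_id = int(row[0]), int(row[1])
        left, top, width, height = (float(v) for v in row[2:6])
        tracks[track_id].append((frame, left, top, width, height))
    for observations in tracks.values():
        observations.sort()
    return dict(tracks)
```

Grouping by identity rather than by frame gives one trajectory per pedestrian, which is the form most tracking metrics and motion analyses work from.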


Applications: Suited to models that need real-time tracking capability, such as automated surveillance.

9. CUHK Square Dataset

Description: This dataset was designed to analyze the behavior of pedestrians in public spaces.

Contents:

10,000 images captured in public squares with a high density of pedestrians.
Annotations that support the detection and recognition of individuals in crowds.

Applications: Used for movement analysis and the detection of abnormal behavior in urban areas.

10. LIP (Look Into Person) Dataset

Description: This dataset provides detailed annotations that describe people in terms of clothing and pose.

Contents:

50,000 images with annotations of body parts and clothing.
Fine-grained silhouette detection and segmentation of individual body parts.

Applications: Used for models that require a detailed understanding of pedestrians, especially for fashion or facial recognition applications.

Conclusion

Pedestrian detection in artificial intelligence relies on varied, well-annotated datasets. The choice of dataset largely determines the results, in particular the precision and robustness of models under real-world conditions: urban environments, low light, changing weather and varying pedestrian density. The datasets presented in this article offer a range of scenarios suited to the needs of each project, whether it involves surveillance, autonomous vehicles or behavioral analysis.

Continuous improvement of datasets and annotation techniques plays a key role in developing more efficient and reliable models. Careful data selection and appropriate training make it possible to optimize detection and improve the safety of AI-based systems.
