Discover YOLO v9: understanding YOLO, the most popular object detection algorithm

Written by

Nicolas

Published on

2024-03-02

Object detection is a fundamental task in Computer Vision: it allows AI systems to locate and classify objects present in images or videos. The ability to accurately detect objects has numerous applications, ranging from self-driving cars to surveillance systems. In recent years, one algorithm has gained popularity for its exceptional performance in object detection: You Only Look Once (YOLO). But how much do you really know about this algorithm?

‍

💡 You have no idea? Don't worry, this article is here to explain what YOLO is, its importance in the world of AI and its different versions. After reading this, you will have a good understanding of YOLO and its applications. Let's go!

‍

Object detection algorithms: what are they?

‍

Object detection algorithms are computer programs designed to identify and locate objects in an image or video. These powerful detection algorithms can identify multiple objects and classify them into different categories.

‍

A popular example of an object detection algorithm is YOLO (You Only Look Once), which processes images in real time, making it highly effective for applications such as traffic monitoring and control. Another example is the R-CNN family (Regions with Convolutional Neural Networks), which includes Fast R-CNN and Faster R-CNN and is recognized for its precision in detecting one or several objects by first proposing regions and then classifying them.

‍

With advances in artificial intelligence, and Deep Learning in particular, these algorithms are constantly improving, becoming faster and more accurate. They play an essential role in the development of technologies such as autonomous vehicles, where they help automate the detection of obstacles on the road, for example.

‍
What is YOLO, and how important is it in AI?

‍

As we have seen, YOLO, or “You Only Look Once,” is an algorithm that helps computers quickly and accurately detect objects in images or videos.

‍

Introduced in 2015 by Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi, YOLO is faster than older approaches because it analyzes the entire image at once. This single pass allows YOLO to quickly identify which objects are present, such as cars, trees, or animals, and where they are in the image.

‍

The importance of YOLO is enormous for AI, especially in the development of advanced products such as autonomous vehicles. For self-driving cars, YOLO can function like the eyes of the car, quickly spotting things on the road to avoid accidents. Also, embedded in smart cameras, YOLO can contribute to improving video surveillance by automatically detecting unusual behaviors, for example in airports or shopping centers. This means that if someone leaves a backpack alone, YOLO can let the security team know immediately via a notification.

‍

Since the original release, several research teams have continued to update the algorithm; there are numerous versions, from YOLOv1 to YOLOv9 (the most recent at the time of writing, released in February 2024), each new version generally aiming to be faster and more accurate. YOLO has become very popular because it lets machines see and understand the world quickly and locate objects for a multitude of real-world applications.

‍


How does YOLO work?

‍

Here's how the YOLO (You Only Look Once) object detection algorithm works, explained in simple steps:

‍

1. Take a photo

First, the YOLO algorithm starts with an image, just like when you take a photo with a camera. Unlike image classification, which assigns a single label to the whole image, object detection must also say where each object is.

‍

2. Divide the image

Then, it divides the given image into small squares, like a checkerboard. Each square will be checked to see if it contains an object (a cat, a dog, or even a can of food, for example).

‍

3. Look for clues

For each square, YOLO looks for clues or characteristics such as edges, shapes, or textures that could indicate what object is there, and surrounds candidate objects with bounding boxes. To learn how to interpret these clues, YOLO is first trained on a labeled reference dataset (the “ground truth”), which gives it points of comparison for its predictions.

‍

4. Make predictions

The algorithm makes an assumption for each square in an image: what object could it be and where exactly is it in the square? It gives each assumption a score to show its level of certainty.

‍

5. Eliminate duplicates

Several squares often produce overlapping guesses for the same object, like two squares each guessing part of the same car. YOLO keeps the best guess for each object and discards the redundant ones.

‍

6. Show what it found

At the end, YOLO shows you where it thinks each object is by drawing boxes around them and labeling them, like “car” or “tree.” If you give it 1,000 images containing dogs and cats and ask it to identify the cats, it will return those images annotated with boxes and labels pointing to the cats.
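Steps 2 to 4 above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it follows the box parameterization of the original YOLO (center offsets relative to the grid cell, width and height relative to the whole image), and the function names `grid_cell` and `decode_prediction` are hypothetical, not part of any YOLO library.

```python
def grid_cell(cx, cy, img_w, img_h, s=7):
    """Step 2: return (row, col) of the S x S grid cell containing point (cx, cy)."""
    # Clamp so that points on the right or bottom edge stay inside the grid.
    col = min(int(cx * s / img_w), s - 1)
    row = min(int(cy * s / img_h), s - 1)
    return row, col


def decode_prediction(row, col, pred, img_w, img_h, s=7):
    """Steps 3-4: turn one cell-relative guess into a pixel box and a score.

    pred = (x, y, w, h, confidence, class_prob), where x and y are offsets
    in [0, 1] inside the cell and w and h are fractions of the image size.
    """
    x, y, w, h, confidence, class_prob = pred
    cx = (col + x) * img_w / s      # box center in pixels
    cy = (row + y) * img_h / s
    bw, bh = w * img_w, h * img_h   # box size in pixels
    box = (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2)
    # The score combines "is there an object?" with "which class is it?".
    return box, confidence * class_prob


# An object centered at pixel (100, 300) in a 448 x 448 image.
row, col = grid_cell(100, 300, 448, 448)
box, score = decode_prediction(row, col, (0.5, 0.5, 0.25, 0.5, 0.9, 0.8), 448, 448)
print((row, col), box, round(score, 2))
```

Later YOLO versions decode boxes differently (for example relative to anchor boxes), so treat this as a picture of the idea rather than of any specific release.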

‍

💡 The strength of YOLO is that it looks at all the elements of an image (broken down into “squares”) at the same time. That's why it is fast and can even work in real time, which is extremely useful for applications that require quick reactions, such as autonomous cars or video surveillance!

‍

💡 Did You Know?

YOLO, short for "You Only Look Once", is one of the most popular object detection model architectures and algorithms. YOLO can predict both the class of an object and the bounding box that defines its location in the image in a single pass, making it ideal for real-time applications.

‍

YOLO vs. R-CNN: what are the differences?

‍

Both YOLO and R-CNN are effective in identifying objects in images or videos, but they do it in different ways and for often different use cases. Here's how they differ in object detection processes!

‍

Speed

YOLO is very fast because it analyzes the entire image all at once. R-CNN, by contrast, looks at parts of the image several times to find objects, which takes longer. As a result, YOLO detects objects more quickly.

‍

Steps taken

YOLO divides the image into squares, guesses what's in each, and eliminates redundant guesses. R-CNN starts by finding interesting parts of the image and then looks at those parts more closely to determine what's in them.

‍

Accuracy

R-CNN tends to be more meticulous and accurate because it spends more time checking candidate regions of the image. YOLO is faster, but it is not always as thorough as R-CNN.

‍

Use cases

YOLO is suitable when you need quick answers, like in a self-driving car that needs to make quick decisions. R-CNN is a better fit when you need to be really sure what's in the image and have more time to check, for example when examining whether a medical image shows signs of illness.

‍

Criteria	YOLO	R-CNN
Speed	Faster	Slower
Method	Looks at an image in one single pass	Looks at image fragments multiple times
Accuracy	Less accurate but improving	More accurate
Best for	Real-time applications	Detailed analysis where responsiveness is not critical

Comparison Table: YOLO vs. R-CNN

‍

👉 Overall, using YOLO is like taking a quick look at a room and finding most of the objects in it right away. Using R-CNN is like taking the time to look at every nook and cranny of this room to make sure you don't miss a thing. Both algorithms are good at this game, but they play it differently!

‍

Evolution of object detection: from YOLO 1 to YOLO 9

‍

YOLO, an acronym for “You Only Look Once,” is a real-time object detection algorithm that has seen significant improvements since its inception. As a “single-shot” detector, it processes images and identifies objects by predicting bounding boxes and class probabilities in a single pass. Over time, YOLO has become more and more robust and efficient, as illustrated by the figure published by the YOLOv9 authors:

‍

*Illustration of YOLO's performance, from the authors' GitHub repository, tested on the MS COCO dataset. On the x-axis, the number of parameters; on the y-axis, the average precision (AP) as a percentage.*

‍

YOLO V1

- The first release of YOLO revolutionized the AI/Computer Vision research community with its real-time object detection capabilities, offering much faster inference speeds than existing methods such as R-CNN.

- YOLO v1 divides the incoming image into a grid and predicts multiple bounding boxes and class probabilities for each grid cell.

- However, this first version traded accuracy for speed: YOLO v1 had trouble with small objects and produced numerous localization errors.
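To make the grid idea concrete, the size of YOLO v1's output can be worked out with a little arithmetic. The sketch below assumes the settings reported for the original model on Pascal VOC (a 7 x 7 grid, 2 boxes per cell, 20 classes); the function name `yolo_v1_output_shape` is hypothetical.

```python
def yolo_v1_output_shape(s=7, b=2, c=20):
    """Shape of the YOLO v1 prediction tensor.

    Each of the S x S cells predicts B boxes, each with 5 numbers
    (x, y, w, h, confidence), plus one set of C class probabilities.
    """
    return (s, s, b * 5 + c)


shape = yolo_v1_output_shape()
total_values = shape[0] * shape[1] * shape[2]
print(shape, total_values)  # (7, 7, 30) 1470
```

Because each cell shares a single set of class probabilities, a cell can only describe one kind of object, which is one reason small, crowded objects were hard for this first version.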

‍

YOLO V2 and V3

- Subsequent releases, such as YOLO v2 and v3, introduced notable improvements and new features like anchor boxes, whose shapes are chosen with k-means clustering on the training boxes so that bounding box coordinates can be predicted more accurately.

- These versions also took advantage of batch normalization and support for higher-resolution input images, leading to significantly better detection performance on benchmark datasets such as Pascal VOC and COCO.
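The anchor box idea can be illustrated with a small k-means routine over box sizes. This is a minimal sketch, assuming the distance used in YOLO v2 (1 minus IoU between a box and a cluster center, both placed at the same corner); the function names and the toy box sizes are made up for the example.

```python
import random


def wh_iou(a, b):
    """IoU of two boxes given as (width, height) and aligned at one corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(sizes, k, iters=50, seed=0):
    """Cluster (width, height) pairs into k anchor shapes."""
    rng = random.Random(seed)
    centroids = rng.sample(sizes, k)  # start from k distinct training boxes
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for wh in sizes:
            # Highest IoU is the same as smallest (1 - IoU) distance.
            best = max(range(k), key=lambda i: wh_iou(wh, centroids[i]))
            clusters[best].append(wh)
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                mean_w = sum(w for w, _ in members) / len(members)
                mean_h = sum(h for _, h in members) / len(members)
                new_centroids.append((mean_w, mean_h))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # assignments no longer change
        centroids = new_centroids
    return sorted(centroids)


# Toy data: three small, roughly square boxes and three large, wide boxes.
sizes = [(10, 12), (12, 10), (11, 11), (100, 50), (110, 55), (90, 45)]
anchors = kmeans_anchors(sizes, k=2)
print(anchors)
```

Using IoU rather than Euclidean distance keeps large boxes from dominating the clustering just because their pixel differences are bigger.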

‍

YOLO V4 and V5

- With the aim of achieving both high speed and high accuracy, YOLO v4 introduced features such as spatial pyramid pooling and a more complex architecture based on state-of-the-art convolutional networks.

- YOLO v5 focused on simplification and optimization, allowing it to run extremely quickly on less powerful hardware while maintaining high precision.

‍

YOLO V6 to V8

- The more recent versions of YOLO, starting with version 6, introduced continuous improvements focused on concrete applications, such as autonomous vehicles or video surveillance. Over time, YOLO has moved beyond the research community toward a wider audience and real-life use cases.

- These versions refined the use of deep learning techniques, including various forms of data augmentation and optimization algorithms, which helped improve mean average precision and the ability to detect a diverse range of object classes.
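As one simple illustration of data augmentation for detection (chosen for clarity, not because it is the specific technique used by these versions), a horizontal flip must mirror the bounding boxes along with the image. The helper name `hflip_box` is hypothetical.

```python
def hflip_box(box, img_w):
    """Mirror a box (x1, y1, x2, y2) for a horizontally flipped image of width img_w."""
    x1, y1, x2, y2 = box
    # The old right edge becomes the new left edge, and vice versa.
    return (img_w - x2, y1, img_w - x1, y2)


original = (10, 20, 110, 220)
flipped = hflip_box(original, img_w=640)
print(flipped)  # (530, 20, 630, 220)
```

Flipping twice returns the original box, which is a handy sanity check when writing augmentation code.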

‍

YOLO V9

On February 21, 2024, Chien-Yao Wang, I-Hau Yeh and Hong-Yuan Mark Liao published the article “YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information”, which introduces a new computer vision model architecture: YOLOv9.

‍

YOLOv9 represents a major advance in the YOLO model series, offering significant improvements in accuracy and efficiency for real-time object detection. According to its authors, it differs from its predecessors, in particular YOLOv8, by a 49% reduction in the number of parameters and a 43% reduction in computation, while increasing the average precision (AP) on the MS COCO dataset by 0.6%.

‍

The YOLOv9 series includes four models: YOLOv9-S (small), YOLOv9-M (medium), YOLOv9-C (compact), and YOLOv9-E (extended), each varying in number of parameters and performance. These models are designed to meet a variety of needs, ranging from lightweight applications to more performance-intensive ones.

‍

YOLOv9 introduces two major innovations:

- 1. The Programmable Gradient Information (PGI)

- 2. The Generalized Efficient Layer Aggregation Network (GELAN)

‍

PGI is an auxiliary supervision mechanism with three main components:

- 1. A main branch

- 2. A reversible auxiliary branch

- 3. Multi-level auxiliary information

‍

This structure helps mitigate the loss of information caused by information bottlenecks, a common problem in deep neural networks. GELAN combines elements of CSPNet, known for its efficient gradient path planning, and ELAN, which prioritizes inference speed, creating a versatile architecture that focuses on lightweight design, fast inference, and increased accuracy.

‍

In addition, YOLOv9 is suitable for a variety of Computer Vision applications, including logistics and distribution, autonomous vehicles, people counting in retail, and sports analytics. These applications benefit from YOLOv9's ability to detect objects in real time with great precision and efficiency.

‍

💡 In short, YOLOv9 represents an important milestone in artificial intelligence research, reflecting the ongoing race to reach and keep state-of-the-art performance in the field. The developers of YOLOv9 published the source code on GitHub, making it easier to adapt to various Computer Vision tasks.

‍

Version	Improvements	Speed / Accuracy Trade-off	Applications
V1	Prediction by grid cell, single-shot method	Fast but less accurate	Basic real-time detection (research)
V2 & V3	Anchor boxes, batch normalization	Faster and more accurate	Various real-time applications
V4 & V5	Spatial pyramid pooling, optimizations	Balanced between speed and accuracy	Demanding environments, such as transportation
V6 to V8	Targeted optimizations, improved architectures	Highly accurate and real-time	Specialized applications, such as surveillance
V9	Programmable Gradient Information (PGI), GELAN architecture	Fewer parameters and less computation with higher accuracy	Applications like logistics, autonomous driving, retail people counting, or sports analytics

Summary table of the different YOLO versions and their evolution

‍

As it evolved from YOLO v1 to v9, the YOLO family of object detection algorithms consolidated its position as a key tool in Computer Vision. With each release, YOLO has become better at detecting objects of varying complexity in a variety of scenarios, becoming an essential component in automation systems where fast and accurate object detection is paramount. To learn more and test YOLOv9, do not hesitate to go to Hugging Face 🤗!

‍

What are the main applications of YOLO in various industries?

‍

YOLO, one of the best object detection algorithms, is used in a variety of areas of life, making our everyday lives much easier. Here is a quick overview of the main industries where YOLO is used!

‍

Surveillance systems

YOLO is widely used in surveillance to maintain security in public areas such as airports, shopping malls, and city streets. It quickly identifies items left unattended, such as bags potentially containing dangerous materials, and unusual movements, alerting authorities in real time. This helps prevent crime and respond to potential threats quickly, ensuring public safety.

‍

Traffic control and management

In the field of traffic management, YOLO can analyze traffic patterns, identify traffic violations, and detect accidents as soon as they occur. Authorities use this data in real time to optimize traffic flows, reduce congestion, and deploy emergency services more quickly if needed. With YOLO, smart cities can effectively manage their roads, potentially saving lives by reducing accident response times.

‍

Healthcare

In the healthcare sector, YOLO is used in medical imaging to identify abnormalities in scans and to assist in diagnoses. While not as accurate as specialized diagnostic tools, it nonetheless speeds up preliminary analysis, pointing out areas that require further examination by a health professional. This application of YOLO can speed up patient screening and help with the early detection of diseases.

‍

Industrial automation

Manufacturing and logistics industries benefit from YOLO because it streamlines operations by identifying components on assembly lines, tracking inventory in real time, and spotting defects in products. This leads to better quality control, increased efficiency, and reduced operational costs by minimizing human errors and increasing production throughput.

‍

Retail

Retailers use YOLO to understand customer behavior and improve store layouts. By analyzing how individuals move around a store, businesses can optimize shelf locations, improve customer service, and manage queues more effectively. This information helps build better customer experiences.

‍

Autonomous vehicles

Using YOLO to develop autonomous vehicle AIs allows cars to detect other cars, pedestrians, and obstacles on the road, making it essential for the driving decision-making process.

‍

Frequently Asked Questions

What is the concept of "Non-Maximum Suppression (NMS)" in the context of YOLO object detection?

NMS is a post-processing technique used in YOLO to ensure that each detected object is accounted for only once. After YOLO predicts multiple bounding boxes for detected objects, NMS reviews these boxes and removes the less probable ones, keeping only the most likely bounding boxes. This prevents multiple detections of the same object and improves the algorithm’s accuracy.
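The procedure described above can be sketched as a greedy loop. This is a minimal, class-agnostic illustration, assuming boxes in (x1, y1, x2, y2) pixel format and a 0.5 overlap threshold; real pipelines usually apply NMS per class and rely on optimized library routines.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5):
    """Greedy NMS over a list of (box, score) pairs; returns the kept pairs."""
    remaining = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)  # the most confident box still in play
        kept.append(best)
        # Drop every box that overlaps the kept one too much.
        remaining = [d for d in remaining if iou(best[0], d[0]) <= iou_threshold]
    return kept


detections = [
    ((10, 10, 110, 110), 0.90),    # a car
    ((20, 20, 120, 120), 0.75),    # the same car, slightly shifted
    ((200, 200, 260, 260), 0.60),  # a different object
]
kept = nms(detections)
print([score for _, score in kept])  # [0.9, 0.6]
```

The two boxes on the same car overlap with an IoU of about 0.68, so only the more confident one survives, while the distant third box is untouched.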

How does YOLO leverage the Pascal VOC dataset to improve its performance?

The Pascal VOC dataset is a well-known dataset in Computer Vision that provides standardized image sets for object class recognition. YOLO uses this dataset, among others like COCO, for training and testing in order to progressively improve object detection. Training on VOC helps the model learn to detect the 20 object classes included in the dataset, and its accuracy and efficiency are then validated on held-out test images.

Can YOLO effectively detect two bounding boxes around a single object?

YOLO can predict more than one bounding box per object; however, it relies on NMS to decide which is the most accurate. The algorithm initially predicts multiple boxes, then, based on class probabilities and Intersection over Union (IoU) scores, it selects the best bounding box while discarding the others.
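The IoU score mentioned above is simply the area shared by two boxes divided by the area they cover together. Here is a short sketch, assuming boxes in (x1, y1, x2, y2) format; the function name is illustrative.

```python
def intersection_over_union(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Width and height of the overlapping rectangle (zero if the boxes are apart).
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


same = intersection_over_union((0, 0, 10, 10), (0, 0, 10, 10))
half_shifted = intersection_over_union((0, 0, 10, 10), (5, 0, 15, 10))
apart = intersection_over_union((0, 0, 1, 1), (2, 2, 3, 3))
print(same, round(half_shifted, 3), apart)  # 1.0 0.333 0.0
```

An IoU close to 1 means the two boxes almost certainly describe the same object, so only the higher-scoring one needs to be kept.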

How is YOLO different from a generic convolutional neural network (CNN) in approaching object detection?

YOLO is itself built on a convolutional neural network, so the two are not alternatives. A CNN is a general family of architectures that can be used for image classification, object detection, or segmentation. YOLO is a specific single-shot detector that uses a CNN to perform both classification and localization in one pass. The original YOLO was not fully convolutional because it used fully connected layers at the end of the architecture; YOLOv2 removed them. In the context of object detection, YOLO offers a fast and efficient way to detect objects using bounding box coordinates and class probabilities, whereas other CNN-based models are designed for tasks such as pixel-level segmentation, where the output is a segmentation map.

Does YOLO use a Support Vector Machine (SVM) for object classification?

No, YOLO does not use Support Vector Machines (SVMs) for object classification. Instead, it directly predicts class probabilities for each bounding box using softmax or logistic classifiers as part of the same Deep Learning model, rather than relying on traditional Machine Learning approaches like SVMs.
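The difference between the two classifier styles mentioned above can be shown with a few lines of standard-library Python. The raw scores (logits) below are made-up numbers for illustration only.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 (one class per box)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(v):
    """Independent logistic score in (0, 1) for a single class."""
    return 1.0 / (1.0 + math.exp(-v))


logits = [2.0, 1.0, 0.1]  # hypothetical raw scores for "car", "truck", "tree"
softmax_probs = softmax(logits)
logistic_probs = [sigmoid(v) for v in logits]
print([round(p, 3) for p in softmax_probs])
print([round(p, 3) for p in logistic_probs])
```

With softmax the classes compete for a total probability of 1, while independent logistic scores let a box be, say, both "car" and "truck" with high confidence, which suits datasets with overlapping labels.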

‍

A last word

‍

In summary, YOLO is a powerful object detection algorithm, and few alternatives match it for building high-performance AI products that remain relatively inexpensive to develop. Thanks to its strong accuracy and real-time speed, YOLO is already used in a wide range of industries. We hope you enjoyed the information we provided in this article. Thanks for reading!

‍

And if you want to know more about preparing datasets to train your YOLO models, why not explore the services offered by Innovatiana? At Innovatiana, we understand the importance of a well-structured and dense dataset for the effectiveness of artificial intelligence models. We specialize in preparing and processing quality data to maximize the performance of your YOLO models!
