Discover YOLO v9: understanding YOLO, the most popular object detection algorithm


Object detection is a fundamental task in computer vision: it allows AI systems to locate and classify objects in images or videos. The ability to detect objects accurately has numerous applications, ranging from self-driving cars to surveillance systems. In recent years, one algorithm has gained popularity for its strong performance in object detection: You Only Look Once (YOLO). But how much do you know about this algorithm, and how well do you understand it?
💡 You have no idea? Don't worry, this article is here to explain what YOLO is, its importance in the world of AI and its different versions. After reading this, you will have a good understanding of YOLO and its applications. Let's go!
Object detection algorithms: what are they?
Object detection algorithms are computer programs designed to identify and locate objects in an image or video. These powerful detection algorithms can identify multiple objects and classify them into different categories.
A popular example of an object detection algorithm is YOLO (You Only Look Once), which processes images fast enough for real-time use, making it highly effective for applications such as traffic monitoring and control. Another example is the R-CNN family (Region-based Convolutional Neural Networks), which includes Fast R-CNN and Faster R-CNN. These models are recognized for their accuracy in detecting one or several objects by first proposing regions and then classifying them.
With advances in deep learning, these algorithms are constantly improving, becoming faster and more accurate. They play an essential role in technologies such as autonomous vehicles, where they help automate the detection of obstacles on the road, for example.
What is YOLO, how important is it in AI?
As we saw, YOLO, or “You Only Look Once,” is an algorithm that helps computers quickly and accurately detect objects in images or videos.
Introduced in 2015 by Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi, YOLO is faster than earlier approaches because it analyzes the entire image in a single pass. This allows YOLO to quickly identify which objects are present, such as cars, trees, or animals, and where they are in the image.
The importance of YOLO is enormous for AI, especially in the development of advanced products such as autonomous vehicles. For self-driving cars, YOLO can function like the eyes of the car, quickly spotting things on the road to avoid accidents. Also, embedded in smart cameras, YOLO can contribute to improving video surveillance by automatically detecting unusual behaviors, for example in airports or shopping centers. This means that if someone leaves a backpack alone, YOLO can let the security team know immediately via a notification.
Since the original release, researchers have continued to update the algorithm, with later versions coming from several different teams. There are numerous versions, from YOLOv1 to YOLOv9 (the most recent, released in February 2024), and successive versions have generally aimed to be faster and more accurate. YOLO has become very popular because it gives machines the ability to see and understand the world quickly and to locate objects for a multitude of real-world applications.
How does YOLO work?
Here's how the YOLO (You Only Look Once) object detection algorithm works, explained in simple steps:
1. Take a photo
First, the YOLO algorithm starts with an image, just like when you take a photo with a camera. Rather than running a classifier over many separate regions, YOLO treats detection as a single prediction problem over the whole image.
2. Divide the image
Then, it divides the given image into small squares, like a checkerboard. Each square will be checked to see if it contains an object (a cat, a dog, or even a can of food, for example).
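As a minimal sketch of this gridding step, the snippet below maps the center of an object to the grid square that would be responsible for it. The 7x7 grid follows the original YOLO paper; the function name `cell_for_center` and the sample image size are illustrative assumptions, not part of any YOLO library.

```python
def cell_for_center(cx, cy, img_w, img_h, grid=7):
    """Return (row, col) of the grid cell containing the point (cx, cy)."""
    # Integer floor division avoids floating-point surprises at cell borders.
    col = min(int(cx * grid // img_w), grid - 1)
    row = min(int(cy * grid // img_h), grid - 1)
    return row, col


# A cat whose center sits at pixel (300, 200) in a 448x448 image.
print(cell_for_center(300, 200, 448, 448))  # (3, 4)
```

The `min(..., grid - 1)` clamp keeps a point lying exactly on the right or bottom edge inside the last cell.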
3. Look for clues
For each square, YOLO looks for clues or features such as edges, shapes, or textures that could indicate what object is there, and surrounds candidate objects with bounding boxes. YOLO has to learn how to do this: during training, it is given a labeled reference dataset (the “ground truth”) against which its predictions are compared.
4. Making predictions
The algorithm makes a prediction for each square in the image: what object could it be, and where exactly is it? It gives each prediction a score to show its level of certainty.
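One way to picture this score, assuming the scheme described in the original YOLO paper, is to multiply the confidence that a box contains an object by the probability of each class. The helper names and the numbers below are hypothetical, and later YOLO versions compute scores differently.

```python
def class_scores(objectness, class_probs):
    """Combine the box confidence with each class probability."""
    return {name: objectness * p for name, p in class_probs.items()}


def best_guess(objectness, class_probs):
    """Return the most likely label and its combined score."""
    scores = class_scores(objectness, class_probs)
    label = max(scores, key=scores.get)
    return label, scores[label]


# Illustrative values: the box is 90% likely to hold an object,
# and that object is most probably a cat.
label, score = best_guess(0.9, {"cat": 0.8, "dog": 0.15, "can": 0.05})
print(label, round(score, 2))  # cat 0.72
```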
5. Eliminate surpluses
Several squares often produce overlapping guesses for the same object, like two squares each guessing part of the same car. YOLO keeps the best guess for each object and discards the redundant ones, a step known as non-maximum suppression.
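This clean-up is commonly done with greedy non-maximum suppression (NMS). The sketch below is a simplified, class-agnostic version in plain Python: it keeps the highest-scoring box, drops every box that overlaps it too much, and repeats. The box format `(x1, y1, x2, y2)`, the 0.5 threshold, and the sample detections are assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5):
    """Greedy NMS over (box, score) pairs; returns the pairs that survive."""
    remaining = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        # Drop every remaining box that overlaps the kept box too much.
        remaining = [d for d in remaining if iou(best[0], d[0]) < iou_threshold]
    return kept


detections = [
    ((10, 10, 110, 110), 0.90),   # a car, strong guess
    ((20, 15, 115, 112), 0.75),   # the same car, weaker guess
    ((200, 50, 260, 120), 0.80),  # a different object
]
for box, score in nms(detections):
    print(box, score)
```

Here the second box overlaps the first with an IoU of about 0.8, so it is removed, while the third box is untouched. Production implementations usually apply this per class rather than across all classes at once.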
6. Show what it found
At the end, YOLO shows you where it thinks each object is by drawing boxes around them and labeling them, like “car” or “tree.” If you give it 1,000 images containing dogs and cats and ask it to identify the cats, it will return annotated images with boxes and labels pointing to the cats.
💡 The strength of YOLO is that it looks at all the elements of an image (broken down into “squares”) at the same time. That's why it is fast and can even work in real time, which is extremely useful for applications that require quick reactions, such as autonomous cars or video surveillance!
YOLO vs. R-CNN: what are the differences?
Both YOLO and R-CNN are effective in identifying objects in images or videos, but they do it in different ways and for often different use cases. Here's how they differ in object detection processes!
Speed
YOLO is very fast because it analyzes the entire image in a single pass. R-CNN, by contrast, examines many candidate regions of the image one after another to find objects, which takes longer. As a result, YOLO detects objects faster.
Steps taken
YOLO divides the image into squares, guesses what's in each, and eliminates redundant guesses. R-CNN starts by finding interesting parts of the image and then looks at those parts more closely to determine what's in them.
Precision
R-CNN is meticulous and accurate because it spends more time checking each candidate region of the image. YOLO is faster, but it is not always as thorough as R-CNN.
Use cases
YOLO is suitable when you need quick answers, as in a self-driving car that needs to make quick decisions. R-CNN is best when you need to be really sure what's in the image and have more time to check, for example when examining whether a medical image shows signs of illness.
👉 Overall, using YOLO is like taking a quick look at a room and quickly finding most of the objects in it. Using R-CNN is like taking the time to look at every nook and cranny of this room to make sure you don't miss a thing. These algorithms are both great at playing this game, but they play it differently!
Evolution of object detection: from YOLO 1 to YOLO 9
YOLO, an acronym for “You Only Look Once,” is a real-time object detection algorithm that has seen significant improvements since its inception. As a single-shot detector, it processes images and identifies objects by predicting bounding boxes and class probabilities in a single pass. Over time, YOLO has become more and more robust and efficient, as illustrated by the most recent publication in the series, YOLOv9.

YOLO V1
- The first release of YOLO revolutionized the AI/Computer Vision research community with its real-time object detection capabilities, offering much faster inference speeds than existing methods such as R-CNN.
- YOLO v1 divides the incoming image into a grid and predicts multiple bounding boxes and class probabilities for each grid cell.
- However, with this first version, accuracy was the trade-off. YOLO v1 had trouble with small objects and produced numerous localization errors.
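To make the grid prediction concrete, here is a small sketch of the size of the YOLO v1 output. It assumes the configuration reported in the original paper for Pascal VOC (a 7x7 grid, 2 boxes per cell, 20 classes); the function name is only illustrative.

```python
def yolo_v1_output_shape(grid=7, boxes_per_cell=2, num_classes=20):
    """Shape of the YOLO v1 prediction tensor: S x S x (B * 5 + C)."""
    values_per_box = 5  # x, y, width, height, confidence
    depth = boxes_per_cell * values_per_box + num_classes
    return grid, grid, depth


shape = yolo_v1_output_shape()
print(shape)                           # (7, 7, 30)
print(shape[0] * shape[1] * shape[2])  # 1470 numbers per image
print(shape[0] * shape[1] * 2)         # 98 candidate boxes per image
```

With only 98 candidate boxes per image and one set of class probabilities per cell, it is easier to see why small or tightly grouped objects were difficult for this first version.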
YOLO V2 and V3
- Subsequent releases, such as YOLO v2 and v3, introduced notable improvements and new features like anchor boxes, whose dimensions are chosen with k-means clustering on the training data so that bounding box coordinates can be predicted more accurately.
- These versions also took advantage of batch normalization and support for higher-resolution input images, leading to significantly better detection performance on benchmarks such as the Pascal VOC and COCO datasets.
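The anchor-box idea can be sketched as follows: cluster the (width, height) pairs of the training boxes with k-means, using 1 - IoU as the distance so that large boxes do not dominate. This is a toy version in plain Python; the deterministic initialization, the function names, and the sample box sizes are assumptions made for illustration, not the procedure from any particular YOLO release.

```python
def wh_iou(a, b):
    """IoU of two boxes given as (width, height), aligned at one corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(sizes, k, iterations=20):
    """Cluster (width, height) pairs; highest IoU means nearest centroid."""
    # Deterministic start: pick k evenly spaced boxes from the area-sorted list.
    ordered = sorted(sizes, key=lambda s: s[0] * s[1])
    step = len(ordered) / k
    centroids = [ordered[int(i * step + step / 2)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for s in sizes:
            nearest = max(range(k), key=lambda i: wh_iou(s, centroids[i]))
            clusters[nearest].append(s)
        new_centroids = []
        for i, members in enumerate(clusters):
            if not members:  # keep the old centroid if a cluster empties
                new_centroids.append(centroids[i])
                continue
            mean_w = sum(m[0] for m in members) / len(members)
            mean_h = sum(m[1] for m in members) / len(members)
            new_centroids.append((mean_w, mean_h))
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids


# Made-up box sizes: three tall narrow boxes and three wide ones.
sizes = [(30, 60), (32, 64), (28, 58), (120, 80), (125, 90), (118, 85)]
anchors = kmeans_anchors(sizes, k=2)
for w, h in anchors:
    print(round(w, 1), round(h, 1))
```

On this sample, the two resulting anchors are roughly 30x60.7 and 121x85, one for each group of box shapes.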
YOLO V4 and V5
- With the aim of achieving both high speed and high accuracy, YOLO v4 introduced features such as spatial pyramid pooling (SPP) and a more complex architecture built on state-of-the-art convolutional backbones.
- YOLO v5 focused on simplification and optimization, allowing it to run extremely quickly on less powerful hardware while maintaining high precision.
YOLO V6 to V8
- More recent versions of YOLO, starting with version 6, introduce continuous improvements focused on concrete applications, such as autonomous vehicles or video surveillance. Over time, YOLO has moved beyond the research community to reach a wider audience and real-world use cases.
- These versions refined the use of deep learning techniques, including various forms of data augmentation and optimization algorithms that have helped improve average precision and the ability to detect a diverse range of object classes.
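As a simple illustration of data augmentation for detection, the sketch below mirrors an image horizontally and moves its bounding boxes accordingly. It is only one of the simplest possible augmentations, chosen here as an assumption for clarity; actual YOLO training pipelines combine many more transformations. The image is represented as a plain list of pixel rows and the function names are hypothetical.

```python
def hflip_image(rows):
    """Mirror an image stored as a list of pixel rows."""
    return [row[::-1] for row in rows]


def hflip_boxes(boxes, img_w):
    """Mirror (x1, y1, x2, y2) boxes to match a horizontally flipped image."""
    return [(img_w - x2, y1, img_w - x1, y2) for x1, y1, x2, y2 in boxes]


image = [[0, 1, 2, 3],
         [4, 5, 6, 7]]
print(hflip_image(image))                     # [[3, 2, 1, 0], [7, 6, 5, 4]]
print(hflip_boxes([(10, 20, 50, 80)], 100))   # [(50, 20, 90, 80)]
```

The key point is that labels must be transformed together with the pixels; otherwise the boxes would no longer surround the objects.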
YOLO V9
On February 21, 2024, Chien-Yao Wang, I-Hau Yeh and Hong-Yuan Mark Liao published the paper “YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information”, which introduces a new computer vision model architecture: YOLOv9.
YOLOv9 represents a major advance in the YOLO model series, offering significant improvements in accuracy and efficiency for real-time object detection. Compared with its predecessors, in particular YOLOv8, it reduces the number of parameters by 49% and the amount of computation by 43%, while improving average precision (AP) on the MS COCO dataset by 0.6%.
The YOLOv9 series includes four models: YOLOv9-S (small), YOLOv9-M (medium), YOLOv9-C (compact), and YOLOv9-E (extended), each varying in number of parameters and performance. These models are designed to meet a variety of needs, ranging from lightweight applications to more performance-intensive ones.
YOLOv9 introduces two major innovations:
- 1. Programmable Gradient Information (PGI)
- 2. The Generalized Efficient Layer Aggregation Network (GELAN)
PGI is an auxiliary supervision mechanism with three main components:
- 1. A main branch
- 2. A reversible auxiliary branch
- 3. Multi-level auxiliary information
This structure helps mitigate the information loss caused by information bottlenecks, a common problem in deep neural networks. GELAN combines elements of CSPNet, known for its efficient gradient path planning, and ELAN, which prioritizes inference speed, creating a versatile architecture that focuses on lightweight design, fast inference, and accuracy.
In addition, YOLOv9 is suitable for a variety of computer vision applications, including logistics and distribution, autonomous vehicles, people counting in retail, and sports analytics. These applications benefit from YOLOv9's ability to detect objects in real time with great precision and efficiency.
💡 In short, YOLOv9 represents an important milestone in artificial intelligence research, reflecting the field's relentless pursuit of state-of-the-art results. The developers of YOLOv9 published the source code on GitHub, which makes it easier to adapt to various computer vision tasks.
As it evolved from YOLO v1 to v9, the YOLO family of object detection algorithms consolidated its position as a key tool in computer vision. With each release, YOLO has become better at detecting objects of varying complexity in a variety of scenarios, becoming an essential component of automation systems where fast and accurate object detection is paramount. To learn more and test YOLOv9, do not hesitate to go to Hugging Face 🤗!
What are the main applications of YOLO in various industries?
YOLO, one of the best object detection algorithms, is used in a variety of areas of life, making our everyday lives much easier. Here is a quick overview of the main industries where YOLO is used!
Surveillance systems
YOLO is widely used in surveillance to maintain security in public areas such as airports, shopping malls, and city streets. It quickly identifies items left unattended, such as bags potentially containing dangerous materials, and unusual movements, alerting authorities in real time. This helps prevent crime and respond to potential threats quickly, ensuring public safety.
Traffic control and management
In the field of traffic management, YOLO can analyze traffic patterns, identify traffic violations, and detect accidents as soon as they occur. Authorities use this data in real time to optimize traffic flows, reduce congestion, and deploy emergency services more quickly if needed. With YOLO, smart cities can effectively manage their roads, potentially saving lives by reducing accident response times.
Healthcare
In the healthcare sector, YOLO is used in medical imaging to identify abnormalities in scans and to assist in diagnoses. While not as accurate as specialized diagnostic tools, it nonetheless speeds up preliminary analysis, pointing out areas that require further examination by a health professional. This application of YOLO can speed up patient screening and help with the early detection of diseases.
Industrial automation
Manufacturing and logistics industries benefit from YOLO because it streamlines operations by identifying components on assembly lines, tracking inventory in real time, and identifying defects in products. Such practice leads to better quality control, increased efficiency, and reduced operational costs by minimizing human errors and increasing production throughput.
Retail
Retailers use YOLO to understand customer behavior and improve store layouts. By analyzing how individuals move around a store, businesses can optimize shelf locations, improve customer service, and manage queues more effectively. This information helps build better customer experiences.
Autonomous vehicles
Using YOLO in autonomous vehicle systems allows cars to detect other cars, pedestrians, and obstacles on the road, which makes it essential to the driving decision-making process.
A final word
In summary, YOLO is a powerful object detection algorithm, and few competitors can match it for building high-performance AI products that are relatively inexpensive to develop. With strong accuracy and real-time speed, YOLO is already used in a wide range of industries. We hope you enjoyed this article. Thanks for reading!
And if you want to know more about preparing datasets to train your YOLO models, why not explore the services offered by Innovatiana? At Innovatiana, we understand the importance of a well-structured and dense dataset for the effectiveness of artificial intelligence models. We specialize in preparing and processing quality data to maximize the performance of your YOLO models!