Discover interactive segmentation: a new era for image analysis


Image segmentation consists of dividing an image into meaningful regions in order to facilitate analysis. When it is interactive, a human guides the algorithm (for example, with advanced annotation tools) to obtain an accurate segmentation of specific areas of interest. This approach makes it possible to segment any object, even one not covered by the classes of an automatic model, thanks to the user's indications. For dataset preparation, interactive segmentation is therefore valuable for filling the gaps of fully automatic methods, combining the speed of AI with human expertise.
💡 In this article, we explore the principles of interactive segmentation, trace the evolution of its techniques (from rule-based methods to neural networks), present its flagship applications (medical imaging, image editing, robotics, etc.), and discuss the current challenges as well as the future prospects of this technology.

Principle of interactive segmentation
Interactive segmentation relies on human-machine collaboration to isolate an object in an image. The user provides visual indications and the segmentation algorithm computes the corresponding masks. Several modes of interaction are commonly used:
- Control points (clicks): the user clicks on a few pixels, marking them as belonging either to the target object (positive points) or to the background (negative points). The system then adjusts the mask accordingly, and the user can add more points until the desired result is achieved.
- Bounding box: the user draws an approximate rectangle around the object of interest. The algorithm then precisely segments the content of this box, distinguishing the object from the background.
- Scribbles / brush strokes: the user roughly paints lines on the object to keep and, optionally, on the background to exclude. These scribbles serve as a guide for the algorithm to delineate the regions.
Each new user input updates the segmentation iteratively until the target object is properly isolated. The big advantage of this approach is that it removes ambiguity in complex cases: the human can specify exactly what the machine should segment. For example, if several objects touch each other or if the lighting confuses the scene, the user can steer the result with a few clicks. Interactive segmentation thus combines the precision of human control with the computation speed of the algorithm, often giving a more reliable result than a fully automatic (or entirely manual) method on difficult images.
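To make this loop concrete, here is a minimal sketch of such an interactive session in Python. It is purely illustrative: `predict_mask`, `get_user_click` and `display_overlay` are hypothetical placeholders standing in for a promptable segmentation model and the user interface of an annotation tool.

```python
import numpy as np

def interactive_session(image, predict_mask, get_user_click, display_overlay):
    """Illustrative refinement loop: each new click re-runs the model
    with the full history of prompts until the user is satisfied."""
    points, labels = [], []          # accumulated clicks and their labels
    mask = None
    while True:
        click = get_user_click()     # hypothetical UI callback -> (x, y, is_positive) or None
        if click is None:            # user is satisfied, stop refining
            break
        x, y, is_positive = click
        points.append([x, y])
        labels.append(1 if is_positive else 0)   # 1 = object, 0 = background
        mask = predict_mask(image, np.array(points), np.array(labels))
        display_overlay(image, mask)             # hypothetical: show the current mask
    return mask
```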
Evolution of image segmentation techniques
Image segmentation has evolved considerably over the past few decades, moving from simple deterministic methods to highly effective deep learning algorithms. Three main stages can be distinguished in this evolution:
1. Rule-based methods (1980s-1990s)
The first segmentation processes were based on criteria fixed manually by image-processing experts. Among these classical techniques are, for example, thresholding (binarizing an image according to a luminance or color threshold), edge detection (delineating objects via their edges by examining image gradients) and region growing (grouping neighboring pixels with similar characteristics). These "hand-crafted" methods work well in simple cases, but lack robustness as soon as the scenes are complex or the shooting conditions vary. They often need to be adjusted image by image. However, they laid the theoretical foundations of segmentation and are still used for simple needs or as pre-processing.
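As an illustration, a classic rule-based technique such as Otsu thresholding can be written in a few lines with OpenCV (file names are illustrative):

```python
import cv2

# Global thresholding (Otsu's method): the threshold is chosen automatically,
# but the approach only works when object and background intensities are well separated.
image = cv2.imread("sample.png", cv2.IMREAD_GRAYSCALE)
_, binary_mask = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
cv2.imwrite("mask.png", binary_mask)
```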
2. Approaches based on machine learning (2000s)
With the progress of statistical learning, researchers introduced models that can learn to segment from annotated data. For example, some methods combine pixel descriptors (color, texture, etc.) with trained classifiers (SVMs, random forests, etc.) to predict the label (object or background) of each pixel. Other techniques, such as random walks or Markovian models (MRF/CRF), integrate neighborhood information to improve the coherence of the segments. In interactive segmentation, an algorithm that marks this era is Graph Cut (and its GrabCut extension), which uses a graph model to interactively separate an object: the user initiates the process (for example by roughly surrounding the object) and the algorithm optimizes a cut of the image graph by minimizing a cost criterion. Overall, these approaches partially learn from data, making them more adaptive than simple fixed rules. However, their performance remains limited by the need to manually define the right features to learn from (handcrafted features), and they quickly reach their limits on very complex images or diverse objects.
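For reference, here is roughly how GrabCut can be invoked through OpenCV, under the assumption that the user has drawn a bounding box around the object (coordinates and file names are illustrative):

```python
import cv2
import numpy as np

image = cv2.imread("photo.jpg")
mask = np.zeros(image.shape[:2], np.uint8)       # per-pixel labels filled in by GrabCut
bgd_model = np.zeros((1, 65), np.float64)        # internal background model parameters
fgd_model = np.zeros((1, 65), np.float64)        # internal foreground model parameters
rect = (50, 50, 300, 400)                        # user-drawn box: (x, y, width, height)

# Iteratively optimize a graph cut, initialized from the bounding box
cv2.grabCut(image, mask, rect, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_RECT)

# Keep pixels labeled as definite or probable foreground
object_mask = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0).astype(np.uint8)
cutout = image * object_mask[:, :, None]
cv2.imwrite("cutout.jpg", cutout)
```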
3. Neural networks and deep learning (2010s to the present)
The revolution came from convolutional neural networks (CNNs), capable of automatically learning the features that are relevant for segmenting images. Models such as U-Net, Mask R-CNN or, more recently, Segment Anything (SAM) by Meta have pushed the boundaries in terms of precision and generalization. Fed with large sets of annotated images, these networks manage to finely segment objects of various shapes and sizes, sometimes even under difficult background conditions. Modern architectures often combine an encoder-decoder structure (to capture global context and local details) with multi-scale attention, making them very effective at classifying each pixel in the image. In addition, some recent models are promptable, that is, they accept instructions (points, a box, text) as input to segment any specified target in the image. This makes them particularly suitable for interactive segmentation, where a user point or click can be used as a prompt to instantly generate a mask.
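As a sketch of this promptable behavior, here is how a single positive click can be turned into a mask with Meta's open-source segment-anything library (names follow the original SAM release and should be checked against your installed version; file paths are illustrative):

```python
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Load the model once (ViT-B checkpoint from the public SAM release)
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

# SAM expects an RGB image
image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

point = np.array([[420, 310]])   # (x, y) of the user's click
label = np.array([1])            # 1 = positive (object), 0 = negative (background)
masks, scores, _ = predictor.predict(point_coords=point,
                                     point_labels=label,
                                     multimask_output=True)
best_mask = masks[np.argmax(scores)]   # keep the highest-scoring proposal
```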
It is important to note that despite the excellence of neural networks, traditional methods have not completely disappeared: in contexts where computing resources are limited or images are very simple, well-chosen thresholding may suffice. However, for industrial applications requiring robustness and scale, deep learning-based approaches dominate image segmentation today.
Applications in various fields
Interactive segmentation has a variety of applications when it comes to isolating visual objects with precision. It is used both to annotate data (creating training datasets for AI) and in tools intended for professionals or the general public. Here are some major areas where it provides added value:
Medicine and biomedical imaging

In medicine, image segmentation makes it possible to delineate anatomical structures or anomalies (tumors, organs, lesions, etc.) on imaging exams (MRI, CT, ultrasound, etc.). Automatic methods are useful, but the intervention of a specialist is often necessary to correct and refine the results. Indeed, manually analyzing entire volumes is extremely time-consuming and subject to variability.
Interactive segmentation speeds up this process: a radiologist can, for example, trigger an automatic segmentation of a tumor and then correct it in a few clicks if necessary, instead of delineating it entirely by hand. Likewise, when preparing computer-aided surgery, the surgeon can quickly adjust the segmented target area (such as an organ to be treated) in order to obtain an accurate 3D model. Thanks to these interactive tools, reliable cut-outs of the structures of interest are obtained faster, which helps with the diagnosis, the treatment plan or the creation of personalized surgical guides.
Image editing and graphic design

Whether for photography, advertising or design, interactive segmentation is a valuable tool for manipulating visual elements. A common use case is object clipping (or background removal): removing the background of an image to isolate the subject (product, person, etc.). Consumer software such as Photoshop integrates intelligent selection tools (magnetic lasso, improved magic wand, etc.) that rely on interactive segmentation algorithms: the user roughly indicates the area to be preserved, the tool computes the precise outline and lets the user refine it by painting over poorly cut areas.
Today, many online services offer to remove the background from a photo in one click, thanks to AI. However, they often provide a "manual" mode to adjust the result, as the automatic pass can confuse elements (for example, fine hair with the background). Interactive segmentation is also used in augmented reality (to dynamically place an object or a person in a different setting) or for selective colorization (isolating a colored element against a black-and-white background, etc.). In all these cases, it gives the user precise control while sparing them from having to draw the contours entirely by hand.
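Once a mask has been obtained interactively, the clipping step itself is simple compositing: the mask becomes the alpha channel of the output image. A minimal sketch with NumPy and Pillow (file names are illustrative):

```python
import numpy as np
from PIL import Image

image = np.array(Image.open("portrait.jpg").convert("RGB"))   # subject photo
mask = np.array(Image.open("mask.png").convert("L"))          # 0 = background, 255 = subject

# Use the segmentation mask as the alpha channel: background pixels become transparent
rgba = np.dstack([image, mask])
Image.fromarray(rgba, mode="RGBA").save("cutout.png")
```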
Robotics, autonomous vehicles and machine vision

Robotic systems and autonomous vehicles rely heavily on computer vision to understand their environment. In particular, semantic segmentation provides a detailed understanding of each pixel of the image captured by the robot or car camera: it assigns a label to each pixel (vehicle, pedestrian, road, obstacle, building, etc.).
This is particularly important for navigation, as the system needs to know where the road is, how to distinguish a pedestrian from a lamp post, and so on. In most cases, these segmentations are performed entirely automatically by neural networks trained on thousands of urban images. Nevertheless, building these training databases makes extensive use of interactive segmentation: human operators manually annotate examples (street images) using interactive tools to segment each object, in order to create accurate ground truths for training the models. Moreover, in industrial robotics, an operator can use interactive segmentation to quickly teach a robot to identify a particular part among others on an assembly line (by segmenting it in a few images to generate examples).
Humans therefore intervene either upstream (to produce high-quality annotated data) or possibly in a supervisory role (for example, a driver supervising an autonomous vehicle could correct the detection of an ambiguous obstacle in real time via an interactive segmentation interface, if such assistance features exist in the future). In all cases, interactive segmentation provides quality assurance and a safety net in areas (transport, automation, robotics) where reliability is essential.
Current challenges and future prospects
Despite its successes, interactive segmentation faces several challenges. On the one hand, the user effort required must be reduced even further: ideally, one would want to segment any object with a single click or a single instruction. Recent work goes in this direction with foundation models such as the Segment Anything Model (SAM) from Meta, capable of generating a mask from a simple point or a bounding box provided as input. These very generic models show impressive results, but they are not infallible. In practice, their predictions still often require human validation and correction. For example, an annotation produced by SAM is not always perfect, and a specialist must rework it to reach the required quality.
Improving first-attempt accuracy is therefore a challenge: this calls for more efficient networks, possibly combining vision and language (models that can be guided by textual instructions, such as "select the big tree on the right side of the picture", are beginning to be explored).
On the other hand, interactive segmentation must be adapted to new types of data. For example, 3D (volumetric) imaging and video pose additional challenges: how can a user effectively guide segmentation across a time sequence or a volume? Tools must be invented to propagate corrections over time or across 3D slices, so that humans do not have to repeat everything frame by frame. Research is also focusing on continuous learning: an interactive system could learn as the user makes corrections, so as not to repeat the same mistakes. We then speak of adaptive interactive segmentation, where the model is customized to the operator's preferences or to the specific data encountered.
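One naive way to propagate a mask through a video, sketched below under the assumption of a SAM-style `predictor` (as in the earlier example), is to seed each new frame with a point sampled from the previous frame's mask. Real video segmentation systems rely on motion cues or dedicated temporal models, but the idea of reusing the previous result as the next prompt is the same.

```python
import numpy as np

def propagate_masks(frames, predictor, first_mask):
    """Naive temporal propagation: seed each frame with the centroid
    of the previous mask, used as a positive point prompt."""
    masks = [first_mask]
    for frame in frames[1:]:
        ys, xs = np.nonzero(masks[-1])                          # pixels of the previous mask
        seed = np.array([[int(xs.mean()), int(ys.mean())]])     # rough object center (x, y)
        predictor.set_image(frame)
        candidates, scores, _ = predictor.predict(point_coords=seed,
                                                  point_labels=np.array([1]),
                                                  multimask_output=True)
        masks.append(candidates[np.argmax(scores)])
    return masks
```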
Another challenge lies in the user experience itself: making the annotation interface as intuitive and efficient as possible. This requires instant visual feedback (the user should see the effect of their clicks in real time), intelligent suggestions (the system could proactively propose segmenting certain objects if the user hesitates), and the ability to undo or refine locally without starting all over again. Latency must be minimal to allow smooth interaction, which involves optimizing the algorithms (some recent work targets lightweight models that can run in real time on a CPU).
Despite these challenges, the prospects for interactive segmentation are very promising. With the rise of ever more powerful and generalist AI models, we can imagine tools capable of "segmenting everything" almost instantly, requiring only a quick user validation. In many professional fields, these advances will save precious time for experts (doctors, engineers, etc.), who will be able to focus on analysis rather than on the tedious preparation of data... even if such tools in no way remove the need to set up a complete and effective labeling process (or LabelOps).
In conclusion, interactive segmentation illustrates the complementarity between humans and AI: algorithms provide speed of execution and the ability to process large volumes of images, while human expertise guarantees the relevance and quality of the final result. Current research efforts aim to minimize the intervention needed without completely eliminating it, so that the final decision remains in informed human hands. We can bet that in the near future, thanks to the continuous improvement of models and interfaces, interactive segmentation will become an even more transparent and powerful tool, integrating naturally into many workflows without users even noticing it.
Sources to go further
- For a general introduction to the various image segmentation techniques, you can consult 🔗 this article from Innovatiana.
- The 🔗 Kili Technology blog details the principles of interactive segmentation and its interaction modes.
- Finally, to discover Meta's Segment Anything model, which prefigures the future of universal segmentation, we suggest reading 🔗 SAM: everything you need to know.
Happy exploring!