Conducting your data annotation campaign: our guide (2/2)


The preliminary steps described in the first part of this guide led to assembling a team, defining the project's problem precisely, and establishing clear rules for the annotation tasks. The campaign can now start! In this article, we've compiled a set of recommendations for running successful data annotation campaigns.
Training and mobilizing Data Labelers for successful AI projects
Training and mobilizing Data Labelers (or annotators) is a necessary step in any data annotation campaign. The repetitive, tedious and sometimes complex nature of the annotation task exposes annotators to the risk of errors, such as omitting an object to be annotated in a given image or assigning an inappropriate label. In-depth training and effective mobilization of annotators, both at the beginning of the project and throughout its course, are essential to mitigate these risks and, above all, to identify errors as early as possible.
In the preliminary phase of the project, it is essential to clearly explain its stakes to the team of annotators, highlighting the central role of annotation in the project's success. This is a key awareness-raising phase. This onboarding stage is also an opportunity to familiarize annotators with concepts related to Artificial Intelligence and with the reality of AI product development cycles.
A good practice is also to maintain a register of the most common errors, updated as the project progresses, with a participatory approach: each annotator is invited to complete the register with the specific cases they identify, supplemented with concrete examples and illustrated with screenshots.
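As an illustration, here is a minimal sketch, in Python, of what one entry in such a shared error register might look like; the field names and example values are assumptions rather than a prescribed format.

```python
# Minimal sketch of one entry in a shared error register (illustrative structure only).
from dataclasses import dataclass, field
from datetime import date

@dataclass
class ErrorRegisterEntry:
    """One recurring annotation error, documented collaboratively by the team."""
    error_type: str            # e.g. "missing object", "wrong label"
    description: str           # what the error looks like in practice
    correct_practice: str      # how the case should be annotated
    example_documents: list = field(default_factory=list)  # document IDs or screenshot paths
    reported_by: str = ""      # annotator who added the entry
    added_on: date = field(default_factory=date.today)

# Example entry an annotator might add after a review session
entry = ErrorRegisterEntry(
    error_type="missing object",
    description="Small or partially occluded objects are sometimes left unannotated.",
    correct_practice="Annotate any object whose visible surface exceeds the agreed threshold.",
    example_documents=["img_0142.png", "screenshots/img_0142_missed_box.png"],
    reported_by="annotator_03",
)
```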
Maintain the engagement of annotators throughout the project
Maintaining the commitment of annotators throughout the project requires an ongoing dialogue. Setting up sharing tools such as instant messaging, discussion forums, and collaborative documents helps foster discussions within the project team, allowing difficulties to be resolved, questions to be asked, and mutual support to be provided. Regular synchronization sessions can also be set up to communicate on the progress of the project, share possible changes, or highlight specific points of attention related to the annotation.
Control and ensure the quality of the data
When the final objective of the annotation campaign is to develop an algorithm to automate a task, the presence of errors in the data and metadata used for training can cause the algorithm to reproduce the imperfections of manual annotation. Here we bring together several best practices to make projects reliable, regardless of their size.
Create a reference dataset (or “Ground Truth”)
A reference data set, also called “Ground Truth”, consists of annotated documents whose annotations have been rigorously checked, thus guaranteeing unquestionable quality. This data set can be used in several ways.
On the one hand, the corresponding documents (without their annotations) can be submitted to the annotators for annotation at the start of the project. This approach aims to ensure that the annotators adequately understand the task and to verify that the annotation schema is unambiguous, that is, that it could not lead two annotators to annotate the same document in correct but divergent ways. By comparing the annotators' annotations with the quality-assured ones, errors or ambiguities can be detected. These findings help either to clarify the elements of the annotation schema that require additional explanation, or to correct the schema to eliminate certain ambiguities.
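As an illustration, here is a minimal sketch, in Python, of such a comparison between annotators' labels and the verified reference labels; the document IDs, label names and flat document-to-label structure are assumptions made for the example.

```python
# Hedged sketch: compare each annotator's labels against the verified "Ground Truth"
# labels collected at the start of the project, to spot errors or schema ambiguities.

ground_truth = {"doc_01": "invoice", "doc_02": "contract", "doc_03": "invoice"}

annotator_labels = {
    "annotator_A": {"doc_01": "invoice", "doc_02": "contract", "doc_03": "receipt"},
    "annotator_B": {"doc_01": "invoice", "doc_02": "invoice",  "doc_03": "receipt"},
}

for annotator, labels in annotator_labels.items():
    # Documents where this annotator diverges from the reference
    mismatches = {
        doc: (labels.get(doc), expected)
        for doc, expected in ground_truth.items()
        if labels.get(doc) != expected
    }
    agreement = 1 - len(mismatches) / len(ground_truth)
    print(f"{annotator}: {agreement:.0%} agreement, divergences: {mismatches}")
```

Documents on which several annotators diverge from the reference in the same way often point to an ambiguity in the annotation schema rather than a simple mistake.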
On the other hand, the “Ground Truth” data set can also be used as a test set, offering the possibility of evaluating the developed algorithm on data of maximum quality. This approach makes it possible to measure the algorithm's performance under reliable conditions and to ensure its robustness and precision.
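Below is a small sketch of this second use, assuming a classification task and that scikit-learn is available; the labels and predictions are placeholders standing in for the verified test set and the model's output.

```python
# Sketch: using the verified "Ground Truth" set as a held-out test set for the model.
# The label values and predictions below are placeholders; only the evaluation pattern matters.
from sklearn.metrics import classification_report

y_true = ["invoice", "contract", "invoice", "receipt"]   # verified Ground Truth labels
y_pred = ["invoice", "contract", "receipt", "receipt"]   # model predictions on the same documents

print(classification_report(y_true, y_pred, zero_division=0))
```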
Random verification of documents annotated by Data Labelers
It is recommended that, throughout the project, the project manager periodically review annotated documents, selected randomly, in order to ensure the quality of the annotations.
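A minimal sketch of such a spot check, assuming annotated documents are identified by simple IDs and using an arbitrary 5% sampling rate:

```python
# Minimal sketch of a periodic spot check: draw a random sample of annotated
# documents for manual review by the project manager. The 5% rate is an assumption.
import random

annotated_doc_ids = [f"doc_{i:04d}" for i in range(1, 2001)]

sample_rate = 0.05
sample_size = max(1, int(len(annotated_doc_ids) * sample_rate))
docs_to_review = random.sample(annotated_doc_ids, sample_size)

print(f"{len(docs_to_review)} documents selected for review, e.g.: {docs_to_review[:5]}")
```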
Implementation of consistency tests on annotations
For some projects, it is possible to implement automatic tests that reflect the business rules annotations must respect. When such tests can be integrated, they make it possible to automatically detect annotated documents with a high risk of errors, which then require priority verification by the business expert.
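As an illustration, here is a hedged sketch of what such consistency tests might look like in Python; the two rules and the annotation format (a label plus bounding boxes) are assumptions chosen for the example, not rules from this guide.

```python
# Hedged sketch of automatic consistency checks: each function encodes one business
# rule that valid annotations must satisfy. Documents breaking a rule are flagged
# for priority review by the business expert.

def check_has_label(annotation: dict) -> bool:
    # Rule 1 (assumed): every annotation must carry a non-empty label.
    return bool(annotation.get("label"))

def check_boxes_inside_image(annotation: dict, width: int = 1920, height: int = 1080) -> bool:
    # Rule 2 (assumed): bounding boxes must be well-formed and lie inside the image.
    return all(
        0 <= x1 < x2 <= width and 0 <= y1 < y2 <= height
        for (x1, y1, x2, y2) in annotation.get("bounding_boxes", [])
    )

RULES = [check_has_label, check_boxes_inside_image]

def flag_for_review(annotations: dict) -> list:
    """Return the IDs of documents that break at least one rule."""
    return [doc_id for doc_id, ann in annotations.items()
            if not all(rule(ann) for rule in RULES)]

annotations = {
    "doc_01": {"label": "car", "bounding_boxes": [(10, 20, 200, 180)]},
    "doc_02": {"label": "", "bounding_boxes": [(0, 0, 2500, 100)]},  # breaks both rules
}
print(flag_for_review(annotations))  # -> ['doc_02']
```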
Finally: take stock of your annotation campaign
Conducting an annotation campaign often involves complex challenges and calls for a careful evaluation at the end of the campaign, in order to identify useful lessons for future projects involving annotation. This critical phase makes it possible to document in detail the methodology used, the progress of the campaign, and key metrics. The following section provides a non-exhaustive list of metrics and questions relevant to an in-depth evaluation of your annotation campaign, offering valuable insights.
Below are some indicators that can be used to assess the performance and relevance of annotation campaigns (a minimal sketch after the list shows how a few of them can be computed):
• Duration of the annotation campaign
• Number of annotators mobilized
• Total volume of annotated documents
• Average time spent annotating a document
• Appropriateness of the annotation software (performance, comparison of results using several platforms, ergonomics, etc.)
• Appropriateness of the annotation scheme (readability, reproducibility, coverage of specific cases)
• Ability to mobilize professional annotators who are experts in their field
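As announced above, here is a minimal sketch showing how a few of these indicators could be derived from simple campaign logs; the log format (one record per annotated document, with start and end timestamps) is an assumption.

```python
# Illustrative sketch: computing a few campaign indicators from assumed annotation logs.
from datetime import datetime

logs = [
    {"doc": "doc_01", "annotator": "A", "start": "2024-03-01T09:00", "end": "2024-03-01T09:04"},
    {"doc": "doc_02", "annotator": "B", "start": "2024-03-01T09:01", "end": "2024-03-01T09:07"},
    {"doc": "doc_03", "annotator": "A", "start": "2024-03-02T10:00", "end": "2024-03-02T10:03"},
]

# Time spent on each document, in minutes
durations = [
    (datetime.fromisoformat(r["end"]) - datetime.fromisoformat(r["start"])).total_seconds() / 60
    for r in logs
]

print("Total volume of annotated documents:", len(logs))
print("Number of annotators mobilized:", len({r["annotator"] for r in logs}))
print(f"Average time per document: {sum(durations) / len(durations):.1f} min")
```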
A comprehensive assessment approach contributes to a better understanding of the successes and challenges encountered, thus providing essential information to improve future annotation campaigns.
(End of guide. Find the first part of our guide at this address.)
To go further, discover our article on the criteria for choosing the right annotation platform according to your use cases.
To manage your data annotation campaigns, Innovatiana stands out by offering an integrated platform, accessible at https://dashboard.innovatiana.com, that addresses data collection and annotation challenges. It represents an all-in-one approach, centralizing the specific requirements of each project within the same working environment and allowing appropriate customization.