Impact Sourcing

Data Labeling is a profession, not a casual job

Written by Aïcha
Published on 2023-07-10

[Source on which this article is based: DeepLearning.AI - The Batch - Issue 204 - https://www.deeplearning.ai/the-batch/issue-204/]

💡 In a data-centric approach to AI, the development of successful AI products depends on precisely annotated datasets

However, the demanding nature of Data Labeling work and the cost of annotating data at scale push businesses to automate annotation or to rely on low-paid freelance providers. These Data Labelers, often sourced via platforms such as Amazon Mechanical Turk or Upwork, are in high demand and sometimes rush their work to meet the strict deadlines imposed on them, or abandon it altogether. Yet everyone would benefit from viewing data annotation less as a casual job or an "odd job," and more as a profession in its own right.

How does the data annotation industry work?

Companies that supply annotators (or Data Labelers), such as Centaur Labs, Surge AI, Remotasks or Outlier (both owned by Scale AI) and many other major players in the sector, use automated or manual crowdsourcing systems to manage self-employed workers around the world. Freelance Data Labelers must pass qualifying exams, undergo training, and be evaluated regularly in order to perform tasks such as drawing bounding boxes on images or videos, classifying the sentiment expressed in social media posts, reviewing sexually explicit video clips in some cases, sorting bank transactions, or evaluating chatbot responses.
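
To make the output of such tasks concrete, here is a minimal, hypothetical sketch of what a single bounding-box annotation might look like once serialized, loosely inspired by COCO-style formats. The field names and values are illustrative assumptions, not a schema used by any of the companies mentioned above:

```python
import json

# Hypothetical annotation record for one bounding box drawn by a Data Labeler.
# Field names are illustrative; real platforms define their own schemas.
annotation = {
    "image_id": "img_00421",
    "annotator_id": "labeler_107",   # pseudonymous worker identifier
    "label": "pedestrian",           # class chosen from a predefined taxonomy
    "bbox": [412, 163, 58, 120],     # [x, y, width, height] in pixels
    "confidence": "certain",         # self-reported certainty, if requested
    "time_spent_seconds": 14,        # often tracked for pay and quality control
}

# Annotations are typically exported in bulk as JSON for model training.
print(json.dumps(annotation, indent=2))
```

Behind each such record is a worker who had to understand the taxonomy, the instructions, and the edge cases before drawing a single box.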

Challenges related to job stability and pay for freelance Data Labelers

The salary scale for Data Labelers varies considerably depending on the location of the workers and the task assigned to them, ranging from $1 per hour in Kenya to $25 per hour or more in the United States. Some tasks that require functional or specialized knowledge, informed judgment, and/or a significant amount of work can be paid up to $300 per micro-task.

Moreover, this work is generally unstable and ignores labor law: if a Data Labeler is absent for a day to see a doctor, or suffers a power or Internet outage, they are immediately replaced by the crowdsourcing system. Nor is there any tolerance in this system for moments of fatigue or temporary dips in performance: a few mistakes too many and the Data Labeler's contract is over!

By treating Data Labeling as a simple task accessible to anyone, companies seek to drastically reduce costs, to the point of negotiating indecent hourly rates. While using an offshore solution is often a good way to reduce costs, make no mistake: it is not possible to obtain a quality service that also respects fundamental human rights for less than €5 per hour (which is already very low!) per Data Labeler, whether they are located in India, the Philippines or Madagascar.

Unfortunately, the system in place today is far too impersonal: in order to protect their customers' trade secrets, companies assign tasks without revealing to Data Labelers the identity of the customer, the application, or the feature concerned. Data Labelers do not know the purpose of the annotations they produce and commit to not discussing their work. The result is a loss of meaning, and poor-quality data sets... not ideal for training models!

Challenges related to the instructions given to Data Labelers and their training

The instructions for labeling tasks are often poorly documented and ambiguous. For example, a task may require annotating clothing worn by human beings, which excludes clothing on a doll or a cartoon character. But what about images of clothing reflected in a mirror? Does armour count as clothing? And snorkel masks? As Data Scientists and developers iterate on their models, the rules for annotating data become more and more complex, requiring annotators to handle a growing variety of exceptions and special cases. At the first error or oversight, Data Labelers risk losing their jobs! Very often, their customers have not made the effort to accurately document the particular or atypical cases, exceptions, or potential quality problems in the initial data set. In many cases, no discussion is possible between the client and the freelance Data Labeler, who ends up in difficulty and abandons the work, even if it means not being paid for what has already been done on the crowdsourcing platform. It's an aberration!
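
To illustrate how quickly such guidelines accumulate exceptions, here is a deliberately simplified, hypothetical sketch of a "does this count as clothing?" rule as it might look after a few iterations. The edge cases mirror the examples above; none of this comes from a real client's guideline:

```python
# Hypothetical labeling rule after several rounds of client feedback.
# Each added condition is an edge case the annotator must now remember.
def counts_as_clothing(item: dict) -> bool:
    """Decide whether an item in an image should be labeled as clothing."""
    if not item.get("worn_by_human", False):
        return False                  # v1: dolls and cartoon characters excluded
    if item.get("seen_in_mirror", False):
        return False                  # v2: mirror reflections added as an exception
    if item.get("category") == "armour":
        return False                  # v3: armour ruled out after a client question
    if item.get("category") == "snorkel_mask":
        return True                   # v4: diving gear explicitly ruled in
    return True

# Example: a snorkel mask worn by a person counts as clothing under these rules.
print(counts_as_clothing({"worn_by_human": True, "category": "snorkel_mask"}))
```

When rules like these live only in the client's head rather than in documentation, every new version is a fresh opportunity for the annotator to "fail" without ever having been told the rule.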

Challenges related to working conditions, schedules, and the uncertainty of micro-tasks to annotate data

In the world of Data Labeling, work schedules are often sporadic and unpredictable. Workers don't know when the next task is going to arrive, how long it will last, whether it will be interesting or overwhelming, or whether it will be well or poorly paid. This uncertainty, combined with the gap between their hourly wage and the earnings of their employers as reported in the press, can demoralize workers.

Many annotators cope with the stress by secretly banding together on WhatsApp to share information and ask for advice on how to find interesting tasks and avoid work they consider undesirable. They trade tips, such as using existing artificial intelligence models to do the simplest tasks for them, connecting via proxy servers to hide their location, and creating multiple accounts to protect themselves from suspension if they violate the rules set by the companies that offer them work.

The importance of the Data Labeler profession and the annotation of quality data

The development of successful AI systems depends on precisely annotated data. However, the strict financial constraints of large-scale annotation encourage companies to use the cheapest solutions on the market, choosing the lowest hourly rate, regardless of the quality of the data produced, the ethics of the AI Supply Chain, or the volume of hours that will be imposed on Data Labelers. Still, everyone would benefit from viewing data annotation less as a casual job and more as a profession in its own right.

The value of qualified Data Labelers (or annotators) is becoming even more apparent as AI professionals adopt data-centric development practices that make it possible to build effective systems with relatively few examples. With far fewer examples, selecting and annotating them appropriately is absolutely critical.

📣 Manually labeling data is often seen as a painstaking, sometimes thankless process. In reality, valuing this work is the best way to create quality data sets to train AI models. With Innovatiana, we offer expertise, a qualified workforce, and automated controls to handle data needs at scale. Talent is everywhere; opportunities are not. We want to contribute to repairing this injustice by creating jobs in Madagascar, with fair wages and ethical working conditions.


Aïcha CAMILLE JO, CEO of Innovatiana.