CASE STUDY
Harnessing the wealth of audio data through accurate multimodal annotation

+500 hours
of annotated and transcribed audio
+30
labels applied to multimodal data
100%
correspondence between audio segments and their transcriptions
In customer support, healthcare, and behavioral analysis, exploiting audio data is critical for training models that can detect intentions, emotions, or entities in human speech.
The mission
Create a rich, structured dataset from raw audio files, including:
- Fine-grained segmentation of the audio into relevant, timestamped chunks;
- Manual transcription of each segment, with correction of speech recognition errors;
- Annotation with more than 30 content-related labels (themes, intentions, emotions, entities, interruptions...);
- Multimodal links between the transcript and the corresponding audio portions (illustrated in the sketch after this list).
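For illustration only, here is a minimal sketch of what one record in such a dataset could look like, assuming a simple Python representation; the field names, file name, and label values are hypothetical, not the project's actual schema:

```python
from dataclasses import dataclass, field

# Hypothetical record structure for one annotated segment. Field names and
# label values are illustrative assumptions, not the actual project schema.
@dataclass
class AudioSegment:
    audio_file: str                 # source audio file
    start_s: float                  # chunk start timestamp, in seconds
    end_s: float                    # chunk end timestamp, in seconds
    transcript: str                 # manually corrected transcription
    labels: list[str] = field(default_factory=list)  # content labels

# Example: one timestamped chunk linked to its transcript and labels,
# giving the audio/text correspondence described above.
segment = AudioSegment(
    audio_file="call_0042.wav",
    start_s=12.4,
    end_s=17.9,
    transcript="I'd like to cancel my subscription, please.",
    labels=["intent:cancellation", "emotion:neutral", "entity:subscription"],
)
```

Storing the timestamps alongside the corrected transcript is what lets a single record serve both speech-to-text training (audio plus text) and content classification (text plus labels).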
Innovatiana mobilized a dedicated team of audio annotation and NLP experts and set up a tooled process that ensured both a high level of precision and full traceability of the annotations.
The results
- A dataset structured for training speech-to-text, classification, or intent detection models;
- An aligned multimodal ground truth base for exploiting both the audio signal and its linguistic interpretation;
- A significant reduction in the time required for human validation thanks to the initial quality of the annotations.