Medical Speech Transcription and Intent Dataset

Multimodal dataset of more than 8 hours of spoken statements about common medical symptoms, each paired with its text transcript. Well suited to training medical speech recognition systems.

Size

Over 8 hours of audio in WAV files, with associated transcripts in CSV and text format.
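
As a minimal sketch of how the CSV transcripts might be paired with the WAV files, the snippet below reads a small inline CSV and counts intent labels. The column names (file_name, phrase, prompt) and the three sample rows are assumptions made up for illustration, not taken from the dataset; check the header of the real CSV before reusing this.

```python
import csv
import io
from collections import Counter

# Made-up rows with hypothetical column names, standing in for the real CSV.
SAMPLE_CSV = """file_name,phrase,prompt
clip_0001.wav,My knee hurts when I climb stairs,Knee pain
clip_0002.wav,I have had a dry cough for days,Cough
clip_0003.wav,There is a sharp pain in my knee,Knee pain
"""


def load_records(csv_file):
    """Return one dict per row: audio file name, transcript, intent label."""
    reader = csv.DictReader(csv_file)
    return [
        {
            "audio": row["file_name"],
            "transcript": row["phrase"],
            "intent": row["prompt"],
        }
        for row in reader
    ]


records = load_records(io.StringIO(SAMPLE_CSV))

# Count how many clips carry each intent label.
intent_counts = Counter(record["intent"] for record in records)
```

With the real file, the same function could be called on an opened file handle, for example `open(path, newline="", encoding="utf-8")`, instead of the in-memory string.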

License

License available through Figure Eight (Appen); use is subject to conditions (see description).

Description

The Medical Speech Transcription and Intent dataset contains several thousand audio clips describing common medical symptoms, along with their text transcripts. It was collected via a collaborative platform, so it includes natural variation in pronunciation and recording quality.

What is this dataset for?

Training medical speech recognition models
Detecting intents and symptoms expressed orally
Building voice assistants specialized in health

Can it be enriched or improved?

The dataset needs label cleaning and quality control of the audio files. It can be enriched with additional annotations such as speaker identification, background-noise labels, or fine-grained segmentation.
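
The cleaning and quality-control steps mentioned above could start with something like the following sketch, which uses only the Python standard library. The function names, the label normalization rule, and the 1 to 30 second duration thresholds are illustrative assumptions, not properties of the dataset. A synthetic silent clip stands in for a real WAV file so the example runs on its own.

```python
import io
import wave


def normalize_label(label):
    """Trim, lowercase, and collapse repeated whitespace in a label."""
    return " ".join(label.strip().lower().split())


def wav_duration_seconds(source):
    """Duration of a WAV file given a path or a binary file object."""
    with wave.open(source, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def flag_duration(duration_s, min_s=1.0, max_s=30.0):
    """Flag clips outside an assumed plausible length for one utterance."""
    if duration_s < min_s:
        return "too_short"
    if duration_s > max_s:
        return "too_long"
    return "ok"


def make_silent_wav(seconds, rate=8000):
    """Build an in-memory mono 16-bit WAV of silence (test stand-in only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(rate)
        wav_file.writeframes(b"\x00\x00" * int(seconds * rate))
    buffer.seek(0)
    return buffer


# Synthetic two-second clip in place of a real dataset file.
clip = make_silent_wav(2.0)
duration = wav_duration_seconds(clip)
status = flag_duration(duration)
```

A duration check only catches truncated or runaway recordings; judging noise level or transcript accuracy would need further tooling or manual review.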

🔎 In summary

Criterion	Evaluation
🧩 Ease of use	⭐⭐⭐✩✩ (Requires audio cleaning and preprocessing)
🧼 Need for cleaning	⭐⭐✩✩✩ (Significant: variable quality, labels sometimes incorrect)
🏷️ Annotation richness	⭐⭐⭐✩✩ (Medium: transcriptions and intents, little advanced metadata)
📜 Commercial license	⚖️ Use under conditions (Figure Eight/Appen)
👨‍💻 Beginner friendly	⚠️ Medium, better with audio experience
🔁 Fine-tuning ready	🎯 Yes, for ASR and medical NLP
🌍 Cultural diversity	⚠️ Not specified, probably limited