Medical Speech Transcription and Intent Dataset
A multimodal dataset of more than 8 hours of spoken statements about common medical symptoms, paired with their text transcripts and well suited to training medical speech recognition systems.
Over 8 hours of audio in WAV files, with associated transcripts in CSV and text format.
License available via Figure Eight (Appen); use is subject to conditions (see description)
Description
The Medical Speech Transcription and Intent dataset contains several thousand audio clips describing common medical symptoms, along with their text transcripts. It was collected via a collaborative platform and contains natural variations in pronunciation and recording quality.
What is this dataset for?
- Training medical speech recognition models
- Detecting intents and symptoms expressed in speech
- Building voice assistants specialized in health
Can it be enriched or improved?
The dataset needs label cleaning and audio quality control before use. It can be enriched with additional annotations such as speaker identity, background-noise tags, or fine-grained segmentation.
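As one possible starting point for the audio quality control mentioned above, the sketch below flags clips that are very short or nearly silent. It uses only the Python standard library and assumes 16-bit PCM WAV files. The function names `wav_stats` and `flag_clip` and the thresholds `min_seconds` and `min_rms` are hypothetical choices for illustration, not values documented for this dataset.

```python
import array
import math
import wave


def wav_stats(path):
    """Return (duration_seconds, rms) for a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM audio")
        n_frames = wf.getnframes()
        rate = wf.getframerate()
        # Samples from all channels are interleaved in one flat array.
        samples = array.array("h", wf.readframes(n_frames))
    duration = n_frames / rate
    if not samples:
        return duration, 0.0
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return duration, rms


def flag_clip(path, min_seconds=0.5, min_rms=100.0):
    """Return a list of reasons why a clip looks suspicious (empty if none).

    The default thresholds are assumptions and should be tuned by
    listening to a sample of flagged and unflagged clips.
    """
    duration, rms = wav_stats(path)
    reasons = []
    if duration < min_seconds:
        reasons.append("too_short")
    if rms < min_rms:
        reasons.append("near_silent")
    return reasons
```

Calling `flag_clip` on each WAV file gives a first-pass list for manual review. A simple energy check like this will not catch background noise or clipping, so it complements listening checks rather than replacing them.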
🔎 In summary
🧠 Recommended for
- Researchers in medical ASR
- Health voice assistant developers
- NLP engineers
🔧 Compatible tools
- Kaldi
- ESPnet
- Hugging Face Transformers
- Librosa
💡 Tip
Clean the labels thoroughly before training to improve model performance.
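A minimal sketch of what such label cleaning could look like is shown below. It normalizes whitespace and case, drops rows with a missing transcript or label, and removes repeated file names. The column names `file_name`, `phrase`, and `prompt` are assumptions made for illustration and should be replaced with the actual headers of the CSV file. The inline sample data is invented for the demonstration.

```python
import csv
import io
import re


def normalize_text(text):
    """Lower-case, trim, and collapse internal whitespace."""
    return re.sub(r"\s+", " ", text).strip().lower()


def clean_rows(rows, text_col="phrase", label_col="prompt", file_col="file_name"):
    """Return cleaned rows; column names are hypothetical defaults."""
    seen = set()
    cleaned = []
    for row in rows:
        text = normalize_text(row.get(text_col) or "")
        label = normalize_text(row.get(label_col) or "")
        name = (row.get(file_col) or "").strip()
        # Skip incomplete rows and any file name that was already kept.
        if not text or not label or not name or name in seen:
            continue
        seen.add(name)
        cleaned.append({file_col: name, text_col: text, label_col: label})
    return cleaned


# Invented sample standing in for the real transcript CSV.
sample = io.StringIO(
    "file_name,phrase,prompt\n"
    "a.wav,  My head  hurts ,Headache\n"
    "a.wav,My head hurts,headache\n"
    "b.wav,,Cough\n"
    "c.wav,I keep coughing,  cough \n"
)
rows = clean_rows(csv.DictReader(sample))
for row in rows:
    print(row)
```

Lower-casing transcripts is a common choice for speech recognition targets, but it may not suit every model. Keeping the original text in a separate column is a safer option if casing or punctuation matters downstream.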
Frequently Asked Questions
Does this dataset include intent annotations for medical statements?
Yes, each statement is associated with an intent tied to a specific medical symptom.
What is the audio quality of the files included?
Audio quality varies; some files are of poor quality and need cleaning.
Can this dataset be used to train a general speech recognition model?
It is oriented specifically toward the medical field, but it can serve as a basis for specialized training.




