Medical Speech Transcription and Intent Dataset

Multimodal dataset of more than 8 hours of spoken statements about common medical symptoms, paired with their text transcripts. It is well suited to training medical speech recognition systems.

Size

Over 8 hours of audio in WAV files, with associated transcripts in CSV and text format.
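
As a minimal sketch of how the transcripts might be paired with the audio, the snippet below reads a transcript CSV and maps each row to a WAV path. The column names `file_name`, `phrase`, and `prompt`, the `audio` directory, and the sample rows are assumptions made for illustration; check the header of the downloaded CSV before relying on them.

```python
import csv
import io
from pathlib import PurePosixPath


def load_manifest(csv_file, audio_dir="audio"):
    """Pair each transcript row with the path of its WAV file.

    `csv_file` is any open text file (or file-like object).
    The column names used here are assumed, not taken from the dataset.
    """
    manifest = []
    for row in csv.DictReader(csv_file):
        manifest.append({
            "wav_path": str(PurePosixPath(audio_dir) / row["file_name"]),
            "transcript": row["phrase"],
            "intent": row["prompt"],
        })
    return manifest


# Tiny in-memory stand-in for the real CSV (made-up rows).
sample = io.StringIO(
    "file_name,phrase,prompt\n"
    "clip_001.wav,My knee hurts when I walk,Knee pain\n"
    "clip_002.wav,I have a headache,Head ache\n"
)
rows = load_manifest(sample)
print(rows[0]["wav_path"], "|", rows[0]["intent"])
```

With the real files, the same function could be called on `open("transcripts.csv", newline="", encoding="utf-8")`, where the file name is again a placeholder.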

License

License available via Figure Eight (Appen); use is subject to conditions (see description)

Description

The Medical Speech Transcription and Intent dataset contains several thousand audio excerpts in which speakers describe common medical symptoms, along with their text transcripts. It was collected via a collaborative platform, so the recordings vary naturally in pronunciation and audio quality.

What is this dataset for?

  • Training medical speech recognition models
  • Detecting intents and symptoms expressed orally
  • Building voice assistants specialized in health

Can it be enriched or improved?

The labels need cleaning and the audio files need quality control before use. The dataset can also be enriched with additional annotations such as speaker identification, background noise labels, or fine-grained segmentation.
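
One possible first pass at audio quality control is to flag clips that are very short or nearly silent. The sketch below uses only the Python standard library and assumes 16-bit PCM WAV files; the thresholds (`min_duration_s`, `min_rms`) and the helper names are illustrative assumptions, not values recommended by the dataset authors.

```python
import io
import math
import struct
import wave


def inspect_wav(wav_file):
    """Return duration in seconds and RMS level (0..1) of a 16-bit PCM WAV."""
    with wave.open(wav_file, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_frames = wf.getnframes()
        rate = wf.getframerate()
        raw = wf.readframes(n_frames)
    # WAV samples are little-endian signed 16-bit integers.
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    rms = 0.0
    if samples:
        rms = math.sqrt(sum(s * s for s in samples) / len(samples)) / 32768
    return {"duration_s": n_frames / rate, "rms": rms}


def flag_clip(stats, min_duration_s=0.5, min_rms=0.01):
    """List the reasons a clip looks suspicious (thresholds are arbitrary)."""
    reasons = []
    if stats["duration_s"] < min_duration_s:
        reasons.append("too short")
    if stats["rms"] < min_rms:
        reasons.append("too quiet")
    return reasons


def make_wav(samples, rate=16000):
    """Build a mono 16-bit WAV in memory, used here only for the demo."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(struct.pack("<%dh" % len(samples), *samples))
    buf.seek(0)
    return buf


silent = inspect_wav(make_wav([0] * 16000))    # one second of silence
short = inspect_wav(make_wav([10000] * 1600))  # 0.1 s of loud signal
print(flag_clip(silent), flag_clip(short))
```

Flagged clips are candidates for manual listening rather than automatic deletion, since a quiet recording may still be intelligible.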

🔎 In summary

Criterion: Evaluation
🧩 Ease of use: ⭐⭐⭐✩✩ (Requires audio cleaning and preprocessing)
🧼 Need for cleaning: ⭐⭐✩✩✩ (Significant: variable quality, labels sometimes incorrect)
🏷️ Annotation richness: ⭐⭐⭐✩✩ (Medium: transcriptions and intents, little advanced metadata)
📜 Commercial license: ⚖️ Use under conditions (Figure Eight/Appen)
👨‍💻 Beginner friendly: ⚠️ Medium, better with audio experience
🔁 Fine-tuning ready: 🎯 Yes, for ASR and medical NLP
🌍 Cultural diversity: ⚠️ Not specified, probably limited

🧠 Recommended for

  • Researchers in medical ASR
  • Health voice assistant developers
  • NLP engineers

🔧 Compatible tools

  • Kaldi
  • ESPnet
  • Hugging Face Transformers
  • Librosa
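
Kaldi, and ESPnet recipes that follow Kaldi's conventions, expect the data as plain-text index files such as `wav.scp` (utterance ID and audio path) and `text` (utterance ID and transcript), sorted by utterance ID. The following is a rough sketch of producing those lines; the utterance IDs, paths, and transcripts are made up, and the `utt2spk` and `spk2utt` files that a complete Kaldi data directory also needs are left out because this page does not describe speaker information.

```python
def kaldi_lines(rows):
    """Build `wav.scp` and `text` lines from (utt_id, wav_path, transcript) tuples.

    Lines are sorted by utterance ID, as Kaldi expects.
    """
    wav_scp, text = [], []
    for utt_id, wav_path, transcript in sorted(rows):
        wav_scp.append(f"{utt_id} {wav_path}")
        # Collapse runs of whitespace so each entry stays on one clean line.
        text.append(f"{utt_id} {' '.join(transcript.split())}")
    return wav_scp, text


# Hypothetical rows; real IDs and paths come from the downloaded files.
example_rows = [
    ("utt2", "audio/b.wav", "I  have a headache"),
    ("utt1", "audio/a.wav", "my knee hurts"),
]
wav_scp, text = kaldi_lines(example_rows)
print("\n".join(wav_scp))
print("\n".join(text))
```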

💡 Tip

Clean the labels thoroughly before training; incorrect labels will degrade model performance.
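
A label-cleaning pass could look something like the sketch below, which normalizes the spelling of intent labels and sets aside rows with an empty transcript or an unrecognized intent. The field names (`intent`, `transcript`), the sample rows, and the list of known intents are all assumptions for illustration.

```python
import re


def normalize_label(label):
    """Lowercase, trim, and collapse whitespace, underscores, and hyphens."""
    return re.sub(r"[\s_\-]+", " ", label.strip().lower()).strip()


def clean_rows(rows, known_intents):
    """Split rows into (kept, rejected) after normalizing their fields."""
    kept, rejected = [], []
    for row in rows:
        intent = normalize_label(row.get("intent", ""))
        transcript = " ".join(row.get("transcript", "").split())
        if not transcript or intent not in known_intents:
            rejected.append(row)
        else:
            kept.append({**row, "intent": intent, "transcript": transcript})
    return kept, rejected


# Made-up rows showing the kinds of inconsistency this pass catches.
raw_rows = [
    {"intent": "  Knee_Pain ", "transcript": "my  knee hurts"},
    {"intent": "knee pain", "transcript": ""},
    {"intent": "unknwon", "transcript": "I feel dizzy"},
]
kept, rejected = clean_rows(raw_rows, known_intents={"knee pain", "headache"})
print(len(kept), "kept,", len(rejected), "rejected")
```

This only catches formatting inconsistencies and labels outside the expected set. A statement tagged with a valid but wrong intent still has to be found by manual review or by comparing the label against the transcript.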

Frequently Asked Questions

Does this dataset include intent annotations for medical statements?

Yes, each statement is associated with an intent related to a specific medical symptom.

What is the audio quality of the files included?

Audio quality varies; some files are of poor quality and require cleaning.

Can this dataset be used to train a general speech recognition model?

It is oriented specifically toward the medical field, so it is better used as a basis for specialized training than as the sole training set for a general-purpose model.
