Respiratory Sound Database

Audio database of lung recordings with annotated breathing sounds (wheezes, crackles) from 126 patients of all ages.

Size

1,844 files (920 .wav + 920 .txt annotation files, plus diagnostics and metadata)

Licence

Free to use for research purposes

Description

The Respiratory Sound Database is an audio dataset for the analysis of respiratory disorders. It contains 920 recordings taken from 126 patients with digital stethoscopes. Each file is manually annotated to indicate the presence of abnormal breathing sounds such as crackles and wheezes. Recordings last between 10 and 90 seconds, for a total of 5.5 hours of audio.

What is this dataset for?

Train models to detect respiratory pathologies such as asthma or COPD.
Automate the annotation of medical sounds for assisted diagnostics.
Create mobile or embedded screening applications (e.g. smart stethoscopes).

Can it be enriched or improved?

Yes. The dataset can be extended with new recordings from other populations or clinical conditions. Additional annotations (e.g. lung location, sound intensity) would make it more useful. It can also be used to generate enriched spectrograms or be cross-referenced with clinical data.
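As one possible illustration of cross-referencing recordings with clinical data, the sketch below attaches a patient-level diagnosis to each recording. It assumes that each filename starts with a patient ID followed by an underscore, and that diagnoses are available as two-column CSV text (patient ID, diagnosis). Both layouts, the helper names, and the sample values are assumptions made for illustration and should be checked against the downloaded files.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable, Optional


def load_diagnoses(csv_text: str) -> Dict[str, str]:
    """Map patient ID -> diagnosis from two-column CSV text (assumed layout)."""
    reader = csv.reader(io.StringIO(csv_text))
    return {row[0].strip(): row[1].strip() for row in reader if len(row) >= 2}


def patient_id(filename: str) -> str:
    """Assumption: the patient ID is the filename prefix before the first underscore."""
    return filename.split("_", 1)[0]


def label_recordings(filenames: Iterable[str],
                     diagnoses: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Attach a diagnosis to each recording; unknown patients map to None."""
    return {name: diagnoses.get(patient_id(name)) for name in filenames}


# Made-up sample values, not taken from the dataset.
diagnoses = load_diagnoses("101,URTI\n102,Healthy\n103,Asthma\n")
files = ["101_rec_a.wav", "101_rec_b.wav", "103_rec_a.wav", "999_rec_a.wav"]

labeled = label_recordings(files, diagnoses)
counts = Counter(d for d in labeled.values() if d is not None)
print(counts)
```

Keeping the patient ID alongside each recording also makes it straightforward to split training and test data by patient rather than by file.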

🔎 In summary

Criterion	Evaluation
🧩Ease of use	⭐⭐⭐☆☆ (Files organized, but requires audio preprocessing)
🧼Need for cleaning	⭐⭐⭐⭐☆ (Low – well-structured data)
🏷️Annotation richness	⭐⭐⭐⭐☆ (Presence of annotations: cycles/crackles/wheezes)
📜Commercial license	⚠️ Not confirmed: the licence stated above covers research use only
👨‍💻Beginner-friendly	👨‍🎓 Yes, if guidance is provided for audio processing
🔁Reusable for fine-tuning	🔥 Very suitable for audio classifier models
🌍Cultural diversity	🌍 Samples from European patients (Portugal, Greece)

🧠 Recommended for

AI bio-acoustics projects
AI-assisted diagnosis
Medical research

🔧 Compatible tools

Librosa
PyTorch
TensorFlow
Audacity
Kaggle Notebooks

💡 Tip

Converting the audio files into spectrograms makes it easier to train CNN models than working directly on raw waveforms.
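A minimal sketch of this conversion, using only NumPy so that it stays self-contained: it computes a log-magnitude spectrogram with a short-time Fourier transform and runs it on a synthetic tone standing in for a recording. The function name `log_spectrogram`, the frame size, the hop length, and the 4 kHz sample rate are illustrative assumptions. In practice you would more likely load a .wav file with `librosa.load` and compute a mel spectrogram with `librosa.feature.melspectrogram`.

```python
import numpy as np


def log_spectrogram(signal: np.ndarray, n_fft: int = 512, hop: int = 256) -> np.ndarray:
    """Return a log-magnitude spectrogram of shape (n_fft // 2 + 1, n_frames)."""
    if len(signal) < n_fft:
        raise ValueError("signal is shorter than one analysis frame")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([signal[i * hop: i * hop + n_fft] * window
                       for i in range(n_frames)])
    # One FFT per frame; keep only the non-negative frequencies.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # log1p compresses the dynamic range; transpose to (frequency, time).
    return np.log1p(magnitude).T


# Synthetic stand-in for a recording: 2 s of a 400 Hz tone at an assumed 4 kHz rate.
sample_rate = 4000
t = np.arange(2 * sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 400 * t)

spec = log_spectrogram(tone)
print(spec.shape)  # (frequency bins, time frames)
```

The resulting 2-D array can be treated like a single-channel image and fed to a CNN, typically after padding or cropping so that every example has the same number of time frames.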

Frequently Asked Questions

Does this dataset contain annotated examples of specific pathologies?

Yes, each recording comes with annotations marking abnormal sounds such as crackles and wheezes, which are useful for classifying pathologies.

Can the files be used for supervised learning?

Yes, each audio file is accompanied by a text annotation file, so supervised models can be trained on the audio signals.
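As a rough illustration of how the annotation files could feed a supervised pipeline, the sketch below parses annotation text into labeled breathing cycles. It assumes each line holds four whitespace-separated fields (cycle start in seconds, cycle end in seconds, a crackles flag, a wheezes flag). This layout, the names `BreathCycle` and `parse_annotations`, and the sample values are assumptions for illustration, so check them against the files you download.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BreathCycle:
    start: float     # cycle start, in seconds
    end: float       # cycle end, in seconds
    crackles: bool
    wheezes: bool

    @property
    def label(self) -> str:
        """Collapse the two flags into one of four class names."""
        if self.crackles and self.wheezes:
            return "both"
        if self.crackles:
            return "crackles"
        if self.wheezes:
            return "wheezes"
        return "normal"


def parse_annotations(text: str) -> List[BreathCycle]:
    """Parse annotation text (assumed layout: start end crackles wheezes)."""
    cycles = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        start, end, crackles, wheezes = fields
        cycles.append(BreathCycle(float(start), float(end),
                                  crackles == "1", wheezes == "1"))
    return cycles


# Made-up sample content, not taken from the dataset.
sample = "0.036\t0.579\t0\t0\n0.579\t2.45\t1\t0\n2.45\t3.893\t0\t1\n"
labels = [cycle.label for cycle in parse_annotations(sample)]
print(labels)
```

Each parsed cycle gives a time range that can be cut out of the matching audio file and paired with its label as one training example.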

Can this dataset be used for a mobile respiratory detection application?

Yes. With models adapted and inference optimized for the target device, this dataset is well suited to prototyping mobile or connected apps.

Similar datasets

MIMIC-III

FineWeb-edu

MNIST