Respiratory Sound Database
Audio database of lung recordings with annotated breathing sounds (wheezes, crackles) from 126 patients of all ages.
1844 files (920 .wav + 920 .txt), with annotations, diagnoses, and metadata
Free to use for research purposes
Description
The Respiratory Sound Database is an audio database for the analysis of respiratory disorders. It contains 920 recordings taken from 126 patients using digital stethoscopes. Each file is manually annotated to indicate the presence of abnormal breathing sounds such as crackles and wheezes. Recordings last between 10 and 90 seconds, for a total of 5.5 hours of data.
What is this dataset for?
- Train models to detect respiratory pathologies such as asthma or COPD.
- Automate the annotation of medical sounds for assisted diagnosis.
- Create mobile or embedded screening applications (e.g. smart stethoscopes).
Can it be enriched or improved?
Yes. The dataset can be extended with new recordings from other populations or clinical conditions. Additional annotations (e.g. lung location, sound intensity) would make it more useful. It can also be used to generate enriched spectrograms or be cross-referenced with clinical data.
🔎 In summary
🧠 Recommended for
- AI bio-acoustics projects
- AI-assisted diagnosis
- Medical research
🔧 Compatible tools
- Librosa
- PyTorch
- TensorFlow
- Audacity
- Kaggle Notebooks
💡 Tip
Converting audio files into spectrograms makes CNN models easier to train than feeding them raw waveforms.
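As a rough illustration of this tip, here is a minimal sketch of a log-magnitude spectrogram computed with NumPy only. The function name `log_spectrogram`, the window size, the hop length, and the 4000 Hz sample rate are all illustrative assumptions, and the input is a synthetic tone standing in for a real recording. In practice a library such as Librosa (for example its `librosa.feature.melspectrogram` function) would likely be a more convenient choice.

```python
import numpy as np


def log_spectrogram(signal, sample_rate, n_fft=512, hop=128):
    """Return (freqs, spec_db) for a 1-D signal.

    spec_db has shape (n_fft // 2 + 1, n_frames) and holds the
    short-time Fourier magnitude in decibels.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size < n_fft:
        # Pad very short signals so that at least one frame exists.
        signal = np.pad(signal, (0, n_fft - signal.size))

    window = np.hanning(n_fft)
    n_frames = 1 + (signal.size - n_fft) // hop

    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )

    # One-sided FFT per frame, then transpose to (frequency, time).
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    spec_db = 20.0 * np.log10(magnitude + 1e-10)  # small offset avoids log(0)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    return freqs, spec_db


# Synthetic stand-in for a recording: a 2-second, 400 Hz tone.
# The sample rate is an assumption, not a property of the dataset.
sample_rate = 4000
t = np.arange(2 * sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 400.0 * t)

freqs, spec_db = log_spectrogram(tone, sample_rate)
print(spec_db.shape)
```

The resulting 2-D array can be treated like a single-channel image and passed to a CNN, which is the main reason spectrograms tend to be easier to work with than one-dimensional waveforms.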
Frequently Asked Questions
Does this dataset contain annotated examples of specific pathologies?
Yes, each recording comes with annotations marking abnormal sounds such as crackles and wheezes, which are useful for classifying pathologies.
Can the files be used for supervised learning?
Yes, each audio file is accompanied by an annotation text file, which allows supervised models to be trained on the audio signals.
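To make this concrete, here is a minimal sketch of turning one annotation file into training labels. It assumes each line of a .txt file holds four whitespace-separated fields (cycle start in seconds, cycle end in seconds, a 0/1 crackle flag, a 0/1 wheeze flag); this layout is an assumption and should be checked against the dataset's own documentation. The names `Cycle` and `parse_annotations` are hypothetical, and the sample values are made up.

```python
from dataclasses import dataclass


@dataclass
class Cycle:
    """One annotated breathing cycle (field layout is an assumption)."""
    start: float
    end: float
    crackles: bool
    wheezes: bool

    @property
    def label(self):
        # Collapse the two flags into a single class name.
        if self.crackles and self.wheezes:
            return "both"
        if self.crackles:
            return "crackle"
        if self.wheezes:
            return "wheeze"
        return "normal"


def parse_annotations(text):
    """Parse annotation text into a list of Cycle objects."""
    cycles = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        cycles.append(
            Cycle(
                start=float(fields[0]),
                end=float(fields[1]),
                crackles=fields[2] == "1",
                wheezes=fields[3] == "1",
            )
        )
    return cycles


# Made-up annotation content, for illustration only.
sample = "0.036\t0.579\t0\t0\n0.579\t2.45\t0\t1\n2.45\t3.893\t1\t0\n"

cycles = parse_annotations(sample)
labels = [c.label for c in cycles]
print(labels)
```

Each cycle's start and end times can then be used to cut the matching segment out of the .wav file, giving one labeled example per breathing cycle.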
Can this dataset be used for a mobile respiratory detection application?
Yes. With models adapted and inference optimized for the target device, this dataset is well suited to prototyping mobile or connected apps.