Optimizing MFCC parameters for the automatic detection of respiratory diseases

Applied Acoustics, Volume 228, Article 110299 | IF 3.4 | Zone 2, Physics and Astrophysics | Q1 Acoustics | Pub Date: 2025-01-15 | Epub Date: 2024-09-20 | DOI: 10.1016/j.apacoust.2024.110299

Yuyang Yan, Sami O. Simons, Loes van Bemmel, Lauren G. Reinders, Frits M.E. Franssen, Visara Urovi


Citations: 0

Abstract



Voice signals originating from the respiratory tract serve as valuable acoustic biomarkers for diagnosing and assessing respiratory diseases. Among acoustic features, Mel Frequency Cepstral Coefficients (MFCC) are widely used for automatic analysis, yet MFCC extraction commonly relies on default parameters. No comprehensive study has systematically investigated how MFCC extraction parameters affect respiratory disease diagnosis. This study addresses that gap by examining the effects of three key parameters (the number of coefficients, the frame length, and the hop length between frames) on the detection of respiratory conditions. The investigation uses four datasets: the Cambridge COVID-19 Sound database, the Coswara dataset, the Saarbrücken Voice Disorders (SVD) database, and the TACTICAS dataset. A Support Vector Machine (SVM) is employed as the classifier, given its widespread adoption and efficacy. The findings indicate that classification accuracy with MFCC features decreases as the hop length increases, and that the optimal number of coefficients is approximately 30. The effect of frame length differs across datasets: for the two COVID-19 datasets (the Cambridge COVID-19 Sound database and the Coswara dataset), performance declines with longer frames, while for the SVD dataset performance improves as the frame length increases from 50 ms to 500 ms. The study further investigates the optimized combination of these parameters and observes substantial gains in accuracy. With the optimal combination, the SVM model achieves accuracies of 81.1%, 80.6%, and 71.7% on the Cambridge COVID-19 Sound database, the Coswara dataset, and the SVD dataset, respectively, which are improvements of 19.6%, 16.10%, and 14.90% over the worst combination. To test whether these findings generalize, a Long Short-Term Memory (LSTM) model is used as a validation model. The LSTM model also shows accuracy improvements of 14.12%, 10.10%, and 6.68% across the datasets when using the optimal parameter combination. The optimal parameters are additionally validated on an external voice pathology dataset (the TACTICAS dataset). The results demonstrate that the optimized parameters generalize across pathologies, machine-learning models, and languages.
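To make the three parameters concrete, the following is a minimal sketch of how frame length and hop length determine the number of MFCC frames, and of how a grid of parameter combinations could be enumerated and ranked. It is not the authors' code. The names `MfccConfig`, `ms_to_samples`, `frame_count`, `parameter_grid`, and `best_and_worst` are hypothetical, as are the 16 kHz sampling rate, the no-padding framing rule, and the grid values in the usage section. The `score` callback stands in for whatever train-and-evaluate step (for example an SVM) a reader would supply.

```python
from itertools import product
from typing import Callable, Iterable, List, NamedTuple, Tuple


class MfccConfig(NamedTuple):
    """One candidate setting of the three extraction parameters."""
    n_mfcc: int      # number of cepstral coefficients kept per frame
    frame_ms: float  # analysis frame (window) length in milliseconds
    hop_ms: float    # step between consecutive frame starts in milliseconds


def ms_to_samples(duration_ms: float, sample_rate: int) -> int:
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(round(duration_ms * sample_rate / 1000.0))


def frame_count(n_samples: int, cfg: MfccConfig, sample_rate: int) -> int:
    """Number of full frames in the signal, assuming no padding at the edges."""
    frame_len = ms_to_samples(cfg.frame_ms, sample_rate)
    hop_len = ms_to_samples(cfg.hop_ms, sample_rate)
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame and hop lengths must be at least one sample")
    if n_samples < frame_len:
        return 0
    return 1 + (n_samples - frame_len) // hop_len


def parameter_grid(
    n_mfccs: Iterable[int],
    frame_lengths_ms: Iterable[float],
    hop_lengths_ms: Iterable[float],
) -> List[MfccConfig]:
    """Enumerate every combination of the three parameters."""
    return [
        MfccConfig(n, f, h)
        for n, f, h in product(n_mfccs, frame_lengths_ms, hop_lengths_ms)
    ]


def best_and_worst(
    grid: Iterable[MfccConfig],
    score: Callable[[MfccConfig], float],
) -> Tuple[Tuple[float, MfccConfig], Tuple[float, MfccConfig]]:
    """Score every configuration and return the (score, config) extremes."""
    scored = [(score(cfg), cfg) for cfg in grid]
    if not scored:
        raise ValueError("the parameter grid is empty")
    best = max(scored, key=lambda pair: pair[0])
    worst = min(scored, key=lambda pair: pair[0])
    return best, worst


if __name__ == "__main__":
    sample_rate = 16000          # assumed sampling rate, not from the paper
    n_samples = 2 * sample_rate  # a hypothetical 2-second recording
    # Illustrative grid values only; the paper's actual grid is not given here.
    for cfg in parameter_grid([13, 30], [50, 500], [10, 25]):
        shape = (cfg.n_mfcc, frame_count(n_samples, cfg, sample_rate))
        print(cfg, "-> feature matrix shape", shape)
```

Under these assumptions, a 2-second clip at 16 kHz with a 50 ms frame yields 196 frames at a 10 ms hop but only 79 frames at a 25 ms hop, so a longer hop gives the classifier a coarser time resolution to work with. Passing a real evaluation function as `score` to `best_and_worst` would give the kind of best-versus-worst comparison the abstract reports.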


Source journal

Applied Acoustics (Physics, Acoustics)

CiteScore: 7.40

Self-citation rate: 11.80%

Articles published: 618

Review time: 7.5 months

About the journal: Since its launch in 1968, Applied Acoustics has published high-quality research papers that give engineers and scientists state-of-the-art coverage of applications of acoustics in the widest sense. The journal looks not only at recent developments in the understanding of acoustics but also at ways of exploiting that understanding. It aims to encourage the exchange of practical experience through publication, building a fund of technological information that can be used to solve related problems. Presenting information in graphical or tabular form is especially encouraged. Where a mathematical development is a necessary part of a paper, it should appear only as an integral part of a practical solution to a problem and be supported by data. The journal publishes Complete Papers, Short Technical Notes, and Review Articles. Manuscripts addressing all fields of application of acoustics, ranging from medicine and NDT to the environment and buildings, are welcome.