Optimizing MFCC parameters for the automatic detection of respiratory diseases

Applied Acoustics (IF 3.4; Zone 2, Physics and Astrophysics; JCR Q1, Acoustics) | Pub Date: 2024-09-20 | DOI: 10.1016/j.apacoust.2024.110299
Citations: 0

Abstract

Optimizing MFCC parameters for the automatic detection of respiratory diseases

Voice signals originating from the respiratory tract are valuable acoustic biomarkers for the diagnosis and assessment of respiratory diseases. Among the acoustic features in use, Mel Frequency Cepstral Coefficients (MFCC) are widely employed for automatic analysis, and they are commonly extracted with default parameters. However, no comprehensive study has systematically investigated the impact of MFCC extraction parameters on respiratory disease diagnosis. In this study, we address this gap by examining the effects of three key parameters, namely the number of coefficients, the frame length, and the hop length between frames, on respiratory condition examination. Our investigation uses four datasets: the Cambridge COVID-19 Sound database, the Coswara dataset, the Saarbrücken Voice Disorders (SVD) database, and the TACTICAS dataset. A Support Vector Machine (SVM) is employed as the classifier, given its widespread adoption and efficacy. Our findings indicate that classification accuracy with MFCC features decreases as hop length increases, and that the optimal number of coefficients is approximately 30. The effect of frame length differs across datasets: for the COVID-19 datasets (the Cambridge COVID-19 Sound database and the Coswara dataset), performance declines with longer frame lengths, while for the SVD dataset, performance improves as frame length increases from 50 ms to 500 ms. Furthermore, we investigate the optimized combination of these parameters and observe substantial gains in accuracy. With the optimal combination, the SVM model achieves accuracies of 81.1%, 80.6%, and 71.7% on the Cambridge COVID-19 Sound database, the Coswara dataset, and the SVD dataset, respectively, which are improvements of 19.6%, 16.10%, and 14.90% over the worst combination. To test whether these findings generalize, we employ a Long Short-Term Memory (LSTM) model as a validation model. The LSTM model also shows accuracy improvements of 14.12%, 10.10%, and 6.68% across the datasets when using the optimal combination of parameters. The optimal parameters are further validated on an external voice pathology dataset (the TACTICAS dataset). The results demonstrate that the optimized parameters generalize across pathologies, machine-learning models, and languages.
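
To make the three parameters concrete, the following is a minimal sketch (not the authors' code) of how frame length and hop length translate into the number of MFCC vectors per recording, and how a grid of parameter combinations might be enumerated. The sample rate, the grid values, and the helper names `ms_to_samples`, `num_frames`, and `parameter_grid` are assumptions made for illustration; the abstract only states an optimum of about 30 coefficients and a 50 ms to 500 ms frame-length range for the SVD dataset.

```python
from itertools import product


def ms_to_samples(duration_ms: float, sample_rate: int) -> int:
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(round(duration_ms * sample_rate / 1000))


def num_frames(n_samples: int, frame_len: int, hop_len: int) -> int:
    """Number of full frames that fit in the signal when no padding is used."""
    if n_samples < frame_len:
        return 0
    return 1 + (n_samples - frame_len) // hop_len


def parameter_grid(n_coeffs, frame_ms, hop_ms):
    """Yield every (n_coeff, frame_ms, hop_ms) combination with hop <= frame."""
    for n, f, h in product(n_coeffs, frame_ms, hop_ms):
        # Assumption: skip hops longer than the frame, which would leave
        # parts of the signal uncovered by any frame.
        if h <= f:
            yield n, f, h


# Hypothetical values for illustration only; they are not taken from the paper.
SAMPLE_RATE = 16_000
N_COEFFS = [13, 20, 30, 40]
FRAME_MS = [50, 100, 250, 500]
HOP_MS = [10, 25, 50, 100]

if __name__ == "__main__":
    n_samples = 3 * SAMPLE_RATE  # a 3-second recording
    for n, f, h in parameter_grid(N_COEFFS, FRAME_MS, HOP_MS):
        frames = num_frames(
            n_samples,
            ms_to_samples(f, SAMPLE_RATE),
            ms_to_samples(h, SAMPLE_RATE),
        )
        print(f"n_coeff={n} frame={f} ms hop={h} ms -> {frames} frames x {n} coefficients")
```

A longer hop yields fewer frames per recording, so the classifier sees fewer feature vectors per sample; this is one plausible reading of why accuracy falls as hop length grows, though the abstract itself does not give the mechanism. Each grid point would then be passed to an MFCC extractor and a classifier such as an SVM to compare accuracies.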

Source journal: Applied Acoustics (Physics, Acoustics)
CiteScore: 7.40
Self-citation rate: 11.80%
Articles published: 618
Review time: 7.5 months
About the journal: Since its launch in 1968, Applied Acoustics has been publishing high-quality research papers providing state-of-the-art coverage of research findings for engineers and scientists involved in applications of acoustics in the widest sense. Applied Acoustics looks not only at recent developments in the understanding of acoustics but also at ways of exploiting that understanding. The Journal aims to encourage the exchange of practical experience through publication and in so doing creates a fund of technological information that can be used for solving related problems. The presentation of information in graphical or tabular form is especially encouraged. If a report of a mathematical development is a necessary part of a paper, it is important to ensure that it is there only as an integral part of a practical solution to a problem and is supported by data. Applied Acoustics publishes Complete Papers, Short Technical Notes, and Review Articles. Manuscripts that address all fields of applications of acoustics, ranging from medicine and NDT to the environment and buildings, are welcome.