COPDVD:在新收集和评估的语音数据集上对慢性阻塞性肺病进行自动分类

IF 6.1 2区 医学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Artificial Intelligence in Medicine Pub Date : 2024-08-15 DOI:10.1016/j.artmed.2024.102953
Alper Idrisoglu , Ana Luiza Dallora , Abbas Cheddad , Peter Anderberg , Andreas Jakobsson , Johan Sanmartin Berglund
{"title":"COPDVD:在新收集和评估的语音数据集上对慢性阻塞性肺病进行自动分类","authors":"Alper Idrisoglu ,&nbsp;Ana Luiza Dallora ,&nbsp;Abbas Cheddad ,&nbsp;Peter Anderberg ,&nbsp;Andreas Jakobsson ,&nbsp;Johan Sanmartin Berglund","doi":"10.1016/j.artmed.2024.102953","DOIUrl":null,"url":null,"abstract":"<div><h3>Background</h3><p>Chronic obstructive pulmonary disease (COPD) is a severe condition affecting millions worldwide, leading to numerous annual deaths. The absence of significant symptoms in its early stages promotes high underdiagnosis rates for the affected people. Besides pulmonary function failure, another harmful problem of COPD is the systemic effects, e.g., heart failure or voice distortion. However, the systemic effects of COPD might provide valuable information for early detection. In other words, symptoms caused by systemic effects could be helpful to detect the condition in its early stages.</p></div><div><h3>Objective</h3><p>The proposed study aims to explore whether the voice features extracted from the vowel “a” utterance carry any information that can be predictive of COPD by employing Machine Learning (ML) on a newly collected voice dataset.</p></div><div><h3>Methods</h3><p>Forty-eight participants were recruited from the pool of research clinic visitors at Blekinge Institute of Technology (BTH) in Sweden between January 2022 and May 2023. A dataset consisting of 1246 recordings from 48 participants was gathered. The collection of voice recordings containing the vowel “a” utterance commenced following an information and consent meeting with each participant using the <em>VoiceDiagnostic</em> application. The collected voice data was subjected to silence segment removal, feature extraction of baseline acoustic features, and Mel Frequency Cepstrum Coefficients (MFCC). Sociodemographic data was also collected from the participants. Three ML models were investigated for the binary classification of COPD and healthy controls: Random Forest (RF), Support Vector Machine (SVM), and CatBoost (CB). A nested k-fold cross-validation approach was employed. Additionally, the hyperparameters were optimized using grid-search on each ML model. For best performance assessment, accuracy, F1-score, precision, and recall metrics were computed. Afterward, we further examined the best classifier by utilizing the Area Under the Curve (AUC), Average Precision (AP), and SHapley Additive exPlanations (SHAP) feature-importance measures.</p></div><div><h3>Results</h3><p>The classifiers RF, SVM, and CB achieved a maximum accuracy of 77 %, 69 %, and 78 % on the test set and 93 %, 78 % and 97 % on the validation set, respectively. The CB classifier outperformed RF and SVM. After further investigation of the best-performing classifier, CB demonstrated the highest performance, producing an AUC of 82 % and AP of 76 %. In addition to age and gender, the mean values of baseline acoustic and MFCC features demonstrate high importance and deterministic characteristics for classification performance in both test and validation sets, though in varied order.</p></div><div><h3>Conclusion</h3><p>This study concludes that the utterance of vowel “a” recordings contain information that can be captured by the CatBoost classifier with high accuracy for the classification of COPD. Additionally, baseline acoustic and MFCC features, in conjunction with age and gender information, can be employed for classification purposes and benefit healthcare for decision support in COPD diagnosis.</p></div><div><h3>Clinical trial registration number</h3><p><span><span>NCT05897944</span><svg><path></path></svg></span>.</p></div>","PeriodicalId":55458,"journal":{"name":"Artificial Intelligence in Medicine","volume":"156 ","pages":"Article 102953"},"PeriodicalIF":6.1000,"publicationDate":"2024-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0933365724001957/pdfft?md5=91d584b7c0bf3f6fbf86fae6ca0a54d6&pid=1-s2.0-S0933365724001957-main.pdf","citationCount":"0","resultStr":"{\"title\":\"COPDVD: Automated classification of chronic obstructive pulmonary disease on a new collected and evaluated voice dataset\",\"authors\":\"Alper Idrisoglu ,&nbsp;Ana Luiza Dallora ,&nbsp;Abbas Cheddad ,&nbsp;Peter Anderberg ,&nbsp;Andreas Jakobsson ,&nbsp;Johan Sanmartin Berglund\",\"doi\":\"10.1016/j.artmed.2024.102953\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><h3>Background</h3><p>Chronic obstructive pulmonary disease (COPD) is a severe condition affecting millions worldwide, leading to numerous annual deaths. The absence of significant symptoms in its early stages promotes high underdiagnosis rates for the affected people. Besides pulmonary function failure, another harmful problem of COPD is the systemic effects, e.g., heart failure or voice distortion. However, the systemic effects of COPD might provide valuable information for early detection. In other words, symptoms caused by systemic effects could be helpful to detect the condition in its early stages.</p></div><div><h3>Objective</h3><p>The proposed study aims to explore whether the voice features extracted from the vowel “a” utterance carry any information that can be predictive of COPD by employing Machine Learning (ML) on a newly collected voice dataset.</p></div><div><h3>Methods</h3><p>Forty-eight participants were recruited from the pool of research clinic visitors at Blekinge Institute of Technology (BTH) in Sweden between January 2022 and May 2023. A dataset consisting of 1246 recordings from 48 participants was gathered. The collection of voice recordings containing the vowel “a” utterance commenced following an information and consent meeting with each participant using the <em>VoiceDiagnostic</em> application. The collected voice data was subjected to silence segment removal, feature extraction of baseline acoustic features, and Mel Frequency Cepstrum Coefficients (MFCC). Sociodemographic data was also collected from the participants. Three ML models were investigated for the binary classification of COPD and healthy controls: Random Forest (RF), Support Vector Machine (SVM), and CatBoost (CB). A nested k-fold cross-validation approach was employed. Additionally, the hyperparameters were optimized using grid-search on each ML model. For best performance assessment, accuracy, F1-score, precision, and recall metrics were computed. Afterward, we further examined the best classifier by utilizing the Area Under the Curve (AUC), Average Precision (AP), and SHapley Additive exPlanations (SHAP) feature-importance measures.</p></div><div><h3>Results</h3><p>The classifiers RF, SVM, and CB achieved a maximum accuracy of 77 %, 69 %, and 78 % on the test set and 93 %, 78 % and 97 % on the validation set, respectively. The CB classifier outperformed RF and SVM. After further investigation of the best-performing classifier, CB demonstrated the highest performance, producing an AUC of 82 % and AP of 76 %. In addition to age and gender, the mean values of baseline acoustic and MFCC features demonstrate high importance and deterministic characteristics for classification performance in both test and validation sets, though in varied order.</p></div><div><h3>Conclusion</h3><p>This study concludes that the utterance of vowel “a” recordings contain information that can be captured by the CatBoost classifier with high accuracy for the classification of COPD. Additionally, baseline acoustic and MFCC features, in conjunction with age and gender information, can be employed for classification purposes and benefit healthcare for decision support in COPD diagnosis.</p></div><div><h3>Clinical trial registration number</h3><p><span><span>NCT05897944</span><svg><path></path></svg></span>.</p></div>\",\"PeriodicalId\":55458,\"journal\":{\"name\":\"Artificial Intelligence in Medicine\",\"volume\":\"156 \",\"pages\":\"Article 102953\"},\"PeriodicalIF\":6.1000,\"publicationDate\":\"2024-08-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.sciencedirect.com/science/article/pii/S0933365724001957/pdfft?md5=91d584b7c0bf3f6fbf86fae6ca0a54d6&pid=1-s2.0-S0933365724001957-main.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Artificial Intelligence in Medicine\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0933365724001957\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Artificial Intelligence in Medicine","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0933365724001957","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

摘要

背景 慢性阻塞性肺疾病(COPD)是一种严重的疾病,影响着全球数百万人,每年导致大量死亡。由于慢性阻塞性肺病早期没有明显症状,因此患者的诊断率很低。除肺功能衰竭外,慢性阻塞性肺病的另一个有害问题是全身影响,如心力衰竭或声音失真。然而,慢性阻塞性肺病的全身影响可能为早期检测提供有价值的信息。本研究旨在通过在新收集的语音数据集上使用机器学习(ML)技术,探讨从元音 "a "的语音中提取的语音特征是否包含任何可预测慢性阻塞性肺病的信息。方法在 2022 年 1 月至 2023 年 5 月期间,从瑞典布莱金厄理工学院(Blekinge Institute of Technology,BTH)的研究诊所访客中招募了 48 名参与者。数据集由 48 名参与者的 1246 份录音组成。在使用 VoiceDiagnostic 应用程序与每位参与者进行信息交流并征得同意后,开始收集包含元音 "a "的语音记录。收集到的语音数据经过了静音段去除、基线声学特征提取和梅尔频率倒频谱系数(MFCC)处理。此外,还收集了参与者的社会人口学数据。针对慢性阻塞性肺病和健康对照组的二元分类,研究了三种 ML 模型:随机森林 (RF)、支持向量机 (SVM) 和 CatBoost (CB)。采用了嵌套 k 倍交叉验证方法。此外,还在每个多模型上使用网格搜索对超参数进行了优化。为了评估最佳性能,我们计算了准确率、F1 分数、精确度和召回率指标。结果RF、SVM 和 CB 分类器在测试集上的最高准确率分别为 77%、69% 和 78%,在验证集上的最高准确率分别为 93%、78% 和 97%。CB 分类器的表现优于 RF 和 SVM。在对表现最好的分类器进行进一步研究后,CB 表现最好,其 AUC 为 82%,AP 为 76%。除了年龄和性别外,基线声学特征和 MFCC 特征的平均值在测试集和验证集中都显示出了对分类性能的高度重要性和确定性特征,尽管顺序有所不同。此外,基线声学和 MFCC 特征与年龄和性别信息相结合,可用于分类目的,并有利于医疗保健对慢性阻塞性肺病诊断的决策支持。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
COPDVD: Automated classification of chronic obstructive pulmonary disease on a new collected and evaluated voice dataset

Background

Chronic obstructive pulmonary disease (COPD) is a severe condition affecting millions worldwide, leading to numerous annual deaths. The absence of significant symptoms in its early stages promotes high underdiagnosis rates for the affected people. Besides pulmonary function failure, another harmful problem of COPD is the systemic effects, e.g., heart failure or voice distortion. However, the systemic effects of COPD might provide valuable information for early detection. In other words, symptoms caused by systemic effects could be helpful to detect the condition in its early stages.

Objective

The proposed study aims to explore whether the voice features extracted from the vowel “a” utterance carry any information that can be predictive of COPD by employing Machine Learning (ML) on a newly collected voice dataset.

Methods

Forty-eight participants were recruited from the pool of research clinic visitors at Blekinge Institute of Technology (BTH) in Sweden between January 2022 and May 2023. A dataset consisting of 1246 recordings from 48 participants was gathered. The collection of voice recordings containing the vowel “a” utterance commenced following an information and consent meeting with each participant using the VoiceDiagnostic application. The collected voice data was subjected to silence segment removal, feature extraction of baseline acoustic features, and Mel Frequency Cepstrum Coefficients (MFCC). Sociodemographic data was also collected from the participants. Three ML models were investigated for the binary classification of COPD and healthy controls: Random Forest (RF), Support Vector Machine (SVM), and CatBoost (CB). A nested k-fold cross-validation approach was employed. Additionally, the hyperparameters were optimized using grid-search on each ML model. For best performance assessment, accuracy, F1-score, precision, and recall metrics were computed. Afterward, we further examined the best classifier by utilizing the Area Under the Curve (AUC), Average Precision (AP), and SHapley Additive exPlanations (SHAP) feature-importance measures.

Results

The classifiers RF, SVM, and CB achieved a maximum accuracy of 77 %, 69 %, and 78 % on the test set and 93 %, 78 % and 97 % on the validation set, respectively. The CB classifier outperformed RF and SVM. After further investigation of the best-performing classifier, CB demonstrated the highest performance, producing an AUC of 82 % and AP of 76 %. In addition to age and gender, the mean values of baseline acoustic and MFCC features demonstrate high importance and deterministic characteristics for classification performance in both test and validation sets, though in varied order.

Conclusion

This study concludes that the utterance of vowel “a” recordings contain information that can be captured by the CatBoost classifier with high accuracy for the classification of COPD. Additionally, baseline acoustic and MFCC features, in conjunction with age and gender information, can be employed for classification purposes and benefit healthcare for decision support in COPD diagnosis.

Clinical trial registration number

NCT05897944.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Artificial Intelligence in Medicine
Artificial Intelligence in Medicine 工程技术-工程:生物医学
CiteScore
15.00
自引率
2.70%
发文量
143
审稿时长
6.3 months
期刊介绍: Artificial Intelligence in Medicine publishes original articles from a wide variety of interdisciplinary perspectives concerning the theory and practice of artificial intelligence (AI) in medicine, medically-oriented human biology, and health care. Artificial intelligence in medicine may be characterized as the scientific discipline pertaining to research studies, projects, and applications that aim at supporting decision-based medical tasks through knowledge- and/or data-intensive computer-based solutions that ultimately support and improve the performance of a human care provider.
期刊最新文献
Hyperbolic multivariate feature learning in higher-order heterogeneous networks for drug–disease prediction Editorial Board BDFormer: Boundary-aware dual-decoder transformer for skin lesion segmentation Finger-aware Artificial Neural Network for predicting arthritis in Patients with hand pain Artificial intelligence-driven approaches in antibiotic stewardship programs and optimizing prescription practices: A systematic review
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1