COPDVD: Automated classification of chronic obstructive pulmonary disease on a new collected and evaluated voice dataset

Artificial Intelligence in Medicine (IF 6.1; Zone 2, Medicine; JCR Q1, Computer Science, Artificial Intelligence) Pub Date: 2024-08-15 DOI: 10.1016/j.artmed.2024.102953

Alper Idrisoglu, Ana Luiza Dallora, Abbas Cheddad, Peter Anderberg, Andreas Jakobsson, Johan Sanmartin Berglund

Published in Artificial Intelligence in Medicine, volume 156, Article 102953. Full text: https://www.sciencedirect.com/science/article/pii/S0933365724001957

Citations: 0

Abstract



COPDVD: Automated classification of chronic obstructive pulmonary disease on a new collected and evaluated voice dataset

Background

Chronic obstructive pulmonary disease (COPD) is a severe condition that affects millions of people worldwide and causes numerous deaths each year. Because significant symptoms are absent in its early stages, the condition is frequently underdiagnosed. Besides pulmonary function failure, COPD has harmful systemic effects, e.g., heart failure or voice distortion. These systemic effects might, however, provide valuable information for early detection: the symptoms they cause could help detect the condition in its early stages.

Objective

This study explores, by applying Machine Learning (ML) to a newly collected voice dataset, whether voice features extracted from utterances of the vowel “a” carry information that is predictive of COPD.

Methods

Forty-eight participants were recruited from the pool of research clinic visitors at Blekinge Institute of Technology (BTH) in Sweden between January 2022 and May 2023, yielding a dataset of 1246 recordings. After an information and consent meeting with each participant, recordings of the vowel “a” utterance were collected using the VoiceDiagnostic application. Silence segments were removed from the recordings, and baseline acoustic features and Mel Frequency Cepstrum Coefficients (MFCC) were extracted. Sociodemographic data was also collected from the participants. Three ML models were investigated for the binary classification of COPD and healthy controls: Random Forest (RF), Support Vector Machine (SVM), and CatBoost (CB). A nested k-fold cross-validation approach was employed, and the hyperparameters of each model were optimized using grid search. To identify the best-performing model, accuracy, F1-score, precision, and recall were computed. The best classifier was then examined further using the Area Under the Curve (AUC), Average Precision (AP), and SHapley Additive exPlanations (SHAP) feature-importance measures.
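The nested k-fold cross-validation with grid search described above can be sketched as follows. This is a minimal, dependency-free illustration and not the authors' pipeline: the k-nearest-neighbour classifier stands in for RF, SVM, and CatBoost, the two-feature synthetic data stands in for the acoustic and MFCC features, and the fold counts, the grid, and all function names are assumptions made for the example.

```python
import random

def k_folds(n_items, k, seed):
    """Split positions 0..n_items-1 into k shuffled, near-equal folds."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training points (squared Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = sum(label for _, label in dists[:k])
    return 1 if 2 * votes > k else 0

def accuracy(fit_idx, eval_idx, X, y, k):
    """Fit on fit_idx, return the fraction of eval_idx classified correctly."""
    fit_X = [X[i] for i in fit_idx]
    fit_y = [y[i] for i in fit_idx]
    hits = sum(knn_predict(fit_X, fit_y, X[i], k) == y[i] for i in eval_idx)
    return hits / len(eval_idx)

def nested_cv(X, y, grid, outer_k=5, inner_k=3, seed=0):
    """Return one (chosen hyperparameter, held-out accuracy) pair per outer fold."""
    results = []
    outer = k_folds(len(X), outer_k, seed)
    for o, test_idx in enumerate(outer):
        train_idx = [i for f, fold in enumerate(outer) if f != o for i in fold]
        # Inner loop: grid search that only ever sees the outer-training data.
        inner = k_folds(len(train_idx), inner_k, seed + 1)

        def inner_score(k):
            total = 0.0
            for v, val_pos in enumerate(inner):
                val = [train_idx[p] for p in val_pos]
                fit = [train_idx[p]
                       for f, fold in enumerate(inner) if f != v for p in fold]
                total += accuracy(fit, val, X, y, k)
            return total / inner_k

        best_k = max(grid, key=inner_score)
        # Outer loop: score the selected setting on the untouched outer fold.
        results.append((best_k, accuracy(train_idx, test_idx, X, y, best_k)))
    return results

# Synthetic stand-in data: two well-separated Gaussian blobs (hypothetical).
rng = random.Random(42)
X = [[rng.gauss(c, 1.0), rng.gauss(c, 1.0)] for c in (0.0, 3.0) for _ in range(30)]
y = [0] * 30 + [1] * 30

results = nested_cv(X, y, grid=[1, 3, 5])
mean_acc = sum(score for _, score in results) / len(results)
print(results, round(mean_acc, 3))
```

The point of the nesting is that hyperparameters are chosen without ever touching the outer held-out fold, so the outer scores are not inflated by the search. The abstract does not say whether folds were grouped by participant; with several recordings per person, a real pipeline would likely need to keep each participant's recordings in a single fold.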

Results

The RF, SVM, and CB classifiers achieved maximum accuracies of 77%, 69%, and 78% on the test set and 93%, 78%, and 97% on the validation set, respectively. The CB classifier outperformed RF and SVM. Further examination of CB, the best-performing classifier, yielded an AUC of 82% and an AP of 76%. In addition to age and gender, the mean values of baseline acoustic and MFCC features demonstrate high importance and deterministic characteristics for classification performance in both test and validation sets, though in varied order.
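For readers unfamiliar with the two ranking metrics reported above, the sketch below computes AUC and AP from first principles. It is an illustration under stated assumptions, not the study's evaluation code: the labels and scores are invented, the function names are hypothetical, and the AP formula assumes no tied scores.

```python
def roc_auc(y_true, scores):
    """AUC: probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(y_true, scores):
    """AP: mean of the precision measured at the rank of each positive.

    Samples are ranked by descending score; assumes scores are untied.
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits

# Invented example: 1 = COPD, 0 = healthy control; scores are model outputs.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

auc = roc_auc(y_true, scores)           # 7 of 9 positive/negative pairs ordered correctly
ap = average_precision(y_true, scores)  # (1/1 + 2/3 + 3/4) / 3
print(round(auc, 4), round(ap, 4))
```

Both metrics depend only on how the classifier orders the samples, not on a decision threshold, which is why they complement threshold-based measures such as accuracy and F1-score.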

Conclusion

This study concludes that recordings of the vowel “a” utterance contain information that the CatBoost classifier can capture with high accuracy for the classification of COPD. Additionally, baseline acoustic and MFCC features, in conjunction with age and gender information, can be employed for classification and can benefit healthcare as decision support in COPD diagnosis.

Clinical trial registration number

NCT05897944.


Source journal

Artificial Intelligence in Medicine (Engineering & Technology - Engineering: Biomedical)

CiteScore: 15.00

Self-citation rate: 2.70%

Articles published: 143

Review time: 6.3 months

Journal description: Artificial Intelligence in Medicine publishes original articles from a wide variety of interdisciplinary perspectives concerning the theory and practice of artificial intelligence (AI) in medicine, medically-oriented human biology, and health care. Artificial intelligence in medicine may be characterized as the scientific discipline pertaining to research studies, projects, and applications that aim at supporting decision-based medical tasks through knowledge- and/or data-intensive computer-based solutions that ultimately support and improve the performance of a human care provider.