The unreliability of crackles: insights from a breath sound study using physicians and artificial intelligence.

IF 3.1 | CAS Tier 3 (Medicine) | JCR Q1 (Primary Health Care) | NPJ Primary Care Respiratory Medicine | Pub Date: 2024-10-15 | DOI: 10.1038/s41533-024-00392-9
Chun-Hsiang Huang, Chi-Hsin Chen, Jing-Tong Tzeng, An-Yan Chang, Cheng-Yi Fan, Chih-Wei Sung, Chi-Chun Lee, Edward Pei-Chuan Huang
{"title":"噼啪声的不可靠性:利用医生和人工智能进行呼吸音研究的启示。","authors":"Chun-Hsiang Huang, Chi-Hsin Chen, Jing-Tong Tzeng, An-Yan Chang, Cheng-Yi Fan, Chih-Wei Sung, Chi-Chun Lee, Edward Pei-Chuan Huang","doi":"10.1038/s41533-024-00392-9","DOIUrl":null,"url":null,"abstract":"<p><strong>Background and introduction: </strong>In comparison to other physical assessment methods, the inconsistency in respiratory evaluations continues to pose a major issue and challenge.</p><p><strong>Objectives: </strong>This study aims to evaluate the difference in the identification ability of different breath sound.</p><p><strong>Methods/description: </strong>In this prospective study, breath sounds from the Formosa Archive of Breath Sound were labeled by five physicians. Six artificial intelligence (AI) breath sound interpretation models were developed based on all labeled data and the labels from the five physicians, respectively. After labeling by AIs and physicians, labels with discrepancy were considered doubtful and relabeled by two additional physicians. The final labels were determined by a majority vote among the physicians. The capability of breath sound identification for humans and AI was evaluated using sensitivity, specificity and the area under the receiver-operating characteristic curve (AUROC).</p><p><strong>Results/outcome: </strong>A total of 11,532 breath sound files were labeled, with 579 doubtful labels identified. After relabeling and exclusion, there were 305 labels with gold standard. For wheezing, both human physicians and the AI model demonstrated good sensitivities (89.5% vs. 86.0%) and good specificities (96.4% vs. 95.2%). For crackles, both human physicians and the AI model showed good sensitivities (93.9% vs. 80.3%) but poor specificities (56.6% vs. 65.9%). Lower AUROC values were noted in crackles identification for both physicians and the AI model compared to wheezing.</p><p><strong>Conclusion: </strong>Even with the assistance of artificial intelligence tools, accurately identifying crackles compared to wheezing remains challenging. Consequently, crackles are unreliable for medical decision-making, and further examination is warranted.</p>","PeriodicalId":19470,"journal":{"name":"NPJ Primary Care Respiratory Medicine","volume":"34 1","pages":"28"},"PeriodicalIF":3.1000,"publicationDate":"2024-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11480396/pdf/","citationCount":"0","resultStr":"{\"title\":\"The unreliability of crackles: insights from a breath sound study using physicians and artificial intelligence.\",\"authors\":\"Chun-Hsiang Huang, Chi-Hsin Chen, Jing-Tong Tzeng, An-Yan Chang, Cheng-Yi Fan, Chih-Wei Sung, Chi-Chun Lee, Edward Pei-Chuan Huang\",\"doi\":\"10.1038/s41533-024-00392-9\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background and introduction: </strong>In comparison to other physical assessment methods, the inconsistency in respiratory evaluations continues to pose a major issue and challenge.</p><p><strong>Objectives: </strong>This study aims to evaluate the difference in the identification ability of different breath sound.</p><p><strong>Methods/description: </strong>In this prospective study, breath sounds from the Formosa Archive of Breath Sound were labeled by five physicians. Six artificial intelligence (AI) breath sound interpretation models were developed based on all labeled data and the labels from the five physicians, respectively. 
After labeling by AIs and physicians, labels with discrepancy were considered doubtful and relabeled by two additional physicians. The final labels were determined by a majority vote among the physicians. The capability of breath sound identification for humans and AI was evaluated using sensitivity, specificity and the area under the receiver-operating characteristic curve (AUROC).</p><p><strong>Results/outcome: </strong>A total of 11,532 breath sound files were labeled, with 579 doubtful labels identified. After relabeling and exclusion, there were 305 labels with gold standard. For wheezing, both human physicians and the AI model demonstrated good sensitivities (89.5% vs. 86.0%) and good specificities (96.4% vs. 95.2%). For crackles, both human physicians and the AI model showed good sensitivities (93.9% vs. 80.3%) but poor specificities (56.6% vs. 65.9%). Lower AUROC values were noted in crackles identification for both physicians and the AI model compared to wheezing.</p><p><strong>Conclusion: </strong>Even with the assistance of artificial intelligence tools, accurately identifying crackles compared to wheezing remains challenging. Consequently, crackles are unreliable for medical decision-making, and further examination is warranted.</p>\",\"PeriodicalId\":19470,\"journal\":{\"name\":\"NPJ Primary Care Respiratory Medicine\",\"volume\":\"34 1\",\"pages\":\"28\"},\"PeriodicalIF\":3.1000,\"publicationDate\":\"2024-10-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11480396/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"NPJ Primary Care Respiratory Medicine\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1038/s41533-024-00392-9\",\"RegionNum\":3,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"PRIMARY HEALTH CARE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"NPJ Primary Care Respiratory Medicine","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1038/s41533-024-00392-9","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"PRIMARY HEALTH CARE","Score":null,"Total":0}
Citations: 0

Abstract

Background and introduction: Compared with other physical assessment methods, inconsistency in respiratory evaluation remains a major issue and challenge.

Objectives: This study aims to evaluate differences in the ability to identify different breath sounds.

Methods/description: In this prospective study, breath sounds from the Formosa Archive of Breath Sound were labeled by five physicians. Six artificial intelligence (AI) breath sound interpretation models were developed: one based on all labeled data and one based on each of the five physicians' labels. After labeling by the AIs and physicians, labels with discrepancies were considered doubtful and were relabeled by two additional physicians. The final labels were determined by a majority vote among the physicians. The breath sound identification capability of humans and AI was evaluated using sensitivity, specificity, and the area under the receiver-operating characteristic curve (AUROC).
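
As a rough illustration of the adjudication step described above, the sketch below shows how a doubtful recording could be resolved by a strict majority vote among physician labels. It is not the authors' actual pipeline; the label names and the rule of excluding unresolved ties are assumptions made for illustration only.

```python
# Minimal sketch, assuming hypothetical label names ("wheeze", "crackle", "normal").
# Not the authors' pipeline: it only illustrates resolving a doubtful file by a
# strict majority vote among physician labels, excluding unresolved ties.
from collections import Counter
from typing import Optional

def adjudicate(labels: list[str]) -> Optional[str]:
    """Return the strict-majority label, or None if no majority exists."""
    winner, votes = Counter(labels).most_common(1)[0]
    return winner if votes > len(labels) / 2 else None

# A doubtful file relabeled by two additional physicians alongside the original call:
print(adjudicate(["crackle", "crackle", "normal"]))  # -> "crackle"
print(adjudicate(["crackle", "normal", "wheeze"]))   # -> None (no gold standard; excluded)
```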

Results/outcome: A total of 11,532 breath sound files were labeled, and 579 doubtful labels were identified. After relabeling and exclusion, 305 gold-standard labels remained. For wheezing, both the human physicians and the AI model demonstrated good sensitivity (89.5% vs. 86.0%) and good specificity (96.4% vs. 95.2%). For crackles, both the human physicians and the AI model showed good sensitivity (93.9% vs. 80.3%) but poor specificity (56.6% vs. 65.9%). Compared with wheezing, lower AUROC values were noted in crackle identification for both the physicians and the AI model.
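
For readers unfamiliar with the reported metrics, the sketch below shows how sensitivity, specificity, and AUROC are conventionally computed for a binary task such as crackle detection. The arrays are invented toy values, not study data.

```python
# Toy example of the reported metrics for a binary "crackle present?" task.
# The arrays below are invented for illustration and are not study data.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

y_true  = np.array([1, 1, 0, 0, 1, 0, 0, 1])                    # gold-standard labels
y_pred  = np.array([1, 0, 1, 0, 1, 0, 1, 1])                    # binary calls (physician or thresholded AI)
y_score = np.array([0.9, 0.4, 0.6, 0.2, 0.8, 0.1, 0.7, 0.95])   # AI probability outputs

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)   # share of true crackles correctly flagged
specificity = tn / (tn + fp)   # share of crackle-free sounds correctly cleared
auroc = roc_auc_score(y_true, y_score)
print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}, AUROC={auroc:.2f}")
```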

Conclusion: Even with the assistance of artificial intelligence tools, accurately identifying crackles remains more challenging than identifying wheezing. Consequently, crackles are unreliable for medical decision-making, and further examination is warranted.

Source journal
NPJ Primary Care Respiratory Medicine (Primary Health Care; Respiratory System)
CiteScore: 5.50
Self-citation rate: 6.50%
Annual articles: 49
Review time: 10 weeks
About the journal: npj Primary Care Respiratory Medicine is an open access, online-only, multidisciplinary journal dedicated to publishing high-quality research in all areas of the primary care management of respiratory and respiratory-related allergic diseases. Papers published by the journal represent important advances of significance to specialists within the fields of primary care and respiratory medicine. We are particularly interested in receiving papers in relation to the following aspects of respiratory medicine, respiratory-related allergic diseases and tobacco control: epidemiology; prevention; clinical care; service delivery and organisation of healthcare (including implementation science); global health.
Latest articles from this journal
Use and acceptability of an asthma diagnosis clinical decision support system for primary care clinicians: an observational mixed methods study.
Best practice advice for asthma exacerbation prevention and management in primary care: an international expert consensus.
Web-based pulmonary telehabilitation: a systematic review.
Tackling antibiotic resistance-insights from eHealthResp's educational interventions.
The Reliever Reliance Test: evaluating a new tool to address SABA over-reliance.