Optimizing age-related hearing risk predictions: an advanced machine learning integration with HHIE-S

IF 4 3区生物学 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Biodata Mining Pub Date : 2023-12-14 DOI:10.1186/s13040-023-00351-z

Tzong-Hann Yang, Yu-Fu Chen, Yen-Fu Cheng, Jue-Ni Huang, Chuan-Song Wu, Yuan-Chia Chu

{"title":"Optimizing age-related hearing risk predictions: an advanced machine learning integration with HHIE-S","authors":"Tzong-Hann Yang, Yu-Fu Chen, Yen-Fu Cheng, Jue-Ni Huang, Chuan-Song Wu, Yuan-Chia Chu","doi":"10.1186/s13040-023-00351-z","DOIUrl":null,"url":null,"abstract":"The elderly are disproportionately affected by age-related hearing loss (ARHL). Despite being a well-known tool for ARHL evaluation, the Hearing Handicap Inventory for the Elderly Screening version (HHIE-S) has only traditionally been used for direct screening using self-reported outcomes. This work uses a novel integration of machine learning approaches to improve the predicted accuracy of the HHIE-S tool for ARHL in older adults. We employed a dataset that was gathered between 2016 and 2018 and included 1,526 senior citizens from several Taipei City Hospital branches. 80% of the data were used for training (n = 1220) and 20% were used for testing (n = 356). XGBoost, Gradient Boosting, and LightGBM were among the machine learning models that were only used and assessed on the training set. In order to prevent data leakage and overfitting, the Light Gradient Boosting Machine (LGBM) model—which had the greatest AUC of 0.83 (95% CI 0.81–0.85)—was then only used on the holdout testing data. On the testing set, the LGBM model showed a strong AUC of 0.82 (95% CI 0.79–0.86), far outperforming conventional techniques. Notably, several HHIE-S items and age were found to be significant characteristics. In contrast to traditional HHIE research, which concentrates on the psychological effects of hearing loss, this study combines cutting-edge machine learning techniques—specifically, the LGBM classifier—with the HHIE-S tool. The incorporation of SHAP values enhances the interpretability of the model's predictions and provides a more comprehensive comprehension of the significance of various aspects. Our methodology highlights the great potential that arises from combining machine learning with validated hearing evaluation instruments such as the HHIE-S. Healthcare practitioners can anticipate ARHL more accurately thanks to this integration, which makes it easier to intervene quickly and precisely.","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"33 4 1","pages":""},"PeriodicalIF":4.0000,"publicationDate":"2023-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Biodata Mining","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1186/s13040-023-00351-z","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}

引用次数: 0

Abstract

The elderly are disproportionately affected by age-related hearing loss (ARHL). Despite being a well-known tool for ARHL evaluation, the Hearing Handicap Inventory for the Elderly Screening version (HHIE-S) has only traditionally been used for direct screening using self-reported outcomes. This work uses a novel integration of machine learning approaches to improve the predicted accuracy of the HHIE-S tool for ARHL in older adults. We employed a dataset that was gathered between 2016 and 2018 and included 1,526 senior citizens from several Taipei City Hospital branches. 80% of the data were used for training (n = 1220) and 20% were used for testing (n = 356). XGBoost, Gradient Boosting, and LightGBM were among the machine learning models that were only used and assessed on the training set. In order to prevent data leakage and overfitting, the Light Gradient Boosting Machine (LGBM) model—which had the greatest AUC of 0.83 (95% CI 0.81–0.85)—was then only used on the holdout testing data. On the testing set, the LGBM model showed a strong AUC of 0.82 (95% CI 0.79–0.86), far outperforming conventional techniques. Notably, several HHIE-S items and age were found to be significant characteristics. In contrast to traditional HHIE research, which concentrates on the psychological effects of hearing loss, this study combines cutting-edge machine learning techniques—specifically, the LGBM classifier—with the HHIE-S tool. The incorporation of SHAP values enhances the interpretability of the model's predictions and provides a more comprehensive comprehension of the significance of various aspects. Our methodology highlights the great potential that arises from combining machine learning with validated hearing evaluation instruments such as the HHIE-S. Healthcare practitioners can anticipate ARHL more accurately thanks to this integration, which makes it easier to intervene quickly and precisely.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

优化年龄相关听力风险预测：先进的机器学习与 HHIE-S 集成

老年人受年龄相关性听力损失（ARHL）的影响尤为严重。尽管老年人听力障碍量表筛查版（HHIE-S）是众所周知的 ARHL 评估工具，但传统上仅用于使用自我报告结果进行直接筛查。这项工作采用了一种新颖的机器学习方法集成，以提高 HHIE-S 工具对老年人听力障碍的预测准确性。我们采用了 2016 年至 2018 年间收集的数据集，其中包括来自台北市立医院多家分院的 1526 名老年人。其中 80% 的数据用于训练（n = 1220），20% 的数据用于测试（n = 356）。XGBoost、梯度提升和LightGBM等机器学习模型仅在训练集上使用和评估。为了防止数据泄漏和过拟合，轻梯度提升机（Light Gradient Boosting Machine，LGBM）模型的 AUC 最高，为 0.83（95% CI 0.81-0.85），因此只用于保留测试数据。在测试集上，LGBM 模型的 AUC 高达 0.82（95% CI 0.79-0.86），远远超过了传统技术。值得注意的是，几个 HHIE-S 项目和年龄被认为是重要特征。传统的 HHIE 研究侧重于听力损失的心理影响，而本研究则将前沿的机器学习技术（特别是 LGBM 分类器）与 HHIE-S 工具相结合。SHAP 值的加入增强了模型预测的可解释性，并提供了对各方面重要性的更全面理解。我们的方法凸显了将机器学习与 HHIE-S 等经过验证的听力评估工具相结合的巨大潜力。通过这种整合，医疗从业人员可以更准确地预测 ARHL，从而更容易快速、准确地进行干预。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Biodata Mining MATHEMATICAL & COMPUTATIONAL BIOLOGY-

CiteScore

7.90

自引率

0.00%

发文量

审稿时长

23 weeks

期刊介绍： BioData Mining is an open access, open peer-reviewed journal encompassing research on all aspects of data mining applied to high-dimensional biological and biomedical data, focusing on computational aspects of knowledge discovery from large-scale genetic, transcriptomic, genomic, proteomic, and metabolomic data. Topical areas include, but are not limited to: -Development, evaluation, and application of novel data mining and machine learning algorithms. -Adaptation, evaluation, and application of traditional data mining and machine learning algorithms. -Open-source software for the application of data mining and machine learning algorithms. -Design, development and integration of databases, software and web services for the storage, management, retrieval, and analysis of data from large scale studies. -Pre-processing, post-processing, modeling, and interpretation of data mining and machine learning results for biological interpretation and knowledge discovery.