A tree-based explainable AI model for early detection of Covid-19 using physiological data.

BMC Medical Informatics and Decision Making. Published: 2024-06-24. DOI: 10.1186/s12911-024-02576-2
Impact factor: 3.3 · JCR Q2 (Medical Informatics) · Zone 3 (Medicine)
Manar Abu Talib, Yaman Afadar, Qassim Nasir, Ali Bou Nassif, Haytham Hijazi, Ahmad Hasasneh

Abstract

With the outbreak of COVID-19 in 2020, countries worldwide faced significant concerns and challenges, and many studies have since applied Artificial Intelligence (AI) and data science techniques to disease detection. Although COVID-19 cases have declined, cases and deaths still occur around the world, so detecting infection early, before symptoms appear, remains crucial for limiting its impact. Wearable devices such as smartwatches have proven to be valuable sources of physiological data, including Heart Rate (HR) and sleep quality, that enable the detection of inflammatory diseases. In this study, we use an existing dataset of individual step counts and heart rate data to predict the probability of COVID-19 infection before the onset of symptoms. We train three model architectures on the physiological data and compare their performance: a Gradient Boosting (GB) classifier, CatBoost trees, and a TabNet classifier. We also add an interpretability layer to the best-performing model, which explains its predictions and allows a detailed assessment of its effectiveness. In addition, to ensure reliability and avoid bias, we created a private dataset of physiological data collected from Fitbit devices and applied the same pre-trained models to it, documenting the results. Our best-performing model, based on CatBoost trees, reached an accuracy of 85% on the publicly available dataset, outperforming previous studies. The same pre-trained CatBoost model reached an accuracy of 81% on the private dataset. The source code is available at https://github.com/OpenUAE-LAB/Covid-19-detection-using-Wearable-data.git
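The abstract names Gradient Boosting (GB) and CatBoost. Both are tree ensembles built by gradient boosting. The sketch below illustrates only that general family and is not the authors' pipeline, which is in the linked repository. It implements binary gradient boosting with decision stumps under log-loss in pure Python.

Several things here are hypothetical assumptions, not taken from the paper:
- The names `GradientBoostingSketch`, `Stump`, and `fit_stump`.
- The hyperparameters.
- The two toy features (resting heart rate and daily step count) and their values.

CatBoost itself adds ordered boosting and oblivious (symmetric) trees, which this sketch does not attempt.

```python
import math
from dataclasses import dataclass


@dataclass
class Stump:
    """A depth-1 regression tree that outputs an additive log-odds update."""
    feature: int      # index of the feature used for the split
    threshold: float  # rows with x[feature] <= threshold go to the left leaf
    left: float       # leaf value for the left side
    right: float      # leaf value for the right side

    def predict(self, x):
        return self.left if x[self.feature] <= self.threshold else self.right


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, grads, hess, lam):
    """Pick the split with the largest second-order gain; leaves use a Newton step G / (H + lam)."""
    best_gain, best = -math.inf, None
    for f in range(len(X[0])):
        # Skip the largest value: splitting there would leave the right side empty.
        for t in sorted({row[f] for row in X})[:-1]:
            g_l = sum(g for g, row in zip(grads, X) if row[f] <= t)
            h_l = sum(h for h, row in zip(hess, X) if row[f] <= t)
            g_r, h_r = sum(grads) - g_l, sum(hess) - h_l
            gain = g_l ** 2 / (h_l + lam) + g_r ** 2 / (h_r + lam)
            if gain > best_gain:
                best_gain = gain
                best = Stump(f, t, g_l / (h_l + lam), g_r / (h_r + lam))
    if best is None:  # every feature is constant: fall back to a single leaf
        leaf = sum(grads) / (sum(hess) + lam)
        best = Stump(0, math.inf, leaf, leaf)
    return best


class GradientBoostingSketch:
    """Hypothetical binary gradient boosting with log-loss and decision stumps (illustrative only)."""

    def __init__(self, n_rounds=20, learning_rate=0.3, lam=1.0):
        self.n_rounds, self.learning_rate, self.lam = n_rounds, learning_rate, lam

    def fit(self, X, y):
        rate = sum(y) / len(y)                    # requires both classes to be present
        self.base = math.log(rate / (1 - rate))  # start from the prior log-odds
        self.stumps = []
        scores = [self.base] * len(X)
        for _ in range(self.n_rounds):
            probs = [sigmoid(s) for s in scores]
            grads = [yi - p for yi, p in zip(y, probs)]  # negative gradient of log-loss
            hess = [p * (1 - p) for p in probs]          # second derivative of log-loss
            stump = fit_stump(X, grads, hess, self.lam)
            self.stumps.append(stump)
            scores = [s + self.learning_rate * stump.predict(x) for s, x in zip(scores, X)]
        return self

    def predict_proba(self, x):
        score = self.base + sum(self.learning_rate * s.predict(x) for s in self.stumps)
        return sigmoid(score)


# Hypothetical toy rows: [resting heart rate (bpm), daily step count]; not data from the paper.
X = [[62, 9500], [60, 11000], [65, 8700], [63, 10200], [61, 9000],
     [75, 3500], [78, 2800], [73, 4200], [80, 2500], [76, 3900]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]  # 1 = hypothetical "infected" label
model = GradientBoostingSketch().fit(X, y)
print(round(model.predict_proba([79, 3000]), 3))
```

Each round fits a stump to the gradient of the log-loss and adds a damped copy of it to the running log-odds. This additive, tree-by-tree structure is what the GB and CatBoost models in the study share. Production libraries add deeper trees, regularization, and categorical-feature handling on top of it.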

Source journal
CiteScore: 7.20
Self-citation rate: 5.70%
Articles published: 297
Review time: 1 month
About the journal: BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles on the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.