Machine learning approaches in diagnosing tuberculosis through biomarkers - A systematic review
Vimala Balakrishnan, Yousra Kherabi, Ghayathri Ramanathan, Scott Arjay Paul, Chiong Kian Tiong
DOI: 10.1016/j.pbiomolbio.2023.03.001
Published: 2023-05-01
URL: https://www.sciencedirect.com/science/article/pii/S0079610723000263
Citations: 4
Abstract
Biomarker-based tests may facilitate tuberculosis (TB) diagnosis, accelerate treatment initiation, and thus improve outcomes. This review synthesizes the literature on biomarker-based detection for TB diagnosis using machine learning. The systematic review follows the PRISMA guideline. Articles were retrieved using relevant keywords from Web of Science, PubMed, and Scopus, yielding 19 eligible studies after screening. All of the studies used supervised learning, with Support Vector Machine (SVM) and Random Forest emerging as the two most common algorithms; the highest accuracy, sensitivity, and specificity reported were 97.0%, 99.2%, and 98.0%, respectively. Protein-based biomarkers were the most widely explored, followed by gene-based biomarkers such as RNA sequences and spoligotypes. Publicly available datasets were commonly used by the reviewed studies, whereas studies targeting specific cohorts, such as HIV patients or children, gathered their own data from healthcare facilities, which led to smaller datasets. Of these, most used leave-one-out cross-validation to mitigate overfitting. The review shows that machine learning is increasingly assessed in research as a way to improve TB diagnosis through biomarkers, with promising results in terms of the models' detection performance. This provides insight into the possible application of machine learning approaches to diagnose TB using biomarkers, as opposed to traditional methods, which can be time-consuming. Low- and middle-income settings, where basic biomarkers could be made accessible while sputum-based tests are not always available, could be a major application of such models.
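The evaluation setup described in the abstract (leave-one-out cross-validation on a small cohort, reported as accuracy, sensitivity, and specificity) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the two-feature "biomarker" values are synthetic, the labels are invented, and a nearest-centroid classifier stands in for the SVM and Random Forest models used by the reviewed studies so that the example runs with the Python standard library alone. It does not reproduce any study's method or results.

```python
from typing import Dict, List

def fit_centroids(X: List[List[float]], y: List[int]) -> Dict[int, List[float]]:
    """Compute the mean feature vector per class (stand-in for SVM / Random Forest)."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids: Dict[int, List[float]], x: List[float]) -> int:
    """Assign x to the class whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(c: List[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def leave_one_out_predictions(X: List[List[float]], y: List[int]) -> List[int]:
    """Hold out each sample once, train on the rest, and predict the held-out sample."""
    preds = []
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        preds.append(predict(fit_centroids(X_train, y_train), X[i]))
    return preds

def diagnostic_metrics(y_true: List[int], y_pred: List[int], positive: int = 1) -> Dict[str, float]:
    """Accuracy, sensitivity (true positive rate), and specificity (true negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Synthetic, hypothetical biomarker levels: label 1 = TB, label 0 = non-TB.
X = [[1.0, 0.9], [1.2, 1.1], [0.8, 1.0], [1.1, 0.8],
     [3.0, 3.2], [2.8, 3.1], [3.2, 2.9], [3.1, 3.0],
     [1.3, 1.2]]  # last sample: a TB case with low biomarker levels
y = [0, 0, 0, 0, 1, 1, 1, 1, 1]

preds = leave_one_out_predictions(X, y)
metrics = diagnostic_metrics(y, preds)
print(preds)
print(metrics)
```

In this toy data the last TB case resembles the non-TB group, so it is misclassified when held out, which lowers sensitivity while specificity is unaffected. With a dataset this small, a single sample shifts the metrics noticeably, which is one reason results from small cohorts should be read with caution.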