Prediction of tuberculosis clusters in the riverine municipalities of the Brazilian Amazon with machine learning.

Luis Silva, Luise Gomes da Motta, Lynn Eberly
Revista brasileira de epidemiologia = Brazilian journal of epidemiology, vol. 27, e240024. Published 2024-05-13. DOI: 10.1590/1980-549720240024. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11093519/pdf/

Abstract

Objective: Tuberculosis (TB) is the second deadliest infectious disease globally and poses a significant burden in Brazil and its Amazonian region. This study focused on the "riverine municipalities" and hypothesized the presence of TB clusters in the area. We also aimed to train a machine learning model to differentiate municipalities classified as hot spots vs. non-hot spots using disease surveillance variables as predictors.

Methods: Data on TB incidence from 2019 to 2022 in the riverine municipalities were collected from the Brazilian Health Ministry Informatics Department. Moran's I was used to assess global spatial autocorrelation, while the Getis-Ord Gi* method was employed to detect high- and low-incidence clusters. A Random Forest machine learning model was trained using surveillance variables related to TB cases to predict hot spots among non-hot spot municipalities.
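The two spatial statistics named in the Methods can be illustrated with a minimal sketch. This is not the authors' code: the six incidence values, the chain-shaped neighbor structure, and the function names `chain_weights`, `morans_i`, and `getis_ord_gi_star` are all hypothetical, and the weights are assumed to be binary with a self-weight of 1 for the Getis-Ord statistic.

```python
import math

def chain_weights(n):
    """Hypothetical binary contiguity: each unit neighbors the next one along a river."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]

def morans_i(x, w):
    """Global Moran's I: (n / S0) * sum_ij w_ij * d_i * d_j / sum_i d_i^2."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    s0 = sum(sum(row) for row in w)  # sum of all weights
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * num / den

def getis_ord_gi_star(x, w):
    """Getis-Ord Gi* z-scores; unit i counts as its own neighbor (assumed self-weight 1)."""
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum(v * v for v in x) / n - mean ** 2)
    z_scores = []
    for i in range(n):
        wi = [1.0 if i == j else w[i][j] for j in range(n)]
        sw = sum(wi)
        sw2 = sum(v * v for v in wi)
        num = sum(wi[j] * x[j] for j in range(n)) - mean * sw
        den = s * math.sqrt((n * sw2 - sw ** 2) / (n - 1))
        z_scores.append(num / den)
    return z_scores

# Hypothetical incidence per 100,000, ordered west to east (not data from the study).
incidence = [90.0, 85.0, 80.0, 20.0, 15.0, 10.0]
weights = chain_weights(len(incidence))
print("Moran's I:", round(morans_i(incidence, weights), 3))
print("Gi* z-scores:", [round(z, 2) for z in getis_ord_gi_star(incidence, weights)])
```

In this toy setting, a positive Moran's I indicates that similar incidence values sit next to each other, while large positive or negative Gi* z-scores flag candidate hot and cold spots. A real analysis would also need a significance test and a weights matrix built from the actual municipal boundaries.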

Results: Our analysis revealed distinct geographical clusters with high and low TB incidence following a west-to-east distribution pattern. The Random Forest classification model used six surveillance variables to predict hot vs. non-hot spots. The model achieved an area under the receiver operating characteristic curve (AUC-ROC) of 0.81.
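To make the reported metric concrete, the sketch below computes AUC-ROC from its pairwise definition: the probability that a randomly chosen hot spot receives a higher predicted score than a randomly chosen non-hot spot, with ties counted as one half. The function name `auc_roc` and the labels and scores are hypothetical and are not taken from the study.

```python
def auc_roc(labels, scores):
    """AUC-ROC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical example: 1 = hot spot, 0 = non-hot spot, with made-up model scores.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(auc_roc(labels, scores))  # 3 of 4 pairs are ranked correctly -> 0.75
```

Under this reading, an AUC-ROC of 0.81 means that the model ranks a hot spot above a non-hot spot in roughly 81% of such pairs. A value of 0.5 corresponds to random ranking.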

Conclusion: Higher percentages of recurrent cases, deaths due to TB, antibiotic regimen changes, new cases, and cases with smoking history were the best predictors of hot spots. This prediction method can be leveraged to identify the municipalities at the highest risk of being hot spots for the disease, giving policymakers an evidence-based tool to direct resource allocation for disease control in the riverine municipalities.
