Prediction of tuberculosis clusters in the riverine municipalities of the Brazilian Amazon with machine learning
Luis Silva, Luise Gomes da Motta, Lynn Eberly
Revista brasileira de epidemiologia = Brazilian journal of epidemiology, vol. 27, e240024. Published 2024-05-13.
DOI: https://doi.org/10.1590/1980-549720240024
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11093519/pdf/
Abstract
Objective: Tuberculosis (TB) is the second most deadly infectious disease globally, posing a significant burden in Brazil and its Amazonian region. This study focused on the "riverine municipalities" and hypothesized the presence of TB clusters in the area. We also aimed to train a machine learning model to differentiate municipalities classified as hot spots from non-hot spots, using disease surveillance variables as predictors.
Methods: Data on TB incidence from 2019 to 2022 in the riverine municipalities were collected from the Brazilian Health Ministry Informatics Department. Moran's I was used to assess global spatial autocorrelation, while the Getis-Ord Gi* method was employed to detect high- and low-incidence clusters. A Random Forest machine learning model was trained on surveillance variables related to TB cases to classify municipalities as hot spots or non-hot spots.
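The two spatial statistics named in the Methods can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: the incidence values are hypothetical, and the binary contiguity weights (areas arranged along a line, each bordering only its immediate neighbours) are an assumption chosen purely for illustration.

```python
import math

def morans_i(values, weights):
    """Global Moran's I for n values and an n x n spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)

def getis_ord_gi_star(values, weights):
    """Gi* z-score per area; each area counts as its own neighbour (w_ii = 1)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    z_scores = []
    for i in range(n):
        w = list(weights[i])
        w[i] = 1.0  # the "star" variant includes the focal area itself
        sw = sum(w)
        sw2 = sum(x * x for x in w)
        numerator = sum(wj * vj for wj, vj in zip(w, values)) - mean * sw
        denominator = s * math.sqrt((n * sw2 - sw ** 2) / (n - 1))
        z_scores.append(numerator / denominator)
    return z_scores

# Hypothetical incidence (cases per 100,000) for six areas ordered west to east
incidence = [90.0, 85.0, 80.0, 30.0, 25.0, 20.0]
n_areas = len(incidence)
# Assumed binary contiguity: area i borders only areas i-1 and i+1
W = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n_areas)]
     for i in range(n_areas)]

moran = morans_i(incidence, W)       # positive when similar values cluster
gi = getis_ord_gi_star(incidence, W)  # positive = high cluster, negative = low
print(round(moran, 3), [round(z, 2) for z in gi])
```

With these toy values, Moran's I is positive because high-incidence areas sit next to each other, and the Gi* z-scores are positive in the western areas and negative in the eastern ones. In practice the z-scores would be compared against a significance threshold before labelling an area a hot or cold spot.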
Results: Our analysis revealed distinct geographical clusters of high and low TB incidence following a west-to-east distribution pattern. The Random Forest classification model used six surveillance variables to predict hot vs. non-hot spots and achieved an area under the receiver operating characteristic curve (AUC-ROC) of 0.81.
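To make the reported metric concrete, the sketch below computes AUC-ROC from its rank interpretation: the probability that a randomly chosen hot spot receives a higher predicted score than a randomly chosen non-hot spot. The labels and scores are hypothetical and are not taken from the study; the function name `auc_roc` is likewise only illustrative.

```python
def auc_roc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = hot spot) and predicted hot spot probabilities
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

auc = auc_roc(labels, scores)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a value of 0.81 means the model ranks a hot spot above a non-hot spot in roughly four out of five such pairs.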
Conclusion: Higher percentages of recurrent cases, deaths due to TB, antibiotic regimen changes, new cases, and cases with a smoking history were the best predictors of hot spots. This prediction method can be used to identify the municipalities at the highest risk of being hot spots for the disease, giving policymakers an evidence-based tool to direct resource allocation for disease control in the riverine municipalities.