Identification of the high-risk area for schistosomiasis transmission in China based on information value and machine learning: a newly data-driven modeling attempt.
Authors: Yan-Feng Gong, Ling-Qian Zhu, Yin-Long Li, Li-Juan Zhang, Jing-Bo Xue, Shang Xia, Shan Lv, Jing Xu, Shi-Zhu Li
Journal: Infectious Diseases of Poverty (JCR Q1, Infectious Diseases; impact factor 4.8)
Published: 2021-06-27
DOI: 10.1186/s40249-021-00874-9 (https://doi.org/10.1186/s40249-021-00874-9)
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8237418/pdf/
Background: As schistosomiasis control moves toward transmission interruption and eventual elimination, evidence-led control is vital for eliminating the remaining hidden risks of schistosomiasis. This study attempts to identify high-risk areas for schistosomiasis in China using information value and machine learning.
Methods: The distribution of local cases in schistosomiasis surveillance data from China between 2005 and 2019 was assessed against 19 variables covering climate, geography, and socioeconomic conditions. Seven models in three categories were built: an information value (IV) model; three machine learning models [logistic regression (LR), random forest (RF), and generalized boosted model (GBM)]; and three coupled models (IV + LR, IV + RF, and IV + GBM). Accuracy, area under the curve (AUC), and F1-score were used to evaluate the models' predictive performance. The best-performing model was then used to predict the risk distribution of schistosomiasis.
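The abstract does not state the information value formula. As a minimal sketch, assuming the form commonly used in susceptibility mapping, the IV of a factor class i is ln((N_i/N) / (S_i/S)), where N_i and N are the case counts in class i and overall, and S_i and S are the corresponding numbers of grid cells. The function name, the land-cover labels, and the toy data below are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def information_values(cell_classes, case_flags):
    """Return {class: ln((N_i / N) / (S_i / S))} for one environmental factor.

    cell_classes: class label of each grid cell (e.g. a land-cover type).
    case_flags:   1 if the cell contains a reported case, else 0.
    This formula is an assumption; the abstract does not give one.
    """
    if len(cell_classes) != len(case_flags):
        raise ValueError("inputs must have the same length")
    total_cells = len(cell_classes)
    total_cases = sum(case_flags)
    if total_cases == 0:
        raise ValueError("at least one case is required")

    cells_per_class = Counter(cell_classes)
    cases_per_class = Counter(
        cls for cls, flag in zip(cell_classes, case_flags) if flag
    )

    values = {}
    for cls, n_cells in cells_per_class.items():
        n_cases = cases_per_class.get(cls, 0)
        if n_cases == 0:
            # No cases observed in this class: the log ratio is undefined,
            # so report negative infinity (the least favorable value).
            values[cls] = float("-inf")
        else:
            values[cls] = math.log(
                (n_cases / total_cases) / (n_cells / total_cells)
            )
    return values


# Hypothetical toy grid: 10 cells, 4 of which contain a case.
cells = ["paddy"] * 4 + ["forest"] * 4 + ["urban"] * 2
cases = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
iv = information_values(cells, cases)
for cls, value in iv.items():
    print(cls, value)
```

A positive value marks a class in which cases are over-represented relative to its share of the area, and a negative value marks one in which they are under-represented. One plausible reading of the coupled models (IV + LR, IV + RF, IV + GBM) is that such per-class scores replace the raw factor values as model inputs, but the abstract does not spell this out.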
Results: Schistosomiasis epidemics were more likely in paddy fields and grasslands, within 2.5 km of a waterway, and where the annual average temperature was 11.5-19.0 °C and the annual average rainfall was 1000-1550 mm. IV + GBM achieved the best predictive performance of the seven models (accuracy = 0.878, AUC = 0.902, F1 = 0.920). The IV + GBM results showed that risk areas are mainly distributed in the regions along the middle and lower reaches of the Yangtze River, the Poyang Lake region, and the Dongting Lake region. High-risk areas are primarily distributed in eastern Changde, western Yueyang, northeastern Yiyang, and central Changsha in Hunan province; southern Jiujiang, northern Nanchang, northeastern Shangrao, and eastern Yichun in Jiangxi province; southern Jingzhou, southern Xiantao, and central Wuhan in Hubei province; southern Anqing, northwestern Guichi, and eastern Wuhu in Anhui province; and central Meishan, northern Leshan, and central Liangshan in Sichuan province.
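For readers unfamiliar with the three reported metrics, the sketch below computes accuracy, F1-score, and AUC from scratch for a binary classifier. The labels, scores, and the 0.5 decision threshold are hypothetical illustrations; the abstract does not say which threshold or software the authors used.

```python
def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1-score for binary labels (1 = case, 0 = no case)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # F1 is the harmonic mean of precision and recall,
    # which simplifies to 2*TP / (2*TP + FP + FN).
    denominator = 2 * tp + fp + fn
    f1 = 2 * tp / denominator if denominator else 0.0
    return accuracy, f1


def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (the rank-based definition)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted risk scores for eight locations.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.55]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

acc, f1 = accuracy_and_f1(y_true, y_pred)
print(acc, f1, auc(y_true, scores))
```

Accuracy and F1 depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why a model can score differently on the three measures.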
Conclusions: The risk of schistosomiasis transmission persists in China, with high-risk areas relatively concentrated in the regions along the middle and lower reaches of the Yangtze River. Coupled models of IV and machine learning enable effective analysis and prediction, providing a scientific basis for evidence-led surveillance and control.
About the journal:
Infectious Diseases of Poverty is a peer-reviewed, open access journal that focuses on essential public health questions related to infectious diseases of poverty. It covers a wide range of topics and methods, including the biology of pathogens and vectors, diagnosis and detection, treatment and case management, epidemiology and modeling, zoonotic hosts and animal reservoirs, control strategies and implementation, new technologies, and their application.
The journal also explores the impact of transdisciplinary or multisectoral approaches on health systems, ecohealth, environmental management, and innovative technologies. It aims to provide a platform for the exchange of research and ideas that can contribute to the improvement of public health in resource-limited settings.
In summary, Infectious Diseases of Poverty aims to address the urgent challenges posed by infectious diseases in impoverished populations. By publishing high-quality research in various areas, the journal seeks to advance our understanding of these diseases and contribute to the development of effective strategies for prevention, diagnosis, and treatment.