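Identification of the high-risk area for schistosomiasis transmission in China based on information value and machine learning: a newly data-driven modeling attempt.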

Infectious Diseases of Poverty (IF 4.8; CAS Zone 1, Medicine; JCR Q1, Infectious Diseases). Pub Date: 2021-06-27. DOI: 10.1186/s40249-021-00874-9
Yan-Feng Gong, Ling-Qian Zhu, Yin-Long Li, Li-Juan Zhang, Jing-Bo Xue, Shang Xia, Shan Lv, Jing Xu, Shi-Zhu Li

Identification of the high-risk area for schistosomiasis transmission in China based on information value and machine learning: a newly data-driven modeling attempt.

Background: As schistosomiasis control advances toward transmission interruption and eventual elimination, evidence-led control is vital for removing the remaining hidden risks of the disease. This study attempts to identify high-risk areas for schistosomiasis in China using information value and machine learning.

Methods: The distribution of local cases in China's schistosomiasis surveillance data from 2005 to 2019 was assessed against 19 variables covering climate, geography, and socioeconomic conditions. Seven models in three categories were built: the information value (IV) model; three machine learning models [logistic regression (LR), random forest (RF), and generalized boosted model (GBM)]; and three coupled models (IV + LR, IV + RF, IV + GBM). Accuracy, area under the curve (AUC), and F1-score were used to evaluate prediction performance. The best-performing model was then used to predict the risk distribution of schistosomiasis.
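The abstract does not give the IV formula, so the following is only a minimal sketch. It assumes the formulation commonly used in susceptibility mapping, IV_i = ln((N_i / N) / (S_i / S)), where N_i and S_i are the case count and cell count of factor class i and N and S are the totals. The function name `information_values` and the toy land-use grid are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def information_values(classes, is_case):
    """Return {class: ln((N_i / N) / (S_i / S))} for one categorical factor.

    classes: class label of each grid cell (hypothetical input layout).
    is_case: 1 if the cell contains a reported case, else 0.
    """
    total_cells = len(classes)
    total_cases = sum(is_case)
    cells_per_class = Counter(classes)
    cases_per_class = Counter(c for c, y in zip(classes, is_case) if y)
    iv = {}
    for cls, s_i in cells_per_class.items():
        n_i = cases_per_class.get(cls, 0)
        if n_i == 0:
            # No cases in this class: the log ratio is undefined, use -inf.
            iv[cls] = float("-inf")
        else:
            iv[cls] = math.log((n_i / total_cases) / (s_i / total_cells))
    return iv

# Hypothetical toy grid: 10 cells, 4 of them with cases.
land_use = ["paddy"] * 4 + ["forest"] * 4 + ["urban"] * 2
case = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
iv = information_values(land_use, case)
for cls, value in iv.items():
    print(cls, value)
```

Under this assumed formulation, a positive value marks a class that holds more cases than its share of area would suggest, and a negative value marks the opposite. In a coupled model such as IV + GBM, each cell's class labels could be replaced by these values before training, although the abstract does not state exactly how the coupling was done.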

Results: Schistosomiasis epidemics were more likely in paddy fields and grasslands, within 2.5 km of a waterway, with an annual average temperature of 11.5-19.0 °C and an annual average rainfall of 1000-1550 mm. IV + GBM achieved the best prediction performance of the seven models (accuracy = 0.878, AUC = 0.902, F1 = 0.920). The IV + GBM results showed that risk areas are mainly distributed in the coastal regions of the middle and lower reaches of the Yangtze River, the Poyang Lake region, and the Dongting Lake region. High-risk areas are primarily distributed in eastern Changde, western Yueyang, northeastern Yiyang, and central Changsha in Hunan Province; southern Jiujiang, northern Nanchang, northeastern Shangrao, and eastern Yichun in Jiangxi Province; southern Jingzhou, southern Xiantao, and central Wuhan in Hubei Province; southern Anqing, northwestern Guichi, and eastern Wuhu in Anhui Province; and central Meishan, northern Leshan, and central Liangshan in Sichuan Province.
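To make the three reported metrics concrete, here is a small sketch of how accuracy, F1-score, and AUC can be computed from binary labels and predicted scores. The labels, scores, and the 0.5 threshold are hypothetical toy values, not data from the study, and the function names are illustrative only. AUC is computed here as the fraction of positive-negative pairs ranked correctly, with ties counted as half.

```python
def accuracy_f1(y_true, y_pred):
    """Accuracy and F1-score for binary labels (1 = case, 0 = no case)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, f1

def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy predictions for six grid cells.
y_true = [1, 1, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

acc, f1 = accuracy_f1(y_true, y_pred)
auc_value = auc(y_true, scores)
print(acc, f1, auc_value)
```

Accuracy and F1 depend on the chosen threshold, while AUC depends only on how the scores rank cases against non-cases, which is one reason the three are often reported together.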

Conclusions: The risk of schistosomiasis transmission in China persists, with high-risk areas relatively concentrated in the coastal regions of the middle and lower reaches of the Yangtze River. Coupled models of IV and machine learning provide effective analysis and prediction, forming a scientific basis for evidence-led surveillance and control.

Source journal
Infectious Diseases of Poverty (Medicine: Public Health, Environmental and Occupational Health)
CiteScore: 16.70
Self-citation rate: 1.20%
Articles published: 368
Review time: 13 weeks
About the journal: Infectious Diseases of Poverty is a peer-reviewed, open access journal that focuses on essential public health questions related to infectious diseases of poverty. It covers a wide range of topics and methods, including the biology of pathogens and vectors, diagnosis and detection, treatment and case management, epidemiology and modeling, zoonotic hosts and animal reservoirs, control strategies and implementation, and new technologies and their application. The journal also explores the impact of transdisciplinary or multisectoral approaches on health systems, ecohealth, environmental management, and innovative technologies. It aims to provide a platform for exchanging research and ideas that can improve public health in resource-limited settings and to advance the prevention, diagnosis, and treatment of infectious diseases in impoverished populations.