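Application of machine learning algorithms in predicting new onset hypertension: a study based on the China Health and Nutrition Survey.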

Environmental Health and Preventive Medicine | IF 4.0 | Zone 3, Medicine | JCR Q1, Public, Environmental & Occupational Health | Pub Date: 2025-01-01 | DOI: 10.1265/ehpm.24-00270
Manhui Zhang, Xian Xia, Qiqi Wang, Yue Pan, Guanyi Zhang, Zhigang Wang
Volume 30, page 3. Publication type: Journal Article. Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11744027/pdf/
Citations: 0

Abstract

Background: Hypertension is a serious chronic disease that can lead to various cardiovascular diseases and affects vital organs such as the heart, brain, and kidneys. We aimed to predict the risk of new-onset hypertension using machine learning algorithms and to identify the characteristics of patients who develop it.

Methods: We analyzed data from the 2011 China Health and Nutrition Survey cohort, restricted to individuals who were not hypertensive at baseline and had follow-up results available by 2015. We evaluated four traditional machine learning algorithms commonly used in epidemiological studies (Logistic Regression, Support Vector Machine, XGBoost, and LightGBM) and two deep learning algorithms (TabNet and AMFormer). Each algorithm was trained on two feature sets, one with 16 features and one with 29. SHAP values were used to select key features associated with new-onset hypertension.
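To make the SHAP-based feature selection step more concrete, here is a minimal sketch for the simplest case only: a linear model with independent features, where the SHAP value of feature i has the closed form w_i * (x_i - E[x_i]). This is not the authors' AMFormer pipeline. The function names, coefficients, and toy samples below are hypothetical and chosen purely for illustration.

```python
from statistics import mean


def linear_shap(weights, x, background_means):
    """SHAP values of one sample for a linear model f(x) = b + sum(w_i * x_i).

    Assumes independent features, so phi_i = w_i * (x_i - E[x_i]).
    """
    return [w * (xi - mu) for w, xi, mu in zip(weights, x, background_means)]


def rank_features(names, weights, samples):
    """Rank features by mean absolute SHAP value across all samples."""
    means = [mean(col) for col in zip(*samples)]
    shap_rows = [linear_shap(weights, row, means) for row in samples]
    importance = [mean(abs(v) for v in col) for col in zip(*shap_rows)]
    return sorted(zip(names, importance), key=lambda p: p[1], reverse=True)


# Hypothetical toy data: feature names echo the abstract, values are made up.
names = ["age", "waist_circumference", "bmi"]
weights = [0.04, 0.02, 0.05]  # made-up coefficients, not taken from the paper
samples = [
    [45.0, 80.0, 22.0],
    [60.0, 95.0, 27.0],
    [38.0, 72.0, 20.0],
    [52.0, 88.0, 25.0],
]

ranking = rank_features(names, weights, samples)
for name, score in ranking:
    print(f"{name}: mean |SHAP| = {score:.3f}")
```

Ranking by mean absolute SHAP value is one common way to turn per-sample attributions into a global feature ordering; for non-linear models such as gradient-boosted trees or transformers, the attributions would have to come from a dedicated explainer rather than this closed form.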

Results: A total of 4,982 participants were included in the analysis, of whom 1,017 developed hypertension during the 4-year follow-up. Among the 16-feature models, Logistic Regression had the highest AUC, 0.784 (0.775–0.806). Among the 29-feature models, AMFormer performed best, with an AUC of 0.802 (0.795–0.820); it also had the highest MCC (0.417, 95% CI: 0.400–0.434) and F1 score (0.503, 95% CI: 0.484–0.505), indicating the best overall performance among the models compared. The key features selected with AMFormer, which played significant roles in prediction, included age, province, waist circumference, urban or rural location, education level, employment status, weight, waist-to-hip ratio (WHR), and body mass index (BMI).
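For readers less familiar with the reported metrics, the following sketch shows how AUC, MCC, and F1 are defined, together with a percentile bootstrap interval. The abstract does not say how the intervals were computed, so the bootstrap here is an assumption, and the labels and scores are made-up toy values rather than study data. Only the two counts used for the cumulative incidence (1,017 of 4,982) come from the abstract.

```python
import math
import random


def confusion(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN)."""
    tp, _, fp, fn = confusion(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(y_true, y_pred):
    """Matthews correlation coefficient from the confusion matrix."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(metric, y_true, values, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval; resamples lacking one class are redrawn."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        if len(set(yt)) < 2:
            continue
        stats.append(metric(yt, [values[i] for i in idx]))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]


# Made-up toy labels and predicted risks (not study data).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.35, 0.6, 0.4, 0.7, 0.8, 0.45, 0.9]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption

point_auc = auc(y_true, scores)
point_f1 = f1_score(y_true, y_pred)
point_mcc = mcc(y_true, y_pred)
ci_low, ci_high = bootstrap_ci(auc, y_true, scores)

# Cumulative incidence from the counts reported in the abstract (about 20.4%).
incidence = 1017 / 4982
```

With roughly one in five participants developing hypertension, the classes are imbalanced, which is one reason threshold-dependent metrics such as MCC and F1 are informative alongside AUC.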

Conclusion: We applied the AMFormer model to the prediction of new-onset hypertension for the first time, and it achieved the best results among the six algorithms tested. The model can also be used to identify key features associated with new-onset hypertension. Machine learning algorithms can further improve disease prediction and help identify disease risk factors.

Source journal
Environmental Health and Preventive Medicine (Public, Environmental & Occupational Health)
CiteScore: 7.90
Self-citation rate: 2.10%
Articles published: 44
Review time: 10 weeks
About the journal: The official journal of the Japanese Society for Hygiene, Environmental Health and Preventive Medicine (EHPM) brings a comprehensive approach to prevention and environmental health related to medical, biological, molecular biological, genetic, physical, psychosocial, chemical, and other environmental factors. Environmental Health and Preventive Medicine features definitive studies on human health sciences and provides comprehensive and unique information to a worldwide readership.
Latest articles in this journal
Association between urinary metallothionein concentration and causes of death among cadmium-exposed residents in Japan: a 35-year follow-up study.
Application of machine learning algorithms in predicting new onset hypertension: a study based on the China Health and Nutrition Survey.
Characteristics and outcomes of out-of-hospital cardiac arrest among students under school supervision in Japan: a descriptive epidemiological study (2008-2021).
Impact of fear of coronavirus disease 2019 on attention-deficit/hyperactivity disorder traits associated with depressive symptoms, functional impairment, and low self-esteem in university students: a cross-sectional study with mediation analysis.
Regional adipose distribution and metabolically unhealthy phenotype in Chinese adults: evidence from China National Health Survey.