Machine learning based predictive model of Type 2 diabetes complications using Malaysian National Diabetes Registry: A study protocol.

Journal of Public Health Research (IF 1.6, Q3, Public, Environmental & Occupational Health). Pub Date: 2024-02-29. eCollection Date: 2024-01-01. DOI: 10.1177/22799036241231786
Mohamad Zulfikrie Abas, Ken Li, Noran Naqiah Hairi, Wan Yuen Choo, Kim Sui Wan
Citations: 0

Abstract


Background: The prevalence of diabetes in Malaysia is increasing, and identifying patients at higher risk of complications is crucial for effective management. Prediction models developed with machine learning (ML) have been shown to outperform non-ML models. This study aims to develop predictive models for Type 2 Diabetes (T2D) complications in Malaysia using ML techniques.

Design and methods: This 10-year retrospective cohort study uses clinical audit datasets from the Malaysian National Diabetes Registry from 2011 to 2021. T2D patients who received treatment in public health clinics in the southern region of Malaysia and have at least two data points within the 10 years are included. Patients with diabetes complications at baseline are excluded to ensure temporality between predictors and the target variable. Appropriate methods are used to address issues related to data cleaning, missing data imputation, data splitting, feature selection, and class imbalance. The study uses seven ML algorithms (logistic regression, support vector machine, k-nearest neighbours, decision tree, random forest, extreme gradient boosting, and light gradient boosting machine) to develop predictive models for four target variables: nephropathy, retinopathy, ischaemic heart disease, and stroke. Hyperparameter tuning is performed for each algorithm. Model training uses a stratified k-fold cross-validation technique. The best model for each algorithm is evaluated on a hold-out dataset using multiple metrics.
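
The data splitting described above (a hold-out set plus stratified k-fold cross-validation on the remainder) can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' pipeline: the abstract does not give the hold-out fraction or the value of k, so the 80/20 split and k = 5 below are assumed, the labels are synthetic, and the function names `stratified_folds` and `holdout_then_cv` are hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign each sample index to one of k folds with similar class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal indices round-robin so every fold gets a near-equal share of this class.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def holdout_then_cv(labels, k=5, holdout_parts=5, seed=0):
    """Reserve one stratified part as hold-out, then build k CV folds on the rest."""
    parts = stratified_folds(labels, holdout_parts, seed)
    holdout = sorted(parts[0])
    train_pool = sorted(i for part in parts[1:] for i in part)
    pool_labels = [labels[i] for i in train_pool]
    cv = []
    for fold in stratified_folds(pool_labels, k, seed + 1):
        # Fold entries index into train_pool; map them back to original indices.
        val = sorted(train_pool[j] for j in fold)
        val_set = set(val)
        train = [i for i in train_pool if i not in val_set]
        cv.append((train, val))
    return holdout, cv


# Synthetic, imbalanced labels (assumption): 1 = complication developed, 0 = none.
labels = [1] * 20 + [0] * 80
holdout, cv = holdout_then_cv(labels, k=5, holdout_parts=5)

for fold_number, (train, val) in enumerate(cv, start=1):
    positives = sum(labels[i] for i in val)
    print(f"fold {fold_number}: train={len(train)} val={len(val)} positives_in_val={positives}")
print(f"hold-out: n={len(holdout)} positives={sum(labels[i] for i in holdout)}")
```

Stratifying both the hold-out split and the folds keeps the rare positive class (a complication) represented in every validation set, which matters when class imbalance is a concern. In practice a library routine such as scikit-learn's `StratifiedKFold` would normally be used instead of hand-written splitting.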

Expected impact of the study on public health: The prediction model may be a valuable tool for diabetes management and secondary prevention by enabling earlier interventions and optimal resource allocation, leading to better health outcomes.
