Prediction of metabolic syndrome using machine learning approaches based on genetic and nutritional factors: a 14-year prospective-based cohort study.

IF 2 | Zone 4 (Medicine) | Q3 GENETICS & HEREDITY | BMC Medical Genomics | Pub Date: 2024-09-04 | DOI: 10.1186/s12920-024-01998-1

Dayeon Shin

BMC Medical Genomics 17(1): 224 (2024). Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11373243/pdf/

Citations: 0

Abstract

Introduction: Metabolic syndrome is a chronic disease associated with multiple comorbidities. In recent years, machine learning techniques have been used to predict metabolic syndrome. However, few studies have combined demographic, clinical, laboratory, dietary, and genetic factors to predict the incidence of metabolic syndrome in Koreans. In the present study, we propose a genome-wide polygenic risk score, used together with other factors, to improve the accuracy of metabolic syndrome prediction.
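A genome-wide polygenic risk score is conventionally a weighted sum of an individual's risk-allele dosages, with per-SNP weights (effect sizes) taken from a GWAS and SNPs filtered by a p-value threshold; the Results report a threshold of p < 1.0, which retains essentially every SNP. The abstract does not say which GWAS weights, clumping step, or software the study used, so the Python sketch below is a generic illustration under those assumptions, not the study's pipeline. The function names and the toy dosages, betas, and p-values are hypothetical. It computes raw scores and then standardizes them to a Z-score scale.

```python
import math


# Hypothetical helper: the study's actual PRS pipeline is not described in the abstract.
def polygenic_risk_scores(dosages, betas, pvalues, p_threshold=1.0):
    """Raw PRS per person: sum of beta_j * dosage_j over SNPs with p_j < p_threshold.

    dosages: one list per person of risk-allele counts (0, 1, or 2) per SNP.
    betas:   per-SNP effect sizes (e.g. log odds ratios from a GWAS), assumed.
    pvalues: per-SNP GWAS p-values used for thresholding.
    """
    kept = [j for j, p in enumerate(pvalues) if p < p_threshold]
    return [sum(betas[j] * person[j] for j in kept) for person in dosages]


def z_scores(values):
    """Standardize scores to mean 0 and sample standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


# Toy data (hypothetical): 4 people x 3 SNPs.
dosages = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 2, 2]]
betas = [0.10, -0.05, 0.20]
pvalues = [0.01, 0.30, 0.75]

raw = polygenic_risk_scores(dosages, betas, pvalues)  # p < 1.0 keeps all three SNPs
# raw is approximately [0.35, 0.05, 0.40, 0.30]
gprs_z = z_scores(raw)
```

Lowering `p_threshold` drops SNPs with weaker GWAS evidence; a threshold of 1.0, as reported for this study, keeps nearly the whole genome-wide set, which is what makes the score "genome-wide".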

Methods: We developed seven machine learning-based models using Cox multivariable regression, deep neural network (DNN), support vector machine (SVM), stochastic gradient descent (SGD), random forest (RAF), Naïve Bayes (NBA) classifier, and AdaBoost (ADB) to predict the incidence of metabolic syndrome at year 14, using data from the Korean Genome and Epidemiology Study (KoGES) Ansan and Ansung cohort.

Results: Of the 5,440 participants, 2,120 were classified as having new-onset metabolic syndrome. The model that included sex, age, alcohol intake, energy intake, marital status, education status, income status, smoking status, dried laver intake, and the genome-wide polygenic risk score (gPRS) Z-score based on 344,447 SNPs (p-value threshold < 1.0) achieved the highest AUC values with RAF (0.994 [95% CI 0.985, 1.000]) and ADB (0.994 [95% CI 0.986, 1.000]).
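The abstract reports AUCs with 95% confidence intervals but does not state how the intervals were derived (DeLong's method and bootstrap resampling are both common choices). As a hedged illustration only, the Python sketch below computes the AUC from the Mann-Whitney rank statistic and a percentile-bootstrap 95% CI. The function names and toy inputs are hypothetical, and labels are assumed to be 0/1 incidence indicators paired with predicted risk scores from any of the models.

```python
import random


def auc(labels, scores):
    """AUC via the Mann-Whitney U statistic; tied scores get their average rank.

    labels: 0/1 outcome indicators (1 = incident metabolic syndrome, assumed coding).
    scores: predicted risks from any model; higher means higher predicted risk.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a block of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Point AUC plus a percentile-bootstrap (1 - alpha) confidence interval."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lack one of the two classes
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[max(0, int((1 - alpha / 2) * n_boot) - 1)]
    return auc(labels, scores), lower, upper


# Toy example (hypothetical labels and scores).
labels = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
point, lower, upper = bootstrap_auc_ci(labels, scores, n_boot=500)  # point == 8/9
```

Ranking by sorting keeps each AUC evaluation at O(n log n), so a thousand replicates over a cohort of a few thousand participants stays practical in pure Python. Because the AUC is bounded at 1, a percentile interval for a near-perfect model can touch 1.000 at its upper end.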

Conclusions: Incorporating the gPRS together with demographic, clinical, laboratory, and seaweed intake data improved metabolic syndrome risk prediction by capturing the distinct etiologies of metabolic syndrome development. In the Korean population, the RAF- and ADB-based models predicted metabolic syndrome more accurately than the NBA-based model.

Source journal

BMC Medical Genomics (Medicine: Genetics)

CiteScore: 3.90

Self-citation rate: 0.00%

Articles published: 243

Review time: 3.5 months

Journal description: BMC Medical Genomics is an open access journal publishing original peer-reviewed research articles on all aspects of functional genomics, genome structure, genome-scale population genetics, epigenomics, proteomics, systems analysis, and pharmacogenomics in relation to human health and disease.