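Integrated approach of machine learning, Mendelian randomization and experimental validation for biomarker discovery in diabetic nephropathy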

IF 5.4 | Region 2, Medicine | JCR Q1, Endocrinology & Metabolism | Diabetes, Obesity & Metabolism | Pub Date: 2024-10-06 | DOI: 10.1111/dom.15933

Yidong Zhu MD, Jun Liu MD, Bo Wang MD

Diabetes, Obesity & Metabolism, volume 26, issue 12, pages 5646-5660. Publication type: Journal Article. Full text: https://onlinelibrary.wiley.com/doi/10.1111/dom.15933

Citations: 0


Integrated approach of machine learning, Mendelian randomization and experimental validation for biomarker discovery in diabetic nephropathy

Aim

To identify potential biomarkers and explore the mechanisms underlying diabetic nephropathy (DN) by integrating machine learning, Mendelian randomization (MR) and experimental validation.

Methods

Microarray and RNA-sequencing datasets (GSE47184, GSE96804, GSE104948, GSE104954, GSE142025 and GSE175759) were obtained from the Gene Expression Omnibus database. Differential expression analysis identified the differentially expressed genes (DEGs) between patients with DN and controls. Diverse machine learning algorithms, including least absolute shrinkage and selection operator, support vector machine-recursive feature elimination, and random forest, were used to enhance gene selection accuracy and predictive power. We integrated summary-level data from genome-wide association studies on DN with expression quantitative trait loci data to identify genes with potential causal relationships to DN. The predictive performance of the biomarker gene was validated using receiver operating characteristic (ROC) curves. Gene set enrichment and correlation analyses were conducted to investigate potential mechanisms. Finally, the biomarker gene was validated using quantitative real-time polymerase chain reaction in clinical samples from patients with DN and controls.
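As a rough illustration of how summary-level GWAS and eQTL statistics can be combined in an MR analysis, the sketch below computes per-variant Wald ratios and a fixed-effect inverse-variance weighted (IVW) estimate. The abstract does not state which MR estimator the authors used, so IVW is an assumption here, and the function names and the numbers in the usage note are hypothetical.

```python
import math


def wald_ratios(beta_exposure, beta_outcome, se_outcome):
    """Per-variant Wald ratio and its first-order standard error.

    beta_exposure: variant effects on gene expression (eQTL betas).
    beta_outcome:  variant effects on the disease (GWAS betas).
    se_outcome:    standard errors of the GWAS betas.
    """
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    ses = [sy / abs(bx) for bx, sy in zip(beta_exposure, se_outcome)]
    return ratios, ses


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate: Wald ratios weighted by 1 / se^2."""
    ratios, ses = wald_ratios(beta_exposure, beta_outcome, se_outcome)
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    return estimate, math.sqrt(1.0 / total)
```

For example, with hypothetical inputs `ivw_estimate([0.2, 0.4], [0.1, 0.2], [0.05, 0.05])`, both variants have a Wald ratio of 0.5, so the pooled estimate is also 0.5. A real analysis would additionally need instrument selection, allele harmonization and sensitivity checks, which are omitted here.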

Results

From the 314 identified DEGs, seven characteristic genes with high predictive performance were selected using three integrated machine learning algorithms. MR analysis revealed 219 genes with significant causal effects on DN, ultimately identifying one co-expressed gene, carbonic anhydrase II (CA2), as a key biomarker for DN. The ROC curves demonstrated the excellent predictive performance of CA2, with area under the curve values consistently above 0.878 across all datasets. Additionally, our analysis indicated a significant association between CA2 and infiltrating immune cells in DN, providing potential mechanistic insights. This biomarker was validated using clinical samples, confirming the reliability of our findings in clinical practice.
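To make the area under the curve figure concrete, the sketch below computes an AUC for a single marker as the probability that a randomly chosen case scores higher than a randomly chosen control, which is equivalent to the area under the ROC curve. It is a minimal illustration, not the authors' pipeline; the function name and the sample values are hypothetical, and the scoring direction is an assumption (if a marker is lower in cases, the scores would be negated first).

```python
def roc_auc(scores, labels):
    """AUC for one marker via pairwise case/control comparisons.

    scores: marker values, where higher is assumed to indicate disease.
    labels: 1 for cases, 0 for controls.
    Ties between a case and a control count as half a win.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))
```

With hypothetical scores `[0.9, 0.8, 0.4, 0.7, 0.3, 0.2]` and labels `[1, 1, 1, 0, 0, 0]`, eight of the nine case/control pairs are ordered correctly, so the function returns 8/9.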

Conclusion

By integrating machine learning, MR and experimental validation, we successfully identified and validated CA2 as a promising biomarker for DN with excellent predictive performance. The biomarker may play a role in the pathogenesis and progression of DN via immune-related pathways. These findings provide important insights into the molecular mechanisms underlying DN and may inform the development of personalized treatment strategies for this disease.


Source journal

Diabetes, Obesity & Metabolism (Medicine: Endocrinology & Metabolism)

CiteScore: 10.90

Self-citation rate: 6.90%

Articles published: 319

Review time: 3-8 weeks

About the journal: Diabetes, Obesity and Metabolism is primarily a journal of clinical and experimental pharmacology and therapeutics covering the interrelated areas of diabetes, obesity and metabolism. The journal prioritises high-quality original research that reports on the effects of new or existing therapies, including dietary, exercise and lifestyle (non-pharmacological) interventions, in any aspect of metabolic and endocrine disease, either in humans or animal and cellular systems. ‘Metabolism’ may relate to lipids, bone and drug metabolism, or broader aspects of endocrine dysfunction. Preclinical pharmacology, pharmacokinetic studies, meta-analyses and those addressing drug safety and tolerability are also highly suitable for publication in this journal. Original research may be published as a main paper or as a research letter.