Authors: Yidong Zhu MD, Jun Liu MD, Bo Wang MD
Journal: Diabetes, Obesity & Metabolism, volume 26, issue 12, pages 5646-5660 (JCR Q1, Endocrinology & Metabolism; impact factor 5.4)
Published: 2024-10-06
DOI: 10.1111/dom.15933
URL: https://onlinelibrary.wiley.com/doi/10.1111/dom.15933
Integrated approach of machine learning, Mendelian randomization and experimental validation for biomarker discovery in diabetic nephropathy
Aim
To identify potential biomarkers and explore the mechanisms underlying diabetic nephropathy (DN) by integrating machine learning, Mendelian randomization (MR) and experimental validation.
Methods
Microarray and RNA-sequencing datasets (GSE47184, GSE96804, GSE104948, GSE104954, GSE142025 and GSE175759) were obtained from the Gene Expression Omnibus database. Differential expression analysis identified the differentially expressed genes (DEGs) between patients with DN and controls. Diverse machine learning algorithms, including least absolute shrinkage and selection operator, support vector machine-recursive feature elimination, and random forest, were used to enhance gene selection accuracy and predictive power. We integrated summary-level data from genome-wide association studies on DN with expression quantitative trait loci data to identify genes with potential causal relationships to DN. The predictive performance of the biomarker gene was validated using receiver operating characteristic (ROC) curves. Gene set enrichment and correlation analyses were conducted to investigate potential mechanisms. Finally, the biomarker gene was validated using quantitative real-time polymerase chain reaction in clinical samples from patients with DN and controls.
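The abstract does not say which MR estimator was applied when combining the GWAS and eQTL summary statistics. As an illustration only, the sketch below assumes the widely used fixed-effect inverse-variance weighted (IVW) estimator. The `Instrument` type, the function names and the numbers in the usage lines are hypothetical and are not taken from the study.

```python
import math
from typing import NamedTuple, Sequence, Tuple


class Instrument(NamedTuple):
    """Summary statistics for one genetic variant (hypothetical container)."""
    beta_exposure: float  # variant effect on gene expression (from eQTL data)
    beta_outcome: float   # variant effect on the disease (from GWAS data)
    se_outcome: float     # standard error of beta_outcome


def ivw_estimate(instruments: Sequence[Instrument]) -> Tuple[float, float]:
    """Fixed-effect IVW causal estimate and its standard error.

    Equivalent to a weighted regression of outcome effects on exposure
    effects through the origin, with weights 1 / se_outcome**2.
    """
    if not instruments:
        raise ValueError("at least one instrument is required")
    if any(s.se_outcome <= 0 for s in instruments):
        raise ValueError("standard errors must be positive")
    denom = sum((s.beta_exposure / s.se_outcome) ** 2 for s in instruments)
    if denom == 0:
        raise ValueError("all exposure effects are zero")
    numer = sum(
        s.beta_exposure * s.beta_outcome / s.se_outcome ** 2
        for s in instruments
    )
    return numer / denom, math.sqrt(1.0 / denom)


def two_sided_p(estimate: float, se: float) -> float:
    """Two-sided p-value under a normal approximation."""
    z = abs(estimate / se)
    return math.erfc(z / math.sqrt(2.0))


# Made-up numbers, purely to show the call pattern.
example = [Instrument(0.2, 0.1, 0.05), Instrument(0.4, 0.2, 0.05)]
beta, se = ivw_estimate(example)
p_value = two_sided_p(beta, se)
```

With a single instrument the estimate reduces to the Wald ratio `beta_outcome / beta_exposure`. Sensitivity analyses that MR studies commonly report, such as heterogeneity and pleiotropy checks, are outside the scope of this sketch.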
Results
Based on the 314 identified DEGs, three integrated machine learning algorithms identified seven characteristic genes with high predictive performance. MR analysis revealed 219 genes with significant causal effects on DN, ultimately identifying one co-expressed gene, carbonic anhydrase II (CA2), as a key biomarker for DN. The ROC curves demonstrated the excellent predictive performance of CA2, with area under the curve values consistently above 0.878 across all datasets. Additionally, our analysis indicated a significant association between CA2 and infiltrating immune cells in DN, providing potential mechanistic insights. This biomarker was validated using clinical samples, confirming the reliability of our findings in clinical practice.
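For readers unfamiliar with the metric, the area under the ROC curve for a single-gene marker equals the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. The sketch below computes it that way. The function name and the toy values are hypothetical, and this is not the authors' code or data.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison of cases (label 1) and controls (label 0).

    Runs in O(cases * controls) time, which is fine for cohorts of the
    size typical of expression datasets.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0   # case ranked above control
            elif c == k:
                wins += 0.5   # tie counts as half
    return wins / (len(cases) * len(controls))


# Toy values only: two cases and two controls.
toy_auc = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])
```

If a marker is lower in cases than in controls, this function returns a value below 0.5. Negating the scores before the call gives the conventional AUC in that situation.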
Conclusion
By integrating machine learning, MR and experimental validation, we successfully identified and validated CA2 as a promising biomarker for DN with excellent predictive performance. The biomarker may play a role in the pathogenesis and progression of DN via immune-related pathways. These findings provide important insights into the molecular mechanisms underlying DN and may inform the development of personalized treatment strategies for this disease.
About the journal:
Diabetes, Obesity and Metabolism is primarily a journal of clinical and experimental pharmacology and therapeutics covering the interrelated areas of diabetes, obesity and metabolism. The journal prioritises high-quality original research that reports on the effects of new or existing therapies, including dietary, exercise and lifestyle (non-pharmacological) interventions, in any aspect of metabolic and endocrine disease, either in humans or animal and cellular systems. ‘Metabolism’ may relate to lipids, bone and drug metabolism, or broader aspects of endocrine dysfunction. Preclinical pharmacology, pharmacokinetic studies, meta-analyses and those addressing drug safety and tolerability are also highly suitable for publication in this journal. Original research may be published as a main paper or as a research letter.