This study aimed to identify potential biomarkers and explore the mechanisms underlying diabetic nephropathy (DN) by integrating machine learning, Mendelian randomization (MR) and experimental validation.
Microarray and RNA-sequencing datasets (GSE47184, GSE96804, GSE104948, GSE104954, GSE142025 and GSE175759) were obtained from the Gene Expression Omnibus database. Differential expression analysis was used to identify differentially expressed genes (DEGs) between patients with DN and controls. Three machine learning algorithms, namely least absolute shrinkage and selection operator (LASSO), support vector machine-recursive feature elimination (SVM-RFE) and random forest, were used to improve the accuracy of gene selection and the predictive power of the selected genes. Summary-level data from genome-wide association studies (GWAS) on DN were integrated with expression quantitative trait loci (eQTL) data to identify genes with potential causal relationships to DN. The predictive performance of the biomarker gene was evaluated using receiver operating characteristic (ROC) curves. Gene set enrichment and correlation analyses were conducted to investigate potential mechanisms. Finally, the biomarker gene was validated by quantitative real-time polymerase chain reaction in clinical samples from patients with DN and controls.
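The MR step can be illustrated with a minimal sketch. The text above does not state which MR estimator was applied, so the fixed-effect inverse-variance weighted (IVW) estimator is used here purely as an assumption, as one common choice. The function name `ivw_estimate` and all numbers are hypothetical; they are not taken from the GWAS or eQTL data used in the study.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted (IVW) MR estimate.

    beta_exposure: per-variant effects on gene expression (eQTL betas)
    beta_outcome:  per-variant effects on the disease (GWAS betas)
    se_outcome:    standard errors of the GWAS betas
    Returns (causal estimate, standard error).
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")

    # Each variant is weighted by the inverse variance of its outcome effect.
    weights = [1.0 / se ** 2 for se in se_outcome]
    numerator = sum(
        w * bx * by for w, bx, by in zip(weights, beta_exposure, beta_outcome)
    )
    denominator = sum(w * bx ** 2 for w, bx in zip(weights, beta_exposure))
    if denominator == 0:
        raise ValueError("exposure effects are all zero")

    estimate = numerator / denominator
    std_error = math.sqrt(1.0 / denominator)
    return estimate, std_error


# Hypothetical instruments for one gene (illustrative values only).
beta_eqtl = [0.10, 0.20, 0.15]
beta_gwas = [0.05, 0.10, 0.075]
se_gwas = [0.01, 0.02, 0.015]

estimate, std_error = ivw_estimate(beta_eqtl, beta_gwas, se_gwas)
print(f"IVW estimate = {estimate:.3f}, SE = {std_error:.4f}")
```

In this toy input every variant has an outcome effect equal to half its exposure effect, so the estimate recovers a slope of about 0.5. A real analysis would also require instrument selection, harmonization of effect alleles and sensitivity analyses, none of which are shown here.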
From the 314 DEGs identified, seven characteristic genes with high predictive performance were selected using the three integrated machine learning algorithms. MR analysis revealed 219 genes with significant causal effects on DN, and one co-expressed gene, carbonic anhydrase II (CA2), was ultimately identified as a key biomarker for DN. The ROC curves demonstrated the excellent predictive performance of CA2, with area under the curve (AUC) values consistently above 0.878 across all datasets. Additionally, our analysis indicated a significant association between CA2 and infiltrating immune cells in DN, providing potential mechanistic insights. This biomarker was validated in clinical samples, supporting the reliability of our findings.
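For readers unfamiliar with the AUC reported above, the following sketch shows how it can be computed for a single-gene classifier. It relies on the equivalence between the AUC and the probability that a randomly chosen case scores higher than a randomly chosen control. The function name `roc_auc`, the labels and the scores are hypothetical and are not the study's data; if a biomarker is lower in cases than in controls, the score would be negated first.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for cases (e.g. DN), 0 for controls
    scores: the value used to rank samples (e.g. a gene expression level)
    Ties between a case and a control count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for three cases and three controls.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

Here eight of the nine case-control pairs are ranked correctly, giving an AUC of 8/9. The pairwise loop is quadratic in sample size, which is acceptable for cohorts of the size typical of expression datasets; a rank-based formulation would be preferable for much larger inputs.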
By integrating machine learning, MR and experimental validation, we successfully identified and validated CA2 as a promising biomarker for DN with excellent predictive performance. The biomarker may play a role in the pathogenesis and progression of DN via immune-related pathways. These findings provide important insights into the molecular mechanisms underlying DN and may inform the development of personalized treatment strategies for this disease.