The association between chronic obstructive pulmonary disease (COPD) and Type 2 diabetes mellitus (T2DM) has long been recognized, but the molecular mechanisms the two diseases share remain poorly understood. This study aimed to identify common genetic markers and pathways in COPD and T2DM and thereby shed light on their molecular crosstalk.
Using the Gene Expression Omnibus (GEO) database, we analyzed gene expression datasets from six COPD and five T2DM studies. We combined the limma R package, unified matrix analysis, and weighted gene co-expression network analysis (WGCNA) to identify differentially expressed genes (DEGs) and hub genes. We then performed functional enrichment and protein–protein interaction (PPI) analyses, followed by cross-species validation in Mus musculus models. Random forest and least absolute shrinkage and selection operator (LASSO) regression were applied to further screen and validate candidate genes, and a prognostic model was built with XGBoost.
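The core of the shared-DEG step is to filter each disease's differential expression results by effect size and adjusted p-value and then intersect the two gene sets. The sketch below illustrates that idea under stated assumptions: the cut-offs (|logFC| ≥ 1, adjusted p < 0.05) and the optional same-direction rule are hypothetical choices, since the abstract does not report the thresholds used, and the input dictionaries only mirror the `logFC` and `adj.P.Val` columns of a limma `topTable`. The gene names and values are toy placeholders, not study data.

```python
# Hypothetical thresholds: the study's actual cut-offs are not given here.
LOGFC_CUTOFF = 1.0
ADJ_P_CUTOFF = 0.05


def select_degs(results, logfc_cutoff=LOGFC_CUTOFF, adj_p_cutoff=ADJ_P_CUTOFF):
    """Return genes whose |logFC| and adjusted p-value pass the thresholds.

    `results` maps gene symbol -> (logFC, adj_p), mirroring the logFC and
    adj.P.Val columns of a limma topTable.
    """
    return {
        gene
        for gene, (logfc, adj_p) in results.items()
        if abs(logfc) >= logfc_cutoff and adj_p < adj_p_cutoff
    }


def shared_degs(copd_results, t2dm_results, same_direction=True):
    """Intersect the DEG sets of two diseases.

    With same_direction=True (a hypothetical rule), only genes that are
    up-regulated in both or down-regulated in both are kept.
    """
    common = select_degs(copd_results) & select_degs(t2dm_results)
    if same_direction:
        common = {
            g for g in common
            if (copd_results[g][0] > 0) == (t2dm_results[g][0] > 0)
        }
    return common


if __name__ == "__main__":
    # Toy values only; not taken from the GEO datasets analyzed in the study.
    copd = {"GENE_A": (1.5, 0.01), "GENE_B": (-2.0, 0.001),
            "GENE_C": (0.3, 0.04), "GENE_D": (1.2, 0.2)}
    t2dm = {"GENE_A": (1.1, 0.03), "GENE_B": (1.4, 0.02),
            "GENE_C": (1.8, 0.001), "GENE_E": (-1.3, 0.01)}
    print(sorted(shared_degs(copd, t2dm)))                        # ['GENE_A']
    print(sorted(shared_degs(copd, t2dm, same_direction=False)))  # ['GENE_A', 'GENE_B']
```

Whether to require concordant direction is a design decision: dropping that rule keeps genes such as GENE_B that change significantly in both diseases but in opposite directions.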
Shared DEGs in COPD and T2DM included KIF1C, CSTA, GMNN, and PHGDH. Cross-species comparison identified common genes, including PON1 and CD14, whose expression patterns varied. Random forest and LASSO regression identified six critical genes, and the XGBoost model achieved high predictive accuracy for COPD (AUC = 0.996).
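To make the reported AUC concrete: the area under the ROC curve equals the probability that a randomly chosen positive sample receives a higher model score than a randomly chosen negative one, with ties counted as one half. The minimal sketch below computes AUC directly from that definition; it is illustrative only, the abstract does not say which implementation the authors used, and the labels and scores are toy values rather than outputs of the study's XGBoost model.

```python
def roc_auc(labels, scores):
    """AUC via the pairwise (Mann-Whitney) definition.

    labels: 1 for a case (e.g. COPD), 0 for a control.
    scores: model scores, higher meaning "more likely a case".
    Ties between a positive and a negative count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy example: one case (0.4) is outscored by one control (0.5),
    # so 8 of the 9 case-control pairs are ranked correctly.
    labels = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
    print(roc_auc(labels, scores))  # 8/9, about 0.889
```

The pairwise loop is O(n·m) and is meant only for clarity; on this reading, an AUC of 0.996 means nearly every case-control pair is ranked correctly.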
This study identifies key genetic markers shared between COPD and T2DM and sheds light on the molecular pathways linking the two diseases. The high predictive accuracy of the XGBoost model for COPD suggests that these markers may serve as promising biomarkers for early detection. Further validation in larger cohorts is needed.