Breast cancer (BC) is a major cause of death and disability among women worldwide. Recent research indicates that copy number variation plays a crucial role in tumor development. In this study, we used the Single CEll Variational ANeuploidy analysis (SCEVAN) algorithm, which infers copy number alterations from single-cell RNA sequencing data, to distinguish malignant from non-malignant cells, with the aim of identifying gene signatures that predict patient survival.
We analyzed gene expression profiles and associated clinical data from the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) databases. Using the SCEVAN algorithm, we distinguished malignant from non-malignant cells and investigated cellular interactions within the tumor microenvironment (TME). We clustered TCGA samples based on the differentially expressed genes (DEGs) between these cell types and performed Kyoto Encyclopedia of Genes and Genomes pathway analysis. We then built multigene prognostic models from the DEGs using least absolute shrinkage and selection operator-penalized Cox regression. To assess the prognostic accuracy of these signatures, we generated Kaplan–Meier and receiver operating characteristic curves for the training and validation datasets. We also tracked expression changes of the prognostic genes along the pseudotime trajectory of malignant cells. Patients were divided into high-risk and low-risk groups at the median risk score to compare their TME and identify potential therapeutic agents. Lastly, polymerase chain reaction was used to validate the expression of seven key genes.
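The risk stratification step described above can be sketched as follows. This is a minimal illustration, assuming the conventional form of a Cox-based risk score (the sum of each gene's expression multiplied by its regression coefficient); the gene names, coefficients, and expression values are hypothetical placeholders and are not the signature reported in this study.

```python
from statistics import median

def risk_scores(expression, coefficients):
    """Return a risk score per patient: sum over genes of coefficient * expression.

    expression:   {patient_id: {gene: expression_value}}
    coefficients: {gene: Cox regression coefficient}
    """
    return {
        patient: sum(coef * genes[gene] for gene, coef in coefficients.items())
        for patient, genes in expression.items()
    }

def split_by_median(scores):
    """Label patients 'high' if their score exceeds the cohort median, else 'low'."""
    cutoff = median(scores.values())
    return {p: ("high" if s > cutoff else "low") for p, s in scores.items()}

# Hypothetical two-gene signature and four patients (illustrative values only).
coefficients = {"GENE_A": 0.5, "GENE_B": -0.25}
expression = {
    "P1": {"GENE_A": 2.0, "GENE_B": 4.0},
    "P2": {"GENE_A": 4.0, "GENE_B": 0.0},
    "P3": {"GENE_A": 3.0, "GENE_B": 2.0},
    "P4": {"GENE_A": 0.0, "GENE_B": 4.0},
}

scores = risk_scores(expression, coefficients)
groups = split_by_median(scores)
```

In practice the coefficients would come from the fitted penalized Cox model, and the median cutoff would typically be fixed on the training set before being applied to the validation cohorts.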
The SCEVAN algorithm identified distinct malignant and non-malignant cell populations in GSE180286. CellChat analysis revealed significantly increased cellular communication, particularly among fibroblasts, endothelial cells, and malignant cells. The DEGs were predominantly involved in immune-related pathways. TCGA samples were classified into clusters A and B based on these genes. Cluster A, enriched in immune pathways, was associated with poorer prognosis, whereas cluster B, enriched in circadian rhythm pathways, showed better outcomes. We constructed a 14-gene prognostic signature and validated it in an internal TCGA cohort (1:1 split) and in external GEO datasets (GSE42568 and GSE146558). Kaplan–Meier analysis supported the signature's prognostic value (p < 0.001), and receiver operating characteristic curve analysis demonstrated its predictive reliability. Single-cell pseudotime analysis with Monocle 2 revealed distinct expression trends of these genes in malignant cells, underscoring intratumoral heterogeneity. Furthermore, we explored differences in the TME between the high- and low-risk groups and identified 16 significantly correlated drugs.
Our findings suggest that the 14-gene prognostic signature could serve as a novel biomarker for predicting the prognosis of BC patients. In addition, the differences in immune cells and pathways between the risk groups suggest that immunotherapy may be a crucial component of treatment strategies for BC patients.