Objectives: With the increasing application of high-throughput transcriptomic data in cancer research, identifying reliable cancer biomarkers in high-dimensional settings remains a major challenge. This study aims to systematically evaluate various regularized conditional logistic regression (CLR) methods under a matched case-control (MCC) design, focusing on their performance in variable selection, parameter estimation, and predictive accuracy. Special emphasis is placed on the importance of the matching design in reducing confounding effects and improving model interpretability.
Methods: We utilize RNA-seq data from The Cancer Genome Atlas (TCGA), specifically datasets for liver, thyroid, and lung cancers, which include paired tumor and adjacent normal tissue samples. In our analysis, we apply 4 regularized CLR methods implemented in R packages (namely "clogitL1," "pclogit," "clogitLasso," and "penalizedclr") to analyze over 20 000 gene expression features. We evaluate the comparative performance of these methods based on metrics such as gene selection stability, predictive accuracy, and interpretability. Additionally, we employ a bootstrap resampling framework to estimate gene selection probabilities, which serve as a measure of gene importance.
Results: Our results show that incorporating the MCC design significantly enhances feature selection performance by mitigating confounding noise. The regularized CLR models successfully identify several well-established cancer-related genes with high selection consistency and statistical significance. In contrast, models that ignore the matched design tend to miss critical biomarkers or produce excessive false positives, leading to potentially misleading interpretations.
Conclusions: This study highlights the value of integrating a matched case-control design with regularized CLR methods for the analysis of high-dimensional transcriptomic data. The proposed analytical framework offers improved accuracy, robustness, and biological relevance, providing a practical and scalable approach for cancer genomics research. It also supports the advancement of precision medicine and translational applications.
Robust Cancer Biomarker Identification From Matched Transcriptomic Data Via Bootstrapped Regularized Conditional Logistic Regression. Jie-Huei Wang, Zih-Han Wu, Hui-Chen Lu, Tzung-Ying Guo. Cancer Informatics (IF 2.5), vol. 24, article 11769351251404255. Pub Date: 2025-12-16. DOI: 10.1177/11769351251404255. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12709001/pdf/
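The bootstrap selection-probability idea described in the Methods can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes 1:1 tumor/normal pairs, for which the conditional likelihood of a pair reduces to a logistic function of the within-pair difference, and it fits an L1 penalty with a hand-written proximal gradient loop rather than any of the four R packages named above. The function names, the toy data, and the values of `lam`, `step`, `n_iter`, and `n_boot` are all hypothetical.

```python
import math
import random

def soft_threshold(value, threshold):
    """Soft-thresholding operator used by the L1 (lasso) proximal step."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_l1_clr_pairs(diffs, lam=0.1, step=1.0, n_iter=200):
    """L1-penalized conditional logistic regression for 1:1 matched pairs.

    diffs[i][j] is the (tumor - normal) value of feature j in pair i.
    For 1:1 matching the conditional likelihood of a pair is
    sigmoid(diff . beta), so no intercept is estimated.
    Fitted by proximal gradient descent (an assumption of this sketch).
    """
    n, p = len(diffs), len(diffs[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        grad = [0.0] * p
        for d in diffs:
            z = sum(b * x for b, x in zip(beta, d))
            w = 1.0 / (1.0 + math.exp(z))  # sigmoid(-z)
            for j in range(p):
                grad[j] -= w * d[j] / n
        beta = [soft_threshold(b - step * g, step * lam)
                for b, g in zip(beta, grad)]
    return beta

def selection_probabilities(diffs, n_boot=50, seed=0, **fit_kwargs):
    """Resample whole pairs with replacement and count how often each
    feature receives a nonzero coefficient."""
    rng = random.Random(seed)
    n, p = len(diffs), len(diffs[0])
    counts = [0] * p
    for _ in range(n_boot):
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        beta = fit_l1_clr_pairs(sample, **fit_kwargs)
        for j, b in enumerate(beta):
            if abs(b) > 1e-8:
                counts[j] += 1
    return [c / n_boot for c in counts]

# Hypothetical toy data: feature 0 is consistently higher in tumor,
# features 1 and 2 are balanced noise.
toy_diffs = [
    [1.0 + 0.1 * (i % 3),
     0.5 if i % 2 == 0 else -0.5,
     0.5 if (i // 2) % 2 == 0 else -0.5]
    for i in range(40)
]
probs = selection_probabilities(toy_diffs)
print(probs)
```

Resampling whole pairs, rather than individual samples, keeps the matched structure intact in every bootstrap replicate, which is the point the abstract makes about respecting the design.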
Background: Pancreatic ductal adenocarcinoma (PDAC) is a highly lethal malignancy with a dismal 5-year survival rate, largely due to the absence of reliable biomarkers for early detection. The molecular mechanisms underpinning PDAC pathogenesis remain incompletely understood, highlighting the urgent need for novel diagnostic strategies.
Objective: This study aimed to integrate eQTL-driven Mendelian randomization (MR) with transcriptomic and genome-wide association data to identify causal PDAC-associated genes and construct a diagnostic nomogram based on 5 hub genes (CTSC, SMYD3, MFGE8, IGFBP7, POC1B) for early detection of pancreatic ductal adenocarcinoma (PDAC).
Methods: Transcriptomic data from GSE62165 and GSE25471 were retrieved from the Gene Expression Omnibus (GEO) and processed for differential expression using LIMMA and GEO2R, followed by batch correction and weighted gene co-expression network analysis (WGCNA). Summary-level eQTL statistics were obtained from OpenGWAS, and GWAS data included over 5000 PDAC cases. MR analysis was performed using inverse variance weighted (IVW) as the primary approach, supplemented with MR-Egger, weighted median, weighted mode, and MR-PRESSO. Instrument strength, pleiotropy, and heterogeneity were assessed via F-statistics, Egger intercept, and Cochran's Q test. Candidate genes were filtered using a consensus approach combining random forest (RF), support vector machine-recursive feature elimination (SVM-RFE), and Lasso regression. Diagnostic performance was evaluated via ROC curves, C-index, calibration plots, and decision curve analysis. Mechanistic insights were derived from KEGG and GO enrichment analyses, as well as protein-protein interaction (PPI) network analyses.
Results: Five eQTL-associated hub genes (CTSC, SMYD3, MFGE8, IGFBP7, and POC1B) were identified as causally linked to PDAC via robust MR analysis with minimal evidence of pleiotropy or heterogeneity. These genes demonstrated high diagnostic potential (AUC > 0.85, P < .001). A diagnostic nomogram incorporating these genes achieved strong predictive performance (C-index = 0.92) with favorable clinical decision curve results. Functional enrichment and PPI analyses implicated these genes, particularly CTSC, in modulating the ITGAV/ITGB3-PI3K-Akt signaling axis, contributing to PDAC cell cycle regulation and apoptosis resistance.
Conclusions: This study presents a multi-omics, MR-informed framework for identifying eQTL-regulated biomarkers of PDAC. The identified hub genes offer promising avenues for early detection, while the mechanistic mapping of the PI3K-Akt pathway provides translational insights. These findings warrant further validation in clinical and experimental settings and hold potential to reshape PDAC diagnostic strategies. Pancreatic ductal adenocarcinoma (PDAC) remains a formidable clinical ch
Integrative Analysis of eQTL Genes Reveals Key Biomarkers and Mechanisms for Early Diagnosis of Pancreatic Ductal Adenocarcinoma. Xuebo Wang, Xusheng Zhang, Shicai Liang, Jialong Wang, Yannan Xie, Jiawei Wang, Bendong Chen. Cancer Informatics, vol. 24, article 11769351251400465. Pub Date: 2025-12-16. DOI: 10.1177/11769351251400465. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12709030/pdf/
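The inverse variance weighted (IVW) step named in the Methods can be illustrated with a short sketch. It assumes the standard fixed-effect form, in which each instrument contributes a Wald ratio weighted by the squared exposure effect over the outcome variance, and it uses the common (beta/se)^2 approximation for instrument F-statistics. The function names and all numbers are hypothetical and are not taken from the study.

```python
import math

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate from per-SNP summary statistics.

    Each instrument j gives a Wald ratio beta_outcome[j] / beta_exposure[j];
    ratios are combined with weights (beta_exposure[j] / se_outcome[j])**2.
    Returns (estimate, standard_error, cochran_q).
    """
    weights = [(bx / se) ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    standard_error = math.sqrt(1.0 / total)
    # Cochran's Q: weighted squared deviation of each ratio from the pooled estimate.
    cochran_q = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))
    return estimate, standard_error, cochran_q

def f_statistic(beta_exposure, se_exposure):
    """Approximate per-instrument F-statistic, (beta / se)^2."""
    return (beta_exposure / se_exposure) ** 2

# Hypothetical summary statistics for three instruments (eQTL SNPs).
bx = [0.10, 0.20, 0.15]      # SNP -> gene expression effects
by = [0.05, 0.10, 0.075]     # SNP -> outcome effects
se_y = [0.01, 0.02, 0.015]   # standard errors of the outcome effects

estimate, standard_error, cochran_q = ivw_estimate(bx, by, se_y)
print(estimate, standard_error, cochran_q)
print(f_statistic(0.10, 0.02))
```

A large Cochran's Q relative to its degrees of freedom (number of instruments minus one) would suggest heterogeneity among the instruments, which is what the sensitivity analyses listed in the abstract are meant to probe.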
Pub Date: 2025-12-12; eCollection Date: 2025-01-01; DOI: 10.1177/11769351251401330
Syed Billal Hossain, Md Mizanoor Rahman, Kapashia Binte Giash, Md Hazrat Ali, Mst Asma Akter, A B M Alauddin Chowdhury
Background: Post-mastectomy PTSD is a serious mental health issue that remains understudied, particularly in low-resource settings such as Bangladesh. This study aimed to predict PTSD among breast cancer survivors using machine learning (ML) models and to identify significant predictors through the Boruta algorithm, a feature selection tool, offering scalable solutions for early detection and intervention.
Methods: A cross-sectional study of 138 post-mastectomy breast cancer patients was conducted across 3 hospitals in Bangladesh. Data on sociodemographic characteristics, health history, social experience, and treatment were collected using validated tools, including the PTSD Checklist for DSM-5 (PCL-5). The Boruta algorithm identified key predictors, and 10 ML models were evaluated for PTSD prediction using metrics such as accuracy, sensitivity, specificity, and AUC.
Results: Random Forest (RF) outperformed other models (accuracy: 88.9%, AUC: 0.914). Significant predictors included education, monthly income, and changes in family behaviour. Factors like marital status, having chronic diseases, and hormone therapy were not statistically significant. PTSD prevalence was 34.1%, with urban residents and younger patients facing higher risks.
Conclusion: ML models, particularly RF, demonstrated strong predictive performance and identified critical PTSD predictors. These findings highlight the potential for cost-effective PTSD screening in resource-constrained settings. Future research should focus on broader validation and longitudinal studies to refine predictive models.
Prediction and Feature Selection of Mastectomy-Related Post Traumatic Stress Disorder (PTSD) Using Machine Learning Among Breast Cancer Patients in Bangladesh. Syed Billal Hossain, Md Mizanoor Rahman, Kapashia Binte Giash, Md Hazrat Ali, Mst Asma Akter, A B M Alauddin Chowdhury. Cancer Informatics, vol. 24, article 11769351251401330. Pub Date: 2025-12-12. DOI: 10.1177/11769351251401330. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12701936/pdf/
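The evaluation metrics listed in the Methods (accuracy, sensitivity, specificity, and AUC) can be computed from predicted scores as in the sketch below. It is a generic illustration rather than the study's pipeline: the labels, scores, and 0.5 threshold are hypothetical, and AUC is computed directly as the proportion of positive/negative pairs ranked correctly, with ties counted as one half.

```python
def auc_score(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def classification_metrics(y_true, scores, threshold=0.5):
    """Accuracy, sensitivity, specificity, and AUC for binary labels (1 = PTSD)."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "auc": auc_score(y_true, scores),
    }

# Hypothetical labels and model scores for eight patients.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
metrics = classification_metrics(y_true, scores)
print(metrics)
```

Accuracy, sensitivity, and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two kinds of metric are usually reported together.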
Pub Date: 2025-11-29; eCollection Date: 2025-01-01; DOI: 10.1177/11769351251396242
Lulu Wang, Hua Jin, Xiaowei Liu, Hanzhi Zhang
Objectives: The aim of this study is to investigate the role of epithelial cell transforming sequence 2 (ECT2) as a pan-cancer biomarker and to assess its potential as an immune-related target for cancer immunotherapy.
Methods: We conducted a comprehensive analysis of ECT2 expression across 44 tumor types using large-scale transcriptomic datasets from The Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression (GTEx) project. Pan-cancer Cox regression analyses were performed to evaluate the correlation between ECT2 expression and patient survival outcomes. Functional assays, including ECT2 knockdown via shRNA in the HepG2 hepatocellular carcinoma (HCC) cell line, were employed to investigate its mechanistic role. Transcriptomic profiling and pathway analyses were also conducted to explore the impact of ECT2 on cell proliferation and the tumor immune microenvironment.
Results: ECT2 was found to be significantly upregulated in 31 tumor types. Elevated ECT2 expression was consistently associated with worse overall survival (OS), disease-specific survival (DSS), disease-free interval (DFI), and progression-free interval (PFI) across multiple cancer subtypes. Functional assays revealed that ECT2 knockdown significantly reduced HepG2 cell viability and impaired cell cycle progression, with downregulation of Cyclin D1. Transcriptomic analysis of ECT2-depleted cells indicated enriched gene sets related to cell proliferation and mitotic regulation. Additionally, ECT2 expression was significantly correlated with immune features, including immune cell infiltration, immune checkpoint gene expression, tumor mutational burden (TMB), and microsatellite instability (MSI).
Conclusion: ECT2 is identified as a potential pan-cancer prognostic biomarker with dual roles in tumor initiation and progression, as well as in modulating the tumor immune microenvironment. Our findings suggest that ECT2 may serve as a promising therapeutic target in cancer immunotherapy, warranting further investigation into its immune-regulatory and oncogenic functions.
Pan-Cancer Analysis of the Prognostic and Immunological Role of ECT2: A Promising Target for Survival and Immunotherapy. Lulu Wang, Hua Jin, Xiaowei Liu, Hanzhi Zhang. Cancer Informatics, vol. 24, article 11769351251396242. Pub Date: 2025-11-29. DOI: 10.1177/11769351251396242. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12665020/pdf/
Background: Breast cancer remains a predominant malignancy and a leading cause of oncologic mortality among women globally. The discovery of novel biomarkers is crucial for improving therapeutic outcomes.
Methods: We conducted a comprehensive analysis of the immunological and prognostic significance of hepatitis A virus cellular receptor 1 (HAVCR1) in breast cancer using publicly available datasets.
Results: HAVCR1 expression was markedly downregulated in breast cancer tissues. Notably, lower HAVCR1 expression in pre-treatment tumor samples was associated with poorer prognosis among pan-cancer patients undergoing immunotherapy, and a higher incidence of metastasis was observed in the breast cancer subgroup. Subtype-specific DEG analyses further indicated that distinct patterns of immune infiltration may underlie this association. Moreover, gene set enrichment analysis (GSEA) highlighted the immunological relevance of HAVCR1, particularly its involvement in T cell activation within the TNBC subtype. Clinically, elevated HAVCR1 expression in pre-treatment T cells was indicative of a more favorable response to PD-1 blockade therapy than diminished expression.
Conclusion: The expression of HAVCR1 exhibits a strong correlation with immune infiltration and holds potential as a prognostic biomarker for breast cancer, offering predictive insight into the efficacy of immunotherapeutic interventions.
An Integrated Analysis of HAVCR1 with a Focus on Immunological and Prognostic Roles in Breast Cancer. Wen Sun, Weiya Zhang, Jianyi Zhao, Mingyi Sang, Qixuan Feng, Wenbin Zhou, Yue Sun. Cancer Informatics, vol. 24, article 11769351251393148. Pub Date: 2025-11-28. DOI: 10.1177/11769351251393148. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12663051/pdf/
Pub Date: 2025-11-28; eCollection Date: 2025-01-01; DOI: 10.1177/11769351251393146
Benjamin Goldberg, Eric Nels Pederson, Zhengqing Ouyang
Objective: Breast cancer is one of the most prominent and deadly diseases in the world, and its prognosis varies widely based on the expression of certain genes. Identification of these genes is important for developing and interpreting clinical prognostic tests as well as furthering our understanding of breast cancer biology. We expand on prior efforts in the field toward identifying prognostic genes, by integrating powerful statistical methods.
Methods: To this end, we use an unsupervised random forest model, which allows for robust learning of non-linear gene expression/survival relationships and can identify the most important genes affecting both positive and negative breast cancer prognosis. In total, 1,518 participants were considered from the METABRIC dataset, using 20,387 mRNA expression level variables and 23 clinical variables including HER2 mutation status. The top 250 and bottom 250 expressing genes and 6 clinical features were selected for the unsupervised random forest model.
Results: Our research corroborates previous discoveries of 27 important prognostic genes while also identifying 3 genes as potentially novel prognostic factors. Based on gene ontology analysis, we additionally show that these genes have plausible connections to breast cancer biology that should be experimentally investigated.
Conclusions: Here, we demonstrate the utility of the unsupervised random forest model over K-means clustering for identifying important genes in breast cancer.
Unsupervised Random Forest Identifies Important Genetic Prognostic Factors for Breast Cancer Survival Time. Benjamin Goldberg, Eric Nels Pederson, Zhengqing Ouyang. Cancer Informatics, vol. 24, article 11769351251393146. Pub Date: 2025-11-28. DOI: 10.1177/11769351251393146. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12663042/pdf/
Pub Date: 2025-11-24; eCollection Date: 2025-01-01; DOI: 10.1177/11769351251396250
Tamara Babic, Bojana Banovic Djeri, Dunja Pavlovic, Sandra Dragicevic, Jovana Despotovic, Jelena Karanovic, Aleksandra Nikolic
Objectives: This study aimed to identify transcript isoforms of protein-coding genes with potential relevance to the malignant transformation of gut mucosa.
Methods: Colon cancer cell lines (HCT116, DLD1, SW620) and immortalized cells derived from healthy gut epithelium (HCEC-1CT) were cultured as spheroids and subjected to RNA sequencing to profile both canonical and non-canonical transcripts. The resulting data were compared with findings from a prior bioinformatics study that analyzed RNA-seq datasets from 473 patient-derived tumor and 417 non-tumor colon tissue samples.
Results: Among 375 transcripts previously reported as significantly dysregulated in colon (39 up-regulated and 336 down-regulated), 32 transcripts displayed expression patterns in colon cell lines consistent with those observed in patient tissues (4 up-regulated and 28 down-regulated). In silico characterization of these molecules revealed that all of them exhibited at least 1 feature commonly associated with RNAs possessing regulatory functions, such as encoding a truncated protein isoform, exosomal localization, or enrichment in repetitive elements. The most prominently dysregulated transcripts with consistent expression profiles across both datasets were NTMT1-204 (up-regulated in cancer) and BLOC1S6-218 and DCTN1-205 (both down-regulated in cancer). The remaining 343 transcripts did not show consistent expression patterns in the cell lines, suggesting that their dysregulation in patient-derived tissues may be due to stromal or microenvironmental factors absent in vitro.
Conclusion: In summary, this comparative transcriptomic analysis identified 32 transcript isoforms, comprising 2 canonical and 30 non-canonical transcripts, that may play regulatory roles in colon carcinogenesis and warrant further investigation in the context of gut epithelial cell biology.
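The comparison described in this abstract amounts to keeping only the transcripts whose direction of change agrees between patient tissue and cell lines. A minimal sketch of that filtering step follows, assuming each dataset has already been reduced to an "up" or "down" call per transcript. The transcript IDs, the dictionaries, and the function name `concordant_transcripts` are hypothetical and are not taken from the study; only the counts checked at the end come from the abstract.

```python
from typing import Dict


def concordant_transcripts(tissue: Dict[str, str], cell_lines: Dict[str, str]) -> Dict[str, str]:
    """Return transcripts whose direction of change is the same in both datasets."""
    return {
        transcript_id: direction
        for transcript_id, direction in tissue.items()
        if cell_lines.get(transcript_id) == direction
    }


# Hypothetical toy calls ("up" or "down" in tumor relative to non-tumor).
tissue_calls = {"T1": "up", "T2": "down", "T3": "down", "T4": "up"}
cell_line_calls = {"T1": "up", "T2": "down", "T3": "up"}  # T4 has no call in cell lines

concordant = concordant_transcripts(tissue_calls, cell_line_calls)
discordant = set(tissue_calls) - set(concordant)
print("concordant:", concordant)
print("discordant:", sorted(discordant))

# Arithmetic check of the counts reported in the abstract.
assert 39 + 336 == 375   # dysregulated transcripts reported in patient tissue
assert 4 + 28 == 32      # transcripts concordant in cell lines
assert 375 - 32 == 343   # transcripts without a consistent pattern
```

A transcript that is missing from the cell-line calls is treated here as discordant, which is one possible convention; the study may have handled undetected transcripts differently.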
Title: Comparative RNA-Seq Analysis of Colon Spheroids and Patient-derived Tissues Identifies Non-Canonical Transcript Isoforms of Protein-Coding Genes Implicated in Colon Carcinogenesis. Cancer Informatics, vol. 24, article 11769351251396250. Journal Article. Published 2025-11-24. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12647565/pdf/
Pub Date: 2025-11-23; eCollection Date: 2025-01-01; DOI: 10.1177/11769351251394271
Poonamjeet Kaur Loyal, Edward Chege, Jasmit Shah, Anne Mwirigi, Samuel Nguku Gitau
Background: Patients with human immunodeficiency virus (HIV) have an atypical imaging pattern of lymphoma. There is a paucity of literature on differences in tumor volume or burden of disease between HIV-positive and HIV-negative patients and on how this correlates with clinicopathological parameters of aggressiveness and prognosis.
Methods: This was a retrospective cross-sectional study of patients with non-Hodgkin lymphoma, who were categorized as HIV positive or HIV negative. The tumor burden, disease sites, international prognostic score, and Ki-67 index were recorded. Continuous variables were analyzed using the Kruskal-Wallis test and categorical variables using Fisher's exact test.
Results: Of the 92 patients with non-Hodgkin lymphoma, 51.1% were HIV positive, with a median age of 45.0 years. The median sum of product diameters, used to measure tumor burden, was 102.6 [IQR: 51.7, 173.1], with no significant difference between the 2 groups. Extranodal disease was significantly more frequent in the HIV-positive group (85.1%), while exclusively nodal disease was seen predominantly in the non-HIV group (66.7%) (P < .001). Complete treatment response was higher in the non-HIV group (54.5%) than in the HIV group (20.9%) (P < .001). More HIV-positive patients died (37.2%) than non-HIV patients (4.5%) (P < .001).
Conclusion: HIV-related lymphoma remains a poorly understood subset. Although there was no significant difference in overall tumor burden between HIV-positive and HIV-negative patients, extranodal disease was significantly more frequent in HIV-positive patients. Furthermore, the clinical prognostication score and Ki-67 index, which apply well to HIV-negative patients, may not apply to HIV-related lymphoma.
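For readers unfamiliar with the categorical test named in the Methods, the following is a minimal standard-library sketch of a two-sided Fisher's exact test on a 2x2 table. It is not the authors' analysis code. The function name `fisher_exact_two_sided` and the example counts are hypothetical, and the abstract does not report the underlying cell counts.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of every table at least as unlikely as the observed one;
    # the small relative tolerance guards against floating-point ties.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )


# Hypothetical counts: rows are HIV positive / HIV negative,
# columns are extranodal / exclusively nodal disease.
p_value = fisher_exact_two_sided(8, 2, 3, 7)
print(f"two-sided p = {p_value:.4f}")
```

The two-sided rule used here (summing all tables no more probable than the observed one) is a common convention; statistical packages may offer other definitions, so results can differ slightly between tools.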
Title: Lymphoma Imaging in HIV and Non-HIV Patients: A Retrospective Cross-Sectional Study With Clinical and Pathological Correlation. Cancer Informatics, vol. 24, article 11769351251394271. Journal Article. Published 2025-11-23. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12644430/pdf/
Pub Date: 2025-11-20; eCollection Date: 2025-01-01; DOI: 10.1177/11769351251389781
Hannah M Tosi, Chunlei Zheng, Amelia H Tarren, Meghana Yellanki, Stephen J Miller, Oleg V Soloviev, June K Corrigan, George R Schneeloch, Hormuzd A Katki, Lauren E Kearney, Tanner J Caverly, Nichole T Tanner, Renda Soylemez Wiener, Mary Brophy, Nathanael R Fillmore, Nhan V Do, Danne C Elbers
Objectives: The objective of the Prediction Augmented Screening Initiative (PASI) pilot application was to design and implement a clinical tool to optimize the lung cancer screening (LCS) workflow for providers. The Boston Informatics Group (BIG) at the Department of Veterans Affairs (VA) developed the Enabling Technologies for Rapid Learning Health Systems Platform (ENTHRALL) to support delivery of knowledge in a Learning Health System (LHS) framework. The BIG leveraged ENTHRALL to implement the PASI pilot application on a very short timeline. The application uses VA data to estimate patients' benefit from LCS based on National Cancer Institute (NCI) models, allowing proactive outreach to patients with high predicted benefit from LCS.
Methods: The application was designed utilizing ENTHRALL infrastructure, including optimized nightly data pulls to gather patient information, Natural Language Processing to extract smoking history, and a user interface (UI). Cross-functional collaboration allowed the use of the NCI's peer-reviewed prediction algorithm to provide daily patient benefit scores.
Results: The UI displays patients in descending order of benefit, delivering a prioritized list to providers. Clinicians can fill in information and track patient status to assist with their outreach activities. For the pilot, only patients meeting USPSTF LCS criteria (the current field standard) were displayed. Five VA stations were included.
Conclusions: Using the VA BIG's ENTHRALL framework for an LHS, the group demonstrated its ability to design and deliver a new application within 3 months of inception, and the application was successfully used at 5 VA hospitals. The VA's capability to rapidly build clinically relevant applications will help it become an LHS tailored to current problems affecting Veterans. Owing to the success of the pilot, the clinical research team received approval to expand its study. The BIG is working on a non-pilot build.
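The prioritization described in the Results (show only screening-eligible patients, ordered by predicted benefit) can be sketched as follows. This is an illustration under stated assumptions, not the PASI implementation: the `Patient` fields, the `benefit_score` value, and both function names are hypothetical, and the eligibility rule encodes the 2021 USPSTF recommendation (age 50 to 80, at least 20 pack-years, currently smoking or quit within the past 15 years), which is assumed to be the version used in the pilot.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Patient:
    patient_id: str
    age: int
    pack_years: float
    years_since_quit: Optional[float]  # None means the patient currently smokes
    benefit_score: float  # hypothetical model output; higher means more predicted benefit


def meets_uspstf_2021(p: Patient) -> bool:
    """Assumed eligibility rule: age 50-80, >= 20 pack-years, smokes now or quit <= 15 years ago."""
    smoked_recently = p.years_since_quit is None or p.years_since_quit <= 15
    return 50 <= p.age <= 80 and p.pack_years >= 20 and smoked_recently


def prioritized_list(patients: List[Patient]) -> List[Patient]:
    """Keep eligible patients and order them by descending predicted benefit."""
    eligible = [p for p in patients if meets_uspstf_2021(p)]
    return sorted(eligible, key=lambda p: p.benefit_score, reverse=True)


# Hypothetical patients for illustration only.
patients = [
    Patient("A", 62, 35.0, None, 0.40),
    Patient("B", 48, 40.0, None, 0.90),   # too young
    Patient("C", 70, 25.0, 5.0, 0.75),
    Patient("D", 66, 30.0, 20.0, 0.80),   # quit more than 15 years ago
]
ranked_ids = [p.patient_id for p in prioritized_list(patients)]
print(ranked_ids)
```

Filtering before sorting keeps ineligible patients off the list entirely, which matches the pilot behavior described above, where only patients meeting the screening criteria were displayed.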
Title: Rapid Support and Implementation of an Application for the Prediction Augmented Screening Initiative (PASI) Planning Phase Through the Enabling Technologies for Rapid Learning Health Systems Platform (ENTHRALL) at the Department of Veterans Affairs (VA). Cancer Informatics, vol. 24, article 11769351251389781. Journal Article. Published 2025-11-20. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12638702/pdf/
Pub Date: 2025-10-01; eCollection Date: 2025-01-01; DOI: 10.1177/11769351251380520
Qiang Yi, Yaoyao Mei, Zhu Yang, Yi Liu
Background: Carbonic anhydrase 9 (CA9) plays a crucial role in pH regulation and adaptation under hypoxic conditions in the tumor microenvironment. Despite its known involvement in the progression of specific cancers, a comprehensive pan-cancer examination of the prognostic value and biological implications of CA9 has not been performed. This study systematically explored the diverse roles of CA9 across multiple cancer types.
Methods: Bioinformatics methods were applied to extensive datasets from TCGA, GTEx, CPTAC, CancerSEA, and the public literature. We systematically analyzed the associations between CA9 expression profiles and various clinical parameters, prognosis, immune infiltration, immune-related genes, TMB, MSI, and tumor stemness scores. Additionally, a single-cell functional analysis was conducted.
Results: CA9 was significantly upregulated in 29 out of 33 cancer types, indicating high discriminatory ability between tumor and normal tissues. Elevated CA9 expression correlated with poor OS and PFI in multiple cancers, such as GBMLGG, CESC, LUAD, KIPAN, GBM, THYM, LIHC, THCA, PAAD, and KICH. In 39 cancers, CA9 expression was predominantly negatively correlated with the infiltration of 22 immune cell types. It was also associated with TMB in 12 tumors and with MSI in 9. Single-cell analysis revealed positive links between CA9 and essential processes such as hypoxia, metastasis, angiogenesis, and stemness.
Conclusion: This study provides compelling evidence that CA9 is a potential pan-cancer prognostic marker and diagnostic tool. The associations of CA9 with immune components and determinants of immunotherapy response indicate the importance of CA9 in advancing cancer research and personalized treatment strategies.
Title: Systematic Analysis of CA9 as a Pan-Cancer Marker for Prognosis and Immunity. Cancer Informatics, vol. 24, article 11769351251380520. Journal Article. Published 2025-10-01. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12489208/pdf/