Pub Date: 2026-01-03; eCollection Date: 2026-01-01; DOI: 10.1177/11769351251411074
Hadrien T Gayap, Philippe-Pierre Robichaud, Nicolas Crapoulet, Eric P Allain
Background and objectives: Next-generation sequencing (NGS) is transforming clinical diagnostics by enabling the detection of genetic variation with unprecedented precision. However, successful implementation of NGS workflows necessitates stringent quality control. This study introduces Molecular Genetics Dashboard (MGDB), a novel bioinformatics tool designed to enhance quality control in clinical NGS workflows.
Methods: Using the Python Dash framework for visualizations and MySQL databases for storage, we developed a tool for variant-level monitoring of clinical NGS sequencing runs. MGDB is containerized with Docker Compose for improved portability, and interpreters can flexibly include or exclude samples from accumulated statistics and attach notes.
Results: MGDB facilitates variant-level run-to-run monitoring, helping to ensure the consistency of variant detection across sequencing runs. The tool provides an interactive platform for visualizing and assessing variant data, identifying potential inconsistencies or outliers, and improving data management and interpretation compared to traditional methods. MGDB was tested using samples sequenced with Oncomine Focus/Comprehensive Plus assays on S5 sequencers and analyzed via Ion Reporter software.
Conclusions: MGDB offers a robust and user-friendly solution for enhancing quality control in clinical NGS workflows, contributing to greater accuracy and reliability in variant detection. The tool is freely available on GitHub: https://github.com/acri-nb/GeneticVariantsDB.
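The run-to-run monitoring described above can be illustrated with a minimal sketch. This is not MGDB's code: the `Observation` record, the `flag_outliers` function, the z-score rule, and the thresholds are all hypothetical choices made for illustration. The sketch assumes a control variant's allele frequency (VAF) is tracked across runs, and that samples an interpreter has excluded are left out of the accumulated baseline.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class Observation:
    """One variant call in one run (hypothetical record layout)."""
    run_id: str
    variant: str           # any stable variant key
    vaf: float             # variant allele frequency, 0..1
    included: bool = True  # interpreters may exclude a sample from statistics
    note: str = ""         # free-text reason for the exclusion


def flag_outliers(history, new_observations, z_threshold=3.0, min_runs=3):
    """Return (observation, z_score) pairs for new calls that deviate strongly
    from the baseline built from included historical calls of the same variant."""
    flagged = []
    for obs in new_observations:
        baseline = [h.vaf for h in history
                    if h.variant == obs.variant and h.included]
        if len(baseline) < min_runs:
            continue  # too few runs to judge
        spread = stdev(baseline)
        if spread == 0:
            continue  # no variation in baseline, z-score undefined
        z = (obs.vaf - mean(baseline)) / spread
        if abs(z) > z_threshold:
            flagged.append((obs, z))
    return flagged


history = [
    Observation("run1", "V1", 0.48),
    Observation("run2", "V1", 0.50),
    Observation("run3", "V1", 0.52),
    Observation("run4", "V1", 0.10, included=False, note="failed library prep"),
]
new_calls = [Observation("run5", "V1", 0.60), Observation("run6", "V1", 0.51)]
for obs, z in flag_outliers(history, new_calls):
    print(f"{obs.run_id} {obs.variant}: VAF {obs.vaf:.2f}, z = {z:.1f}")
```

Excluding the failed run matters here: if its VAF of 0.10 were kept in the baseline, the spread would widen enough that run5 would no longer be flagged.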
Title: MGDB: A Novel Bioinformatics Quality Control Tool for Clinical Next-Generation Sequencing. Journal: Cancer Informatics, volume 25, article 11769351251411074 (IF 2.5). Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12764754/pdf/
Pub Date: 2025-12-23; eCollection Date: 2025-01-01; DOI: 10.1177/11769351251394107
Shuting Lin, Peng Qiu
Objectives: Cancer stratification is essential for accurate prognosis and personalized treatment selection. While many existing approaches integrate multiple omics data types to identify cancer subtypes, it remains unclear how clustering results from individual omics layers compare in their ability to capture survival-related patient clusters. This study aims to examine patient clusters separately defined by different omics data types and to explore the consistency of these clusters as well as their associations with survival outcomes.
Methods: In this study, we conducted clustering analysis on miRNA expression, gene expression, and DNA methylation data across 20 cancer types in TCGA. We employed a standard clustering pipeline similar to the widely used Seurat clustering pipeline in scRNA-seq analysis. We performed survival analysis to assess whether the resulting patient clusters exhibit significantly different survival outcomes.
Results: We observed significant survival differences among patient clusters in 11 cancer types. Notably, in 6 of these 11 cancer types, the survival differences among patient clusters were significant in multiple omics data types. For each of these 6 cancer types, we compared the consistency of patient clusters across different omics data types. Interestingly, in each cancer type, we noticed one set of patients who consistently clustered together irrespective of the omics data type, and these patients exhibited either the most favorable or the most unfavorable survival outcomes. This observation suggests that patients with the most extreme survival outcomes show distinct patterns across multiple molecular layers and can be captured by clustering analysis in multiple omics data types. To interpret these findings, we identified differentially expressed molecular features. Using established miRNA-target relationships, gene-gene interactions, and gene-CpG relationships, we constructed networks specific to each cancer type based on the differentially expressed features. These networks revealed several molecular modules associated with patient survival outcomes, such as the miR-200c-3p/ZEB2 axis in bladder cancer, the regulatory role of miR-98 in breast cancer, and the association of miR-21 with its target gene APC in kidney renal cell carcinoma.
Conclusion: These findings suggest that omics-specific clustering can identify robust survival-related patient clusters and uncover molecular features that may contribute to differential survival outcomes.
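The survival comparison between patient clusters can be sketched with a two-group log-rank test. The abstract does not state which test the authors used, so this is an assumption, and the function name `logrank_test` is hypothetical. The sketch uses the standard hypergeometric variance and converts the 1-degree-of-freedom chi-square statistic to a p-value with `math.erfc`. Comparing more than two clusters would need the multi-group form of the test.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. events_* hold 1 for an observed event and
    0 for a censored observation. Returns (chi_square, p_value)."""
    event_times = sorted(
        {t for t, e in zip(times_a, events_a) if e}
        | {t for t, e in zip(times_b, events_b) if e}
    )
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if e and x == t)
        d_b = sum(1 for x, e in zip(times_b, events_b) if e and x == t)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:  # the variance term is undefined with one subject at risk
            variance += n_a * n_b * d * (n - d) / (n * n * (n - 1))
    if variance == 0:
        return 0.0, 1.0
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


# Toy data: cluster A fails early, cluster B fails late (all events observed)
chi_square, p_value = logrank_test([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
print(f"chi-square = {chi_square:.3f}, p = {p_value:.4f}")
```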
Title: Clustering Analysis of Multiple Omics Data Types Identifies Cancer Patients With Consistent Survival Outcomes. Journal: Cancer Informatics, volume 24, article 11769351251394107 (IF 2.5). Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12743153/pdf/
Objectives: With the increasing application of high-throughput transcriptomic data in cancer research, identifying reliable cancer biomarkers in high-dimensional settings remains a major challenge. This study aims to systematically evaluate various regularized conditional logistic regression (CLR) methods under a matched case-control (MCC) design, focusing on their performance in variable selection, parameter estimation, and predictive accuracy. Special emphasis is placed on the importance of the matching design in reducing confounding effects and improving model interpretability.
Methods: We utilize RNA-seq data from The Cancer Genome Atlas (TCGA), specifically datasets for liver, thyroid, and lung cancers, which include paired tumor and adjacent normal tissue samples. In our analysis, we apply 4 regularized CLR methods implemented in R packages (clogitL1, pclogit, clogitLasso, and penalizedclr) to analyze over 20 000 gene expression features. We evaluate the comparative performance of these methods based on metrics such as gene selection stability, predictive accuracy, and interpretability. Additionally, we employ a bootstrap resampling framework to estimate gene selection probabilities, which serve as a measure of gene importance.
Results: Our results show that incorporating the MCC design significantly enhances feature selection performance by mitigating confounding noise. The regularized CLR models successfully identify several well-established cancer-related genes with high selection consistency and statistical significance. In contrast, models that ignore the matched design tend to miss critical biomarkers or produce excessive false positives, leading to potentially misleading interpretations.
Conclusions: This study highlights the value of integrating a matched case-control design with regularized CLR methods for the analysis of high-dimensional transcriptomic data. The proposed analytical framework offers improved accuracy, robustness, and biological relevance, providing a practical and scalable approach for cancer genomics research. It also supports the advancement of precision medicine and translational applications.
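The bootstrap selection-probability idea can be sketched in a few lines. This is a simplified stand-in, not the R packages named in the abstract. It assumes 1:1 matched pairs, where the conditional likelihood depends only on the tumor-minus-normal difference of each feature. It fits an L1-penalized model by proximal gradient descent, resamples whole pairs so the matching is preserved, and reports how often each coefficient is nonzero. All function names, the step size, and the penalty value are illustrative assumptions.

```python
import math
import random


def _soft_threshold(z, t):
    """Proximal operator of the L1 penalty."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_l1_paired_clr(diffs, lam, step=1.0, n_iter=200):
    """L1-penalized conditional logistic regression for 1:1 matched pairs.
    diffs[i][j] is the case-minus-control difference of feature j in pair i.
    Minimizes mean(log(1 + exp(-beta . d_i))) + lam * ||beta||_1."""
    n, p = len(diffs), len(diffs[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        grad = [0.0] * p
        for d in diffs:
            eta = sum(b * x for b, x in zip(beta, d))
            w = 1.0 / (1.0 + math.exp(min(eta, 700.0)))  # clamp avoids overflow
            for j in range(p):
                grad[j] -= w * d[j] / n
        beta = [_soft_threshold(b - step * g, step * lam)
                for b, g in zip(beta, grad)]
    return beta


def selection_probabilities(diffs, lam, n_boot=50, seed=0):
    """Fraction of bootstrap resamples (of whole pairs) in which each
    feature receives a nonzero coefficient."""
    rng = random.Random(seed)
    n, p = len(diffs), len(diffs[0])
    counts = [0] * p
    for _ in range(n_boot):
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        beta = fit_l1_paired_clr(sample, lam)
        for j, b in enumerate(beta):
            if b != 0.0:
                counts[j] += 1
    return [c / n_boot for c in counts]


# Toy data: feature 0 is consistently higher in tumors, feature 1 is noise
pairs = [[1.0, 0.5 if i % 2 == 0 else -0.5] for i in range(20)]
print(selection_probabilities(pairs, lam=0.2))
```

Resampling pairs rather than individual samples is the design point: it keeps each tumor with its matched normal, so every bootstrap replicate is itself a valid matched dataset.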
Title: Robust Cancer Biomarker Identification From Matched Transcriptomic Data Via Bootstrapped Regularized Conditional Logistic Regression. Authors: Jie-Huei Wang, Zih-Han Wu, Hui-Chen Lu, Tzung-Ying Guo. Pub Date: 2025-12-16; DOI: 10.1177/11769351251404255. Journal: Cancer Informatics, volume 24, article 11769351251404255 (IF 2.5). Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12709001/pdf/
Background: Pancreatic ductal adenocarcinoma (PDAC) is a highly lethal malignancy with a dismal 5-year survival rate, largely due to the absence of reliable biomarkers for early detection. The molecular mechanisms underpinning PDAC pathogenesis remain incompletely understood, highlighting the urgent need for novel diagnostic strategies.
Objective: This study aimed to integrate eQTL-driven Mendelian randomization (MR) with transcriptomic and genome-wide association data to identify causal PDAC-associated genes and construct a diagnostic nomogram based on 5 hub genes (CTSC, SMYD3, MFGE8, IGFBP7, POC1B) for early detection of PDAC.
Methods: Transcriptomic data from GSE62165 and GSE25471 were retrieved from the Gene Expression Omnibus (GEO) and processed for differential expression using LIMMA and GEO2R, followed by batch correction and weighted gene co-expression network analysis (WGCNA). Summary-level eQTL statistics were obtained from OpenGWAS, and GWAS data included over 5000 PDAC cases. MR analysis was performed using inverse variance weighted (IVW) as the primary approach, supplemented with MR-Egger, weighted median, weighted mode, and MR-PRESSO. Instrument strength, pleiotropy, and heterogeneity were assessed via F-statistics, Egger intercept, and Cochran's Q test. Candidate genes were filtered using a consensus approach combining random forest (RF), support vector machine-recursive feature elimination (SVM-RFE), and Lasso regression. Diagnostic performance was evaluated via ROC curves, C-index, calibration plots, and decision curve analysis. Mechanistic insights were derived from KEGG and GO enrichment analyses, as well as protein-protein interaction (PPI) network analyses.
Results: Five eQTL-associated hub genes (CTSC, SMYD3, MFGE8, IGFBP7, and POC1B) were identified as causally linked to PDAC via robust MR analysis with minimal evidence of pleiotropy or heterogeneity. These genes demonstrated high diagnostic potential (AUC > 0.85, P < .001). A diagnostic nomogram incorporating these genes achieved strong predictive performance (C-index = 0.92) with favorable clinical decision curve results. Functional enrichment and PPI analyses implicated these genes, particularly CTSC, in modulating the ITGAV/ITGB3-PI3K-Akt signaling axis, contributing to PDAC cell cycle regulation and apoptosis resistance.
Conclusions: This study presents a multi-omics, MR-informed framework for identifying eQTL-regulated biomarkers of PDAC. The identified hub genes offer promising avenues for early detection, while the mechanistic mapping of the PI3K-Akt pathway provides translational insights. These findings warrant further validation in clinical and experimental settings and hold potential to reshape PDAC diagnostic strategies.
Pancreatic ductal adenocarcinoma (PDAC) remains a formidable clinical ch
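The primary MR estimator named in the methods can be written compactly. The sketch below is an illustration under stated assumptions, not the authors' pipeline: it implements the fixed-effect inverse variance weighted (IVW) estimate from per-variant summary statistics with first-order weights, Cochran's Q computed from the same weights, and the common per-variant approximation F = (beta / SE)^2 for instrument strength. Function and argument names are hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate from summary statistics.
    Returns (estimate, standard_error, cochran_q)."""
    # First-order weights: inverse variance of each Wald ratio
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    # Wald ratio per variant: outcome effect divided by exposure effect
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total_weight = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    standard_error = math.sqrt(1.0 / total_weight)
    # Cochran's Q: weighted squared deviation of each ratio from the estimate
    cochran_q = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))
    return estimate, standard_error, cochran_q


def f_statistic(beta_exposure, se_exposure):
    """Approximate per-variant F-statistic for instrument strength."""
    return (beta_exposure / se_exposure) ** 2


# Toy summary statistics for three variants (made-up numbers)
estimate, standard_error, cochran_q = ivw_estimate(
    beta_exposure=[0.1, 0.2, 0.4],
    beta_outcome=[0.05, 0.10, 0.20],
    se_outcome=[0.01, 0.01, 0.01],
)
print(f"IVW = {estimate:.3f} (SE {standard_error:.4f}), Q = {cochran_q:.3g}")
```

Under the null hypothesis of no heterogeneity, Q is compared against a chi-square distribution with (number of variants - 1) degrees of freedom. A rule of thumb often applied to instruments is F > 10.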
Title: Integrative Analysis of eQTL Genes Reveals Key Biomarkers and Mechanisms for Early Diagnosis of Pancreatic Ductal Adenocarcinoma. Authors: Xuebo Wang, Xusheng Zhang, Shicai Liang, Jialong Wang, Yannan Xie, Jiawei Wang, Bendong Chen. Pub Date: 2025-12-16; DOI: 10.1177/11769351251400465. Journal: Cancer Informatics, volume 24, article 11769351251400465 (IF 2.5). Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12709030/pdf/
Pub Date: 2025-12-12; eCollection Date: 2025-01-01; DOI: 10.1177/11769351251401330
Syed Billal Hossain, Md Mizanoor Rahman, Kapashia Binte Giash, Md Hazrat Ali, Mst Asma Akter, A B M Alauddin Chowdhury
Background: Post-mastectomy PTSD is a serious mental health issue, but it has not been studied enough, particularly in low-resource settings like Bangladesh. This study aimed to predict PTSD among breast cancer survivors using machine learning (ML) models and identify significant predictors through the Boruta algorithm, a feature selection tool, offering scalable solutions for early detection and intervention.
Methods: A cross-sectional study of 138 post-mastectomy breast cancer patients was conducted across 3 hospitals in Bangladesh. Data on sociodemographic characteristics, health history, social experience, and treatment were collected using validated tools, including the PTSD Checklist for DSM-5 (PCL-5). The Boruta algorithm identified key predictors, and 10 ML models were evaluated for PTSD prediction using metrics such as accuracy, sensitivity, specificity, and AUC.
Results: Random Forest (RF) outperformed other models (accuracy: 88.9%, AUC: 0.914). Significant predictors included education, monthly income, and changes in family behaviour. Factors like marital status, having chronic diseases, and hormone therapy were not statistically significant. PTSD prevalence was 34.1%, with urban residents and younger patients facing higher risks.
Conclusion: ML models, particularly RF, demonstrated strong predictive performance and identified critical PTSD predictors. These findings highlight the potential for cost-effective PTSD screening in resource-constrained settings. Future research should focus on broader validation and longitudinal studies to refine predictive models.
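The evaluation metrics named in the methods are easy to compute by hand, which helps when reading figures like those reported above. The following sketch assumes binary labels (1 = PTSD, 0 = no PTSD) and model scores in [0, 1]. The function names, the 0.5 threshold, and the toy data are illustrative and are not taken from the study. AUC is computed as the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half.

```python
def binary_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, sensitivity, and specificity at a fixed score threshold."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def roc_auc(y_true, y_score):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy labels and scores (made up for illustration)
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1]
print(binary_metrics(labels, scores))
print(f"AUC = {roc_auc(labels, scores):.3f}")
```

Accuracy, sensitivity, and specificity depend on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is why studies commonly report both kinds of figure.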
Title: Prediction and Feature Selection of Mastectomy-Related Post Traumatic Stress Disorder (PTSD) Using Machine Learning Among Breast Cancer Patients in Bangladesh. Journal: Cancer Informatics, volume 24, article 11769351251401330 (IF 2.5). Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12701936/pdf/
Pub Date: 2025-11-29; eCollection Date: 2025-01-01; DOI: 10.1177/11769351251396242
Lulu Wang, Hua Jin, Xiaowei Liu, Hanzhi Zhang
Objectives: The aim of this study is to investigate the role of epithelial cell transforming sequence 2 (ECT2) as a pan-cancer biomarker and to assess its potential as an immune-related target for cancer immunotherapy.
Methods: We conducted a comprehensive analysis of ECT2 expression across 44 tumor types using large-scale transcriptomic datasets from The Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression (GTEx) project. Pan-cancer Cox regression analyses were performed to evaluate the correlation between ECT2 expression and patient survival outcomes. Functional assays, including ECT2 knockdown via shRNA in the HepG2 hepatocellular carcinoma (HCC) cell line, were employed to investigate its mechanistic role. Transcriptomic profiling and pathway analyses were also conducted to explore the impact of ECT2 on cell proliferation and the tumor immune microenvironment.
Results: ECT2 was found to be significantly upregulated in 31 tumor types. Elevated ECT2 expression was consistently associated with worse overall survival (OS), disease-specific survival (DSS), disease-free interval (DFI), and progression-free interval (PFI) across multiple cancer subtypes. Functional assays revealed that ECT2 knockdown significantly reduced HepG2 cell viability and impaired cell cycle progression, with downregulation of Cyclin D1. Transcriptomic analysis of ECT2-depleted cells indicated enriched gene sets related to cell proliferation and mitotic regulation. Additionally, ECT2 expression was significantly correlated with immune features, including immune cell infiltration, immune checkpoint gene expression, tumor mutational burden (TMB), and microsatellite instability (MSI).
Conclusion: ECT2 is identified as a potential pan-cancer prognostic biomarker with dual roles in tumor initiation and progression, as well as in modulating the tumor immune microenvironment. Our findings suggest that ECT2 may serve as a promising therapeutic target in cancer immunotherapy, warranting further investigation into its immune-regulatory and oncogenic functions.
Title: Pan-Cancer Analysis of the Prognostic and Immunological Role of ECT2: A Promising Target for Survival and Immunotherapy. Journal: Cancer Informatics, volume 24, article 11769351251396242 (IF 2.5). Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12665020/pdf/
Background: Breast cancer remains a predominant malignancy and a leading cause of oncologic mortality among women globally. The discovery of novel biomarkers is crucial for improving therapeutic outcomes.
Methods: We conducted a comprehensive analysis of the immunological and prognostic significance of hepatitis A virus cellular receptor 1 (HAVCR1) in breast cancer using publicly available datasets.
Results: HAVCR1 expression was markedly downregulated in breast cancer tissues. Notably, lower HAVCR1 expression in pre-treatment tumor samples was associated with poorer prognosis among pan-cancer patients undergoing immunotherapy, and a higher incidence of metastasis was observed in the breast cancer subgroup. Subtype-specific differentially expressed gene (DEG) analyses further indicated that distinct patterns of immune infiltration may underlie this association. Moreover, gene set enrichment analysis (GSEA) highlighted the immunological relevance of HAVCR1, particularly its involvement in T cell activation within the triple-negative breast cancer (TNBC) subtype. Clinically, elevated HAVCR1 expression in pre-treatment T cells was indicative of a more favorable response to PD-1 blockade therapy than diminished expression.
Conclusion: The expression of HAVCR1 exhibits a strong correlation with immune infiltration and holds potential as a prognostic biomarker for breast cancer, offering predictive insight into the efficacy of immunotherapeutic interventions.
{"title":"An Integrated Analysis of HAVCR1 with a Focus on Immunological and Prognostic Roles in Breast Cancer.","authors":"Wen Sun, Weiya Zhang, Jianyi Zhao, Mingyi Sang, Qixuan Feng, Wenbin Zhou, Yue Sun","doi":"10.1177/11769351251393148","DOIUrl":"10.1177/11769351251393148","url":null,"abstract":"<p><strong>Background: </strong>Breast cancer remains a predominant malignancy and a leading cause of oncologic mortality among women globally. The discovery of novel biomarkers is crucial for improving therapeutic outcomes.</p><p><strong>Methods: </strong>We conducted a comprehensive analysis of the immunological and prognostic significance of hepatitis A virus cellular receptor 1 (HAVCR1) in breast cancer using publicly available datasets.</p><p><strong>Results: </strong>HAVCR1 expression was markedly downregulated in breast cancer tissues. Significantly, lower expression levels of HAVCR1 in pre-treatment tumor samples were associated with poorer prognosis among pan-cancer patients undergoing immunotherapy, and a higher incidence of metastasis was observed in the breast cancer subgroup. Subtype-specific DEG analyses further indicated that distinct patterns of immune infiltration may underlie this association. Moreover, gene set enrichment analysis (GSEA) highlighted the immunological relevance of HAVCR1, particularly its involvement in T cell activation within the TNBC subtype. 
Clinically, elevated levels of HAVCR1 expression in pre-treatment T cells were indicative of a more favorable response to PD-1 blockade therapy compared to those with diminished expression.</p><p><strong>Conclusion: </strong>The expression of HAVCR1 exhibits a strong correlation with immune infiltration and holds potential as a prognostic biomarker for breast cancer, offering predictive insight into the efficacy of immunotherapeutic interventions.</p>","PeriodicalId":35418,"journal":{"name":"Cancer Informatics","volume":"24 ","pages":"11769351251393148"},"PeriodicalIF":2.5,"publicationDate":"2025-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12663051/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145649533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-11-28; eCollection Date: 2025-01-01; DOI: 10.1177/11769351251393146
Benjamin Goldberg, Eric Nels Pederson, Zhengqing Ouyang
Objective: Breast cancer is one of the most prominent and deadly diseases in the world, and its prognosis varies widely based on the expression of certain genes. Identifying these genes is important for developing and interpreting clinical prognostic tests as well as for furthering our understanding of breast cancer biology. We expand on prior efforts to identify prognostic genes by integrating powerful statistical methods.
Methods: To this end, we use an unsupervised random forest model, which allows for robust learning of non-linear relationships between gene expression and survival and can identify the most important genes affecting both positive and negative breast cancer prognosis. In total, 1,518 participants from the METABRIC dataset were considered, using 20,387 mRNA expression level variables and 23 clinical variables, including HER2 mutation status. The top 250 and bottom 250 expressing genes and 6 clinical features were selected for the unsupervised random forest model.
Results: Our research corroborates previous discoveries of 27 important prognostic genes while also identifying 3 genes as potentially novel prognostic factors. Based on gene ontology analysis, we additionally show that these genes have plausible connections to breast cancer biology that should be experimentally investigated.
Conclusions: Here, we demonstrate the utility of the unsupervised random forest model over K-means clustering for identifying important genes in breast cancer.
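The feature-selection step described in the Methods (keeping the top 250 and bottom 250 expressing genes) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how genes were ranked, so ranking by mean expression across samples is an assumption, and the function name `select_extreme_genes` and the toy gene names are hypothetical.

```python
from statistics import mean


def select_extreme_genes(expression, k):
    """Return (top_k, bottom_k) gene names ranked by mean expression.

    `expression` maps a gene name to a list of per-sample expression values.
    Ranking by the mean is an assumption; the abstract does not state the criterion.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if 2 * k > len(expression):
        raise ValueError("k is too large: top and bottom sets would overlap")
    # Sort gene names from highest to lowest mean expression.
    ranked = sorted(expression, key=lambda g: mean(expression[g]), reverse=True)
    return ranked[:k], ranked[-k:]


# Toy data with hypothetical gene names (not METABRIC values).
toy_expression = {
    "GENE_A": [1.0, 2.0],
    "GENE_B": [5.0, 6.0],
    "GENE_C": [3.0, 3.0],
    "GENE_D": [0.0, 1.0],
}
top, bottom = select_extreme_genes(toy_expression, k=1)
print(top, bottom)
```

With the study's dimensions, `k` would be 250 and `expression` would hold the 20,387 mRNA variables; the selected genes would then be passed, together with the clinical features, to the model.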
{"title":"Unsupervised Random Forest Identifies Important Genetic Prognostic Factors for Breast Cancer Survival Time.","authors":"Benjamin Goldberg, Eric Nels Pederson, Zhengqing Ouyang","doi":"10.1177/11769351251393146","DOIUrl":"10.1177/11769351251393146","url":null,"abstract":"<p><strong>Objective: </strong>Breast cancer is one of the most prominent and deadly diseases in the world, and its prognosis varies widely based on the expression of certain genes. Identification of these genes is important for developing and interpreting clinical prognostic tests as well as furthering our understanding of breast cancer biology. We expand on prior efforts in the field toward identifying prognostic genes, by integrating powerful statistical methods.</p><p><strong>Methods: </strong>To this end, we use an unsupervised random forest model, which allows for robust learning of non-linear gene expression/survival relationships and the ability to identify the most important genes affecting both positive and negative breast cancer prognosis. In total, 1,518 participants were considered from the METABRIC dataset, using 20,387 mRNA expression level variables and 23 clinical variables including <i>HER2</i> mutation status. The top 250 & bottom 250 expressing genes and 6 clinical features were selected for the unsupervised random forest model.</p><p><strong>Results: </strong>Our research corroborates previous discoveries of 27 important prognostic genes while also identifying 3 genes as potentially novel prognostic factors. 
Based on gene ontology analysis, we additionally show that these genes have plausible connections to breast cancer biology that should be experimentally investigated.</p><p><strong>Conclusions: </strong>Here, we demonstrate the utility of the unsupervised random forest model over K-means clustering for identifying important genes in breast cancer.</p>","PeriodicalId":35418,"journal":{"name":"Cancer Informatics","volume":"24 ","pages":"11769351251393146"},"PeriodicalIF":2.5,"publicationDate":"2025-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12663042/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145649557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-11-24; eCollection Date: 2025-01-01; DOI: 10.1177/11769351251396250
Tamara Babic, Bojana Banovic Djeri, Dunja Pavlovic, Sandra Dragicevic, Jovana Despotovic, Jelena Karanovic, Aleksandra Nikolic
Objectives: This study aimed to identify transcript isoforms of protein-coding genes with potential relevance to the malignant transformation of gut mucosa.
Methods: Colon cancer cell lines (HCT116, DLD1, SW620) and immortalized cells derived from healthy gut epithelium (HCEC-1CT) were cultured as spheroids and subjected to RNA sequencing to profile both canonical and non-canonical transcripts. The resulting data were compared with findings from a prior bioinformatics study that analyzed RNA-seq datasets from 473 patient-derived tumor and 417 non-tumor colon tissue samples.
Results: Among 375 transcripts previously reported as significantly dysregulated in colon tissue (39 up-regulated and 336 down-regulated), 32 transcripts displayed expression patterns in colon cell lines consistent with those observed in patient tissues (4 up-regulated and 28 down-regulated). In silico characterization of these molecules revealed that all of them exhibited at least 1 feature commonly associated with RNAs possessing regulatory functions, such as encoding a truncated protein isoform, exosomal localization, or enrichment in repetitive elements. The most prominently dysregulated transcripts with consistent expression profiles across both datasets were NTMT1-204 (up-regulated in cancer) and BLOC1S6-218 and DCTN1-205 (both down-regulated in cancer). The remaining 343 transcripts did not show consistent expression patterns in the cell lines, suggesting that their dysregulation in patient-derived tissues may be due to stromal or microenvironmental factors absent in vitro.
Conclusion: In summary, this comparative transcriptomic analysis identified 32 transcript isoforms, comprising 2 canonical and 30 non-canonical transcripts, that may play regulatory roles in colon carcinogenesis and warrant further investigation in the context of gut epithelial cell biology.
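The comparison between the two datasets amounts to keeping transcripts whose direction of change agrees in patient tissues and in cell lines. A minimal sketch of that filter is given below; it is not the authors' pipeline. The directions for NTMT1-204, BLOC1S6-218 and DCTN1-205 follow the abstract, while `TX-A`, `TX-B` and the function name `concordant_transcripts` are hypothetical placeholders.

```python
def concordant_transcripts(tissue, cell_lines):
    """Return transcripts whose direction of change matches in both datasets.

    Both arguments map a transcript ID to "up" or "down".
    Transcripts missing from `cell_lines` are treated as non-concordant.
    """
    return {
        tid: direction
        for tid, direction in tissue.items()
        if cell_lines.get(tid) == direction
    }


# "TX-A" and "TX-B" are hypothetical IDs added only to show non-concordant cases.
tissue = {
    "NTMT1-204": "up",
    "BLOC1S6-218": "down",
    "DCTN1-205": "down",
    "TX-A": "up",
    "TX-B": "down",
}
cell_lines = {
    "NTMT1-204": "up",
    "BLOC1S6-218": "down",
    "DCTN1-205": "down",
    "TX-A": "down",  # opposite direction in vitro
}
shared = concordant_transcripts(tissue, cell_lines)
print(sorted(shared))
```

Applied to the study's numbers, a filter of this kind would start from the 375 tissue-dysregulated transcripts and retain the 32 concordant ones, leaving 343 without a consistent pattern.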
{"title":"Comparative RNA-Seq Analysis of Colon Spheroids and Patient-derived Tissues Identifies Non-Canonical Transcript Isoforms of Protein-Coding Genes Implicated in Colon Carcinogenesis.","authors":"Tamara Babic, Bojana Banovic Djeri, Dunja Pavlovic, Sandra Dragicevic, Jovana Despotovic, Jelena Karanovic, Aleksandra Nikolic","doi":"10.1177/11769351251396250","DOIUrl":"https://doi.org/10.1177/11769351251396250","url":null,"abstract":"<p><strong>Objectives: </strong>This study aimed to identify transcript isoforms of protein-coding genes with potential relevance to the malignant transformation of gut mucosa.</p><p><strong>Methods: </strong>Colon cancer cell lines (HCT116, DLD1, SW620) and immortalized cells derived from healthy gut epithelium (HCEC-1CT) were cultured as spheroids and subjected to RNA sequencing to profile both canonical and non-canonical transcripts. The resulting data were compared with prior bioinformatics study findings that analyzed RNA-seq datasets from 473 patient-derived tumor and 417 non-tumor colon tissue samples.</p><p><strong>Results: </strong>Among 375 transcripts previously reported as significantly dysregulated in colon (39 up-regulated and 336 down-regulated), 32 transcripts displayed expression patterns in colon cell lines consistent with those observed in patient tissues (4 up-regulated and 28 down-regulated). In silico characterization of these molecules revealed that all of them exhibited at least 1 feature commonly associated with RNAs possessing regulatory functions, such as coding truncated protein isoform, exosomal localization, or enrichment in repetitive elements. The most prominently dysregulated transcripts with consistent expression profiles across both datasets were NTMT1-204 (up-regulated in cancer) and BLOC1S6-218 and DCTN1-205 (both down-regulated in cancer). 
The remaining 343 transcripts did not show consistent expression patterns in the cell lines, suggesting their dysregulation in patient-derived tissues may be due to the stromal or microenvironmental factors absent in vitro.</p><p><strong>Conclusion: </strong>In summary, this comparative transcriptomic analysis identified 32 transcript isoforms, comprising 2 canonical and 30 non-canonical transcripts, that may play regulatory roles in colon carcinogenesis and warrant further investigation in the context of gut epithelial cell biology.</p>","PeriodicalId":35418,"journal":{"name":"Cancer Informatics","volume":"24 ","pages":"11769351251396250"},"PeriodicalIF":2.5,"publicationDate":"2025-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12647565/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145640286","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-11-23; eCollection Date: 2025-01-01; DOI: 10.1177/11769351251394271
Poonamjeet Kaur Loyal, Edward Chege, Jasmit Shah, Anne Mwirigi, Samuel Nguku Gitau
Background: Patients with Human Immunodeficiency Virus (HIV) have an atypical imaging pattern of lymphoma. There is a paucity of literature on differences in tumor volume or burden of disease among HIV positive patients compared with HIV negative patients, and on how this correlates with clinicopathological parameters of aggressiveness and prognosis.
Methods: This was a retrospective cross-sectional study of patients with non-Hodgkin lymphoma, who were categorized as HIV positive or HIV negative. The tumor burden, disease sites, international prognostic score and Ki-67 index were recorded. Continuous variables were analyzed using the Kruskal-Wallis test and categorical variables using Fisher's exact test.
Results: Of the 92 patients with non-Hodgkin lymphoma, 51.1% were HIV positive, with a median age of 45.0 years. The median sum of product diameters used to measure tumor burden was 102.6 [IQR: 51.7, 173.1], with no significant difference between the 2 groups. Extranodal disease was significantly more frequent in the HIV positive group (85.1%), while exclusively nodal disease was seen predominantly in the non-HIV group (66.7%) (P < .001). Complete treatment response was higher in the non-HIV group (54.5%) than in the HIV group (20.9%) (P < .001). More HIV positive patients died (37.2%) than non-HIV patients (4.5%) (P < .001).
Conclusion: HIV-related lymphoma remains a poorly understood subset. Although there was no significant difference in overall tumor burden between HIV positive and HIV negative patients, extranodal disease was significantly more frequent in HIV positive patients. Furthermore, the clinical prognostication score and Ki-67, which apply well to HIV-negative patients, may not apply to HIV-related lymphoma.
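The categorical comparisons above rely on Fisher's exact test. The sketch below shows how a two-sided p-value for a 2x2 table can be computed from the hypergeometric distribution using only the Python standard library. The counts in the usage example are illustrative values chosen to roughly resemble the reported proportions; they are not the study's data, and the function name `fisher_exact_two_sided` is hypothetical. In practice a statistics package (for example `scipy.stats.fisher_exact`) would normally be used.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)  # smallest feasible top-left count
    hi = min(row1, col1)      # largest feasible top-left count
    p_obs = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one.
    # The small relative tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9)
    )


# Illustrative counts only (rows: HIV positive, HIV negative;
# columns: extranodal disease, exclusively nodal disease).
p_value = fisher_exact_two_sided(40, 7, 15, 30)
print(f"two-sided p = {p_value:.3g}")
```

The two-sided rule used here (summing all tables with probability at most that of the observed table) is one common convention; other definitions of the two-sided p-value exist and can give slightly different results.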
{"title":"Lymphoma Imaging in HIV and Non-HIV Patients: A Retrospective Cross-Sectional Study With Clinical and Pathological Correlation.","authors":"Poonamjeet Kaur Loyal, Edward Chege, Jasmit Shah, Anne Mwirigi, Samuel Nguku Gitau","doi":"10.1177/11769351251394271","DOIUrl":"https://doi.org/10.1177/11769351251394271","url":null,"abstract":"<p><strong>Background: </strong>Patients with Human Immunodeficiency Virus (HIV)have an atypical imaging pattern of lymphoma. There is paucity of literature on differences in tumor volume or burden of disease amongst HIV positive patients compared with HIV negative patients and how this correlates with clinicopathological parameters of aggressiveness and prognosis.</p><p><strong>Methods: </strong>This was a retrospective cross-sectional study of patients with non-Hodgkin lymphoma which were categorized into HIV positive and HIV negative. The tumor burden, disease sites, international prognostic score and Ki-67 index were recorded. Continuous variables were analyzed using the Kruskal Wallis test and categorical variables with Fisher's Exact test.</p><p><strong>Results: </strong>Out of the 92 patients with non-Hodgkin lymphoma, 51.1% were HIV positive with a median age of 45.0 years. The median sum of product diameters used to measure tumor burden was 102.6 [IQR: 51.7, 173.1] with no significant difference seen between the 2 groups. The extranodal disease was significantly higher in the HIV positive group (85.1%) while exclusive nodal disease was seen predominantly in the non-HIV group (66.7%) (<i>P</i> < .001). Complete treatment response was higher in the non-HIV group 54.5% compared to 20.9% for the HIV group (<i>P</i> < .001). More HIV positive patients succumbed, 37.2% compared to the 4.5% for non-HIV patients (<i>P</i> < .001).</p><p><strong>Conclusion: </strong>HIV-related lymphoma remains a poorly understood subset. 
Although there was no significant difference in overall tumor burden between HIV positive and negative patients, extranodal disease was significantly higher in the HIV positive patients. Furthermore, the clinical prognostication score and Ki-67 which apply well for HIV-negative patients may not apply for HIV-related lymphoma.</p>","PeriodicalId":35418,"journal":{"name":"Cancer Informatics","volume":"24 ","pages":"11769351251394271"},"PeriodicalIF":2.5,"publicationDate":"2025-11-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12644430/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145640519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}