
Cancer Informatics: Latest Articles

MGDB: A Novel Bioinformatics Quality Control Tool for Clinical Next-Generation Sequencing.
IF 2.5, Q2 (Mathematical & Computational Biology). Pub Date: 2026-01-03. eCollection Date: 2026-01-01. DOI: 10.1177/11769351251411074
Hadrien T Gayap, Philippe-Pierre Robichaud, Nicolas Crapoulet, Eric P Allain

Background and objectives: Next-generation sequencing (NGS) is transforming clinical diagnostics by enabling the detection of genetic variation with unprecedented precision. However, successful implementation of NGS workflows necessitates stringent quality control. This study introduces Molecular Genetics Dashboard (MGDB), a novel bioinformatics tool designed to enhance quality control in clinical NGS workflows.

Methods: Using the Python Dash framework for visualization and MySQL databases, we developed a novel tool for variant-level monitoring of clinical NGS runs. MGDB is containerized with Docker Compose for improved portability, and interpreters can flexibly include or exclude samples from the accumulated statistics and attach notes to them.

Results: MGDB facilitates variant-level run-to-run monitoring, ensuring the consistency of variant detection across sequencing cycles. The tool provides an interactive platform for visualizing and assessing variant data, identifying potential inconsistencies or outliers and improving data management and interpretation compared to traditional methods. MGDB was tested using samples sequenced with Oncomine Focus/Comprehensive Plus assays on S5 sequencers and analyzed via IonReporter software.
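
The run-to-run monitoring described above can be illustrated with a small sketch. This is not MGDB's actual code: the `Observation` record, the `is_outlier` function, and the 3-standard-deviation rule are all assumptions made for illustration, showing how an include/exclude flag with an interpreter note might feed into accumulated statistics for one tracked variant.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class Observation:
    """One historical measurement of a tracked variant (hypothetical schema)."""
    run_id: str
    vaf: float              # variant allele frequency observed in this run
    included: bool = True   # interpreters may exclude a sample from the statistics
    note: str = ""          # free-text note explaining the decision


def is_outlier(history, new_vaf, k=3.0):
    """Return True if new_vaf lies more than k standard deviations from the
    mean of the included historical observations (an assumed rule)."""
    values = [obs.vaf for obs in history if obs.included]
    if len(values) < 2:
        return False  # not enough data to estimate the spread
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return new_vaf != mu
    return abs(new_vaf - mu) > k * sigma


history = [
    Observation("run01", 0.48),
    Observation("run02", 0.51),
    Observation("run03", 0.50),
    Observation("run04", 0.12, included=False, note="failed library prep"),
    Observation("run05", 0.49),
]

print(is_outlier(history, 0.50))  # close to the accumulated mean
print(is_outlier(history, 0.30))  # far from the accumulated mean
```

Excluding the failed run keeps the estimated spread tight, so a genuine drop in allele frequency is more likely to be flagged; with that run included, the same value could pass unnoticed.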

Conclusions: MGDB offers a robust and user-friendly solution for enhancing quality control in clinical NGS workflows, contributing to greater accuracy and reliability in variant detection. The tool is freely available on GitHub: https://github.com/acri-nb/GeneticVariantsDB.

Citations: 0
Clustering Analysis of Multiple Omics Data Types Identifies Cancer Patients With Consistent Survival Outcomes.
IF 2.5, Q2 (Mathematical & Computational Biology). Pub Date: 2025-12-23. eCollection Date: 2025-01-01. DOI: 10.1177/11769351251394107
Shuting Lin, Peng Qiu

Objectives: Cancer stratification is essential for accurate prognosis and personalized treatment selection. While many existing approaches integrate multiple omics data types to identify cancer subtypes, it remains unclear how clustering results from individual omics layers compare in their ability to capture survival-related patient clusters. This study aims to examine patient clusters separately defined by different omics data types and to explore the consistency of these clusters as well as their associations with survival outcomes.

Methods: In this study, we conducted clustering analysis on miRNA expression, gene expression, and DNA methylation data across 20 cancer types in TCGA. We employed a standard clustering pipeline similar to the widely used Seurat clustering pipeline in scRNA-seq analysis. We performed survival analysis to assess whether the resulting patient clusters exhibit significantly different survival outcomes.
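
The survival comparison between patient clusters can be sketched as follows. The abstract does not name the statistical test, so the two-group log-rank test used here is an assumption, chosen because it is a common way to compare survival curves. The function name and the toy data are hypothetical.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    events are 1 (death observed) or 0 (censored).
    Returns (chi-square statistic, p-value) with 1 degree of freedom.
    """
    samples = [(t, e, 0) for t, e in zip(times_a, events_a)]
    samples += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in samples if e == 1})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        at_risk = [g for (u, _, g) in samples if u >= t]
        n = len(at_risk)                       # patients still at risk at time t
        n_a = sum(1 for g in at_risk if g == 0)
        d = sum(1 for (u, e, _) in samples if u == t and e == 1)
        d_a = sum(1 for (u, e, g) in samples if u == t and e == 1 and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n              # expected deaths in group A under H0
        if n > 1:                              # hypergeometric variance term
            variance += n_a * (n - n_a) * d * (n - d) / (n * n * (n - 1))

    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))   # chi-square survival function, 1 df
    return chi2, p_value


# Toy clusters: cluster A dies early, cluster B dies late.
chi2, p = logrank_test([1, 2, 3, 4, 5], [1] * 5, [10, 11, 12, 13, 14], [1] * 5)
print(round(chi2, 2), p < 0.01)
```

With more than two clusters the same observed-minus-expected idea extends to a multi-group statistic, which this sketch does not cover.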

Results: We observed significant survival differences among patient clusters in 11 cancer types. Notably, in 6 of these 11 cancer types, the survival differences among patient clusters were significant in multiple omics data types. For each of these 6 cancer types, we compared the consistency of patient clusters across different omics data types. Interestingly, in each cancer type, we noticed one set of patients who consistently clustered together irrespective of the omics data type, and these patients exhibited either the most favorable or the most unfavorable survival outcomes. This observation suggested that those patients with the most prominent survival outcomes show distinct expression patterns in multiple genomics aspects and could be captured by clustering analysis in multiple omics data types. To interpret these findings, we identified differentially expressed molecular features. Using established miRNA-target relationships, gene-gene interactions, as well as gene-CpG relationships, we constructed networks specific to each cancer type based on the differentially expressed features. These networks revealed several molecular modules associated with patient survival outcomes, such as the miR-200c-3p/ZEB2 axis in bladder cancer, the regulatory role of miR-98 in breast cancer, as well as the association of miR-21 with its target gene APC in kidney renal cell carcinoma.
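
One simple way to look for patients who cluster together regardless of omics type is to group patients by their tuple of cluster labels across all data types. The sketch below is only an illustration of that idea, not the authors' procedure; the function name, the `min_size` parameter, and the toy assignments are hypothetical.

```python
from collections import defaultdict


def consistent_groups(assignments, min_size=2):
    """assignments maps omics type -> {patient_id: cluster label}.

    Returns the sets of patients that share a cluster in every omics type,
    largest first. Only patients present in all omics types are considered.
    """
    shared = set.intersection(*(set(a) for a in assignments.values()))
    omics = sorted(assignments)
    groups = defaultdict(set)
    for patient in shared:
        # The signature is the patient's cluster label in each omics type.
        signature = tuple(assignments[o][patient] for o in omics)
        groups[signature].add(patient)
    kept = [g for g in groups.values() if len(g) >= min_size]
    return sorted(kept, key=lambda g: (-len(g), sorted(g)))


toy = {
    "mirna":       {"P1": 0, "P2": 0, "P3": 1, "P4": 1, "P5": 0},
    "mrna":        {"P1": 2, "P2": 2, "P3": 0, "P4": 1, "P5": 2},
    "methylation": {"P1": 1, "P2": 1, "P3": 0, "P4": 0, "P5": 0},
}
print(consistent_groups(toy))
```

In this toy input only P1 and P2 share a cluster in all three data types; the other patients split apart in at least one of them.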

Conclusion: These findings suggest that omics-specific clustering can identify robust survival-related patient clusters and uncover molecular features that may contribute to differential survival outcomes.

Citations: 0
Robust Cancer Biomarker Identification From Matched Transcriptomic Data Via Bootstrapped Regularized Conditional Logistic Regression.
IF 2.5, Q2 (Mathematical & Computational Biology). Pub Date: 2025-12-16. eCollection Date: 2025-01-01. DOI: 10.1177/11769351251404255
Jie-Huei Wang, Zih-Han Wu, Hui-Chen Lu, Tzung-Ying Guo

Objectives: With the increasing application of high-throughput transcriptomic data in cancer research, identifying reliable cancer biomarkers in high-dimensional settings remains a major challenge. This study aims to systematically evaluate various regularized conditional logistic regression (CLR) methods under a matched case-control (MCC) design, focusing on their performance in variable selection, parameter estimation, and predictive accuracy. Special emphasis is placed on the importance of the matching design in reducing confounding effects and improving model interpretability.

Methods: We utilize RNA-seq data from The Cancer Genome Atlas (TCGA), specifically datasets for liver, thyroid, and lung cancers, which include paired tumor and adjacent normal tissue samples. In our analysis, we apply 4 regularized CLR methods implemented in the R packages "clogitL1," "pclogit," "clogitLasso," and "penalizedclr" to analyze over 20 000 gene expression features. We evaluate the comparative performance of these methods based on metrics such as gene selection stability, predictive accuracy, and interpretability. Additionally, we employ a bootstrap resampling framework to estimate gene selection probabilities, which serve as a measure of gene importance.
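
The bootstrap selection-probability idea can be sketched in a few lines. This is a toy stand-in written in Python, not the R packages named above: it assumes 1:1 matched pairs, for which the conditional likelihood depends only on the case-minus-control feature difference, and fits the L1 penalty with plain proximal gradient steps. The function names, step size, penalty value, and simulated data are all hypothetical.

```python
import math
import random


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_l1_clr(diffs, lam, step=0.1, iters=150):
    """L1-penalized conditional logistic regression for 1:1 matched pairs.

    diffs[i] is the feature vector of the case (tumor) minus that of its
    matched control (normal). Maximizes mean log sigmoid(beta . d) minus
    lam * ||beta||_1 by proximal gradient ascent.
    """
    n, p = len(diffs), len(diffs[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        for d in diffs:
            eta = sum(b * x for b, x in zip(beta, d))
            weight = 1.0 / (1.0 + math.exp(eta))  # equals 1 - sigmoid(eta)
            for j in range(p):
                grad[j] += weight * d[j] / n
        beta = [soft_threshold(b + step * g, step * lam) for b, g in zip(beta, grad)]
    return beta


def selection_probabilities(diffs, lam, n_boot=30, seed=0):
    """Fraction of bootstrap resamples in which each feature gets a non-zero coefficient."""
    rng = random.Random(seed)
    n, p = len(diffs), len(diffs[0])
    counts = [0] * p
    for _ in range(n_boot):
        # Resample whole matched pairs so the matching structure is preserved.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        for j, b in enumerate(fit_l1_clr(sample, lam)):
            if b != 0.0:
                counts[j] += 1
    return [c / n_boot for c in counts]


# Simulated pair differences: feature 0 carries signal, features 1-4 are noise.
data_rng = random.Random(1)
toy_diffs = [[data_rng.gauss(1.5, 1.0)] + [data_rng.gauss(0.0, 1.0) for _ in range(4)]
             for _ in range(60)]
probs = selection_probabilities(toy_diffs, lam=0.2)
print(probs)
```

Resampling pairs rather than individual samples is the key design choice: it keeps each tumor with its own adjacent normal tissue, which is what the matched design relies on.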

Results: Our results show that incorporating the MCC design significantly enhances feature selection performance by mitigating confounding noise. The regularized CLR models successfully identify several well-established cancer-related genes with high selection consistency and statistical significance. In contrast, models that ignore the matched design tend to miss critical biomarkers or produce excessive false positives, leading to potentially misleading interpretations.

Conclusions: This study highlights the value of integrating a matched case-control design with regularized CLR methods for the analysis of high-dimensional transcriptomic data. The proposed analytical framework offers improved accuracy, robustness, and biological relevance, providing a practical and scalable approach for cancer genomics research. It also supports the advancement of precision medicine and translational applications.

Citations: 0
Integrative Analysis of eQTL Genes Reveals Key Biomarkers and Mechanisms for Early Diagnosis of Pancreatic Ductal Adenocarcinoma.
IF 2.5, Q2 (Mathematical & Computational Biology). Pub Date: 2025-12-16. eCollection Date: 2025-01-01. DOI: 10.1177/11769351251400465
Xuebo Wang, Xusheng Zhang, Shicai Liang, Jialong Wang, Yannan Xie, Jiawei Wang, Bendong Chen

Background: Pancreatic ductal adenocarcinoma (PDAC) is a highly lethal malignancy with a dismal 5-year survival rate, largely due to the absence of reliable biomarkers for early detection. The molecular mechanisms underpinning PDAC pathogenesis remain incompletely understood, highlighting the urgent need for novel diagnostic strategies.

Objective: This study aimed to integrate eQTL-driven Mendelian randomization (MR) with transcriptomic and genome-wide association data to identify causal PDAC-associated genes and construct a diagnostic nomogram based on 5 hub genes (CTSC, SMYD3, MFGE8, IGFBP7, POC1B) for early detection of PDAC.

Methods: Transcriptomic data from GSE62165 and GSE25471 were retrieved from the Gene Expression Omnibus (GEO) and processed for differential expression using LIMMA and GEO2R, followed by batch correction and weighted gene co-expression network analysis (WGCNA). Summary-level eQTL statistics were obtained from OpenGWAS, and GWAS data included over 5000 PDAC cases. MR analysis was performed using inverse variance weighted (IVW) as the primary approach, supplemented with MR-Egger, weighted median, weighted mode, and MR-PRESSO. Instrument strength, pleiotropy, and heterogeneity were assessed via F-statistics, Egger intercept, and Cochran's Q test. Candidate genes were filtered using a consensus approach combining random forest (RF), support vector machine-recursive feature elimination (SVM-RFE), and Lasso regression. Diagnostic performance was evaluated via ROC curves, C-index, calibration plots, and decision curve analysis. Mechanistic insights were derived from KEGG and GO enrichment analyses, as well as protein-protein interaction (PPI) network analyses.

Results: Five eQTL-associated hub genes (CTSC, SMYD3, MFGE8, IGFBP7, and POC1B) were identified as causally linked to PDAC via robust MR analysis with minimal evidence of pleiotropy or heterogeneity. These genes demonstrated high diagnostic potential (AUC > 0.85, P < .001). A diagnostic nomogram incorporating these genes achieved strong predictive performance (C-index = 0.92) with favorable clinical decision curve results. Functional enrichment and PPI analyses implicated these genes, particularly CTSC, in modulating the ITGAV/ITGB3-PI3K-Akt signaling axis, contributing to PDAC cell cycle regulation and apoptosis resistance.

Conclusions: This study presents a multi-omics, MR-informed framework for identifying eQTL-regulated biomarkers of PDAC. The identified hub genes offer promising avenues for early detection, while the mechanistic mapping of the PI3K-Akt pathway provides translational insights. These findings warrant further validation in clinical and experimental settings and hold potential to reshape PDAC diagnostic strategies.

Pancreatic ductal adenocarcinoma (PDAC) remains a formidable clinical challenge because of its aggressive nature and the lack of effective biomarkers for early diagnosis. To address this, we used Mendelian randomization (MR) to integrate transcriptomic data, genome-wide association studies (GWAS), and expression quantitative trait locus (eQTL) information in order to identify genes causally associated with PDAC risk. Differentially expressed genes were identified in two GEO datasets (GSE62165, GSE25471) and prioritized using weighted gene co-expression network analysis (WGCNA). MR analysis using IVW, MR-Egger, weighted median, and MR-PRESSO identified 5 hub genes (CTSC, SMYD3, MFGE8, IGFBP7, and POC1B) as significant causal factors for PDAC. These genes were incorporated into a diagnostic model built with machine learning methods (random forest, SVM-RFE, Lasso), which showed strong classification performance (AUC > 0.85) and good calibration (C-index = 0.92). Functional enrichment and protein interaction analyses indicated that CTSC regulates the ECM-integrin-PI3K-Akt signaling pathway, which is involved in tumor cell proliferation and survival. The findings establish a multi-omics-based biomarker panel with strong diagnostic utility and mechanistic relevance, providing a potential framework for future translational validation in clinical cohorts.
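
The inverse variance weighted (IVW) estimate, Cochran's Q, and the instrument F-statistic mentioned in the Methods can be written out from per-SNP summary statistics. The sketch below uses the standard fixed-effect formulas with first-order weights; the function names and the toy numbers are hypothetical and are not taken from the study.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW Mendelian randomization estimate.

    Each SNP contributes a Wald ratio by/bx, weighted by bx^2 / se_y^2.
    Returns (causal estimate, standard error, Cochran's Q).
    """
    weights = [bx * bx / (se * se) for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    std_error = math.sqrt(1.0 / total)
    # Cochran's Q measures heterogeneity of the per-SNP ratios around the estimate.
    cochran_q = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))
    return estimate, std_error, cochran_q


def f_statistics(beta_exposure, se_exposure):
    """Approximate per-instrument F-statistic, (beta / se)^2."""
    return [(b / s) ** 2 for b, s in zip(beta_exposure, se_exposure)]


# Toy summary statistics for three instruments (invented values).
bx = [0.10, 0.20, 0.40]      # SNP effect on gene expression (eQTL)
by = [0.05, 0.10, 0.20]      # SNP effect on disease (GWAS)
se_y = [0.01, 0.01, 0.01]
estimate, std_error, q = ivw_estimate(bx, by, se_y)
print(estimate, std_error, q)
```

A large Q relative to its degrees of freedom (number of instruments minus one) would point to heterogeneity; an F-statistic above 10 is a commonly used rule of thumb for adequate instrument strength.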
Citations: 0
Prediction and Feature Selection of Mastectomy-Related Post Traumatic Stress Disorder (PTSD) Using Machine Learning Among Breast Cancer Patients in Bangladesh.
IF 2.5, Q2 (Mathematical & Computational Biology). Pub Date: 2025-12-12. eCollection Date: 2025-01-01. DOI: 10.1177/11769351251401330
Syed Billal Hossain, Md Mizanoor Rahman, Kapashia Binte Giash, Md Hazrat Ali, Mst Asma Akter, A B M Alauddin Chowdhury

Background: Post-mastectomy PTSD is a serious mental health issue, but it has not been studied enough, particularly in low-resource settings like Bangladesh. This study aimed to predict PTSD among breast cancer survivors using machine learning (ML) models and identify significant predictors through the Boruta algorithm, a feature selection tool, offering scalable solutions for early detection and intervention.

Methods: A cross-sectional study of 138 post-mastectomy breast cancer patients was conducted across 3 hospitals in Bangladesh. Data on sociodemographic characteristics, health history, social experience, and treatment were collected using validated tools, including the PTSD Checklist for DSM-5 (PCL-5). The Boruta algorithm identified key predictors, and 10 ML models were evaluated for PTSD prediction using metrics such as accuracy, sensitivity, specificity, and AUC.
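
The evaluation metrics listed above can be computed directly from true labels and predicted scores. The sketch below is a minimal illustration with invented labels and scores; the function name and the 0.5 decision threshold are assumptions, and the AUC is computed by pairwise comparison of positive and negative scores.

```python
def classification_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, sensitivity, specificity, and AUC for a binary classifier.

    y_true holds 1 (PTSD) or 0 (no PTSD); y_score holds predicted probabilities.
    """
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1

    positives = [s for l, s in zip(y_true, y_score) if l == 1]
    negatives = [s for l, s in zip(y_true, y_score) if l == 0]
    # AUC: probability that a random positive outranks a random negative;
    # ties count as one half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)

    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "auc": wins / (len(positives) * len(negatives)),
    }


labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
metrics = classification_metrics(labels, scores)
print(metrics)
```

Accuracy, sensitivity, and specificity depend on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is why studies often report both kinds of metric.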

Results: Random Forest (RF) outperformed other models (accuracy: 88.9%, AUC: 0.914). Significant predictors included education, monthly income, and changes in family behaviour. Factors like marital status, having chronic diseases, and hormone therapy were not statistically significant. PTSD prevalence was 34.1%, with urban residents and younger patients facing higher risks.

Conclusion: ML models, particularly RF, demonstrated strong predictive performance and identified critical PTSD predictors. These findings highlight the potential for cost-effective PTSD screening in resource-constrained settings. Future research should focus on broader validation and longitudinal studies to refine predictive models.

Citations: 0
Pan-Cancer Analysis of the Prognostic and Immunological Role of ECT2: A Promising Target for Survival and Immunotherapy.
IF 2.5 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-11-29 eCollection Date: 2025-01-01 DOI: 10.1177/11769351251396242
Lulu Wang, Hua Jin, Xiaowei Liu, Hanzhi Zhang

Objectives: The aim of this study is to investigate the role of epithelial cell transforming sequence 2 (ECT2) as a pan-cancer biomarker and to assess its potential as an immune-related target for cancer immunotherapy.

Methods: We conducted a comprehensive analysis of ECT2 expression across 44 tumor types using large-scale transcriptomic datasets from The Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression (GTEx) project. Pan-cancer Cox regression analyses were performed to evaluate the correlation between ECT2 expression and patient survival outcomes. Functional assays, including ECT2 knockdown via shRNA in the HepG2 hepatocellular carcinoma (HCC) cell line, were employed to investigate its mechanistic role. Transcriptomic profiling and pathway analyses were also conducted to explore the impact of ECT2 on cell proliferation and the tumor immune microenvironment.

Results: ECT2 was found to be significantly upregulated in 31 tumor types. Elevated ECT2 expression was consistently associated with worse overall survival (OS), disease-specific survival (DSS), disease-free interval (DFI), and progression-free interval (PFI) across multiple cancer subtypes. Functional assays revealed that ECT2 knockdown significantly reduced HepG2 cell viability and impaired cell cycle progression, with downregulation of Cyclin D1. Transcriptomic analysis of ECT2-depleted cells indicated enriched gene sets related to cell proliferation and mitotic regulation. Additionally, ECT2 expression was significantly correlated with immune features, including immune cell infiltration, immune checkpoint gene expression, tumor mutational burden (TMB), and microsatellite instability (MSI).

Conclusion: ECT2 is identified as a potential pan-cancer prognostic biomarker with dual roles in tumor initiation and progression, as well as in modulating the tumor immune microenvironment. Our findings suggest that ECT2 may serve as a promising therapeutic target in cancer immunotherapy, warranting further investigation into its immune-regulatory and oncogenic functions.

Citations: 0
An Integrated Analysis of HAVCR1 with a Focus on Immunological and Prognostic Roles in Breast Cancer.
IF 2.5 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-11-28 eCollection Date: 2025-01-01 DOI: 10.1177/11769351251393148
Wen Sun, Weiya Zhang, Jianyi Zhao, Mingyi Sang, Qixuan Feng, Wenbin Zhou, Yue Sun

Background: Breast cancer remains a predominant malignancy and a leading cause of oncologic mortality among women globally. The discovery of novel biomarkers is crucial for improving therapeutic outcomes.

Methods: We conducted a comprehensive analysis of the immunological and prognostic significance of hepatitis A virus cellular receptor 1 (HAVCR1) in breast cancer using publicly available datasets.

Results: HAVCR1 expression was markedly downregulated in breast cancer tissues. Notably, lower HAVCR1 expression in pre-treatment tumor samples was associated with poorer prognosis among pan-cancer patients undergoing immunotherapy, and a higher incidence of metastasis was observed in the breast cancer subgroup. Subtype-specific differentially expressed gene (DEG) analyses further indicated that distinct patterns of immune infiltration may underlie this association. Moreover, gene set enrichment analysis (GSEA) highlighted the immunological relevance of HAVCR1, particularly its involvement in T cell activation within the triple-negative breast cancer (TNBC) subtype. Clinically, elevated HAVCR1 expression in pre-treatment T cells was indicative of a more favorable response to PD-1 blockade therapy than diminished expression.

Conclusion: The expression of HAVCR1 exhibits a strong correlation with immune infiltration and holds potential as a prognostic biomarker for breast cancer, offering predictive insight into the efficacy of immunotherapeutic interventions.

Citations: 0
Unsupervised Random Forest Identifies Important Genetic Prognostic Factors for Breast Cancer Survival Time.
IF 2.5 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-11-28 eCollection Date: 2025-01-01 DOI: 10.1177/11769351251393146
Benjamin Goldberg, Eric Nels Pederson, Zhengqing Ouyang

Objective: Breast cancer is one of the most prominent and deadly diseases in the world, and its prognosis varies widely based on the expression of certain genes. Identification of these genes is important for developing and interpreting clinical prognostic tests as well as furthering our understanding of breast cancer biology. We expand on prior efforts in the field toward identifying prognostic genes, by integrating powerful statistical methods.

Methods: To this end, we use an unsupervised random forest model, which allows for robust learning of non-linear gene expression/survival relationships and can identify the most important genes affecting both positive and negative breast cancer prognosis. In total, 1,518 participants from the METABRIC dataset were considered, using 20,387 mRNA expression level variables and 23 clinical variables, including HER2 mutation status. The top 250 and bottom 250 expressing genes and 6 clinical features were selected for the unsupervised random forest model.

Results: Our research corroborates previous discoveries of 27 important prognostic genes while also identifying 3 genes as potentially novel prognostic factors. Based on gene ontology analysis, we additionally show that these genes have plausible connections to breast cancer biology that should be experimentally investigated.

Conclusions: Here, we demonstrate the utility of the unsupervised random forest model over K-means clustering for identifying important genes in breast cancer.
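The feature pre-selection step described in the Methods (keeping the top 250 and bottom 250 expressing genes) can be sketched in plain Python. This is only an illustration under a loud assumption: the abstract does not state how genes were ranked, so the sketch ranks by mean expression across samples. The function name `select_extreme_genes` and the toy gene names and values are hypothetical.

```python
from statistics import mean


def select_extreme_genes(expression, k):
    """Return (top_k, bottom_k) gene names ranked by mean expression.

    expression: dict mapping gene name -> list of per-sample values.
    Ranking by the mean is an assumption made for this sketch; the
    criterion used in the study is not given in the abstract.
    """
    ranked = sorted(expression, key=lambda g: mean(expression[g]), reverse=True)
    if 2 * k > len(ranked):
        raise ValueError("k too large: top and bottom sets would overlap")
    return ranked[:k], ranked[-k:]


# Hypothetical toy matrix: 5 genes measured in 3 samples.
toy_expression = {
    "GENE_A": [9.1, 8.7, 9.4],
    "GENE_B": [2.0, 2.2, 1.9],
    "GENE_C": [5.5, 5.1, 5.8],
    "GENE_D": [0.3, 0.4, 0.2],
    "GENE_E": [7.0, 7.2, 6.9],
}

top_genes, bottom_genes = select_extreme_genes(toy_expression, k=2)
```

With the study's settings the call would use `k=250` on the 20,387 expression variables, leaving 500 genes that are then combined with the 6 clinical features.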

Citations: 0
Comparative RNA-Seq Analysis of Colon Spheroids and Patient-derived Tissues Identifies Non-Canonical Transcript Isoforms of Protein-Coding Genes Implicated in Colon Carcinogenesis.
IF 2.5 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-11-24 eCollection Date: 2025-01-01 DOI: 10.1177/11769351251396250
Tamara Babic, Bojana Banovic Djeri, Dunja Pavlovic, Sandra Dragicevic, Jovana Despotovic, Jelena Karanovic, Aleksandra Nikolic

Objectives: This study aimed to identify transcript isoforms of protein-coding genes with potential relevance to the malignant transformation of gut mucosa.

Methods: Colon cancer cell lines (HCT116, DLD1, SW620) and immortalized cells derived from healthy gut epithelium (HCEC-1CT) were cultured as spheroids and subjected to RNA sequencing to profile both canonical and non-canonical transcripts. The resulting data were compared with findings from a prior bioinformatics study that analyzed RNA-seq datasets from 473 patient-derived tumor and 417 non-tumor colon tissue samples.

Results: Among 375 transcripts previously reported as significantly dysregulated in colon cancer (39 up-regulated and 336 down-regulated), 32 transcripts displayed expression patterns in colon cell lines consistent with those observed in patient tissues (4 up-regulated and 28 down-regulated). In silico characterization of these molecules revealed that all of them exhibited at least 1 feature commonly associated with RNAs possessing regulatory functions, such as encoding a truncated protein isoform, exosomal localization, or enrichment in repetitive elements. The most prominently dysregulated transcripts with consistent expression profiles across both datasets were NTMT1-204 (up-regulated in cancer) and BLOC1S6-218 and DCTN1-205 (both down-regulated in cancer). The remaining 343 transcripts did not show consistent expression patterns in the cell lines, suggesting that their dysregulation in patient-derived tissues may be due to stromal or microenvironmental factors that are absent in vitro.

Conclusion: In summary, this comparative transcriptomic analysis identified 32 transcript isoforms, comprising 2 canonical and 30 non-canonical transcripts, that may play regulatory roles in colon carcinogenesis and warrant further investigation in the context of gut epithelial cell biology.
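The comparison described above, keeping only transcripts whose direction of dysregulation agrees between patient tissue and cell-line spheroids, amounts to a direction-aware intersection. The sketch below is a hedged illustration in plain Python: the function name, the `"up"`/`"down"` encoding, and the placeholder transcript IDs are hypothetical, and the study's actual pipeline and significance thresholds are not described in the abstract. The final lines only check that the counts reported in the Results are internally consistent.

```python
def concordant_transcripts(tissue, cell_lines):
    """Return transcripts dysregulated in the same direction in both datasets.

    tissue, cell_lines: dicts mapping transcript ID -> "up" or "down".
    Transcripts missing from cell_lines, or changing in the opposite
    direction there, are dropped.
    """
    return {tid: direction
            for tid, direction in tissue.items()
            if cell_lines.get(tid) == direction}


# Hypothetical placeholder IDs, not real transcript identifiers.
tissue_calls = {"TX-1": "up", "TX-2": "down", "TX-3": "down", "TX-4": "up"}
cell_line_calls = {"TX-1": "up", "TX-2": "up", "TX-3": "down"}

shared = concordant_transcripts(tissue_calls, cell_line_calls)
n_up = sum(1 for d in shared.values() if d == "up")
n_down = len(shared) - n_up

# Consistency check on the counts reported in the abstract.
reported_total = 39 + 336            # up- plus down-regulated in tissue
reported_concordant = 4 + 28         # up- plus down-regulated in both
reported_remaining = reported_total - reported_concordant
```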

Citations: 0
Lymphoma Imaging in HIV and Non-HIV Patients: A Retrospective Cross-Sectional Study With Clinical and Pathological Correlation.
IF 2.5 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-11-23 eCollection Date: 2025-01-01 DOI: 10.1177/11769351251394271
Poonamjeet Kaur Loyal, Edward Chege, Jasmit Shah, Anne Mwirigi, Samuel Nguku Gitau

Background: Patients with human immunodeficiency virus (HIV) have an atypical imaging pattern of lymphoma. There is a paucity of literature on differences in tumor volume or burden of disease between HIV-positive and HIV-negative patients, and on how these correlate with clinicopathological parameters of aggressiveness and prognosis.

Methods: This was a retrospective cross-sectional study of patients with non-Hodgkin lymphoma, who were categorized as HIV positive or HIV negative. The tumor burden, disease sites, international prognostic score, and Ki-67 index were recorded. Continuous variables were analyzed using the Kruskal-Wallis test and categorical variables using Fisher's exact test.

Results: Of the 92 patients with non-Hodgkin lymphoma, 51.1% were HIV positive with a median age of 45.0 years. The median sum of product diameters, used to measure tumor burden, was 102.6 [IQR: 51.7, 173.1], with no significant difference between the 2 groups. Extranodal disease was significantly more frequent in the HIV-positive group (85.1%), whereas exclusively nodal disease was seen predominantly in the non-HIV group (66.7%) (P < .001). The complete treatment response rate was higher in the non-HIV group (54.5%) than in the HIV group (20.9%) (P < .001). Mortality was higher among HIV-positive patients (37.2%) than among non-HIV patients (4.5%) (P < .001).

Conclusion: HIV-related lymphoma remains a poorly understood subset. Although there was no significant difference in overall tumor burden between HIV-positive and HIV-negative patients, extranodal disease was significantly more frequent in HIV-positive patients. Furthermore, the clinical prognostic score and Ki-67 index, which work well for HIV-negative patients, may not apply to HIV-related lymphoma.
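The Methods name Fisher's exact test for the categorical comparisons (for example, extranodal versus nodal disease by HIV status). A minimal sketch of the two-sided test on a 2x2 table follows, in plain Python. The function name and the toy counts are hypothetical: the abstract reports percentages only, so no study counts are reproduced here. The two-sided p-value is taken as the sum of the probabilities of all tables, with the same margins, that are no more likely than the observed one, which is one common convention.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution. The p-value sums the probabilities of every table that
    is as likely as, or less likely than, the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    tolerance = 1 + 1e-9  # guards against floating-point ties
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * tolerance)


# Hypothetical toy table (not study data):
#                 extranodal   nodal only
# group 1              3            1
# group 2              1            3
p_toy = fisher_exact_two_sided(3, 1, 1, 3)
p_extreme = fisher_exact_two_sided(4, 0, 0, 4)
```

An exact test is a reasonable choice for a cohort of this size because it does not rely on the large-sample approximation behind the chi-squared test.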

Citations: 0