Briefings in Bioinformatics: Latest Articles

GFSeeker: a splicing-graph-based approach for accurate gene fusion detection from long-read RNA sequencing data.
IF 7.7 | Zone 2, Biology | Q1 BIOCHEMICAL RESEARCH METHODS | Vol. 27, Issue 1 | Pub Date: 2026-01-07 | DOI: 10.1093/bib/bbaf702
Bingyan Wang, Heng Hu, Runtian Gao, Guohua Wang, Tao Jiang

Gene fusions are critical oncogenic drivers and therapeutic targets in diverse cancers. Long-read ribonucleic acid sequencing (RNA-seq) offers an unprecedented opportunity to resolve the full-length structure of fusion isoforms, but its high intrinsic error rate makes the precise identification of true fusion events difficult. Here, we developed GFSeeker, a splicing-graph-based computational framework for accurate gene fusion detection from long-read RNA-seq data. GFSeeker combines a splicing-graph reference with a dual re-alignment validation step to overcome the noise introduced by high error rates. Benchmarking on simulated, non-tumor, and cancer cell line datasets showed state-of-the-art performance, with F1 scores 6% to 15% higher than those of existing methods. Notably, GFSeeker identified the known fusion MATN2-POP1 in the MCF-7 cancer cell line, which other tools missed, highlighting its sensitivity in resolving complex fusion events. These results establish GFSeeker as a powerful and reliable tool for gene fusion discovery, with significant potential to advance cancer research and precision diagnostics.
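
The abstract does not spell out GFSeeker's algorithm, so the following is only a minimal sketch of the general splicing-graph idea, not the tool's implementation. Each annotated exon is a node and each annotated splice junction is an edge. A long read that has already been aligned to an ordered path of exons counts as fusion evidence when it jumps from one gene's subgraph to another's without an annotated edge. Gene names, exon IDs, the `min_support` threshold, and all function names are hypothetical, and GFSeeker's dual re-alignment validation is not modeled.

```python
from collections import Counter, defaultdict


def build_splicing_graph(transcripts):
    """Build a toy splicing graph from annotated transcripts.

    transcripts maps a transcript ID to (gene name, ordered list of exon IDs).
    Returns (edges, exon_to_gene): edges[exon] is the set of exons that follow
    it through an annotated splice junction.
    """
    edges = defaultdict(set)
    exon_to_gene = {}
    for gene, exons in transcripts.values():
        for exon in exons:
            exon_to_gene[exon] = gene
        for left, right in zip(exons, exons[1:]):
            edges[left].add(right)
    return edges, exon_to_gene


def find_fusion_junctions(read_path, edges, exon_to_gene):
    """List steps in a read's exon path that jump between genes without an
    annotated junction, as (gene_a, gene_b, exon_a, exon_b)."""
    junctions = []
    for left, right in zip(read_path, read_path[1:]):
        gene_left, gene_right = exon_to_gene[left], exon_to_gene[right]
        if gene_left != gene_right and right not in edges.get(left, set()):
            junctions.append((gene_left, gene_right, left, right))
    return junctions


def call_fusions(read_paths, edges, exon_to_gene, min_support=2):
    """Count supporting reads per gene pair (once per read); keep well-supported pairs."""
    support = Counter()
    for path in read_paths:
        pairs = {(a, b) for a, b, _, _ in find_fusion_junctions(path, edges, exon_to_gene)}
        support.update(pairs)
    return {pair: n for pair, n in support.items() if n >= min_support}


transcripts = {
    "tx_a": ("GENE_A", ["A1", "A2", "A3"]),
    "tx_b": ("GENE_B", ["B1", "B2", "B3"]),
}
edges, exon_to_gene = build_splicing_graph(transcripts)
reads = [["A1", "A2", "B2"], ["A2", "B2", "B3"], ["A1", "A2", "A3"]]
print(call_fusions(reads, edges, exon_to_gene))  # {('GENE_A', 'GENE_B'): 2}
```

Requiring several independent reads per gene pair is a simple stand-in for the kind of noise filtering the abstract attributes to re-alignment validation. A read that merely skips an exon within one gene (for example `["A1", "A3"]`) is not reported, because both exons belong to the same gene.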

Citations: 0
GRAFT: a graph-aware fusion transformer for cancer driver gene prediction.
IF 7.7 | Zone 2, Biology | Q1 BIOCHEMICAL RESEARCH METHODS | Vol. 27, Issue 1 | Pub Date: 2026-01-07 | DOI: 10.1093/bib/bbaf706
Sang-Pil Cho, Young-Rae Cho

Identifying cancer driver genes is essential for precision oncology, but existing computational methods are often limited by their reliance on a single biological network and by their inability to capture long-range molecular dependencies. To address these challenges, we propose GRAFT, a Graph-Aware Fusion Transformer. GRAFT uses a multi-view graph encoder to learn modality-specific features from protein-protein interactions, pathway co-occurrence, and gene semantic similarity. These representations are further enriched with two auxiliary feature types: structural encodings derived from network topology and functional embeddings guided by curated gene sets. The integrated features are then processed by a transformer backbone in which a novel edge-attention bias makes the model explicitly sensitive to the underlying graph topologies, allowing it to model both local and global dependencies. Extensive evaluations show that GRAFT performs competitively with leading state-of-the-art methods in pan-cancer analysis while consistently delivering higher predictive accuracy across many individual cancer types. More importantly, functional enrichment analysis of the novel candidate driver genes predicted by our model confirms their strong associations with key cancer-related processes, demonstrating the model's ability to make biologically plausible discoveries. By providing a powerful and interpretable framework, our model advances the identification of cancer driver genes and establishes a robust paradigm for multimodal data integration in systems biology. The source code and datasets are publicly accessible at https://github.com/spcho-dev/GRAFT.
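
The abstract does not give the exact form of GRAFT's edge-attention bias, so the plain-Python sketch below only illustrates the general mechanism under an assumed formulation. A bias `beta * A[i][j]`, where `A` is the adjacency matrix, is added to each scaled dot-product logit before the softmax, so genes that share an edge attend to each other more strongly. This is in the spirit of Graphormer-style structural biases. `beta` is a hypothetical scalar that a real model would learn, and the function names are illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def edge_biased_attention(Q, K, V, A, beta=1.0):
    """Single-head attention whose logits are shifted by graph adjacency.

    Q, K: n query/key vectors of dimension d; V: n value vectors.
    A: n x n adjacency matrix (1.0 where nodes i and j share an edge, else 0.0).
    beta: strength of the topology bias (a learned parameter in a real model).
    Returns (outputs, attention_weights).
    """
    n, d = len(Q), len(Q[0])
    weights = []
    for i in range(n):
        logits = [
            sum(q * k for q, k in zip(Q[i], K[j])) / math.sqrt(d) + beta * A[i][j]
            for j in range(n)
        ]
        weights.append(softmax(logits))
    outputs = [
        [sum(weights[i][j] * V[j][c] for j in range(n)) for c in range(len(V[0]))]
        for i in range(n)
    ]
    return outputs, weights


# Three nodes in a path graph 0 - 1 - 2 with all-zero queries and keys,
# so any preference in the attention weights comes from the edge bias alone.
Q = K = [[0.0, 0.0] for _ in range(3)]
V = [[1.0], [2.0], [3.0]]
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
_, w = edge_biased_attention(Q, K, V, A, beta=2.0)
print([round(x, 3) for x in w[0]])  # [0.107, 0.787, 0.107]
```

With `beta=0` the layer reduces to ordinary attention, which here is uniform. Increasing `beta` shifts attention mass toward graph neighbors while still letting distant nodes contribute, which is one way to mix local topology with global, long-range context.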

Citations: 0
A comprehensive survey of genome language models in bioinformatics.
IF 7.7 | Zone 2, Biology | Q1 BIOCHEMICAL RESEARCH METHODS | Vol. 27, Issue 1 | Pub Date: 2026-01-07 | DOI: 10.1093/bib/bbaf724
Liyuan Shu, Jiao Tang, Xiaoyu Guan, Daoqiang Zhang

Large language models have revolutionized natural language processing by effectively modeling complex semantics and capturing long-range contextual relationships. Inspired by these advances, genome language models (gLMs) have recently emerged; they treat DNA and RNA sequences as biological texts, enabling the identification of intricate genomic grammar and distant regulatory interactions. This review examines the need for gLMs, emphasizing their capacity to overcome the limitations of traditional deep learning approaches in genomic sequence characterization. We comprehensively survey contemporary gLM architectures, including Transformer models, Hyena convolutions, and state space models, as well as various sequence tokenization strategies, and assess their applicability and effectiveness across diverse genomic applications. We also discuss foundational pretraining strategies and provide an overview of genomic pretraining datasets spanning multiple species and functional domains. We critically analyze evaluation methodologies, including supervised, zero-shot, and few-shot learning paradigms, as well as fine-tuning approaches. We present an extensive taxonomy of downstream tasks, alongside a summary of existing benchmarks and emerging trends. Finally, we consider key challenges such as data scarcity, interpretability, and the computational demands of genomic modeling, and propose a roadmap to guide future advances in genome language modeling.
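
Tokenization is one of the design axes the survey compares. The following is a minimal, self-contained sketch of k-mer tokenization; it is not the tokenizer of any specific gLM. Overlapping k-mers (stride 1) are the scheme popularized by DNABERT, while non-overlapping k-mers (stride k) shorten the token sequence roughly k-fold. The special-token names are an assumption borrowed from BERT-style vocabularies.

```python
from itertools import product


def kmer_tokenize(seq, k=3, stride=1):
    """Split a nucleotide sequence into k-mers starting every `stride` positions.

    stride=1 gives overlapping k-mers; stride=k gives non-overlapping ones.
    A trailing fragment shorter than k is dropped.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocab(k, specials=("[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]")):
    """Map special tokens and all 4**k DNA k-mers to integer IDs."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    return {token: idx for idx, token in enumerate(list(specials) + kmers)}


def encode(seq, vocab, k=3, stride=1):
    """Tokenize and map to IDs; k-mers containing N or other symbols become [UNK]."""
    unk = vocab["[UNK]"]
    return [vocab.get(tok, unk) for tok in kmer_tokenize(seq, k, stride)]


print(kmer_tokenize("ACGTACG", k=3, stride=1))  # ['ACG', 'CGT', 'GTA', 'TAC', 'ACG']
print(kmer_tokenize("ACGTACG", k=3, stride=3))  # ['ACG', 'TAC']
```

The trade-off is length versus redundancy. A sequence of length L yields L - k + 1 overlapping tokens but only floor((L - k) / k) + 1 non-overlapping ones. This matters for Transformer gLMs, whose attention cost grows quadratically with token count.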

Citations: 0
A systematic review of molecular representation learning foundation models.
IF 7.7 | Zone 2, Biology | Q1 BIOCHEMICAL RESEARCH METHODS | Vol. 27, Issue 1 | Pub Date: 2026-01-07 | DOI: 10.1093/bib/bbaf703
Bosheng Song, Jiayi Zhang, Ying Liu, Yuansheng Liu, Jing Jiang, Sisi Yuan, Xia Zhen, Yiping Liu

Molecular representation learning (MRL) is a foundation of computational drug discovery, transforming molecular structures and properties into numerical vectors. These vectors serve as input to machine learning models and support the prediction and analysis of molecular attributes, functions, and reactions. The advent of foundation models has brought both new opportunities and new challenges to MRL. These models improve generalizability and transferability when data are scarce. Through pretraining and fine-tuning, foundation models can be adapted to various domains, and their robust encoding and generative abilities allow molecular data to be transformed into more expressive forms. This paper provides a detailed review of current mainstream molecular descriptors and datasets, focusing primarily on the representation of small molecules and excluding larger molecules such as proteins and peptides. It classifies foundation models into two primary categories by input form: unimodal and multimodal models. For each category, representative models are identified and their advantages and disadvantages evaluated. Moreover, we systematically summarize four core pretraining strategies for MRL foundation models, analyzing their task designs, applicable scenarios, and impact on downstream performance. We also discuss applications of molecular representation foundation models in drug discovery and development, together with the current state of model interpretability. The paper concludes with insights into future directions for MRL foundation models.
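
The review starts from the premise that a molecule must become a fixed-length numeric vector before a model can use it. The toy sketch below illustrates that step using only the Python standard library. It hashes character bigrams of a SMILES string into a fixed number of count bins and compares molecules by Tanimoto similarity on the non-zero bins. This is purely illustrative and not a descriptor from the review. Real fingerprints such as ECFP/Morgan hash atom neighborhoods on the molecular graph, typically computed with a cheminformatics toolkit such as RDKit, rather than substrings of a SMILES string. The function names, `n`, and `size` are hypothetical.

```python
import hashlib


def smiles_ngram_vector(smiles, n=2, size=64):
    """Hash every length-n substring of a SMILES string into one of `size` count bins."""
    vector = [0] * size
    for i in range(len(smiles) - n + 1):
        gram = smiles[i:i + n]
        # MD5 gives a bucket that is stable across runs, unlike Python's salted hash().
        bucket = int(hashlib.md5(gram.encode("utf-8")).hexdigest(), 16) % size
        vector[bucket] += 1
    return vector


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between the sets of non-zero bins."""
    on_a = {i for i, v in enumerate(a) if v}
    on_b = {i for i, v in enumerate(b) if v}
    union = on_a | on_b
    return len(on_a & on_b) / len(union) if union else 1.0


ethanol = smiles_ngram_vector("CCO")
propanol = smiles_ngram_vector("CCCO")
benzene = smiles_ngram_vector("c1ccccc1")
print(sum(ethanol))  # 2 bigrams: "CC" and "CO"
print(tanimoto(ethanol, propanol), tanimoto(ethanol, benzene))
```

Ethanol and 1-propanol come out with Tanimoto similarity 1.0 because their bigram sets coincide. This loss of information is exactly why expressive, learned representations of the kind the review surveys are attractive.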

Citations: 0
DeepRMSF: a deep learning-based automated approach for predicting atomic-level flexibility in RNA structure.
IF 7.7 | Zone 2, Biology | Q1 BIOCHEMICAL RESEARCH METHODS | Vol. 27, Issue 1 | Pub Date: 2026-01-07 | DOI: 10.1093/bib/bbaf720
Chenjie Feng, Xiaowen Sun, Xintao Song, Lei Bao, Weikang Gong, Renmin Han

Characterizing RNA conformational dynamics is essential for understanding RNA's roles in complex biological processes. While computational methods have revolutionized the prediction of static 3D RNA structures, predicting local flexibility directly from structure remains a significant challenge. We developed DeepRMSF, a deep learning-based method that uses atomic-level descriptions of RNA to predict vibrational flexibility from a given tertiary structure. Trained on root-mean-square fluctuations (RMSF) derived from molecular dynamics (MD) simulations, DeepRMSF was benchmarked on 371 nonredundant RNAs: 311 were used for five-fold cross-validation (Pearson correlation coefficient, PCC = 0.7219 to 0.7464) and 60 formed an independent test set (PCC = 0.734), with minimal sequence and structural similarity between the sets. DeepRMSF predicts the local flexibility of a medium-sized RNA (~75 nucleotides) in ~8.2 s, a more than 3000-fold speed-up over MD simulations, while maintaining strong extrapolative accuracy. Rather than replacing MD, DeepRMSF offers a scalable, practical complement to it for transcriptome-scale screening of RNA flexibility, facilitating studies of RNA structure-dynamics-function relationships and supporting computational modeling in RNA biology.
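
DeepRMSF's training target and evaluation metric are both simple to state. For atom i with positions r_i(t) over T frames, RMSF_i = sqrt((1/T) * sum over t of ||r_i(t) - mean_i||^2), where mean_i is the atom's time-averaged position. Agreement between predicted and MD-derived RMSF is reported as the Pearson correlation coefficient (PCC). The plain-Python sketch below computes both. It assumes the frames are already superposed onto a common reference, which MD analysis tools normally do first. It is an illustration of the definitions, not DeepRMSF code.

```python
import math


def rmsf(trajectory):
    """Per-atom root-mean-square fluctuation.

    trajectory: list of frames; each frame lists one (x, y, z) tuple per atom.
    Frames are assumed to be already superposed on a common reference.
    """
    n_frames, n_atoms = len(trajectory), len(trajectory[0])
    values = []
    for a in range(n_atoms):
        mean = [sum(frame[a][c] for frame in trajectory) / n_frames for c in range(3)]
        msd = sum(
            sum((frame[a][c] - mean[c]) ** 2 for c in range(3)) for frame in trajectory
        ) / n_frames
        values.append(math.sqrt(msd))
    return values


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Two atoms over two frames: atom 0 is fixed, atom 1 oscillates by +/-1 along x.
traj = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
        [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]]
print(rmsf(traj))  # [0.0, 1.0]
```

For scale, the abstract's figures imply that the MD reference for one ~75-nt RNA takes more than about 8.2 s x 3000, or roughly 24,600 s (about 6.8 hours).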

Citations: 0
GPCRact: a hierarchical framework for predicting ligand-induced GPCR activity via allosteric communication modeling.
IF 7.7 | Zone 2, Biology | Q1 BIOCHEMICAL RESEARCH METHODS | Vol. 27, Issue 1 | Pub Date: 2026-01-07 | DOI: 10.1093/bib/bbaf719
Hyojin Son, Gwan-Su Yi

Accurate prediction of ligand-induced activity for G-protein-coupled receptors (GPCRs) is a cornerstone of drug discovery, yet it is challenged by the need to model allosteric communication, the long-range signaling that links ligand binding to distal conformational changes. Prevailing sequence-based models often fail to capture these three-dimensional dynamics, a limitation frequently masked by averaged performance on simpler Class A targets. To address this, we introduce GPCRact, a framework that models the biophysical principles of allosteric modulation in GPCR activation. It first constructs a high-resolution, three-dimensional, structure-aware graph from the heavy-atom coordinates of functionally critical residues at binding and allosteric sites. A dual-attention architecture then captures the activation process: cross-attention encodes the initial ligand-protein interaction at the binding site, whereas self-attention learns the subsequent intra-protein signal propagation. This hierarchical architecture is built on an E(n)-equivariant graph neural network (EGNN) that explicitly models the conformational consequences of ligand binding, and it is further refined with a tailored loss function and inference logic to mitigate error propagation. Supported by GPCRactDB, a comprehensive database we constructed for this study, GPCRact not only achieves state-of-the-art performance but also shows consistently superior accuracy on a curated benchmark of allosterically complex receptors on which existing models systematically underperform. Crucially, analysis of the learned attention weights confirms that the model identifies biologically validated allosteric pathways, a significant step toward resolving the black-box nature of previous methods. GPCRact thus provides a more accurate, interpretable, and mechanistically grounded solution to a long-standing challenge, paving the way for effective structure-guided drug discovery.
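
GPCRact is built on an E(n)-equivariant graph neural network (EGNN). In the EGNN update rule of Satorras et al. (2021), messages are computed from node features and squared inter-node distances. Each coordinate is then moved along its relative position vectors, so rotating or translating the input rotates or translates the output in the same way. The sketch below implements one such layer in plain Python. The fixed functions `phi_e`, `phi_x`, and `phi_h` are hypothetical stand-ins for the learned MLPs, and coordinate shifts are normalized by node degree as a simplifying choice. It is not GPCRact's code.

```python
import math


# Stand-ins for the learned MLPs of an EGNN layer (hypothetical, for illustration).
def phi_e(h_i, h_j, dist2):
    """Edge message from invariant inputs only: features and squared distance."""
    return math.tanh(h_i + h_j - dist2)


def phi_x(message):
    """Scalar weight for moving a coordinate along a relative position vector."""
    return 0.1 * message


def phi_h(h_i, aggregated):
    """Node feature update from the summed incoming messages."""
    return h_i + aggregated


def egnn_layer(h, x, neighbors):
    """One E(n)-equivariant update.

    h: scalar feature per node; x: (x, y, z) coordinate per node;
    neighbors[i]: indices of the nodes connected to node i.
    """
    new_h, new_x = [], []
    for i in range(len(h)):
        shift = [0.0, 0.0, 0.0]
        aggregated = 0.0
        for j in neighbors[i]:
            diff = [x[i][c] - x[j][c] for c in range(3)]
            message = phi_e(h[i], h[j], sum(d * d for d in diff))
            aggregated += message
            weight = phi_x(message)
            for c in range(3):
                shift[c] += diff[c] * weight
        norm = 1.0 / max(len(neighbors[i]), 1)  # average the shift over neighbors
        new_x.append(tuple(x[i][c] + norm * shift[c] for c in range(3)))
        new_h.append(phi_h(h[i], aggregated))
    return new_h, new_x


def rotate_z90(p):
    """Rotate a point 90 degrees about the z axis."""
    return (-p[1], p[0], p[2])


h = [0.5, -0.2, 0.1]
x = [(0.0, 0.0, 0.0), (1.0, 0.5, 0.0), (0.3, -1.0, 2.0)]
neighbors = [[1, 2], [0, 2], [0, 1]]

h_out, x_out = egnn_layer(h, x, neighbors)
h_rot, x_rot = egnn_layer(h, [rotate_z90(p) for p in x], neighbors)
# Features are invariant and coordinates co-rotate with the input.
print(all(abs(a - b) < 1e-9 for a, b in zip(h_out, h_rot)))  # True
print(all(abs(u - v) < 1e-9
          for p, q in zip(x_out, x_rot) for u, v in zip(rotate_z90(p), q)))  # True
```

Equivariance holds for any choice of the three `phi` functions because they only see rotation- and translation-invariant inputs (features and distances). This is what lets such a network reason about 3D conformational changes without learning every orientation separately.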

Citations: 0
Benchmarking knowledge graph embedding models for the prediction of oligogenic combinations.
IF 7.7 | Zone 2, Biology | Q1 BIOCHEMICAL RESEARCH METHODS | Vol. 27, Issue 1 | Pub Date: 2026-01-07 | DOI: 10.1093/bib/bbaf712
Inas Bosch, Barbara Gravel, Alexandre Renaux, Ann Nowé, Maris Laan, Tom Lenaerts

Identifying the potential oligogenic causes of rare diseases remains a challenge, notwithstanding the advances made in the last decade. While a variety of predictive and ranking approaches have been proposed, their precision remains limited: only a small number of high-quality training cases are available, and it is difficult to know which features are most relevant for designing new predictors. We hypothesize that structured biological information, integrating various relevant biological networks and ontologies into a single heterogeneous knowledge graph, can make a difference because it allows a relevant genetic representation to be learned through knowledge graph embedding (KGE) methods. We perform an exhaustive benchmark assessing various state-of-the-art embedding models on the task of identifying potentially pathogenic gene pairs. The results show that these KGE models provide highly accurate predictions, reaching an area under the precision-recall curve of up to 0.93, a significant advance over previous approaches for predicting gene pairs involved in oligogenic diseases. We nonetheless show that care must be taken in cross-validation when using embeddings, as data leakage between folds in embedding space yields overly optimistic results. Further evaluation on a holdout set and on a group of new male infertility cases shows that three translational distance models (TransE, MurE, and RotatE) and two semantic matching models (DistMult and QuatE) give the best results. The analysis concludes by comparing all known gene combinations across these top-ranking models, examining their similarities and differences. Overall, KGE methods provide a predictive advance, but further work is needed to explain why the predicted pairs are relevant to oligogenic diseases.
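
The models the benchmark singles out differ mainly in their scoring function for a (head, relation, tail) triple. The sketch below is written from the standard published definitions, not from this study's code. TransE scores -||h + r - t||, DistMult scores the trilinear product of h, r, and t, and RotatE treats embeddings as complex vectors whose relation rotates each dimension by a phase. The last helper illustrates one simple way, assumed here rather than taken from the paper, to avoid the cross-validation leakage the abstract warns about: remove each held-out gene pair's label edges from the graph before training that fold's embeddings. The relation name `oligogenic_with` and all embedding values are hypothetical.

```python
import cmath
import math


def transe_score(h, r, t):
    """TransE: negative Euclidean distance between h + r and t (higher = more plausible)."""
    return -math.sqrt(sum((hk + rk - tk) ** 2 for hk, rk, tk in zip(h, r, t)))


def distmult_score(h, r, t):
    """DistMult: trilinear product of h, r, t; note it is symmetric in h and t."""
    return sum(hk * rk * tk for hk, rk, tk in zip(h, r, t))


def rotate_score(h, phases, t):
    """RotatE: h and t are complex vectors; the relation rotates each dimension by a phase."""
    return -math.sqrt(sum(abs(hk * cmath.exp(1j * th) - tk) ** 2
                          for hk, th, tk in zip(h, phases, t)))


def fold_training_triples(triples, test_pairs, relation="oligogenic_with"):
    """Drop every edge of the (hypothetical) label relation that links a held-out
    gene pair, in either direction, before this fold's embeddings are trained."""
    held_out = {frozenset(pair) for pair in test_pairs}
    return [(h, r, t) for h, r, t in triples
            if not (r == relation and frozenset((h, t)) in held_out)]


print(distmult_score([1, 2], [3, 4], [5, 6]))  # 63
triples = [("G1", "oligogenic_with", "G2"), ("G1", "interacts_with", "G2"),
           ("G3", "oligogenic_with", "G4")]
print(fold_training_triples(triples, [("G2", "G1")]))
# [('G1', 'interacts_with', 'G2'), ('G3', 'oligogenic_with', 'G4')]
```

If the embeddings were instead trained once on the full graph, the positions of held-out genes would already encode their test-pair edges. This is the kind of fold-to-fold leakage that inflates cross-validated scores.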

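The knowledge graph embedding benchmark above lists TransE among the best-performing translational distance models. To show what "translational distance" means, here is a minimal pure-Python sketch of the standard TransE score (Bordes et al., 2013), the negative distance -||h + r - t||, used to rank candidate partner genes for a query gene. The gene names, the 2-D entity vectors, and the relation vector are hypothetical toy values. Real models learn much higher-dimensional embeddings from the heterogeneous knowledge graph, and this is not the benchmark's training or evaluation code.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def transe_score(head: Vector, relation: Vector, tail: Vector, norm: int = 2) -> float:
    """TransE plausibility of (head, relation, tail): -||head + relation - tail||.

    Scores closer to zero mean the relation vector "translates" head onto tail,
    i.e. the triple is more plausible under the model.
    """
    diffs = [h + r - t for h, r, t in zip(head, relation, tail)]
    if norm == 1:
        return -sum(abs(d) for d in diffs)
    return -math.sqrt(sum(d * d for d in diffs))


def rank_partners(query: str, relation: Vector,
                  entities: Dict[str, Vector]) -> List[Tuple[str, float]]:
    """Score every other entity as the tail of (query, relation, ?), best first."""
    scored = [(name, transe_score(entities[query], relation, vec))
              for name, vec in entities.items() if name != query]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy embeddings; a real KGE model learns these from the graph.
entities = {
    "GENE_A": [0.0, 0.0],
    "GENE_B": [0.9, 0.6],
    "GENE_C": [-2.0, 1.0],
}
# Hypothetical relation vector standing in for "forms a pathogenic pair with".
pairs_with = [1.0, 0.5]

for gene, score in rank_partners("GENE_A", pairs_with, entities):
    print(f"{gene}: {score:.3f}")
```

Passing `norm=1` gives the L1 variant. Other models in the benchmark replace the additive translation with different operations, such as rotations in complex space (RotatE) or quaternion products (QuatE), so their scores are computed differently.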
Citations: 0
Artificial intelligence in mitotic checkpoint modeling: transforming our understanding of cellular division through machine learning and predictive biology.
IF 7.7, Zone 2 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS, Pub Date: 2026-01-07, DOI: 10.1093/bib/bbaf729
Bashar Ibrahim

Mitotic checkpoints safeguard genomic integrity by orchestrating the precise segregation of chromosomes during cell division. Yet their complex, nonlinear dynamics have long resisted full understanding through traditional experimental and computational approaches. In recent years, artificial intelligence (AI) has begun to transform this landscape. Machine learning and deep learning methods now predict cellular behaviors with substantial accuracy and help uncover novel regulatory mechanisms within checkpoint networks. Reported advances include transformer architectures that predict spindle assembly checkpoint engagement with >95% accuracy, graph neural networks that decode kinetochore-microtubule dynamics at subpixel resolution, and hybrid AI-mechanistic models that reveal previously hidden feedback circuits. By integrating multi-omics data and bridging molecular mechanisms with clinical applications, AI-driven approaches are opening significant opportunities for precision medicine in cancer and other proliferative diseases. This review synthesizes emerging computational frameworks, highlights transformative AI-driven discoveries, and proposes a roadmap for developing predictive, personalized models of mitotic checkpoint control, charting a path from computational insight to clinical impact.

Citations: 0
CNAttention: an attention-based deep multiple-instance method for uncovering copy number aberration signatures across cancers.
IF 7.7, Zone 2 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS, Pub Date: 2026-01-07, DOI: 10.1093/bib/bbaf696
Ziying Yang, Michael Baudis

Somatic copy number aberrations (CNAs) represent a distinct class of genomic mutations associated with oncogenic effects. Over the past three decades, large volumes of CNA data have been generated through molecular-cytogenetic and genome sequencing-based techniques. These data have been pivotal in identifying cancer-related genes and in advancing research on the relationship between CNAs and histopathologically defined cancer types. However, comprehensive studies of CNA landscapes and disease parameters are challenging because of the vast diagnostic and genomic heterogeneity encountered in "pan-cancer" approaches. In this study, we introduce CNAttention, an attention-based deep multiple-instance learning method designed to analyze CNAs comprehensively across cancers and to uncover specific CNA patterns within integrated gene-level CNA profiles of 30 cancer types. Using attention mechanisms, CNAttention learns CNA features unique to each cancer type and generates a CNA signature for each of the 30 types, highlighting the distinctiveness of their CNA landscapes. CNAttention achieves high accuracy and remains stable when external datasets are incorporated or parameters are adjusted, underscoring its effectiveness in tumor identification. Extending these signatures to cancer classification trees reveals common patterns not only among physiologically related cancer types but also among clinico-pathologically distant types, such as different cancers originating from neural crest-derived cells. The detected signatures also uncover genomic heterogeneity within individual cancer types, for instance brain lower-grade glioma. Further experiments with classification models underscore how well these signatures represent the various cancer types and their potential utility in clinical diagnosis.
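The abstract describes CNAttention as attention-based deep multiple-instance learning but does not spell out its architecture. As a hedged illustration of the general technique, here is a minimal pure-Python sketch of attention-based MIL pooling in the style of Ilse et al. (2018): each instance in a bag receives a learned attention weight, and the bag is summarized as the weighted sum of its instances. The bag encoding (one sample as a bag of per-gene [gain, loss] indicators), the gene names, and the parameters `V` and `w` are all hypothetical assumptions, not CNAttention's actual inputs or weights.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> List[float]:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax (shift by the maximum before exponentiating)."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_mil_pool(instances: Sequence[Sequence[float]], V: Matrix,
                       w: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Attention-based MIL pooling in the style of Ilse et al. (2018).

    Each instance h_k gets a logit w . tanh(V h_k); a softmax over instances gives
    attention weights a_k, and the bag embedding is z = sum_k a_k * h_k.
    Returns (z, weights); the weights show which instances drove the bag.
    """
    logits = [sum(wi * math.tanh(x) for wi, x in zip(w, matvec(V, h)))
              for h in instances]
    weights = softmax(logits)
    dim = len(instances[0])
    bag = [sum(a * h[d] for a, h in zip(weights, instances)) for d in range(dim)]
    return bag, weights


# Hypothetical bag: one tumour sample whose instances are per-gene CNA features
# encoded as [gain, loss] indicators (an assumed encoding, not CNAttention's).
genes = ["GENE_X", "GENE_Y", "GENE_Z"]
instances = [[1.0, 0.0],   # GENE_X gained
             [0.0, 1.0],   # GENE_Y lost
             [0.0, 0.0]]   # GENE_Z copy-neutral
# Hypothetical parameters; in a trained model V and w are learned end to end.
V = [[1.0, 0.0], [0.0, 1.0]]
w = [2.0, 0.0]

bag, weights = attention_mil_pool(instances, V, w)
for gene, a in zip(genes, weights):
    print(f"{gene}: attention {a:.3f}")
print("bag embedding:", [round(z, 3) for z in bag])
```

In a full model, the bag embedding `z` would feed a classifier that predicts the cancer type. Averaging attention weights over the samples of one type is one way such weights could be turned into a per-type signature; whether CNAttention aggregates them this way is an assumption here.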

Citations: 0
CoxMDS: multiple data splitting for high-dimensional mediation analysis with survival outcomes in epigenome-wide studies.
IF 7.7, Zone 2 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS, Pub Date: 2026-01-07, DOI: 10.1093/bib/bbaf730
Minhao Yao, Peixin Tian, Xihao Li, Shijia Bian, Gao Wang, Yian Gu, Ana Navas-Acien, Badri N Vardarajan, Daniel W Belsky, Gary W Miller, Andrea A Baccarelli, Zhonghua Liu

Causal mediation analysis investigates whether the effect of an exposure on an outcome operates through intermediate variables known as mediators. Although progress has been made in high-dimensional mediation analysis, current methods do not reliably control the false discovery rate (FDR) in finite samples, especially when mediators are moderately to highly correlated or follow non-Gaussian distributions. Both conditions frequently arise in DNA methylation studies. We introduce CoxMDS, a multiple data splitting method that uses Cox proportional hazards models to identify putative causal mediators for survival outcomes. CoxMDS ensures finite-sample FDR control even in the presence of correlated or non-Gaussian mediators. In simulations, CoxMDS maintains FDR control and achieves higher statistical power than existing approaches. In applications to DNA methylation data with survival outcomes, CoxMDS identified eight CpG sites in The Cancer Genome Atlas that are consistent with the hypothesis that DNA methylation may mediate the effect of smoking on lung cancer survival. In the Alzheimer's Disease Neuroimaging Initiative, it identified two CpG sites consistent with the hypothesis that DNA methylation may mediate the effect of smoking on time to conversion to Alzheimer's disease.
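The abstract names multiple data splitting but does not give CoxMDS's statistics. As a hedged illustration, here is a minimal pure-Python sketch of the generic multiple data splitting (MDS) procedure for FDR control described by Dai et al., which we assume CoxMDS adapts. For each random split into two halves, a mirror statistic compares the two half-sample estimates, and a threshold is chosen so that the estimated false discovery proportion stays below `q`. Selections from all splits are then combined through inclusion rates. The sketch takes per-half effect estimates as given (the numbers are hypothetical). It does not fit Cox models, combine the exposure-mediator and mediator-outcome paths, or reproduce CoxMDS itself.

```python
from typing import List, Sequence, Set


def mirror_statistics(beta1: Sequence[float], beta2: Sequence[float]) -> List[float]:
    """M_j = sign(b1_j * b2_j) * (|b1_j| + |b2_j|).

    beta1 and beta2 are effect estimates for the same candidates fitted on the two
    halves of one random split; agreement in sign gives a large positive M_j.
    """
    stats = []
    for b1, b2 in zip(beta1, beta2):
        product = b1 * b2
        sign = 1.0 if product > 0 else -1.0 if product < 0 else 0.0
        stats.append(sign * (abs(b1) + abs(b2)))
    return stats


def select_single_split(mirror: Sequence[float], q: float) -> Set[int]:
    """Keep {j : M_j > t} for the smallest t with #{M_j < -t} / max(#{M_j > t}, 1) <= q.

    Null candidates are assumed to have mirror statistics symmetric about zero, so
    the count of large negative M_j estimates the false positives among large positive ones.
    """
    for t in sorted(abs(m) for m in mirror if m != 0):
        est_false = sum(1 for m in mirror if m < -t)
        kept = sum(1 for m in mirror if m > t)
        if est_false / max(kept, 1) <= q:
            return {j for j, m in enumerate(mirror) if m > t}
    return set()


def multiple_split_select(selections: Sequence[Set[int]], p: int, q: float) -> Set[int]:
    """Aggregate per-split selections through inclusion rates.

    I_j averages 1{j selected} / |selected set| over splits; after sorting the rates,
    the largest prefix whose sum stays <= q is treated as null and dropped.
    """
    rates = [0.0] * p
    for chosen in selections:
        for j in chosen:
            rates[j] += 1.0 / (len(chosen) * len(selections))
    cumulative, cutoff = 0.0, 0.0
    for r in sorted(rates):
        if cumulative + r > q:
            break
        cumulative += r
        cutoff = r
    return {j for j, r in enumerate(rates) if r > cutoff}


# Hypothetical per-half estimates for 4 candidate CpG mediators on two random splits.
splits = [
    ([0.9, 0.8, 0.05, -0.1], [1.1, 0.7, -0.04, 0.12]),
    ([1.0, 0.3, 0.2, 0.0], [0.9, -0.2, 0.1, 0.3]),
]
q = 0.1
per_split = [select_single_split(mirror_statistics(b1, b2), q) for b1, b2 in splits]
print("per-split selections:", per_split)
print("aggregated selection:", multiple_split_select(per_split, p=4, q=q))
```

Averaging over many random splits, instead of relying on one, is what makes the final selection less sensitive to how any single split happened to fall.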

Citations: 0