Briefings in Bioinformatics: latest articles, page 6

Re: Qi et al. "A roadmap for T cell receptor-peptide-MHC binding prediction by machine learning: glimpse and foresight" (Briefings in Bioinformatics, 2025).

IF 7.7, Zone 2 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS

Briefings in bioinformatics

Pub Date : 2026-01-07 DOI: 10.1093/bib/bbag032

Cedric Ly, Stefan Bonn, Immo Prinz

Citations: 0

Multi-seed searching algorithm for integrated codon optimization of mRNA stability and translational efficiency in vaccine design.

IF 7.7, Zone 2 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS

Briefings in bioinformatics

Pub Date : 2026-01-07 DOI: 10.1093/bib/bbag047

Yuhan Bo, Bingxin Liu, Shengyu Huang, Yanwei Liu, Libin Deng, Dake Zhang, Jing Zhang

Messenger RNA (mRNA) vaccines have revolutionized vaccinology with their rapid development cycles and adaptability, yet their broad application is constrained by unresolved challenges in balancing mRNA structural stability and translational efficiency. Here, we introduce a groundbreaking multi-seed searching algorithm for mRNA codon optimization, an innovative framework that synergistically co-optimizes minimum free energy and codon adaptation index through adaptive integration of simulated annealing and genetic algorithms. This novel approach enhances global search capability to escape local optima, a critical limitation of existing tools. Evaluations across long therapeutic mRNA sequences and short peptides (neoantigens from bladder cancer and melanoma) reveal that our algorithm outperforms the state-of-the-art LinearDesign, delivering superior balanced improvements in both stability and translational efficiency and validating its unique ability to navigate the inherent trade-offs between these two key metrics. Built on this algorithm, the Optiseed platform introduces transformative features including customizable scoring functions, flexible parameters for tailored optimization, and support for integrating untranslated regions (UTRs), poly(A) tails, and other elements to enable end-to-end vaccine construct design. This innovation addresses the rigidity of conventional tools, empowering precise, context-specific optimization. Optiseed represents a robust, scalable solution for mRNA vaccine codon optimization. Its superior performance across diverse sequences underscores its potential to accelerate mRNA-based therapeutic development, particularly in personalized cancer immunotherapy, while offering a framework adaptable for other applications such as infectious disease vaccine design.
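To make the idea of multi-seed search over synonymous codons concrete, here is a minimal sketch, not the Optiseed implementation. It runs simulated annealing from several random seeds and keeps the best result. Several things are assumptions for illustration only: the codon weights in `CODON_WEIGHTS` are made up, GC fraction stands in for a real minimum-free-energy calculation (which would need an RNA folding tool), the mixing parameter `alpha` is hypothetical, and the genetic-algorithm component mentioned in the abstract is omitted.

```python
import math
import random

# Toy synonymous-codon table with made-up relative adaptiveness values (assumption).
CODON_WEIGHTS = {
    "M": {"AUG": 1.0},
    "K": {"AAG": 1.0, "AAA": 0.7},
    "L": {"CUG": 1.0, "CUC": 0.5, "UUA": 0.2},
    "S": {"AGC": 1.0, "UCC": 0.9, "UCG": 0.2},
}

def cai(protein, codons):
    """Codon adaptation index: geometric mean of per-codon relative adaptiveness."""
    logs = [math.log(CODON_WEIGHTS[aa][c]) for aa, c in zip(protein, codons)]
    return math.exp(sum(logs) / len(logs))

def gc_fraction(codons):
    """Crude stability proxy (assumption): fraction of G/C bases, NOT a free energy."""
    seq = "".join(codons)
    return sum(base in "GC" for base in seq) / len(seq)

def score(protein, codons, alpha=0.5):
    """Weighted sum of CAI and the stability proxy; higher is better."""
    return alpha * cai(protein, codons) + (1 - alpha) * gc_fraction(codons)

def anneal(protein, seed, steps=2000, t0=1.0, cooling=0.995):
    """Simulated annealing over synonymous codon choices from one random start."""
    rng = random.Random(seed)
    codons = [rng.choice(sorted(CODON_WEIGHTS[aa])) for aa in protein]
    current = score(protein, codons)
    best, best_score = list(codons), current
    temp = t0
    for _ in range(steps):
        i = rng.randrange(len(protein))
        old = codons[i]
        codons[i] = rng.choice(sorted(CODON_WEIGHTS[protein[i]]))
        new = score(protein, codons)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if new >= current or rng.random() < math.exp((new - current) / temp):
            current = new
            if new > best_score:
                best, best_score = list(codons), new
        else:
            codons[i] = old  # reject the move
        temp *= cooling
    return best, best_score

def multi_seed(protein, seeds=(0, 1, 2, 3)):
    """Run one annealing chain per seed and return the best (codons, score)."""
    return max((anneal(protein, s) for s in seeds), key=lambda result: result[1])

best_codons, best_score = multi_seed("MKLSKL")
print("".join(best_codons), round(best_score, 3))
```

Running independent chains from different seeds is the simplest way to reduce the chance that a single chain stalls in a local optimum; swapping the proxy for a folding-based energy would change only `score`.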

Vol. 27(1). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12885097/pdf/

Citations: 0

NanoPrePro: a fully equipped, fast, and memory-efficient preprocessor for nanopore transcriptomic sequencing.

IF 7.7, Zone 2 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS

Briefings in bioinformatics

Pub Date : 2026-01-07 DOI: 10.1093/bib/bbag063

Chia-Chen Chu, Jhong-He Yu, Shang-Che Kuo, Fan-Wei Yang, Chia-Chang Lin, Chang-Hung Chen, Yi-Chen Wu, Cing Shih, Ying-Hsuan Sun, Te-Lun Mai, Ying-Lan Chen, Hsin-Hung Lin, Jung-Chen Su, Ying-Chung Jimmy Lin

NanoPrePro is a streamlined read preprocessor designed to identify full-length reads from Oxford Nanopore Technologies (ONT) transcriptomic sequencing results with high precision, which it achieves through precise identification of adapters/primers. The preprocessing of ONT reads has long been a neglected and ambiguous area, lacking thorough and systematic investigation. Here, we developed NanoPrePro, which outperformed the current best preprocessor, Pychopper, on simulated and real datasets. Using sequence similarity, adapter/primer location, and adapter/primer length, NanoPrePro applies a self-optimizing function to extract the best parameters for each sequencing file so that users can customize their analyses. Furthermore, NanoPrePro runs 38 times faster with lower memory cost. NanoPrePro can be regarded as a state-of-the-art preprocessor with forward adaptability to ONT sequencing.
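As a rough illustration of calling a read full-length from adapter/primer similarity and location, the sketch below searches the two ends of a read for approximate adapter matches using edit distance. It is not NanoPrePro's algorithm. The adapter sequences, the 40-base window, and the 20% error threshold are all hypothetical values chosen for the example.

```python
def best_adapter_hit(window, adapter):
    """Smallest edit distance between `adapter` and any substring of `window`.

    Returns (distance, end_position). The first DP row is all zeros so the
    match may start anywhere in the window (semi-global alignment).
    """
    prev = [0] * (len(window) + 1)
    for i, a in enumerate(adapter, 1):
        cur = [i] + [0] * len(window)
        for j, w in enumerate(window, 1):
            cur[j] = min(prev[j] + 1,              # adapter base unmatched
                         cur[j - 1] + 1,           # extra base in the read
                         prev[j - 1] + (a != w))   # match or mismatch
        prev = cur
    end = min(range(len(window) + 1), key=prev.__getitem__)
    return prev[end], end

def is_full_length(read, adapter5, adapter3, window=40, max_error_rate=0.2):
    """Call a read full-length if both adapters are found near their expected ends.

    `adapter3` is given as it appears in the read orientation (assumption).
    """
    d5, _ = best_adapter_hit(read[:window], adapter5)
    d3, _ = best_adapter_hit(read[-window:], adapter3)
    return (d5 <= max_error_rate * len(adapter5)
            and d3 <= max_error_rate * len(adapter3))

# Hypothetical adapters and reads, for illustration only.
ADAPTER5 = "ACGTACGTAC"
ADAPTER3 = "TTGGCCAATT"
full_read = "GG" + "ACGTACCTAC" + "A" * 30 + ADAPTER3 + "C"   # one mismatch at the 5' end
truncated = "GG" + ADAPTER5 + "A" * 30                         # 3' adapter missing

print(is_full_length(full_read, ADAPTER5, ADAPTER3))
print(is_full_length(truncated, ADAPTER5, ADAPTER3))
```

The window size and error rate are exactly the kind of parameters the abstract describes as being tuned per sequencing file; here they are fixed constants.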

Vol. 27(1). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12903951/pdf/

Citations: 0

Identification of cancer mini-drivers by deciphering selective landscape in the cancer genome.

IF 7.7, Zone 2 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS

Briefings in bioinformatics

Pub Date : 2026-01-07 DOI: 10.1093/bib/bbaf694

Xunuo Zhu, Wenyi Zhao, Siqi Wang, Jingwen Yang, Jingqi Zhou, Binbin Zhou, Ji Cao, Bo Yang, Zhan Zhou, Xun Gu

Cancer development is driven by somatic evolution and clonal selection. However, traditional selective pressure analysis methods treat all sites within a gene equally; such a gene-level model oversimplifies the complexity of cancer evolution. In this study, we introduced CN/CS-calculator, a novel site-specific method that can capture selective pressures acting across different gene sites. By deciphering the interplay between the selection pattern and the function of a gene in oncogenesis, CN/CS-calculator uncovers a unique class of mini-driver genes, which exhibit weak positive selection, with certain critical sites providing context-dependent promoter effects on the fitness of cancer subclones while others are constrained by evolutionary conservation. Our method emphasizes the importance of site-specific analysis in uncovering how subtle evolutionary forces shape cancer biology. The refined understanding offers new insights into the mechanisms of cancer heterogeneity and molecular evolution, with potential implications for advancing therapeutic strategies and prognostic assessments.

Vol. 27(1). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12784965/pdf/

Citations: 0

Enhancing TFEA.ChIP with ENCODE regulatory maps for generalizable transcription factor enrichment.

IF 7.7, Zone 2 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS

Briefings in bioinformatics

Pub Date : 2026-01-07 DOI: 10.1093/bib/bbaf715

Yosra Berrouayel, Luis Del Peso

Identifying transcription factors (TFs) responsible for gene expression changes remains a central challenge in functional genomics. TFEA.ChIP is a ChIP-seq-based TF enrichment analysis tool that addresses this by linking TF binding profiles to differentially expressed genes through experimentally supported cis-regulatory element (CRE)-gene associations. Unlike motif- or heuristic-based approaches, TFEA.ChIP adopts a biologically grounded strategy by intersecting TF binding data from ReMap2022 with regulatory maps from ENCODE's rE2G and CREdb. To overcome the high context-specificity of rE2G associations, we developed filtering strategies based on confidence scores and recurrence across biosamples. Benchmarking on 342 curated gene sets from the Molecular Signatures Database C2 CGP collection showed that recurrence-based filtering significantly improved accuracy, outperforming the original GeneHancer-based implementation and leading tools including BARTv2.0, Lisa, ChEA3, and HOMER. A case study on hypoxia further validated the method, demonstrating accurate and pathway-specific enrichment of hypoxia-inducible factor-related TFs using both overrepresentation analysis and gene set enrichment analysis. Additionally, the updated implementation of TFEA.ChIP in R/Bioconductor introduces several user-friendly features, including automated analysis workflows and expression-based filtering of candidate TFs. These additions streamline the integration of TFEA.ChIP into standard RNA-seq analysis pipelines, enabling more efficient and reproducible workflows. Together with its strong benchmarking performance and biologically grounded framework, the updated tool provides a robust and accessible solution for inferring transcriptional regulators from gene expression data.
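Overrepresentation analysis, one of the two enrichment modes mentioned above, can be illustrated with a one-sided hypergeometric test. The sketch below is a generic textbook version in Python, not the TFEA.ChIP implementation (which is an R/Bioconductor package whose exact statistics are not given in the abstract). The function names and the toy gene and target sets are hypothetical.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

def tf_overrepresentation(de_genes, background, tf_targets):
    """Rank TFs by how strongly their targets are overrepresented among DE genes.

    de_genes:   differentially expressed genes
    background: all genes tested
    tf_targets: dict mapping TF name -> genes bound by that TF (toy data here)
    """
    background = set(background)
    de = set(de_genes) & background
    results = {}
    for tf, targets in tf_targets.items():
        targets = set(targets) & background
        overlap = len(de & targets)
        results[tf] = hypergeom_sf(overlap, len(background), len(targets), len(de))
    return sorted(results.items(), key=lambda item: item[1])

# Toy example: 20 background genes, 5 of them differentially expressed.
background = [f"g{i}" for i in range(20)]
de_genes = background[:5]
tf_targets = {
    "TF_A": background[:6],     # covers every DE gene
    "TF_B": background[10:16],  # covers none
}
for tf, p in tf_overrepresentation(de_genes, background, tf_targets):
    print(tf, p)
```

In practice the p-values would also be corrected for multiple testing across all TFs, a step left out here for brevity.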

Vol. 27(1). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12796816/pdf/

Citations: 0

Adaptive multi-view information bottleneck for multi-omics data clustering.

IF 7.7, Zone 2 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS

Briefings in bioinformatics

Pub Date : 2026-01-07 DOI: 10.1093/bib/bbaf717

Zhen Tian, Xiaojiao Wei, Zhengzheng Lou, Zhixia Teng, Shouli Fu

Motivation: Recent advances in single-cell sequencing have transformed precise measurement of gene expression at cellular resolution, enabling unprecedented dissection of cellular heterogeneity and intricate biological processes. The accumulation of multi-omics data offers new avenues for cell clustering, a critical foundation for cell-type identification and downstream analyses. However, substantial challenges persist in simultaneously integrating the complementary information in multi-omics data effectively and allocating appropriate weights to each omics layer.

Results: Here, we propose an Adaptive Multi-View clustering framework with the Information Bottleneck principle to solve the multi-omics data clustering task (named scAMVIB). The proposed model learns multi-view omics representations that capture both inter-omics associations and omics-specific patterns, with adaptive weight allocation. Specifically, multi-view data comprise two components: (i) the integrated omics feature matrix derived from the similarity network fusion strategy and (ii) omics-specific representations from distinct platforms. These inputs are processed through a multi-view information bottleneck clustering framework that leverages cross-view complementarity to enhance representations. View weights are adaptively assigned via maximum entropy regularization, proportional to their information content. The final cell partitions are obtained through sequential iterative optimization. Comprehensive experiments across multiple datasets demonstrate that scAMVIB is strongly competitive in clustering while maintaining biological interpretability.
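One common way to obtain adaptive view weights from an entropy regularizer is sketched below. It is a generic formulation and may differ from the scAMVIB objective. It assumes each view v has a scalar loss L_v and minimizes sum_v w_v * L_v + lam * sum_v w_v * log(w_v) subject to the weights summing to 1. The closed-form solution is a softmax over negative losses, w_v proportional to exp(-L_v / lam). The names `view_losses` and `lam` are hypothetical.

```python
import math

def entropy_regularized_weights(view_losses, lam=1.0):
    """Closed-form minimizer of sum(w*L) + lam*sum(w*log w) over the simplex.

    Lower-loss (more informative) views receive larger weights; a large `lam`
    pushes the weights toward uniform, a small `lam` toward the single best view.
    """
    shift = min(view_losses)  # subtract the minimum for numerical stability
    unnormalized = [math.exp(-(loss - shift) / lam) for loss in view_losses]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# Three hypothetical views with increasing loss.
print(entropy_regularized_weights([1.0, 2.0, 3.0], lam=1.0))
print(entropy_regularized_weights([1.0, 2.0, 3.0], lam=100.0))
```

In an iterative scheme, these weights would be recomputed after each update of the cluster assignments, so the per-view losses and the weights are refined alternately.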

Vol. 27(1). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12796825/pdf/

Citations: 0

CoBRA: compound binding site prediction using RNA language model.

IF 7.7, Zone 2 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS

Briefings in bioinformatics

Pub Date : 2026-01-07 DOI: 10.1093/bib/bbaf713

Wonkyeong Jang, Woong-Hee Shin

RNA performs a variety of functions within cells and is implicated in various human diseases. Because druggable proteins occupy a small portion of the genome, interest in developing drugs that target RNAs has been growing. Thus, precise prediction of small-molecule binding sites across different classes of RNAs is important. In this study, a lightweight deep learning program for predicting RNA-drug binding sites, called compound binding site prediction for RNA (CoBRA), is introduced. Our approach utilizes residue-level embeddings derived from a pre-trained RNA language model, without relying on any structural information. These embeddings encapsulate the contextual and statistical properties of each nucleotide and are used as input for a multi-layer perceptron classifier that performs binary classification of binding nucleotides. The model was trained using the TR60 and HARIBOSS datasets and tested on four independent benchmark sets. CoBRA achieves a relative improvement of 22.1% in the Matthews correlation coefficient and a 45.6% increase in sensitivity compared to existing state-of-the-art RNA-ligand binding site prediction methods that utilize structural information. These results demonstrate that sequence-based language model embeddings, which do not require explicit coordinate or distance information, can match or outperform structure-based methods. This makes CoBRA a flexible tool for predicting binding sites across diverse RNA targets.
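For readers unfamiliar with the reported metrics, the sketch below computes sensitivity and the Matthews correlation coefficient (MCC) from per-nucleotide binary labels, and shows what a relative improvement means. The labels and the two MCC values passed to `relative_improvement` are made-up illustration data, not results from the paper.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, where 1 = binding nucleotide."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, tn, fn

def sensitivity(y_true, y_pred):
    """Fraction of true binding nucleotides that were predicted as binding."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0

def mcc(y_true, y_pred):
    """Matthews correlation coefficient; defined as 0.0 when the denominator is zero."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def relative_improvement(new, old):
    """Relative change of `new` over `old`, e.g. 0.22 means 22% better."""
    return (new - old) / old

# Made-up per-nucleotide labels for one short RNA.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(sensitivity(y_true, y_pred), mcc(y_true, y_pred))
print(relative_improvement(0.61, 0.50))
```

MCC is often preferred for binding-site prediction because binding nucleotides are a small minority, and accuracy alone would reward predicting "non-binding" everywhere.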

Vol. 27(1). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12790621/pdf/

Citations: 0

ProTCR: a protein language model-driven framework for decoding TCR-antigen recognition toward precision immunotherapies.

IF 7.7, Zone 2 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS

Briefings in bioinformatics

Pub Date : 2026-01-07 DOI: 10.1093/bib/bbaf716

Minrui Xu, Manman Lu, Peng Liu, Siwen Zhang, Lanming Chen, Qi Liu, Yong Lin, Lu Xie

The ability of T-cell receptors (TCRs) to recognize neoantigens is fundamental to the initiation and maintenance of adaptive immune responses. In TCR-based immunotherapies, elucidating the recognition patterns of TCRs for peptides and accurately identifying therapeutically relevant TCR-peptide pairs remain critical challenges. Here, we present a novel dual-pathway network model, ProTCR, which integrates the protein language model ProtT5 with deep learning methods. By incorporating both global and local feature extraction mechanisms, ProTCR enables efficient representation of amino acid sequences, thereby enhancing the model's generalizability across diverse data distributions and improving its biological interpretability. ProTCR demonstrates robust performance and broad applicability across various datasets, including neoantigens, previously unseen peptides, and MHC class II-restricted epitopes, overcoming the reliance on known peptide-TCR pairs observed in previous studies. It also offers new insights for predicting diverse classes of antigenic peptides. We applied ProTCR to several clinically relevant scenarios, including immunotherapeutic target identification in acute myeloid leukemia, neoantigen-targeted immunotherapy in solid tumours, and antigen-specific T-cell recognition against pathogens such as influenza and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Across these complex settings, ProTCR consistently maintained high accuracy and stability, demonstrating strong cross-task adaptability and broad potential for clinical application. This work not only provides a powerful tool for elucidating immune response mechanisms but also offers a solid computational foundation for the design of neoantigen- or TCR-based precision immunotherapy strategies.

Vol. 27(1). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12790622/pdf/

Citations: 0

Harnessing AI to fuse phenotypic signatures for drug target identification: progress in computational modeling.

IF 7.7, Zone 2 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS

Briefings in bioinformatics

Pub Date : 2026-01-07 DOI: 10.1093/bib/bbag045

Fengming Chen, Ranran Zhao, Xingxing Han, Huan Li, Zhishu Tang

Computational models integrating large-scale gene expression profiles provide a powerful approach for predicting multi-target drug-target interactions (DTIs). Unlike traditional experimental and computational methods that often require detailed structural or target-specific information, gene expression-based models leverage reference transcriptional signatures. This enables functional inference of interactions without explicit structural data, offering a valuable strategy in data-limited scenarios. By incorporating phenotypic information, these models bridge phenotype screening and target prediction, establishing a novel paradigm for target identification. This review introduces and compares current target identification methods, emphasizing the unique advantages of gene expression profiling in DTI prediction. We also outline major public databases and their applications. As effective hypothesis-generation tools, computational DTI models reduce experimental costs, enhance understanding of multi-target mechanisms, and accelerate drug discovery. We categorize and analyze three primary model types utilizing large-scale gene expression data: biological network-based, association-based, and multimodal integration approaches, discussing their respective strengths and limitations. Key challenges and future directions are also addressed, including data integration, algorithm optimization, and multi-omics fusion, to fully realize the potential of gene expression data in multi-target drug prediction. This review offers comprehensive guidance on advanced tools, databases, and methodologies, enabling novel research paths for unbiased multi-target exploration. By linking phenotype screening with computational analysis, this integrative approach is expected to advance precision medicine, especially in uncovering drug mechanisms in complex diseases.
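
The abstract names association-based models but does not say how they score a candidate target. One plausible reading, offered here only as an illustrative sketch, is to compare a drug-induced expression signature with the signatures produced by perturbing each candidate gene and to rank genes by similarity. The function names (`pearson`, `rank_targets`), the gene labels, and the toy values are all hypothetical and are not taken from the review.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector carries no ranking information
    return cov / sqrt(var_x * var_y)


def rank_targets(drug_signature, perturbation_signatures):
    """Rank candidate genes by how closely their perturbation signature
    resembles the drug signature (most similar first)."""
    scores = {
        gene: pearson(drug_signature, signature)
        for gene, signature in perturbation_signatures.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy differential-expression values over five genes (hypothetical data).
drug = [1.2, -0.8, 0.5, -1.5, 0.1]
perturbations = {
    "GENE_A": [1.0, -1.0, 0.4, -1.2, 0.0],   # mimics the drug
    "GENE_B": [-1.1, 0.9, -0.3, 1.4, 0.2],   # opposes the drug
    "GENE_C": [0.1, 0.2, -0.1, 0.0, 0.3],    # largely unrelated
}
ranking = rank_targets(drug, perturbations)
for gene, score in ranking:
    print(f"{gene}\t{score:+.3f}")
```

With these toy values the gene whose perturbation mimics the drug ranks first and the opposing one ranks last. Real pipelines typically use rank-based enrichment scores and many more genes; Pearson correlation is used here only to keep the sketch short.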


Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12885100/pdf/

Citations: 0

BiGvCL: bipartite graph-based cross-domain contrastive learning model for the predicting drug-gene interactions.

IF 7.7, Zone 2 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS

Briefings in bioinformatics

Pub Date : 2026-01-07 DOI: 10.1093/bib/bbaf710

Shida He, Zixu Wang, Jing Li, Quan Zou, Feng Zhang

Drug-gene interactions (DGIs) influence the toxicity or ineffectiveness of drug therapy and play an important role in elucidating drug mechanisms, predicting potential adverse effects, and facilitating precision medicine. Existing computational methods typically rely on chemical or genetic sequence features of drugs and genes, limiting their effectiveness for novel entities lacking explicit annotations. To address this, we propose BiGvCL, a framework that predicts DGIs exclusively based on network topology, requiring no explicit feature information for drugs or genes. BiGvCL introduces a lightweight graph attention mechanism (GATLite) to efficiently aggregate local neighborhood information. Additionally, we develop a gated graph convolutional network (GatedGCN) to explicitly learn high-order interactions between drugs and genes, further integrating contrastive learning to enhance the model's generalizability. Comprehensive experiments on DrugBank and DGIdb datasets show that BiGvCL achieves competitive performance across all metrics compared with representative baselines. Cross-domain evaluations on OGB datasets further confirm its adaptability to heterogeneous biomedical networks. Ablation and hyperparameter analyses highlight the key contributions of the contrastive and gated mechanisms, while case studies and molecular docking provide supporting evidence for the biological relevance of predictions. Collectively, while BiGvCL is constrained by its reliance on network topology and a transductive learning paradigm, it demonstrates the potential of topology-based approaches for discovering novel drug-gene interactions, which may inform drug repurposing and precision medicine efforts.
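
The abstract says contrastive learning is integrated but does not state the loss. As a hedged illustration only, the sketch below uses InfoNCE, a widely used contrastive objective, applied to two embedding views of the same nodes; it is an assumption that something similar applies here, and this is not BiGvCL's implementation. The names `cosine`, `info_nce`, the temperature value, and the toy embeddings are hypothetical.

```python
from math import exp, log, sqrt


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def info_nce(view_a, view_b, temperature=0.5):
    """Mean InfoNCE loss over nodes.

    Node i in view_a is pulled toward node i in view_b (the positive)
    and pushed away from every other node in view_b (the negatives).
    """
    if not view_a or len(view_a) != len(view_b):
        raise ValueError("views must be non-empty and the same length")
    total = 0.0
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = peak + log(sum(exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(view_a)


# Two toy views of three node embeddings (hypothetical values).
view_a = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
view_b = [[0.9, 0.1], [0.1, 0.9], [-0.9, 0.1]]
shuffled_b = [view_b[1], view_b[2], view_b[0]]  # positives misaligned

aligned_loss = info_nce(view_a, view_b)
shuffled_loss = info_nce(view_a, shuffled_b)
print(f"aligned: {aligned_loss:.3f}  shuffled: {shuffled_loss:.3f}")
```

The loss is lower when the two views agree node by node than when the positives are misaligned, which is the property a contrastive term contributes alongside the main interaction-prediction objective.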


Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12848949/pdf/

Citations: 0