{"title":"Re: Qi et al. \"A roadmap for T cell receptor-peptide-MHC binding prediction by machine learning: glimpse and foresight\" (Briefings in Bioinformatics, 2025).","authors":"Cedric Ly, Stefan Bonn, Immo Prinz","doi":"10.1093/bib/bbag032","DOIUrl":"10.1093/bib/bbag032","url":null,"abstract":"","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 1","pages":""},"PeriodicalIF":7.7,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12874877/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146123833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Messenger RNA (mRNA) vaccines have revolutionized vaccinology with their rapid development cycles and adaptability, yet their broad application is constrained by unresolved challenges in balancing mRNA structural stability and translational efficiency. Here, we introduce a groundbreaking multi-seed searching algorithm for mRNA codon optimization, an innovative framework that synergistically co-optimizes minimum free energy and codon adaptation index through adaptive integration of simulated annealing and genetic algorithms. This novel approach enhances global search capability to escape local optima, a critical limitation of existing tools. Evaluations across long therapeutic mRNA sequences and short peptides (neoantigens from bladder cancer and melanoma) reveal that our algorithm outperforms the state-of-the-art LinearDesign, delivering superior balanced improvements in both stability and translational efficiency and validating its unique ability to navigate the inherent trade-offs between these two key metrics. Built on this algorithm, the Optiseed platform introduces transformative features including customizable scoring functions, flexible parameters for tailored optimization, and support for integrating untranslated regions (UTRs), poly(A) tails, and other elements to enable end-to-end vaccine construct design. This innovation addresses the rigidity of conventional tools, empowering precise, context-specific optimization. Optiseed represents a robust, scalable solution for mRNA vaccine codon optimization. Its superior performance across diverse sequences underscores its potential to accelerate mRNA-based therapeutic development, particularly in personalized cancer immunotherapy, while offering a framework adaptable for other applications such as infectious disease vaccine design.
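The codon adaptation index (CAI) named above as one of the two co-optimized objectives is conventionally computed as the geometric mean of per-codon relative adaptiveness weights. A minimal Python sketch of that calculation follows; the codon usage counts and the function names are hypothetical placeholders, not Optiseed's scoring code, and the minimum free energy term (which needs an RNA folding tool) is not shown.

```python
import math

# Hypothetical codon usage counts for two amino acids (illustrative values only,
# not taken from the paper or from any real codon usage table).
USAGE = {
    "L": {"CTG": 40, "CTC": 20, "TTG": 13, "CTT": 13, "TTA": 8, "CTA": 7},
    "K": {"AAG": 32, "AAA": 24},
}


def relative_adaptiveness(usage):
    """Weight of each codon = its count / count of the most used synonymous codon."""
    weights = {}
    for codons in usage.values():
        top = max(codons.values())
        for codon, count in codons.items():
            weights[codon] = count / top
    return weights


def cai(seq, weights):
    """Geometric mean of the codon weights along a coding sequence."""
    if len(seq) == 0 or len(seq) % 3 != 0:
        raise ValueError("sequence length must be a positive multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    # Sum of logs avoids underflow on long sequences.
    log_sum = sum(math.log(weights[c]) for c in codons)
    return math.exp(log_sum / len(codons))


WEIGHTS = relative_adaptiveness(USAGE)
best = cai("CTGAAG", WEIGHTS)    # both codons are the most frequent choice
mixed = cai("CTGAAA", WEIGHTS)   # AAA has weight 24/32 = 0.75
```

A sequence built only from the most frequent synonymous codons scores 1.0; any less frequent codon lowers the score.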
{"title":"Multi-seed searching algorithm for integrated codon optimization of mRNA stability and translational efficiency in vaccine design.","authors":"Yuhan Bo, Bingxin Liu, Shengyu Huang, Yanwei Liu, Libin Deng, Dake Zhang, Jing Zhang","doi":"10.1093/bib/bbag047","DOIUrl":"10.1093/bib/bbag047","url":null,"abstract":"<p><p>Messenger RNA (mRNA) vaccines have revolutionized vaccinology with their rapid development cycles and adaptability, yet their broad application is constrained by unresolved challenges in balancing mRNA structural stability and translational efficiency. Here, we introduce a groundbreaking multi-seed searching algorithm for mRNA codon optimization, an innovative framework that synergistically co-optimizes minimum free energy and codon adaptation index through adaptive integration of simulated annealing and genetic algorithms. This novel approach enhances global search capability to escape local optima, a critical limitation of existing tools. Evaluations across long therapeutic mRNA sequences and short peptides (neoantigens from bladder cancer and melanoma) reveal that our algorithm outperforms the state-of-the-art LinearDesign, delivering superior balanced improvements in both stability and translational efficiency and validating its unique ability to navigate the inherent trade-offs between these two key metrics. Built on this algorithm, the Optiseed platform introduces transformative features including customizable scoring functions, flexible parameters for tailored optimization, and support for integrating untranslated regions (UTRs), poly(A) tails, and other elements to enable end-to-end vaccine construct design. This innovation addresses the rigidity of conventional tools, empowering precise, context-specific optimization. Optiseed represents a robust, scalable solution for mRNA vaccine codon optimization. 
Its superior performance across diverse sequences underscores its potential to accelerate mRNA-based therapeutic development, particularly in personalized cancer immunotherapy, while offering a framework adaptable for other applications such as infectious disease vaccine design.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 1","pages":""},"PeriodicalIF":7.7,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12885097/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146149172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chia-Chen Chu, Jhong-He Yu, Shang-Che Kuo, Fan-Wei Yang, Chia-Chang Lin, Chang-Hung Chen, Yi-Chen Wu, Cing Shih, Ying-Hsuan Sun, Te-Lun Mai, Ying-Lan Chen, Hsin-Hung Lin, Jung-Chen Su, Ying-Chung Jimmy Lin
NanoPrePro is a streamlined read preprocessor designed to identify full-length reads from Oxford Nanopore Technologies (ONT) transcriptomic sequencing results with high precision, achieved through the precise identification of adapters/primers. The preprocessing of ONT reads has long been a neglected and ambiguous area without thorough and systematic investigation. Here, we developed NanoPrePro, which outperformed the current best preprocessor, Pychopper, on simulated and real datasets. Using sequence similarity, adapter/primer location, and adapter/primer length, NanoPrePro applies a self-optimizing function to extract the best parameters for each sequencing file so that users can customize their analyses. Furthermore, NanoPrePro runs 38 times faster with lower memory cost. NanoPrePro can be regarded as the state-of-the-art preprocessor with forward adaptability of ONT sequencing.
{"title":"NanoPrePro: a fully equipped, fast, and memory-efficient preprocessor for nanopore transcriptomic sequencing.","authors":"Chia-Chen Chu, Jhong-He Yu, Shang-Che Kuo, Fan-Wei Yang, Chia-Chang Lin, Chang-Hung Chen, Yi-Chen Wu, Cing Shih, Ying-Hsuan Sun, Te-Lun Mai, Ying-Lan Chen, Hsin-Hung Lin, Jung-Chen Su, Ying-Chung Jimmy Lin","doi":"10.1093/bib/bbag063","DOIUrl":"10.1093/bib/bbag063","url":null,"abstract":"<p><p>NanoPrePro is a streamlined read preprocessor designed to identify full-length reads from Oxford Nanopore Technologies (ONT) transcriptomic sequencing results with high precision, achieved through the precise identification of adapters/primers. The preprocessing of ONT reads has long been a neglected and ambiguous area without thorough and systematic investigation. Here, we developed NanoPrePro, which outperformed the current best preprocessor, Pychopper, on simulated and real datasets. Using sequence similarity, adapter/primer location, and adapter/primer length, NanoPrePro applies a self-optimizing function to extract the best parameters for each sequencing file so that users can customize their analyses. Furthermore, NanoPrePro runs 38 times faster with lower memory cost. 
NanoPrePro can be regarded as the state-of-the-art preprocessor with forward adaptability of ONT sequencing.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 1","pages":""},"PeriodicalIF":7.7,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12903951/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146194110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xunuo Zhu, Wenyi Zhao, Siqi Wang, Jingwen Yang, Jingqi Zhou, Binbin Zhou, Ji Cao, Bo Yang, Zhan Zhou, Xun Gu
Cancer development is driven by somatic evolution and clonal selection. However, traditional selective pressure analysis methods have treated all sites within a gene equally; such a gene-level model oversimplifies the complexity of cancer evolution. In this study, we introduced CN/CS-calculator, a novel site-specific method that can capture selective pressures acting across different gene sites. By deciphering the interplay between the selection pattern and the function of a gene in oncogenesis, CN/CS-calculator uncovers a unique class of mini-driver genes, which exhibit weak positive selection, with certain critical sites providing context-dependent promoter effects on the fitness of cancer subclones while others are constrained by evolutionary conservation. Our method emphasizes the importance of site-specific analysis in uncovering how subtle evolutionary forces shape cancer biology. The refined understanding offers new insights into the mechanisms of cancer heterogeneity and molecular evolution, with potential implications for advancing therapeutic strategies and prognostic assessments.
{"title":"Identification of cancer mini-drivers by deciphering selective landscape in the cancer genome.","authors":"Xunuo Zhu, Wenyi Zhao, Siqi Wang, Jingwen Yang, Jingqi Zhou, Binbin Zhou, Ji Cao, Bo Yang, Zhan Zhou, Xun Gu","doi":"10.1093/bib/bbaf694","DOIUrl":"10.1093/bib/bbaf694","url":null,"abstract":"<p><p>Cancer development is driven by somatic evolution and clonal selection. However, traditional selective pressure analysis methods have treated all sites within a gene equally; such a gene-level model oversimplifies the complexity of cancer evolution. In this study, we introduced CN/CS-calculator, a novel site-specific method that can capture selective pressures acting across different gene sites. By deciphering the interplay between the selection pattern and the function of a gene in oncogenesis, CN/CS-calculator uncovers a unique class of mini-driver genes, which exhibit weak positive selection, with certain critical sites providing context-dependent promoter effects on the fitness of cancer subclones while others are constrained by evolutionary conservation. Our method emphasizes the importance of site-specific analysis in uncovering how subtle evolutionary forces shape cancer biology. 
The refined understanding offers new insights into the mechanisms of cancer heterogeneity and molecular evolution, with potential implications for advancing therapeutic strategies and prognostic assessments.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 1","pages":""},"PeriodicalIF":7.7,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12784965/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145932212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Identifying transcription factors (TFs) responsible for gene expression changes remains a central challenge in functional genomics. TFEA.ChIP is a ChIP-seq-based TF enrichment analysis tool that addresses this by linking TF binding profiles to differentially expressed genes through experimentally supported cis-regulatory element (CRE)-gene associations. Unlike motif- or heuristic-based approaches, TFEA.ChIP adopts a biologically grounded strategy by intersecting TF binding data from ReMap2022 with regulatory maps from ENCODE's rE2G and CREdb. To overcome the high context-specificity of rE2G associations, we developed filtering strategies based on confidence scores and recurrence across biosamples. Benchmarking on 342 curated gene sets from the Molecular Signatures Database C2 CGP collection showed that recurrence-based filtering significantly improved accuracy, outperforming the original GeneHancer-based implementation and leading tools including BARTv2.0, Lisa, ChEA3, and HOMER. A case study on hypoxia further validated the method, demonstrating accurate and pathway-specific enrichment of hypoxia-inducible factor-related TFs using both overrepresentation analysis and gene set enrichment analysis. Additionally, the updated implementation of TFEA.ChIP in R/Bioconductor introduces several user-friendly features, including automated analysis workflows and expression-based filtering of candidate TFs. These additions streamline the integration of TFEA.ChIP into standard RNA-seq analysis pipelines, enabling more efficient and reproducible workflows. Together with its strong benchmarking performance and biologically grounded framework, the updated tool provides a robust and accessible solution for inferring transcriptional regulators from gene expression data.
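Overrepresentation analysis, one of the two enrichment modes mentioned above, is commonly framed as a one-sided hypergeometric test on the overlap between a TF's target genes and the differentially expressed genes. The Python sketch below illustrates that generic test only; TFEA.ChIP itself is an R/Bioconductor package, and the function name, gene identifiers, and set sizes here are hypothetical rather than its API or data.

```python
from math import comb


def ora_p_value(universe, tf_targets, de_genes):
    """One-sided hypergeometric p-value, P(overlap >= observed).

    universe   : all genes considered (background)
    tf_targets : genes linked to the TF's binding sites
    de_genes   : differentially expressed genes (the query set)
    """
    universe = set(universe)
    targets = set(tf_targets) & universe
    query = set(de_genes) & universe
    N, K, n = len(universe), len(targets), len(query)
    k = len(targets & query)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


# Hypothetical toy example: 10 background genes, 3 TF targets, all 3 are DE.
genes = [f"g{i}" for i in range(10)]
p_enriched = ora_p_value(genes, ["g0", "g1", "g2"], ["g0", "g1", "g2"])
p_none = ora_p_value(genes, ["g0", "g1", "g2"], ["g7", "g8", "g9"])
```

With no overlap the tail covers the whole distribution, so the p-value is 1; a complete overlap of three genes out of ten gives 1/120.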
{"title":"Enhancing TFEA.ChIP with ENCODE regulatory maps for generalizable transcription factor enrichment.","authors":"Yosra Berrouayel, Luis Del Peso","doi":"10.1093/bib/bbaf715","DOIUrl":"10.1093/bib/bbaf715","url":null,"abstract":"<p><p>Identifying transcription factors (TFs) responsible for gene expression changes remains a central challenge in functional genomics. TFEA.ChIP is a ChIP-seq-based TF enrichment analysis tool that addresses this by linking TF binding profiles to differentially expressed genes through experimentally supported cis-regulatory element (CRE)-gene associations. Unlike motif- or heuristic-based approaches, TFEA.ChIP adopts a biologically grounded strategy by intersecting TF binding data from ReMap2022 with regulatory maps from ENCODE's rE2G and CREdb. To overcome the high context-specificity of rE2G associations, we developed filtering strategies based on confidence scores and recurrence across biosamples. Benchmarking on 342 curated gene sets from the Molecular Signatures Database C2 CGP collection showed that recurrence-based filtering significantly improved accuracy, outperforming the original GeneHancer-based implementation and leading tools including BARTv2.0, Lisa, ChEA3, and HOMER. A case study on hypoxia further validated the method, demonstrating accurate and pathway-specific enrichment of hypoxia-inducible factor-related TFs using both overrepresentation analysis and gene set enrichment analysis. Additionally, the updated implementation of TFEA.ChIP in R/Bioconductor introduces several user-friendly features, including automated analysis workflows and expression-based filtering of candidate TFs. These additions streamline the integration of TFEA.ChIP into standard RNA-seq analysis pipelines, enabling more efficient and reproducible workflows. 
Together with its strong benchmarking performance and biologically grounded framework, the updated tool provides a robust and accessible solution for inferring transcriptional regulators from gene expression data.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 1","pages":""},"PeriodicalIF":7.7,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12796816/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145958912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhen Tian, Xiaojiao Wei, Zhengzheng Lou, Zhixia Teng, Shouli Fu
Motivation: Recent advances in single-cell sequencing have transformed precise measurement of gene expression at cellular resolution, enabling unprecedented dissection of cellular heterogeneity and intricate biological processes. The accumulation of multi-omics data offers new avenues for cell clustering, a critical foundation for cell-type identification and downstream analyses. However, substantial challenges persist in simultaneously achieving effective integration of complementary information in multi-omics data and their appropriate weight allocation.
Results: Here, we propose an Adaptive Multi-View clustering framework with the Information Bottleneck principle to solve the multi-omics data clustering task (named scAMVIB). The proposed model can learn multi-view omics representations that capture both inter-omics associations and omics-specific patterns, with adaptive weight allocation. Specifically, multi-view data comprise two components: (i) the integrated omics feature matrix derived from the similarity network fusion strategy and (ii) omics-specific representations from distinct platforms. These inputs are processed through a multi-view information bottleneck clustering framework that leverages cross-view complementarity to enhance representations. View weights are adaptively assigned via maximum entropy regularization, proportional to their information content. The final cell partitions are obtained through sequential iterative optimization. Comprehensive experiments across multiple datasets demonstrate that scAMVIB has strong competitiveness in clustering while maintaining biological interpretability.
{"title":"Adaptive multi-view information bottleneck for multi-omics data clustering.","authors":"Zhen Tian, Xiaojiao Wei, Zhengzheng Lou, Zhixia Teng, Shouli Fu","doi":"10.1093/bib/bbaf717","DOIUrl":"10.1093/bib/bbaf717","url":null,"abstract":"<p><strong>Motivation: </strong>Recent advances in single-cell sequencing have transformed precise measurement of gene expression at cellular resolution, enabling unprecedented dissection of cellular heterogeneity and intricate biological processes. The accumulation of multi-omics data offers new avenues for cell clustering, a critical foundation for cell-type identification and downstream analyses. However, substantial challenges persist in simultaneously achieving effective integration of complementary information in multi-omics data and their appropriate weight allocation.</p><p><strong>Results: </strong>Here, we propose an Adaptive Multi-View clustering framework with the Information Bottleneck principle to solve the multi-omics data clustering task (named scAMVIB). The proposed model can learn multi-view omics representations that capture both inter-omics associations and omics-specific patterns, with adaptive weight allocation. Specifically, multi-view data comprise two components: (i) the integrated omics feature matrix derived from the similarity network fusion strategy and (ii) omics-specific representations from distinct platforms. These inputs are processed through a multi-view information bottleneck clustering framework that leverages cross-view complementarity to enhance representations. View weights are adaptively assigned via maximum entropy regularization, proportional to their information content. The final cell partitions are obtained through sequential iterative optimization. 
Comprehensive experiments across multiple datasets demonstrate that scAMVIB has strong competitiveness in clustering while maintaining biological interpretability.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 1","pages":""},"PeriodicalIF":7.7,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12796825/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145958937","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
RNA performs a variety of functions within cells and is implicated in various human diseases. Because druggable proteins occupy a small portion of the genome, interest in developing drugs that target RNAs has grown considerably. Thus, precise prediction of small-molecule binding sites across different classes of RNAs is important. In this study, a lightweight deep learning program for predicting RNA-drug binding sites, called compound binding site prediction for RNA (CoBRA), is introduced. Our approach utilizes residue-level embeddings derived from a pre-trained RNA language model, without relying on any structural information. These embeddings encapsulate the contextual and statistical properties of each nucleotide and are used as input for a multi-layer perceptron classifier that performs binary classification of binding nucleotides. The model was trained using the TR60 and HARIBOSS datasets and tested on four independent benchmark sets. CoBRA achieves a relative improvement of 22.1% in the Matthews correlation coefficient and a 45.6% increase in sensitivity compared to existing state-of-the-art RNA-ligand binding site prediction methods that utilize structural information. These results demonstrate that sequence-based language model embeddings, which do not require explicit coordinate or distance information, can match or outperform structure-based methods. This makes it a flexible tool for predicting binding sites across diverse RNA targets.
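The two reported metrics can be made concrete with their standard definitions. The Python sketch below computes the Matthews correlation coefficient and sensitivity from per-nucleotide binary labels, plus a relative improvement between two scores; the label vectors, the example scores, and the function names are made up for illustration and are not CoBRA's evaluation code or results.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = binding nucleotide)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def sensitivity(tp, fn):
    """Fraction of true binding nucleotides that were predicted as binding."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def relative_improvement(new, baseline):
    """Relative gain of `new` over `baseline`, e.g. 0.25 means +25%."""
    return (new - baseline) / baseline


# Made-up labels for a six-nucleotide RNA.
y_true = [1, 1, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1, 1]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
score = mcc(tp, fp, tn, fn)
recall = sensitivity(tp, fn)
gain = relative_improvement(0.5, 0.4)  # hypothetical scores, not from the paper
```

Because binding nucleotides are usually a minority class, MCC is a common choice here: it uses all four confusion-matrix cells, unlike accuracy, which a predictor can inflate by labeling everything as non-binding.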
{"title":"CoBRA: compound binding site prediction using RNA language model.","authors":"Wonkyeong Jang, Woong-Hee Shin","doi":"10.1093/bib/bbaf713","DOIUrl":"10.1093/bib/bbaf713","url":null,"abstract":"<p><p>RNA performs a variety of functions within cells and is implicated in various human diseases. Because druggable proteins occupy a small portion of the genome, interest in developing drugs that target RNAs has grown considerably. Thus, precise prediction of small-molecule binding sites across different classes of RNAs is important. In this study, a lightweight deep learning program for predicting RNA-drug binding sites, called compound binding site prediction for RNA (CoBRA), is introduced. Our approach utilizes residue-level embeddings derived from a pre-trained RNA language model, without relying on any structural information. These embeddings encapsulate the contextual and statistical properties of each nucleotide and are used as input for a multi-layer perceptron classifier that performs binary classification of binding nucleotides. The model was trained using the TR60 and HARIBOSS datasets and tested on four independent benchmark sets. CoBRA achieves a relative improvement of 22.1% in the Matthews correlation coefficient and a 45.6% increase in sensitivity compared to existing state-of-the-art RNA-ligand binding site prediction methods that utilize structural information. These results demonstrate that sequence-based language model embeddings, which do not require explicit coordinate or distance information, can match or outperform structure-based methods. 
This makes it a flexible tool for predicting binding sites across diverse RNA targets.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 1","pages":""},"PeriodicalIF":7.7,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12790621/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145948425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The ability of T-cell receptors (TCRs) to recognize neoantigens is fundamental to the initiation and maintenance of adaptive immune responses. In TCR-based immunotherapies, elucidating the recognition patterns of TCRs for peptides and accurately identifying therapeutically relevant TCR-peptide pairs remain critical challenges. Here, we present a novel dual-pathway network model, ProTCR, which integrates the protein language model ProtT5 with deep learning methods. By incorporating both global and local feature extraction mechanisms, ProTCR enables efficient representation of amino acid sequences, thereby enhancing the model's generalizability across diverse data distributions and improving its biological interpretability. ProTCR demonstrates robust performance and broad applicability across various datasets, including neoantigens, previously unseen peptides, and MHC class II-restricted epitopes, overcoming the reliance on known peptide-TCR pairs observed in previous studies. It also offers new insights for predicting diverse classes of antigenic peptides. We applied ProTCR to several clinically relevant scenarios, including immunotherapeutic target identification in acute myeloid leukemia, neoantigen-targeted immunotherapy in solid tumours, and antigen-specific T-cell recognition against pathogens such as influenza and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Across these complex settings, ProTCR consistently maintained high accuracy and stability, demonstrating strong cross-task adaptability and broad potential for clinical application. This work not only provides a powerful tool for elucidating immune response mechanisms but also offers a solid computational foundation for the design of neoantigen- or TCR-based precision immunotherapy strategies.
{"title":"ProTCR: a protein language model-driven framework for decoding TCR-antigen recognition toward precision immunotherapies.","authors":"Minrui Xu, Manman Lu, Peng Liu, Siwen Zhang, Lanming Chen, Qi Liu, Yong Lin, Lu Xie","doi":"10.1093/bib/bbaf716","DOIUrl":"10.1093/bib/bbaf716","url":null,"abstract":"<p><p>The ability of T-cell receptors (TCRs) to recognize neoantigens is fundamental to the initiation and maintenance of adaptive immune responses. In TCR-based immunotherapies, elucidating the recognition patterns of TCRs for peptides and accurately identifying therapeutically relevant TCR-peptide pairs remain critical challenges. Here, we present a novel dual-pathway network model, ProTCR, which integrates the protein language model ProtT5 with deep learning methods. By incorporating both global and local feature extraction mechanisms, ProTCR enables efficient representation of amino acid sequences, thereby enhancing the model's generalizability across diverse data distributions and improving its biological interpretability. ProTCR demonstrates robust performance and broad applicability across various datasets, including neoantigens, previously unseen peptides, and MHC class II-restricted epitopes, overcoming the reliance on known peptide-TCR pairs observed in previous studies. It also offers new insights for predicting diverse classes of antigenic peptides. We applied ProTCR to several clinically relevant scenarios, including immunotherapeutic target identification in acute myeloid leukemia, neoantigen-targeted immunotherapy in solid tumours, and antigen-specific T-cell recognition against pathogens such as influenza and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Across these complex settings, ProTCR consistently maintained high accuracy and stability, demonstrating strong cross-task adaptability and broad potential for clinical application. 
This work not only provides a powerful tool for elucidating immune response mechanisms but also offers a solid computational foundation for the design of neoantigen- or TCR-based precision immunotherapy strategies.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 1","pages":""},"PeriodicalIF":7.7,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12790622/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145948548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Computational models integrating large-scale gene expression profiles provide a powerful approach for predicting multi-target drug interactions (DTIs). Unlike traditional experimental and computational methods that often require detailed structural or target-specific information, gene expression-based models leverage reference transcriptional signatures. This enables functional inference of interactions without explicit structural data, offering a valuable strategy in data-limited scenarios. By incorporating phenotypic information, these models bridge phenotype screening and target prediction, establishing a novel paradigm for target identification. This review introduces and compares current target identification methods, emphasizing the unique advantages of gene expression profiling in DTI prediction. We also outline major public databases and their applications. As effective hypothesis-generation tools, computational DTI models reduce experimental costs, enhance understanding of multi-target mechanisms, and accelerate drug discovery. We categorize and analyze three primary model types utilizing large-scale gene expression data: biological network-based, association-based, and multimodal integration approaches, discussing their respective strengths and limitations. Key challenges and future directions are also addressed, including data integration, algorithm optimization, and multi-omics fusion, to fully realize the potential of gene expression data in multi-target drug prediction. This review offers comprehensive guidance on advanced tools, databases, and methodologies, enabling novel research paths for unbiased multi-target exploration. By linking phenotype screening with computational analysis, this integrative approach is expected to advance precision medicine, especially in uncovering drug mechanisms in complex diseases, offering promising prospects.
{"title":"Harnessing AI to fuse phenotypic signatures for drug target identification: progress in computational modeling.","authors":"Fengming Chen, Ranran Zhao, Xingxing Han, Huan Li, Zhishu Tang","doi":"10.1093/bib/bbag045","DOIUrl":"10.1093/bib/bbag045","url":null,"abstract":"<p><p>Computational models integrating large-scale gene expression profiles provide a powerful approach for predicting multi-target drug interactions (DTIs). Unlike traditional experimental and computational methods that often require detailed structural or target-specific information, gene expression-based models leverage reference transcriptional signatures. This enables functional inference of interactions without explicit structural data, offering a valuable strategy in data-limited scenarios. By incorporating phenotypic information, these models bridge phenotype screening and target prediction, establishing a novel paradigm for target identification. This review introduces and compares current target identification methods, emphasizing the unique advantages of gene expression profiling in DTI prediction. We also outline major public databases and their applications. As effective hypothesis-generation tools, computational DTI models reduce experimental costs, enhance understanding of multi-target mechanisms, and accelerate drug discovery. We categorize and analyze three primary model types utilizing large-scale gene expression data: biological network-based, association-based, and multimodal integration approaches, discussing their respective strengths and limitations. Key challenges and future directions are also addressed, including data integration, algorithm optimization, and multi-omics fusion, to fully realize the potential of gene expression data in multi-target drug prediction. This review offers comprehensive guidance on advanced tools, databases, and methodologies, enabling novel research paths for unbiased multi-target exploration. 
By linking phenotype screening with computational analysis, this integrative approach is expected to advance precision medicine, especially in uncovering drug mechanisms in complex diseases, offering promising prospects.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 1","pages":""},"PeriodicalIF":7.7,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12885100/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146149095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shida He, Zixu Wang, Jing Li, Quan Zou, Feng Zhang
Drug-gene interactions (DGIs) influence the toxicity and efficacy of drug therapy and play an important role in elucidating drug mechanisms, predicting potential adverse effects, and facilitating precision medicine. Existing computational methods typically rely on chemical or genetic sequence features of drugs and genes, which limits their effectiveness for novel entities lacking explicit annotations. To address this, we propose BiGvCL, a framework that predicts DGIs based exclusively on network topology, requiring no explicit feature information for drugs or genes. BiGvCL introduces a lightweight graph attention mechanism (GATLite) to efficiently aggregate local neighborhood information. Additionally, we develop a gated graph convolutional network (GatedGCN) to explicitly learn high-order interactions between drugs and genes, and we integrate contrastive learning to enhance the model's generalizability. Comprehensive experiments on the DrugBank and DGIdb datasets show that BiGvCL achieves competitive performance across all metrics compared with representative baselines. Cross-domain evaluations on OGB datasets further confirm its adaptability to heterogeneous biomedical networks. Ablation and hyperparameter analyses highlight the key contributions of the contrastive and gated mechanisms, while case studies and molecular docking provide supporting evidence for the biological relevance of the predictions. Collectively, while BiGvCL is constrained by its reliance on network topology and a transductive learning paradigm, it demonstrates the potential of topology-based approaches for discovering novel drug-gene interactions, which may inform drug repurposing and precision medicine efforts.
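As a rough illustration of the contrastive-learning idea mentioned in the abstract above, the following minimal sketch computes an InfoNCE-style loss between two sets of node embeddings. The abstract does not state which contrastive objective BiGvCL uses, so the choice of InfoNCE, the function names (`cosine`, `info_nce`), the temperature value, and the toy embeddings are all assumptions made for illustration, not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(view_a, view_b, temperature=0.5):
    """Mean InfoNCE loss over nodes (hypothetical helper).

    Node i in view_a is treated as the positive partner of node i in
    view_b; every other node in view_b acts as a negative.
    """
    losses = []
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy 2-D embeddings for two nodes under two views (made-up numbers).
view_a = [[1.0, 0.0], [0.0, 1.0]]
aligned_b = [[0.9, 0.1], [0.1, 0.9]]   # each node agrees with its partner
shuffled_b = [[0.1, 0.9], [0.9, 0.1]]  # partners are swapped

loss_aligned = info_nce(view_a, aligned_b)
loss_shuffled = info_nce(view_a, shuffled_b)
```

In this toy setup the loss is lower when the two views of a node agree than when the partners are swapped, which is the signal a contrastive objective uses to pull matching representations together.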
{"title":"BiGvCL: bipartite graph-based cross-domain contrastive learning model for the predicting drug-gene interactions.","authors":"Shida He, Zixu Wang, Jing Li, Quan Zou, Feng Zhang","doi":"10.1093/bib/bbaf710","DOIUrl":"10.1093/bib/bbaf710","url":null,"abstract":"","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 1","pages":""},"PeriodicalIF":7.7,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12848949/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146060084","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}