Pub Date: 2026-01-14
DOI: 10.1093/bioinformatics/btag007
Alexandra M Wong, Cecile Meier-Scherling, Lorin Crawford
Motivation: Predicting synergistic cancer drug combinations through computational methods offers a scalable approach to creating therapies that are more effective and less toxic. However, most algorithms focus solely on synergy without considering toxicity when selecting optimal drug combinations. In the absence of combinatorial toxicity assays, a few models use toxicity penalties to balance high synergy with lower toxicity. Still, these penalties have not been explicitly validated against known drug-drug interactions.
Results: In this study, we examine whether synergy scores and toxicity metrics correlate with known adverse drug interactions. While some metrics show trends with toxicity levels, our results reveal significant limitations in using them as penalties. These findings highlight the challenges of incorporating toxicity into synergy prediction frameworks and suggest that advancing the field requires more comprehensive combination toxicity data.
Availability and implementation: The code written for this project is available at https://github.com/amw14/toxicity-cancer-drug-combination.
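The kind of check described in the Results (whether a metric trends with known toxicity levels) can be illustrated with a rank correlation. The following is a minimal sketch, assuming toxicity is encoded as ordinal levels (for example 0 = minor, 1 = moderate, 2 = major); the encoding, the function names, and the sample values are illustrative assumptions and are not taken from the authors' repository.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    toxicity_level = [0, 0, 1, 1, 2, 2]
    synergy_score = [12.1, 3.4, 8.8, 15.0, 9.7, 20.3]
    print(spearman(toxicity_level, synergy_score))
```

A coefficient near zero would indicate that the metric carries little ordinal information about toxicity, which is the sort of limitation the abstract reports for some penalties.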
Title: Characterizing Clinical Toxicity in Cancer Combination Therapies
Journal: Bioinformatics (Oxford, England)
Pub Date: 2026-01-14
DOI: 10.1093/bioinformatics/btag008
Adriana Carolina Gonzalez-Cavazos, Roger Tu, Meghamala Sinha, Andrew I Su
Drug repositioning offers a cost-effective alternative to traditional drug development by identifying new uses for existing drugs. Recent advances leverage Graph Neural Networks (GNNs) to model complex biological data, showing promise in predicting novel drug-disease associations. However, these frameworks often lack explainability, a critical factor for validating predictions and understanding drug mechanisms. Here, we introduce Drug-Based Reasoning Explainer (DBR-X), an explainable GNN model that combines a link prediction module and a path-identification module to generate interpretable and faithful explanations. When benchmarked against other GNN link prediction frameworks, DBR-X achieves superior performance in identifying known drug-disease associations, demonstrating higher accuracy across all evaluation metrics. The quality of DBR-X biological explanations was assessed through multiple approaches: comparison with manually curated drug mechanisms, evaluation of explanation faithfulness through deletion and insertion studies, and measurement of stability under graph perturbations. Together, our model not only advances the state of the art in drug repositioning predictions but also provides multi-hop explanations that can accelerate the translation of computational predictions into clinical applications.
Title: A Case-Based Explainable Graph Neural Network Framework for Mechanistic Drug Repositioning
Journal: Bioinformatics (Oxford, England)
Pub Date: 2026-01-14
DOI: 10.1093/bioinformatics/btag014
Xinyi Tang, Ran Liu
Motivation: Sequence motif identification is crucial for understanding molecular recognition, particularly in immune responses involving peptide binding to MHC class I molecules for antigen presentation to T cells. Traditionally, MHC class I binding motifs are assumed to be contiguous and span nine amino acids. However, structural evidence suggests that binding may involve non-adjacent residues, challenging the assumptions of existing methods.
Results: In this study, we propose GAMMA (Gap-Aware Motif Mining Algorithm), a probabilistic framework designed to identify non-contiguous motifs under conditions of incomplete labeling. GAMMA employs Bayesian inference with MCMC sampling to jointly estimate motif parameters, binding locations, and the relative spacing between binding positions. Through extensive simulations and real-world applications to MHC class I peptide datasets, GAMMA outperforms existing motif discovery tools such as GLAM2 in accurately localizing binding residues and identifying the underlying motifs. Notably, our results suggest that the true number of binding residues may be eight, fewer than the commonly assumed nine. In addition, for longer peptides, the model captures increased flexibility in the central region, consistent with structural observations that peptides may bulge in the middle.
Availability: The raw data and the source code are available on GitHub (https://github.com/RanLIUaca/GAMMAmotif).
Supplementary information: Supplementary data are available at Bioinformatics online.
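To make the idea of a non-contiguous motif concrete, the sketch below scores every way of placing k motif columns at increasing, possibly non-adjacent, positions of a peptide and returns the best placement. This is an exhaustive toy search under stated assumptions (a uniform background, a fixed pseudo-probability, and a hypothetical two-column motif); it is not the Bayesian MCMC procedure that GAMMA uses, and none of the names come from the GAMMA code base.

```python
import math
from itertools import combinations

BACKGROUND = 1.0 / 20.0  # assumption: uniform over the 20 standard amino acids
PSEUDO = 1e-3            # assumption: floor probability for residues missing from a column


def log_odds(peptide: str, positions: tuple[int, ...], motif: list[dict[str, float]]) -> float:
    """Sum of log(motif probability / background) over the chosen positions."""
    return sum(
        math.log(column.get(peptide[p], PSEUDO) / BACKGROUND)
        for p, column in zip(positions, motif)
    )


def best_placement(peptide: str, motif: list[dict[str, float]]) -> tuple[tuple[int, ...], float]:
    """Try every increasing set of positions (gaps allowed) and keep the top score."""
    k = len(motif)
    if k == 0 or k > len(peptide):
        raise ValueError("motif must have between 1 and len(peptide) columns")
    candidates = combinations(range(len(peptide)), k)
    return max(
        ((pos, log_odds(peptide, pos, motif)) for pos in candidates),
        key=lambda item: item[1],
    )


if __name__ == "__main__":
    # Hypothetical motif: an alanine-preferring column followed by a leucine-preferring one.
    toy_motif = [{"A": 0.9}, {"L": 0.9}]
    print(best_placement("GAGGL", toy_motif))
```

Exhaustive enumeration grows combinatorially with peptide length and motif size, which is one reason a sampling approach such as the MCMC scheme described above is attractive for realistic data.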
Title: GAMMA: Gap-aware Motif Mining under Incomplete Labeling with Applications to MHC Motifs
Journal: Bioinformatics (Oxford, England)
Pub Date: 2026-01-14
DOI: 10.1093/bioinformatics/btag011
Darya Shlyk, Lawrence Hunter
Motivation: Biomedical Entity Linking (BEL) maps mentions in biomedical text to standardized identifiers, enabling structured data integration and downstream knowledge discovery. However, current BEL systems remain fundamentally constrained by the recall of the initial candidate pool, where suboptimal retrieval limits the overall effectiveness of the normalization pipeline.
Results: We present the first systematic evaluation of Generative Relevance Feedback (GRF) for enhancing candidate retrieval in state-of-the-art BEL systems. GRF leverages large language models (LLMs) to enrich the expressiveness of the mention in a zero-shot fashion. We assess GRF's impact under two scenarios (direct linking prediction and candidate generation in cascading normalization pipelines) and analyze its sensitivity to different LLMs, feedback types, and integration strategies. Experiments across eight corpora and four biomedical knowledge bases demonstrate that integrating GRF significantly improves both accuracy and recall, thereby increasing the upper bound on normalization performance. Our findings highlight GRF as an efficient, model-agnostic solution and underscore its potential as a key component for advancing BEL.
Availability: The code to reproduce our experiments can be found at: https://doi.org/10.5281/zenodo.17853541.
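The retrieval idea behind GRF, enriching a mention with generated text before candidate lookup, can be sketched as follows. This is only an illustration under stated assumptions: the retriever is a simple token-overlap (Jaccard) ranker rather than the BEL systems evaluated in the paper, and `generate` is a hypothetical caller-supplied function standing in for an LLM call.

```python
from typing import Callable, Sequence


def tokens(text: str) -> set[str]:
    """Lowercased whitespace tokens; a deliberately crude tokenizer."""
    return set(text.lower().split())


def rank_candidates(query: str, kb_names: Sequence[str], top_k: int = 5) -> list[str]:
    """Rank knowledge-base names by Jaccard overlap with the query tokens."""
    q = tokens(query)

    def jaccard(name: str) -> float:
        n = tokens(name)
        union = q | n
        return len(q & n) / len(union) if union else 0.0

    # sorted() is stable, so ties keep their original knowledge-base order.
    return sorted(kb_names, key=jaccard, reverse=True)[:top_k]


def retrieve_with_feedback(
    mention: str,
    kb_names: Sequence[str],
    generate: Callable[[str], str],  # hypothetical stand-in for an LLM call
    top_k: int = 5,
) -> list[str]:
    """Append generated feedback text to the mention, then retrieve as usual."""
    expanded = f"{mention} {generate(mention)}"
    return rank_candidates(expanded, kb_names, top_k)


if __name__ == "__main__":
    kb = ["myocardial infarction", "heart failure", "cerebral infarction"]
    fake_llm = lambda mention: "myocardial infarction"  # canned output for the demo
    print(rank_candidates("heart attack", kb))
    print(retrieve_with_feedback("heart attack", kb, fake_llm))
```

Because the expansion happens before retrieval, the same wrapper can sit in front of any candidate generator, which is in the spirit of the model-agnostic claim in the abstract.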
Title: Improving biomedical entity linking with generative relevance feedback
Journal: Bioinformatics (Oxford, England)
Pub Date: 2026-01-14
DOI: 10.1093/bioinformatics/btag023
Siera Martinez, Tushar Sharma, Luke Johnson, Allen Kim, Vania Ballesteros Prieto, Hovhannes Arestakesyan, Sunisha Harish, Jewel Dias, Joseph Goldfrank, Nathan Edwards, Anelia Horvath
Motivation: Accurately characterizing expressed genetic variation at the single-cell level is essential for understanding transcriptional heterogeneity, allelic regulation, and mutational dynamics within complex tissues. However, few tools enable comprehensive visualization and quantitative analysis of expressed variants across individual cells.
Results: scSNViz is an R package for the exploration, quantification, and visualization of expressed single-nucleotide variants (SNVs) from cell-barcoded single-cell RNA sequencing (scRNA-seq) data. The software supports estimation of variant allele fractions, clustering of SNV expression profiles, and 2D and 3D visualization of individual SNVs or user-defined SNV groups. Beyond visualization, scSNViz facilitates investigation of cell-, cluster-, or lineage-specific variant expression patterns, as well as allelic dynamics including imprinting, random allele inactivation, and transcriptional bursting. It interoperates seamlessly with established single-cell frameworks (Seurat for clustering, Slingshot for trajectory inference, scType for cell-type annotation, and CopyKat for copy-number profiling), enabling integrative multi-omic analyses of expressed variation.
Availability: scSNViz is implemented in R and freely available at https://github.com/HorvathLab/scSNViz (DOI: 10.5281/zenodo.17307516). The package includes comprehensive documentation and example workflows designed for users with limited bioinformatics experience.
Supplementary information: Supplementary data are available at Bioinformatics online.
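The variant allele fraction mentioned above is conventionally the share of reads at a site that carry the variant allele. The sketch below computes it per cell barcode for one SNV. It is written in Python for illustration and does not use or reproduce the scSNViz R interface; the `min_depth` threshold and the input layout are assumptions.

```python
from typing import Optional


def variant_allele_fraction(ref_reads: int, alt_reads: int, min_depth: int = 1) -> Optional[float]:
    """alt / (ref + alt), or None when coverage is below min_depth."""
    if ref_reads < 0 or alt_reads < 0:
        raise ValueError("read counts must be non-negative")
    depth = ref_reads + alt_reads
    if depth == 0 or depth < min_depth:
        return None  # too few reads to estimate a fraction
    return alt_reads / depth


def vaf_per_cell(
    counts: dict[str, tuple[int, int]],  # barcode -> (ref_reads, alt_reads); assumed layout
    min_depth: int = 3,
) -> dict[str, Optional[float]]:
    """Apply the estimate to every cell barcode for a single SNV."""
    return {
        barcode: variant_allele_fraction(ref, alt, min_depth)
        for barcode, (ref, alt) in counts.items()
    }


if __name__ == "__main__":
    # Hypothetical barcodes and counts.
    print(vaf_per_cell({"AAAC": (3, 1), "AAAG": (1, 0), "AACT": (0, 6)}))
```

Returning `None` for low-coverage cells keeps them distinguishable from cells with a true fraction of zero, which matters when interpreting monoallelic expression.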
Title: scSNViz: Visualization and analysis of Cell-Specific expressed SNVs
Journal: Bioinformatics (Oxford, England)
Pub Date: 2026-01-13
DOI: 10.1093/bioinformatics/btag021
Huda Ahmad, Hannah M Doherty, Sam Benedict, James Haycocks, Ge Zhou, Patrick Moynihan, Danesh Moradigaravand, Manuel Banzhaf
Motivation: Chemical genomics is a powerful high-throughput approach to systematically link phenotypes to genotypes. However, the vast datasets generated remain challenging to explore due to the lack of integrated, interactive tools for visualisation and analysis. Existing workflows often require multiple independent software tools, limiting data accessibility and collaboration. Therefore, we created a user-friendly platform that enables efficient exploration and sharing of chemical genomics data.
Results: We developed ChemGenXplore, a web-based Shiny application designed to streamline the visualisation and analysis of chemical genomic screens. It offers two primary functionalities: one for exploring pre-implemented datasets and another for analysing user-uploaded datasets. ChemGenXplore enables users to visualise phenotypic profiles, assess gene-gene and condition-condition correlations, perform GO and KEGG enrichment analysis, and generate customisable, interactive heatmaps. To further support collaborative research, ChemGenXplore also facilitates the comparative analysis of chemical genomic and other omics datasets. By consolidating these features into a single interactive and accessible tool, ChemGenXplore facilitates data sharing, enhances reproducibility, and promotes collaboration within the research community.
Availability: ChemGenXplore is freely accessible as a web application at https://chemgenxplore.kaust.edu.sa/. Source code and documentation, including instructions for local installation, are provided on GitHub (https://github.com/Hudaahmadd/ChemGenXplore). A Docker image is also available on DockerHub (https://hub.docker.com/r/hudaahmad/chemgenxplore) to ensure reproducibility and simplify installation.
Contact: example@example.org.
Supplementary information: Supplementary data are available at Bioinformatics online.
Title: ChemGenXplore: An Interactive Tool for Exploring and Analysing Chemical Genomic Data
Journal: Bioinformatics (Oxford, England)
Pub Date: 2026-01-13
DOI: 10.1093/bioinformatics/btag002
Hayden C Metsky, Katherine J Siddle, Christian B Matranga, Pardis C Sabeti
Title: Best practices when benchmarking CATCH for the design of genome enrichment probes
Journal: Bioinformatics (Oxford, England)
Pub Date: 2026-01-12
DOI: 10.1093/bioinformatics/btag006
Yitao Xu, Guanyun Wei, Jingying Zhou, Yuanhua Huang, Weichua Yu, Zhixiang Lin, Ran Liu, Xiaodan Fan
Motivation: Accurate in silico identification of B-cell epitope residues is crucial for antibody design and structure-guided vaccine development. Although recent protein language models and structure-aware methods can capture spatial information of tertiary structure when generating residue embeddings, most existing epitope predictors use these embeddings to perform classification for individual residues one by one, without enforcing spatial continuity for reported epitope residues. Such methods often result in biologically implausible predictions because B-cell epitope residues always cluster together on the antigen surface.
Results: We present RoBep, a region-oriented B-cell epitope predictor that explicitly models the spatial clustering of epitope residues. RoBep introduces a novel region constraint mechanism and combines the advanced protein language model ESM-Cambrian with an equivariant graph neural network. Our method outperforms existing structure-based methods on the benchmark dataset, demonstrating improvements of 26%, 45%, 13%, and 43% in F1, MCC, AUPR, and AUROC0.1, respectively. In addition to residue-level predictions, RoBep can also provide antibody-antigen binding regions. Importantly, the predicted epitope residues are ensured to be spatially compact, enhancing biological plausibility and practical relevance for immunotherapeutic design.
Availability: A user-friendly website for using RoBep is provided at https://huggingface.co/spaces/NielTT/RoBep. All datasets, source code used in this work, and implementation instructions of the website are publicly available at https://github.com/YitaoXU/RoBep.
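Two of the residue-level metrics reported above, F1 and MCC, follow directly from the confusion counts of a binary epitope / non-epitope labelling. The sketch below uses the standard definitions; the example labels are hypothetical and the code is not taken from the RoBep repository.

```python
import math
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) for binary labels where 1 marks an epitope residue."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when there are no positives at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; 0.0 when any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    truth = [1, 1, 0, 0, 0, 1]       # hypothetical residue labels
    predicted = [1, 0, 0, 0, 1, 1]   # hypothetical predictions
    tp, tn, fp, fn = confusion_counts(truth, predicted)
    print(f1_score(tp, fp, fn), mcc(tp, tn, fp, fn))
```

MCC uses all four cells of the confusion matrix, so it is less flattering than accuracy on the heavily imbalanced residue sets typical of epitope prediction.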
Title: RoBep: A Region-Oriented Deep Learning Model for B-Cell Epitope Prediction
Journal: Bioinformatics (Oxford, England)
Pub Date: 2026-01-12
DOI: 10.1093/bioinformatics/btag012
Kaylee D Rich, James D Wasmuth
Motivation: Molecular mimicry is used by pathogens to evade the host immune system and manipulate other host cellular processes. It is often mediated by short motifs in non-homologous proteins, whose detection challenges the sensitivity and specificity of existing bioinformatics tools.
Results: We present mimicDetector, a k-mer-based pipeline for identifying protein-level molecular mimicry between pathogens and their hosts. Applied to 17 globally important pathogens, mimicDetector identified a broad and biologically plausible set of mimicry candidates, including helminth proteins mimicking components of the human complement system and a Leishmania infantum mimic of Reticulon-4, a regulator of immune cell recruitment.
Availability: mimicDetector is freely available at https://github.com/kayleerich/mimicDetector/, implemented in Python and Snakemake, and compatible with Unix-based systems.
Supplementary information: Data related to the results are incorporated into the article and online supplementary material available at Bioinformatics online.
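The core of a k-mer-based comparison can be sketched as finding short peptide substrings shared between a pathogen protein and a host protein. The example below uses exact matching only and an arbitrary k; the abstract does not specify mimicDetector's matching criteria, filters, or parameters, so this illustrates the general idea and not the pipeline itself.

```python
def kmers(sequence: str, k: int) -> set[str]:
    """All overlapping substrings of length k; empty if the sequence is shorter than k."""
    if k <= 0:
        raise ValueError("k must be positive")
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def shared_kmers(pathogen_seq: str, host_seq: str, k: int) -> list[str]:
    """Exact k-mers present in both proteins, sorted for reproducible output."""
    return sorted(kmers(pathogen_seq, k) & kmers(host_seq, k))


if __name__ == "__main__":
    # Hypothetical toy sequences, not real proteins.
    print(shared_kmers("MKTAYIAK", "GGTAYIGG", k=4))
```

Exact matching finds identical stretches but misses near-identical motifs with conservative substitutions, which is part of the sensitivity and specificity trade-off noted in the Motivation.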
mimicDetector: a pipeline for protein motif mimicry detection in host-pathogen interactions. Kaylee D Rich, James D Wasmuth. Bioinformatics (Oxford, England). Published 2026-01-12. DOI: https://doi.org/10.1093/bioinformatics/btag012
Pub Date: 2026-01-11 DOI: 10.1093/bioinformatics/btag016
Joseph Thorpe, Nina Billows, Gabrielle C Ngwana-Joseph, Amy Ibrahim, Deborah Nolder, Colin J Sutherland, Nguyen Thi Hong Ngoc, Nguyen Thi Huong Binh, Nguyen Quang Thieu, Jamille G Dombrowski, Silvia Maria Di Santi, Claudio R F Marinho, Jody E Phelan, Tomasz Kurowski, Fady Mohareb, Susana Campino, Taane G Clark
Motivation: Malaria, caused by Plasmodium parasites, imposes a significant public health burden. While Plasmodium falciparum remains the primary target of elimination strategies due to its high mortality rate, lesser-known species such as P. malariae, P. vivax, and P. knowlesi continue to contribute to substantial human morbidity. Genomic approaches, including whole-genome sequencing, offer powerful tools for understanding the biology, transmission, and emerging drug resistance of these neglected Plasmodium species. However, there is an urgent need for informatic tools to summarise and visualise the high-dimensional and complex genomic data generated.
Results: We developed Malaria-GENOMAP, a user-friendly web-based tool, which integrates genomic variant data, such as allele frequencies, with geographical maps and chromosome-wide to gene views for in-depth exploration. The tool includes variation from P. knowlesi (n = 139), P. malariae (n = 158), P. ovale curtisi (n = 36), P. ovale wallikeri (n = 47), P. simium (n = 38), and P. vivax (n = 1,359). It enables the investigation of population structure, geographic associations of mutations, and putative drug resistance markers, offering valuable insights for malaria control efforts.
Availability: Malaria-GENOMAP is available online at https://genomics.lshtm.ac.uk/malaria-genomaps/#/.
Supplementary information: Supplementary data are available at Bioinformatics online.
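As an illustration of the kind of summary such a tool displays, the sketch below computes the frequency of an alternate allele per sampling region from per-isolate calls. It is not taken from Malaria-GENOMAP. The function name, region labels, and calls are hypothetical, and each isolate is assumed to contribute a single allele call, which is a simplification.

```python
from collections import defaultdict


def allele_frequency_by_region(calls):
    """Return region -> alternate-allele frequency.

    calls is an iterable of (region, has_alt_allele) pairs, one per isolate.
    Treating each isolate as a single call is an assumption for illustration.
    """
    counts = defaultdict(lambda: [0, 0])  # region -> [alt count, total isolates]
    for region, has_alt in calls:
        counts[region][0] += int(has_alt)
        counts[region][1] += 1
    return {region: alt / total for region, (alt, total) in counts.items()}


# Invented calls for one variant; not real data.
calls = [
    ("Region A", True), ("Region A", False),
    ("Region A", True), ("Region A", False),
    ("Region B", False), ("Region B", False),
]
freqs = allele_frequency_by_region(calls)
```

A dictionary keyed by region in this form could then be joined to map coordinates for display.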
Malaria-GENOMAP: A web-based tool for exploring genomic variation of malaria parasites. Bioinformatics (Oxford, England). Published 2026-01-11. DOI: https://doi.org/10.1093/bioinformatics/btag016