Pub Date: 2026-01-14 DOI: 10.1093/bioinformatics/btag028
Chenyang Xie, Yingying Song, Song He, Xiaochen Bo, Zhongnan Zhang
Motivation: The goal of molecular representation learning is to automate the extraction of molecular features, a critical task in cheminformatics and drug discovery. While pretraining models using multiple views like SMILES, two-dimensional graphs, and three-dimensional conformations have advanced the field, integrating them effectively to produce superior representations remains a challenge.
Results: To bridge this gap, we propose a novel multi-view molecular pretraining method termed MMPCS, which explicitly factorizes representations into consistency information and view-specific information. Our approach utilizes the Graph Isomorphism Network and the RoBERTa model to encode two-dimensional molecular topological graphs and SMILES sequences, respectively. Each resulting molecular embedding is decomposed into a shared consistency component and a view-specific remainder. An autoencoder then aligns the consistency information across views. The combined consistency and view-specific representations serve as input for downstream tasks, enabling precise and task-aware predictions. When benchmarked against 16 state-of-the-art molecular pretraining methods, MMPCS achieved the highest average performance across both classification and regression tasks for molecular property prediction. It also delivered outstanding results in predicting drug-target binding affinity and cancer drug response, demonstrating its robustness and broad applicability. Additionally, a case study on the SARS-CoV-2 Omicron variant highlights the potential of MMPCS in facilitating drug repurposing efforts.
Availability and implementation: The source code and datasets supporting this study are publicly available at GitHub (https://github.com/xmubiocode/MMPCS) and Zenodo (https://doi.org/10.5281/zenodo.18182748).
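The decomposition described in the MMPCS Results can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the MMPCS code: the function names are hypothetical, each embedding is split by slicing at a fixed index, and a mean squared difference stands in for the autoencoder-based alignment named in the abstract.

```python
from typing import List, Tuple

Vector = List[float]


def split_embedding(embedding: Vector, consistency_dim: int) -> Tuple[Vector, Vector]:
    """Split one view's embedding into (consistency, view-specific) parts.

    Slicing at a fixed index is an assumption made for illustration.
    """
    if not 0 < consistency_dim < len(embedding):
        raise ValueError("consistency_dim must fall strictly inside the embedding")
    return embedding[:consistency_dim], embedding[consistency_dim:]


def alignment_loss(cons_a: Vector, cons_b: Vector) -> float:
    """Mean squared difference between two consistency vectors.

    A simple stand-in for the autoencoder-based alignment in the abstract.
    """
    if len(cons_a) != len(cons_b):
        raise ValueError("consistency vectors must have equal length")
    return sum((a - b) ** 2 for a, b in zip(cons_a, cons_b)) / len(cons_a)


def downstream_input(graph_emb: Vector, smiles_emb: Vector, consistency_dim: int) -> Vector:
    """Concatenate the averaged consistency part with both view-specific parts."""
    g_cons, g_spec = split_embedding(graph_emb, consistency_dim)
    s_cons, s_spec = split_embedding(smiles_emb, consistency_dim)
    shared = [(g + s) / 2 for g, s in zip(g_cons, s_cons)]
    return shared + g_spec + s_spec
```

In this sketch, a pretraining step would minimize `alignment_loss` over the two consistency parts, while a downstream predictor would consume the vector returned by `downstream_input`.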
Title: MMPCS: multi-view molecular pretraining based on consistency information and specific information.
Pub Date: 2026-01-14 DOI: 10.1093/bioinformatics/btag029
Hang Wei, Yuran Xie, Wenxiang Zhang, Linyang Li, Shuai Wu, Lin Gao
Motivation: Identifying non-coding RNAs (ncRNAs) associated with drug resistance is critical for elucidating molecular mechanisms underlying drug response, facilitating drug screening, and discovering novel therapeutic targets. While several graph neural network-based methods have been proposed to infer ncRNA-drug resistance associations, they remain fundamentally constrained by the semantic distortion induced by sparse bipartite networks and by their neglect of relational semantics among molecular entities, which ultimately compromises both predictive reliability and biological interpretability.
Results: In this study, we propose iNcRD-HG, a novel framework for identifying ncRNA-drug resistance associations. The framework addresses three critical aspects: constructing a context-enriched heterogeneous network that integrates six distinct molecular interaction types with bio-entity-specific attributes, developing a semantic-enhanced graph learning architecture that implements relation-type-aware message passing to capture complex contextual dependencies, and introducing an interpretability mechanism to reveal potential synergistic pathways underlying drug response. Experimental results demonstrate that iNcRD-HG achieves superior predictive performance across diverse benchmark datasets while deriving association features with strong discriminative capability. By identifying molecular synergistic contexts, iNcRD-HG provides mechanistically interpretable insights into ncRNA-mediated drug resistance.
Availability and implementation: Datasets and source code are available at https://github.com/Biohang/iNcRD-HG.
Supplementary information: Supplementary data are available at Bioinformatics online.
Title: Semantic-Enhanced heterogeneous graph learning for identifying ncRNAs associated with drug resistance.
Pub Date: 2026-01-14 DOI: 10.1093/bioinformatics/btag010
Huaiwu Zhang, Xinliang Sun, Jianxin Wang, Min Li, Jing Tang
Motivation: Drug synergy is crucial for developing effective combination therapies, but traditional screening methods suffer from inefficiency and high costs. While deep learning shows promise for predicting drug synergy, current approaches using Transformers and graph neural networks focus on combining drug and cell line features without modelling how genes causally influence drug responses.
Results: To address this limitation, we propose CADS (Causal Adjustment for Drug Synergy), a deep learning framework that integrates causal relationships between genes and drug responses. Leveraging multi-omics data, CADS uses a learnable mask mechanism to identify key causal genes while filtering out irrelevant genetic factors through backdoor adjustment. Our model achieves two key objectives simultaneously: accurate prediction of drug synergy and interpretable causal gene discovery. Experiments on multiple datasets show that CADS consistently outperforms state-of-the-art methods across multiple metrics. Case studies demonstrate that CADS can reduce unnecessary complexity while providing more biological insights through its gene importance scores, which help identify clinically validated cancer-related genes that mediate drug interactions.
Availability and implementation: Taken together, CADS advances combination therapy prediction by explicitly modelling drug synergy causal genes, offering enhanced interpretability for AI-based drug development. The source code can be found at https://github.com/HuaiwuZhang/causalDC.
Supplementary information: Supplementary data are available at Bioinformatics online.
Title: CADS: A Causal Inference Framework for Identifying Essential Genes to Enhance Drug Synergy Prediction.
Motivation: Aberrant DNA methylation is a fundamental epigenetic hallmark of cancer. However, existing resources often lack technological diversity and comprehensive cancer coverage. Furthermore, most platforms fail to achieve deep multi-omics integration and tend to ignore cancer-type-specific methylation features, limiting their utility in precision oncology and drug discovery.
Results: We developed CMAtlas (Cancer Methylation Atlas), a comprehensive platform integrating 13,753 samples across 34 cancer types. By applying technology-tailored pipelines to data from various profiling technologies, we identified 830,725 tumor-specific differentially methylated elements (DMEs) and 1,480,098 differentially methylated regions (DMRs), alongside 1,154,256 cancer-type-specific DMEs and 329,154 DMRs. The platform demonstrates high cross-platform consistency and strong concordance between tumor tissues and cell lines, ensuring the robustness of our findings. All DMEs and DMRs are annotated with multi-omics data (RNA expression, somatic mutations, and chromatin accessibility) and clinical relevance (survival associations and cell-free DNA profiling). We further demonstrate the utility of CMAtlas by identifying prognostic aberrant methylation in colorectal cancer driver genes.
Availability: CMAtlas is freely accessible at https://cmatlas.renlab.cn/. The platform offers an intuitive web interface supporting gene-centric and cancer-centric queries, alongside customizable analysis modules designed to facilitate user-specific research needs.
Supplementary information: Supplementary data are available at Bioinformatics online.
Title: CMAtlas: a comprehensive DNA methylation atlas for exploring epigenetic alterations in 34 human cancer types.
Pub Date: 2026-01-14 DOI: 10.1093/bioinformatics/btag022
Mengni Liu, Lizhen Jiang, Luowanyue Zhang, Tianjian Chen, Xingzhe Wang, Yuan Liang, Xianping Shi, Jian Ren, Yueyuan Zheng
Pub Date: 2026-01-14 DOI: 10.1093/bioinformatics/btag007
Alexandra M Wong, Cecile Meier-Scherling, Lorin Crawford
Motivation: Predicting synergistic cancer drug combinations through computational methods offers a scalable approach to creating therapies that are more effective and less toxic. However, most algorithms focus solely on synergy without considering toxicity when selecting optimal drug combinations. In the absence of combinatorial toxicity assays, a few models use toxicity penalties to balance high synergy with lower toxicity. Still, these penalties have not been explicitly validated against known drug-drug interactions.
Results: In this study, we examine whether synergy scores and toxicity metrics correlate with known adverse drug interactions. While some metrics show trends with toxicity levels, our results reveal significant limitations in using them as penalties. These findings highlight the challenges of incorporating toxicity into synergy prediction frameworks and suggest that advancing the field requires more comprehensive combination toxicity data.
Availability and implementation: The code written for this project is available at https://github.com/amw14/toxicity-cancer-drug-combination.
Title: Characterizing Clinical Toxicity in Cancer Combination Therapies.
Pub Date: 2026-01-14 DOI: 10.1093/bioinformatics/btag008
Adriana Carolina Gonzalez-Cavazos, Roger Tu, Meghamala Sinha, Andrew I Su
Drug repositioning offers a cost-effective alternative to traditional drug development by identifying new uses for existing drugs. Recent advances leverage Graph Neural Networks (GNNs) to model complex biological data, showing promise in predicting novel drug-disease associations. However, these frameworks often lack explainability, a critical factor for validating predictions and understanding drug mechanisms. Here, we introduce Drug-Based Reasoning Explainer (DBR-X), an explainable GNN model that combines a link prediction module and a path-identification module to generate interpretable and faithful explanations. When benchmarked against other GNN link prediction frameworks, DBR-X achieves superior performance in identifying known drug-disease associations, demonstrating higher accuracy across all evaluation metrics. The quality of DBR-X biological explanations was assessed through multiple approaches: comparison with manually curated drug mechanisms, evaluation of explanation faithfulness through deletion and insertion studies, and measurement of stability under graph perturbations. Together, our model not only advances the state of the art in drug repositioning predictions but also provides multi-hop explanations that can accelerate the translation of computational predictions into clinical applications.
Title: A Case-Based Explainable Graph Neural Network Framework for Mechanistic Drug Repositioning.
Pub Date: 2026-01-14 DOI: 10.1093/bioinformatics/btag014
Xinyi Tang, Ran Liu
Motivation: Sequence motif identification is crucial for understanding molecular recognition, particularly in immune responses involving peptide binding to MHC class I molecules for antigen presentation to T cells. Traditionally, MHC class I binding motifs are assumed to be contiguous and span nine amino acids. However, structural evidence suggests that binding may involve non-adjacent residues, challenging the assumptions of existing methods.
Results: In this study, we propose GAMMA (Gap-Aware Motif Mining Algorithm), a probabilistic framework designed to identify non-contiguous motifs under conditions of incomplete labeling. GAMMA employs Bayesian inference with MCMC sampling to jointly estimate motif parameters, binding locations, and the relative spacing between binding positions. Through extensive simulations and real-world applications to MHC class I peptide datasets, GAMMA outperforms existing motif discovery tools such as GLAM2 in accurately localizing binding residues and identifying the underlying motifs. Notably, our results suggest that the true number of binding residues may be eight, fewer than the commonly assumed nine. In addition, for longer peptides, the model captures increased flexibility in the central region, consistent with structural observations that peptides may bulge in the middle.
Availability: The raw data and the source code are available on GitHub (https://github.com/RanLIUaca/GAMMAmotif).
Supplementary information: Supplementary data are available at Bioinformatics online.
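To make the notion of a non-contiguous motif in the GAMMA Results concrete, the sketch below scores one candidate placement of a gapped motif on a peptide. It is an illustration under stated assumptions, not the GAMMA implementation: the function names, the representation of a motif as a list of per-position residue probabilities, and the probability floor are all hypothetical, and the MCMC sampling over placements and spacings is not shown.

```python
import math
from typing import Dict, List, Sequence


def binding_positions(start: int, gaps: Sequence[int]) -> List[int]:
    """Return the peptide indices of the binding residues.

    gaps[i] is the number of non-binding residues between binding
    position i and binding position i + 1 (0 means adjacent).
    """
    positions = [start]
    for gap in gaps:
        positions.append(positions[-1] + 1 + gap)
    return positions


def placement_log_score(
    peptide: str,
    start: int,
    gaps: Sequence[int],
    motif: Sequence[Dict[str, float]],
    floor: float = 1e-6,
) -> float:
    """Log-probability of the binding residues under one gapped placement.

    motif[k] maps residue letters to probabilities at binding position k.
    Residues missing from motif[k] receive the small `floor` probability
    (an assumption made to avoid log(0)).
    """
    if len(gaps) != len(motif) - 1:
        raise ValueError("need exactly one gap between consecutive motif positions")
    positions = binding_positions(start, gaps)
    if start < 0 or positions[-1] >= len(peptide):
        return float("-inf")  # placement does not fit on the peptide
    return sum(
        math.log(max(motif[k].get(peptide[pos], 0.0), floor))
        for k, pos in enumerate(positions)
    )
```

For example, `binding_positions(0, [0, 2, 0])` yields indices 0, 1, 4, and 5, so two residues in the middle of the peptide are skipped. A sampler in the spirit of the abstract would compare such scores across different `start` and `gaps` values.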
Title: GAMMA: Gap-aware Motif Mining under Incomplete Labeling with Applications to MHC Motifs.
Pub Date: 2026-01-14 DOI: 10.1093/bioinformatics/btag011
Darya Shlyk, Lawrence Hunter
Motivation: Biomedical Entity Linking (BEL) maps mentions in biomedical text to standardized identifiers, enabling structured data integration and downstream knowledge discovery. However, current BEL systems remain fundamentally constrained by the recall of the initial candidate pool, where suboptimal retrieval limits the overall effectiveness of the normalization pipeline.
Results: We present the first systematic evaluation of Generative Relevance Feedback (GRF) for enhancing candidate retrieval in state-of-the-art BEL systems. GRF leverages large language models (LLMs) to enrich the expressiveness of the mention in a zero-shot fashion. We assess GRF's impact under two scenarios (direct linking prediction and candidate generation in cascading normalization pipelines) and analyze its sensitivity to different LLMs, feedback types, and integration strategies. Experiments across eight corpora and four biomedical knowledge bases demonstrate that integrating GRF significantly improves both accuracy and recall, thereby increasing the upper bound on normalization performance. Our findings highlight GRF as an efficient, model-agnostic solution and underscore its potential as a key component for advancing BEL.
Availability: The code to reproduce our experiments can be found at: https://doi.org/10.5281/zenodo.17853541.
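The idea of enriching a mention before candidate retrieval, as described in the GRF Results, can be sketched with a toy retriever. This is a minimal illustration under stated assumptions, not the authors' pipeline: the feedback text is passed in as a plain string rather than produced by an LLM, the retriever is a simple token-overlap (Jaccard) ranker, and the identifiers `ID_A` and `ID_B` are hypothetical.

```python
from typing import Dict, List, Set


def tokens(text: str) -> Set[str]:
    """Lowercased whitespace tokens of a string."""
    return set(text.lower().split())


def expand_mention(mention: str, feedback: str) -> str:
    """Append generated feedback text to the original mention."""
    return f"{mention} {feedback}"


def rank_candidates(query: str, candidates: Dict[str, str], top_k: int) -> List[str]:
    """Rank knowledge-base entries by Jaccard overlap with the query.

    candidates maps an identifier to its canonical name. Ties are
    broken by identifier so the output is deterministic.
    """
    query_tokens = tokens(query)
    scored = []
    for identifier, name in candidates.items():
        name_tokens = tokens(name)
        union = query_tokens | name_tokens
        score = len(query_tokens & name_tokens) / len(union) if union else 0.0
        scored.append((score, identifier))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [identifier for _, identifier in scored[:top_k]]


# Hypothetical two-entry knowledge base and hand-written feedback text.
kb = {"ID_A": "panic attack", "ID_B": "myocardial infarction"}
plain = rank_candidates("heart attack", kb, top_k=1)
enriched = rank_candidates(
    expand_mention("heart attack", "myocardial infarction of the heart muscle"),
    kb,
    top_k=1,
)
```

With the bare mention, the surface overlap on "attack" favors `ID_A`; with the added feedback text, `ID_B` ranks first, which is the kind of recall gain in the candidate pool that the abstract attributes to GRF.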
Title: Improving biomedical entity linking with generative relevance feedback.
Pub Date : 2026-01-14 DOI: 10.1093/bioinformatics/btag023
Siera Martinez, Tushar Sharma, Luke Johnson, Allen Kim, Vania Ballesteros Prieto, Hovhannes Arestakesyan, Sunisha Harish, Jewel Dias, Joseph Goldfrank, Nathan Edwards, Anelia Horvath
Motivation: Accurately characterizing expressed genetic variation at the single-cell level is essential for understanding transcriptional heterogeneity, allelic regulation, and mutational dynamics within complex tissues. However, few tools enable comprehensive visualization and quantitative analysis of expressed variants across individual cells.
Results: scSNViz is an R package for the exploration, quantification, and visualization of expressed single-nucleotide variants (SNVs) from cell-barcoded single-cell RNA sequencing (scRNA-seq) data. The software supports estimation of variant allele fractions, clustering of SNV expression profiles, and 2D and 3D visualization of individual SNVs or user-defined SNV groups. Beyond visualization, scSNViz facilitates investigation of cell-, cluster-, or lineage-specific variant expression patterns, as well as allelic dynamics including imprinting, random allele inactivation, and transcriptional bursting. It interoperates seamlessly with established single-cell frameworks (Seurat for clustering, Slingshot for trajectory inference, scType for cell-type annotation, and CopyKat for copy-number profiling), enabling integrative multi-omic analyses of expressed variation.
Availability: scSNViz is implemented in R and freely available at https://github.com/HorvathLab/scSNViz (DOI: 10.5281/zenodo.17307516). The package includes comprehensive documentation and example workflows designed for users with limited bioinformatics experience.
Supplementary information: Supplementary data are available at Bioinformatics online.
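As an illustration of the variant allele fraction mentioned in the Results, the sketch below computes a per-cell fraction as variant reads divided by total (reference plus variant) reads at one SNV site. It is written in Python rather than R and does not use the scSNViz API; the function name, the `min_reads` threshold, and the read counts are hypothetical.

```python
from typing import Optional


def variant_allele_fraction(n_ref: int, n_var: int, min_reads: int = 1) -> Optional[float]:
    """Return n_var / (n_ref + n_var), or None when coverage is too low.

    Hypothetical helper for illustration only; not part of scSNViz.
    """
    if n_ref < 0 or n_var < 0:
        raise ValueError("read counts must be non-negative")
    total = n_ref + n_var
    if total == 0 or total < min_reads:
        return None  # no fraction can be estimated without enough reads
    return n_var / total


# Made-up read counts at one SNV site: barcode -> (reference reads, variant reads)
counts = {"cell_A": (8, 2), "cell_B": (0, 5), "cell_C": (0, 0)}
vaf_by_cell = {cell: variant_allele_fraction(r, v) for cell, (r, v) in counts.items()}
print(vaf_by_cell)
```

Returning `None` for uncovered cells keeps "no reads" distinct from "no variant reads", which matters when most cells in sparse scRNA-seq data have no coverage at a given site.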
{"title":"scSNViz: Visualization and analysis of Cell-Specific expressed SNVs.","authors":"Siera Martinez, Tushar Sharma, Luke Johnson, Allen Kim, Vania Ballesteros Prieto, Hovhannes Arestakesyan, Sunisha Harish, Jewel Dias, Joseph Goldfrank, Nathan Edwards, Anelia Horvath","doi":"10.1093/bioinformatics/btag023","DOIUrl":"https://doi.org/10.1093/bioinformatics/btag023","url":null,"abstract":"<p><strong>Motivation: </strong>Accurately characterizing expressed genetic variation at the single-cell level is essential for understanding transcriptional heterogeneity, allelic regulation, and mutational dynamics within complex tissues. However, few tools enable comprehensive visualization and quantitative analysis of expressed variants across individual cells.</p><p><strong>Results: </strong>scSNViz is an R package for the exploration, quantification, and visualization of expressed single-nucleotide variants (SNVs) from cell-barcoded single-cell RNA sequencing (scRNA-seq) data. The software supports estimation of variant allele fractions, clustering of SNV expression profiles, and 2D and 3D visualization of individual SNVs or user-defined SNV groups. Beyond visualization, scSNViz facilitates investigation of cell-, cluster-, or lineage-specific variant expression patterns, as well as allelic dynamics including imprinting, random allele inactivation, and transcriptional bursting. It interoperates seamlessly with established single-cell frameworks (Seurat for clustering, Slingshot for trajectory inference, scType for cell-type annotation, and CopyKat for copy-number profiling), enabling integrative multi-omic analyses of expressed variation.</p><p><strong>Availability: </strong>scSNViz is implemented in R and freely available at https://github.com/HorvathLab/scSNViz (DOI: 10.5281/zenodo.17307516). 
The package includes comprehensive documentation and example workflows designed for users with limited bioinformatics experience.</p><p><strong>Supplementary information: </strong>Supplementary data are available at Bioinformatics online.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":5.4,"publicationDate":"2026-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145986024","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-13 DOI: 10.1093/bioinformatics/btag021
Huda Ahmad, Hannah M Doherty, Sam Benedict, James Haycocks, Ge Zhou, Patrick Moynihan, Danesh Moradigaravand, Manuel Banzhaf
Motivation: Chemical genomics is a powerful high-throughput approach to systematically link phenotypes to genotypes. However, the vast datasets generated remain challenging to explore due to the lack of integrated, interactive tools for visualisation and analysis. Existing workflows often require multiple independent software tools, limiting data accessibility and collaboration. Therefore, we created a user-friendly platform that enables efficient exploration and sharing of chemical genomics data.
Results: We developed ChemGenXplore, a web-based Shiny application designed to streamline the visualisation and analysis of chemical genomic screens. It offers two primary functionalities: one for exploring pre-implemented datasets and another for analysing user-uploaded datasets. ChemGenXplore enables users to visualise phenotypic profiles, assess gene-gene and condition-condition correlations, perform GO and KEGG enrichment analysis, and generate customisable, interactive heatmaps. To further support collaborative research, ChemGenXplore also facilitates the comparative analysis of chemical genomic and other omics datasets. By consolidating these features into a single interactive and accessible tool, ChemGenXplore facilitates data sharing, enhances reproducibility, and promotes collaboration within the research community.
Availability: ChemGenXplore is freely accessible as a web application at https://chemgenxplore.kaust.edu.sa/. Source code and documentation, including instructions for local installation, are provided on GitHub (https://github.com/Hudaahmadd/ChemGenXplore). A Docker image is also available on DockerHub (https://hub.docker.com/r/hudaahmad/chemgenxplore) to ensure reproducibility and simplify installation.
Contact: example@example.org.
Supplementary information: Supplementary data are available at Bioinformatics online.
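To make the gene-gene correlation step in the Results concrete, the sketch below correlates the phenotypic profiles of two genes across the same set of conditions. It assumes a Pearson coefficient, which the abstract does not specify, and it is not ChemGenXplore code; the function name, gene labels, and fitness scores are hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length score profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


# Made-up fitness scores for three genes across the same four conditions
scores = {
    "geneA": [-2.0, 0.5, 1.0, 3.0],
    "geneB": [-1.0, 0.25, 0.5, 1.5],   # geneA scaled by 0.5
    "geneC": [2.0, -0.5, -1.0, -3.0],  # geneA with the sign flipped
}
print(pearson(scores["geneA"], scores["geneB"]))
print(pearson(scores["geneA"], scores["geneC"]))
```

Applying the same function to the transposed table (one profile per condition across genes) would give condition-condition correlations.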
{"title":"ChemGenXplore: An Interactive Tool for Exploring and Analysing Chemical Genomic Data.","authors":"Huda Ahmad, Hannah M Doherty, Sam Benedict, James Haycocks, Ge Zhou, Patrick Moynihan, Danesh Moradigaravand, Manuel Banzhaf","doi":"10.1093/bioinformatics/btag021","DOIUrl":"https://doi.org/10.1093/bioinformatics/btag021","url":null,"abstract":"<p><strong>Motivation: </strong>Chemical genomics is a powerful high-throughput approach to systematically link phenotypes to genotypes. However, the vast datasets generated remain challenging to explore due to the lack of integrated, interactive tools for visualisation and analysis. Existing workflows often require multiple independent software tools, limiting data accessibility and collaboration. Therefore, we created a user-friendly platform that enables efficient exploration and sharing of chemical genomics data.</p><p><strong>Results: </strong>We developed ChemGenXplore, a web-based Shiny application designed to streamline the visualisation and analysis of chemical genomic screens. It offers two primary functionalities: one for exploring pre-implemented datasets and another for analysing user-uploaded datasets. ChemGenXplore enables users to visualise phenotypic profiles, assess gene-gene and condition-condition correlations, perform GO and KEGG enrichment analysis, and generate customisable, interactive heatmaps. To further support collaborative research, ChemGenXplore also facilitates the comparative analysis of chemical genomic and other omics datasets. By consolidating these features into a single interactive and accessible tool, ChemGenXplore facilitates data sharing, enhances reproducibility, and promotes collaboration within the research community.</p><p><strong>Availability: </strong>ChemGenXplore is freely accessible as a web application at https://chemgenxplore.kaust.edu.sa/. 
Source code and documentation, including instructions for local installation, are provided on GitHub (https://github.com/Hudaahmadd/ChemGenXplore). A Docker image is also available on DockerHub (https://hub.docker.com/r/hudaahmad/chemgenxplore) to ensure reproducibility and simplify installation.</p><p><strong>Contact: </strong>example@example.org.</p><p><strong>Supplementary information: </strong>Supplementary data are available at Bioinformatics online.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":5.4,"publicationDate":"2026-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145967397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}