Pub Date: 2026-01-19; DOI: 10.1093/bioinformatics/btag036
Zhihui Zhu, Huapeng Liu, Xuechen Li, Haojin Zhou, Jiaqi Wang
Motivation: Short peptides hold significant promise in drug discovery and materials science owing to their biocompatibility, multifunctionality, and ease of synthesis. However, accurately predicting their physicochemical properties, a prerequisite for application development, remains a grand challenge because of the sheer number of possible peptide sequences.
Results: This study presents an approach that combines uniform design (UD), used to sample the whole sequence space, with artificial intelligence (AI) models trained on the sampled data. The goal is to improve prediction of key physicochemical properties, including aggregation propensity (AP), hydrophilicity (logP), and isoelectric point (pI), across the complete sequence space of tetrapeptides (20^4 = 160,000 sequences). Using UD, we generate 31 distinct peptide datasets in which each amino acid occupies a consistent 5% of every position. This yields unbiased training data with no amino acid preference for training AI models. This work provides comprehensive datasets on the physicochemical properties of all tetrapeptides and develops robust AI-based predictive models. It also uses Shapley Additive Explanations (SHAP) analysis to quantify the relationships between key physicochemical attributes and the self-assembly behavior of short peptides. By integrating strategic experimental design (i.e., UD), AI modeling, and peptide domain knowledge, our approach facilitates the discovery and optimization of functional peptides and opens new opportunities for peptide-based therapeutic applications.
Availability: The complete datasets, source code, and pre-trained models are available in the GitHub repository (https://github.com/JiaqiBenWang/UD-AI-Peptide) and on Zenodo (https://doi.org/10.5281/zenodo.17984124).
Supplementary information: Supplementary data are available at Bioinformatics online.
Title: Uniform Design-Embedded Predictions of (Tetra-)Peptide Physicochemical Properties; Journal: Bioinformatics (Oxford, England)
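A simple balanced construction can illustrate the 5% per-position occupancy described above. This sketch is not the authors' uniform-design table. It assumes a Latin-square-style rule, implemented in a hypothetical function `balanced_tetrapeptides`, that builds 400 tetrapeptides. In the result, every amino acid fills exactly 20 of the 400 slots (5%) at each of the four positions. Uniform designs are additionally chosen to minimise a discrepancy criterion, and this sketch does not attempt that.

```python
from collections import Counter

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def balanced_tetrapeptides(shift=0):
    """Hypothetical Latin-square-style design: 400 tetrapeptides in which
    every amino acid occupies exactly 5% of each position."""
    n = len(AMINO_ACIDS)
    design = []
    for i in range(n):
        for j in range(n):
            # Positions 3 and 4 are modular combinations of positions 1 and 2.
            # For any fixed j, varying i hits every residue once, so each
            # position stays balanced.
            idx = (i, j, (i + j + shift) % n, (i + 3 * j + shift) % n)
            design.append("".join(AMINO_ACIDS[k] for k in idx))
    return design


def position_occupancy(peptides):
    """Fraction of peptides carrying each residue at each position."""
    total = len(peptides)
    return [
        {aa: count / total for aa, count in Counter(p[pos] for p in peptides).items()}
        for pos in range(len(peptides[0]))
    ]


if __name__ == "__main__":
    design = balanced_tetrapeptides()
    occupancy = position_occupancy(design)
    print(len(design), "of", len(AMINO_ACIDS) ** 4, "tetrapeptides sampled")
    print(all(abs(f - 0.05) < 1e-12 for pos in occupancy for f in pos.values()))
```

Changing `shift` changes positions 3 and 4 for every (i, j) pair. Designs built with different shifts therefore share no sequences, which is one simple way to obtain several non-overlapping balanced training sets.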
Pub Date: 2026-01-19; DOI: 10.1093/bioinformatics/btag030
Isis Narváez-Bandera, Ashley Lui, Yonatan Ayalew Mekonnen, Vanessa Rubio, Augustine Takyi, Noah Sulman, Christopher Wilson, Hayley D Ackerman, Oscar E Ospina, Guillermo Gonzalez-Calderon, Elsa Flores, Qian Li, Ann Chen, Brooke Fridley, Paul Stewart
Summary: Integrative Module Analysis for Multi-omics Data (iModMix) is a biology-agnostic framework that enables the discovery of novel associations across any type of quantitative abundance data, including but not limited to transcriptomics, proteomics, and metabolomics. Instead of relying on pathway annotations or prior biological knowledge, iModMix constructs data-driven modules by using graphical lasso to estimate sparse networks from omics features. These modules are summarized into eigenfeatures and correlated across datasets for horizontal integration, while preserving the distinct feature sets and interpretability of each omics type. iModMix operates directly on matrices of expression or abundance values for a wide range of features, including but not limited to genes, proteins, and metabolites. Because it does not rely on annotations (e.g., KEGG identifiers), it can seamlessly incorporate both identified and unidentified metabolites, addressing a key limitation of many existing metabolomics tools. iModMix is available as a user-friendly R Shiny application that requires no programming expertise (https://imodmix.moffitt.org), and as a Bioconductor R package for advanced users (https://bioconductor.org/packages/release/bioc/html/iModMix.html). The tool includes several public and in-house datasets that illustrate its utility in identifying novel multi-omics relationships in diverse biological contexts.
Availability and implementation: iModMix is freely available from Bioconductor (https://bioconductor.org/packages/release/bioc/html/iModMix.html). The example dataset package (iModMixData) is also available from Bioconductor (https://bioconductor.org/packages/release/data/experiment/html/iModMixData.html). The R package source code and Docker materials are available from GitHub: https://github.com/biodatalab/iModMix. The Shiny application can be accessed at https://imodmix.moffitt.org.
Supplementary information: Supplementary data are available at Bioinformatics online.
Title: iModMix: Integrative Module Analysis for Multi-omics Data; Journal: Bioinformatics (Oxford, England)
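The iModMix workflow builds a sparse network per omics layer, cuts it into modules, summarizes each module as an eigenfeature, and correlates eigenfeatures across layers. The sketch below walks through those four steps under a loudly stated assumption. It replaces graphical lasso with a simpler technique: a ridge-regularised inverse correlation matrix with a hard threshold on partial correlations. Modules are taken as connected components, and the eigenfeature is the first principal component of the standardized module features. All function names, the `ridge` value and the `threshold` value are hypothetical. None of this is the iModMix package API.

```python
import numpy as np


def partial_correlation_network(X, ridge=1e-2, threshold=0.15):
    """Sparse feature network from thresholded partial correlations.

    Stand-in for graphical lasso: a ridge-regularised inverse correlation
    matrix is thresholded instead of fitting an L1-penalised precision
    matrix. X has shape (samples, features).
    """
    R = np.corrcoef(X, rowvar=False)
    theta = np.linalg.inv(R + ridge * np.eye(R.shape[0]))
    d = np.sqrt(np.diag(theta))
    pcor = -theta / np.outer(d, d)
    np.fill_diagonal(pcor, 0.0)
    return np.abs(pcor) > threshold


def modules_from_network(adj):
    """Label connected components of the adjacency matrix as modules."""
    labels = np.full(adj.shape[0], -1)
    current = 0
    for start in range(adj.shape[0]):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            node = stack.pop()
            for nb in np.flatnonzero(adj[node]):
                if labels[nb] < 0:
                    labels[nb] = current
                    stack.append(nb)
        current += 1
    return labels


def eigenfeature(X, members):
    """First principal component of the standardized module features."""
    Z = X[:, members]
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    ef = U[:, 0] * S[0]
    # Fix the arbitrary SVD sign so the eigenfeature tracks the module average.
    if np.corrcoef(ef, Z.mean(axis=1))[0, 1] < 0:
        ef = -ef
    return ef


def module_eigenfeatures(X, **kwargs):
    """Return module labels and a {module: eigenfeature} mapping."""
    labels = modules_from_network(partial_correlation_network(X, **kwargs))
    return labels, {m: eigenfeature(X, np.flatnonzero(labels == m)) for m in np.unique(labels)}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    z = rng.normal(size=n)  # hypothetical latent biological signal
    A = np.column_stack([z + 0.3 * rng.normal(size=n) for _ in range(3)]
                        + [rng.normal(size=n) for _ in range(3)])
    B = np.column_stack([z + 0.3 * rng.normal(size=n) for _ in range(4)])
    labels_a, ef_a = module_eigenfeatures(A)
    labels_b, ef_b = module_eigenfeatures(B)
    print("cross-omics eigenfeature correlation:",
          np.corrcoef(ef_a[labels_a[0]], ef_b[labels_b[0]])[0, 1])
```

In this synthetic example, the three signal-carrying features of `A` should fall into one module. That module's eigenfeature should then correlate strongly with the single module of `B`, which mirrors the horizontal integration step.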
Pub Date: 2026-01-14; DOI: 10.1093/bioinformatics/btag010
Huaiwu Zhang, Xinliang Sun, Jianxin Wang, Min Li, Jing Tang
Motivation: Drug synergy is crucial for developing effective combination therapies, but traditional screening methods suffer from inefficiency and high costs. While deep learning shows promise for predicting drug synergy, current approaches using Transformers and graph neural networks focus on combining drug and cell line features without modelling how genes causally influence drug responses.
Results: To address this limitation, we propose CADS (Causal Adjustment for Drug Synergy), a deep learning framework that integrates causal relationships between genes and drug responses. Leveraging multi-omics data, CADS uses a learnable mask mechanism to identify key causal genes while filtering out irrelevant genetic factors through backdoor adjustment. The model achieves two key objectives simultaneously: accurate prediction of drug synergy and interpretable causal gene discovery. Experiments on multiple datasets show that CADS consistently outperforms state-of-the-art methods across multiple metrics. Case studies demonstrate that CADS can reduce unnecessary complexity while providing greater biological insight through its gene importance scores. These scores help identify clinically validated cancer-related genes that mediate drug interactions.
Availability and implementation: Taken together, CADS advances combination therapy prediction by explicitly modelling the causal genes underlying drug synergy, offering enhanced interpretability for AI-based drug development. The source code can be found at https://github.com/HuaiwuZhang/causalDC.
Supplementary information: Supplementary data are available at Bioinformatics online.
Title: CADS: A Causal Inference Framework for Identifying Essential Genes to Enhance Drug Synergy Prediction; Journal: Bioinformatics (Oxford, England)
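CADS relies on backdoor adjustment. For a discrete confounder Z, the standard formula is P(Y | do(X=x)) = Σ_z P(Y | X=x, Z=z) P(Z=z). The sketch below applies that textbook formula to a hypothetical toy dataset with a binary confounder. It is a generic illustration of the adjustment only, not CADS's learnable-mask implementation, and the function names are hypothetical.

```python
from collections import Counter


def make_records(strata):
    """Expand {(z, x): (n, n_y1)} into a list of (z, x, y) records."""
    records = []
    for (z, x), (n, n_pos) in strata.items():
        records += [(z, x, 1)] * n_pos + [(z, x, 0)] * (n - n_pos)
    return records


def naive_conditional(records, x):
    """P(Y=1 | X=x), ignoring the confounder."""
    ys = [y for _, xx, y in records if xx == x]
    return sum(ys) / len(ys)


def backdoor_adjusted(records, x):
    """P(Y=1 | do(X=x)) = sum_z P(Y=1 | X=x, Z=z) * P(Z=z)."""
    n = len(records)
    total = 0.0
    for z, n_z in Counter(zz for zz, _, _ in records).items():
        ys = [y for zz, xx, y in records if zz == z and xx == x]
        if not ys:
            raise ValueError(f"no records with X={x} in stratum Z={z} (positivity violated)")
        total += (sum(ys) / len(ys)) * (n_z / n)
    return total


if __name__ == "__main__":
    # Hypothetical toy data: Z influences both X and Y.
    data = make_records({(0, 0): (40, 8), (0, 1): (10, 3), (1, 0): (10, 5), (1, 1): (40, 24)})
    print("naive effect:   ", naive_conditional(data, 1) - naive_conditional(data, 0))
    print("adjusted effect:", backdoor_adjusted(data, 1) - backdoor_adjusted(data, 0))
```

In this toy data, the naive difference P(Y=1|X=1) - P(Y=1|X=0) is 0.54 - 0.26 = 0.28. The backdoor-adjusted difference is 0.45 - 0.35 = 0.10. The gap shows how much of the apparent effect came from the confounder.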
Pub Date: 2026-01-13; DOI: 10.1093/bioinformatics/btag002
Hayden C Metsky, Katherine J Siddle, Christian B Matranga, Pardis C Sabeti
Title: Best practices when benchmarking CATCH for the design of genome enrichment probes; Journal: Bioinformatics (Oxford, England)
Pub Date: 2026-01-09; DOI: 10.1093/bioinformatics/btaf662
Jixiang Li, Ruilin Cai, Ziteng Wang, Ye Sun, Wenge Yang, Yonghong Hu
Accurately predicting protein-ligand interactions and binding affinities is essential for advancing structural biology. Despite recent advances in deep learning, achieving rapid and precise predictions remains challenging. Our approach, PLXFPred (Protein-Ligand Cross-Modal Fusion Predictor), extracts physicochemical properties from amino acid sequences and SMILES strings and uses pre-trained models to derive high-dimensional features. GATv2 and BiLSTM process the structural and sequence features, respectively. At the core of the model, sequence and graph features are fused via a cross-modal cross-attention mechanism. A multi-modal hierarchical fusion strategy then integrates high-level graph, early-fusion, and cross-fusion features. Residual connections and conditional domain adversarial learning improve generalization to previously unseen protein-ligand pairs. Compared with state-of-the-art models, PLXFPred achieves superior performance, reducing errors (RMSD, MAE, SD) by more than 50%. It also provides interpretable biological insights through attention-weight visualization and SHAP analysis.
Availability: The source code is available at https://github.com/xiyuyangtuo/PLXFPred/.
Supplementary information: Supplementary data are available at Bioinformatics online.
Title: PLXFPred: Interpretable cross-attention networks with hierarchical fusion of multi-modal features for predicting protein-ligand interactions and affinities; Journal: Bioinformatics (Oxford, England)
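Cross-modal cross-attention, as described for PLXFPred, lets one modality query the other. The sketch below uses queries from sequence (residue) features and keys and values from graph (atom) features, and also runs the reverse direction. It is a minimal NumPy version of standard scaled dot-product attention. The mean-pool-and-concatenate fusion, the feature sizes, the random projection matrices and all names are hypothetical. This is not PLXFPred's architecture.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries_from, keys_from, w_q, w_k, w_v):
    """Scaled dot-product attention: queries from one modality,
    keys and values from the other."""
    q = queries_from @ w_q
    k = keys_from @ w_k
    v = keys_from @ w_v
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return weights @ v, weights


def bidirectional_fusion(seq_feats, graph_feats, params):
    """Hypothetical fusion: attend in both directions, mean-pool, concatenate."""
    seq_ctx, seq_w = cross_attention(seq_feats, graph_feats, *params["seq_to_graph"])
    graph_ctx, graph_w = cross_attention(graph_feats, seq_feats, *params["graph_to_seq"])
    fused = np.concatenate([seq_ctx.mean(axis=0), graph_ctx.mean(axis=0)])
    return fused, seq_w, graph_w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq = rng.normal(size=(12, 16))    # 12 residues, 16-dim features (hypothetical)
    graph = rng.normal(size=(9, 16))   # 9 ligand atoms, 16-dim features (hypothetical)
    params = {name: tuple(rng.normal(size=(16, 8)) for _ in range(3))
              for name in ("seq_to_graph", "graph_to_seq")}
    fused, seq_w, graph_w = bidirectional_fusion(seq, graph, params)
    print(fused.shape, seq_w.shape, graph_w.shape)
```

Each row of `seq_w` is a distribution over ligand atoms for one residue. Matrices of this kind are what attention-weight visualizations typically display.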
Pub Date: 2026-01-03; DOI: 10.1093/bioinformatics/btag028
Chenyang Xie, Yingying Song, Song He, Xiaochen Bo, Zhongnan Zhang
Motivation: The goal of molecular representation learning is to automate the extraction of molecular features, a critical task in cheminformatics and drug discovery. Pretraining models that use multiple views, such as SMILES strings, 2D graphs, and 3D conformations, have advanced the field. However, integrating these views effectively to produce superior representations remains a challenge.
Results: To bridge this gap, we propose a novel multi-view molecular pretraining method termed MMPCS, which explicitly factorizes representations into consistency and specific information. Our approach utilizes the Graph Isomorphism Network and the RoBERTa model to encode 2D molecular topological graphs and SMILES sequences, respectively. Each resulting molecular embedding is decomposed into a shared consistency component and a view-specific remainder. An autoencoder then aligns the consistency information across views. The combined consistency and view-specific representations serve as input for downstream tasks, enabling precise and task-aware predictions. When benchmarked against 16 state-of-the-art molecular pretraining methods, MMPCS achieved the highest average performance across both classification and regression tasks for molecular property prediction. It also delivered outstanding results in predicting drug-target binding affinity and cancer drug response, demonstrating its robustness and broad applicability. Additionally, a case study on the SARS-CoV-2 Omicron variant highlights the potential of MMPCS in facilitating drug repurposing efforts.
Availability and implementation: The source code and datasets supporting this study are publicly available at GitHub (https://github.com/xmubiocode/MMPCS) and Zenodo (https://doi.org/10.5281/zenodo.18182748).
Title: MMPCS: multi-view molecular pretraining based on consistency information and specific information; Journal: Bioinformatics (Oxford, England); Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12881828/pdf/
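MMPCS splits each view's embedding into a shared consistency component and a view-specific remainder. The sketch below shows that split with a deliberately simple stand-in. It projects embeddings onto a fixed shared subspace to obtain the consistency part and treats the orthogonal residual as the view-specific part. A mean-squared disagreement between views stands in for the autoencoder alignment. The subspace, the loss and every name are hypothetical; MMPCS learns its decomposition rather than fixing it.

```python
import numpy as np


def shared_subspace(dim, k, seed=0):
    """Hypothetical shared subspace: k orthonormal directions in R^dim."""
    rng = np.random.default_rng(seed)
    basis, _ = np.linalg.qr(rng.normal(size=(dim, k)))
    return basis


def split_view(h, basis):
    """Split embeddings into a consistency part (projection onto the shared
    subspace) and a view-specific remainder (the orthogonal residual)."""
    consistency = h @ basis @ basis.T
    return consistency, h - consistency


def fuse_views(h_graph, h_smiles, basis):
    """Concatenate averaged consistency parts with both view-specific parts."""
    c_g, s_g = split_view(h_graph, basis)
    c_s, s_s = split_view(h_smiles, basis)
    align_loss = float(np.mean((c_g - c_s) ** 2))  # penalises disagreement of shared parts
    features = np.concatenate([0.5 * (c_g + c_s), s_g, s_s], axis=-1)
    return features, align_loss


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    basis = shared_subspace(16, 4)
    h_graph = rng.normal(size=(5, 16))   # e.g. GIN embeddings (hypothetical)
    h_smiles = rng.normal(size=(5, 16))  # e.g. RoBERTa embeddings (hypothetical)
    features, loss = fuse_views(h_graph, h_smiles, basis)
    print(features.shape, round(loss, 4))
```

Because the consistency and specific parts are orthogonal by construction, they carry non-overlapping information. In a trained model, the alignment loss would be minimized during pretraining rather than only reported.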
Pub Date: 2026-01-03; DOI: 10.1093/bioinformatics/btaf688
Dingyao Zhang, Zhiyuan Chu, Yiran Huo, Yunzhe Jiang, Yuhang Chen, Zhiliang Bai, Rong Fan, Jun Lu, Mark Gerstein
Motivation: Despite significant advances in spatial transcriptomics, the analysis of formalin-fixed paraffin-embedded (FFPE) tissues, which constitute most clinically available samples, remains challenging. Capturing both coding and non-coding RNAs in a spatial context is also difficult. We recently introduced Patho-DBiT, a technology designed to address these unmet needs. However, the marked differences between Patho-DBiT and existing spatial transcriptomics protocols necessitate specialized computational tools for comprehensive whole-transcriptome analysis in FFPE samples.
Results: Here, we present ASTRO, an automated pipeline developed to process spatial transcriptomics data. In addition to supporting standard datasets, ASTRO is optimized for whole-transcriptome analyses of FFPE samples, enabling the detection of various RNA species, including non-coding RNAs such as miRNAs. To compensate for the reduced RNA quality in FFPE tissues, ASTRO incorporates a specialized filtering step and optimizes spatial barcode calling, increasing the mapping rate. These optimizations allow ASTRO to spatially quantify coding and non-coding RNA species in the entire transcriptome and achieve robust performance in FFPE samples.
Availability and implementation: Code is available on GitHub (https://github.com/gersteinlab/ASTRO) and Zenodo (doi: 10.5281/zenodo.17913760).
Title: ASTRO: Automated Spatial-Transcriptome whole RNA Output; Journal: Bioinformatics (Oxford, England); Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12866646/pdf/
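Spatial barcode calling maps each read's barcodes to a tissue pixel, and tolerating sequencing errors in the barcodes is one way to raise the fraction of usable reads. The sketch below assumes a DBiT-style layout in which two barcodes, one per channel direction, together define an (x, y) pixel. It uses a generic Hamming-distance rule that accepts a barcode only if a unique whitelist entry lies within one mismatch. The rule, the whitelists and the function names are hypothetical and not necessarily ASTRO's exact procedure.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def correct_barcode(observed, whitelist, max_dist=1):
    """Index of the unique closest whitelist barcode within max_dist, else None."""
    # Length mismatches are treated as non-matches.
    dists = [hamming(observed, bc) if len(bc) == len(observed) else max_dist + 1
             for bc in whitelist]
    best = min(dists)
    if best > max_dist or dists.count(best) > 1:
        return None  # no acceptable match, or an ambiguous one
    return dists.index(best)


def call_pixel(bc_a, bc_b, whitelist_a, whitelist_b, max_dist=1):
    """Map a read's two barcodes to an (x, y) pixel, or None if either fails."""
    x = correct_barcode(bc_a, whitelist_a, max_dist)
    y = correct_barcode(bc_b, whitelist_b, max_dist)
    return None if x is None or y is None else (x, y)


if __name__ == "__main__":
    whitelist = ["AACCGGTT", "TTGGCCAA", "ACGTACGT"]  # hypothetical barcodes
    print(call_pixel("AACCGGTA", "ACGTACGA", whitelist, whitelist))
```

Rejecting ambiguous matches keeps error correction from assigning reads to the wrong pixel. This matters most when whitelist barcodes are close to each other.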
Pub Date: 2026-01-03; DOI: 10.1093/bioinformatics/btaf615
Alice Cleynen, Agin Ravindran, Nikolay E Shirokikh
Summary: RNA fractionation followed by high-throughput sequencing (RNA-seq) is widely used to study RNA localization, translation, structure, stability and subcellular compartmentalization. Interpreting fractionated RNA-seq data poses a fundamental compositional challenge. Library preparation and sequencing depth obscure the original proportions of RNA fractions, which can bias comparisons, particularly when biological changes shift RNA distribution across fractions. This bias compromises comparisons of fraction-specific RNA profiles and limits the utility of standard differential expression methods. Existing approaches based on transcript frequency ratios or standard normalization do not account for the compositional nature of fractionated samples and cannot estimate the unrecoverable "lost" fraction. We developed FracFixR, a statistical framework that reconstructs original fraction proportions by modeling the compositional relationship between the whole and the fractionated RNA samples. Using non-negative linear regression on carefully selected transcripts, FracFixR estimates global fraction weights, corrects individual transcript frequencies, and quantifies the unrecoverable material. The framework includes methods for differential proportion testing between conditions using binomial GLM, logit, or beta-binomial models. We rigorously validated FracFixR using synthetic data with known ground truth, based on naturally observed aligned read distributions, and real polysome profiling data from multiple cell lines. These analyses demonstrated accurate reconstruction of fraction weights (Pearson correlation >0.85) and enabled detection of differentially translated transcripts between cancer subtypes.
Availability and implementation: FracFixR is implemented as an R package freely available on GitHub at https://github.com/Arnaroo/FracFixR and on CRAN.
Title: FracFixR: a compositional statistical framework for absolute proportion estimation between fractions in RNA sequencing data; Journal: Bioinformatics (Oxford, England); Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12866640/pdf/
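The FracFixR summary describes estimating global fraction weights by non-negative linear regression of the whole sample on the fractions. The sketch below assumes the whole-sample transcript frequencies are a non-negative mixture of the fraction frequencies on a chosen set of transcripts. Any weight missing from that mixture is read as the lost share, 1 - Σw. The `selected` mask, the toy profiles and all function names are hypothetical, and this is not the FracFixR R API. The solver enumerates supports exactly, which is only practical for a handful of fractions; `scipy.optimize.nnls` is a standard alternative for larger problems.

```python
import itertools

import numpy as np


def nnls_small(A, b):
    """Exact non-negative least squares by enumerating supports.

    Practical only for a handful of columns (fractions)."""
    n_cols = A.shape[1]
    best_x, best_res = np.zeros(n_cols), float(b @ b)
    for k in range(1, n_cols + 1):
        for support in itertools.combinations(range(n_cols), k):
            cols = list(support)
            coef, *_ = np.linalg.lstsq(A[:, cols], b, rcond=None)
            if np.all(coef >= 0):
                x = np.zeros(n_cols)
                x[cols] = coef
                res = float(np.sum((A @ x - b) ** 2))
                if res < best_res:
                    best_x, best_res = x, res
    return best_x


def fraction_weights(whole, fractions, selected):
    """Fit whole ≈ fractions @ w with w >= 0 on the selected transcripts.

    whole: (T,) transcript frequencies of the unfractionated sample.
    fractions: (T, F) transcript frequencies of each fraction (columns sum to 1).
    selected: boolean mask of transcripts used for the fit (hypothetical choice).
    Returns the weights and 1 - sum(w), read here as the lost share.
    """
    w = nnls_small(fractions[selected], whole[selected])
    return w, 1.0 - w.sum()


if __name__ == "__main__":
    f1 = np.array([0.4, 0.1, 0.2, 0.1, 0.1, 0.1])  # hypothetical fraction profiles
    f2 = np.array([0.1, 0.4, 0.1, 0.2, 0.1, 0.1])
    lost_profile = np.array([0, 0, 0, 0, 0.5, 0.5])
    whole = 0.5 * f1 + 0.3 * f2 + 0.2 * lost_profile
    selected = np.array([True, True, True, True, False, False])
    print(fraction_weights(whole, np.column_stack([f1, f2]), selected))
```

The toy example fits only transcripts absent from the lost material. This illustrates why the choice of transcripts matters: including the last two rows would let the lost component bias the weights.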
Pub Date : 2026-01-03 DOI: 10.1093/bioinformatics/btag016
Joseph Thorpe, Nina Billows, Gabrielle C Ngwana-Joseph, Amy Ibrahim, Deborah Nolder, Colin J Sutherland, Thi Hong Ngoc Nguyen, Thi Huong Binh Nguyen, Quang Thieu Nguyen, Jamille G Dombrowski, Silvia Maria Di Santi, Claudio R F Marinho, Jody E Phelan, Tomasz Kurowski, Fady Mohareb, Susana Campino, Taane G Clark
Motivation: Malaria, caused by Plasmodium parasites, imposes a significant public health burden. Plasmodium falciparum remains the primary target of elimination strategies because of its high mortality rate. However, other species, such as P. malariae, P. vivax, and P. knowlesi, continue to cause substantial human morbidity. Genomic approaches, including whole-genome sequencing, offer powerful tools for understanding the biology, transmission, and emerging drug resistance of these neglected Plasmodium species. However, there is an urgent need for informatics tools to summarize and visualize the complex, high-dimensional genomic data these approaches generate.
Results: We developed Malaria-GENOMAP, a user-friendly web-based tool that integrates genomic variant data, such as allele frequencies, with geographical maps. It also provides views ranging from whole chromosomes down to individual genes for in-depth exploration. The tool includes variation from P. knowlesi (n = 139), P. malariae (n = 158), P. ovale curtisi (n = 36), P. ovale wallikeri (n = 47), P. simium (n = 38), and P. vivax (n = 1359). It enables investigation of population structure, geographic associations of mutations, and putative drug resistance markers, offering valuable insights for malaria control efforts.
Availability and implementation: Malaria-GENOMAP is available online at https://genomics.lshtm.ac.uk/malaria-genomaps.
Title: Malaria-GENOMAP: a web-based tool for exploring genomic variation of malaria parasites.
Journal: Bioinformatics (Oxford, England), published 2026-01-03. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12881832/pdf/
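Malaria-GENOMAP integrates allele frequencies with geographical maps. The sketch below gives a rough illustration of the kind of summary that could sit behind such a map layer. It aggregates per-isolate calls at a single variant into an ALT-allele frequency and a called-sample count for each region. The input layout and the function name `alt_allele_freq_by_region` are assumptions for illustration, not the tool's actual data model. Each call is assumed to carry a region label and an ALT dosage of 0.0 (REF), 1.0 (ALT), or 0.5 (mixed call), with None for a missing call.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def alt_allele_freq_by_region(
    calls: Iterable[Tuple[str, Optional[float]]],
) -> Dict[str, Tuple[float, int]]:
    """Per-region ALT allele frequency at one variant site (hypothetical input layout).

    Each call is (region, dosage): 0.0 = REF, 1.0 = ALT, 0.5 = mixed call,
    None = missing. Returns {region: (alt_frequency, n_called_isolates)}.
    """
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for region, dosage in calls:
        if dosage is None:
            continue  # missing calls are excluded from both numerator and denominator
        totals[region] += dosage
        counts[region] += 1
    # Regions whose calls are all missing never appear in `counts`, so they are omitted.
    return {region: (totals[region] / counts[region], counts[region]) for region in counts}


if __name__ == "__main__":
    demo = [("Brazil", 1.0), ("Brazil", 0.0), ("Brazil", None),
            ("Vietnam", 1.0), ("Vietnam", 1.0), ("Vietnam", 0.5)]
    print(alt_allele_freq_by_region(demo))
```

Reporting the number of called isolates alongside each frequency lets a map flag regions where the estimate rests on very few samples.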
Pub Date : 2026-01-03 DOI: 10.1093/bioinformatics/btag033
Junqi Long, Bo Liu, Jianqiang Li, Shuangtao Zhao
Motivation: Interactions among long noncoding RNAs, circular RNAs, microRNAs, and messenger RNAs form complex gene expression regulatory networks, which are of great significance for the diagnosis, prevention, and treatment of complex diseases. Existing computational methods predict interactions among certain molecule types, but they are generally limited to single-modality perspectives. They overlook competitive specificity and co-target cooperativity across multi-omics molecules, which limits their ability to elucidate cross-omics regulatory mechanisms.
Results: We proposed a novel cross-omics adaptive multimodal contrastive learning framework (MCOAN) that learns multimodal regulatory mechanisms and predicts disease-associated molecular regulatory networks. First, we constructed a five-layer heterogeneous graph architecture to integrate the complex regulatory associations among multi-omics nodes. Next, we proposed an unsupervised multimodal contrastive learning strategy that maximizes mutual information across distinct regulatory views. This strategy enhances node representations by capturing both local neighborhood structure and global semantic information. We also proposed a cross-omics adaptive learning mechanism that captures competitive specificity and co-target cooperativity across distinct regulatory networks, further enhancing the structural awareness of node representations. We then evaluated multiple downstream classifiers for predicting multimodal molecular regulatory networks. Extensive experiments show that MCOAN consistently outperforms existing methods, achieving strong predictive accuracy and generalization (max AUC = 0.9881; max AUPR = 0.9826). Case studies further confirm its real-world predictive performance.
Availability and implementation: All resources are available at https://github.com/JunqiLab/MCOAN.git.
Title: MCOAN: multimodal contrastive representation learning for cross-omics adaptive disease regulatory network prediction.
Journal: Bioinformatics (Oxford, England), published 2026-01-03. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12881826/pdf/
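MCOAN's abstract describes an unsupervised contrastive strategy that maximizes mutual information across regulatory views, but it does not name the loss. InfoNCE is a common objective for this purpose, and minimizing it maximizes a lower bound on mutual information. Under InfoNCE, each node's embedding in one view should be most similar to the same node's embedding in the other view, and all other nodes act as negatives. The plain-Python sketch below computes this loss for two small embedding lists. Several details are assumptions rather than MCOAN's published configuration: the use of InfoNCE itself, cosine similarity, the temperature of 0.5, and the function names.

```python
import math
from typing import List

Vector = List[float]


def _normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def info_nce(view_a: List[Vector], view_b: List[Vector], temperature: float = 0.5) -> float:
    """Mean InfoNCE loss with (view_a[i], view_b[i]) as the positive pair.

    For anchor view_a[i], every view_b[k] with k != i serves as a negative.
    A lower loss means the two views agree more on which node is which.
    """
    za = [_normalize(v) for v in view_a]
    zb = [_normalize(v) for v in view_b]
    total = 0.0
    for i, anchor in enumerate(za):
        # Cosine similarity to every node in the other view, scaled by the temperature.
        logits = [sum(x * y for x, y in zip(anchor, other)) / temperature for other in zb]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(s - m) for s in logits))
        total += log_denom - logits[i]  # equals -log softmax probability of the positive
    return total / len(za)


if __name__ == "__main__":
    aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
    shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
    print(aligned, shuffled)
```

Aligned views give a lower loss than views whose node order has been shuffled. A contrastive encoder is trained to exploit exactly this signal when it pulls matching node representations together across views.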