Motivation: The increasing prevalence of antibiotic-resistant bacteria has intensified the demand for novel antimicrobial agents. Antimicrobial peptides (AMPs) have emerged as promising alternatives, yet their identification and classification remain challenging because existing methods lack multi-perspective information, learn insufficient feature representations, and rely on a single data modality.
Results: We propose a dual diffusion model-based representation learning framework for classifying AMPs that integrates peptide sequence and structure information to address these issues. Specifically, a multi-view feature construction module encodes peptide sequences and structures from distinct perspectives, deriving initial feature representations with enriched biological semantics. To enhance representation learning, the framework applies two diffusion models, one to sequence information and one to structure information, to capture complex semantics from both modalities. In addition, both single-modal and dual-modal contrastive learning are employed to further improve the learned representations. Comprehensive experiments demonstrate that our model outperforms existing methods for AMP classification, providing a feasible way to accelerate the discovery of novel antimicrobial agents.
Availability of data and codes: The data and source code are available on GitHub at https://github.com/kww567upup/DDM.
Supplementary information: Supplementary data are available at Bioinformatics online.
A Dual Diffusion Model-Based Representation Learning Framework for AMPs Classification. Wen Kong, Lingling Fu, Xingpeng Jiang, Weizhong Zhao. Bioinformatics (Oxford, England). Pub Date: 2026-02-15. DOI: 10.1093/bioinformatics/btag077
Pub Date: 2026-02-15. DOI: 10.1093/bioinformatics/btag064
Yojana Gadiya, Javier Millán Acosta, Ammar Ammar, Alejandro Adriaque Lozano, Delano Wetstede, Dominik Martinát, Ana Claudia Sima, Hailiang Mei, Egon Willighagen, Tooba Abbassi-Daloii
Motivation: Integrating omics data analysis with publicly available databases is crucial for unravelling complex biological mechanisms. However, this integration process is often intricate and time-consuming due to the diversity and complexity of the data involved. Achieving consistent harmonization across data types is challenging when managing disparate formats and sources. To address these issues, we introduce pyBiodatafuse, a query-based Python tool designed to integrate biomedical databases. This tool establishes a modular framework that simplifies data wrangling, enabling the creation of context-specific knowledge graphs (KGs) while supporting graph-based analyses.
Results: We developed a pipeline for generating context-specific knowledge graphs dynamically, allowing users to create KGs on the fly from a set of gene or metabolite identifiers. pyBiodatafuse features a user-friendly interface that streamlines this process, making it accessible even to researchers without extensive computational expertise. Additionally, the tool offers plugins for widely used platforms such as Cytoscape, Neo4j, and GraphDB, enabling local hosting of resulting property and RDF graphs. This versatility ensures that generated KGs can be efficiently utilized within diverse research workflows. To demonstrate its potential, we used pyBiodatafuse to create a graph for post-COVID syndrome using differential gene expression data, showcasing its ability to build adaptable and context-specific knowledge representations. Thus, pyBiodatafuse sets the stage for streamlined data integration, empowering researchers to focus on discovery and analysis without being hindered by data management complexities.
Availability and implementation: pyBiodatafuse is open-source, with its source code and PyPI package available at https://github.com/BioDataFuse/pyBiodatafuse and https://pypi.org/project/pyBiodatafuse/. The user interface can be accessed at https://biodatafuse.org/. Additionally, a release has been made on Zenodo at https://doi.org/10.5281/zenodo.18468942.
pyBiodatafuse: Extending interoperability of data using modular queries across biomedical resources. Bioinformatics (Oxford, England).
Pub Date: 2026-02-15. DOI: 10.1093/bioinformatics/btag071
Ke Xu, Xin Maizie Zhou, Lu Zhang
Spatially resolved transcriptomics (SRT) and spatially resolved proteomics (SRP) data enable the study of gene expression and protein abundances within their precise spatial and cellular contexts in tissues. Certain SRT and SRP technologies also capture corresponding morphology images, adding another layer of valuable information. However, few existing methods developed for SRT data effectively leverage these supplementary images to enhance clustering performance. Here, we introduce stDyer-image, an end-to-end deep learning framework for clustering SRT and SRP datasets that include images. Unlike existing methods that utilize images to complement gene expression data, stDyer-image directly links image features to cluster labels. This approach draws inspiration from pathologists, who can visually identify specific cell types or tumor regions from morphological images without relying on gene expression or protein abundances. Benchmarks against state-of-the-art tools demonstrate that stDyer-image achieves superior clustering performance. Moreover, it can handle large-scale datasets across diverse technologies, making it a versatile and powerful tool for spatial omics analysis.
stDyer-image improves clustering analysis of spatially resolved transcriptomics and proteomics with morphological images. Bioinformatics (Oxford, England).
Pub Date: 2026-02-15. DOI: 10.1093/bioinformatics/btag078
Freya E R Woods, Emilyanne Leonard, Timothy Ebbels, Jonathan Cairns, Rhiannon David
Motivation: Flow cytometry (FC) is a widely used technique for analysing cells or particles based on the fluorescence of specific markers. Thresholds for fluorescence are typically set manually, a laborious, subjective process that scales poorly as FC technology advances. Machine learning (ML) methods can address these issues but often require technical expertise many bench scientists do not possess. Thus, accessible, open-source, and cross-domain ML-based FC tools are needed.
Results: We present AutoFlow, an easy-to-use, adaptable R Shiny application for automated flow cytometry (FC) analysis. AutoFlow supports two workflows: supervised and unsupervised learning. The application automates key preprocessing steps including fluorescence compensation, debris exclusion, single-cell identification, surface marker gating, MFI quantification, and downstream classification or clustering. Across three datasets, two publicly available (Mosmann and Nilsson Rare) and a novel bone marrow microphysiological system (BM-MPS) dataset, AutoFlow demonstrated robust performance. In the supervised workflow, multiclass classification on BM-MPS achieved 97.2% accuracy under a leave-one-timepoint-out scheme, with high sensitivity and specificity across major lineages. For rare populations, performance was strong: Mosmann Rare (0.03% prevalence) achieved 87.5% sensitivity and 100% specificity, while Nilsson Rare (0.08% prevalence) achieved 87.9% sensitivity and 99.9% specificity. The unsupervised workflow accurately grouped cells into biologically meaningful clusters, recovering known populations and identifying additional candidate populations with marker profiles consistent with true biology. AutoFlow offers a fast, reproducible, and scalable solution for FC analysis, enabling high-throughput studies and improving the discovery of rare or unexpected cell types.
Availability: The application is available at https://github.com/FERWoods/AutoFlow for download using R. An archived version is available at DOI: 10.5281/zenodo.18235796.
Supplementary information: Supplementary data are available at Bioinformatics online.
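The rare-population results above are reported as sensitivity and specificity rather than accuracy. The following minimal Python sketch illustrates why that distinction matters at very low prevalence. All counts are hypothetical and chosen only for illustration; they are not taken from the Mosmann, Nilsson, or BM-MPS datasets, and the code is unrelated to AutoFlow's own R implementation.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of truly rare cells that the classifier recovers."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of non-rare cells that are correctly left out."""
    return tn / (tn + fp)


def accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Fraction of all events that are labelled correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


# Hypothetical sample: 100,000 events, 30 of them rare (0.03% prevalence).
total, rare = 100_000, 30

# A classifier that never calls the rare population.
trivial = dict(tp=0, fn=rare, tn=total - rare, fp=0)
# A classifier that finds 27 of the 30 rare cells with 5 false positives.
useful = dict(tp=27, fn=3, tn=total - rare - 5, fp=5)

for name, c in (("trivial", trivial), ("useful", useful)):
    print(
        name,
        f"accuracy={accuracy(**c):.4f}",
        f"sensitivity={sensitivity(c['tp'], c['fn']):.3f}",
        f"specificity={specificity(c['tn'], c['fp']):.5f}",
    )
```

Both hypothetical classifiers exceed 99.9% accuracy, but only the second has non-zero sensitivity, which is why sensitivity and specificity are the more informative pair for rare populations.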
AutoFlow: An interactive Shiny app for supervised and unsupervised flow cytometry analysis. Bioinformatics (Oxford, England).
Pub Date: 2026-02-15. DOI: 10.1093/bioinformatics/btag075
Leo Carl Foerster, Enrico Frigoli, Xiaoyu Sun, Jooa Hooli, Angela Goncalves, Ana Martin-Villalba
Motivation: Commercial solutions like 10X cellranger provide robust UMI quantification for their proprietary single-cell protocols, but open methods such as Smart-seq3 lack comparable support.
Results: Here, we introduce umite, a Smart-seq3 UMI counting pipeline with a focus on speed and a light memory footprint. Unlike existing tools, umite offers efficient mismatch-tolerant UMI detection, boosting UMI retrieval by 5-15% in benchmarks. It also outperforms current Smart-seq3 quantification tools in runtime, disk usage, and memory footprint, offering better scalability on large datasets.
Availability: umite is available at https://github.com/leoforster/umite (or via Zenodo: https://doi.org/10.5281/zenodo.18166431) and includes a Snakemake workflow for Smart-seq3 quantification. Single cell libraries of the mouse nasal vasculature dataset (GSE207085) and human CD4+ T-cell dataset (GSE270928) used in benchmarking were downloaded from NCBI (see Supplement for details).
Supplementary information: Supplementary data are available at Bioinformatics online.
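The abstract does not describe how umite performs mismatch-tolerant UMI detection, so the following is only a generic Python sketch of the idea: accept a read as UMI-containing when its leading tag matches a reference tag within a small Hamming distance, then take the bases that follow as the UMI. The tag sequence, the 8-base UMI length, and the function names are assumptions made for illustration and are not taken from umite.

```python
from typing import Optional

# Assumed read layout for illustration: an 11-base tag followed by an 8-base UMI.
ASSUMED_TAG = "ATTGCGCAATG"
ASSUMED_UMI_LENGTH = 8


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def extract_umi(
    read: str,
    tag: str = ASSUMED_TAG,
    umi_length: int = ASSUMED_UMI_LENGTH,
    max_mismatches: int = 1,
) -> Optional[str]:
    """Return the UMI if the read starts with the tag (allowing mismatches)."""
    end = len(tag) + umi_length
    if len(read) < end:
        return None  # read too short to hold tag plus UMI
    if hamming(read[: len(tag)], tag) > max_mismatches:
        return None  # treated as an internal (non-UMI) read
    return read[len(tag) : end]


exact = "ATTGCGCAATG" + "ACGTACGT" + "GGGTTTCA"
one_error = "ATTGCGCTATG" + "ACGTACGT" + "GGGTTTCA"  # one substituted tag base

print(extract_umi(exact))                        # UMI recovered
print(extract_umi(one_error, max_mismatches=0))  # rejected under exact matching
print(extract_umi(one_error, max_mismatches=1))  # recovered when one mismatch is allowed
```

Under exact matching, the second read would be counted as an internal read and its UMI lost; tolerating one mismatch recovers it, which is the kind of gain in UMI retrieval the abstract refers to.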
Umite: fast quantification of smart-seq3 libraries with improved UMI retrieval. Bioinformatics (Oxford, England).
Pub Date: 2026-02-13. DOI: 10.1093/bioinformatics/btag068
Haoyu Zhang, Kevin Fotso, Marc Subirana-Granés, Milton Pividori
Motivation: Identifying meaningful patterns in complex biological data necessitates correlation coefficients capable of capturing diverse relationship types beyond simple linearity. Furthermore, efficient computational tools are crucial for handling the ever-increasing scale of biological datasets.
Results: We introduce CCC-GPU, a high-performance, GPU-accelerated implementation of the Clustermatch Correlation Coefficient (CCC). CCC-GPU computes correlation coefficients for mixed data types, effectively detects nonlinear relationships, and offers significant speed improvements over its predecessor.
Availability and implementation: The source code of CCC-GPU is openly available on GitHub (https://github.com/pivlab/ccc-gpu) and archived on Zenodo (https://doi.org/10.5281/zenodo.18310318), distributed under the BSD-2-Clause Plus Patent License.
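To make concrete how a clustering-based coefficient can detect nonlinear relationships, here is a simplified CPU sketch in plain Python. It assumes the general scheme published for CCC (partition each variable into quantile groups for several group counts, compare partitions with the adjusted Rand index, and keep the maximum); tie handling, categorical data, and every GPU aspect are omitted, and the function names are hypothetical rather than part of the ccc-gpu API.

```python
from collections import Counter
from math import comb


def quantile_partition(values, k):
    """Assign each value to one of k rank-based (quantile) groups."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * k // len(values)
    return labels


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two labelings of the same objects."""
    n = len(a)
    pairs_ab = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = pairs_a * pairs_b / comb(n, 2)
    max_index = (pairs_a + pairs_b) / 2
    if max_index == expected:
        return 0.0  # degenerate partitions carry no information
    return (pairs_ab - expected) / (max_index - expected)


def ccc_sketch(x, y, k_max=10):
    """Maximum ARI over all pairs of quantile partitions, floored at 0."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    ks = range(2, min(k_max, len(x)) + 1)
    best = max(
        adjusted_rand_index(quantile_partition(x, kx), quantile_partition(y, ky))
        for kx in ks
        for ky in ks
    )
    return max(best, 0.0)


x = list(range(-20, 21))
linear = [2 * v + 1 for v in x]
quadratic = [v * v for v in x]  # Pearson correlation with x is zero here

print(ccc_sketch(x, linear))
print(ccc_sketch(x, quadratic))
```

The quadratic example has zero linear correlation with `x`, yet the sketch returns a clearly positive value because partitions with different group counts can still align. The nested loop over partition pairs, repeated for every pair of features, is the kind of workload that benefits from GPU parallelism.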
CCC-GPU: A graphics processing unit (GPU)-accelerated nonlinear correlation coefficient for large-scale transcriptomic analyses. Bioinformatics (Oxford, England).
Pub Date: 2026-02-12. DOI: 10.1093/bioinformatics/btag063
Christos Matsingos, Ka Fu Man, Arianna Fornili
Summary: The protonation propensity of ionisable residues in proteins can change in response to changes in the local residue environment. The link between protein dynamics and pKa is particularly important in pH regulation of protein structure and function. Here, we introduce TrIPP (Trajectory Iterative pKa Predictor), a Python tool to track and analyse changes in the pKa of ionisable residues along Molecular Dynamics trajectories of proteins. We show how TrIPP can be used to identify residues with physiologically relevant variations in their predicted pKa values during the simulations, and link them to changes in the local and global environment.
Availability and implementation: TrIPP is available at https://github.com/fornililab/TrIPP.
Supplementary information: Supplementary data are available at Bioinformatics online.
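As a small illustration of why a pKa shift along a trajectory can be physiologically relevant, the Henderson-Hasselbalch relation converts a pKa value into the fraction of a residue that is protonated at a given pH. The Python sketch below uses hypothetical per-frame pKa values and an assumed pH of 7.4; it does not use the TrIPP API or any real simulation output.

```python
def protonated_fraction(pka: float, ph: float = 7.4) -> float:
    """Henderson-Hasselbalch: fraction of the protonated form at the given pH."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Hypothetical predicted pKa values for one residue at successive frames.
trajectory_pka = [6.1, 6.4, 7.0, 7.6, 8.2]

fractions = [protonated_fraction(pka) for pka in trajectory_pka]
for frame, (pka, frac) in enumerate(zip(trajectory_pka, fractions)):
    print(f"frame {frame}: pKa={pka:.1f} protonated={frac:.2f}")
```

In this made-up series the residue moves from mostly deprotonated to mostly protonated as its pKa crosses the assumed pH, which is the kind of variation the summary describes as physiologically relevant.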
TrIPP: a Trajectory Iterative pKa Predictor. Bioinformatics (Oxford, England).
Pub Date: 2026-02-10. DOI: 10.1093/bioinformatics/btag055
Massimo Andreatta, Santiago J Carmona
Summary: Gene signature scoring provides a simple yet powerful approach for quantifying biological signals within single-cell omics datasets. UCell and pyUCell offer fast and robust implementations of rank-based signature scoring for R and Python, respectively, integrating seamlessly with leading single-cell analysis ecosystems such as Seurat, Bioconductor, and scanpy/scverse.
Availability and implementation: UCell v2 is distributed as an R package by Bioconductor (https://bioconductor.org/packages/UCell/) and as a Python package on PyPI (https://pypi.org/project/pyucell/).
Supplementary information: Supplementary data are available at Bioinformatics online.
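Rank-based signature scoring can be sketched in a few lines of plain Python. The version below assumes the Mann-Whitney U formulation described for UCell: genes are ranked by decreasing expression within one cell, ranks beyond a cap are truncated, and the score is one minus the normalised U statistic. Tie handling is simplified, missing signature genes are simply ignored, and the function name and toy data are hypothetical; this is not the UCell or pyUCell API.

```python
def rank_signature_score(expression, signature, max_rank=1500):
    """Score one cell: 1 - U / (n * max_rank), with ranks capped at max_rank + 1."""
    # Rank 1 is the most highly expressed gene in this cell.
    ordered = sorted(expression, key=lambda gene: expression[gene], reverse=True)
    rank = {gene: position + 1 for position, gene in enumerate(ordered)}

    genes = [gene for gene in signature if gene in rank]
    if not genes:
        return 0.0
    capped = [min(rank[gene], max_rank + 1) for gene in genes]
    if all(r > max_rank for r in capped):
        return 0.0  # no signature gene falls within the ranked window

    n = len(genes)
    u_statistic = sum(capped) - n * (n + 1) / 2
    return 1.0 - u_statistic / (n * max_rank)


# Toy cell with five genes; a small cap is used so the effect is visible.
cell = {"A": 10.0, "B": 8.0, "C": 5.0, "D": 1.0, "E": 0.0}

print(rank_signature_score(cell, ["A", "B"], max_rank=4))  # top-ranked signature
print(rank_signature_score(cell, ["D", "E"], max_rank=4))  # low-ranked signature
print(rank_signature_score(cell, ["E"], max_rank=4))       # outside the cap
```

Because the score depends only on within-cell ranks, it is unaffected by the composition of the rest of the dataset, a property that follows directly from this formulation.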
UCell and pyUCell: single-cell gene signature scoring for R and python. Bioinformatics (Oxford, England).
Pub Date: 2026-02-09. DOI: 10.1093/bioinformatics/btag056
Jiaying Hu, Yihang Du, Suyang Hou, Yueyang Ding, Jinyan Li, Hao Wu, Xiaobo Sun
Motivation: Spatial clustering is a critical analytical task in spatial transcriptomics (ST) that aids in uncovering the spatial molecular mechanisms underlying biological phenotypes. Along with the numerous spatial clustering methods, there comes the imperative need for an effective metric to evaluate their performance. An ideal metric should consider three factors: label agreement, spatial organization, and error severity. However, existing evaluation metrics focus solely on either label agreement or spatial organization, leading to biased and misleading evaluations.
Results: To fill this gap, we propose CEMUSA, a novel graph-based metric that integrates these factors into a unified evaluation framework. Extensive testing on both simulated and real datasets demonstrates CEMUSA's superiority over conventional metrics in differentiating clustering results with subtle differences in topology and error severity, while maintaining computational efficiency.
Availability and implementation: The source code and data are freely available at https://github.com/YihDu/CEMUSA. CEMUSA is implemented as an R package at https://yihdu.github.io/CEMUSA.
Supplementary information: Supplementary data are available at Bioinformatics online.
CEMUSA: A Graph-based Integrative Metric for Evaluating Clusters in Spatial Transcriptomics. Bioinformatics (Oxford, England).
Motivation: N6-methyladenine (6mA) is an important epigenetic modification of DNA that regulates biological processes such as gene expression, transcription, replication, DNA repair, and the cell cycle without altering the DNA sequence. It also plays a key role in many diseases, including cancer and autoimmune diseases. Although experimental approaches such as SMRT sequencing and methylated DNA immunoprecipitation can identify 6mA sites, they suffer from drawbacks including suboptimal sequencing quality, low signal-to-noise ratios, high costs, and time-consuming procedures. In recent years, deep learning approaches have demonstrated significant advantages in predicting 6mA sites; however, their generalization ability still requires further improvement.
Results: Inspired by the state space model Mamba, we propose a novel model for 6mA site prediction, named Mamba6mA. In the Mamba6mA model, we design position-specific linear layers to replace traditional convolutional layers and thereby facilitate the capture of position-specific information. Meanwhile, we construct a multi-scale feature extraction module and integrate features captured by sliding windows of different scales, feeding them into the classifier for prediction. Experimental results show that Mamba6mA achieves the best MCC on 9 out of 11 species datasets, surpassing existing state-of-the-art models. Ablation studies confirm that the position-specific linear layers and the multi-scale fusion module contribute MCC performance gains of 2.36% and 2.31%, respectively. Feature visualization analysis further reveals that the model effectively captures sequence patterns upstream and downstream of 6mA sites, providing a new technical approach for studying epigenetic modification mechanisms.
Availability and implementation: The source code for Mamba6mA is available at: https://github.com/XploreAI-Lab/Mamba6mA.
Contact: Xiaoya Fan (xiaoyafan@dlut.edu.cn), Zheng Zhao (zhaozheng@dlmu.edu.cn).
Supplementary information: Supplementary information is available at Bioinformatics online.
{"title":"Mamba6mA: A Mamba-based DNA N6-methyladenine Site Prediction Model.","authors":"Qi Zhao, Zhen Zhang, Tingwei Chen, Qian Mao, Haoxuan Shi, Jingjing Chen, Zheng Zhao, Xiaoya Fan","doi":"10.1093/bioinformatics/btag060","DOIUrl":"https://doi.org/10.1093/bioinformatics/btag060","url":null,"abstract":"<p><strong>Motivation: </strong>N6-methyladenine (6mA) is an important epigenetic modification of DNA that regulates biological processes such as gene expression, transcription, replication, DNA repair, and the cell cycle without altering the DNA sequence. It also plays a key role in many diseases, including cancer and autoimmune diseases. Although experimental approaches such as SMRT sequencing and methylated DNA immunoprecipitation can identify 6mA sites, they suffer from drawbacks including suboptimal sequencing quality, low signal-to-noise ratios, high costs, and time-consuming procedures. In recent years, deep learning approaches have demonstrated significant advantages in predicting 6mA sites; however, their generalization ability still requires further improvement.</p><p><strong>Results: </strong>Inspired by the state space model Mamba, we propose a novel model for 6mA site prediction, named Mamba6mA. In the Mamba6mA model, we design position-specific linear layers to replace traditional convolutional layers and thereby facilitate the capture of position-specific information. Meanwhile, we construct a multi-scale feature extraction module and integrate features captured by sliding windows of different scales, feeding them into the classifier for prediction. Experimental results show that Mamba6mA achieves the best MCC on 9 out of 11 species datasets, surpassing existing state-of-the-art models. Ablation studies confirm that the position-specific linear layers and the multi-scale fusion module contribute MCC performance gains of 2.36% and 2.31%, respectively. 
Feature visualization analysis further reveals that the model effectively captures sequence patterns upstream and downstream of 6mA sites, providing a new technical approach for studying epigenetic modification mechanisms.</p><p><strong>Availability and implementation: </strong>The source code for Mamba6mA is available at: https://github.com/XploreAI-Lab/Mamba6mA.</p><p><strong>Contact: </strong>Xiaoya Fan (xiaoyafan@dlut.edu.cn), Zheng Zhao (zhaozheng@dlmu.edu.cn).</p><p><strong>Supplementary information: </strong>Supplementary information is available at Bioinformatics online.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":5.4,"publicationDate":"2026-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146127706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}