Pub Date: 2026-02-15 DOI: 10.1093/bioinformatics/btag073
Johannes Wirth, Anna Chernysheva, Birthe Lemke, Isabel Giray, Katja Steiger
Motivation: Single-cell spatial omics data provides unprecedented insights into disease states. Comprehensive analysis of such data requires frameworks that integrate diverse modalities and enable joint processing of multiple datasets and corresponding metadata.
Results: To address these challenges, we introduce InSituPy, a versatile and scalable framework for analyzing spatial omics data from the multi-sample level down to the cellular and subcellular level. Its modular data structure organizes all relevant data modalities per sample and links them to their corresponding metadata, enabling scalable analysis of large patient cohorts using spatial omics technologies. Interactive visualization tools within InSituPy enable seamless integration of histopathological expertise, promoting collaborative hypothesis generation in translational research. Additionally, InSituPy includes built-in analytical algorithms and interfaces with external tools, establishing a standardized workflow for multi-sample spatial omics data analysis.
Availability: The Python package InSituPy is publicly available on GitHub (https://github.com/SpatialPathology/InSituPy) and PyPI (https://pypi.org/project/insitupy-spatial/), and archived on Zenodo (DOI: 10.5281/zenodo.18459471). Tutorials and documentation for InSituPy are available at https://insitupy.readthedocs.io/. All code to replicate the results shown in this manuscript can be found in the GitHub repository. Scripts to connect QuPath and InSituPy can be found at https://github.com/SpatialPathology/InSituPy-QuPath. All data required to complete the tutorials is publicly available, and functions to download the data have been implemented. A Zulip community chat for user support and discussion is accessible at https://insitupy.zulipchat.com.
Supplementary information: Supplementary data are available at Bioinformatics online.
Title: InSituPy: A framework for histology-guided, multi-sample analysis of single-cell spatial omics data. Journal: Bioinformatics (Oxford, England).
Pub Date: 2026-02-15 DOI: 10.1093/bioinformatics/btag076
Tien-Cuong Bui, Injae Chung, Wonjun Lee, Junsu Ko, Juyong Lee
Motivation: Predicting immunoglobulin-antigen (Ig-Ag) binding remains a significant challenge due to the paucity of experimentally resolved complexes and the limited accuracy of de novo Ig structure prediction.
Results: We introduce IgPose, a generalizable framework for Ig-Ag pose identification and scoring, built on a generative data-augmentation pipeline. To mitigate data scarcity, we constructed the Structural Immunoglobulin Decoy Database (SIDD), a comprehensive repository of high-fidelity synthetic decoys. IgPose integrates equivariant graph neural networks, ESM-2 embeddings, and gated recurrent units to capture both geometric and evolutionary features. We implemented interface-focused k-hop sampling with biologically guided pooling to enhance generalization across diverse interfaces. The framework comprises two sub-networks, IgPoseClassifier for binding pose discrimination and IgPoseScore for DockQ score estimation, and achieves robust performance on curated internal test sets and the CASP-16 benchmark compared to physics-based and deep learning baselines. IgPose serves as a versatile computational tool for high-throughput antibody discovery pipelines by providing accurate pose filtering and ranking.
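The abstract does not spell out how the interface-focused k-hop sampling works. As a rough illustration of the general idea only, the sketch below collects every node within k edges of a set of seed nodes using breadth-first search. The function name, the toy contact graph, and the residue labels are all hypothetical and are not taken from IgPose.

```python
from collections import deque


def k_hop_nodes(adjacency, seeds, k):
    """Return {node: hop distance} for all nodes within k edges of any seed."""
    distance = {seed: 0 for seed in seeds}
    queue = deque(distance)
    while queue:
        node = queue.popleft()
        if distance[node] == k:
            continue  # do not expand beyond the k-hop boundary
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


# Hypothetical residue contact graph: "H:" marks antibody residues, "A:" antigen residues.
contacts = {
    "H:Y33": ["A:K12", "H:S31"],
    "A:K12": ["H:Y33", "A:E13"],
    "H:S31": ["H:Y33", "H:T30"],
    "A:E13": ["A:K12", "A:L14"],
    "H:T30": ["H:S31"],
    "A:L14": ["A:E13"],
}
interface_seeds = ["H:Y33", "A:K12"]  # assumed interface residues

one_hop = k_hop_nodes(contacts, interface_seeds, k=1)
two_hop = k_hop_nodes(contacts, interface_seeds, k=2)
print(sorted(one_hop))
print(sorted(two_hop))
```

Restricting the graph to a small neighborhood around the interface keeps the model input focused on the residues most likely to determine binding; how IgPose chooses seeds and k is described in the paper, not here.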
Availability and implementation: IgPose is available on GitHub (https://github.com/arontier/igpose).
Supplementary information: Supplementary information is available at Bioinformatics online.
Title: IgPose: A Generative Data-Augmented Pipeline for Robust Immunoglobulin-Antigen Binding Prediction. Journal: Bioinformatics (Oxford, England).
Pub Date: 2026-02-15 DOI: 10.1093/bioinformatics/btag080
Shangjin Han, Dongsup Kim
Motivation: Understanding temporal gene expression is fundamental in the study of cellular development and differentiation. In practice, temporal single-cell datasets tend to contain only a limited number of measured time points, which are often unevenly spaced, resulting in irregular intervals between observations due to experimental constraints. Existing methods typically address these intervals by sequentially predicting one time point after another, yet lack mechanisms to explicitly model time intervals, leading to error accumulation.
Results: In this work, we introduce scMix, a language-model-based framework for predicting single-cell gene expression, which enables prediction from multiple historical time points. We build scMix on the Receptance Weighted Key Value (RWKV) architecture and use its time-decay mechanism to model temporal dependencies. Moreover, scMix introduces a delta-time mechanism that allows the model to bypass unmeasured time points, reducing error accumulation and improving robustness. In addition, we incorporate a trend regularization strategy to enhance the temporal coherence of predicted gene expression trajectories. scMix demonstrates state-of-the-art performance in predicting gene expression at unmeasured time points, surpassing existing methods, and also achieves strong results on downstream tasks.
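To make the idea of explicitly modelling irregular intervals concrete, here is a minimal sketch of a generic exponential-decay recurrence in which the decay applied to past observations depends on the elapsed time between measurements. It is not the RWKV formulation and not scMix's delta-time mechanism; the function name, the scalar state, and the decay rate are assumptions chosen for illustration.

```python
import math


def decayed_summary(observations, decay_rate):
    """Running weighted mean of (time, value) pairs sorted by time.

    Before each new observation, earlier contributions are shrunk by
    exp(-decay_rate * delta_t), so a longer gap discounts the past more.
    """
    state = 0.0    # decayed sum of values
    weight = 0.0   # decayed sum of weights
    previous_time = None
    summaries = []
    for time, value in observations:
        if previous_time is not None:
            delta_t = time - previous_time
            factor = math.exp(-decay_rate * delta_t)
            state *= factor
            weight *= factor
        state += value
        weight += 1.0
        summaries.append(state / weight)
        previous_time = time
    return summaries


rate = math.log(2)  # hypothetical rate: past weight halves per unit of time
short_gap = decayed_summary([(0.0, 1.0), (1.0, 3.0)], rate)
long_gap = decayed_summary([(0.0, 1.0), (2.0, 3.0)], rate)
print(short_gap[-1], long_gap[-1])
```

With the longer gap, the older measurement contributes less and the summary moves closer to the newest value, which is the behaviour one would want when time points are unevenly spaced.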
Availability and implementation: The code used for this study is available at https://doi.org/10.5281/zenodo.18287184.
Supplementary information: Supplementary data are available at Bioinformatics online.
Title: scMix: Learning Temporal Dynamics of Gene Expression under Irregular Time Intervals. Journal: Bioinformatics (Oxford, England).
Motivation: The increasing prevalence of antibiotic-resistant bacteria has intensified the demand for novel antimicrobial agents. Antimicrobial peptides (AMPs) have emerged as promising alternatives, yet their identification or classification remains challenging due to the lack of multi-perspective information, insufficient feature representation learning, and reliance on a single data modality.
Results: In this paper, we propose a dual diffusion model-based representation learning framework for classifying AMPs, which integrates both peptide sequence and structure information to address these issues. Specifically, our approach uses a multi-view feature construction module, which encodes peptide sequences and structures from distinct perspectives, deriving initial feature representations with enriched biological semantics. To enhance representation learning, the proposed framework applies one diffusion model to the sequence information and another to the structure information, capturing complex semantics from both modalities. In addition, both single-modal and dual-modal contrastive learning are employed to further advance the representation learning. Results of comprehensive experiments demonstrate that our model outperforms existing methods for AMP classification, providing a feasible solution for accelerating the discovery of novel antimicrobial agents.
Availability of data and code: The data and source code are available on GitHub at https://github.com/kww567upup/DDM.
Supplementary information: Supplementary data are available at Bioinformatics online.
Title: A Dual Diffusion Model-Based Representation Learning Framework for AMPs Classification. Authors: Wen Kong, Lingling Fu, Xingpeng Jiang, Weizhong Zhao. Pub Date: 2026-02-15 DOI: 10.1093/bioinformatics/btag077. Journal: Bioinformatics (Oxford, England).
Pub Date: 2026-02-15 DOI: 10.1093/bioinformatics/btag064
Yojana Gadiya, Javier Millán Acosta, Ammar Ammar, Alejandro Adriaque Lozano, Delano Wetstede, Dominik Martinát, Ana Claudia Sima, Hailiang Mei, Egon Willighagen, Tooba Abbassi-Daloii
Motivation: Integrating omics data analysis with publicly available databases is crucial for unravelling complex biological mechanisms. However, this integration process is often intricate and time-consuming due to the diversity and complexity of the data involved. Achieving consistent harmonization across data types is challenging when managing disparate formats and sources. To address these issues, we introduce pyBiodatafuse, a query-based Python tool designed to integrate biomedical databases. This tool establishes a modular framework that simplifies data wrangling, enabling the creation of context-specific knowledge graphs (KGs) while supporting graph-based analyses.
Results: We developed a pipeline for generating context-specific knowledge graphs dynamically, allowing users to create KGs on the fly from a set of gene or metabolite identifiers. pyBiodatafuse features a user-friendly interface that streamlines this process, making it accessible even to researchers without extensive computational expertise. Additionally, the tool offers plugins for widely used platforms such as Cytoscape, Neo4j, and GraphDB, enabling local hosting of resulting property and RDF graphs. This versatility ensures that generated KGs can be efficiently utilized within diverse research workflows. To demonstrate its potential, we used pyBiodatafuse to create a graph for post-COVID syndrome using differential gene expression data, showcasing its ability to build adaptable and context-specific knowledge representations. Thus, pyBiodatafuse sets the stage for streamlined data integration, empowering researchers to focus on discovery and analysis without being hindered by data management complexities.
Availability and implementation: pyBiodatafuse is open-source, with its source code and PyPI package available at https://github.com/BioDataFuse/pyBiodatafuse and https://pypi.org/project/pyBiodatafuse/. The user interface can be accessed at https://biodatafuse.org/. Additionally, a release has been made on Zenodo at https://doi.org/10.5281/zenodo.18468942.
Title: pyBiodatafuse: Extending interoperability of data using modular queries across biomedical resources. Journal: Bioinformatics (Oxford, England).
Pub Date: 2026-02-15 DOI: 10.1093/bioinformatics/btag071
Ke Xu, Xin Maizie Zhou, Lu Zhang
Spatially resolved transcriptomics (SRT) and spatially resolved proteomics (SRP) data enable the study of gene expression and protein abundances within their precise spatial and cellular contexts in tissues. Certain SRT and SRP technologies also capture corresponding morphology images, adding another layer of valuable information. However, few existing methods developed for SRT data effectively leverage these supplementary images to enhance clustering performance. Here, we introduce stDyer-image, an end-to-end deep learning framework designed for clustering SRT and SRP datasets with images. Unlike existing methods that utilize images to complement gene expression data, stDyer-image directly links image features to cluster labels. This approach draws inspiration from pathologists, who can visually identify specific cell types or tumor regions from morphological images without relying on gene expression or protein abundances. Benchmarks against state-of-the-art tools demonstrate that stDyer-image achieves superior performance in clustering. Moreover, it is capable of handling large-scale datasets across diverse technologies, making it a versatile and powerful tool for spatial omics analysis.
Title: stDyer-image improves clustering analysis of spatially resolved transcriptomics and proteomics with morphological images. Journal: Bioinformatics (Oxford, England).
Pub Date: 2026-02-15 DOI: 10.1093/bioinformatics/btag078
Freya E R Woods, Emilyanne Leonard, Timothy Ebbels, Jonathan Cairns, Rhiannon David
Motivation: Flow cytometry (FC) is a widely used technique for analysing cells or particles based on the fluorescence of specific markers. Thresholds for fluorescence are typically set manually, a laborious, subjective process that scales poorly as FC technology advances. Machine learning (ML) methods can address these issues but often require technical expertise many bench scientists do not possess. Thus, accessible, open-source, and cross-domain ML-based FC tools are needed.
Results: We present AutoFlow, an easy-to-use, adaptable R Shiny application for automated flow cytometry (FC) analysis. AutoFlow supports two workflows: supervised and unsupervised learning. The application automates key preprocessing steps including fluorescence compensation, debris exclusion, single-cell identification, surface marker gating, MFI quantification, and downstream classification or clustering. Across three datasets, two publicly available (Mosmann and Nilsson Rare) and a novel bone marrow microphysiological system (BM-MPS) dataset, AutoFlow demonstrated robust performance. In the supervised workflow, multiclass classification on BM-MPS achieved 97.2% accuracy under a leave-one-timepoint-out scheme, with high sensitivity and specificity across major lineages. For rare populations, performance was strong: Mosmann Rare (0.03% prevalence) achieved 87.5% sensitivity and 100% specificity, while Nilsson Rare (0.08% prevalence) achieved 87.9% sensitivity and 99.9% specificity. The unsupervised workflow accurately grouped cells into biologically meaningful clusters, recovering known populations and identifying additional candidate populations with marker profiles consistent with true biology. AutoFlow offers a fast, reproducible, and scalable solution for FC analysis, enabling high-throughput studies and improving the discovery of rare or unexpected cell types.
Availability: The application is available at https://github.com/FERWoods/AutoFlow for download using R. An archived version is available at DOI: 10.5281/zenodo.18235796.
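The rare-population results are reported as sensitivity and specificity rather than accuracy alone, and a small worked example shows why that matters at very low prevalence. The sketch below is independent of AutoFlow (which is an R Shiny application); it is plain Python with a hypothetical helper and made-up toy labels, not data from the paper.

```python
def sensitivity_specificity(true_labels, predicted_labels, positive):
    """Compute (sensitivity, specificity) for one class treated as positive."""
    tp = fp = tn = fn = 0
    for truth, guess in zip(true_labels, predicted_labels):
        if truth == positive and guess == positive:
            tp += 1
        elif truth == positive:
            fn += 1
        elif guess == positive:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Hypothetical toy data: 10,000 cells, of which 3 are rare (0.03% prevalence).
truth = ["rare"] * 3 + ["other"] * 9997
all_negative = ["other"] * 10000  # a classifier that never calls a cell "rare"

accuracy = sum(t == p for t, p in zip(truth, all_negative)) / len(truth)
sens, spec = sensitivity_specificity(truth, all_negative, positive="rare")
print(f"accuracy={accuracy:.4f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

A classifier that misses every rare cell still scores 99.97% accuracy on this toy set, so sensitivity is the figure that indicates whether rare populations are actually being recovered.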
Supplementary information: Supplementary data are available at Bioinformatics online.
Title: AutoFlow: An interactive Shiny app for supervised and unsupervised flow cytometry analysis. Journal: Bioinformatics (Oxford, England).
Pub Date: 2026-02-15 DOI: 10.1093/bioinformatics/btag075
Leo Carl Foerster, Enrico Frigoli, Xiaoyu Sun, Jooa Hooli, Angela Goncalves, Ana Martin-Villalba
Motivation: Commercial solutions like 10X cellranger provide robust UMI quantification for their proprietary single-cell protocols, but open methods such as Smart-seq3 lack comparable support.
Results: Here, we introduce umite, a Smart-seq3 UMI counting pipeline with a focus on speed and a light memory footprint. Unlike existing tools, umite offers efficient mismatch-tolerant UMI detection, boosting UMI retrieval by 5-15% in benchmarks. It also outperforms current Smart-seq3 quantification tools in runtime, disk usage, and memory footprint, offering better scalability on large datasets.
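The abstract does not describe how the mismatch-tolerant detection is implemented. One plausible reading, sketched below purely as an illustration, is that a read is accepted as UMI-containing when its leading tag sequence matches an expected tag within a small Hamming distance, after which the bases that follow are taken as the UMI. The function names, the tag sequence, and the UMI length are assumptions for this example and should be replaced with the values specified by the protocol in use; this is not umite's code.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def extract_umi(read, tag, umi_length, max_mismatches=1):
    """Return the UMI following the tag, or None if the tag is not found.

    The tag is expected at the start of the read and may differ from the
    reference tag at up to max_mismatches positions.
    """
    needed = len(tag) + umi_length
    if len(read) < needed:
        return None
    if hamming(read[:len(tag)], tag) > max_mismatches:
        return None
    return read[len(tag):needed]


TAG = "ATTGCGCAATG"  # illustrative tag sequence (assumption)
UMI_LENGTH = 8       # illustrative UMI length (assumption)

exact_read = TAG + "ACGTACGT" + "GGGTTTCCC"
noisy_read = "ATTGCGCTATG" + "ACGTACGT" + "GGGTTTCCC"  # one mismatch in the tag

print(extract_umi(exact_read, TAG, UMI_LENGTH))
print(extract_umi(noisy_read, TAG, UMI_LENGTH, max_mismatches=0))
print(extract_umi(noisy_read, TAG, UMI_LENGTH, max_mismatches=1))
```

Under exact matching the noisy read would be discarded as an internal (non-UMI) read, whereas allowing one mismatch recovers its UMI, which is the kind of effect that could raise UMI retrieval.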
Availability: umite is available at https://github.com/leoforster/umite (or via Zenodo: https://doi.org/10.5281/zenodo.18166431) and includes a Snakemake workflow for Smart-seq3 quantification. Single cell libraries of the mouse nasal vasculature dataset (GSE207085) and human CD4+ T-cell dataset (GSE270928) used in benchmarking were downloaded from NCBI (see Supplement for details).
Supplementary information: Supplementary data are available at Bioinformatics online.
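The abstract does not describe how umite implements mismatch-tolerant UMI detection. As a rough illustration of the general idea only, the sketch below accepts a read as UMI-containing when its first bases match a fixed tag within a small Hamming distance, then takes the following bases as the UMI. The tag sequence, the 8-base UMI length, and the function names are assumptions made for this example and are not taken from umite.

```python
# Illustrative sketch only; this is NOT umite's code or API.
# Assumption: UMI-containing reads start with an 11-base tag followed by an 8-base UMI.
TAG = "ATTGCGCAATG"  # assumed tag sequence, used here for illustration
UMI_LEN = 8          # assumed UMI length


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(c1 != c2 for c1, c2 in zip(a, b))


def extract_umi(read: str, max_mismatches: int = 1):
    """Return the UMI if the read starts with the tag (allowing mismatches), else None."""
    if len(read) < len(TAG) + UMI_LEN:
        return None  # read too short to hold tag plus UMI
    if hamming(read[:len(TAG)], TAG) > max_mismatches:
        return None  # prefix is too far from the tag: treat as a non-UMI read
    return read[len(TAG):len(TAG) + UMI_LEN]


if __name__ == "__main__":
    exact = "ATTGCGCAATG" + "ACGTACGT" + "GGGTTTT"
    one_off = "ATTGCGCTATG" + "ACGTACGT" + "GGGTTTT"  # one substitution in the tag
    print(extract_umi(exact))
    print(extract_umi(one_off))
    print(extract_umi(one_off, max_mismatches=0))
```

With exact matching (`max_mismatches=0`) the second read would be discarded, whereas tolerating one mismatch recovers its UMI. This is the kind of gain in UMI retrieval that mismatch tolerance can provide.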
{"title":"Umite: fast quantification of smart-seq3 libraries with improved UMI retrieval.","authors":"Leo Carl Foerster, Enrico Frigoli, Xiaoyu Sun, Jooa Hooli, Angela Goncalves, Ana Martin-Villalba","doi":"10.1093/bioinformatics/btag075","DOIUrl":"https://doi.org/10.1093/bioinformatics/btag075","url":null,"abstract":"<p><strong>Motivation: </strong>Commercial solutions like 10X cellranger provide robust UMI quantification for their proprietary single-cell protocols, but open methods such as Smart-seq3 lack comparable support.</p><p><strong>Results: </strong>Here, we introduce umite, a Smart-seq3 UMI counting pipeline with a focus on speed and a light memory footprint. Unlike existing tools, umite offers efficient mismatch-tolerant UMI detection, boosting UMI retrieval by 5-15% in benchmarks. It also outperforms current Smart-seq3 quantification tools in runtime, disk usage, and memory footprint, offering better scalability on large datasets.</p><p><strong>Availability: </strong>umite is available at https://github.com/leoforster/umite (or via Zenodo: https://doi.org/10.5281/zenodo.18166431) and includes a Snakemake workflow for Smart-seq3 quantification. Single cell libraries of the mouse nasal vasculature dataset (GSE207085) and human CD4+ T-cell dataset (GSE270928) used in benchmarking were downloaded from NCBI (see Supplement for details).</p><p><strong>Supplementary information: </strong>Supplementary data are available at Bioinformatics online.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":5.4,"publicationDate":"2026-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146204292","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-02-13 DOI: 10.1093/bioinformatics/btag068
Haoyu Zhang, Kevin Fotso, Marc Subirana-Granés, Milton Pividori
Motivation: Identifying meaningful patterns in complex biological data necessitates correlation coefficients capable of capturing diverse relationship types beyond simple linearity. Furthermore, efficient computational tools are crucial for handling the ever-increasing scale of biological datasets.
Results: We introduce CCC-GPU, a high-performance, GPU-accelerated implementation of the Clustermatch Correlation Coefficient (CCC). CCC-GPU computes correlation coefficients for mixed data types, effectively detects nonlinear relationships, and offers significant speed improvements over its predecessor.
Availability and implementation: The source code of CCC-GPU is openly available on GitHub (https://github.com/pivlab/ccc-gpu) and archived on Zenodo (https://doi.org/10.5281/zenodo.18310318), distributed under the BSD-2-Clause Plus Patent License.
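To make the idea of a clustering-based, nonlinear correlation coefficient concrete, here is a small CPU-only sketch in pure Python. It assumes the commonly described CCC recipe for numerical features: partition each feature into k quantile bins for several values of k, compare every pair of partitions with the adjusted Rand index (ARI), and keep the maximum (floored at zero). The function names and the default `k_max` are choices made for this example. This is not the CCC-GPU code, and it omits categorical features and all GPU acceleration.

```python
# Illustrative sketch only; NOT the CCC-GPU implementation or its API.
from collections import Counter
from math import comb


def quantile_partition(values, k):
    """Assign each value to one of k roughly equal-sized bins by rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    labels = [0] * n
    for rank, idx in enumerate(order):
        labels[idx] = rank * k // n
    return labels


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two label lists of equal length."""
    total = comb(len(a), 2)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 0.0  # degenerate partitions: this sketch treats them as no agreement
    return (sum_ij - expected) / (max_index - expected)


def ccc_sketch(x, y, k_max=10):
    """Maximum ARI over all pairs of quantile partitions of x and y (floored at 0)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    parts_x = [quantile_partition(x, k) for k in range(2, k_max + 1)]
    parts_y = [quantile_partition(y, k) for k in range(2, k_max + 1)]
    best = 0.0
    for px in parts_x:
        for py in parts_y:
            best = max(best, adjusted_rand_index(px, py))
    return best


if __name__ == "__main__":
    x = list(range(-50, 50))
    print(ccc_sketch(x, [2 * v + 1 for v in x]))  # monotonic relationship
    print(ccc_sketch(x, [v * v for v in x]))      # symmetric, nonlinear relationship
```

The quadratic case shows why such a coefficient is useful: a linear correlation on this symmetric relationship is close to zero, while the partition-based score stays clearly positive. The nested loop over partition pairs, repeated for every pair of features, is also what makes the computation expensive at scale.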
{"title":"CCC-GPU: A graphics processing unit (GPU)-accelerated nonlinear correlation coefficient for large-scale transcriptomic analyses.","authors":"Haoyu Zhang, Kevin Fotso, Marc Subirana-Granés, Milton Pividori","doi":"10.1093/bioinformatics/btag068","DOIUrl":"10.1093/bioinformatics/btag068","url":null,"abstract":"<p><strong>Motivation: </strong>Identifying meaningful patterns in complex biological data necessitates correlation coefficients capable of capturing diverse relationship types beyond simple linearity. Furthermore, efficient computational tools are crucial for handling the ever-increasing scale of biological datasets.</p><p><strong>Results: </strong>We introduce CCC-GPU, a high-performance, GPU-accelerated implementation of the Clustermatch Correlation Coefficient (CCC). CCC-GPU computes correlation coefficients for mixed data types, effectively detects nonlinear relationships, and offers significant speed improvements over its predecessor.</p><p><strong>Availability and implementation: </strong>The source code of CCC-GPU is openly available on GitHub (https://github.com/pivlab/ccc-gpu) and archived on Zenodo (https://doi.org/10.5281/zenodo.18310318), distributed under the BSD-2-Clause Plus Patent License.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":5.4,"publicationDate":"2026-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146196162","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-02-12 DOI: 10.1093/bioinformatics/btag063
Christos Matsingos, Ka Fu Man, Arianna Fornili
Summary: The protonation propensity of ionisable residues in proteins can change in response to changes in the local residue environment. The link between protein dynamics and pKa is particularly important in pH regulation of protein structure and function. Here, we introduce TrIPP (Trajectory Iterative pKa Predictor), a Python tool to track and analyse changes in the pKa of ionisable residues along Molecular Dynamics trajectories of proteins. We show how TrIPP can be used to identify residues with physiologically relevant variations in their predicted pKa values during the simulations, and link them to changes in the local and global environment.
Availability and implementation: TrIPP is available at https://github.com/fornililab/TrIPP.
Supplementary information: Supplementary data are available at Bioinformatics online.
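As a hedged illustration of what a "physiologically relevant" pKa variation might look like, the sketch below takes a per-frame series of predicted pKa values for one residue and applies the Henderson-Hasselbalch relation to estimate the protonated fraction at a chosen pH. It also flags whether the pKa range straddles that pH. The function names, the default pH of 7.4, and the example values are assumptions made for this example. This is not TrIPP's API, and no pKa prediction is performed here.

```python
# Illustrative sketch only; NOT TrIPP's API. The pKa values are made-up inputs.


def protonated_fraction(pka: float, ph: float = 7.4) -> float:
    """Henderson-Hasselbalch: fraction of the protonated form at the given pH."""
    return 1.0 / (1.0 + 10 ** (ph - pka))


def summarize_residue(pka_series, ph: float = 7.4) -> dict:
    """Summarise a per-frame pKa series for one ionisable residue."""
    if not pka_series:
        raise ValueError("pka_series must not be empty")
    fractions = [protonated_fraction(p, ph) for p in pka_series]
    lo, hi = min(pka_series), max(pka_series)
    return {
        "min_pka": lo,
        "max_pka": hi,
        # True when the predicted pKa moves from one side of the pH to the other
        "crosses_ph": lo < ph < hi,
        "mean_protonated_fraction": sum(fractions) / len(fractions),
    }


if __name__ == "__main__":
    # Hypothetical per-frame pKa predictions for a single residue
    series = [6.1, 6.4, 7.0, 7.9, 8.2, 7.6]
    print(summarize_residue(series))
```

A residue whose predicted pKa crosses the ambient pH during a simulation changes its most likely protonation state, which is one simple criterion for singling it out for closer inspection.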
{"title":"TrIPP: a Trajectory Iterative pKa Predictor.","authors":"Christos Matsingos, Ka Fu Man, Arianna Fornili","doi":"10.1093/bioinformatics/btag063","DOIUrl":"https://doi.org/10.1093/bioinformatics/btag063","url":null,"abstract":"<p><strong>Summary: </strong>The protonation propensity of ionisable residues in proteins can change in response to changes in the local residue environment. The link between protein dynamics and pKa is particularly important in pH regulation of protein structure and function. Here, we introduce TrIPP (Trajectory Iterative pKa Predictor), a Python tool to track and analyse changes in the pKa of ionisable residues along Molecular Dynamics trajectories of proteins. We show how TrIPP can be used to identify residues with physiologically relevant variations in their predicted pKa values during the simulations, and link them to changes in the local and global environment.</p><p><strong>Availability and implementation: </strong>TrIPP is available at https://github.com/fornililab/TrIPP.</p><p><strong>Supplementary information: </strong>Supplementary data are available at Bioinformatics online.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":5.4,"publicationDate":"2026-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146183744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}