Bioinformatics (Oxford, England): latest articles, page 2

InSituPy: A framework for histology-guided, multi-sample analysis of single-cell spatial omics data.

IF 5.4

Bioinformatics (Oxford, England)

Pub Date : 2026-02-15 DOI: 10.1093/bioinformatics/btag073

Johannes Wirth, Anna Chernysheva, Birthe Lemke, Isabel Giray, Katja Steiger

Motivation: Single-cell spatial omics data provides unprecedented insights into disease states. Comprehensive analysis of such data requires frameworks that integrate diverse modalities and enable joint processing of multiple datasets and corresponding metadata.

Results: To address these challenges, we introduce InSituPy, a versatile and scalable framework for analyzing spatial omics data from the multi-sample level down to the cellular and subcellular level. Its modular data structure organizes all relevant data modalities per sample and links them to their corresponding metadata, enabling scalable analysis of large patient cohorts using spatial omics technologies. Interactive visualization tools within InSituPy enable seamless integration of histopathological expertise, promoting collaborative hypothesis generation in translational research. Additionally, InSituPy includes built-in analytical algorithms and interfaces with external tools, establishing a standardized workflow for multi-sample spatial omics data analysis.

Availability: The Python package InSituPy is publicly available on GitHub (https://github.com/SpatialPathology/InSituPy) and PyPI (https://pypi.org/project/insitupy-spatial/), and archived on Zenodo (DOI: 10.5281/zenodo.18459471). Tutorials and documentation for InSituPy are available at https://insitupy.readthedocs.io/. All code to replicate the results shown in this manuscript can be found in the GitHub repository. Scripts to connect QuPath and InSituPy can be found at https://github.com/SpatialPathology/InSituPy-QuPath. All data required to complete the tutorials is publicly available, and functions to download the data have been implemented. A Zulip community chat for user support and discussion is accessible at https://insitupy.zulipchat.com.

Supplementary information: Supplementary data are available at Bioinformatics online.



Citations: 0

IgPose: A Generative Data-Augmented Pipeline for Robust Immunoglobulin-Antigen Binding Prediction.

IF 5.4

Bioinformatics (Oxford, England)

Pub Date : 2026-02-15 DOI: 10.1093/bioinformatics/btag076

Tien-Cuong Bui, Injae Chung, Wonjun Lee, Junsu Ko, Juyong Lee

Motivation: Predicting immunoglobulin-antigen (Ig-Ag) binding remains a significant challenge due to the paucity of experimentally resolved complexes and the limited accuracy of de novo Ig structure prediction.

Results: We introduce IgPose, a generalizable framework for Ig-Ag pose identification and scoring, built on a generative data-augmentation pipeline. To mitigate data scarcity, we constructed the Structural Immunoglobulin Decoy Database (SIDD), a comprehensive repository of high-fidelity synthetic decoys. IgPose integrates equivariant graph neural networks, ESM-2 embeddings, and gated recurrent units to synergistically capture both geometric and evolutionary features. We implemented interface-focused k-hop sampling with biologically guided pooling to enhance generalization across diverse interfaces. The framework comprises two sub-networks, IgPoseClassifier for binding pose discrimination and IgPoseScore for DockQ score estimation, and achieves robust performance on curated internal test sets and the CASP-16 benchmark compared to physics-based and deep learning baselines. IgPose serves as a versatile computational tool for high-throughput antibody discovery pipelines by providing accurate pose filtering and ranking.

Availability and implementation: IgPose is available on GitHub (https://github.com/arontier/igpose).

Supplementary information: Supplementary information is available at Bioinformatics online.
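The interface-focused k-hop sampling mentioned in the Results can be pictured as a breadth-first expansion from interface residues over a residue contact graph. The sketch below is only an illustration of that general idea, not IgPose's implementation: the toy graph, the seed set, and the function name `k_hop_neighborhood` are all assumptions made for this example.

```python
from collections import deque


def k_hop_neighborhood(adjacency, seeds, k):
    """Return {node: hop distance} for every node within k hops of a seed.

    Hypothetical helper: `adjacency` maps a node to its neighbours, and
    `seeds` stands in for residues assumed to lie at the binding interface.
    """
    distance = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if distance[node] == k:
            continue  # do not expand beyond the k-th hop
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


# Toy chain graph A - B - C - D - E (made-up node labels).
adjacency = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}
hops = k_hop_neighborhood(adjacency, seeds=["A"], k=2)
print(hops)
```

Restricting the graph to such a neighbourhood keeps the model's input focused on the region around the interface; how IgPose chooses seeds, `k`, and the pooling step is described in the paper, not here.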



Citations: 0

scMix: Learning Temporal Dynamics of Gene Expression under Irregular Time Intervals.

IF 5.4

Bioinformatics (Oxford, England)

Pub Date : 2026-02-15 DOI: 10.1093/bioinformatics/btag080

Shangjin Han, Dongsup Kim

Motivation: Understanding temporal gene expression is fundamental in the study of cellular development and differentiation. In practice, temporal single-cell datasets tend to contain only a limited number of measured time points, which are often unevenly spaced, resulting in irregular intervals between observations due to experimental constraints. Existing methods typically address these intervals by sequentially predicting one time point after another, yet lack mechanisms to explicitly model time intervals, leading to error accumulation.

Results: In this work, we introduce scMix, a language-model-based framework for predicting single-cell gene expression, which enables prediction from multiple historical time points. We build scMix on the Receptance Weighted Key Value (RWKV) architecture and use its time-decay mechanism to model temporal dependencies. Moreover, scMix introduces a delta-time mechanism that allows the model to bypass unmeasured time points, reducing error accumulation and improving robustness. In addition, we incorporate a trend regularization strategy to enhance the temporal coherence of predicted gene expression trajectories. scMix demonstrates state-of-the-art performance in predicting gene expression at unmeasured time points, surpassing existing methods, and also achieves outstanding results on downstream tasks.

Availability and implementation: The code used for this study is available at https://doi.org/10.5281/zenodo.18287184.

Supplementary information: Supplementary data are available at Bioinformatics online.
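To make the idea of modelling irregular intervals explicitly more concrete, the sketch below weights each observed time point by an exponential decay in its actual distance to the target time, so a prediction can use several past measurements at once instead of stepping through unmeasured points. This is a deliberately simplified stand-in, not the RWKV time-decay or the delta-time mechanism of scMix; the function name, the decay rate, and the toy values are assumptions.

```python
import math


def decay_weighted_prediction(times, values, target_time, decay_rate=0.5):
    """Predict a value at target_time from irregularly spaced observations.

    Each observation gets weight exp(-decay_rate * delta_t), where delta_t is
    the real elapsed time to the target, so uneven gaps are handled directly.
    """
    if any(t > target_time for t in times):
        raise ValueError("all observed times must precede the target time")
    weights = [math.exp(-decay_rate * (target_time - t)) for t in times]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


# Made-up expression values measured at unevenly spaced time points.
times = [0.0, 1.0, 4.0]
values = [1.0, 2.0, 5.0]
prediction = decay_weighted_prediction(times, values, target_time=6.0)
print(round(prediction, 3))
```

With `decay_rate=0` every observation counts equally and the result is the plain mean; larger rates pull the prediction toward the most recent measurement.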



Citations: 0

A Dual Diffusion Model-Based Representation Learning Framework for AMPs Classification.

IF 5.4

Bioinformatics (Oxford, England)

Pub Date : 2026-02-15 DOI: 10.1093/bioinformatics/btag077

Wen Kong, Lingling Fu, Xingpeng Jiang, Weizhong Zhao

Motivation: The increasing prevalence of antibiotic-resistant bacteria has intensified the demand for novel antimicrobial agents. Antimicrobial peptides (AMPs) have emerged as promising alternatives, yet their identification or classification remains challenging due to the lack of multi-perspective information, insufficient feature representation learning, and monocular data modalities.

Results: In this paper, we propose a dual diffusion model-based representation learning framework for classifying AMPs, which effectively integrates both peptide sequence and structure information to address existing issues for the task. Specifically, our approach utilizes a multi-view feature construction module, which encodes peptide sequences and structures from distinctive perspectives, deriving initial feature representations with enriched biological semantics. To enhance representation learning, the proposed framework leverages two diffusion models, one for sequence information and one for structure information, to effectively capture complex semantics from the two modalities. In addition, both single-modal and dual-modal contrastive learning are employed to further advance the representation learning. Results of comprehensive experiments demonstrate that our model outperforms existing methods for the task of AMP classification, providing a feasible solution to accelerating the discovery of novel antimicrobial agents.

Availability of data and codes: The data and source code are available on GitHub at https://github.com/kww567upup/DDM.

Supplementary information: Supplementary data are available at Bioinformatics online.
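The dual-modal contrastive learning mentioned in the Results can be illustrated with a generic InfoNCE-style loss that pulls together the sequence and structure embeddings of the same peptide and pushes apart those of different peptides. This is a minimal sketch of that general technique under stated assumptions, not the loss used in the paper: the function names, the temperature, and the two-dimensional toy embeddings are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(seq_embeddings, struct_embeddings, temperature=0.1):
    """Mean InfoNCE loss; row i of each list is assumed to be the same peptide."""
    n = len(seq_embeddings)
    loss = 0.0
    for i in range(n):
        logits = [
            cosine(seq_embeddings[i], struct_embeddings[j]) / temperature
            for j in range(n)
        ]
        # Numerically stable log-sum-exp over all candidate pairings.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        loss += log_denominator - logits[i]
    return loss / n


# Toy embeddings: modalities agree in the first case and are swapped in the second.
aligned_loss = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned_loss < swapped_loss)
```

The loss is small when matching sequence and structure views are the most similar pair and large when they are not, which is what drives the two encoders toward a shared representation.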



Citations: 0

pyBiodatafuse: Extending interoperability of data using modular queries across biomedical resources.

IF 5.4

Bioinformatics (Oxford, England)

Pub Date : 2026-02-15 DOI: 10.1093/bioinformatics/btag064

Yojana Gadiya, Javier Millán Acosta, Ammar Ammar, Alejandro Adriaque Lozano, Delano Wetstede, Dominik Martinát, Ana Claudia Sima, Hailiang Mei, Egon Willighagen, Tooba Abbassi-Daloii

Motivation: Integrating omics data analysis with publicly available databases is crucial for unravelling complex biological mechanisms. However, this integration process is often intricate and time-consuming due to the diversity and complexity of the data involved. Achieving consistent harmonization across data types is challenging when managing disparate formats and sources. To address these issues, we introduce pyBiodatafuse, a query-based Python tool designed to integrate biomedical databases. This tool establishes a modular framework that simplifies data wrangling, enabling the creation of context-specific knowledge graphs (KGs) while supporting graph-based analyses.

Results: We developed a pipeline for generating context-specific knowledge graphs dynamically, allowing users to create KGs on the fly from a set of gene or metabolite identifiers. pyBiodatafuse features a user-friendly interface that streamlines this process, making it accessible even to researchers without extensive computational expertise. Additionally, the tool offers plugins for widely used platforms such as Cytoscape, Neo4j, and GraphDB, enabling local hosting of resulting property and RDF graphs. This versatility ensures that generated KGs can be efficiently utilized within diverse research workflows. To demonstrate its potential, we used pyBiodatafuse to create a graph for post-COVID syndrome using differential gene expression data, showcasing its ability to build adaptable and context-specific knowledge representations. Thus, pyBiodatafuse sets the stage for streamlined data integration, empowering researchers to focus on discovery and analysis without being hindered by data management complexities.

Availability and implementation: pyBiodatafuse is open-source, with its source code and PyPI package available at https://github.com/BioDataFuse/pyBiodatafuse and https://pypi.org/project/pyBiodatafuse/. The user interface can be accessed at https://biodatafuse.org/. Additionally, a release has been made on Zenodo at https://doi.org/10.5281/zenodo.18468942.
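The general pattern of building a context-specific knowledge graph from a list of gene identifiers can be sketched in plain Python. The example below does not use or reproduce the pyBiodatafuse API: the function `build_knowledge_graph`, the in-memory lookup tables standing in for database queries, and all identifiers are placeholders invented for illustration.

```python
def build_knowledge_graph(gene_ids, sources):
    """Assemble typed nodes and labelled edges from per-source annotation tables.

    `sources` maps a source name to (node_type, relation, lookup_table), where
    lookup_table maps a gene identifier to the entities that source links to it.
    """
    nodes = {}
    edges = []
    for gene in gene_ids:
        nodes[gene] = "gene"
        for source_name, (node_type, relation, table) in sources.items():
            for target in table.get(gene, []):
                nodes[target] = node_type
                # Keep the source on each edge so provenance is not lost.
                edges.append((gene, relation, target, source_name))
    return nodes, edges


# Placeholder annotation tables; a real workflow would query external databases.
sources = {
    "toy_pathway_db": (
        "pathway",
        "part_of",
        {"GENE_A": ["PATHWAY_1"], "GENE_B": ["PATHWAY_1"]},
    ),
    "toy_disease_db": (
        "disease",
        "associated_with",
        {"GENE_A": ["DISEASE_X"]},
    ),
}
nodes, edges = build_knowledge_graph(["GENE_A", "GENE_B"], sources)
print(len(nodes), len(edges))
```

Keeping each source as an independent table mirrors the modular-query idea in the abstract: adding a resource means adding one entry to `sources` rather than changing the graph-building code.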



Citations: 0

stDyer-image improves clustering analysis of spatially resolved transcriptomics and proteomics with morphological images.

IF 5.4

Bioinformatics (Oxford, England)

Pub Date : 2026-02-15 DOI: 10.1093/bioinformatics/btag071

Ke Xu, Xin Maizie Zhou, Lu Zhang

Spatially resolved transcriptomics (SRT) and spatially resolved proteomics (SRP) data enable the study of gene expression and protein abundances within their precise spatial and cellular contexts in tissues. Certain SRT and SRP technologies also capture corresponding morphology images, adding another layer of valuable information. However, few existing methods developed for SRT data effectively leverage these supplementary images to enhance clustering performance. Here, we introduce stDyer-image, an end-to-end deep learning framework designed for clustering SRT and SRP datasets with images. Unlike existing methods that utilize images to complement gene expression data, stDyer-image directly links image features to cluster labels. This approach draws inspiration from pathologists, who can visually identify specific cell types or tumor regions from morphological images without relying on gene expression or protein abundances. Benchmarks against state-of-the-art tools demonstrate that stDyer-image achieves superior performance in clustering. Moreover, it is capable of handling large-scale datasets across diverse technologies, making it a versatile and powerful tool for spatial omics analysis.



Citations: 0

AutoFlow: An interactive Shiny app for supervised and unsupervised flow cytometry analysis.

IF 5.4

Bioinformatics (Oxford, England)

Pub Date : 2026-02-15 DOI: 10.1093/bioinformatics/btag078

Freya E R Woods, Emilyanne Leonard, Timothy Ebbels, Jonathan Cairns, Rhiannon David

Motivation: Flow cytometry (FC) is a widely used technique for analysing cells or particles based on the fluorescence of specific markers. Thresholds for fluorescence are typically set manually, a laborious, subjective process that scales poorly as FC technology advances. Machine learning (ML) methods can address these issues but often require technical expertise many bench scientists do not possess. Thus, accessible, open-source, and cross-domain ML-based FC tools are needed.

Results: We present AutoFlow, an easy-to-use, adaptable R Shiny application for automated flow cytometry (FC) analysis. AutoFlow supports two workflows: supervised and unsupervised learning. The application automates key preprocessing steps including fluorescence compensation, debris exclusion, single-cell identification, surface marker gating, MFI quantification, and downstream classification or clustering. Across three datasets, two publicly available (Mosmann and Nilsson Rare) and a novel bone marrow microphysiological system (BM-MPS) dataset, AutoFlow demonstrated robust performance. In the supervised workflow, multiclass classification on BM-MPS achieved 97.2% accuracy under a leave-one-timepoint-out scheme, with high sensitivity and specificity across major lineages. For rare populations, performance was strong: Mosmann Rare (0.03% prevalence) achieved 87.5% sensitivity and 100% specificity, while Nilsson Rare (0.08% prevalence) achieved 87.9% sensitivity and 99.9% specificity. The unsupervised workflow accurately grouped cells into biologically meaningful clusters, recovering known populations and identifying additional candidate populations with marker profiles consistent with true biology. AutoFlow offers a fast, reproducible, and scalable solution for FC analysis, enabling high-throughput studies and improving the discovery of rare or unexpected cell types.

Availability: The application is available at https://github.com/FERWoods/AutoFlow for download using R. An archived version is available at DOI: 10.5281/zenodo.18235796.

Supplementary information: Supplementary data are available at Bioinformatics online.



Citations: 0

Umite: fast quantification of smart-seq3 libraries with improved UMI retrieval.

IF 5.4

Bioinformatics (Oxford, England)

Pub Date : 2026-02-15 DOI: 10.1093/bioinformatics/btag075

Leo Carl Foerster, Enrico Frigoli, Xiaoyu Sun, Jooa Hooli, Angela Goncalves, Ana Martin-Villalba

Motivation: Commercial solutions like 10X cellranger provide robust UMI quantification for their proprietary single-cell protocols, but open methods such as Smart-seq3 lack comparable support.

Results: Here, we introduce umite, a Smart-seq3 UMI counting pipeline with a focus on speed and a light memory footprint. Unlike existing tools, umite offers efficient mismatch-tolerant UMI detection, boosting UMI retrieval by 5-15% in benchmarks. It also outperforms current Smart-seq3 quantification tools in runtime, disk usage, and memory footprint, offering better scalability on large datasets.
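The abstract does not say how umite implements mismatch-tolerant UMI detection. As an illustration of the general idea only, the Python sketch below assumes that a Smart-seq3 UMI read begins with an 11-nt tag followed by an 8-nt UMI, and accepts the read when the tag matches within a small Hamming distance. The tag sequence, the lengths, the function names, and the example reads are assumptions made for this sketch and should be checked against the protocol and the umite documentation.

```python
TAG = "ATTGCGCAATG"  # assumed Smart-seq3 tag sequence (verify against the protocol)
UMI_LEN = 8          # assumed UMI length


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def extract_umi(read, tag=TAG, umi_len=UMI_LEN, max_mismatches=1):
    """Return the UMI if the read starts with `tag` within `max_mismatches`, else None."""
    if len(read) < len(tag) + umi_len:
        return None
    if hamming(read[:len(tag)], tag) > max_mismatches:
        return None
    return read[len(tag):len(tag) + umi_len]


# Made-up example reads.
reads = [
    "ATTGCGCAATG" + "ACGTACGT" + "GGGTTTT",  # exact tag
    "ATTGCGCTATG" + "TTGCAAGC" + "GGGAAAA",  # one sequencing error in the tag
    "CCCCCCCCCCC" + "ACGTACGT" + "GGGTTTT",  # no tag: treated as an internal read
]

exact = [extract_umi(r, max_mismatches=0) for r in reads]
tolerant = [extract_umi(r, max_mismatches=1) for r in reads]

print("exact matching:   ", exact)
print("tolerant matching:", tolerant)
```

With exact matching only the first read yields a UMI; allowing one mismatch also recovers the second read, which is the kind of gain in UMI retrieval that tolerant detection is meant to provide.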

Availability: umite is available at https://github.com/leoforster/umite (or via Zenodo: https://doi.org/10.5281/zenodo.18166431) and includes a Snakemake workflow for Smart-seq3 quantification. Single cell libraries of the mouse nasal vasculature dataset (GSE207085) and human CD4+ T-cell dataset (GSE270928) used in benchmarking were downloaded from NCBI (see Supplement for details).

Supplementary information: Supplementary data are available at Bioinformatics online.



Citations: 0

CCC-GPU: A graphics processing unit (GPU)-accelerated nonlinear correlation coefficient for large-scale transcriptomic analyses.

IF 5.4

Bioinformatics (Oxford, England)

Pub Date : 2026-02-13 DOI: 10.1093/bioinformatics/btag068

Haoyu Zhang, Kevin Fotso, Marc Subirana-Granés, Milton Pividori

Motivation: Identifying meaningful patterns in complex biological data necessitates correlation coefficients capable of capturing diverse relationship types beyond simple linearity. Furthermore, efficient computational tools are crucial for handling the ever-increasing scale of biological datasets.

Results: We introduce CCC-GPU, a high-performance, GPU-accelerated implementation of the Clustermatch Correlation Coefficient (CCC). CCC-GPU computes correlation coefficients for mixed data types, effectively detects nonlinear relationships, and offers significant speed improvements over its predecessor.
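To illustrate how a clustering-based coefficient can detect nonlinear relationships, the pure-Python sketch below assumes the commonly described form of the CCC for two numerical variables: each variable is split into k quantile bins for several values of k, the adjusted Rand index (ARI) is computed for every pair of partitions, and the maximum (floored at zero) is returned. This is a simplified CPU sketch under that assumption. It is not the CCC-GPU implementation or its API, and the function names and the rank-based binning (which ignores ties) are choices made here for brevity.

```python
from collections import Counter
from itertools import product
from math import comb, sqrt


def quantile_partition(values, k):
    """Label each value with one of k roughly equal-sized, rank-based bins."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * k // len(values)
    return labels


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two labelings of the same items."""
    sum_ij = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / comb(len(a), 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case: both partitions are trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def ccc_sketch(x, y, k_max=10):
    """Maximum ARI over all pairs of quantile partitions of x and y, floored at 0."""
    ks = range(2, min(k_max, len(x) - 1) + 1)
    parts_x = [quantile_partition(x, k) for k in ks]
    parts_y = [quantile_partition(y, k) for k in ks]
    return max(0.0, max(adjusted_rand_index(px, py)
                        for px, py in product(parts_x, parts_y)))


def pearson(x, y):
    """Plain Pearson correlation, for comparison."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


# Toy example: a symmetric quadratic relationship.
x = list(range(-50, 51))
y = [v * v for v in x]

print(f"Pearson: {pearson(x, y):.3f}")
print(f"CCC (sketch): {ccc_sketch(x, y):.3f}")
```

On the quadratic toy data the Pearson correlation is zero by symmetry, while the partition-based coefficient is clearly positive because the outer quantile bins of x line up with the upper bins of y. The number of partition pairs grows quadratically with k_max and the computation is repeated for every feature pair, which is the kind of workload that benefits from GPU acceleration.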

Availability and implementation: The source code of CCC-GPU is openly available on GitHub (https://github.com/pivlab/ccc-gpu) and archived on Zenodo (https://doi.org/10.5281/zenodo.18310318), distributed under the BSD-2-Clause Plus Patent License.


Citations: 0

TrIPP: a Trajectory Iterative pKa Predictor.

IF 5.4

Bioinformatics (Oxford, England)

Pub Date : 2026-02-12 DOI: 10.1093/bioinformatics/btag063

Christos Matsingos, Ka Fu Man, Arianna Fornili

Summary: The protonation propensity of ionisable residues in proteins can change in response to changes in the local residue environment. The link between protein dynamics and pKa is particularly important in pH regulation of protein structure and function. Here, we introduce TrIPP (Trajectory Iterative pKa Predictor), a Python tool to track and analyse changes in the pKa of ionisable residues along Molecular Dynamics trajectories of proteins. We show how TrIPP can be used to identify residues with physiologically relevant variations in their predicted pKa values during the simulations, and link them to changes in the local and global environment.
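One way to read "physiologically relevant variations" is that a residue's predicted pKa moves across a reference pH during the simulation, so that its likely protonation state changes. The Python sketch below illustrates that idea on per-frame pKa values. It does not use TrIPP's API; the function name, the residue labels, the reference pH of 7.4, and the pKa values are all hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def summarise_pka(series, ph=7.4):
    """Summarise per-frame pKa predictions for one residue against a reference pH."""
    return {
        "mean": mean(series),
        "std": pstdev(series),
        "min": min(series),
        "max": max(series),
        # Fraction of frames in which the predicted pKa lies above the reference pH.
        "frac_above_ph": sum(v > ph for v in series) / len(series),
        # True when the pKa is below the pH in some frames and above it in others.
        "crosses_ph": min(series) < ph < max(series),
    }


# Hypothetical per-frame pKa predictions (one value per trajectory frame).
pka_per_frame = {
    "ASP12": [3.8, 3.9, 4.1, 3.7, 4.0],
    "HIS45": [6.2, 6.9, 7.6, 7.9, 6.5],
}

summaries = {res: summarise_pka(vals) for res, vals in pka_per_frame.items()}
flagged = [res for res, s in summaries.items() if s["crosses_ph"]]

for res, s in summaries.items():
    print(f"{res}: mean={s['mean']:.2f} range=({s['min']}, {s['max']}) "
          f"crosses pH 7.4: {s['crosses_ph']}")
print("residues to inspect:", flagged)
```

Residues flagged this way are candidates for closer inspection, for example by relating the frames with shifted pKa to changes in nearby contacts or solvent exposure.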

Availability and implementation: TrIPP is available at https://github.com/fornililab/TrIPP.

Supplementary information: Supplementary data are available at Bioinformatics online.


Citations: 0