Pub Date: 2026-02-10 DOI: 10.1093/bioinformatics/btag055
Massimo Andreatta, Santiago J Carmona
Summary: Gene signature scoring provides a simple yet powerful approach for quantifying biological signals within single-cell omics datasets. UCell and pyUCell offer fast and robust implementations of rank-based signature scoring for R and Python, respectively, integrating seamlessly with leading single-cell analysis ecosystems such as Seurat, Bioconductor, and scanpy/scverse.
Availability and implementation: UCell v2 is distributed as an R package through Bioconductor (https://bioconductor.org/packages/UCell/) and as a Python package through PyPI (https://pypi.org/project/pyucell/).
Supplementary information: Supplementary data are available at Bioinformatics online.
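The rank-based scoring idea described in the Summary can be illustrated with a small sketch. This is not the UCell or pyUCell API: the function name `rank_signature_score`, the toy expression values, the `max_rank` cap, and the alphabetical tie-breaking are all assumptions made for illustration. The sketch ranks genes within one cell by decreasing expression, computes a Mann-Whitney U statistic for the signature genes, and rescales it so that higher scores mean the signature genes sit nearer the top of the ranking.

```python
def rank_signature_score(expression, signature, max_rank=None):
    """Score one cell; 1.0 means the signature genes are the top-ranked genes.

    expression: dict mapping gene name -> expression value in a single cell.
    signature:  iterable of gene names (a hypothetical gene set).
    max_rank:   ranks above this cap are clipped to it (illustrative assumption).
    """
    # Rank 1 = most highly expressed gene; ties are broken by gene name here,
    # which is a simplification chosen only to keep the example deterministic.
    genes = sorted(expression, key=lambda g: (-expression[g], g))
    if max_rank is None:
        max_rank = len(genes)
    rank = {g: min(i + 1, max_rank) for i, g in enumerate(genes)}

    sig = [g for g in signature if g in rank]
    n = len(sig)
    if n == 0:
        return 0.0

    # Mann-Whitney U: observed rank sum minus the smallest possible rank sum.
    u = sum(rank[g] for g in sig) - n * (n + 1) / 2
    u_max = n * max_rank - n * (n + 1) / 2
    return 1.0 - u / u_max if u_max > 0 else 0.0


# Toy cell with made-up expression values.
cell = {"CD3E": 9.0, "CD8A": 7.5, "GZMB": 6.0, "ACTB": 5.0, "MS4A1": 0.0, "CD19": 0.0}
top = rank_signature_score(cell, ["CD3E", "CD8A"])      # signature at the top
bottom = rank_signature_score(cell, ["MS4A1", "CD19"])  # signature at the bottom
```

Because the score depends only on within-cell ranks, it does not change when a cell's expression values are rescaled by a positive constant, which is one reason rank-based scores are often considered robust across datasets.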
Title: UCell and pyUCell: single-cell gene signature scoring for R and Python.
Journal: Bioinformatics (Oxford, England).
Pub Date: 2026-02-09 DOI: 10.1093/bioinformatics/btag056
Jiaying Hu, Yihang Du, Suyang Hou, Yueyang Ding, Jinyan Li, Hao Wu, Xiaobo Sun
Motivation: Spatial clustering is a critical analytical task in spatial transcriptomics (ST) that aids in uncovering the spatial molecular mechanisms underlying biological phenotypes. The growing number of spatial clustering methods creates a pressing need for an effective metric to evaluate their performance. An ideal metric should consider three factors: label agreement, spatial organization, and error severity. However, existing evaluation metrics focus solely on either label agreement or spatial organization, leading to biased and misleading evaluations.
Results: To fill this gap, we propose CEMUSA, a novel graph-based metric that integrates these factors into a unified evaluation framework. Extensive testing on both simulated and real datasets demonstrates CEMUSA's superiority over conventional metrics in differentiating clustering results with subtle differences in topology and error severity, while maintaining computational efficiency.
Availability and implementation: The source code and data are freely available at https://github.com/YihDu/CEMUSA. CEMUSA is implemented as an R package at https://yihdu.github.io/CEMUSA.
Supplementary information: Supplementary data are available at Bioinformatics online.
Title: CEMUSA: A Graph-based Integrative Metric for Evaluating Clusters in Spatial Transcriptomics.
Journal: Bioinformatics (Oxford, England).
Motivation: N6-methyladenine (6mA) is an important epigenetic modification of DNA that regulates biological processes such as gene expression, transcription, replication, DNA repair, and the cell cycle without altering the DNA sequence. It also plays a key role in many diseases, including cancer and autoimmune diseases. Although experimental approaches such as SMRT sequencing and methylated DNA immunoprecipitation can identify 6mA sites, they suffer from drawbacks including suboptimal sequencing quality, low signal-to-noise ratios, high costs, and time-consuming procedures. In recent years, deep learning approaches have demonstrated significant advantages in predicting 6mA sites; however, their generalization ability still requires further improvement.
Results: Inspired by the state space model Mamba, we propose a novel model for 6mA site prediction, named Mamba6mA. In the Mamba6mA model, we design position-specific linear layers to replace traditional convolutional layers and thereby capture position-specific information. In addition, we construct a multi-scale feature extraction module that integrates features captured by sliding windows of different scales and feeds them into the classifier for prediction. Experimental results show that Mamba6mA achieves the best MCC on 9 out of 11 species datasets, surpassing existing state-of-the-art models. Ablation studies confirm that the position-specific linear layers and the multi-scale fusion module contribute MCC performance gains of 2.36% and 2.31%, respectively. Feature visualization analysis further reveals that the model effectively captures sequence patterns upstream and downstream of 6mA sites, providing a new technical approach for studying epigenetic modification mechanisms.
Availability and implementation: The source code for Mamba6mA is available at: https://github.com/XploreAI-Lab/Mamba6mA.
Contact: Xiaoya Fan (xiaoyafan@dlut.edu.cn), Zheng Zhao (zhaozheng@dlmu.edu.cn).
Supplementary information: Supplementary information is available at Bioinformatics online.
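To make the two architectural ideas in the Results more concrete, the following sketch contrasts a position-specific linear layer with the weight sharing of a convolution, and then pools features over sliding windows of several sizes. It is not the Mamba6mA implementation: the function names, the one-hot encoding, the random weights, the window sizes, and the use of mean pooling are all assumptions chosen for illustration, and the Mamba state space component is not shown.

```python
import random


def one_hot(seq, alphabet="ACGT"):
    """Encode a DNA string as a list of one-hot vectors (illustrative encoding)."""
    return [[1.0 if base == a else 0.0 for a in alphabet] for base in seq]


def position_specific_linear(x, weights, biases):
    """Apply a separate linear map at every sequence position.

    x:       list of L input vectors, each of length d_in.
    weights: list of L matrices, each d_out x d_in (one matrix per position).
    biases:  list of L vectors, each of length d_out.
    A convolution would instead reuse one shared kernel at all positions.
    """
    out = []
    for x_p, w_p, b_p in zip(x, weights, biases):
        out.append([sum(w * v for w, v in zip(row, x_p)) + b
                    for row, b in zip(w_p, b_p)])
    return out


def multi_scale_windows(features, sizes=(2, 3)):
    """Mean-pool features over sliding windows of each size and concatenate."""
    pooled = []
    for size in sizes:
        for start in range(len(features) - size + 1):
            window = features[start:start + size]
            dim = len(window[0])
            pooled.extend(sum(v[j] for v in window) / size for j in range(dim))
    return pooled


random.seed(0)
seq = "ACGTA"  # toy sequence; positions 0 and 4 both hold an 'A'
d_out = 2
weights = [[[random.uniform(-1, 1) for _ in range(4)] for _ in range(d_out)]
           for _ in seq]
biases = [[0.0] * d_out for _ in seq]

hidden = position_specific_linear(one_hot(seq), weights, biases)
fused = multi_scale_windows(hidden)  # would be passed to a classifier
```

In this toy example the same base `A` at positions 0 and 4 is mapped to different hidden vectors, because each position has its own weights; a shared convolution kernel of width 1 would map both to the same vector.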
Title: Mamba6mA: A Mamba-based DNA N6-methyladenine Site Prediction Model.
Authors: Qi Zhao, Zhen Zhang, Tingwei Chen, Qian Mao, Haoxuan Shi, Jingjing Chen, Zheng Zhao, Xiaoya Fan
Pub Date: 2026-02-05 DOI: 10.1093/bioinformatics/btag060
Journal: Bioinformatics (Oxford, England).
Pub Date: 2026-02-05 DOI: 10.1093/bioinformatics/btag059
Weiqiang Lin, Xinyi Xiao, Chuan Qiu, Hui Shen, Hongwen Deng
Motivation: Understanding spatial organization, intercellular interactions, and regulatory networks within the spatial context of tissues is crucial for uncovering complex biological processes and disease mechanisms. Spatial transcriptomics technologies have revolutionized this field by enabling the spatially resolved profiling of gene expression. 10X Genomics Visium has emerged as the predominant spatial technology, but its low resolution and the complexity of integrating multimodal datasets present significant analytical challenges, particularly for researchers with limited computational and statistical expertise. Current spatial transcriptomics analysis platforms generally fall short of effectively integrating multimodal data and making full use of spatial information (for example, uncovering complex cellular spatial dependencies, multimodal gradient patterns, and spatial co-expression of ligand-receptor pairs and regulatory networks related to disease or biological states). This limits their ability to provide comprehensive end-to-end analytical workflows for 10X Genomics Visium data.
Results: To address these limitations, we developed transFusion, a novel web-based platform specializing in comprehensive integrative analysis of scRNA-seq and 10X Visium spatial transcriptomics data. transFusion offers 12 key functions, from basic visualization to advanced analyses, including intercellular dependency analysis, ligand-receptor co-expression identification and visualization, and analysis of spatial multimodal gradient variation patterns. Two case studies demonstrate transFusion's capabilities in exploring tissue architecture, intercellular communication, dependency networks, and multimodal gradient variation patterns while requiring minimal computational and statistical expertise. transFusion provides a flexible and powerful framework for multimodal data integration analysis.
Availability: transFusion is freely available at https://github.com/WQLin8/transFusion.
Supplementary information: Supplementary data are available at Bioinformatics online.
Title: transFusion: a Novel Comprehensive Platform for integration Analysis of Single-Cell and Spatial Transcriptomics.
Journal: Bioinformatics (Oxford, England).
Pub Date: 2026-02-04 DOI: 10.1093/bioinformatics/btag057
Chengye Li, Hongwei Ma, Mingyang Ren
Motivation: Heterogeneity is a hallmark of both macroscopic complex diseases and microscopic single-cell distributions. Gaussian graphical model (GGM)-based heterogeneity analysis plays an important role in capturing the essential characteristics of biological regulatory networks, but becomes unstable when samples from rare subgroups are scarce. Transfer learning offers promise by leveraging auxiliary data, yet existing approaches rely on an unrealistic assumption of overall similarity between domains, requiring the same number of subgroups and similar parameters. Numerous biological problems instead call for local similarity, where only some subgroups share statistical structures.
Results: In this article, we propose LtransHeteroGGM, a novel local transfer learning framework for GGM-based heterogeneity analysis. It can achieve powerful subgroup-level local knowledge transfer between target and informative auxiliary domains, despite unknown subgroup structures and numbers, while mitigating the negative interference of non-informative domains. The effectiveness and robustness of the proposed approach are demonstrated through comprehensive numerical simulations and real-world T cell heterogeneity analysis.
Availability and implementation: The R implementation of LtransHeteroGGM is available at https://github.com/Ren-Mingyang/LtransHeteroGGM.
Title: LtransHeteroGGM: Local transfer learning for Gaussian graphical model-based heterogeneity analysis.
Journal: Bioinformatics (Oxford, England).
Pub Date: 2026-02-03 DOI: 10.1093/bioinformatics/btag058
Joan Segura, Ruben Sanchez-Garcia, Sebastian Bittrich, Yana Rose, Stephen K Burley, Jose M Duarte
Motivation: The rapid expansion of three-dimensional (3D) biomolecular structure information, driven by breakthroughs in artificial intelligence/deep learning (AI/DL)-based structure predictions, has created an urgent need for scalable and efficient structure similarity search methods. Traditional alignment-based approaches, such as structural superposition tools, are computationally expensive and challenging to scale with the vast number of available macromolecular structures.
Results: Herein, we present a scalable structure similarity search strategy designed to navigate extensive repositories of experimentally determined structures and computed structure models predicted using AI/DL methods. Our approach leverages protein language models and a deep neural network architecture to transform 3D structures into fixed-length vectors, enabling efficient large-scale comparisons. Although trained to predict TM-scores between single-domain structures, our model generalizes beyond the domain level, accurately identifying 3D similarity for full-length polypeptide chains and multimeric assemblies. By integrating vector databases, our method facilitates efficient large-scale structure retrieval, addressing the growing challenges posed by the expanding volume of 3D biostructure information.
Availability: Source code is available at https://github.com/bioinsilico/rcsb-embedding-search. Source code DOI: https://doi.org/10.6084/m9.figshare.30546698.v1. Benchmark datasets DOI: https://doi.org/10.6084/m9.figshare.30546650.v1. Web server prototype available at: http://embedding-search.rcsb.org/.
Supplementary information: Supplementary data are available at Bioinformatics online.
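The retrieval step described in the Results, comparing fixed-length vectors instead of aligning structures, can be sketched as a nearest-neighbor search. This is only an illustration under stated assumptions: the three-dimensional vectors and the structure names are made up, cosine similarity is assumed as the comparison function (the abstract does not say which one is used), and the linear scan stands in for the vector database used in the published method.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, index, k=2):
    """Return the k most similar entries as (name, score), best first."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in index.items()]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [(name, score) for score, name in scored[:k]]


# Hypothetical embeddings; real ones would come from the trained model.
index = {
    "structure_A": [0.9, 0.1, 0.0],
    "structure_B": [0.0, 1.0, 0.0],
    "structure_C": [0.8, 0.2, 0.1],
}
hits = top_k([1.0, 0.0, 0.0], index, k=2)
```

Each comparison costs time proportional to the vector length rather than to the size of the structures, which is what makes this kind of search cheaper than pairwise structural superposition. A vector database would typically replace the linear scan with an approximate nearest-neighbor index.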
Title: Multi-scale structural similarity embedding search across entire proteomes.
Journal: Bioinformatics (Oxford, England).
Pub Date: 2026-02-03 DOI: 10.1093/bioinformatics/btag018
Artem Ivanov, Vladimir Popov, Maxim Morozov, Evgenii Olekhnovich, Vladimir Ulyantsev
Motivation: Microbial communities consist of thousands of microorganisms and viruses and are tightly connected with their environment; for example, the gut microbiota modulates host metabolism. However, the direct relationship between the presence of certain microorganisms and the host state often remains unknown. Toolkits using reference-based approaches are limited to microbes present in databases. Reference-free methods often require enormous resources for metagenomic assembly or result in many poorly interpretable features based on k-mers.
Results: Here we present MetaFX, an open-source library for feature extraction from whole-genome metagenomic sequencing data and classification of groups of samples. Using a large volume of metagenomic samples deposited in databases, MetaFX compares samples grouped by metadata criteria (e.g. disease or treatment) and constructs genomic features distinct for certain types of communities. Features are constructed based on statistical k-mer analysis and de Bruijn graph partitioning. These features are used in machine learning models for classification of novel samples. Extracted features can be visualized on de Bruijn graphs and annotated to provide biological insights. We demonstrate the utility of MetaFX by building classification models for 590 human gut samples with inflammatory bowel disease. Our results improve on previously reported disease prediction accuracy by up to 17% and improve classification results compared to taxonomic analysis by 9±10% on average.
Availability and implementation: MetaFX is a feature extraction toolkit applicable to metagenomic dataset analysis and sample classification. The source code, test data, and relevant information for MetaFX are freely accessible at https://github.com/ctlab/metafx under the MIT License. Alternatively, MetaFX can be obtained via http://doi.org/10.5281/zenodo.16949369.
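As a rough illustration of reference-free, k-mer-based group features, the sketch below finds k-mers that occur in every sample of one group and in no sample of any other group. It is not MetaFX's algorithm or command-line interface: the function names, the toy reads, the value of `k`, and the simple presence/absence filter are assumptions, and the statistical k-mer analysis and de Bruijn graph partitioning mentioned in the Results are not reproduced here.

```python
from collections import Counter


def kmers(sequence, k):
    """Return the set of all substrings of length k in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def group_specific_kmers(groups, k=4, min_fraction=1.0):
    """Find k-mers seen in >= min_fraction of a group's samples and in no other group.

    groups: dict mapping group label -> list of samples,
            where each sample is a list of read strings.
    """
    presence = {}
    for label, samples in groups.items():
        counts = Counter()  # number of samples in which each k-mer occurs
        for reads in samples:
            sample_kmers = set()
            for read in reads:
                sample_kmers |= kmers(read, k)
            counts.update(sample_kmers)
        presence[label] = (counts, len(samples))

    features = {}
    for label, (counts, n) in presence.items():
        others = set()
        for other, (other_counts, _) in presence.items():
            if other != label:
                others |= set(other_counts)
        features[label] = sorted(
            km for km, seen in counts.items()
            if seen / n >= min_fraction and km not in others
        )
    return features


# Toy data: two groups with two single-read samples each.
groups = {
    "disease": [["ACGTAC"], ["TACGTA"]],
    "control": [["GGGTAC"], ["GTACGG"]],
}
features = group_specific_kmers(groups, k=4)
```

A per-sample presence vector over such group-specific k-mers could then serve as input to a standard classifier, which mirrors the general workflow the abstract describes.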
Title: MetaFX: feature extraction from whole-genome metagenomic sequencing data.
Journal: Bioinformatics (Oxford, England).
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12891910/pdf/
Pub Date: 2026-02-03 DOI: 10.1093/bioinformatics/btag035
David Kouřil, Trevor Manz, Tereza Clarence, Nils Gehlenborg
Summary: Uchimata is a toolkit for visualization of 3D structures of genomes. It consists of two packages: a JavaScript library that facilitates the rendering of 3D models of genomes, and a Python widget for visualization in Jupyter Notebooks. Main features include an expressive way to specify visual encodings and filtering of 3D genome structures based on genomic semantics and spatial aspects. Uchimata is designed to integrate readily with biological tooling available in Python.
Availability and implementation: Uchimata is released under the MIT License. The JavaScript library is available on npm, while the widget is available as a Python package hosted on PyPI. The source code for both is publicly available on GitHub (https://github.com/hms-dbmi/uchimata and https://github.com/hms-dbmi/uchimata-py) and Zenodo (https://doi.org/10.5281/zenodo.17831959 and https://doi.org/10.5281/zenodo.17832045). The documentation with examples is hosted at https://hms-dbmi.github.io/uchimata/.
Title: Uchimata: a toolkit for visualization of 3D genome structures on the web and in computational notebooks.
Journal: Bioinformatics (Oxford, England).
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12904833/pdf/
ProteoGyver: a fast, user-friendly tool for routine QC and analysis of MS-based proteomics data.
Pub Date: 2026-02-03; DOI: 10.1093/bioinformatics/btag050
Kari Salokas, Salla Keskitalo, Markku Varjosalo
Availability and implementation: The ProteoGyver (PG) image and source code are available on GitHub and Docker Hub under the LGPL-2.1 license.
Bioinformatics (Oxford, England). PMC full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12910381/pdf/
Pub Date: 2026-02-03; DOI: 10.1093/bioinformatics/btag024
Soo Bin Kwon, Jason Ernst
Motivation: Identifying pairwise associations between genomic loci is an important challenge for which large and diverse collections of epigenomic and transcription factor (TF) binding data can potentially be informative.
Results: We developed Learning Evidence of Pairwise Association from Epigenomic and TF binding data (LEPAE). LEPAE uses neural networks to quantify evidence of association for pairs of genomic windows from large-scale epigenomic and TF binding data along with distance information. We applied LEPAE to thousands of human datasets. Using additional data, we show that LEPAE captures biologically meaningful pairwise relationships between genomic loci, and we expect the LEPAE scores to serve as a resource.
Availability and implementation: The LEPAE scores and the software are available at https://github.com/ernstlab/LEPAE.
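As a rough illustration of the idea in the Results paragraph (scoring a pair of genomic windows from their feature vectors plus the distance between them), the sketch below runs a tiny feed-forward network on made-up inputs. It is not the LEPAE implementation: the function names `pair_score` and `init_params`, the layer sizes, the log-scaled distance input, and the random untrained weights are all assumptions chosen for illustration only.

```python
import math
import random


def init_params(n_features, n_hidden, seed=0):
    """Create random (untrained) weights; hypothetical sizes, not LEPAE's."""
    rng = random.Random(seed)
    n_in = 2 * n_features + 1  # features of both windows + one distance input
    return {
        "W1": [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.gauss(0.0, 0.1) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def pair_score(feat_a, feat_b, distance_bp, params):
    """Score one pair of windows with a one-hidden-layer network."""
    # Input vector: both windows' features plus log-scaled genomic distance.
    x = list(feat_a) + list(feat_b) + [math.log10(distance_bp + 1.0)]
    # Hidden layer with ReLU activation.
    hidden = [
        max(0.0, sum(w * v for w, v in zip(row, x)) + b)
        for row, b in zip(params["W1"], params["b1"])
    ]
    logit = sum(w * h for w, h in zip(params["w2"], hidden)) + params["b2"]
    # Sigmoid squashes the output into (0, 1).
    return 1.0 / (1.0 + math.exp(-logit))


# Toy binary features standing in for epigenomic/TF binding signals (made up).
rng = random.Random(1)
n_features = 8
params = init_params(n_features, n_hidden=16)
window_a = [float(rng.randint(0, 1)) for _ in range(n_features)]
window_b = [float(rng.randint(0, 1)) for _ in range(n_features)]
score = pair_score(window_a, window_b, distance_bp=10_000, params=params)
print(f"untrained pair score: {score:.3f}")
```

Because the weights are random, the printed value carries no biological meaning; the point is only the shape of the computation, in which two per-window feature vectors and a distance term are combined into a single score. The actual model, features, and training procedure are described in the paper and the linked repository.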
Learning a pairwise epigenomic and transcription factor binding association score across the human genome. Soo Bin Kwon, Jason Ernst. Bioinformatics (Oxford, England). Pub Date: 2026-02-03; DOI: 10.1093/bioinformatics/btag024. PMC full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12910503/pdf/