Pub Date: 2026-01-22; eCollection Date: 2026-01-01; DOI: 10.1093/bioadv/vbag025
Shenghui Huang, Berina Šabanović, Yuzhong Peng, Quan Zheng, Luca Alessandri, Christopher Heeschen
Motivation: Large language models (LLMs) are rapidly becoming indispensable across the life sciences, from literature mining through clinical decision support to experimental design. Yet in single-cell RNA-sequencing (scRNA-seq) analysis, most LLM-enabled tools remain opaque: they output a single label per cluster without disclosing the chain of thought that led to that decision. This opacity undermines reproducibility, complicates peer review, and ultimately slows the adoption of otherwise powerful methods.
Results: We developed GPTBioInsightor, an LLM-powered assistant that not only annotates cell types, cell states, and pathway activities but also narrates, step by step, how it arrived at each conclusion. Across benchmark datasets, including peripheral blood mononuclear cells (PBMC3K) and pancreatic ductal adenocarcinoma, GPTBioInsightor achieved at least parity with expert manual curation while delivering transparent reasoning, confidence scores, and literature-based evidence. By closing the "interpretability gap," GPTBioInsightor equips wet-lab biologists, computational scientists, and reviewers with an audit-ready trail, thereby accelerating discovery and fostering trust in AI-assisted bioinformatics.
Availability and implementation: GPTBioInsightor is freely available on GitHub under a BSD-3-Clause license (https://github.com/huang-sh/GPTBioInsightor).
Title: GPTBioInsightor: leveraging large language models for transparent scRNA-seq cell type annotations. Journal: Bioinformatics Advances, volume 6, issue 1, article vbag025. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12975716/pdf/
Pub Date: 2026-01-22; eCollection Date: 2026-01-01; DOI: 10.1093/bioadv/vbag002
Barış Salman, Nerses Bebek, Sibel Uğur İşeri
Motivation: Verification of sample sex is an essential quality control step in next-generation sequencing studies, typically assessed from genomic data. Clustering individuals by X chromosome heterozygosity (Xhet) and incorporating relatedness estimates offers a practical first-pass screen for potential sex label errors, sample mix-ups, and pedigree inconsistencies. To better interpret Xhet-based patterns, we further investigated their biological and technical origins using the 1000 Genomes Project dataset.
Results: We developed XhetRel, a user-friendly workflow and notebook application that computes Xhet and performs relatedness estimation directly from VCF files. As a fully genotype-based approach, XhetRel enables both sex-based clustering and relatedness assessment as an initial quality control (QC) step in NGS. XhetRel serves groups without bioinformatics infrastructure, users requiring a browser-based QC tool, and workflow developers seeking a modular Nextflow component. Our investigation into the sources of Xhet variation highlighted important limitations in sequencing and variant-calling approaches. In particular, specific pseudogenes and gene clusters, such as SLC25A5 and the GAGE cluster, emerged as recurrent contributors to misleading variant allele fractions.
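To make the Xhet idea concrete, the following is a minimal sketch, not XhetRel's actual implementation. It assumes Xhet is defined as the fraction of heterozygous calls among the called diploid genotypes on the X chromosome; the function names and the toy genotypes are hypothetical.

```python
from typing import Dict, List, Optional


def classify_genotype(gt: str) -> Optional[bool]:
    """Return True for a heterozygous call, False for a homozygous call,
    and None for a missing or non-diploid call (VCF-style GT strings)."""
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:
        return None
    return alleles[0] != alleles[1]


def x_heterozygosity(genotypes: Dict[str, List[str]]) -> Dict[str, float]:
    """Per-sample fraction of heterozygous calls among called X genotypes.
    This definition is an assumption made for illustration."""
    result = {}
    for sample, calls in genotypes.items():
        flags = [classify_genotype(gt) for gt in calls]
        called = [flag for flag in flags if flag is not None]
        result[sample] = sum(called) / len(called) if called else float("nan")
    return result


# Toy genotypes at five X-linked sites (hypothetical data).
toy = {
    "sample_A": ["0/1", "0/0", "1/0", "./.", "0|1"],
    "sample_B": ["0/0", "1/1", "0/0", "1/1", "0/0"],
}
xhet = x_heterozygosity(toy)
print(xhet)
```

Under this definition, samples with a single X chromosome are expected to sit near zero, while samples with two X chromosomes show clearly higher values, which is what makes clustering on Xhet a usable first-pass screen.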
Availability and implementation: The source code and data are available at Figshare (doi: 10.6084/m9.figshare.28280414). XhetRel can be executed locally via Nextflow or accessed directly through the online Colab notebook at https://colab.research.google.com/drive/1ep69JvXLwK5ndHUQ8qIGTWvauzsTW9fi.
Title: XhetRel: a pipeline for X heterozygosity and relatedness analysis of sequencing data. Journal: Bioinformatics Advances, volume 6, issue 1, article vbag002. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12883445/pdf/
Pub Date: 2026-01-22; eCollection Date: 2026-01-01; DOI: 10.1093/bioadv/vbag014
Ondřej Huvar, Nikola Beneš, Luboš Brim, Samuel Pastva, David Šafránek
Summary: Sketchbook is a tool for the design and analysis of Boolean network sketches, a framework for partial specification of Boolean network models that combines static and dynamic logical constraints. The tool pairs an intuitive graphical interface with a high-performance inference engine that efficiently computes the complete set of admissible candidate models.
Availability and implementation: All software and data are freely available as a reproducible artefact at https://doi.org/10.5281/zenodo.15828328. The up-to-date version of the tool is accessible through https://github.com/sybila/biodivine-sketchbook.
Title: Sketchbook: logical model inference from Boolean network sketches. Journal: Bioinformatics Advances, volume 6, issue 1, article vbag014. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12883443/pdf/
Pub Date: 2026-01-22; eCollection Date: 2026-01-01; DOI: 10.1093/bioadv/vbag021
Alessandro Barberis, Francesca M Buffa
Summary: The rapid expansion of multi-omics data has enabled the development of molecular signatures: coordinated patterns of molecular features that serve as powerful biomarkers for diagnosis, prognosis, and therapeutic decision-making. Despite their potential, many published signatures suffer from limited reproducibility and narrow applicability, partly due to challenges in summarizing complex, multi-feature profiles into a single, statistically sound and biologically meaningful score. Here, we introduce sigscores, an R package that streamlines the computation of summary scores for molecular signatures. Building on the quality control principles of our earlier tool, sigQC, sigscores supports an extensive array of scoring metrics, including measures of central tendency, dispersion, and aggregation. It incorporates a resampling framework to generate empirical null distributions for rigorous significance assessment and provides integrated visualization tools for diagnostic evaluation. Optimized for parallel execution on multi-core systems, sigscores is well-suited for both exploratory research and high-throughput large-scale applications.
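The resampling idea can be illustrated with a short sketch. This is not the sigscores API (the package itself is written in R); it assumes a mean-based summary score and a null distribution built from random gene sets of the same size as the signature, and all names and data below are hypothetical.

```python
import random
from statistics import mean
from typing import Dict, Sequence


def signature_score(expr: Dict[str, float], genes: Sequence[str]) -> float:
    """Summary score for one sample: mean expression of the signature genes."""
    return mean(expr[g] for g in genes)


def empirical_p_value(expr: Dict[str, float], signature: Sequence[str],
                      n_resamples: int = 1000, seed: int = 0) -> float:
    """One-sided empirical p-value against random gene sets of equal size.
    The choice of null (random gene sets) is an assumption for illustration."""
    rng = random.Random(seed)
    observed = signature_score(expr, signature)
    pool = sorted(expr)
    hits = 0
    for _ in range(n_resamples):
        random_set = rng.sample(pool, len(signature))
        if signature_score(expr, random_set) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_resamples + 1)


# Toy expression profile of 50 genes; the signature is the three highest genes.
expr = {f"g{i}": float(i) for i in range(50)}
signature = ["g47", "g48", "g49"]
p_value = empirical_p_value(expr, signature)
print(p_value)
```

The add-one correction is a common convention for resampling-based p-values; other summary statistics (median, variance, and so on) could be swapped in for the mean without changing the resampling loop.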
Availability and implementation: The source code is freely available for download on GitHub at https://github.com/alebarberis/sigscores. The package is implemented in R and supported on macOS and MS Windows.
Title: Sigscores: summary scores for molecular signatures in R. Journal: Bioinformatics Advances, volume 6, issue 1, article vbag021. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12967214/pdf/
Pub Date: 2026-01-22; eCollection Date: 2026-01-01; DOI: 10.1093/bioadv/vbag018
Alexander R Bennett, George Birchenough, Daniel Bojar
Motivation: Protein, mRNA, and metabolite abundances can exhibit rhythmic dynamics, such as during the day-night cycle. Leading bioinformatics platforms for identifying biological rhythms often utilize single-component models of the harmonic oscillator equation, or multi-component models based upon the Cosinor framework. These approaches offer distinct advantages: modelling either temporally resolved regulatory behaviour via the extended harmonic oscillator equation, or complex rhythmic patterns in the case of Cosinor.
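As background for the models mentioned above, here is a minimal sketch of a single-component cosinor fit, y(t) = M + A cos(2πt/T + φ), for a known period T. It is not PyCycleBio's code; it uses the standard linearisation into cosine and sine terms and solves the 3x3 normal equations with Cramer's rule. The function names and toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def det3(m: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_cosinor(times: Sequence[float], values: Sequence[float],
                period: float) -> Tuple[float, float, float]:
    """Least-squares fit of y = M + A*cos(omega*t + phi) for a known period.
    Returns (mesor M, amplitude A, acrophase phi in radians)."""
    omega = 2 * math.pi / period
    # Linearised model: y = M + beta*cos(omega*t) + gamma*sin(omega*t)
    rows = [(1.0, math.cos(omega * t), math.sin(omega * t)) for t in times]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    d = det3(xtx)
    coeffs = []
    for k in range(3):
        m = [row[:] for row in xtx]
        for i in range(3):
            m[i][k] = xty[i]  # Cramer's rule: replace column k with X^T y
        coeffs.append(det3(m) / d)
    mesor, beta, gamma = coeffs
    amplitude = math.hypot(beta, gamma)
    acrophase = math.atan2(-gamma, beta)  # beta = A*cos(phi), gamma = -A*sin(phi)
    return mesor, amplitude, acrophase


# Noise-free toy series sampled every 2 h over two 24 h cycles.
period = 24.0
times = [float(t) for t in range(0, 48, 2)]
values = [10.0 + 3.0 * math.cos(2 * math.pi * t / period - math.pi / 3) for t in times]
mesor, amplitude, acrophase = fit_cosinor(times, values, period)
print(mesor, amplitude, acrophase)
```

A multi-component cosinor model adds further cosine and sine columns at other periods to the same linear system; fitting the period itself, or an amplitude that changes over time, requires nonlinear optimisation and is not covered by this sketch.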
Results: Here, we have developed a new platform to combine the advantages of these two approaches. PyCycleBio utilizes bounded multi-component models and modulus operators alongside the harmonic oscillator equation to model a diverse and interpretable array of rhythmic behaviours, including the regulation of temporal dynamics via amplitude coefficients. We demonstrate increased sensitivity and functionality of PyCycleBio compared to other analytical frameworks, and uncover new relationships between data modalities or sampling conditions and the qualities of rhythmic behaviours in biological datasets, including transcriptomics, proteomics, and metabolomics. We envision that this new approach for disentangling complicated temporal regulation of biomolecules will advance chronobiology and our understanding of physiology.
Availability and implementation: PyCycleBio is available at: https://github.com/Glycocalex/PyCycleBio, and the Python package is available to install at: https://pypi.org/project/pycyclebio/. PyCycleBio can also be used at https://colab.research.google.com/github/Glycocalex/PyCycleBio/blob/main/PyCycleBio.ipynb with no installations necessary.
Title: PyCycleBio: modelling non-sinusoidal-oscillator systems in temporal biology. Journal: Bioinformatics Advances, volume 6, issue 1, article vbag018. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12895064/pdf/
Pub Date: 2026-01-21; eCollection Date: 2026-01-01; DOI: 10.1093/bioadv/vbag013
Sora Yonezawa, Hidemasa Bono
Motivation: Life science databases include large collections of public transcriptome and large-scale structural data. The reuse and integration of these datasets may facilitate the identification of understudied genes and enable functional annotation across distantly related species, including plants and humans.
Results: In this study, we used heat stress-responsive genes in rice as a model to functionally annotate previously understudied genes by integrating publicly available transcriptome data with structural information from the AlphaFold Protein Structure Database. Initially, we conducted a meta-analysis of public heat stress-related transcriptome datasets, identified gene groups, and verified stress-related terms through enrichment analysis. Subsequently, we performed structural alignment and sequence alignment between rice and human proteins, focusing on candidates exhibiting low sequence similarity but high structural similarity. We further incorporated supplemental data from public databases, including shared domain information between rice and human. This approach yielded a unique set of these candidates, notably those associated with metal homeostasis, such as iron and copper metabolism. Overall, our integrative method provided insights into these genes by leveraging diverse, publicly available datasets.
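The candidate selection step (low sequence similarity, high structural similarity) can be sketched as a simple filter. This is not the plant2human workflow itself: the use of sequence identity and TM-score as the two measures, the thresholds, and the protein identifiers are all assumptions chosen for illustration.

```python
from typing import List, NamedTuple


class AlignedPair(NamedTuple):
    rice_id: str
    human_id: str
    seq_identity: float  # fraction of identical residues, 0..1 (assumed measure)
    tm_score: float      # structural similarity, 0..1 (assumed measure)


def select_candidates(pairs: List[AlignedPair],
                      max_seq_identity: float = 0.25,
                      min_tm_score: float = 0.5) -> List[AlignedPair]:
    """Keep pairs that look dissimilar in sequence but similar in structure.
    Both thresholds are hypothetical defaults, not values from the study."""
    return [p for p in pairs
            if p.seq_identity <= max_seq_identity and p.tm_score >= min_tm_score]


# Placeholder identifiers and scores (hypothetical data).
pairs = [
    AlignedPair("rice_protein_1", "human_protein_1", 0.18, 0.72),
    AlignedPair("rice_protein_2", "human_protein_2", 0.61, 0.88),
    AlignedPair("rice_protein_3", "human_protein_3", 0.15, 0.31),
]
candidates = select_candidates(pairs)
print([p.rice_id for p in candidates])
```

Only the first pair passes both conditions: the second is already similar in sequence, and the third lacks structural support.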
Availability and implementation: The "plant2human workflow" for this analysis is available at https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.1206.10.
Title: Functional annotation of novel heat stress-responsive genes in rice utilizing public transcriptomes and structurome. Journal: Bioinformatics Advances, volume 6, issue 1, article vbag013. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12889164/pdf/
Pub Date: 2026-01-21; eCollection Date: 2026-01-01; DOI: 10.1093/bioadv/vbag019
Ted Zhang, Haoran Chen, Young Je Lee, Matthew Ruffalo, Robert F Murphy
Motivation: There has been tremendous recent growth both in technologies for measurement of many different markers in the same tissue and in resulting datasets (especially from projects such as HuBMAP and the Human Cell Atlas). Analysis of images in these datasets is often restricted to measuring the amount of each marker in each cell. While this is important, it ignores other information that is contained in tissue images. SPRM was therefore created for use in the HuBMAP image analysis pipelines and can be used for any spatial proteomics dataset.
Results: SPRM calculates a number of measures of image quality, including metrics for the quality of cell segmentation, and extracts many different types of cell features that give a much richer characterization than marker intensities per cell alone. Different feature types are used to cluster cells into potential cell types so that the tissue can be viewed through these different lenses, and the clusters are compared to expert annotations, if provided, in order to define cell subtypes. The package also constructs a cell adjacency matrix to characterize cell spatial distributions. Example analyses are provided in Supplementary Information.
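One way to picture a cell adjacency structure is the sketch below, which is not SPRM's implementation. It assumes a 2D segmentation mask in which 0 is background and each positive integer labels one cell, and it treats two cells as adjacent when any of their pixels are 4-connected neighbours. The function name and the toy mask are hypothetical.

```python
from typing import Dict, List, Set


def cell_adjacency(mask: List[List[int]]) -> Dict[int, Set[int]]:
    """Map each cell label to the set of labels it touches (4-connectivity).
    Label 0 is treated as background; this convention is an assumption."""
    adjacency: Dict[int, Set[int]] = {}
    n_rows, n_cols = len(mask), len(mask[0])
    for r in range(n_rows):
        for c in range(n_cols):
            a = mask[r][c]
            if a == 0:
                continue
            adjacency.setdefault(a, set())
            # Checking only the right and lower neighbours visits each pixel pair once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < n_rows and cc < n_cols:
                    b = mask[rr][cc]
                    if b != 0 and b != a:
                        adjacency[a].add(b)
                        adjacency.setdefault(b, set()).add(a)
    return adjacency


# Toy mask with three cells (hypothetical data).
mask = [
    [1, 1, 0, 2],
    [1, 3, 3, 2],
    [0, 3, 3, 2],
]
adjacency = cell_adjacency(mask)
print(adjacency)
```

In the toy mask, cell 3 touches both other cells, while cells 1 and 2 are separated by background. A distance-based definition (for example, cells whose boundaries lie within a few pixels of each other) would be an equally plausible alternative.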
Availability and implementation: SPRM is available as open-source Python code at https://github.com/hubmapconsortium/sprm and as a PyPI package.
Title: SPRM: spatial process and relationship modeling for multiplexed images. Journal: Bioinformatics Advances, volume 6, issue 1, article vbag019. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12895062/pdf/
Pub Date: 2026-01-21; eCollection Date: 2026-01-01; DOI: 10.1093/bioadv/vbag017
Daniella L Matute, Thomas H Clarke, Andrew R LaPointe, Indresh Singh, Derrick E Fouts
Summary: cAMRah is a curated workflow designed to predict antimicrobial resistance (AMR) genes in microbial genomes, either in the cloud or on any personal computer running Docker containers. Numerous AMR gene-finding packages exist, each utilizing different algorithms and prediction methods. cAMRah adopts a consensus-based approach to AMR prediction, recognizing that no single tool can identify all AMR genes. It integrates and runs six AMR-finder tools and databases (with plans for future expansion), scores the AMR predictions, maps all results to CDS coordinates and harmonizes the annotation, resulting in consistent gene symbols and ontologies.
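The consensus idea can be sketched as follows. This is not cAMRah's scoring scheme: it assumes each tool reports (tool, CDS identifier, gene symbol) hits, harmonizes symbols through a small alias table, and scores each call by the fraction of tools supporting it. The tool names, alias table, and hits are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical alias table mapping tool-specific symbols to one harmonized symbol.
ALIASES = {
    "blaTEM-1": "blaTEM",
    "TEM-1": "blaTEM",
    "tetA": "tet(A)",
}


def consensus_calls(hits: List[Tuple[str, str, str]],
                    n_tools: int) -> Dict[Tuple[str, str], float]:
    """Score each (CDS, harmonized symbol) call by the fraction of tools reporting it.
    The fraction-of-tools score is an assumption made for illustration."""
    supporters: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for tool, cds_id, symbol in hits:
        harmonized = ALIASES.get(symbol, symbol)
        supporters[(cds_id, harmonized)].add(tool)
    return {key: len(tools) / n_tools for key, tools in supporters.items()}


# Hits from three hypothetical tools, already mapped to CDS identifiers.
hits = [
    ("tool_A", "cds_001", "blaTEM-1"),
    ("tool_B", "cds_001", "TEM-1"),
    ("tool_C", "cds_001", "blaTEM"),
    ("tool_A", "cds_002", "tetA"),
]
scores = consensus_calls(hits, n_tools=3)
print(scores)
```

Mapping every hit to a CDS before counting is what lets differently named reports of the same gene reinforce each other rather than appear as separate predictions.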
Availability and implementation: Source code, demo data and detailed documentation are freely available at https://github.com/JCVenterInstitute/CAMRA.
Title: cAMRah: a scalable and portable workflow for harmonized antimicrobial resistance gene prediction from bacterial genomes. Journal: Bioinformatics Advances, volume 6, issue 1, article vbag017. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12910510/pdf/
Pub Date: 2026-01-20; eCollection Date: 2026-01-01; DOI: 10.1093/bioadv/vbag012
Prateek Arora, Simon Isfort, Nick Kirschke, Mojgan Masoodi, Nadia Mercader
Motivation: Spatial lipidomics enables the study of how lipids are distributed within tissues, providing insights into tissue structure and function. However, analyzing complex mass spectrometry (MS) imaging (MSI) data remains challenging due to limited tools for high-confidence annotation, especially for integrating MSI, MS, and MS/MS pipelines.
Results: We developed LipidLocator, an open-source, interactive Shiny web application as a unified spatial lipidomics pipeline. LipidLocator integrates MSI data analysis from normalization, spatial clustering, and differential abundance analysis to MS and MS/MS-based lipid annotation. We utilized LipidLocator to analyze DESI-MSI and AP-SMALDI data from adult zebrafish sections, human renal carcinoma, and mouse whole brain sections, to demonstrate its ability to segment distinct anatomical structures and tissue sub-regions and to generate high-confidence lipid profiles using integrated MS and MS/MS annotation. LipidLocator is an end-to-end open-source spatial lipidomics pipeline, facilitating lipid imaging studies in various organisms and covering different lipid detection technologies, providing a valuable and user-friendly resource for investigating lipid metabolism.
Availability and implementation: The LipidLocator application is freely available as a Docker image on Docker Hub at pratarora/lipidlocator. Installation instructions and code are available at https://github.com/MercaderLabAnatomy/LipidLocator.
Title: LipidLocator: an open source Shiny web application for spatial lipidomics. Journal: Bioinformatics Advances, volume 6, issue 1, article vbag012. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12883462/pdf/
Pub Date: 2026-01-19; eCollection Date: 2026-01-01; DOI: 10.1093/bioadv/vbag011
Tianxiao Zhao, Adam L Haber
Motivation: A defining characteristic of biological tissue is its cell type composition. Many pathologies and chronic diseases are associated with perturbations from the homeostatic composition, and these transformations can lead to aberrant or deleterious tissue function. Spatial transcriptomics enables the concurrent measurement of gene expression and cell type composition, providing an opportunity to identify transcripts that co-vary with and potentially influence nearby cell composition. However, no method yet exists to systematically identify such intercellular regulatory factors.
Results: Here, we develop Spatial Paired Expression Ratio (SPER), a computational approach to evaluate the spatial dependence between transcript abundance and cell type proportions in spatial transcriptomics. We demonstrate the ability of SPER to accurately detect paracrine drivers of cellular abundance using simulated data. Using publicly available spatial transcriptomics data from mouse brain and human lung, we show that genes identified by SPER show statistical enrichment for both extracellular secretion and participation in known receptor-ligand interactions, supporting their potential role as compositional regulators. Taken together, SPER represents a general approach to discover paracrine drivers of cellular compositional changes from spatial transcriptomics.
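To make the idea of a distance-dependent relationship between transcript abundance and cell type proportion concrete, here is a minimal Python sketch of a generic distance-binned co-occurrence ratio. It is not the SPER algorithm or the authors' R implementation. The function name, the binning scheme, and the normalization by the product of global means are all assumptions chosen for illustration.

```python
from math import hypot


def paired_ratio_by_distance(coords, expr, prop, bin_edges):
    """Observed / expected paired signal per distance bin (illustrative only).

    coords:    list of (x, y) spot positions
    expr:      expression of one gene at each spot
    prop:      proportion of one cell type at each spot
    bin_edges: increasing distance thresholds; bin b covers [edge_b, edge_b+1)

    For every ordered spot pair (i, j) whose distance falls in a bin, the
    product expr[i] * prop[j] is accumulated. The per-bin mean is divided by
    mean(expr) * mean(prop), the value expected if the two were unrelated.
    """
    n = len(coords)
    expected = (sum(expr) / n) * (sum(prop) / n)
    if expected == 0:
        raise ValueError("gene or cell type is absent from all spots")

    n_bins = len(bin_edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i in range(n):
        for j in range(n):  # i == j is included and represents distance 0
            d = hypot(coords[i][0] - coords[j][0], coords[i][1] - coords[j][1])
            for b in range(n_bins):
                if bin_edges[b] <= d < bin_edges[b + 1]:
                    sums[b] += expr[i] * prop[j]
                    counts[b] += 1
                    break

    # Bins that contain no spot pairs are reported as None.
    return [
        (sums[b] / counts[b]) / expected if counts[b] else None
        for b in range(n_bins)
    ]


# Toy data: four spots on a line; the gene is expressed only at spot 0 and
# the cell type is present only at the neighbouring spot 1.
coords = [(0, 0), (1, 0), (2, 0), (3, 0)]
expr = [4, 0, 0, 0]
prop = [0, 1, 0, 0]
print(paired_ratio_by_distance(coords, expr, prop, [0, 0.5, 1.5, 2.5]))
```

In this toy example the ratio exceeds 1 only in the bin covering distance 1, the pattern one would look for in a transcript that acts on neighbouring rather than co-located cells.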
Availability and implementation: The methods are implemented in R and available at: https://github.com/TianxiaoNYU/SPER.
{"title":"Discovering paracrine regulators of cell type composition from spatial transcriptomics using SPER.","authors":"Tianxiao Zhao, Adam L Haber","doi":"10.1093/bioadv/vbag011","journal":{"name":"Bioinformatics advances","volume":"6 1","pages":"vbag011"},"PeriodicalIF":2.8,"publicationDate":"2026-01-19","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12895071/pdf/"}