Pub Date: 2026-01-27; eCollection Date: 2026-01-01; DOI: 10.1093/bioadv/vbaf331
Guadalupe Hernández-Martínez, Andrés Hernández-Oliveras, Ángel Zarain-Herzberg, Juan Santiago-García
Motivation: Dysregulation of Ca²⁺-signaling genes has been shown in some types of cancer; however, it is virtually unknown in hepatitis B-derived hepatocellular carcinoma (HBV-HCC). Here, we evaluate the transcriptional and epigenetic regulation of Ca²⁺-signaling genes in HBV-HCC and whether their expression is associated with cancer hallmarks and has prognostic potential.
Results: We identified 432 differentially expressed Ca²⁺-signaling genes in HBV-HCC, including 134 that are specific to this condition and were not found in non-HBV HCC. Fifty-three of these genes were associated with cancer hallmarks, of which 17 exhibited potential prognostic value in multivariate Cox analyses. We also provide new evidence for epigenetic regulation of some of these genes through post-translational histone modifications and DNA methylation at their promoters. Finally, using Least Absolute Shrinkage and Selection Operator (LASSO) regression, we identified a four-gene prognostic signature (FBLN1, STC2, C1R, and F2RL2) that robustly stratified patient outcomes. This study presents the first integrative transcriptomic and epigenetic analysis of Ca²⁺-signaling genes in HBV-HCC, introducing a novel four-gene signature with prognostic potential. These findings highlight the dysregulation of a subset of Ca²⁺-signaling genes as a distinctive feature of HBV-HCC.
Availability and implementation: All data generated or analyzed during this study are included in this article.
Title: Transcriptional and epigenetic regulation of Ca²⁺-signaling genes in hepatitis B-derived hepatocellular carcinoma and their association with the cancer hallmarks. Journal: Bioinformatics Advances, volume 6, issue 1, vbaf331. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12866915/pdf/
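A gene signature of this kind is commonly applied by computing a linear risk score per patient and splitting the cohort at the median. The sketch below illustrates only that general idea. The coefficients and expression values are invented placeholders, not values from the study, and the function names are hypothetical.

```python
from statistics import median

# Placeholder weights for illustration only: NOT the coefficients fitted in the study.
PLACEHOLDER_COEFS = {"FBLN1": 0.1, "STC2": 0.2, "C1R": -0.1, "F2RL2": 0.3}


def risk_score(expression, coefs):
    """Linear predictor: sum of coefficient * expression over the signature genes."""
    return sum(coefs[gene] * expression[gene] for gene in coefs)


def stratify(samples, coefs):
    """Label each sample 'high' or 'low' risk relative to the cohort median score."""
    scores = {sid: risk_score(expr, coefs) for sid, expr in samples.items()}
    cutoff = median(scores.values())
    return {sid: ("high" if score > cutoff else "low") for sid, score in scores.items()}


# Toy expression values, invented for illustration only.
toy_samples = {
    "P1": {"FBLN1": 1.0, "STC2": 2.0, "C1R": 1.0, "F2RL2": 0.0},
    "P2": {"FBLN1": 0.0, "STC2": 0.0, "C1R": 2.0, "F2RL2": 0.0},
    "P3": {"FBLN1": 2.0, "STC2": 3.0, "C1R": 0.0, "F2RL2": 2.0},
    "P4": {"FBLN1": 1.0, "STC2": 1.0, "C1R": 0.0, "F2RL2": 1.0},
}

groups = stratify(toy_samples, PLACEHOLDER_COEFS)
print(groups)
```

In practice the coefficients would come from a penalized Cox fit on survival data, and the resulting groups would be compared with a survival analysis; neither step is shown here.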
Pub Date: 2026-01-25; eCollection Date: 2026-01-01; DOI: 10.1093/bioadv/vbaf313
Daniela Coelho Batista Guedes Pereira, João Vitor Ferreira Cavalcante, Laise Florentino Cavalcanti, Raul Maia Falcão, Jorge Estefano Santana de Souza, Rodrigo Juliani Siqueira Dalmolin, Thaís Gaudencio do Rêgo, Serghei Mangul, Gustavo Antônio de Souza, Patrick Terrematte, João Paulo Matos Santos Lima
Motivation: Predicting the impact of missense mutations on protein structure and function is a fundamental challenge for cancer research and clinical applications. Despite all the computational advances and, more recently, the use of artificial intelligence (AI), assessing the functional consequences of residue substitutions remains a challenging task. Proteins have complex three-dimensional structures, where the maintenance of their functionality depends on chemical interactions between amino acid residues. Single substitutions can affect these interactions, leading to more profound structural changes that are difficult to visualize.
Results: Here, we present CaRinDB, a database that integrates cancer-associated missense mutation data, functional predictions, molecular features, allelic frequencies, and residue interaction network (RIN) parameters derived from Protein Data Bank structures and AlphaFold models. Users can access and explore variant information through an intuitive web portal, with custom plots and tables to visualize and analyze cancer-associated mutation data. CaRinDB is the first database that unites distinct annotation features of cancer-associated mutations with their structural impacts, described through RIN graph parameters, and it provides a source of compiled and processed data for the development of AI tools.
Availability and implementation: CaRinDB is freely available at https://bioinfo.imd.ufrn.br/CaRinDB/. The database was developed in Jupyter notebooks, which are available on GitHub (https://github.com/evomol-lab/CaRinDB). The CaRinDB web interface was implemented in R and Shiny.
Title: CaRinDB: an integrated database of common cancer mutations and residue interaction network parameters. Journal: Bioinformatics Advances, volume 6, issue 1, vbaf313. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12872580/pdf/
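To make the notion of RIN graph parameters concrete, the sketch below builds a small residue interaction network from a list of residue contacts and computes two standard graph measures, degree and local clustering coefficient. The contact list and residue labels are invented, and the abstract does not say which parameters CaRinDB stores, so this is only an illustration of the general technique.

```python
from collections import defaultdict
from itertools import combinations


def build_network(contacts):
    """Build an undirected adjacency map from (residue, residue) contact pairs."""
    adjacency = defaultdict(set)
    for a, b in contacts:
        if a != b:  # ignore self-contacts
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def degree(adjacency, node):
    """Number of residues in direct contact with `node`."""
    return len(adjacency[node])


def clustering_coefficient(adjacency, node):
    """Fraction of neighbour pairs of `node` that are themselves in contact."""
    neighbours = adjacency[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adjacency[u])
    return 2.0 * links / (k * (k - 1))


# Hypothetical contacts between residues labelled chain + position.
toy_contacts = [("A10", "A11"), ("A10", "A45"), ("A11", "A45"), ("A45", "A78")]
rin = build_network(toy_contacts)

print(degree(rin, "A45"))
print(clustering_coefficient(rin, "A45"))
```

Comparing such values between a wild-type and a mutant structure is one way a substitution's effect on local interactions could be summarized numerically.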
Motivation: The availability of large-scale genetic data presents a unique opportunity to study the genetic ancestries of individuals, which requires an efficient and scalable method. The existing global ancestry methods are accurate, but they cannot scale to large genetic datasets. Identity-by-descent (IBD) segments are DNA segments shared by individuals such that they are inherited from a common recent ancestor without recombination. These IBD segments, which reflect co-ancestry, provide an efficient alternative for inferring genetic ancestry.
Results: We introduce a reference-based global ancestry inference method called FRAME (Fast Reference-based Ancestry Makeup Estimation). FRAME utilizes partial local ancestry information estimated through IBD segments. Instead of using sophisticated local ancestry inference methods designed to make the best calls at each site, we employ an efficient IBD method, which yields a faster and more space-efficient algorithm that is robust to genotyping errors. Additionally, we introduce a new method of panel refinement that can enrich the ancestral homogeneity of individual haplotypes in the reference panel, thus leading to more accurate ancestry composition estimates. We benchmarked the performance of our method with real and simulated data. FRAME consumes ∼10-100 times less memory while maintaining comparable accuracy.
Availability and implementation: Source code is available at https://github.com/ucfcbb/FRAME.
Title: FRAME: fast reference-based ancestry makeup estimation tool. Authors: Pramesh Shakya, Ardalan Naseri, Degui Zhi, Shaojie Zhang. Pub Date: 2026-01-12; DOI: 10.1093/bioadv/vbag006. Journal: Bioinformatics Advances, volume 6, issue 1, vbag006. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12866910/pdf/
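The idea of turning IBD-derived partial local ancestry into a global estimate can be sketched in a heavily simplified form. Assume each IBD segment shared with a reference haplotype is tagged with that haplotype's population label, and estimate the ancestry makeup as the share of IBD-covered length per population. This is not FRAME's algorithm: it ignores overlapping segments, panel refinement, and uncovered regions, and all names and coordinates are hypothetical.

```python
from collections import defaultdict


def ancestry_proportions(ibd_segments):
    """Estimate ancestry shares from (start_bp, end_bp, population) IBD segments.

    Each population's share is its IBD-covered length divided by the total
    covered length. Overlapping segments are NOT resolved in this sketch.
    """
    covered = defaultdict(int)
    for start, end, population in ibd_segments:
        if end > start:  # skip empty or malformed segments
            covered[population] += end - start
    total = sum(covered.values())
    if total == 0:
        return {}
    return {population: length / total for population, length in covered.items()}


# Invented segments on one query haplotype, for illustration only.
toy_segments = [
    (0, 3_000_000, "POP_A"),
    (3_000_000, 4_000_000, "POP_B"),
    (6_000_000, 10_000_000, "POP_A"),
]

proportions = ancestry_proportions(toy_segments)
print(proportions)
```

Because only segment coordinates and labels are kept, the memory footprint grows with the number of segments rather than with the number of genotyped sites.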
Pub Date: 2026-01-09; eCollection Date: 2026-01-01; DOI: 10.1093/bioadv/vbag007
Sankarasubramanian Jagadesan, Chittibabu Guda
Summary: Visium HD Spatial Transcriptomics Data Analysis and Visualization (VST-DAVis) is an interactive R Shiny application and web browser designed for intuitive analysis of spatial transcriptomics data generated using the 10x Genomics Visium HD platform. This user-friendly tool empowers researchers, particularly those without programming expertise, to perform end-to-end spatial transcriptomics analysis through a streamlined graphical interface. The platform can handle both single and multiple samples, enabling comparative analyses across diverse biological conditions or replicates. It accepts various input formats, including both H5 and matrix-based files from Space Ranger, and outputs high-quality graphics from various visualization tools. VST-DAVis integrates several widely used R packages, such as Seurat, Monocle3, CellChat, and hdWGCNA, to offer a robust and flexible analytical environment that supports a wide range of analytical tasks, including quality control, clustering, marker gene identification, subclustering, trajectory inference, pathway enrichment analysis, cell-cell communication modeling, co-expression analysis, and transcription factor network reconstruction. By combining analytical depth with user-friendliness, VST-DAVis makes advanced analyses accessible to the various research communities that use spatial transcriptomics data.
Availability and implementation: VST-DAVis is freely available at https://www.gudalab-rtools.net/VST-DAVis. It is implemented in R 4.5.2 and Bioconductor ≥ 3.22 using the Shiny framework and supports input from Space Ranger outputs. The source code and documentation are hosted on GitHub: https://github.com/GudaLab/VST-DAVis.
Title: VST-DAVis: an R Shiny application and web-browser for spatial transcriptomics data analysis and visualization. Journal: Bioinformatics Advances, volume 6, issue 1, vbag007. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12866912/pdf/
Pub Date: 2026-01-08; eCollection Date: 2026-01-01; DOI: 10.1093/bioadv/vbaf325
Bram Nap, Bronson Weston, Annette Brandt, Maximilian F Wodak, Ina Bergheim, Ines Thiele
Motivation: Nutrition is an important factor in human health, used to alleviate or prevent symptoms of various diseases. However, the effects of nutrition on the gut microbiome and human metabolism are not well understood. Whole-body metabolic models (WBMs) have been applied to study relationships between regional diets and human/microbiome metabolism. This method requires diets to be defined at the metabolite level, rather than the food item level, which has hindered the application of personalized diets to WBMs.
Results: We developed the Nutrition Toolbox, which leverages open-source databases containing metabolite composition for over ten thousand food items to convert food items into their metabolic composition to create in silico diets. Additionally, when used with a previously published nutrition algorithm, minimal changes to a diet can be identified to achieve desirable shifts in human and microbiome metabolism. Taken together, we believe that the Nutrition Toolbox can help to understand the effects of nutrition on human metabolism and has the potential to contribute to personalized nutrition.
Availability and implementation: The Nutrition Toolbox is written in MATLAB. The code can be found at https://github.com/opencobra/cobratoolbox. A tutorial explaining the code is available in the COBRA toolbox and as a view-only supplementary tutorial. Details on installing the COBRA toolbox are available at https://opencobra.github.io/cobratoolbox/stable/installation.html.
Title: The nutrition toolbox permits in silico generation, analysis, and optimization of personalized diets through metabolic modelling. Journal: Bioinformatics Advances, volume 6, issue 1, vbaf325. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12820401/pdf/
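The conversion from food items to a metabolite-level diet amounts to a weighted sum over a composition table. The sketch below shows that arithmetic only. It is written in Python rather than MATLAB, does not use the COBRA toolbox, and the composition table, food names, and metabolite names are all invented placeholders.

```python
from collections import defaultdict

# Hypothetical composition table: mg of metabolite per g of food (invented values).
TOY_COMPOSITION = {
    "food_a": {"met_x": 20.0, "met_y": 60.0},
    "food_b": {"met_x": 10.0, "met_z": 400.0},
}


def diet_to_metabolites(food_items, composition):
    """Convert {food: grams eaten} into {metabolite: total mg} using a composition table."""
    totals = defaultdict(float)
    for food, grams in food_items.items():
        for metabolite, mg_per_g in composition[food].items():
            totals[metabolite] += grams * mg_per_g
    return dict(totals)


in_silico_diet = diet_to_metabolites({"food_a": 100.0, "food_b": 50.0}, TOY_COMPOSITION)
print(in_silico_diet)
```

A metabolite-level dictionary like this is the kind of input a metabolic model needs as dietary uptake constraints; mapping metabolite names to model exchange reactions is a separate step not shown here.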
Pub Date: 2026-01-08; eCollection Date: 2026-01-01; DOI: 10.1093/bioadv/vbaf328
Aryan Bhasin, Francesco Saccon, Callum Canavan, Andrew Robson, Joao Euko, Alexandra C Walls, Yunguan Fu
Summary: Since the emergence of SARS-CoV-2, numerous studies have investigated antibody interactions with viral variants in vitro, and several datasets have been curated to compile available protein structures and experimental measurements. However, existing data remain fragmented, limiting their utility for the development and validation of machine learning models for antibody-antigen interaction prediction. Here, we present CoV-UniBind, a unified database comprising over 75 000 entries of SARS-CoV-2 antibody-antigen sequence, binding, and structural data, integrated and standardized from three public sources and multiple peer-reviewed publications. To demonstrate its utility, we benchmarked multiple protein folding, inverse folding, and language models across tasks relevant to antibody design and vaccine development. We expect CoV-UniBind to facilitate future computational efforts in antibody and vaccine development against SARS-CoV-2.
Availability and implementation: The curated datasets, model scores and antibody synonyms are free to download at https://huggingface.co/datasets/InstaDeepAI/cov-unibind. Folded structures are available upon request.
Title: CoV-UniBind: a unified antibody binding database for SARS-CoV-2. Journal: Bioinformatics Advances, volume 6, issue 1, vbaf328. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12800777/pdf/
Pub Date: 2026-01-02; eCollection Date: 2026-01-01; DOI: 10.1093/bioadv/vbaf326
Bruno Marques Silva, Fernanda de Jesus Trindade, Lucas Eduardo Costa Canesin, Giordano Souza, Alexandre Aleixo, Gisele Nunes, Renato Renison Moreira-Oliveira
Motivation: Although high-quality chromosome-scale genome assemblies are feasible, assembling large ones remains complex and resource-intensive. This demands reproducible and automated workflows that not only implement current best practices efficiently but also allow for improvement alongside future updates to those standards.
Results: We present Pipeasm, a Snakemake-based genome assembly pipeline containerized with Singularity. Pipeasm can use HiFi, ONT, and Hi-C data, automating read trimming, nuclear and mitogenome assembly, scaffolding, decontamination, and quality evaluation. Applied to four vertebrate species with distinct genomic characteristics, starting from a single command line and configuration file, it produced assemblies with scaffold L50 proportional to the expected chromosome and genome length, and up to 99.6% BUSCO completeness. Its output also includes detailed reports for each step, genome statistics, Hi-C maps, and files ready for curation.
Availability and implementation: Pipeasm is available at https://github.com/itvgenomics/pipeasm, implemented in Python/Snakemake with Singularity, and runs on Unix-based systems.
Title: Pipeasm: a tool for automated large chromosome-scale genome assembly and evaluation. Journal: Bioinformatics Advances, volume 6, issue 1, vbaf326. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12800776/pdf/
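For readers unfamiliar with the scaffold L50 statistic cited above, the sketch below computes it together with N50 using the standard definitions: sort scaffolds from longest to shortest, then find the smallest number of scaffolds (L50) whose combined length reaches half of the assembly, and the length of the last scaffold added (N50). The function name and example lengths are illustrative and not part of Pipeasm.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of scaffold lengths."""
    if not lengths:
        raise ValueError("at least one scaffold length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            # `length` is the N50 scaffold; `count` scaffolds were needed to get here.
            return length, count


n50, l50 = n50_l50([80, 70, 50, 40, 30, 20])
print(n50, l50)
```

An L50 close to the expected number of large chromosomes indicates that most of the assembly sits in chromosome-sized scaffolds.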
Motivation: Creating publication-quality visualizations is essential for bioinformatics but remains a bottleneck for researchers with limited coding expertise. While Large Language Models (LLMs) are proficient at generating code, they often fail in practice due to library dependencies, dataset mismatches, or syntax errors. These issues require manual intervention, slowing data interpretation.
Results: We present ggplotAgent, a novel multi-modal, self-debugging artificial intelligence agent that automates publication-ready ggplot2 visualizations. It features a dual-layered framework that resolves code execution errors and uses a vision-enabled agent to verify aesthetic correctness. In benchmarks against the DeepSeek-V3 model, ggplotAgent achieved a 100% code executability rate (versus 85%) and a "Publication-Ready" score of 1.9 (versus 0.7). Surprisingly, it showcased the ability to act as an expert collaborator by intelligently enhancing plots beyond the user's literal prompt, achieving a positive Insight Score of +0.3 compared with the baseline (-0.05). These results demonstrate its ability to reliably produce accurate, high-quality visualizations directly from natural language.
Availability and implementation: ggplotAgent is freely accessible as a public web application at https://ggplotagent.databio1.com/ and an offline Streamlit app. The source code is available on GitHub at https://github.com/charlin90/ggplotAgent. This software is distributed under the MIT License.
Title: ggplotAgent: a self-debugging multi-modal agent for robust and reproducible scientific visualization. Authors: Zelin Wang, Yuanyuan Yin, Jien Wang, Haiyan Yan, Xuan Xie, Yiqing Zheng. Pub Date: 2026-01-02; DOI: 10.1093/bioadv/vbaf332. Journal: Bioinformatics Advances, volume 6, issue 1, vbaf332. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12802885/pdf/
Pub Date: 2025-12-31; eCollection Date: 2026-01-01; DOI: 10.1093/bioadv/vbaf329
Justin Merondun, Qingyi Yu
Motivation: Chromosome-level assemblies are essential for modern genomics, from comparative genomics and evolutionary studies to precision breeding. While integrated HiFi and Hi-C data now enable accurate chromosome-scale genome assemblies, the bioinformatic process remains complex and requires specialized tools and expertise. With large-scale pan-genomic efforts requiring dozens to hundreds of platinum-quality chromosome-scale genomes, there is a need for scalable, portable, and user-friendly pipelines that streamline and standardize high-quality genome assembly workflows.
Results: We introduce Puzzler, a containerized, scalable pipeline for chromosome-scale de novo genome assembly using PacBio HiFi and Hi-C data. Designed for portability and minimal user input, Puzzler automates contig assembly, duplicate purging, Hi-C-based scaffolding, and chromosome assignment via synteny, even with highly diverged reference taxa. Optional modules generate input files for manual Hi-C curation or operate reference-free. Quality control is integrated and includes Hi-C contact maps, BUSCO, yak k-mer completeness, and BlobTools contamination screening. A checkpointing system ensures that previously completed tasks are not re-executed, while a simple sample sheet input structure supports scalable batch processing. Puzzler has been validated on genomes ranging from 24 Mbp to 6.5 Gbp, delivering highly contiguous assemblies with <10 min of user input, enabling high-throughput platinum-quality genome assembly.
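The checkpointing behavior described above (completed tasks are not re-executed on a later run) can be illustrated with a minimal Python sketch. This is not Puzzler's implementation: the function `run_with_checkpoints`, the JSON checkpoint file, and the step names are all hypothetical and chosen only to show the general idea.

```python
import json
import tempfile
from pathlib import Path


def run_with_checkpoints(steps, checkpoint_file):
    """Run (name, function) steps in order, skipping names already recorded as done."""
    path = Path(checkpoint_file)
    done = set(json.loads(path.read_text())) if path.exists() else set()
    executed = []
    for name, func in steps:
        if name in done:
            continue  # finished in an earlier run, so it is not re-executed
        func()
        done.add(name)
        # Persist after every step so an interrupted run can resume from here.
        path.write_text(json.dumps(sorted(done)))
        executed.append(name)
    return executed


# Hypothetical step names; the real pipeline's stages and tools are not modeled.
log = []
steps = [
    ("assemble_contigs", lambda: log.append("assemble_contigs")),
    ("purge_duplicates", lambda: log.append("purge_duplicates")),
    ("scaffold_hic", lambda: log.append("scaffold_hic")),
]

with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "sample1.checkpoint.json"
    first_run = run_with_checkpoints(steps, ckpt)   # executes all three steps
    second_run = run_with_checkpoints(steps, ckpt)  # finds them done, executes none
```

Writing the checkpoint after each step, rather than once at the end, is what lets a run that fails midway resume without repeating earlier work.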
Availability and implementation: Puzzler is released into the public domain under 17 U.S.C. §105. Source code, documentation, and tutorials are available at https://github.com/merondun/puzzler and archived on Zenodo: https://doi.org/10.5281/zenodo.15733730 and https://doi.org/10.5281/zenodo.15693025. Pre-configured runtime environments including dependencies are provided via both a Conda environment (https://anaconda.org/heritabilities/puzzler) and an Apptainer image hosted on both Zenodo and Sylabs (https://cloud.sylabs.io/library/merondun/default/puzzler).
{"title":"Puzzler: scalable one-command platinum-quality genome assembly from HiFi and Hi-C.","authors":"Justin Merondun, Qingyi Yu","doi":"10.1093/bioadv/vbaf329","DOIUrl":"10.1093/bioadv/vbaf329","url":null,"PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"6 1","pages":"vbaf329"},"PeriodicalIF":2.8,"publicationDate":"2025-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12820402/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146031777","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-24; DOI: 10.1093/bioadv/vbaf320
Joshua Brand, Wei Zhang, Evie Carchman, Huy Q Dinh
Motivation: Cyclic immunofluorescence (IF) techniques enable deep phenotyping of cells and help quantify tissue organization at high resolution. Because the resulting data are high-dimensional, workflows typically rely on unsupervised clustering followed by cell type annotation at the cluster level. Most of these methods use marker expression averages and lack a statistical evaluation of cell type annotations, which can result in misclassification. Here, we propose an end-to-end pipeline that uses a semi-supervised random forest approach to predict cell type annotations.
Results: Our method includes cluster-based sampling for training data, cell type prediction, and downstream visualization for interpretability of cell annotation, which ultimately improves classification results. We show that our workflow annotates cells more accurately than representative deep learning and probabilistic methods, with a training set of <5% of the total number of cells tested. In addition, our pipeline outputs cell type probabilities and model performance metrics so that users can decide whether it could improve their existing clustering-based workflow results for complex IF data.
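One way to picture cluster-based sampling for training data is to draw a small fraction of cells from every cluster, so that rare clusters are still represented in the cells sent for labeling. The sketch below is an illustration under that assumption only; the abstract does not specify the actual sampling procedure, and the function `sample_per_cluster`, its parameters, and the toy cluster sizes are hypothetical.

```python
import random
from collections import defaultdict


def sample_per_cluster(cluster_labels, fraction=0.04, min_per_cluster=1, seed=0):
    """Pick cell indices for labeling by sampling a fraction of each cluster."""
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for idx, label in enumerate(cluster_labels):
        by_cluster[label].append(idx)

    chosen = []
    for label in sorted(by_cluster):
        members = by_cluster[label]
        # Keep at least min_per_cluster cells so small clusters are not dropped.
        n = max(min_per_cluster, round(len(members) * fraction))
        n = min(n, len(members))
        chosen.extend(rng.sample(members, n))
    return sorted(chosen)


# Toy cluster assignments for 1000 cells (hypothetical sizes, not real data).
labels = [0] * 900 + [1] * 80 + [2] * 20
train_idx = sample_per_cluster(labels, fraction=0.04)
train_clusters = {labels[i] for i in train_idx}
```

The selected cells would then be annotated and used to train a classifier such as a random forest, while the remaining cells receive predicted labels and class probabilities.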
Availability and implementation: Fluoro-forest is freely available on GitHub under an MIT license (https://github.com/Josh-Brand/Fluoro-forest).
{"title":"Fluoro-forest: a random forest workflow for cell type annotation in high-dimensional immunofluorescence imaging with limited training data.","authors":"Joshua Brand, Wei Zhang, Evie Carchman, Huy Q Dinh","doi":"10.1093/bioadv/vbaf320","DOIUrl":"10.1093/bioadv/vbaf320","url":null,"PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"6 1","pages":"vbaf320"},"PeriodicalIF":2.8,"publicationDate":"2025-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12782655/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145954108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}