
Bioinformatics Advances: latest articles

Transcriptional and epigenetic regulation of Ca2+-signaling genes in hepatitis B-derived hepatocellular carcinoma and their association with the cancer hallmarks.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2026-01-27 eCollection Date: 2026-01-01 DOI: 10.1093/bioadv/vbaf331
Guadalupe Hernández-Martínez, Andrés Hernández-Oliveras, Ángel Zarain-Herzberg, Juan Santiago-García

Motivation: Dysregulation of Ca2+-signaling genes has been shown in some types of cancer; however, it is virtually unexplored in hepatitis B-derived hepatocellular carcinoma (HBV-HCC). Here, we evaluate the transcriptional and epigenetic regulation of Ca2+-signaling genes in HBV-HCC and whether their expression is associated with cancer hallmarks and has prognostic potential.

Results: We identified 432 differentially expressed Ca2+-signaling genes in HBV-HCC, including 134 that are specific to this condition and were not found in non-HBV HCC. Fifty-three of these genes were associated with cancer hallmarks, of which 17 exhibited potential prognostic value in multivariate Cox analyses. We also provide new evidence for epigenetic regulation by post-translational histone modifications and DNA methylation at the promoters of some of these genes. Finally, using Least Absolute Shrinkage and Selection Operator (LASSO) regression, we identified a four-gene prognostic signature (FBLN1, STC2, C1R, and F2RL2) that robustly stratified patient outcomes. This study presents the first integrative transcriptomic and epigenetic analysis of Ca2+-signaling genes in HBV-HCC, introducing a novel four-gene signature with prognostic potential. These findings highlight the dysregulation of a subset of Ca2+-signaling genes as a distinctive feature of HBV-HCC.

Availability and implementation: All data generated or analyzed during this study are included in this article.

CaRinDB: an integrated database of common cancer mutations and residue interaction network parameters.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2026-01-25 eCollection Date: 2026-01-01 DOI: 10.1093/bioadv/vbaf313
Daniela Coelho Batista Guedes Pereira, João Vitor Ferreira Cavalcante, Laise Florentino Cavalcanti, Raul Maia Falcão, Jorge Estefano Santana de Souza, Rodrigo Juliani Siqueira Dalmolin, Thaís Gaudencio do Rêgo, Serghei Mangul, Gustavo Antônio de Souza, Patrick Terrematte, João Paulo Matos Santos Lima

Motivation: Predicting the impact of missense mutations on protein structure and function is a fundamental challenge for cancer research and clinical applications. Despite all the computational advances and, more recently, the use of artificial intelligence (AI), assessing the functional consequences of residue substitutions remains a challenging task. Proteins have complex three-dimensional structures, where the maintenance of their functionality depends on chemical interactions between amino acid residues. Single substitutions can affect these interactions, leading to more profound structural changes that are difficult to visualize.

Results: Here, we present CaRinDB, a database that integrates cancer-associated missense mutation data, functional predictions, molecular features, allelic frequencies, and residue interaction network (RIN) parameters derived from Protein Data Bank structures and AlphaFold models. Users can access and explore variant information through an intuitive web portal, with custom plots and tables to visualize and analyze cancer-associated mutation data. CaRinDB is the first database to unite distinct annotation features of cancer-associated mutations with their structural impacts, described through RIN graph parameters, and it provides a source of compiled and processed data for the development of AI tools.

Availability and implementation: CaRinDB is freely available at https://bioinfo.imd.ufrn.br/CaRinDB/. The development work was done in Jupyter notebooks, which are available on GitHub (https://github.com/evomol-lab/CaRinDB). The CaRinDB web interface was implemented in R and Shiny.

FRAME: fast reference-based ancestry makeup estimation tool.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2026-01-12 eCollection Date: 2026-01-01 DOI: 10.1093/bioadv/vbag006
Pramesh Shakya, Ardalan Naseri, Degui Zhi, Shaojie Zhang

Motivation: The availability of large-scale genetic data presents a unique opportunity to study the genetic ancestries of individuals, which requires an efficient and scalable method. The existing global ancestry methods are accurate, but they cannot scale to large genetic datasets. Identity-by-descent (IBD) segments are DNA segments shared by individuals such that they are inherited from a common recent ancestor without recombination. These IBD segments, which reflect co-ancestry, provide an efficient alternative for inferring genetic ancestry.

Results: We introduce a reference-based global ancestry inference method called FRAME (Fast Reference-based Ancestry Makeup Estimation). FRAME utilizes partial local ancestry information estimated through IBD segments. Instead of using sophisticated local ancestry inference methods designed to make the best calls at each site, we employ an efficient IBD method, which yields faster, more space-efficient algorithms that are robust to genotyping errors. Additionally, we introduce a new method of panel refinement that can enrich the ancestral homogeneity of individual haplotypes in the reference panel, thus leading to more accurate ancestry composition estimates. We benchmarked the performance of our method with real and simulated data. FRAME consumes ∼10-100 times less memory while maintaining comparable accuracy.
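
To make the idea of estimating global ancestry from IBD segments concrete, the following is a minimal sketch, not FRAME's actual algorithm: it assumes each IBD segment shared between a query individual and a labeled reference haplotype contributes its length to that reference's population, and then normalizes the totals. The function name, the segment tuple layout, and the population labels are all hypothetical.

```python
from collections import defaultdict


def ancestry_proportions(segments, ref_labels):
    """Estimate ancestry fractions from IBD segments (illustrative only).

    segments: iterable of (ref_id, start_cm, end_cm) tuples, one per IBD
              segment shared between the query and a reference haplotype.
    ref_labels: dict mapping ref_id to a population label.
    """
    totals = defaultdict(float)
    for ref_id, start_cm, end_cm in segments:
        length = end_cm - start_cm
        if length <= 0:
            continue  # ignore malformed or empty segments
        totals[ref_labels[ref_id]] += length

    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}  # no usable IBD sharing, so no estimate
    return {pop: length / grand_total for pop, length in totals.items()}


# Hypothetical toy input: three segments against two reference populations.
segments = [("ref1", 0.0, 30.0), ("ref2", 30.0, 40.0), ("ref3", 50.0, 60.0)]
ref_labels = {"ref1": "POP_A", "ref2": "POP_B", "ref3": "POP_A"}
proportions = ancestry_proportions(segments, ref_labels)
print(proportions)
```

A length-weighted tally like this ignores overlapping segments and panel refinement, both of which a real method would need to handle.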

Availability and implementation: Source code is available at https://github.com/ucfcbb/FRAME.

VST-DAVis: an R Shiny application and web-browser for spatial transcriptomics data analysis and visualization.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2026-01-09 eCollection Date: 2026-01-01 DOI: 10.1093/bioadv/vbag007
Sankarasubramanian Jagadesan, Chittibabu Guda

Summary: Visium HD Spatial Transcriptomics Data Analysis and Visualization (VST-DAVis) is an interactive, R Shiny application and web browser designed for intuitive analysis of spatial transcriptomics data generated using the 10x Genomics Visium HD platform. This user-friendly tool empowers researchers, particularly those without programming expertise, to perform end-to-end spatial transcriptomics analysis through a streamlined graphical interface. The platform is capable of handling both single and multiple samples, enabling comparative analyses across diverse biological conditions or replicates. It accepts various input formats including both H5 and matrix-based files from Space Ranger and outputs high-quality graphics from various visualization tools. VST-DAVis integrates several widely used R packages, such as Seurat, Monocle3, CellChat, and hdWGCNA, to offer a robust and flexible analytical environment that supports a wide range of analytical tasks, including quality control, clustering, marker gene identification, subclustering, trajectory inference, pathway enrichment analysis, cell-cell communication modeling, co-expression analysis, and transcription factor network reconstruction. By combining its analytical depth with user-friendliness, VST-DAVis makes advanced analyses accessible to various research communities that utilize spatial transcriptomics data.

Availability and implementation: VST-DAVis is freely available at https://www.gudalab-rtools.net/VST-DAVis. It is implemented in R 4.5.2 and Bioconductor ≥ 3.22 using the Shiny framework and supports input from Space Ranger outputs. The source code and documentation are hosted on GitHub: https://github.com/GudaLab/VST-DAVis.

The nutrition toolbox permits in silico generation, analysis, and optimization of personalized diets through metabolic modelling.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2026-01-08 eCollection Date: 2026-01-01 DOI: 10.1093/bioadv/vbaf325
Bram Nap, Bronson Weston, Annette Brandt, Maximilian F Wodak, Ina Bergheim, Ines Thiele

Motivation: Nutrition is an important factor in human health, used to alleviate or prevent symptoms of various diseases. However, the effects of nutrition on the gut microbiome and human metabolism are not well understood. Whole-body metabolic models (WBMs) have been applied to study relationships between regional diets and human/microbiome metabolism. This method requires diets to be defined at the metabolite level, rather than the food item level, which has limited the application of personalized diets to WBMs.

Results: We developed the Nutrition Toolbox, which leverages open-source databases containing metabolite composition for over ten thousand food items to convert food items into their metabolic composition to create in silico diets. Additionally, when used with a previously published nutrition algorithm, minimal changes to a diet can be identified to achieve desirable shifts in human and microbiome metabolism. Taken together, we believe that the Nutrition Toolbox can help to understand the effects of nutrition on human metabolism and has the potential to contribute to personalized nutrition.
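
The conversion from food items to a metabolite-level diet can be illustrated with a short sketch. This is not the Nutrition Toolbox's MATLAB code; it is a Python illustration that assumes a composition table giving metabolite amounts per 100 g of each food. The function name, food names, and all numeric values are hypothetical placeholders, not real nutrient data.

```python
def diet_to_metabolites(diet_grams, composition_per_100g):
    """Aggregate a food-level diet into metabolite totals (illustrative only).

    diet_grams: dict mapping food item to grams consumed.
    composition_per_100g: dict mapping food item to a dict of
        metabolite amounts per 100 g (arbitrary but consistent units).
    """
    totals = {}
    for food, grams in diet_grams.items():
        for metabolite, amount in composition_per_100g[food].items():
            # Scale the per-100 g amount by the portion actually eaten.
            totals[metabolite] = totals.get(metabolite, 0.0) + amount * grams / 100.0
    return totals


# Placeholder values chosen for easy arithmetic, not measured compositions.
composition = {
    "apple": {"glucose": 2.0, "fructose": 6.0},
    "bread": {"glucose": 1.0, "starch": 40.0},
}
diet = {"apple": 200.0, "bread": 50.0}
metabolite_diet = diet_to_metabolites(diet, composition)
print(metabolite_diet)
```

The resulting metabolite totals are the kind of input a metabolite-level diet definition would need; mapping them onto model exchange reactions is a separate step not shown here.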

Availability and implementation: The Nutrition Toolbox is written in MATLAB. The code can be found at https://github.com/opencobra/cobratoolbox. A tutorial explaining the code is available in the COBRA toolbox and as a view-only supplementary tutorial. Details on installing the COBRA toolbox are available at https://opencobra.github.io/cobratoolbox/stable/installation.html.

CoV-UniBind: a unified antibody binding database for SARS-CoV-2.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2026-01-08 eCollection Date: 2026-01-01 DOI: 10.1093/bioadv/vbaf328
Aryan Bhasin, Francesco Saccon, Callum Canavan, Andrew Robson, Joao Euko, Alexandra C Walls, Yunguan Fu

Summary: Since the emergence of SARS-CoV-2, numerous studies have investigated antibody interactions with viral variants in vitro, and several datasets have been curated to compile available protein structures and experimental measurements. However, existing data remain fragmented, limiting their utility for the development and validation of machine learning models for antibody-antigen interaction prediction. Here, we present CoV-UniBind, a unified database comprising over 75 000 entries of SARS-CoV-2 antibody-antigen sequence, binding, and structural data, integrated and standardized from three public sources and multiple peer-reviewed publications. To demonstrate its utility, we benchmarked multiple protein folding, inverse folding, and language models across tasks relevant to antibody design and vaccine development. We expect CoV-UniBind to facilitate future computational efforts in antibody and vaccine development against SARS-CoV-2.

Availability and implementation: The curated datasets, model scores and antibody synonyms are free to download at https://huggingface.co/datasets/InstaDeepAI/cov-unibind. Folded structures are available upon request.

Pipeasm: a tool for automated large chromosome-scale genome assembly and evaluation.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2026-01-02 eCollection Date: 2026-01-01 DOI: 10.1093/bioadv/vbaf326
Bruno Marques Silva, Fernanda de Jesus Trindade, Lucas Eduardo Costa Canesin, Giordano Souza, Alexandre Aleixo, Gisele Nunes, Renato Renison Moreira-Oliveira

Motivation: Although high-quality chromosome-scale genome assemblies are feasible, assembling large ones remains complex and resource-intensive. This demands reproducible and automated workflows that not only implement current best practices efficiently but also allow for improvement alongside future updates to those standards.

Results: We present Pipeasm, a Snakemake-based genome assembly pipeline containerized with Singularity. Pipeasm can use HiFi, ONT, and Hi-C data, automating read trimming, nuclear and mitogenome assembly, scaffolding, decontamination, and quality evaluation. Applied to four vertebrate species with distinct genomic characteristics, starting from a single command line and configuration file, it produced assemblies with scaffold L50 proportional to the expected chromosome and genome length, and up to 99.6% BUSCO completeness. Its output also includes detailed reports for each step, genome statistics, Hi-C maps, and files ready for curation.
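
For readers unfamiliar with the scaffold L50 metric mentioned above, here is a minimal sketch of how it is commonly computed: L50 is the smallest number of scaffolds whose combined length reaches at least half of the total assembly length, and N50 is the length of the shortest scaffold in that set. The function name and the toy lengths are hypothetical, and this is not code from Pipeasm.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of scaffold lengths."""
    ordered = sorted(lengths, reverse=True)  # longest scaffolds first
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            # 'length' is the shortest scaffold needed to reach half the
            # assembly; 'count' is how many scaffolds that took.
            return length, count
    raise ValueError("lengths must be a non-empty list")


# Toy assembly: total length 300, so half is 150.
n50, l50 = n50_l50([100, 80, 60, 40, 20])
print(n50, l50)
```

A lower L50 for a given genome size indicates a more contiguous assembly, which is why it is compared against the expected chromosome count.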

Availability and implementation: Pipeasm is available at https://github.com/itvgenomics/pipeasm, implemented in Python/Snakemake with Singularity, and runs on Unix-based systems.

Citations: 0
ggplotAgent: a self-debugging multi-modal agent for robust and reproducible scientific visualization.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2026-01-02 eCollection Date: 2026-01-01 DOI: 10.1093/bioadv/vbaf332
Zelin Wang, Yuanyuan Yin, Jien Wang, Haiyan Yan, Xuan Xie, Yiqing Zheng

Motivation: Creating publication-quality visualizations is essential for bioinformatics but remains a bottleneck for researchers with limited coding expertise. While Large Language Models (LLMs) are proficient at generating code, they often fail in practice due to library dependencies, dataset mismatches, or syntax errors. These issues require manual intervention, slowing data interpretation.

Results: We present ggplotAgent, a novel multi-modal, self-debugging artificial intelligence agent that automates publication-ready ggplot2 visualizations. It features a dual-layered framework that resolves code execution errors and uses a vision-enabled agent to verify aesthetic correctness. In benchmarks against the DeepSeek-V3 model, ggplotAgent achieved a 100% code executability rate (versus 85%) and a "Publication-Ready" score of 1.9 (versus 0.7). Surprisingly, it showcased the ability to act as an expert collaborator by intelligently enhancing plots beyond the user's literal prompt, achieving a positive Insight Score of +0.3 versus the baseline's -0.05. These results demonstrate its ability to reliably produce accurate, high-quality visualizations directly from natural language.

Availability and implementation: ggplotAgent is freely accessible as a public web application at https://ggplotagent.databio1.com/ and an offline Streamlit app. The source code is available on GitHub at https://github.com/charlin90/ggplotAgent. This software is distributed under the MIT License.

Citations: 0
Puzzler: scalable one-command platinum-quality genome assembly from HiFi and Hi-C.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-12-31 eCollection Date: 2026-01-01 DOI: 10.1093/bioadv/vbaf329
Justin Merondun, Qingyi Yu

Motivation: Chromosome-level assemblies are essential for modern genomics, from comparative genomics and evolutionary studies to precision breeding. While integrated HiFi and Hi-C data now enable accurate chromosome-scale genome assemblies, the bioinformatic process remains complex and requires specialized tools and expertise. With large-scale pan-genomic efforts requiring dozens to hundreds of platinum-quality chromosome-scale genomes, there is a need for scalable, portable, and user-friendly pipelines that streamline and standardize high-quality genome assembly workflows.

Results: We introduce Puzzler, a containerized, scalable pipeline for chromosome-scale de novo genome assembly using PacBio HiFi and Hi-C data. Designed for portability and minimal user input, Puzzler automates contig assembly, duplicate purging, Hi-C-based scaffolding, and chromosome assignment via synteny, even with highly diverged reference taxa. Optional modules generate input files for manual Hi-C curation or operate reference-free. Quality control is integrated and includes Hi-C contact maps, BUSCO, yak k-mer completeness, and BlobTools contamination screening. A checkpointing system ensures that previously completed tasks are not re-executed, while a simple sample sheet input structure supports scalable batch processing. Puzzler has been validated on genomes ranging from 24 Mbp to 6.5 Gbp, delivering highly contiguous assemblies with <10 min of user input, enabling high-throughput platinum-quality genome assembly.

Availability and implementation: Puzzler is released into the public domain under 17 U.S.C. §105. Source code, documentation, and tutorials are available at https://github.com/merondun/puzzler and archived on Zenodo: https://doi.org/10.5281/zenodo.15733730 and https://doi.org/10.5281/zenodo.15693025. Pre-configured runtime environments including dependencies are provided via both a Conda environment (https://anaconda.org/heritabilities/puzzler) and an Apptainer container hosted on both Zenodo and Sylabs (https://cloud.sylabs.io/library/merondun/default/puzzler).
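
The abstract above mentions a checkpointing system that avoids re-running completed tasks but does not say how it is implemented. One common way to get that behavior is to write a marker file after each step succeeds and skip any step whose marker already exists. The Python sketch below illustrates that general idea only. The function `run_step`, the `.done` file naming, and the step name are hypothetical and are not taken from Puzzler.

```python
import tempfile
from pathlib import Path


def run_step(name, action, checkpoint_dir):
    """Run `action` once; skip it if a marker file for `name` already exists."""
    marker = Path(checkpoint_dir) / f"{name}.done"
    if marker.exists():
        return False  # step was completed in an earlier run
    action()
    marker.touch()  # record completion only after the action succeeds
    return True


calls = []
with tempfile.TemporaryDirectory() as tmp:
    # The second call finds the marker written by the first and does nothing.
    first = run_step("assemble", lambda: calls.append("assemble"), tmp)
    second = run_step("assemble", lambda: calls.append("assemble"), tmp)

print(first, second, calls)  # True False ['assemble']
```

Because the marker is written only after the action returns, a step that fails partway through is retried on the next run rather than silently skipped.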

Citations: 0
Fluoro-forest: a random forest workflow for cell type annotation in high-dimensional immunofluorescence imaging with limited training data.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-12-24 eCollection Date: 2026-01-01 DOI: 10.1093/bioadv/vbaf320
Joshua Brand, Wei Zhang, Evie Carchman, Huy Q Dinh

Motivation: Cyclic immunofluorescence (IF) techniques enable deep phenotyping of cells and help quantify tissue organization at high resolution. Because the data are high-dimensional, workflows typically rely on unsupervised clustering followed by cell type annotation at the cluster level. Most of these methods use marker expression averages and lack a statistical evaluation of the cell type annotations, which can result in misclassification. Here, we propose an end-to-end pipeline that uses a semi-supervised random forest approach to predict cell type annotations.

Results: Our method includes cluster-based sampling for training data, cell type prediction, and downstream visualization for interpretability of cell annotation that ultimately improves classification results. We show that our workflow can annotate cells more accurately than representative deep learning and probabilistic methods, using a training set of <5% of the total number of cells tested. In addition, our pipeline outputs cell type probabilities and model performance metrics so that users can decide whether it could improve their existing clustering-based workflow results for complex IF data.

Availability and implementation: Fluoro-forest is freely available on GitHub under an MIT license (https://github.com/Josh-Brand/Fluoro-forest).
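
The abstract above names cluster-based sampling for training data without giving details. One plausible reading is that a fixed number of cells is drawn from each unsupervised cluster, so that rare clusters still contribute cells to the set that is annotated for training. The Python sketch below illustrates that reading only. The function `sample_per_cluster`, its parameters, and the toy labels are assumptions for illustration and do not reproduce the Fluoro-forest code.

```python
import random
from collections import defaultdict


def sample_per_cluster(cluster_labels, n_per_cluster, seed=0):
    """Pick up to `n_per_cluster` cell indices from every cluster."""
    rng = random.Random(seed)  # seeded so the selection is reproducible
    by_cluster = defaultdict(list)
    for idx, label in enumerate(cluster_labels):
        by_cluster[label].append(idx)
    chosen = []
    for label in sorted(by_cluster):
        members = by_cluster[label]
        k = min(n_per_cluster, len(members))  # small clusters contribute all cells
        chosen.extend(rng.sample(members, k))
    return sorted(chosen)


# Toy cluster assignments: one large, one small, and one single-cell cluster
labels = ["A"] * 10 + ["B"] * 3 + ["C"]
picked = sample_per_cluster(labels, n_per_cluster=2)
print(len(picked))  # 5
```

The selected cells would then be labeled by hand and used to train a classifier; sampling uniformly across all cells instead would tend to under-represent the smallest clusters.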

Citations: 0