
Bioinformatics Advances: latest articles

Multi-view deep learning of highly multiplexed imaging data improves association of cell states with clinical outcomes.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2026-01-14 eCollection Date: 2026-01-01 DOI: 10.1093/bioadv/vbag010
Shanza Ayub, Jennifer L Gorman, Edward L Y Chen, Hartland W Jackson, Alina Selega, Kieran R Campbell

Motivation: Analysis workflows for highly multiplexed imaging technologies typically summarize each cell in terms of its post-segmentation mean expression, but additional cellular information can be quantified including cell morphology, sub-cellular expression patterns, and spatial cellular context, ultimately giving a multi-modal view of each cell. While deep learning models such as variational autoencoders are well-established for other multi-modal single-cell assays, their ability to integrate these multiple views of a cell from highly multiplexed imaging data remains largely unknown.

Results: Here, we explore the abilities of multi-modal variational autoencoders to learn unified latent cellular representations from multiple views of each single cell quantified from highly multiplexed imaging, including mean expression, morphology, sub-cellular protein co-localization, and spatial cellular context, while conditioning on technical and batch-specific effects. We show that the integrated multi-modal latent space is often more strongly associated with patient-specific clinical outcomes than a set of existing baselines. In addition, we perform ablation analyses to understand which input views contribute to model performance, and explore the ability of these models to learn cellular representations that align with cellular phenotypes and enable integration across divergent datasets.

Availability and implementation: hmiVAE is implemented as a Python package and is available at https://github.com/camlab-bioml/hmiVAE.

CanDrivR-CS: a cancer-specific machine learning framework for distinguishing recurrent and rare variants.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2026-01-12 eCollection Date: 2026-01-01 DOI: 10.1093/bioadv/vbag008
Amy Francis, Colin Campbell, Tom R Gaunt

Motivation: Missense variants, single nucleotide substitutions that result in an amino acid change in the encoded protein, play an important role in cancer. Distinguishing between recurrent and rare missense variants may reveal insights into selective pressures and functional consequences. While recurrent variants may undergo positive selection across patients, rare variants can also drive resistance or other phenotypes. However, most existing tools predict pathogenicity across broad populations and ignore tumour-specific contexts. Here, we present CanDrivR-CS, a suite of cancer-specific gradient boosting models designed to distinguish between rare and recurrent somatic missense variants.

Results: We curated data from the International Cancer Genome Consortium (ICGC) and trained 50 cancer-specific models. These significantly outperformed a pan-cancer baseline, achieving up to 90% F1 score in leave-one-group-out cross-validation (LOGO-CV) for skin melanoma. Notably, DNA shape features ranked among the most predictive across all cancers, with recurrent variants enriched in structurally complex DNA regions such as bends and rolls (potential mutational hotspots).
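
The evaluation scheme named above, leave-one-group-out cross-validation scored with F1, can be sketched in a few lines of plain Python. This is only an illustration of the general technique, not the CanDrivR-CS code: the abstract does not say what defines a group, so the group labels used here (`g1`, `g2`, `g3`) are hypothetical, as are the function names.

```python
from collections import defaultdict


def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices) for each distinct group.

    Every sample of the held-out group goes to the test fold, so no group
    contributes to both training and testing within a fold.
    """
    by_group = defaultdict(list)
    for idx, group in enumerate(groups):
        by_group[group].append(idx)
    for held_out in sorted(by_group):
        test_idx = by_group[held_out]
        train_idx = sorted(
            i for group, idxs in by_group.items() if group != held_out for i in idxs
        )
        yield held_out, train_idx, test_idx


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN) for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical group labels, one per variant.
    groups = ["g1", "g1", "g2", "g3", "g3"]
    for held_out, train_idx, test_idx in leave_one_group_out(groups):
        print(held_out, train_idx, test_idx)
```

In practice a model would be fitted on `train_idx` and scored on `test_idx` for every fold, and the per-fold F1 values averaged. scikit-learn provides an equivalent splitter, `LeaveOneGroupOut`, for use with its estimators.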

Availability and implementation: All code and data are available in the CanDrivR-CS GitHub repository (https://github.com/amyfrancis97/CanDrivR-CS), with further advice on the installation procedure in Section 1 of the Supplementary Materials.

FRAME: fast reference-based ancestry makeup estimation tool.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2026-01-12 eCollection Date: 2026-01-01 DOI: 10.1093/bioadv/vbag006
Pramesh Shakya, Ardalan Naseri, Degui Zhi, Shaojie Zhang

Motivation: The availability of large-scale genetic data presents a unique opportunity to study the genetic ancestries of individuals, which requires an efficient and scalable method. The existing global ancestry methods are accurate, but they cannot scale to large genetic datasets. Identity-by-descent (IBD) segments are DNA segments shared by individuals such that they are inherited from a common recent ancestor without recombination. These IBD segments, which reflect co-ancestry, provide an efficient alternative for inferring genetic ancestry.

Results: We introduced a reference-based global ancestry inference method called FRAME (Fast Reference-based Ancestry Makeup Estimation). FRAME utilizes partial local ancestry information estimated through IBD segments. Instead of using sophisticated local ancestry inference methods designed to make the best calls at each site, we employed an efficient IBD method for faster and space-efficient algorithms that are robust to genotyping errors. Additionally, we introduced a new method of panel refinement that can enrich the ancestral homogeneity of individual haplotypes in the reference panel, thus leading to more accurate ancestry composition estimates. We benchmarked the performance of our method with real and simulated data. FRAME consumes ∼10-100 times less memory while maintaining a comparable accuracy.
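
To make the idea of inferring global ancestry from IBD sharing concrete, here is a deliberately naive sketch. It is not FRAME's algorithm, which the abstract does not specify in detail: it assumes that each IBD segment between a query individual and a labelled reference individual votes for that reference's population in proportion to its length. The tuple layout, the identifiers `r1` to `r3`, and the population labels are all hypothetical.

```python
from collections import defaultdict


def ancestry_proportions(ibd_segments, ref_population):
    """Estimate ancestry proportions for one query individual.

    ibd_segments: iterable of (reference_id, start, end) tuples, where start
        and end are positions in any consistent unit (for example cM).
    ref_population: dict mapping reference_id to a population label.
    Returns a dict mapping population label to its share of total IBD length.
    """
    totals = defaultdict(float)
    for ref_id, start, end in ibd_segments:
        if end <= start:
            raise ValueError(f"segment for {ref_id!r} has non-positive length")
        totals[ref_population[ref_id]] += end - start
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {pop: length / grand_total for pop, length in totals.items()}


if __name__ == "__main__":
    # Hypothetical reference panel and IBD segments.
    panel = {"r1": "A", "r2": "A", "r3": "B"}
    segments = [("r1", 0.0, 10.0), ("r2", 5.0, 10.0), ("r3", 0.0, 5.0)]
    print(ancestry_proportions(segments, panel))
```

A length-weighted vote like this ignores overlapping segments and unequal panel sizes; it is meant only to show why IBD sharing with labelled references carries ancestry signal.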

Availability and implementation: Source code is available at https://github.com/ucfcbb/FRAME.

VST-DAVis: an R Shiny application and web-browser for spatial transcriptomics data analysis and visualization.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2026-01-09 eCollection Date: 2026-01-01 DOI: 10.1093/bioadv/vbag007
Sankarasubramanian Jagadesan, Chittibabu Guda

Summary: Visium HD Spatial Transcriptomics Data Analysis and Visualization (VST-DAVis) is an interactive R Shiny application and web browser designed for intuitive analysis of spatial transcriptomics data generated using the 10x Genomics Visium HD platform. This user-friendly tool empowers researchers, particularly those without programming expertise, to perform end-to-end spatial transcriptomics analysis through a streamlined graphical interface. The platform can handle both single and multiple samples, enabling comparative analyses across diverse biological conditions or replicates. It accepts various input formats, including both H5 and matrix-based files from Space Ranger, and outputs high-quality graphics from various visualization tools. VST-DAVis integrates several widely used R packages, such as Seurat, Monocle3, CellChat, and hdWGCNA, to offer a robust and flexible analytical environment that supports a wide range of analytical tasks, including quality control, clustering, marker gene identification, subclustering, trajectory inference, pathway enrichment analysis, cell-cell communication modeling, co-expression analysis, and transcription factor network reconstruction. By combining its analytical depth with user-friendliness, VST-DAVis makes advanced analyses accessible to various research communities that utilize spatial transcriptomics data.

Availability and implementation: VST-DAVis is freely available at https://www.gudalab-rtools.net/VST-DAVis. It is implemented in R 4.5.2 and Bioconductor ≥ 3.22 using the Shiny framework and supports input from Space Ranger outputs. The source code and documentation are hosted on GitHub: https://github.com/GudaLab/VST-DAVis.

ChromoMapper: a new tool to quickly compare large genome assemblies.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2026-01-09 eCollection Date: 2026-01-01 DOI: 10.1093/bioadv/vbag005
Elvira Toscano, Elena Cimmino, Angelo Boccia, Leandra Sepe, Giovanni Paolella

Motivation: Quality assessment and assembly comparison are essential steps while assembling new genomes. Many tools for evaluating assemblies typically provide synthetic parameters representing assembly quality or overall features, while others provide long detailed files where it is not always easy to identify and visualize the regions of correspondence and difference among different chromosome assemblies.

Results: Here we present ChromoMapper, a new tool which scans the output from QUAST, as well as other similar alignment description files, to quickly identify and display similarities and differences between the compared assemblies. It uses the information provided about aligned blocks, combined with additional annotations, to represent the main alignment regions at chromosomal or sub-chromosomal scale, highlighting similarities and collinearity between compared sequences, points of inconsistency, discontinuities, repeated regions and interruptions in the assembled sequences.

Availability and implementation: ChromoMapper is available at https://chromomapper.ceinge.unina.it/ and via Zenodo (https://doi.org/10.5281/zenodo.16778863).

EC-Bench: a benchmark for enzyme commission number prediction.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2026-01-08 eCollection Date: 2026-01-01 DOI: 10.1093/bioadv/vbag004
Saeedeh Davoudi, Christopher S Henry, Christopher S Miller, Farnoush Banaei-Kashani

Motivation: Enzymes are proteins that catalyze specific biochemical reactions in cells. Enzyme Commission (EC) numbers are used to annotate enzymes in a four-level hierarchy that classifies enzymes based on the specific chemical reactions they catalyze. Accurate EC number prediction is essential for understanding enzyme functions. Despite the availability of numerous methods for predicting EC numbers from protein sequences, there is no unified framework for evaluating and studying such methods systematically. This gap limits the ability of the community to identify the most effective approaches for enzyme annotation.
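
The four-level hierarchy described above lends itself to a small worked example. The sketch below parses EC numbers and reports how many leading levels a prediction shares with the true annotation, which is one simple way to give partial credit in a hierarchy. The function names are hypothetical and this is not necessarily a metric used by EC-Bench; the `-` placeholder for unspecified levels follows the common convention for partial EC numbers such as `3.4.-.-`.

```python
def parse_ec(ec):
    """Split an EC number such as '1.1.1.1' into its four levels.

    A '-' marks a level that is left unspecified, as in '3.4.-.-'.
    """
    parts = ec.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated levels, got {ec!r}")
    return tuple(parts)


def matching_depth(predicted, true):
    """Count how many leading levels (0 to 4) agree between two EC numbers.

    Comparison stops at the first mismatch or the first unspecified level.
    """
    depth = 0
    for p, t in zip(parse_ec(predicted), parse_ec(true)):
        if p == "-" or t == "-" or p != t:
            break
        depth += 1
    return depth


if __name__ == "__main__":
    print(matching_depth("1.1.1.1", "1.1.1.2"))   # same class, subclass, sub-subclass
    print(matching_depth("3.4.-.-", "3.4.21.4"))  # partial prediction
```

A depth of 4 corresponds to an exact match, while lower depths indicate that only the coarser levels of the classification were predicted correctly.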

Results: We introduce EC-Bench, a benchmark for EC number prediction, consisting of (i) an initial representative set of existing methods (including homology-based, deep learning, contrastive learning, and language model methods), (ii) existing and novel accuracy and efficiency performance metrics, and (iii) selected datasets to allow for comprehensive comparative study. EC-Bench is open-source and provides a framework for researchers not only to compare existing methods objectively under uniform conditions, but also to introduce new methods and evaluate their performance within the same comparative framework. To demonstrate the utility of EC-Bench, we perform extensive experimentation to compare the existing EC number prediction methods and establish their advantages and disadvantages in a variety of prediction tasks, namely "exact EC number prediction," "EC number completion," and (partial or additional) "EC number recommendation." We find wide variation in overall performance between methods, as well as subtle but potentially useful differences across tasks and across different parts of the EC hierarchy.

Availability and implementation: The benchmarking pipeline is available at https://github.com/dsaeedeh/EC-Bench.

The nutrition toolbox permits in silico generation, analysis, and optimization of personalized diets through metabolic modelling.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2026-01-08 eCollection Date: 2026-01-01 DOI: 10.1093/bioadv/vbaf325
Bram Nap, Bronson Weston, Annette Brandt, Maximilian F Wodak, Ina Bergheim, Ines Thiele

Motivation: Nutrition is an important factor in human health, used to alleviate or prevent symptoms of various diseases. However, the effects of nutrition on the gut microbiome and human metabolism are not well understood. Whole-body metabolic models (WBMs) have been applied to study relationships between regional diets and human/microbiome metabolism. This method requires diets to be defined at the metabolite level, rather than the food item level, which has gated the application of personalized diets to WBMs.

Results: We developed the Nutrition Toolbox, which leverages open-source databases containing metabolite composition for over ten thousand food items to convert food items into their metabolic composition to create in silico diets. Additionally, when used with a previously published nutrition algorithm, minimal changes to a diet can be identified to achieve desirable shifts in human and microbiome metabolism. Taken together, we believe that the Nutrition Toolbox can help to understand the effects of nutrition on human metabolism and has the potential to contribute to personalized nutrition.
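
The conversion step described above, from food items to a metabolite-level diet, amounts to a weighted sum over a composition table. The sketch below shows that arithmetic in Python for illustration only; the Nutrition Toolbox itself is written in MATLAB, and the function name, table layout, and all composition values here are made up for the example rather than taken from any real food database.

```python
def diet_from_foods(food_grams, composition_per_100g):
    """Convert a food-level diet into total metabolite amounts.

    food_grams: dict mapping food item to grams consumed.
    composition_per_100g: dict mapping food item to a dict of
        metabolite -> amount per 100 g of that food.
    Returns a dict mapping metabolite to its total amount in the diet.
    """
    diet = {}
    for food, grams in food_grams.items():
        for metabolite, amount in composition_per_100g[food].items():
            diet[metabolite] = diet.get(metabolite, 0.0) + amount * grams / 100.0
    return diet


if __name__ == "__main__":
    # Illustrative, made-up composition values (not real nutritional data).
    composition = {
        "apple": {"glucose": 2.0, "fructose": 6.0},
        "bread": {"glucose": 1.0, "starch": 40.0},
    }
    foods = {"apple": 150.0, "bread": 50.0}
    print(diet_from_foods(foods, composition))
```

The resulting metabolite totals are the kind of input a metabolite-level diet definition requires; mapping them onto model exchange reactions is a separate step not shown here.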

Availability and implementation: The Nutrition Toolbox is written in MATLAB. The code can be found at https://github.com/opencobra/cobratoolbox. A tutorial explaining the code is available in the COBRA toolbox and as a view-only supplementary tutorial. Details on installing the COBRA toolbox are available at https://opencobra.github.io/cobratoolbox/stable/installation.html.

动机:营养是人类健康的重要因素,用于减轻或预防各种疾病的症状。然而,营养对肠道微生物群和人体代谢的影响尚不清楚。全身代谢模型(WBMs)已被应用于研究区域饮食与人体/微生物组代谢之间的关系。这种方法要求在代谢物水平上定义饮食,而不是在食物项目水平上定义饮食,这限制了个性化饮食在体重增加者中的应用。结果:我们开发了营养工具箱,它利用包含超过一万种食物的代谢物组成的开源数据库,将食物转化为它们的代谢组成,以创建硅化饮食。此外,当与先前发表的营养算法一起使用时,可以确定饮食的最小变化,以实现人体和微生物组代谢的理想变化。综上所述,我们相信营养工具箱可以帮助理解营养对人体代谢的影响,并有可能为个性化营养做出贡献。可用性和实现:营养工具箱是用MATLAB编写的。代码可以在https://github.com/opencobra/cobratoolbox上找到。在COBRA工具箱中可以找到解释代码的教程,也可以作为仅视图的补充教程。有关安装COBRA工具箱的详细信息,请访问https://opencobra.github.io/cobratoolbox/stable/installation.html。
Citations: 0
CoV-UniBind: a unified antibody binding database for SARS-CoV-2.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2026-01-08 eCollection Date: 2026-01-01 DOI: 10.1093/bioadv/vbaf328
Aryan Bhasin, Francesco Saccon, Callum Canavan, Andrew Robson, Joao Euko, Alexandra C Walls, Yunguan Fu

Summary: Since the emergence of SARS-CoV-2, numerous studies have investigated antibody interactions with viral variants in vitro, and several datasets have been curated to compile available protein structures and experimental measurements. However, existing data remain fragmented, limiting their utility for the development and validation of machine learning models for antibody-antigen interaction prediction. Here, we present CoV-UniBind, a unified database comprising over 75 000 entries of SARS-CoV-2 antibody-antigen sequence, binding, and structural data, integrated and standardized from three public sources and multiple peer-reviewed publications. To demonstrate its utility, we benchmarked multiple protein folding, inverse folding, and language models across tasks relevant to antibody design and vaccine development. We expect CoV-UniBind to facilitate future computational efforts in antibody and vaccine development against SARS-CoV-2.

Availability and implementation: The curated datasets, model scores and antibody synonyms are free to download at https://huggingface.co/datasets/InstaDeepAI/cov-unibind. Folded structures are available upon request.

Citations: 0
Pipeasm: a tool for automated large chromosome-scale genome assembly and evaluation.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2026-01-02 eCollection Date: 2026-01-01 DOI: 10.1093/bioadv/vbaf326
Bruno Marques Silva, Fernanda de Jesus Trindade, Lucas Eduardo Costa Canesin, Giordano Souza, Alexandre Aleixo, Gisele Nunes, Renato Renison Moreira-Oliveira

Motivation: Although high-quality chromosome-scale genome assemblies are feasible, assembling large ones remains complex and resource-intensive. This demands reproducible and automated workflows that not only implement current best practices efficiently but also allow for improvement alongside future updates to those standards.

Results: We present Pipeasm, a Snakemake-based genome assembly pipeline containerized with Singularity. Pipeasm can use HiFi, ONT, and Hi-C data, automating read trimming, nuclear and mitogenome assembly, scaffolding, decontamination, and quality evaluation. Applied to four vertebrate species with distinct genomic characteristics, starting from a single command line and configuration file, it produced assemblies with scaffold L50 proportional to the expected chromosome and genome length, and up to 99.6% BUSCO completeness. Its output also includes detailed reports for each step, genome statistics, Hi-C maps, and files ready for curation.

Availability and implementation: Pipeasm is available at https://github.com/itvgenomics/pipeasm, implemented in Python/Snakemake with Singularity, and runs on Unix-based systems.
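
The scaffold L50 reported above is a standard contiguity statistic: the smallest number of scaffolds whose combined length covers at least half of the assembly (N50 is the length of the last scaffold counted). The following is a minimal sketch of that definition only; the function name `n50_l50` and the toy lengths are hypothetical and are not taken from Pipeasm or its reports.

```python
from typing import Iterable, Tuple


def n50_l50(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of scaffold lengths."""
    # Sort scaffolds from longest to shortest.
    ordered = sorted((int(x) for x in lengths), reverse=True)
    if not ordered or ordered[-1] <= 0:
        raise ValueError("need at least one scaffold, all with positive length")

    half = sum(ordered) / 2  # half of the total assembly length
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # The first scaffold that pushes the running total to >= 50%
        # defines N50 (its length) and L50 (how many scaffolds were needed).
        if running >= half:
            return length, count
    raise AssertionError("unreachable for positive lengths")


# Hypothetical toy assembly: total length 300, so the 50% mark is 150.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = n50_l50(toy_scaffolds)
print(f"N50={n50}, L50={l50}")
```

In a chromosome-scale assembly, an L50 close to half the expected chromosome count suggests that most of the sequence sits in chromosome-sized scaffolds, which is presumably why the abstract relates L50 to the expected chromosome and genome length.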

Citations: 0
ggplotAgent: a self-debugging multi-modal agent for robust and reproducible scientific visualization.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2026-01-02 eCollection Date: 2026-01-01 DOI: 10.1093/bioadv/vbaf332
Zelin Wang, Yuanyuan Yin, Jien Wang, Haiyan Yan, Xuan Xie, Yiqing Zheng

Motivation: Creating publication-quality visualizations is essential for bioinformatics but remains a bottleneck for researchers with limited coding expertise. While Large Language Models (LLMs) are proficient at generating code, they often fail in practice due to library dependencies, dataset mismatches, or syntax errors. These issues require manual intervention, slowing data interpretation.

Results: We present ggplotAgent, a novel multi-modal, self-debugging artificial intelligence agent that automates publication-ready ggplot2 visualizations. It features a dual-layered framework that resolves code execution errors and uses a vision-enabled agent to verify aesthetic correctness. In benchmarks against the DeepSeek-V3 model, ggplotAgent achieved a 100% code executability rate (versus 85%) and a "Publication-Ready" score of 1.9 (versus 0.7). Surprisingly, it also acted as an expert collaborator, intelligently enhancing plots beyond the user's literal prompt and achieving a positive Insight Score of +0.3, compared with -0.05 for the baseline. These results demonstrate its ability to reliably produce accurate, high-quality visualizations directly from natural language.

Availability and implementation: ggplotAgent is freely accessible as a public web application at https://ggplotagent.databio1.com/ and an offline Streamlit app. The source code is available on GitHub at https://github.com/charlin90/ggplotAgent. This software is distributed under the MIT License.

Citations: 0