
Bioinformatics Advances: latest articles

VST-DAVis: an R Shiny application and web-browser for spatial transcriptomics data analysis and visualization.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2026-01-09 eCollection Date: 2026-01-01 DOI: 10.1093/bioadv/vbag007
Sankarasubramanian Jagadesan, Chittibabu Guda

Summary: Visium HD Spatial Transcriptomics Data Analysis and Visualization (VST-DAVis) is an interactive R Shiny application and web browser designed for intuitive analysis of spatial transcriptomics data generated using the 10x Genomics Visium HD platform. This user-friendly tool lets researchers, particularly those without programming expertise, perform end-to-end spatial transcriptomics analysis through a streamlined graphical interface. The platform handles both single and multiple samples, enabling comparative analyses across biological conditions or replicates. It accepts several input formats, including H5 and matrix-based files from Space Ranger, and outputs high-quality graphics from its visualization tools. VST-DAVis integrates several widely used R packages, such as Seurat, Monocle3, CellChat, and hdWGCNA, to offer a robust and flexible environment for a wide range of analytical tasks, including quality control, clustering, marker gene identification, subclustering, trajectory inference, pathway enrichment analysis, cell-cell communication modeling, co-expression analysis, and transcription factor network reconstruction. By combining analytical depth with user-friendliness, VST-DAVis makes advanced analyses accessible to research communities that work with spatial transcriptomics data.
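The summary mentions two input layouts, H5 files and matrix-based files from Space Ranger. As a minimal sketch of how a front end might tell them apart, the Python function below classifies a path. It is not taken from VST-DAVis (which is written in R); the function name and the three file names are assumptions based on the usual Space Ranger matrix layout and should be adjusted to the files actually on disk.

```python
from pathlib import Path

# File names below follow the usual Space Ranger matrix-directory layout;
# treat them as assumptions and adjust them to the files actually on disk.
MATRIX_FILES = ("matrix.mtx.gz", "barcodes.tsv.gz", "features.tsv.gz")


def detect_input_format(path):
    """Classify a path as an 'h5' file, a 'matrix' directory, or 'unknown'."""
    p = Path(path)
    if p.is_file() and p.suffix == ".h5":
        return "h5"
    if p.is_dir() and all((p / name).is_file() for name in MATRIX_FILES):
        return "matrix"
    return "unknown"
```

A caller would branch on the returned label to pick the matching reader; anything labelled "unknown" is best rejected with a clear message rather than guessed at.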

Availability and implementation: VST-DAVis is freely available at https://www.gudalab-rtools.net/VST-DAVis. It is implemented in R 4.5.2 and Bioconductor ≥ 3.22 using the Shiny framework and supports input from Space Ranger outputs. The source code and documentation are hosted on GitHub: https://github.com/GudaLab/VST-DAVis.

Bioinformatics Advances 6(1): vbag007. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12866912/pdf/
Citations: 0
EC-Bench: a benchmark for enzyme commission number prediction.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2026-01-08 eCollection Date: 2026-01-01 DOI: 10.1093/bioadv/vbag004
Saeedeh Davoudi, Christopher S Henry, Christopher S Miller, Farnoush Banaei-Kashani

Motivation: Enzymes are proteins that catalyze specific biochemical reactions in cells. Enzyme Commission (EC) numbers are used to annotate enzymes in a four-level hierarchy that classifies enzymes based on the specific chemical reactions they catalyze. Accurate EC number prediction is essential for understanding enzyme functions. Despite the availability of numerous methods for predicting EC numbers from protein sequences, there is no unified framework for evaluating and studying such methods systematically. This gap limits the ability of the community to identify the most effective approaches for enzyme annotation.

Results: We introduce EC-Bench, a benchmark for EC number prediction, consisting of (i) an initial representative set of existing methods (including homology-based, deep learning, contrastive learning, and language model methods), (ii) existing and novel accuracy and efficiency performance metrics, and (iii) selected datasets to allow for comprehensive comparative study. EC-Bench is open-source and provides a framework for researchers not only to compare existing methods objectively under uniform conditions, but also to introduce new methods and evaluate their performance in the same comparative framework. To demonstrate the utility of EC-Bench, we perform extensive experimentation to compare the existing EC number prediction methods and establish their advantages and disadvantages in a variety of prediction tasks, namely "exact EC number prediction," "EC number completion," and (partial or additional) "EC number recommendation." We find wide variation in overall performance between methods, as well as subtle but potentially useful differences across tasks and across parts of the EC hierarchy.
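To make the four-level hierarchy and the "exact" versus "completion" tasks concrete, here is a small Python sketch. It is an illustration only and does not reproduce EC-Bench's metrics: the function names are hypothetical, and the convention of writing an unspecified level as "-" (for example "1.1.-.-") is assumed.

```python
def parse_ec(ec):
    """Split an EC number such as '1.1.1.1' into its four levels.

    A '-' marks an unspecified level, as in the partial number '1.1.-.-'.
    """
    parts = ec.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four levels, got {ec!r}")
    return parts


def matched_depth(predicted, true):
    """Count how many leading levels of two EC numbers agree (0 to 4)."""
    depth = 0
    for p, t in zip(parse_ec(predicted), parse_ec(true)):
        if p != t or p == "-":
            break
        depth += 1
    return depth


def is_completion_of(partial, full):
    """True if `full` fills every '-' of `partial` and keeps its known levels."""
    pairs = zip(parse_ec(partial), parse_ec(full))
    return all(f != "-" and (p == "-" or p == f) for p, f in pairs)
```

Under this sketch an exact prediction is one with `matched_depth(...) == 4`, while a depth of 3 means the prediction is right down to the sub-subclass but wrong on the final serial number.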

Availability and implementation: The benchmarking pipeline is available at https://github.com/dsaeedeh/EC-Bench.

Bioinformatics Advances 6(1): vbag004. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12889163/pdf/
Citations: 0
The nutrition toolbox permits in silico generation, analysis, and optimization of personalized diets through metabolic modelling.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2026-01-08 eCollection Date: 2026-01-01 DOI: 10.1093/bioadv/vbaf325
Bram Nap, Bronson Weston, Annette Brandt, Maximilian F Wodak, Ina Bergheim, Ines Thiele

Motivation: Nutrition is an important factor in human health, used to alleviate or prevent symptoms of various diseases. However, the effects of nutrition on the gut microbiome and human metabolism are not well understood. Whole-body metabolic models (WBMs) have been applied to study relationships between regional diets and human/microbiome metabolism. This method requires diets to be defined at the metabolite level, rather than the food item level, which has limited the application of personalized diets to WBMs.

Results: We developed the Nutrition Toolbox, which leverages open-source databases containing metabolite composition for over ten thousand food items to convert food items into their metabolic composition to create in silico diets. Additionally, when used with a previously published nutrition algorithm, minimal changes to a diet can be identified to achieve desirable shifts in human and microbiome metabolism. Taken together, we believe that the Nutrition Toolbox can help to understand the effects of nutrition on human metabolism and has the potential to contribute to personalized nutrition.
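The core conversion described here, from food items to a metabolite-level diet, can be sketched in a few lines. The Python below is not the Nutrition Toolbox (which is written in MATLAB): the food names, the composition values, and the function name are hypothetical placeholders, and the step that turns grams into the flux units a WBM expects is left out.

```python
from collections import defaultdict

# Hypothetical composition table: grams of each metabolite per 100 g of food.
# The values are placeholders for illustration, not measured data.
COMPOSITION_PER_100G = {
    "food_a": {"glucose": 2.0, "leucine": 0.5},
    "food_b": {"glucose": 10.0, "oleate": 4.0},
}


def diet_to_metabolites(diet_grams, composition=COMPOSITION_PER_100G):
    """Convert {food: grams eaten} into {metabolite: grams} for the whole diet."""
    totals = defaultdict(float)
    for food, grams in diet_grams.items():
        if food not in composition:
            raise KeyError(f"no composition data for {food!r}")
        for metabolite, per_100g in composition[food].items():
            # Scale the per-100 g value to the amount actually eaten.
            totals[metabolite] += per_100g * grams / 100.0
    return dict(totals)


diet = diet_to_metabolites({"food_a": 200.0, "food_b": 50.0})
```

Raising on an unknown food, instead of silently skipping it, keeps gaps in the composition database visible before the diet reaches a model.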

Availability and implementation: The Nutrition Toolbox is written in MATLAB. The code can be found at https://github.com/opencobra/cobratoolbox. A tutorial explaining the code is available in the COBRA toolbox and as a view-only supplementary tutorial. Details on installing the COBRA toolbox are available at https://opencobra.github.io/cobratoolbox/stable/installation.html.

Bioinformatics Advances 6(1): vbaf325. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12820401/pdf/
Citations: 0
CoV-UniBind: a unified antibody binding database for SARS-CoV-2.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2026-01-08 eCollection Date: 2026-01-01 DOI: 10.1093/bioadv/vbaf328
Aryan Bhasin, Francesco Saccon, Callum Canavan, Andrew Robson, Joao Euko, Alexandra C Walls, Yunguan Fu

Summary: Since the emergence of SARS-CoV-2, numerous studies have investigated antibody interactions with viral variants in vitro, and several datasets have been curated to compile available protein structures and experimental measurements. However, existing data remain fragmented, limiting their utility for the development and validation of machine learning models for antibody-antigen interaction prediction. Here, we present CoV-UniBind, a unified database comprising over 75 000 entries of SARS-CoV-2 antibody-antigen sequence, binding, and structural data, integrated and standardized from three public sources and multiple peer-reviewed publications. To demonstrate its utility, we benchmarked multiple protein folding, inverse folding, and language models across tasks relevant to antibody design and vaccine development. We expect CoV-UniBind to facilitate future computational efforts in antibody and vaccine development against SARS-CoV-2.

Availability and implementation: The curated datasets, model scores and antibody synonyms are free to download at https://huggingface.co/datasets/InstaDeepAI/cov-unibind. Folded structures are available upon request.
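Since the release includes antibody synonyms alongside data merged from several sources, a short sketch may help show why such a table matters when records are combined. The Python below is a hypothetical illustration, not the CoV-UniBind curation code: the synonym entries, record fields, and function names are all invented for the example.

```python
# Hypothetical synonym table: every known alias maps to one canonical name.
SYNONYMS = {
    "ab-1": "AB1",
    "ab1": "AB1",
    "antibody_1": "AB1",
}


def canonical_name(name, synonyms=SYNONYMS):
    """Return the canonical antibody name, or the trimmed input if unknown."""
    key = name.strip().lower()
    return synonyms.get(key, name.strip())


def merge_records(records, synonyms=SYNONYMS):
    """Group records from several sources under their canonical antibody name."""
    merged = {}
    for record in records:
        name = canonical_name(record["antibody"], synonyms)
        merged.setdefault(name, []).append({**record, "antibody": name})
    return merged
```

Without the normalization step, the same antibody reported as "Ab-1" in one source and "AB1" in another would be counted as two distinct entries.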

Bioinformatics Advances 6(1): vbaf328. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12800777/pdf/
Citations: 0
Pipeasm: a tool for automated large chromosome-scale genome assembly and evaluation.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2026-01-02 eCollection Date: 2026-01-01 DOI: 10.1093/bioadv/vbaf326
Bruno Marques Silva, Fernanda de Jesus Trindade, Lucas Eduardo Costa Canesin, Giordano Souza, Alexandre Aleixo, Gisele Nunes, Renato Renison Moreira-Oliveira

Motivation: Although high-quality chromosome-scale genome assemblies are feasible, assembling large ones remains complex and resource-intensive. This demands reproducible and automated workflows that not only implement current best practices efficiently but also allow for improvement alongside future updates to those standards.

Results: We present Pipeasm, a Snakemake-based genome assembly pipeline containerized with Singularity. Pipeasm can use HiFi, ONT, and Hi-C data, automating read trimming, nuclear and mitogenome assembly, scaffolding, decontamination, and quality evaluation. Applied to four vertebrate species with distinct genomic characteristics, starting from a single command line and configuration file, it produced assemblies with scaffold L50 proportional to the expected chromosome and genome length, and up to 99.6% BUSCO completeness. Its output also includes detailed reports for each step, genome statistics, Hi-C maps, and files ready for curation.
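The scaffold L50 reported above is a standard contiguity statistic, usually quoted together with N50. The Python sketch below computes both from a list of scaffold lengths using the common definitions; it is not code from Pipeasm, and the function name is chosen for this example.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of scaffold lengths.

    N50 is the length of the scaffold at which the running total, taken from
    longest to shortest, first reaches half of the assembly size. L50 is the
    number of scaffolds needed to get there.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    half = sum(lengths) / 2
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= half:
            return length, count
```

For a chromosome-scale assembly, L50 is expected to be a small number on the order of the chromosome count, which is why it is compared against the expected karyotype.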

Availability and implementation: Pipeasm is available at https://github.com/itvgenomics/pipeasm, implemented in Python/Snakemake with Singularity, and runs on Unix-based systems.

Bioinformatics Advances 6(1): vbaf326. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12800776/pdf/
Citations: 0
ggplotAgent: a self-debugging multi-modal agent for robust and reproducible scientific visualization.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2026-01-02 eCollection Date: 2026-01-01 DOI: 10.1093/bioadv/vbaf332
Zelin Wang, Yuanyuan Yin, Jien Wang, Haiyan Yan, Xuan Xie, Yiqing Zheng

Motivation: Creating publication-quality visualizations is essential for bioinformatics but remains a bottleneck for researchers with limited coding expertise. While Large Language Models (LLMs) are proficient at generating code, they often fail in practice due to library dependencies, dataset mismatches, or syntax errors. These issues require manual intervention, slowing data interpretation.

Results: We present ggplotAgent, a novel multi-modal, self-debugging artificial intelligence agent that automates publication-ready ggplot2 visualizations. It features a dual-layered framework that resolves code execution errors and uses a vision-enabled agent to verify aesthetic correctness. In benchmarks against the DeepSeek-V3 model, ggplotAgent achieved a 100% code executability rate (versus 85%) and a "Publication-Ready" score of 1.9 (versus 0.7). Surprisingly, it showcased the ability to act as an expert collaborator by intelligently enhancing plots beyond the user's literal prompt, achieving a positive Insight Score of +0.3 (versus -0.05 for the baseline). These results demonstrate its ability to reliably produce accurate, high-quality visualizations directly from natural language.
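The first layer of the framework, resolving execution errors, follows a generate, run, and retry pattern that can be sketched generically. The Python below is an assumption-laden illustration, not ggplotAgent's implementation: `self_debug`, `toy_generate`, and the use of `eval` as the executor are stand-ins, and the second, vision-based check is not modelled.

```python
def self_debug(generate, execute, max_attempts=3):
    """Ask `generate` for code, run it with `execute`, and retry on failure.

    `generate(feedback)` returns a code string; `feedback` is None on the
    first attempt and the previous error message afterwards. `execute(code)`
    raises an exception when the code fails. Both callables are hypothetical
    stand-ins for a model call and a sandboxed runner.
    """
    feedback = None
    for attempt in range(1, max_attempts + 1):
        code = generate(feedback)
        try:
            result = execute(code)
        except Exception as exc:  # broad on purpose: any failure becomes feedback
            feedback = f"{type(exc).__name__}: {exc}"
            continue
        return code, result, attempt
    raise RuntimeError(
        f"no runnable code after {max_attempts} attempts; last error: {feedback}"
    )


def toy_generate(feedback):
    # Stand-in for a language-model call: the first draft is broken on purpose.
    return "1 / 0" if feedback is None else "6 * 7"


code, result, attempts = self_debug(toy_generate, eval)
```

Capping the number of attempts keeps a persistently failing prompt from looping forever, and passing the error text back is what lets the generator correct itself instead of repeating the same draft.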

Availability and implementation: ggplotAgent is freely accessible as a public web application at https://ggplotagent.databio1.com/ and as an offline Streamlit app. The source code is available on GitHub at https://github.com/charlin90/ggplotAgent. This software is distributed under the MIT License.

Bioinformatics Advances 6(1): vbaf332. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12802885/pdf/
Citations: 0
Puzzler: scalable one-command platinum-quality genome assembly from HiFi and Hi-C.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-12-31 eCollection Date: 2026-01-01 DOI: 10.1093/bioadv/vbaf329
Justin Merondun, Qingyi Yu

Motivation: Chromosome-level assemblies are essential for modern genomics, from comparative genomics and evolutionary studies to precision breeding. While integrated HiFi and Hi-C data now enable accurate chromosome-scale genome assemblies, the bioinformatic process remains complex and involves specialized tools and expertise. With large-scale pan-genomic efforts requiring dozens to hundreds of platinum-quality chromosome-scale genomes, there is a need for scalable, portable, and user-friendly pipelines that streamline and standardize high-quality genome assembly workflows.

Results: We introduce Puzzler, a containerized, scalable pipeline for chromosome-scale de novo genome assembly using PacBio HiFi and Hi-C data. Designed for portability and minimal user input, Puzzler automates contig assembly, duplicate purging, Hi-C-based scaffolding, and chromosome assignment via synteny, even with highly diverged reference taxa. Optional modules generate input files for manual Hi-C curation or operate reference-free. Quality control is integrated and includes Hi-C contact maps, BUSCO, yak k-mer completeness, and BlobTools contamination screening. A checkpointing system ensures that previously completed tasks are not re-executed, while a simple sample sheet input structure supports scalable batch processing. Puzzler has been validated on genomes ranging from 24 Mbp to 6.5 Gbp, delivering highly contiguous assemblies with <10 min of user input, enabling high-throughput platinum-quality genome assembly.
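The checkpointing behaviour described above, where completed tasks are not re-executed, can be illustrated with a marker-file pattern. This Python sketch is an assumption about one common way to do it, not Puzzler's actual mechanism: `run_once` and the `.done` marker naming are hypothetical.

```python
from pathlib import Path


def run_once(name, task, checkpoint_dir):
    """Run `task()` unless a marker file for `name` already exists.

    Returns True if the task ran and False if it was skipped. The marker-file
    scheme is an assumption made for this sketch.
    """
    marker = Path(checkpoint_dir) / f"{name}.done"
    if marker.exists():
        return False
    task()
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.touch()  # written only after the task finished without raising
    return True
```

Because the marker is created after the task returns, a step that crashes halfway leaves no marker and is retried on the next run, while finished steps are skipped. Looping over the rows of a sample sheet and calling `run_once` per sample and step gives resumable batch processing.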

Availability and implementation: Puzzler is released into the public domain under 17 U.S.C. §105. Source code, documentation, and tutorials are available at https://github.com/merondun/puzzler and archived on Zenodo: https://doi.org/10.5281/zenodo.15733730 and https://doi.org/10.5281/zenodo.15693025. Pre-configured runtime environments including dependencies are provided via both a Conda environment (https://anaconda.org/heritabilities/puzzler) and an Apptainer image hosted on both Zenodo and Sylabs (https://cloud.sylabs.io/library/merondun/default/puzzler).

Citations: 0
Fluoro-forest: a random forest workflow for cell type annotation in high-dimensional immunofluorescence imaging with limited training data.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-12-24 eCollection Date: 2026-01-01 DOI: 10.1093/bioadv/vbaf320
Joshua Brand, Wei Zhang, Evie Carchman, Huy Q Dinh

Motivation: Cyclic immunofluorescence (IF) techniques enable deep phenotyping of cells and help quantify tissue organization at high resolution. Because the resulting data are high-dimensional, workflows typically rely on unsupervised clustering followed by cell type annotation at the cluster level. Most of these methods use marker expression averages and lack a statistical evaluation of the cell type annotations, which can result in misclassification. Here, we propose an end-to-end pipeline that uses a semi-supervised random forest approach to predict cell type annotations.

Results: Our method includes cluster-based sampling for training data, cell type prediction, and downstream visualization for interpretability of cell annotation that ultimately improves classification results. We show that our workflow can annotate cells more accurately compared to representative deep learning and probabilistic methods, with a training set <5% of the total number of cells tested. In addition, our pipeline outputs cell type probabilities and model performance metrics for users to decide if it could boost their existing clustering-based workflow results for complex IF data.
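
The cluster-based sampling step can be sketched as follows. This is a minimal illustration, not the Fluoro-forest code: the function name `sample_per_cluster`, its parameters, and the toy cluster sizes are assumptions. It only shows how a small labelling budget (under 5% of cells) can be spread across clusters so that rare clusters are still represented in the training set.

```python
import random
from collections import defaultdict


def sample_per_cluster(cluster_labels, fraction=0.05, min_per_cluster=1, seed=0):
    """Pick cell indices for manual labelling, stratified by cluster.

    Each cluster contributes about `fraction` of its cells (at least
    `min_per_cluster`), so small clusters are not left out.
    """
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for idx, label in enumerate(cluster_labels):
        by_cluster[label].append(idx)

    chosen = []
    for label in sorted(by_cluster):
        members = by_cluster[label]
        k = max(min_per_cluster, int(len(members) * fraction))
        k = min(k, len(members))  # never ask for more cells than exist
        chosen.extend(rng.sample(members, k))
    return sorted(chosen)


# Toy data (hypothetical): 1000 cells in three clusters of very different sizes.
labels = ["c0"] * 800 + ["c1"] * 180 + ["c2"] * 20
training_idx = sample_per_cluster(labels, fraction=0.04)
print(len(training_idx), len(training_idx) / len(labels))
```

The selected cells would then be annotated and used to train a classifier; a random forest is convenient here because it also yields per-class probabilities for each cell.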

Availability and implementation: Fluoro-forest is freely available on GitHub under an MIT license (https://github.com/Josh-Brand/Fluoro-forest).

Citations: 0
Prompt-to-Pill: Multi-Agent Drug Discovery and Clinical Simulation Pipeline.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-12-23 eCollection Date: 2026-01-01 DOI: 10.1093/bioadv/vbaf323
Ivana Vichentijevikj, Kostadin Mishev, Monika Simjanoska Misheva

Summary: This study presents a proof-of-concept, comprehensive, modular framework for AI-driven drug discovery (DD) and clinical trial simulation, spanning from target identification to virtual patient recruitment. Synthesized from a systematic analysis of 51 large language model (LLM)-based systems, the proposed Prompt-to-Pill architecture and corresponding implementation leverage a multi-agent system (MAS) divided into DD, preclinical, and clinical phases, coordinated by a central Orchestrator. Each phase comprises specialized LLMs for molecular generation, toxicity screening, docking, trial design, and patient matching. To demonstrate the full pipeline in practice, the well-characterized target Dipeptidyl Peptidase 4 (DPP4) was selected as a representative use case. The process begins with generative molecule creation and proceeds through ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) evaluation, structure-based docking, and lead optimization. Clinical-phase agents then simulate trial generation, screen patient eligibility using electronic health records (EHRs), and predict trial outcomes. By tightly integrating generative, predictive, and retrieval-based LLM components, this architecture bridges the drug discovery and preclinical phases with virtual clinical development, offering a demonstration of how LLM-based agents can operationalize the drug development workflow in silico.

Availability and implementation: The implementation and code are available at: https://github.com/ChatMED/Prompt-to-Pill.

Citations: 0
Beyond synthetic lethality in large-scale metabolic and regulatory network models via genetic minimal intervention set.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-12-19 eCollection Date: 2026-01-01 DOI: 10.1093/bioadv/vbaf319
Naroa Barrena, Carlos Rodriguez-Flores, Luis V Valcárcel, Danel Olaverri-Mendizabal, Xabier Agirre, Felipe Prósper, Francisco J Planes

Motivation: The integration of genome-scale metabolic and regulatory networks has received significant interest in cancer systems biology. However, the identification of lethal genetic interventions in these integrated models remains challenging due to the combinatorial explosion of potential solutions. To address this, we developed the genetic Minimal Cut Set (gMCS) framework, which computes synthetic lethal interactions (minimal sets of gene knockouts that are lethal for cellular proliferation) in genome-scale metabolic networks with signed directed acyclic regulatory pathways. Here, we present a novel formulation to calculate genetic Minimal Intervention Sets, gMISs, which incorporate both gene knockouts and knock-ins.
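
The idea of a minimal lethal knockout set can be made concrete with a toy example. The sketch below is a brute-force enumeration over a hypothetical four-gene Boolean growth rule; it is not the gMCS or gMIS formulation, does not use the gMCSpy API, and covers knockouts only (no knock-ins or regulatory layer). It relies on the assumption that knocking out more genes never restores growth.

```python
from itertools import combinations


def grows(active):
    """Toy growth rule (hypothetical): growth needs reactions R1 and R2.

    R1 is catalysed by isoenzymes A or B; R2 needs the complex of C and D.
    """
    r1 = "A" in active or "B" in active
    r2 = "C" in active and "D" in active
    return r1 and r2


def minimal_lethal_knockouts(genes, can_grow, max_size=3):
    """Enumerate minimal gene sets whose joint knockout abolishes growth."""
    genes = sorted(genes)
    found = []
    for size in range(1, max_size + 1):
        for ko in combinations(genes, size):
            ko_set = frozenset(ko)
            # A superset of an already-found lethal set cannot be minimal.
            if any(prev <= ko_set for prev in found):
                continue
            if not can_grow(set(genes) - ko_set):
                found.append(ko_set)
    return found


result = minimal_lethal_knockouts({"A", "B", "C", "D"}, grows)
print([sorted(s) for s in result])
```

In this toy model the single-gene sets correspond to essential genes, while the pair of isoenzymes is a synthetic lethal interaction: neither knockout is lethal alone, but the combination is. Exhaustive enumeration scales combinatorially with the number of genes, which is the difficulty the abstract refers to.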

Results: With our gMIS approach, we assessed the landscape of lethal genetic interactions in human cells, capturing interventions beyond synthetic lethality, including synthetic dosage lethality and tumor suppressor gene complexes. We applied the concept of synthetic dosage lethality to predict essential genes in cancer and demonstrated a significant increase in sensitivity when compared to large-scale gene knockout screen data. We also analyzed tumor suppressors in cancer cell lines and identified lethal gene knock-in strategies. Finally, we demonstrate how gMISs can help uncover potential therapeutic targets, providing examples in hematological malignancies.

Availability and implementation: The gMCSpy Python package now includes gMIS functionalities. Access: https://github.com/PlanesLab/gMCSpy.

Citations: 0