Bioinformatics Advances: latest articles

zelll: a fast, framework-free, and flexible implementation of the cell lists algorithm for the Rust programming language.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2026-01-02 eCollection Date: 2026-01-01 DOI: 10.1093/bioadv/vbaf330
Vincent Messow, Christian Höner Zu Siederdissen, Michael Habeck

Summary: The cell lists algorithm is widely used to compute pairwise particle interactions below a fixed cutoff distance in approximately linear time. Prominent molecular dynamics frameworks implementing cell lists variants assume pre-determined and densely populated simulation boxes suitable for e.g. all-atom simulations with explicit solvents. zelll implements a simple yet efficient variant of the cell lists algorithm that uses sparse storage for the underlying partitioning grid. This allows for applications with dynamic simulation boundaries and sparsely populated simulation space not strictly fitting into the scope of common molecular dynamics frameworks, such as many coarse-grained simulations. For this reason, zelll does not target specific frameworks.

Availability and implementation: zelll is an open-source Rust library available under the MIT license at https://github.com/microscopic-image-analysis/zelll and https://crates.io/crates/zelll. Python bindings are available at https://pypi.org/project/zelll.
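
To make the sparse-grid idea in the summary concrete, here is a minimal Python sketch of a cell lists neighbor search that stores only occupied cells in a dictionary. It is an illustration under stated assumptions, not zelll's API: the function name `neighbor_pairs`, the choice of a cell edge equal to the cutoff, and the use of plain coordinate tuples are all hypothetical.

```python
from collections import defaultdict
from itertools import product
from math import dist, floor


def neighbor_pairs(points, cutoff):
    """Return sorted index pairs (i, j), i < j, closer than `cutoff`.

    `points` is a sequence of 3D coordinate tuples (hypothetical input format).
    """
    # Sparse grid: only cells that contain at least one point get an entry,
    # so empty regions of space cost no memory and need no fixed bounding box.
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(floor(c / cutoff) for c in p)  # cell edge == cutoff
        cells[key].append(idx)

    # With cell edge == cutoff, any pair within the cutoff lies in the same
    # cell or in one of the 26 adjacent cells.
    offsets = list(product((-1, 0, 1), repeat=3))
    pairs = set()
    for key, members in cells.items():
        for off in offsets:
            other = tuple(k + o for k, o in zip(key, off))
            if other not in cells:  # membership test does not create entries
                continue
            for i in members:
                for j in cells[other]:
                    if i < j and dist(points[i], points[j]) < cutoff:
                        pairs.add((i, j))
    return sorted(pairs)
```

Because cell keys are computed with `floor`, negative coordinates work without shifting the origin, which is one way a dictionary-backed grid can accommodate boundaries that move during a simulation.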

Bioinformatics Advances, volume 6, issue 1, article vbaf330. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12910374/pdf/
Citations: 0
Puzzler: scalable one-command platinum-quality genome assembly from HiFi and Hi-C.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-12-31 eCollection Date: 2026-01-01 DOI: 10.1093/bioadv/vbaf329
Justin Merondun, Qingyi Yu

Motivation: Chromosome-level assemblies are essential for modern genomics, from comparative genomics and evolutionary studies to precision breeding. While integrated HiFi and Hi-C data now enable accurate chromosome-scale genome assemblies, the bioinformatic process remains complex and involves specialized tools and expertise. With large-scale pan-genomic efforts requiring dozens to hundreds of platinum quality chromosome-scale genomes, there is a need for scalable, portable, and user-friendly pipelines that streamline and standardize high-quality genome assembly workflows.

Results: We introduce Puzzler, a containerized, scalable pipeline for chromosome-scale de novo genome assembly using PacBio HiFi and Hi-C data. Designed for portability and minimal user input, Puzzler automates contig assembly, duplicate purging, Hi-C-based scaffolding, and chromosome assignment via synteny, even with highly diverged reference taxa. Optional modules generate input files for manual Hi-C curation or operate reference-free. Quality control is integrated and includes Hi-C contact maps, BUSCO, yak k-mer completeness, and BlobTools contamination screening. A checkpointing system ensures that previously completed tasks are not re-executed, while a simple sample sheet input structure supports scalable batch processing. Puzzler has been validated on genomes ranging from 24 Mbp to 6.5 Gbp, delivering highly contiguous assemblies with <10 min of user input, enabling high-throughput platinum-quality genome assembly.

Availability and implementation: Puzzler is released into the public domain under 17 U.S.C. §105. Source code, documentation, and tutorials are available at https://github.com/merondun/puzzler and archived on Zenodo: https://doi.org/10.5281/zenodo.15733730 and https://doi.org/10.5281/zenodo.15693025. Pre-configured runtime environments including dependencies are provided via both a Conda environment (https://anaconda.org/heritabilities/puzzler) and an Apptainer image hosted on both Zenodo and Sylabs (https://cloud.sylabs.io/library/merondun/default/puzzler).
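
The checkpointing behavior described in the results (completed tasks are not re-executed) can be sketched with marker files. This is only one common way to implement such a system and is not taken from Puzzler's source: the function `run_step`, the `.done` file suffix, and the step names in the usage note are hypothetical.

```python
from pathlib import Path


def run_step(name, action, checkpoint_dir):
    """Run `action()` unless a marker file for step `name` already exists.

    Returns True if the step ran now, False if it was skipped.
    """
    marker = Path(checkpoint_dir) / f"{name}.done"
    if marker.exists():
        return False  # completed in an earlier run, so skip it
    action()
    # The marker is written only after the action returned without raising,
    # so a failed or interrupted step is retried on the next run.
    marker.touch()
    return True
```

A pipeline driver could call `run_step("contig_assembly", ...)` and `run_step("scaffolding", ...)` in order; rerunning the driver after a failure would then resume at the first step without a marker.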

Bioinformatics Advances, volume 6, issue 1, article vbaf329. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12820402/pdf/
Citations: 0
Machine learning can distinguish orphans that have resulted from sequence divergence beyond recognition.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-12-27 eCollection Date: 2026-01-01 DOI: 10.1093/bioadv/vbaf324
Emilios Tassios, Jori de Leuw, Christoforos Nikolaou, Anne Kupczok, Nikolaos Vakirlis

Motivation: Species-specific orphan genes lack homologues outside of a given taxon and frequently underlie unique species traits. Orphans can result from sequence divergence beyond recognition, when homologous proteins diverge to an extent at which sequence similarity search algorithms can no longer identify them as homologues, but they can also evolve de novo from previously noncoding sequences, in which case homologous protein-coding genes truly do not exist.

Results: Here we propose that sequence-divergent orphans might be recognizable from their patterns of statistically non-significant similarity hits, which are typically discarded. To test this, we simulated diverged orphan protein sequences under varying parameters. Using reversed protein sequences as a negative control, we trained machine learning classifiers on features extracted from similarity search output. We found that this approach works, but the performance of the models depends on the simulation parameters, with ∼90% accuracy when the underlying simulated divergence is moderate and ∼70% when it is extreme. When applying our classifiers to a set of real orphans, we found that ∼30% of them are predicted to be divergent and that these are shorter and more disordered than the rest. Our work contributes to the effort of better understanding how genetic novelty arises.

Availability and implementation: The models and data used can be found at https://github.com/emiliostassios/Classification-of-divergent-genes-using-ML.
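
Two ingredients of the approach in the results can be illustrated in a few lines: building a reversed-sequence negative control, and turning a list of weak similarity hits into numeric features. The sketch below is an assumption-laden illustration, not the authors' pipeline: the feature names and the choice of summarizing hits by their E-values are hypothetical, since the abstract does not specify which features were extracted.

```python
from math import log10
from statistics import mean


def reversed_control(seq):
    """Return the reversed sequence, which keeps length and composition."""
    return seq[::-1]


def hit_features(evalues):
    """Summarize the E-values of one query's hits (all assumed > 0).

    The three features here are hypothetical examples of what could be
    extracted from similarity search output.
    """
    if not evalues:
        return {"n_hits": 0, "best_neg_log10_e": 0.0, "mean_neg_log10_e": 0.0}
    scores = [-log10(e) for e in evalues]  # larger score = stronger hit
    return {
        "n_hits": len(scores),
        "best_neg_log10_e": max(scores),
        "mean_neg_log10_e": mean(scores),
    }
```

Feature dictionaries computed for simulated diverged sequences and for their reversed counterparts could then be passed, with labels, to any standard classifier.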

Bioinformatics Advances, volume 6, issue 1, article vbaf324. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12904771/pdf/
Citations: 0
REDAC: RNA-seq expression data analysis chatbot.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-12-27 eCollection Date: 2026-01-01 DOI: 10.1093/bioadv/vbaf321
Giovanni Maria De Filippis, Pranoy Sahu, Pasqualina Ambrosio, Stefania Picascia, Matteo Lo Monte, Ilenia Agliarulo, Simone Di Paola, Cristiano Russo, Christian Tommasino, Nicola Normanno, Daniela Frezzetti, Seetharaman Parashuraman, Antonio M Rinaldi, Francesco Russo

Motivation: To date, because both the analytical processes and the interpretation of results in RNA-seq expression data analysis are complex, researchers often require the support of expert bioinformaticians. Selecting appropriate statistical tests and performing essential data manipulations, such as normalization and filtering, in a rigorous and reproducible manner remains a significant challenge for many users.

Results: We developed REDAC, a web-based R application that offers an interactive platform designed to simplify and enhance RNA-seq expression data exploration and analysis. REDAC provides a straightforward approach to performing differential RNA-seq analysis rapidly, easily, and transparently through natural language queries from users. Moreover, it allows users to run complete analyses, generate comprehensive visualizations, and obtain biological interpretation of pathway enrichment results via two popular Large Language Models, Gemma and LLaMA, guided by a PubMed-based Retrieval-Augmented Generation module. Finally, REDAC promotes reproducibility through the automated generation of analysis reports.

Availability and implementation: REDAC is available for local (https://github.com/franruss/REDAC) and online use (https://frusso.shinyapps.io/REDAC). User manual: https://github.com/franruss/REDAC/blob/main/docs/REDAC_user_manual.pdf.

Bioinformatics Advances, volume 6, issue 1, article vbaf321. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12927421/pdf/
Citations: 0
Fluoro-forest: a random forest workflow for cell type annotation in high-dimensional immunofluorescence imaging with limited training data.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-12-24 eCollection Date: 2026-01-01 DOI: 10.1093/bioadv/vbaf320
Joshua Brand, Wei Zhang, Evie Carchman, Huy Q Dinh

Motivation: Cyclic immunofluorescence (IF) techniques enable deep phenotyping of cells and help quantify tissue organization at high resolution. Because the data are high-dimensional, workflows typically rely on unsupervised clustering, followed by cell type annotation at the cluster level. Most of these methods use marker expression averages and lack a statistical evaluation of the cell type annotations, which can result in misclassification. Here, we propose an end-to-end pipeline that uses a semi-supervised random forest approach to predict cell type annotations.

Results: Our method includes cluster-based sampling for training data, cell type prediction, and downstream visualization for interpretability of cell annotation that ultimately improves classification results. We show that our workflow can annotate cells more accurately compared to representative deep learning and probabilistic methods, with a training set <5% of the total number of cells tested. In addition, our pipeline outputs cell type probabilities and model performance metrics for users to decide if it could boost their existing clustering-based workflow results for complex IF data.

Availability and implementation: Fluoro-forest is freely available on GitHub under an MIT license (https://github.com/Josh-Brand/Fluoro-forest).
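
The cluster-based sampling step mentioned in the results can be sketched as drawing a fixed quota of cells from every cluster, so that rare clusters still contribute to a small training set. This is a hedged illustration rather than Fluoro-forest's implementation: the function `sample_per_cluster`, the equal per-cluster quota, and the fixed seed are assumptions.

```python
import random
from collections import defaultdict


def sample_per_cluster(cluster_labels, n_per_cluster, seed=0):
    """Pick up to `n_per_cluster` cell indices from each cluster.

    `cluster_labels[i]` is the cluster assigned to cell i.
    """
    rng = random.Random(seed)  # local generator for reproducible draws
    by_cluster = defaultdict(list)
    for idx, label in enumerate(cluster_labels):
        by_cluster[label].append(idx)

    chosen = []
    for label in sorted(by_cluster):
        members = by_cluster[label]
        # Clusters smaller than the quota contribute all of their cells.
        k = min(n_per_cluster, len(members))
        chosen.extend(rng.sample(members, k))
    return sorted(chosen)
```

The selected cells would then be annotated by an expert and used to train the classifier, which is later applied to all remaining cells.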

Bioinformatics Advances, volume 6, issue 1, article vbaf320. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12782655/pdf/
Citations: 0
Prompt-to-Pill: Multi-Agent Drug Discovery and Clinical Simulation Pipeline.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-12-23 eCollection Date: 2026-01-01 DOI: 10.1093/bioadv/vbaf323
Ivana Vichentijevikj, Kostadin Mishev, Monika Simjanoska Misheva

Summary: This study presents a proof-of-concept, comprehensive, modular framework for AI-driven drug discovery (DD) and clinical trial simulation, spanning from target identification to virtual patient recruitment. Synthesized from a systematic analysis of 51 large language model (LLM)-based systems, the proposed Prompt-to-Pill architecture and corresponding implementation leverage a multi-agent system (MAS) divided into DD, preclinical, and clinical phases, coordinated by a central Orchestrator. Each phase comprises specialized LLMs for molecular generation, toxicity screening, docking, trial design, and patient matching. To demonstrate the full pipeline in practice, the well-characterized target Dipeptidyl Peptidase 4 (DPP4) was selected as a representative use case. The process begins with generative molecule creation and proceeds through ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) evaluation, structure-based docking, and lead optimization. Clinical-phase agents then simulate trial generation, screen patient eligibility using electronic health records (EHRs), and predict trial outcomes. By tightly integrating generative, predictive, and retrieval-based LLM components, this architecture bridges the drug discovery and preclinical phases with virtual clinical development, offering a demonstration of how LLM-based agents can operationalize the drug development workflow in silico.

Availability and implementation: The implementation and code are available at: https://github.com/ChatMED/Prompt-to-Pill.

Bioinformatics Advances, volume 6, issue 1, article vbaf323. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12800774/pdf/
Citations: 0
IHIT-BED: an interpretable transformer approach using unbiased hematology analyzer impedance data for early identification of bacteremia in emergency department.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-12-23 eCollection Date: 2026-01-01 DOI: 10.1093/bioadv/vbaf322
Tung-Lin Tsai, Chien-Chong Hong, Hsing-Wen Cheng, Chin-An Yang

Motivation: Early detection of severe bloodstream infections is essential for early treatment initiation. However, the suspicion of bacteremia relies on the combined interpretation of routine laboratory tests, such as complete blood count (CBC), differential count (DC), and elevated C-reactive protein (CRP). Furthermore, a definite diagnosis of bacteremia requires a positive blood culture, which takes several days.

Results: We developed the Interpretable Hematology analyzer Impedance data-based Tabular network for early identification of Bacteremia in Emergency Department (IHIT-BED), a bloodstream infection prediction system built with machine learning methods. It uses integrated data from hematology analyzer impedance histogram signals of the CBC, blood culture reports, and CRP levels, all measured in the first blood draw of patients visiting the ED. To our knowledge, IHIT-BED is the first predictor based on hematology impedance histogram signals. It performs well in predicting a positive blood culture and severe inflammation, and it is also sensitive to changes in blood cell morphology that correlate with active inflammatory responses to bacterial infections. IHIT-BED provides clinical decision support for prompt initiation of antibiotic treatment.

Availability and implementation: The method can be found at https://github.com/appleRtsan/IHIT-BED.

动机:早期发现严重血液感染对于早期开始治疗至关重要。然而,怀疑菌血症依赖于常规实验室检查的综合解释,如全血细胞计数(CBC)、差异计数(DC)和升高的c反应蛋白(CRP)。此外,明确诊断菌血症需要阳性血培养,这需要几天时间。结果:我们开发了用于早期识别急诊科菌血症的可解释血液学分析仪阻抗数据表网络(IHIT-BED),这是一个通过机器学习方法构建的血流感染预测系统,使用血液学分析仪CBC阻抗直方图信号、血培养报告和CRP水平的综合数据,这些数据在访问ED的患者首次抽血时同时进行检测。据我们所知,IHIT-BED是第一个基于血液学阻抗直方图信号的预测器,它不仅在预测血培养阳性和严重炎症方面表现良好,而且对检测与细菌感染的活跃炎症反应相关的血细胞形态学变化也很敏感。IHIT-BED为及时开始抗生素治疗提供临床决策支持。可用性和实现:该方法可在https://github.com/appleRtsan/IHIT-BED中找到。
Citations: 0
Beyond synthetic lethality in large-scale metabolic and regulatory network models via genetic minimal intervention set.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-12-19 eCollection Date: 2026-01-01 DOI: 10.1093/bioadv/vbaf319
Naroa Barrena, Carlos Rodriguez-Flores, Luis V Valcárcel, Danel Olaverri-Mendizabal, Xabier Agirre, Felipe Prósper, Francisco J Planes

Motivation: The integration of genome-scale metabolic and regulatory networks has received significant interest in cancer systems biology. However, the identification of lethal genetic interventions in these integrated models remains challenging due to the combinatorial explosion of potential solutions. To address this, we developed the genetic Minimal Cut Set (gMCS) framework, which computes synthetic lethal interactions (minimal sets of gene knockouts that are lethal for cellular proliferation) in genome-scale metabolic networks with signed directed acyclic regulatory pathways. Here, we present a novel formulation to calculate genetic Minimal Intervention Sets (gMISs), which incorporate both gene knockouts and knock-ins.

Results: With our gMIS approach, we assessed the landscape of lethal genetic interactions in human cells, capturing interventions beyond synthetic lethality, including synthetic dosage lethality and tumor suppressor gene complexes. We applied the concept of synthetic dosage lethality to predict essential genes in cancer and demonstrated a significant increase in sensitivity when compared to large-scale gene knockout screen data. We also analyzed tumor suppressors in cancer cell lines and identified lethal gene knock-in strategies. Finally, we demonstrate how gMISs can help uncover potential therapeutic targets, providing examples in hematological malignancies.

Availability and implementation: The gMCSpy Python package now includes gMIS functionalities. Access: https://github.com/PlanesLab/gMCSpy.
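
As a rough illustration of what a minimal intervention set is, the sketch below enumerates such sets by brute force on a tiny Boolean toy model. Everything in it is hypothetical and chosen only for illustration: the genes A, B, C, and S, the baseline state, the growth rule, and the function names. It does not use the gMCSpy API and does not reproduce the authors' formulation for genome-scale networks; it only shows the idea of combining knockouts (KO) and knock-ins (KI) and keeping the sets that have no lethal proper subset.

```python
from itertools import combinations

# Hypothetical toy model (not taken from the paper):
# A and B are isoenzymes, C is required for growth, S represses C when active.
GENES = ["A", "B", "C", "S"]
BASELINE = {"A": 1, "B": 1, "C": 1, "S": 0}


def grows(state):
    """Toy growth rule: needs A or B, plus C not repressed by S."""
    c_active = state["C"] and not state["S"]
    return bool((state["A"] or state["B"]) and c_active)


def apply_interventions(interventions):
    """Return the baseline state with knockouts (0) and knock-ins (1) applied."""
    state = dict(BASELINE)
    for gene, kind in interventions:
        state[gene] = 0 if kind == "KO" else 1
    return state


def minimal_intervention_sets(max_size=3):
    """Brute-force search for minimal lethal sets of KO/KI interventions."""
    candidates = [(gene, kind) for gene in GENES for kind in ("KO", "KI")]
    found = []
    # Visiting sets by increasing size guarantees that any lethal proper
    # subset has already been recorded before its supersets are tested.
    for size in range(1, max_size + 1):
        for combo in combinations(candidates, size):
            if len({gene for gene, _ in combo}) < size:
                continue  # KO and KI of the same gene contradict each other
            combo_set = frozenset(combo)
            if any(previous <= combo_set for previous in found):
                continue  # contains a smaller lethal set, so not minimal
            if not grows(apply_interventions(combo)):
                found.append(combo_set)
    return found


if __name__ == "__main__":
    for intervention_set in minimal_intervention_sets():
        print(sorted(intervention_set))
```

On this toy model the search returns three sets: the single knockout of C, the single knock-in of S, and the double knockout of A and B. The last is the classic synthetic lethal pair, while the knock-in of S is the kind of intervention that a knockout-only search would miss. Brute force is only workable at this scale, since the number of candidate sets grows combinatorially with model size.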

Citations: 0
Computational identification of Azadirachta indica compounds targeting trypanothione reductase in Leishmania infantum.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-12-17 eCollection Date: 2026-01-01 DOI: 10.1093/bioadv/vbaf318
Onile Olugbenga Samson, Olukunle Samuel, Fadahunsi Adeyinka Ignatius, Onile Tolulope Adelonpe, Momoh Abdul, Kolawole Oladipo, Afolabi Titilope Esther, Raji Omotara, Hassan Nour, Samir Chtita

Motivation: Leishmania infantum is the primary cause of visceral leishmaniasis (VL), and its trypanothione reductase (TR) creates a favorable environment in the host, making TR an attractive drug target. This study aims to identify potential TR inhibitors among Azadirachta indica phytochemicals using molecular modeling techniques.

Results: Sixty compounds from A. indica were screened by molecular docking for their binding affinity to TR, followed by binding free energy calculations. The drug-likeness, pharmacokinetics, and toxicity of the hit compounds were then evaluated. The top compounds were subjected to a 100 ns molecular dynamics (MD) simulation to further assess the stability of their interaction with TR. Ten of the screened compounds exhibited higher affinity for TR than miltefosine (the standard drug), with docking scores ranging from -3.501 to -8.482 kcal/mol versus -3.231 kcal/mol for miltefosine. All drug-like hit compounds showed favorable pharmacokinetic and toxicity profiles, and their binding free energies indicated stable interactions. The MD simulations showed that these interactions persisted for most of the simulation time, supporting the stability and potential efficacy of the compounds as TR inhibitors.

Availability and implementation: This study identifies isorhamnetin, meliantriol, and quercetin as promising candidates for further in vitro and in vivo evaluation in the development of TR inhibitors against L. infantum.

Citations: 0
Perspectives in computational mass spectrometry: recent developments and key challenges.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-12-17 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf301
Timo Sachsenberg, Lindsay K Pino, Marie Brunet, Isabell Bludau, Oliver Kohlbacher, Juan Antonio Vizcaino, Wout Bittremieux

Summary: Mass spectrometry (MS) is a cornerstone technology in modern molecular biology, powering diverse applications across proteomics, metabolomics, lipidomics, glycomics, and beyond. As the field continues to evolve, rapid advancements in instrumentation, acquisition strategies, machine learning, and scalable computing have reshaped the landscape of computational MS. This perspective reviews recent developments and highlights key challenges, including data harmonization, statistical confidence estimation, repository-scale analysis, multi-omics integration, and privacy in clinical MS. We also discuss the increasing importance of machine learning and the need to build corresponding literacy within the community. Finally, we reflect on the role of the Computational Mass Spectrometry (CompMS) Community of Special Interest of the International Society for Computational Biology in supporting collaboration, innovation, and knowledge exchange. With MS-based technologies now central to both basic and translational research, continued investment in robust and reproducible computational methods will be essential to realize their full potential.

Citations: 0