FlowBack: A Generalized Flow-Matching Approach for Biomolecular Backmapping.

Journal of Chemical Information and Modeling (IF 5.3; CAS Zone 2, Chemistry; JCR Q1, Chemistry, Medicinal). Pub Date: 2025-01-27. Epub Date: 2025-01-07. DOI: 10.1021/acs.jcim.4c02046

Michael S Jones, Smayan Khanna, Andrew L Ferguson

Pages: 672-692. Publication type: Journal Article.

Citations: 0

Abstract

Coarse-grained models have become ubiquitous in biomolecular modeling tasks aimed at studying slow dynamical processes such as protein folding and DNA hybridization. These models can considerably accelerate sampling, but it remains challenging to accurately and efficiently restore all-atom detail to the coarse-grained trajectory, which can be vital for detailed understanding of molecular mechanisms and for calculation of observables contingent on all-atom coordinates. In this work, we introduce FlowBack, a deep generative model that employs a flow-matching objective to map samples from a coarse-grained prior distribution to an all-atom data distribution. We construct our prior distribution to be agnostic to the coarse-grained map and molecular type. A protein-specific model trained on ∼65k structures from the Protein Data Bank achieves state-of-the-art performance on structural metrics compared to previous generative and rules-based approaches in applications to static PDB structures, all-atom simulations of fast-folding proteins, and coarse-grained trajectories generated by a machine-learned force field. A DNA-protein model trained on ∼1.5k DNA-protein complexes achieves excellent reconstruction and generative capabilities on static DNA-protein complexes from the Protein Data Bank as well as on out-of-distribution coarse-grained dynamical simulations of DNA-protein complexation. FlowBack offers an accurate, efficient, and easy-to-use tool to recover all-atom structures from coarse-grained molecular simulations with higher robustness and fewer steric clashes than previous approaches. We make FlowBack freely available to the community as an open-source Python package.
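As a rough illustration of what a flow-matching objective between a coarse-grained prior and an all-atom target can look like, the following is a minimal pure-Python sketch. It is not FlowBack's code or API. Every function name here is hypothetical, and the specific choices (a Gaussian prior centered on each atom's parent bead, a straight-line interpolation path, and Euler integration) are assumptions made for illustration; the abstract states only that the prior is agnostic to the coarse-grained map and molecular type.

```python
import random


def sample_prior(bead_positions, atom_to_bead, sigma=0.1, rng=random):
    """Assumed prior: each atom sits at its parent bead plus Gaussian noise."""
    return [
        [bead_positions[b][k] + rng.gauss(0.0, sigma) for k in range(3)]
        for b in atom_to_bead
    ]


def interpolate(x0, x1, t):
    """Point on the straight-line path x_t = (1 - t) * x0 + t * x1."""
    return [[(1 - t) * a + t * b for a, b in zip(p0, p1)]
            for p0, p1 in zip(x0, x1)]


def target_velocity(x0, x1):
    """Regression target for the straight-line path: dx_t/dt = x1 - x0."""
    return [[b - a for a, b in zip(p0, p1)] for p0, p1 in zip(x0, x1)]


def flow_matching_loss(predicted_v, x0, x1):
    """Mean squared error between a predicted velocity and the target."""
    target = target_velocity(x0, x1)
    sq = [(p - q) ** 2
          for pv, tv in zip(predicted_v, target)
          for p, q in zip(pv, tv)]
    return sum(sq) / len(sq)


def euler_integrate(x0, velocity_fn, n_steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1."""
    x = [list(p) for p in x0]
    dt = 1.0 / n_steps
    for step in range(n_steps):
        v = velocity_fn(x, step * dt)
        x = [[c + dt * vc for c, vc in zip(p, pv)] for p, pv in zip(x, v)]
    return x


# Toy system (made-up numbers): two beads, three atoms.
beads = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0]]
atom_to_bead = [0, 0, 1]  # atoms 0 and 1 belong to bead 0, atom 2 to bead 1
x1 = [[-0.5, 0.3, 0.0], [0.6, 0.4, 0.1], [3.5, 0.9, -0.2]]  # "all-atom" target

rng = random.Random(0)
x0 = sample_prior(beads, atom_to_bead, sigma=0.1, rng=rng)

# A trained network would predict the velocity from (x_t, t, bead positions).
# Here the exact target velocity stands in for it.
v_star = target_velocity(x0, x1)
x_end = euler_integrate(x0, lambda x, t: v_star)
```

In training, one would draw a random t, form `interpolate(x0, x1, t)`, and minimize `flow_matching_loss` on the network's predicted velocity. At inference, integrating the learned velocity field from a fresh prior sample yields an all-atom structure, and different prior samples give different plausible reconstructions.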




Source journal

Journal of Chemical Information and Modeling (Chemistry: Chemistry, Multidisciplinary)

CiteScore: 9.80

Self-citation rate: 10.70%

Articles published: 529

Review time: 1.4 months

About the journal: The Journal of Chemical Information and Modeling publishes papers reporting new methodology and/or important applications in the fields of chemical informatics and molecular modeling. Specific topics include the representation and computer-based searching of chemical databases, molecular modeling, computer-aided molecular design of new materials, catalysts, or ligands, development of new computational methods or efficient algorithms for chemical software, and biopharmaceutical chemistry including analyses of biological activity and other issues related to drug discovery. Astute chemists, computer scientists, and information specialists look to this monthly’s insightful research studies, programming innovations, and software reviews to keep current with advances in this integral, multidisciplinary field. As a subscriber you’ll stay abreast of database search systems, use of graph theory in chemical problems, substructure search systems, pattern recognition and clustering, analysis of chemical and physical data, molecular modeling, graphics and natural language interfaces, bibliometric and citation analysis, and synthesis design and reactions databases.