FlowBack: A Generalized Flow-Matching Approach for Biomolecular Backmapping.

IF 5.6 2区 化学 Q1 CHEMISTRY, MEDICINAL Journal of Chemical Information and Modeling Pub Date : 2025-01-27 Epub Date: 2025-01-07 DOI:10.1021/acs.jcim.4c02046
Michael S Jones, Smayan Khanna, Andrew L Ferguson
{"title":"FlowBack: A Generalized Flow-Matching Approach for Biomolecular Backmapping.","authors":"Michael S Jones, Smayan Khanna, Andrew L Ferguson","doi":"10.1021/acs.jcim.4c02046","DOIUrl":null,"url":null,"abstract":"<p><p>Coarse-grained models have become ubiquitous in biomolecular modeling tasks aimed at studying slow dynamical processes such as protein folding and DNA hybridization. These models can considerably accelerate sampling but it remains challenging to accurately and efficiently restore all-atom detail to the coarse-grained trajectory, which can be vital for detailed understanding of molecular mechanisms and calculation of observables contingent on all-atom coordinates. In this work, we introduce FlowBack as a deep generative model employing a flow-matching objective to map samples from a coarse-grained prior distribution to an all-atom data distribution. We construct our prior distribution to be agnostic to the coarse-grained map and molecular type. A protein-specific model trained on ∼65k structures from the Protein Data Bank achieves state-of-the-art performance on structural metrics compared to previous generative and rules-based approaches in applications to static PDB structures, all-atom simulations of fast-folding proteins, and coarse-grained trajectories generated by a machine-learned force field. A DNA-protein model trained on ∼1.5k DNA-protein complexes achieves excellent reconstruction and generative capabilities on static DNA-protein complexes from the Protein Data Bank as well as on out-of-distribution coarse-grained dynamical simulations of DNA-protein complexation. FlowBack offers an accurate, efficient, and easy-to-use tool to recover all-atom structures from coarse-grained molecular simulations with higher robustness and fewer steric clashes than previous approaches. We make FlowBack freely available to the community as an open source Python package.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":"672-692"},"PeriodicalIF":5.6000,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Chemical Information and Modeling ","FirstCategoryId":"92","ListUrlMain":"https://doi.org/10.1021/acs.jcim.4c02046","RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/7 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"CHEMISTRY, MEDICINAL","Score":null,"Total":0}
引用次数: 0

Abstract

Coarse-grained models have become ubiquitous in biomolecular modeling tasks aimed at studying slow dynamical processes such as protein folding and DNA hybridization. These models can considerably accelerate sampling but it remains challenging to accurately and efficiently restore all-atom detail to the coarse-grained trajectory, which can be vital for detailed understanding of molecular mechanisms and calculation of observables contingent on all-atom coordinates. In this work, we introduce FlowBack as a deep generative model employing a flow-matching objective to map samples from a coarse-grained prior distribution to an all-atom data distribution. We construct our prior distribution to be agnostic to the coarse-grained map and molecular type. A protein-specific model trained on ∼65k structures from the Protein Data Bank achieves state-of-the-art performance on structural metrics compared to previous generative and rules-based approaches in applications to static PDB structures, all-atom simulations of fast-folding proteins, and coarse-grained trajectories generated by a machine-learned force field. A DNA-protein model trained on ∼1.5k DNA-protein complexes achieves excellent reconstruction and generative capabilities on static DNA-protein complexes from the Protein Data Bank as well as on out-of-distribution coarse-grained dynamical simulations of DNA-protein complexation. FlowBack offers an accurate, efficient, and easy-to-use tool to recover all-atom structures from coarse-grained molecular simulations with higher robustness and fewer steric clashes than previous approaches. We make FlowBack freely available to the community as an open source Python package.

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
反排:生物分子反映射的广义流匹配方法。
粗粒度模型在研究蛋白质折叠和DNA杂交等慢动力学过程的生物分子建模任务中已经变得普遍存在。这些模型可以大大加快采样速度,但准确有效地恢复全原子细节到粗粒度轨迹仍然具有挑战性,这对于详细理解分子机制和计算全原子坐标下的观测值至关重要。在这项工作中,我们将FlowBack作为一种深度生成模型引入,该模型采用流匹配目标将样本从粗粒度先验分布映射到全原子数据分布。我们构建先验分布,使其与粗粒度图谱和分子类型无关。与以前的生成和基于规则的方法相比,在静态PDB结构、快速折叠蛋白质的全原子模拟和由机器学习力场生成的粗粒度轨迹的应用中,对来自蛋白质数据库的~ 65k结构进行训练的蛋白质特异性模型在结构指标上实现了最先进的性能。在~ 1.5k dna -蛋白质复合物上训练的dna -蛋白质模型在来自蛋白质数据库的静态dna -蛋白质复合物以及分布外的粗粒度dna -蛋白质复合物的动态模拟上实现了出色的重建和生成能力。FlowBack提供了一种准确、高效、易于使用的工具,可以从粗粒度的分子模拟中恢复全原子结构,与以前的方法相比,具有更高的鲁棒性和更少的空间冲突。我们将FlowBack作为开源Python包免费提供给社区。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
CiteScore
9.80
自引率
10.70%
发文量
529
审稿时长
1.4 months
期刊介绍: The Journal of Chemical Information and Modeling publishes papers reporting new methodology and/or important applications in the fields of chemical informatics and molecular modeling. Specific topics include the representation and computer-based searching of chemical databases, molecular modeling, computer-aided molecular design of new materials, catalysts, or ligands, development of new computational methods or efficient algorithms for chemical software, and biopharmaceutical chemistry including analyses of biological activity and other issues related to drug discovery. Astute chemists, computer scientists, and information specialists look to this monthly’s insightful research studies, programming innovations, and software reviews to keep current with advances in this integral, multidisciplinary field. As a subscriber you’ll stay abreast of database search systems, use of graph theory in chemical problems, substructure search systems, pattern recognition and clustering, analysis of chemical and physical data, molecular modeling, graphics and natural language interfaces, bibliometric and citation analysis, and synthesis design and reactions databases.
期刊最新文献
MutualDTA: An Interpretable Drug-Target Affinity Prediction Model Leveraging Pretrained Models and Mutual Attention. pLM4CPPs: Protein Language Model-Based Predictor for Cell Penetrating Peptides. Chemically Informed Deep Learning for Interpretable Radical Reaction Prediction. Improving Bond Dissociations of Reactive Machine Learning Potentials through Physics-Constrained Data Augmentation. DO-GMA: An End-to-End Drug-Target Interaction Identification Framework with a Depthwise Overparameterized Convolutional Network and the Gated Multihead Attention Mechanism.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1