Tutorial on Molecular Latent Space Simulators (LSSs): Spatially and Temporally Continuous Data-Driven Surrogate Dynamical Models of Molecular Systems.

The Journal of Physical Chemistry A (IF 2.7; CAS Zone 2, Chemistry; JCR Q3, Chemistry, Physical). Pub Date: 2024-11-14. DOI: 10.1021/acs.jpca.4c05389
Michael S Jones, Kirill Shmilovich, Andrew L Ferguson

Abstract

The inherently serial nature and requirement for short integration time steps in the numerical integration of molecular dynamics (MD) calculations place strong limitations on the accessible simulation time scales and statistical uncertainties in sampling slowly relaxing dynamical modes and rare events. Molecular latent space simulators (LSSs) are a data-driven approach to learning a surrogate dynamical model of the molecular system from modest MD training trajectories that can generate synthetic trajectories at a fraction of the computational cost. The training data may comprise single long trajectories or multiple short, discontinuous trajectories collected over, for example, distributed computing resources. Provided the training data provide sufficient sampling of the relevant thermodynamic states and dynamical transitions to robustly learn the underlying microscopic propagator, an LSS furnishes a global model of the dynamics capable of producing temporally and spatially continuous molecular trajectories. Trained LSS models have produced simulation trajectories at up to 6 orders of magnitude lower cost than standard MD to enable dense sampling of molecular phase space and large reduction of the statistical errors in structural, thermodynamic, and kinetic observables. The LSS employs three deep learning architectures to solve three independent learning problems over the training data: (i) an encoding of the high-dimensional MD into a low-dimensional slow latent space using state-free reversible VAMPnets (SRVs), (ii) a propagator of the microscopic dynamics within the low-dimensional latent space using mixture density networks (MDNs), and (iii) a generative decoding of the low-dimensional latent coordinates back to the original high-dimensional molecular configuration space using conditional Wasserstein generative adversarial networks (cWGANs) or denoising diffusion probabilistic models (DDPMs). In this software tutorial, we introduce the mathematical and numerical background and theory of LSS and present example applications of a user-friendly Python package software implementation to alanine dipeptide and a 28-residue beta-beta-alpha (BBA) protein within simple Google Colab notebooks.
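
To make step (ii) of the abstract concrete, the following is a minimal sketch of how a mixture density propagator can generate a synthetic latent trajectory: at each step the network maps the current latent state to the weights, means, and widths of a Gaussian mixture, and the next state is drawn from that mixture. It is not the API of the authors' Python package. The function `toy_mdn` is a hypothetical hand-written stand-in for a trained MDN, the one-dimensional latent space with two basins is an illustrative assumption, and the encoder (SRV) and decoder (cWGAN or DDPM) stages are omitted.

```python
import numpy as np

def toy_mdn(z):
    """Hypothetical stand-in for a trained MDN (assumption, not the LSS package).

    Maps a latent state z of shape (d,) to Gaussian-mixture parameters
    (weights, means, sigmas) for the state one lag time later.
    """
    centers = np.array([[-1.0], [1.0]])        # two illustrative basins, shape (K, d)
    means = 0.9 * z[None, :] + 0.1 * centers   # each component relaxes toward one basin
    sigmas = np.full_like(means, 0.05)         # fixed component widths
    # Mixture weights favour the nearer basin (softmax of negative squared distance).
    logits = -np.sum((z[None, :] - centers) ** 2, axis=1)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return weights, means, sigmas

def propagate(z0, n_steps, mdn, rng):
    """Generate a latent trajectory by repeatedly sampling the mixture propagator."""
    traj = np.empty((n_steps + 1, z0.shape[0]))
    traj[0] = z0
    for t in range(n_steps):
        weights, means, sigmas = mdn(traj[t])
        k = rng.choice(len(weights), p=weights)       # pick a mixture component
        traj[t + 1] = rng.normal(means[k], sigmas[k])  # sample the next latent state
    return traj

rng = np.random.default_rng(0)
latent_traj = propagate(np.array([-1.0]), n_steps=1000, mdn=toy_mdn, rng=rng)
```

In a full LSS, the initial latent state would come from encoding an MD configuration, and each row of `latent_traj` would be passed to the generative decoder to recover a molecular configuration. Because each step involves only a small network evaluation and a random draw in a low-dimensional space, long trajectories of this kind are cheap to generate compared with MD.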
