Pub Date: 2026-03-01; Epub Date: 2026-01-08; DOI: 10.1063/5.0287423
Robert M Raddi, Tim Marshall, Vincent A Voelz
To quantify how well theoretical predictions of structural ensembles agree with experimental measurements, we depend on the accuracy of forward models (FMs). These models are computational frameworks that generate observable quantities from molecular configurations based on empirical relationships linking specific molecular properties to experimental measurements. Bayesian Inference of Conformational Populations (BICePs) is a reweighting algorithm that reconciles simulated ensembles with ensemble-averaged experimental observations, even when such observations are sparse and/or noisy. This is achieved by sampling the posterior distribution of conformational populations under experimental restraints as well as the posterior distribution of uncertainties due to random and systematic error. In this study, we extend the algorithm to refine empirical FM parameters. We introduce and evaluate two novel methods for optimizing FM parameters. The first method treats FM parameters as nuisance parameters, integrating over them in the full posterior distribution. The second method employs variational minimization of a quantity called the BICePs score, which reports the free energy of "turning on" the experimental restraints. This technique, coupled with improved likelihood functions for handling experimental outliers, facilitates force field validation and optimization, as illustrated in recent studies [R. M. Raddi et al., J. Chem. Theory Comput. 21, 5880-5889 (2025) and R. M. Raddi and V. A. Voelz, "Automated optimization of force field parameters against ensemble-averaged measurements with Bayesian inference of conformational populations," arXiv:2402.11169 (2024)]. Using this approach, we refine parameters that modulate the Karplus relation, which is crucial for accurately predicting J-coupling constants from dihedral angles (ϕ) between interacting nuclei. We validate this approach first with a toy model system and then for human ubiquitin, predicting six sets of Karplus parameters for ³J(HN,Hα), ³J(Hα,C′), ³J(HN,Cβ), ³J(HN,C′), ³J(C′,Cβ), and ³J(C′,C′). Finally, we demonstrate that our framework naturally generalizes optimization to any differentiable FM, such as those constructed by neural networks. This approach offers a promising direction for training and validating neural network-based FMs.
Title: Automatic forward model parameterization with Bayesian inference of conformational populations. APL Machine Learning 4(1), 016102 (2026). Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12818351/pdf/
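The abstract centers on refining Karplus parameters. A common form of the relation is ³J(ϕ) = A cos²(ϕ + θ) + B cos(ϕ + θ) + C, and with reweighted populations the predicted observable is the population-weighted average of ³J over conformations. The sketch below, assuming that form, uniform illustrative weights, and θ = −60° (the offset often used for ³J(HN,Hα) with the backbone ϕ angle), shows the forward model and a plain weighted least-squares fit of (A, B, C). The function names are hypothetical, and the least-squares fit is a deliberately simple stand-in: it is not BICePs, which instead samples a posterior (or minimizes the BICePs score) with explicit uncertainty and outlier models.

```python
import numpy as np


def karplus(phi_deg, A, B, C, theta_deg=-60.0):
    """Karplus relation: 3J(phi) = A cos^2(phi + theta) + B cos(phi + theta) + C."""
    x = np.cos(np.deg2rad(np.asarray(phi_deg) + theta_deg))
    return A * x**2 + B * x + C


def ensemble_average_j(phi_deg, weights, A, B, C, theta_deg=-60.0):
    """Population-weighted <3J> per residue.

    phi_deg: array of shape (n_conformations, n_residues), in degrees.
    weights: array of shape (n_conformations,), assumed to sum to 1.
    """
    return weights @ karplus(phi_deg, A, B, C, theta_deg)


def fit_karplus_lstsq(phi_deg, weights, j_exp, theta_deg=-60.0):
    """Least-squares stand-in for parameter refinement (NOT the BICePs method).

    Because <3J> = A <cos^2> + B <cos> + C is linear in (A, B, C),
    the fit reduces to an ordinary linear least-squares problem.
    """
    x = np.cos(np.deg2rad(np.asarray(phi_deg) + theta_deg))
    design = np.column_stack([
        weights @ x**2,             # <cos^2(phi + theta)> per residue
        weights @ x,                # <cos(phi + theta)> per residue
        np.ones(x.shape[1]),        # constant term C
    ])
    coeffs, *_ = np.linalg.lstsq(design, j_exp, rcond=None)
    return coeffs  # array([A, B, C])
```

Because the ensemble average is linear in (A, B, C), noiseless synthetic data generated with `ensemble_average_j` are recovered exactly by `fit_karplus_lstsq`; a method such as BICePs becomes necessary once measurements are sparse, noisy, or contain outliers, or when the FM is not linear in its parameters.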
Pub Date: 2025-09-01; Epub Date: 2025-08-12; DOI: 10.1063/5.0271073
Duncan R Sutherland, Rachel Ford, Yun Liu, Tyler B Martin, Peter A Beaucage
The advancement of artificial-intelligence-driven autonomous experiments demands physics-based modeling and decision-making processes, not only to improve the accuracy of the experimental trajectory but also to increase trust by allowing transparent human-machine collaboration. High-quality structural characterization techniques (e.g., x-ray, neutron, or static light scattering) are a particularly relevant example of this need: they provide invaluable information but are challenging to analyze without expert oversight. Here, we introduce AutoSAS, a novel framework for human-aside-the-loop automated data classification. AutoSAS leverages human-defined candidate models, high-throughput combinatorial fitting, and information-theoretic model selection to generate both classification results and quantitative structural descriptors. We implement AutoSAS in an open-source package designed for use with the Autonomous Formulation Laboratory for x-ray and neutron scattering-based optimization of multicomponent liquid formulations. In a first application, we leveraged a set of expert-defined candidate models to classify, refine the structure of, and track transformations in a model injectable drug carrier system. We evaluated four model selection methods and benchmarked them against an optimized machine learning classifier; the best-performing approach was one that balanced quality of fit against model complexity. AutoSAS not only corroborated the critical micelle concentration boundary identified in previous experiments but also discovered a second structural transition boundary that the previous methods had not identified. These results demonstrate the potential of AutoSAS to enhance autonomous experimental workflows by providing robust, interpretable model selection, paving the way for more reliable and insightful structural characterization in complex formulations.
Title: AutoSAS: A new human-aside-the-loop paradigm for automated SAS fitting for high throughput and autonomous experimentation. APL Machine Learning 3(3), 036111 (2025). Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12376025/pdf/
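The AutoSAS abstract describes fitting human-defined candidate models and choosing among them with information-theoretic model selection that balances fit quality against model complexity. The sketch below illustrates that idea with two standard criteria, AIC and BIC, assuming Gaussian errors with known uncertainties σ. The abstract does not say which four selection methods AutoSAS evaluated, so these criteria are examples only. The candidate models (a flat background, a Porod-like q⁻⁴ term, an extra q⁻² term) and all function names are hypothetical and are not the AutoSAS API; they are kept linear in their parameters so that numpy's least-squares solver is sufficient.

```python
import numpy as np


def weighted_linear_fit(basis, q, intensity, sigma):
    """Weighted least squares for a model that is linear in its parameters.

    basis: list of callables f_k(q); the model is sum_k p_k * f_k(q).
    Returns (params, chi2) with chi2 = sum(((I - model) / sigma)**2).
    """
    design = np.column_stack([f(q) for f in basis])
    params, *_ = np.linalg.lstsq(design / sigma[:, None], intensity / sigma, rcond=None)
    residuals = (intensity - design @ params) / sigma
    return params, float(residuals @ residuals)


def information_criteria(chi2, n_params, n_points):
    """AIC and BIC for Gaussian errors with known sigma.

    With -2 ln L = chi2 + const, the model-independent constant is dropped.
    """
    return {"AIC": chi2 + 2 * n_params, "BIC": chi2 + n_params * np.log(n_points)}


def select_model(candidates, q, intensity, sigma, criterion="BIC"):
    """Fit every candidate and return (best_name, scores); lower score is better."""
    scores = {}
    for name, basis in candidates.items():
        _, chi2 = weighted_linear_fit(basis, q, intensity, sigma)
        scores[name] = information_criteria(chi2, len(basis), len(q))[criterion]
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical, illustrative candidate models (not AutoSAS's model library).
CANDIDATES = {
    "background": [lambda q: np.ones_like(q)],
    "porod+background": [lambda q: q**-4, lambda q: np.ones_like(q)],
    "porod+q^-2+background": [lambda q: q**-4, lambda q: q**-2, lambda q: np.ones_like(q)],
}
```

The penalty term is what keeps an over-parameterized candidate from winning: on data generated from the q⁻⁴-plus-background model, the three-term model fits no better, so its larger parameter count makes its score worse.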
Pub Date: 2024-09-01; Epub Date: 2024-09-23; DOI: 10.1063/5.0194391
Tanish Baranwal, Jan Lebert, Jan Christoph
Electrical waves in the heart form rotating spiral or scroll waves during life-threatening arrhythmias, such as atrial or ventricular fibrillation. The wave dynamics are typically modeled using coupled partial differential equations, which describe reaction-diffusion dynamics in excitable media. More recently, data-driven generative modeling has emerged as an alternative way to generate spatio-temporal patterns in physical and biological systems. Here, we explore denoising diffusion probabilistic models for the generative modeling of electrical wave patterns in cardiac tissue. We trained diffusion models on simulated electrical wave patterns so that they could generate such patterns in unconditional and conditional generation tasks. For instance, we explored the diffusion-based (i) parameter-specific generation, (ii) evolution, and (iii) inpainting of spiral wave dynamics, including reconstructing three-dimensional scroll wave dynamics from superficial two-dimensional measurements. Furthermore, we generated arbitrarily shaped bi-ventricular geometries and simultaneously initiated scroll wave patterns inside these geometries using diffusion. We characterized and compared the diffusion-generated solutions to solutions obtained with corresponding biophysical models and found that diffusion models learn to replicate spiral and scroll wave dynamics so well that they could be used for data-driven modeling of excitation waves in cardiac tissue. For instance, an ensemble of diffusion-generated spiral wave dynamics exhibits self-termination statistics similar to those of the corresponding ensemble simulated with a biophysical model. However, we also found that diffusion models produce artifacts if training data are lacking, e.g., during self-termination, and "hallucinate" wave patterns when insufficiently constrained.
Title: Dreaming of electrical waves: Generative modeling of cardiac excitation waves using diffusion models. APL Machine Learning 2(3), 036113 (2024). Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11446137/pdf/
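The cardiac-wave study relies on denoising diffusion probabilistic models (DDPMs). As a hedged illustration of the generic DDPM machinery, following the original formulation of Ho et al. (2020) rather than the authors' architecture or training setup, the sketch below shows the closed-form forward noising step x_t = √ᾱ_t x₀ + √(1 − ᾱ_t) ε, the simplified noise-prediction loss, and one ancestral reverse step. No network is trained here: `eps_pred` stands in for the output of a hypothetical noise-prediction model, x₀ could be any array such as a 2D voltage map, and the linear β schedule (1e-4 to 0.02 over T = 1000 steps) is an assumed default.

```python
import numpy as np


def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule; returns (betas, alpha_bars) with alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    betas = np.linspace(beta_start, beta_end, T)
    alpha_bars = np.cumprod(1.0 - betas)
    return betas, alpha_bars


def q_sample(x0, t, alpha_bars, noise):
    """Closed-form forward noising: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    ab = alpha_bars[t]
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * noise


def simple_loss(eps_pred, eps):
    """DDPM 'simple' objective: mean squared error between predicted and true noise."""
    return float(np.mean((eps_pred - eps) ** 2))


def p_sample_step(x_t, t, eps_pred, betas, alpha_bars, rng):
    """One ancestral reverse step x_t -> x_{t-1}, using variance sigma_t^2 = beta_t.

    Mean: (x_t - beta_t / sqrt(1 - abar_t) * eps_pred) / sqrt(1 - beta_t).
    """
    beta, ab = betas[t], alpha_bars[t]
    mean = (x_t - beta / np.sqrt(1.0 - ab) * eps_pred) / np.sqrt(1.0 - beta)
    if t == 0:
        return mean  # no noise is added at the final step
    return mean + np.sqrt(beta) * rng.standard_normal(x_t.shape)
```

The closed-form `q_sample` is what makes training cheap: a random timestep can be drawn per example and noised directly, without simulating the whole forward chain. Generation then applies `p_sample_step` from t = T − 1 down to 0, starting from pure Gaussian noise.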