Maarten Joosten, Joel Greer, James Parkhurst, Tom Burnley, Arjen J. Jakobi
Roodmus: a toolkit for benchmarking heterogeneous electron cryo-microscopy reconstructions
IUCrJ, published 2024-11-01. DOI: 10.1107/S2052252524009321
Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11533995/pdf/
Abstract
Roodmus is a toolkit sourcing conformational heterogeneity of biomacromolecules from molecular dynamics simulations to generate high-quality synthetic data for the development and benchmarking of heterogeneous reconstruction algorithms.
Conformational heterogeneity of biological macromolecules is a challenge in single-particle averaging (SPA). Current standard practice is to employ classification and filtering methods that may allow a discrete number of conformational states to be reconstructed. However, the conformation space accessible to these molecules is continuous and, therefore, explored incompletely by a small number of discrete classes. Recently developed heterogeneous reconstruction algorithms (HRAs) to analyse continuous heterogeneity rely on machine-learning methods that employ low-dimensional latent space representations. The non-linear nature of many of these methods poses a challenge to their validation and interpretation and to identifying functionally relevant conformational trajectories. These methods would benefit from in-depth benchmarking using high-quality synthetic data and concomitant ground truth information. We present a framework for the simulation and subsequent analysis with respect to the ground truth of cryo-EM micrographs containing particles whose conformational heterogeneity is sourced from molecular dynamics simulations. These synthetic data can be processed as if they were experimental data, allowing aspects of standard SPA workflows as well as heterogeneous reconstruction methods to be compared with known ground truth using available utilities. The simulation and analysis of several such datasets are demonstrated and an initial investigation into HRAs is presented.
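As an illustration of what comparison with known ground truth can look like for one step of a standard SPA workflow, the sketch below scores a set of picked particle coordinates against the true particle centres of a synthetic micrograph. It is not the Roodmus API: the function names `match_picks` and `precision_recall`, the greedy closest-pair matching, and the matching radius are all assumptions made for this example.

```python
import numpy as np


def match_picks(truth_xy, picked_xy, radius):
    """Count one-to-one matches between picked and true particle centres.

    Hypothetical helper: pairs are accepted greedily from closest to
    farthest, and a pair only counts if it lies within `radius` pixels.
    """
    truth_xy = np.asarray(truth_xy, dtype=float).reshape(-1, 2)
    picked_xy = np.asarray(picked_xy, dtype=float).reshape(-1, 2)
    if len(truth_xy) == 0 or len(picked_xy) == 0:
        return 0

    # Pairwise distances, shape (n_picked, n_truth).
    dist = np.linalg.norm(picked_xy[:, None, :] - truth_xy[None, :, :], axis=2)

    used_picks, used_truth = set(), set()
    matched = 0
    # Visit candidate pairs in order of increasing distance.
    for flat_index in np.argsort(dist, axis=None):
        i, j = (int(k) for k in np.unravel_index(flat_index, dist.shape))
        if dist[i, j] > radius:
            break  # every remaining pair is farther apart than the radius
        if i in used_picks or j in used_truth:
            continue  # each pick and each true particle is used at most once
        used_picks.add(i)
        used_truth.add(j)
        matched += 1
    return matched


def precision_recall(truth_xy, picked_xy, radius):
    """Return (precision, recall) of the picks with respect to ground truth."""
    truth_xy = np.asarray(truth_xy, dtype=float).reshape(-1, 2)
    picked_xy = np.asarray(picked_xy, dtype=float).reshape(-1, 2)
    true_positives = match_picks(truth_xy, picked_xy, radius)
    precision = true_positives / len(picked_xy) if len(picked_xy) else 0.0
    recall = true_positives / len(truth_xy) if len(truth_xy) else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy coordinates in pixels; a real benchmark would read these from
    # the simulation metadata and from the particle picker's output.
    truth = [[10, 10], [50, 50], [90, 90]]
    picked = [[11, 10], [52, 49], [200, 200], [10, 80]]
    print(precision_recall(truth, picked, radius=5.0))
```

In this toy case two of the four picks fall within 5 pixels of a true particle, so precision is 0.5 and recall is 2/3. The same pattern, a metric computed against simulation metadata, extends to other ground-truth quantities such as poses or conformational state.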
About the journal:
IUCrJ is a new fully open-access peer-reviewed journal from the International Union of Crystallography (IUCr).
The journal will publish high-profile articles on all aspects of the sciences and technologies supported by the IUCr via its commissions, including emerging fields where structural results underpin the science reported in the article. Our aim is to make IUCrJ the natural home for high-quality structural science results. Chemists, biologists, physicists and material scientists will be actively encouraged to report their structural studies in IUCrJ.