Justin Trujillo, Russell Fung, Madan Kumar Shankar, Peter Schwander, Ahmad Hosseinizadeh
Filling data analysis gaps in time-resolved crystallography by machine learning. Structural Dynamics 12(1), 014101 (published 2025-01-21). Journal article. DOI: 10.1063/4.0000280
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11758283/pdf/
Journal impact factor: 2.3; JCR quartile: Q3 (Chemistry, Physical)
Citations: 0
Abstract
There is a growing understanding of the structural dynamics of biological molecules, fueled by x-ray crystallography experiments. Time-resolved serial femtosecond crystallography (TR-SFX) with x-ray free-electron lasers allows the measurement of ultrafast structural changes in proteins. Nevertheless, this technique has limitations. One major challenge is the quality of TR-SFX data, which often suffer from data sparsity, partial recording of Bragg reflections, timing errors, and pixel noise. To overcome these difficulties, large volumes of data are conventionally collected and grouped into a few temporal bins. The data in each bin are then averaged and paired with the mean of their corresponding jittered timestamps. This procedure yields one structure per bin, so only a limited number of averaged structures describe the entire time interval spanned by the experiment, and information on ultrafast structural dynamics at high temporal resolution is lost. This has motivated research into methods for analyzing experimental TR-SFX data beyond standard binning and averaging. To address this problem, we use a machine learning algorithm called Nonlinear Laplacian Spectral Analysis (NLSA), which has emerged as a promising technique for studying the dynamics of complex systems. In this work, we demonstrate the power of this algorithm using synthetic x-ray diffraction snapshots from a protein with significant data incompleteness, timing uncertainties, and noise. Our study confirms that NLSA is a suitable approach that effectively mitigates the effects of these artifacts in TR-SFX data and recovers accurate structural dynamics information hidden in such data.
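The two procedures contrasted in the abstract, conventional binning and averaging versus an NLSA-style reconstruction, can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: `bin_and_average`, `lagged_embedding`, `unembed`, and `nlsa_sketch` are hypothetical names; bins are assumed to hold equal numbers of snapshots; each snapshot is treated as a flat vector of intensities; and the NLSA part uses a plain Gaussian kernel with a median bandwidth heuristic in place of the velocity-normalized kernel of the original NLSA formulation. It also assumes complete snapshots already ordered by nominal timestamp, so it does not handle missing or partially recorded reflections.

```python
import numpy as np


def bin_and_average(timestamps, snapshots, n_bins):
    """Conventional reduction (assumed equal-count bins): sort snapshots by their
    jittered timestamps, split them into n_bins groups, average each group, and
    pair the average with the mean timestamp of its members."""
    t = np.asarray(timestamps, dtype=float)
    x = np.asarray(snapshots, dtype=float)
    groups = np.array_split(np.argsort(t), n_bins)
    bin_times = np.array([t[g].mean() for g in groups])
    bin_means = np.stack([x[g].mean(axis=0) for g in groups])
    return bin_times, bin_means


def lagged_embedding(x, c):
    """Row j concatenates the c consecutive snapshots x[j], ..., x[j + c - 1]."""
    m = x.shape[0] - c + 1
    return np.stack([x[j:j + c].ravel() for j in range(m)])  # shape (m, c * d)


def unembed(X, c, d):
    """Average every lagged copy of each snapshot back onto the original time axis."""
    m = X.shape[0]
    blocks = X.reshape(m, c, d)  # blocks[j, k] is an estimate of snapshot j + k
    out = np.zeros((m + c - 1, d))
    counts = np.zeros(m + c - 1)
    for k in range(c):
        out[k:k + m] += blocks[:, k, :]
        counts[k:k + m] += 1
    return out / counts[:, None]


def nlsa_sketch(x, c, n_eig, n_svd=None, eps=None):
    """Simplified NLSA-style reconstruction of time-ordered snapshots x, shape (n, d).
    Returns the reconstructed snapshots and the singular values of the projected data."""
    x = np.asarray(x, dtype=float)
    d = x.shape[1]
    X = lagged_embedding(x, c)
    # Squared Euclidean distances between all pairs of lagged snapshots.
    sq = np.einsum("ij,ij->i", X, X)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    if eps is None:
        eps = np.median(D2)  # assumed bandwidth heuristic, not from the source
    K = np.exp(-D2 / eps)  # plain Gaussian kernel (simplification)
    q = K.sum(axis=1)  # kernel degrees
    S = K / np.sqrt(np.outer(q, q))  # symmetric conjugate of the Markov matrix D^-1 K
    evals, evecs = np.linalg.eigh(S)
    top = np.argsort(evals)[::-1][:n_eig]
    # Right eigenvectors of D^-1 K, orthonormal under weights q: phi.T @ diag(q) @ phi = I.
    phi = evecs[:, top] / np.sqrt(q)[:, None]
    A = phi.T @ (q[:, None] * X)  # project lagged data onto the eigenfunctions
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    r = len(s) if n_svd is None else n_svd
    X_rec = phi @ ((U[:, :r] * s[:r]) @ Vt[:r])  # sum of the r leading modes
    return unembed(X_rec, c, d), s
```

In `nlsa_sketch`, summing all retained SVD modes is equivalent to projecting the lagged data onto the leading kernel eigenfunctions; passing a smaller `n_svd` keeps only the strongest modes, which is one way to discard weaker, noise-dominated components. Unlike `bin_and_average`, which returns only `n_bins` averaged states, the reconstruction returns one estimate per input snapshot (the dense pairwise-distance matrix limits this sketch to small datasets).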
Structural Dynamics. Subject categories: Chemistry, Physical; Physics, Atomic, Molecular & Chemical
CiteScore: 5.50
Self-citation rate: 3.60%
Articles published: 24
Review time: 16 weeks
About the journal:
Structural Dynamics focuses on recent developments in experimental and theoretical methods and techniques that allow real-time visualization of electronic and geometric structural changes in chemical, biological, and condensed-matter systems. The community of scientists and engineers working on structural dynamics in such diverse systems often uses similar instrumentation and methods.
The journal welcomes articles dealing with fundamental problems of electronic and structural dynamics that are tackled by new methods, such as:
Time-resolved X-ray and electron diffraction and scattering,
Coherent diffractive imaging,
Time-resolved X-ray spectroscopies (absorption, emission, resonant inelastic scattering, etc.),
Time-resolved electron energy loss spectroscopy (EELS) and electron microscopy,
Time-resolved photoelectron spectroscopies (UPS, XPS, ARPES, etc.),
Multidimensional spectroscopies in the infrared, the visible and the ultraviolet,
Nonlinear spectroscopies in the VUV, the soft and the hard X-ray domains,
Theory and computational methods and algorithms for the analysis and description of structural dynamics and their associated experimental signals.
These new methods are enabled by new instrumentation, such as:
X-ray free electron lasers, which provide flux, coherence, and time resolution,
New sources of ultrashort electron pulses,
New sources of ultrashort vacuum ultraviolet (VUV) to hard X-ray pulses, such as high-harmonic generation (HHG) sources or plasma-based sources,
New sources of ultrashort infrared and terahertz (THz) radiation,
New detectors for X-rays and electrons,
New sample handling and delivery schemes,
New computational capabilities.