A Python package based on robust statistical analysis for serial crystallography data processing.

IF 2.6 | Zone 4, Biology | Q2 BIOCHEMICAL RESEARCH METHODS | Acta Crystallographica. Section D, Structural Biology | Pub Date: 2023-09-01 | Epub Date: 2023-08-16 | DOI: 10.1107/S2059798323005855
Marjan Hadian-Jazi, Alireza Sadri
Acta Crystallographica. Section D, Structural Biology, volume 79, Pt 9, pages 820-829. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10478633/pdf/
Citation count: 0

Abstract

The term robustness in statistics refers to methods that are generally insensitive to deviations from model assumptions. In other words, robust methods are able to preserve their accuracy even when the data do not perfectly fit the statistical models. Robust statistical analyses are particularly effective when analysing mixtures of probability distributions. Therefore, these methods enable the discretization of X-ray serial crystallography data into two probability distributions: a group comprising true data points (for example the background intensities) and another group comprising outliers (for example Bragg peaks or bad pixels on an X-ray detector). These characteristics of robust statistical analysis are beneficial for the ever-increasing volume of serial crystallography (SX) data sets produced at synchrotron and X-ray free-electron laser (XFEL) sources. The key advantage of the use of robust statistics for some applications in SX data analysis is that it requires minimal parameter tuning because of its insensitivity to the input parameters. In this paper, a software package called Robust Gaussian Fitting library (RGFlib) is introduced that is based on the concept of robust statistics. Two methods are presented based on the concept of robust statistics and RGFlib for two SX data-analysis tasks: (i) a robust peak-finding algorithm and (ii) an automated robust method to detect bad pixels on X-ray pixel detectors.
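
As a rough illustration of the idea described in the abstract (separating background-like data from outliers such as Bragg peaks or bad pixels), the following minimal Python sketch estimates the centre and width of the background with the median and the median absolute deviation (MAD), then flags points that lie far from that fit. It does not use RGFlib and does not reproduce its API or algorithm: the function names `robust_center_scale` and `flag_outliers`, the threshold `n_sigma` and the synthetic intensities are all assumptions made for this example.

```python
import random
import statistics


def robust_center_scale(values):
    """Return (center, scale) of the bulk of the data.

    Hypothetical helper, not part of RGFlib. The median estimates the
    centre; 1.4826 * MAD estimates the standard deviation if the bulk
    of the data is Gaussian.
    """
    center = statistics.median(values)
    mad = statistics.median([abs(v - center) for v in values])
    return center, 1.4826 * mad


def flag_outliers(values, n_sigma=4.0):
    """Return one boolean per value: True if it is an outlier.

    n_sigma is an assumed threshold chosen for illustration only.
    """
    center, scale = robust_center_scale(values)
    return [abs(v - center) > n_sigma * scale for v in values]


# Synthetic example: Gaussian background plus a few bright "peaks".
random.seed(0)
background = [random.gauss(100.0, 5.0) for _ in range(500)]
peaks = [400.0] * 5
intensities = background + peaks

center, scale = robust_center_scale(intensities)
flags = flag_outliers(intensities)

# Non-robust estimates for comparison; the peaks pull both of them.
plain_mean = statistics.mean(intensities)
plain_std = statistics.pstdev(intensities)

print("robust centre/scale:", center, scale)
print("plain mean/std:     ", plain_mean, plain_std)
print("flagged points:     ", sum(flags))
```

In this sketch the robust estimates stay close to the background parameters even though the peaks are included in the input, whereas the plain mean and standard deviation are inflated by them. That insensitivity is the property the abstract refers to when it says robust methods need little parameter tuning.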

Source journal
Acta Crystallographica. Section D, Structural Biology
Categories: Biochemical Research Methods; Biochemistry & Molecular Biology
CiteScore: 4.50
Self-citation rate: 13.60%
Articles published: 216
Journal description: Acta Crystallographica Section D welcomes the submission of articles covering any aspect of structural biology, with a particular emphasis on the structures of biological macromolecules or the methods used to determine them. Reports on new structures of biological importance may address the smallest macromolecules to the largest complex molecular machines. These structures may have been determined using any structural biology technique including crystallography, NMR, cryoEM and/or other techniques. The key criterion is that such articles must present significant new insights into biological, chemical or medical sciences. The inclusion of complementary data that support the conclusions drawn from the structural studies (such as binding studies, mass spectrometry, enzyme assays, or analysis of mutants or other modified forms of biological macromolecule) is encouraged. Methods articles may include new approaches to any aspect of biological structure determination or structure analysis but will only be accepted where they focus on new methods that are demonstrated to be of general applicability and importance to structural biology. Articles describing particularly difficult problems in structural biology are also welcomed, if the analysis would provide useful insights to others facing similar problems.
Latest articles in this journal
The success rate of processed predicted models in molecular replacement: implications for experimental phasing in the AlphaFold era.
EMhub: a web platform for data management and on-the-fly processing in scientific facilities.
Welcoming two new Co-editors.
CHiMP: deep-learning tools trained on protein crystallization micrographs to enable automation of experiments.
Robust and automatic beamstop shadow outlier rejection: combining crystallographic statistics with modern clustering under a semi-supervised learning strategy.