Pub Date: 2024-12-01 | Epub Date: 2024-11-28 | DOI: 10.1107/S2059798324011392
Jimin Wang
Recently, the conclusions drawn from crystallographic data about the number of oxygen ligands associated with the CaMn4 cofactor in the oxygen-evolving center (OEC) of Thermosynechococcus vulcanus photosystem II (PSII) have been called into question. Here, using OEC-omit, metal ion-omit and ligand-omit electron-density maps, it is shown that the number of oxygen ligands ranges from three in the functional OEC of monomer B following dark adaption (0F), i.e. in its ground state (PDB entry 6jlj/0F and PDB entry 6jlm/0F), to five for both monomers of PSII in photo-advanced states following exposure to one and two flashes of light. For a significant fraction of the 0F OECs in monomer A, the number is four (PDB entry 6jlj/0F). Following one flash it increases to five (PDB entry 6jlk/1F), where it remains after a second flash (PDB entry 6jlj/2F). Following a third flash (3F), it decreases to three (PDB entry 6jlp/3F), suggesting that an O2 molecule has been produced. These observations suggest a mechanism for the reaction that transforms the O atoms of the water molecules bound at the O3 and O1 sites of the OEC into O2.
Title: Photosystem II: light-dependent oscillation of ligand composition at its active site.
Journal: Acta Crystallographica. Section D, Structural Biology, pages 850-861.
Pub Date: 2024-12-01 | Epub Date: 2024-12-03 | DOI: 10.1107/S2059798324011458
Kaveh H Babai, Fei Long, Martin Malý, Keitaro Yamashita, Garib N Murshudov
Metals are essential components for the structure and function of many proteins. However, accurate modelling of their coordination environments remains a challenge due to the complexity and diversity of metal-coordination geometries. To address this, a method is presented for extracting and analysing coordination information, including bond lengths and angles, from the Crystallography Open Database. By using these data, comprehensive descriptions of metal-containing components are generated. A stereochemical information generator for a particular component within a specific macromolecule leverages an example PDB/mmCIF file containing the component to account for the actual surrounding environment. A matching process has been developed and implemented to align the derived metal structures with idealized coordinates from a coordination geometry library. Additionally, various strategies, depending on the quality of the matches, were employed to compile distance and angle statistics for the refinement of macromolecular structures. The developed methods were implemented in a new program, MetalCoord, that classifies and utilizes the metal-coordination geometry. The effectiveness of the developed algorithms was tested using metal-containing components from the PDB. As a result, metal-containing components from the CCP4 monomer library have been updated. The updated monomer dictionaries, in concert with the derived restraints, can be used in most structural biology computations, including macromolecular crystallography, single-particle cryo-EM and even molecular mechanics.
Title: Improving macromolecular structure refinement with metal-coordination restraints.
Journal: Acta Crystallographica. Section D, Structural Biology, pages 821-833.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11626771/pdf/
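The idea of classifying a metal site by comparing it with idealized geometries, as described in the MetalCoord abstract above, can be illustrated with a much simpler stand-in. The sketch below does not reproduce MetalCoord's matching procedure (the abstract describes alignment against idealized coordinates from a library). Instead it compares sorted ligand-metal-ligand angles against textbook ideal angles, which avoids any superposition step. The names `IDEAL_ANGLES`, `angle_deg` and `classify` are hypothetical, and only three geometries are included.

```python
import math
from itertools import combinations

# Textbook ideal ligand-metal-ligand angles in degrees, sorted ascending.
# This small table is an illustrative assumption, not MetalCoord's library.
IDEAL_ANGLES = {
    "tetrahedral": [109.47] * 6,
    "square_planar": [90.0] * 4 + [180.0] * 2,
    "octahedral": [90.0] * 12 + [180.0] * 3,
}


def angle_deg(u, v):
    """Angle between two 3D vectors, in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against rounding just outside acos's domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def classify(metal, ligands):
    """Return (geometry name, RMS angle deviation in degrees), or None.

    Only templates with the same number of ligand pairs are considered.
    Sorting the angles makes the comparison independent of ligand order
    and of the orientation of the site.
    """
    vectors = [tuple(l - m for l, m in zip(lig, metal)) for lig in ligands]
    observed = sorted(angle_deg(u, v) for u, v in combinations(vectors, 2))
    best = None
    for name, ideal in IDEAL_ANGLES.items():
        if len(ideal) != len(observed):
            continue
        rms = math.sqrt(
            sum((o - i) ** 2 for o, i in zip(observed, ideal)) / len(ideal)
        )
        if best is None or rms < best[1]:
            best = (name, rms)
    return best


# Four ligands in a plane around a metal at the origin (made-up coordinates).
site = classify((0.0, 0.0, 0.0),
                [(2.0, 0, 0), (0, 2.1, 0), (-2.0, 0, 0), (0, -2.1, 0)])
print(site)
```

Comparing sorted angles ignores bond lengths and cannot distinguish geometries that share the same angle multiset, so it is only a rough first filter. A coordinate-based alignment of the kind the abstract describes carries more information.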
Pub Date: 2024-12-01 | Epub Date: 2024-12-05 | DOI: 10.1107/S2059798324007848
Charles S Bond, Joel L Sussman
A comment on how easy (or difficult) it is to find a structure of interest and some suggestions on what could be done to start to address the problem.
Title: Everyone is using biological structures, but how does one find the structure(s) one wants?
Journal: Acta Crystallographica. Section D, Structural Biology, volume 80 Pt 12, pages 819-820.
Pub Date: 2024-11-01 | Epub Date: 2024-10-03 | DOI: 10.1107/S2059798324009380
Ronan M Keegan, Adam J Simpkin, Daniel J Rigden
The availability of highly accurate protein structure predictions from AlphaFold2 (AF2) and similar tools has hugely expanded the applicability of molecular replacement (MR) for crystal structure solution. Many structures can be solved routinely using raw models, structures processed to remove unreliable parts or models split into distinct structural units. There is therefore an open question around how many and which cases still require experimental phasing methods such as single-wavelength anomalous diffraction (SAD). Here, this question is addressed using a large set of PDB depositions that were solved by SAD. A large majority (87%) could be solved using unedited or minimally edited AF2 predictions. A further 18 (4%) yield straightforwardly to MR after splitting of the AF2 prediction using Slice'N'Dice, although different splitting methods succeeded on slightly different sets of cases. It is also found that further unique targets can be solved by alternative modelling approaches such as ESMFold (four cases), alternative MR approaches such as ARCIMBOLDO and AMPLE (two cases each), and multimeric model building with AlphaFold-Multimer or UniFold (three cases). Ultimately, only 12 cases, or 3% of the SAD-phased set, did not yield to any form of MR tested here, offering valuable hints as to the number and the characteristics of cases where experimental phasing remains essential for macromolecular structure solution.
Title: The success rate of processed predicted models in molecular replacement: implications for experimental phasing in the AlphaFold era.
Journal: Acta Crystallographica. Section D, Structural Biology, pages 766-779.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11544426/pdf/
Pub Date: 2024-11-01 | Epub Date: 2024-11-06 | DOI: 10.1107/S2059798324010350
Charles S Bond, Elspeth F Garman, Randy J Read
Two new Co-editors are welcomed to Acta Cryst. D - Structural Biology.
Title: Welcoming two new Co-editors.
Journal: Acta Crystallographica. Section D, Structural Biology, volume 80 Pt 11, page 765.
Pub Date: 2024-11-01 | Epub Date: 2024-10-07 | DOI: 10.1107/S2059798324009471
Jose M de la Rosa-Trevin, Grigory Sharov, Stefan Fleischmann, Dustin Morado, John C Bollinger, Darcie J Miller, Daniel S Terry, Scott C Blanchard, Israel S Fernandez, Marta Carroni
Most scientific facilities produce large amounts of heterogeneous data at a rapid pace. Managing users, instruments, reports and invoices presents additional challenges. To address these challenges, EMhub, a web platform designed to support the daily operations and record-keeping of a scientific facility, has been introduced. EMhub enables the easy management of user information, instruments, bookings and projects. The application was initially developed to meet the needs of a cryoEM facility, but its functionality and adaptability have proven to be broad enough to be extended to other data-generating centers. The expansion of EMhub is enabled by the modular nature of its core functionalities. The application allows external processes to be connected via a REST API, automating tasks such as folder creation, user and password generation, and the execution of real-time data-processing pipelines. EMhub has been used for several years at the Swedish National CryoEM Facility and has been installed in the CryoEM center at the Structural Biology Department at St. Jude Children's Research Hospital. A fully automated single-particle pipeline has been implemented for on-the-fly data processing and analysis. At St. Jude, the X-Ray Crystallography Center and the Single-Molecule Imaging Center have already expanded the platform to support their operational and data-management workflows.
Title: EMhub: a web platform for data management and on-the-fly processing in scientific facilities.
Journal: Acta Crystallographica. Section D, Structural Biology, pages 780-790.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11544427/pdf/
Pub Date: 2024-10-01 | DOI: 10.1107/S2059798324009252
Anastasia I Sotiropoulou, Dimitris G Hatzinikolaou, Evangelia D Chrysina
β-Glucosidase from the thermophilic bacterium Caldicellulosiruptor saccharolyticus (Bgl1) has been reported to have an attractive catalytic profile for various industrial applications. Bgl1 catalyses the final step in the decomposition of cellulose, an unbranched glucose polymer that has attracted the attention of researchers in recent years as it is the most abundant renewable source of reduced carbon in the biosphere. With the aim of enhancing the thermostability of Bgl1 for a broad spectrum of biotechnological processes, it has been subjected to structural studies. Crystal structures of Bgl1 and its complex with glucose were determined at 1.47 and 1.95 Å resolution, respectively. Bgl1 is a member of glycosyl hydrolase family 1 (GH1 superfamily, EC 3.2.1.21) and the results showed that the 3D structure of Bgl1 follows the overall architecture of the GH1 family, with a classical (β/α)8 TIM-barrel fold. Comparisons of Bgl1 with sequence or structural homologues of β-glucosidase reveal quite similar structures but also unique structural features in Bgl1 with plausible functional roles.
Title: Structural studies of β-glucosidase from the thermophilic bacterium Caldicellulosiruptor saccharolyticus.
Journal: Acta Crystallographica. Section D, Structural Biology, volume 80 Pt 10, pages 733-743.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11448918/pdf/
Pub Date: 2024-10-01 | DOI: 10.1107/S2059798324009276
Oliver N F King, Karl E Levik, James Sandy, Mark Basham
A group of three deep-learning tools, referred to collectively as CHiMP (Crystal Hits in My Plate), were created for analysis of micrographs of protein crystallization experiments at the Diamond Light Source (DLS) synchrotron, UK. The first tool, a classification network, assigns images into categories relating to experimental outcomes. The other two tools are networks that perform both object detection and instance segmentation, resulting in masks of individual crystals in the first case and masks of crystallization droplets in addition to crystals in the second case, allowing the positions and sizes of these entities to be recorded. The creation of these tools used transfer learning, where weights from a pre-trained deep-learning network were used as a starting point and repurposed by further training on a relatively small set of data. Two of the tools are now integrated at the VMXi macromolecular crystallography beamline at DLS, where they have the potential to remove the need for any user input, both for monitoring crystallization experiments and for triggering in situ data collections. The third is being integrated into the XChem fragment-based drug-discovery screening platform, also at DLS, to allow the automatic targeting of acoustic compound dispensing into crystallization droplets.
Title: CHiMP: deep-learning tools trained on protein crystallization micrographs to enable automation of experiments.
Journal: Acta Crystallographica. Section D, Structural Biology, volume 80 Pt 10, pages 744-764.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11448919/pdf/
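The CHiMP abstract above notes that segmentation masks allow the positions and sizes of crystals and droplets to be recorded. As a rough illustration of that last step only, the sketch below derives an area, centroid and bounding box from a binary mask given as nested lists. The function name `mask_stats` and the returned dictionary layout are hypothetical and are not taken from CHiMP. Producing the mask itself would require a trained network, which is outside the scope of this sketch.

```python
def mask_stats(mask):
    """Summarise a binary mask (list of rows of 0/1 values).

    Returns a dict with the pixel area, the centroid as (row, col) and the
    bounding box as (min_row, min_col, max_row, max_col), or None if the
    mask contains no foreground pixels.
    """
    pixels = [
        (r, c)
        for r, row in enumerate(mask)
        for c, value in enumerate(row)
        if value
    ]
    if not pixels:
        return None
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    area = len(pixels)
    return {
        "area": area,
        "centroid": (sum(rows) / area, sum(cols) / area),
        "bbox": (min(rows), min(cols), max(rows), max(cols)),
    }


# A made-up 4 x 5 mask containing one 2 x 3 object.
example_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(mask_stats(example_mask))
```

Values are in pixel units. Converting them to physical positions, for example to target a droplet, would need the image calibration, which is not described in the abstract.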
Pub Date: 2024-10-01 | DOI: 10.1107/S2059798324008519
Yunyun Gao, Helen M Ginn, Andrea Thorn
During the automatic processing of crystallographic diffraction experiments, beamstop shadows are often unaccounted for or only partially masked. As a result of this, outlier reflection intensities are integrated, which is a known issue. Traditional statistical diagnostics have only limited effectiveness in identifying these outliers, here termed Not-Excluded-unMasked-Outliers (NEMOs). The diagnostic tool AUSPEX allows visual inspection of NEMOs, where they form a typical pattern: clusters at the low-resolution end of the AUSPEX plots of intensities or amplitudes versus resolution. To automate NEMO detection, a new algorithm was developed by combining data statistics with a density-based clustering method. This approach demonstrates a promising performance in detecting NEMOs in merged data sets without disrupting existing data-reduction pipelines. Re-refinement results indicate that excluding the identified NEMOs can effectively enhance the quality of subsequent structure-determination steps. This method offers a prospective automated means to assess the efficacy of a beamstop mask, as well as highlighting the potential of modern pattern-recognition techniques for automating outlier exclusion during data processing, facilitating future adaptation to evolving experimental strategies.
Title: Robust and automatic beamstop shadow outlier rejection: combining crystallographic statistics with modern clustering under a semi-supervised learning strategy.
Journal: Acta Crystallographica. Section D, Structural Biology, volume 80 Pt 10, pages 722-732.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11448920/pdf/
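The beamstop-shadow abstract above describes combining data statistics with a density-based clustering method to find outliers that cluster at the low-resolution end of the data. The sketch below is a toy version of that idea and is not the published algorithm: it runs a minimal DBSCAN on the I/sigma values of low-resolution reflections only and flags any dense cluster whose mean I/sigma is weak. The function names, the one-dimensional feature choice, and every threshold (`d_cutoff`, `eps`, `min_samples`, `weak`) are assumptions made for illustration.

```python
import math


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN. Returns one label per point; -1 marks noise."""
    labels = [None] * len(points)

    def neighbours(i):
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point, not expanded further
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_samples:
                queue.extend(reachable)  # j is a core point: keep growing
    return labels


def flag_weak_low_res(reflections, d_cutoff=20.0, eps=0.5,
                      min_samples=3, weak=1.0):
    """Return sorted indices of suspicious low-resolution reflections.

    `reflections` is a list of (d_spacing_in_angstrom, i_over_sigma) pairs.
    All default thresholds are illustrative guesses, not published values.
    """
    low = [(idx, ios) for idx, (d, ios) in enumerate(reflections)
           if d >= d_cutoff]
    labels = dbscan([(ios,) for _, ios in low], eps, min_samples)
    flagged = []
    for c in set(labels) - {-1}:
        members = [low[k] for k, lab in enumerate(labels) if lab == c]
        mean = sum(ios for _, ios in members) / len(members)
        if mean < weak:
            flagged.extend(idx for idx, _ in members)
    return sorted(flagged)


# Made-up data: strong low-resolution reflections, a tight group of
# near-zero ones, and two weak high-resolution reflections.
data = [(25.0, 30.0), (45.0, 0.2), (30.0, 45.0), (50.0, -0.1), (2.0, 0.5),
        (48.0, 0.3), (22.0, 28.0), (55.0, 0.1), (1.8, 0.2), (40.0, 60.0)]
print(flag_weak_low_res(data))
```

Restricting the clustering to the low-resolution shell keeps ordinary weak high-resolution reflections from being flagged. Real data would need a scale-aware feature space and carefully validated thresholds.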