C O S Sorzano, A Jiménez-Moreno, D Maluenda, M Martínez, E Ramírez-Aportela, J Krieger, R Melero, A Cuervo, J Conesa, J Filipovic, P Conesa, L Del Caño, Y C Fonseca, J Jiménez-de la Morena, P Losana, R Sánchez-García, D Strelak, E Fernández-Giménez, F P de Isidro-Gómez, D Herreros, J L Vilas, R Marabini, J M Carazo
DOI: 10.1107/S2059798322001978 | Acta Crystallographica Section D: Structural Biology, pp. 410-423 | published online 2022-03-16, issue date 2022-04-01 | Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8972802/pdf/
On bias, variance, overfitting, gold standard and consensus in single-particle analysis by cryo-electron microscopy.
Cryo-electron microscopy (cryoEM) has become a well-established technique for elucidating the 3D structures of biological macromolecules. Projection images of thousands of macromolecules, assumed to be structurally identical, are combined into a single 3D map representing the Coulomb potential of the macromolecule under study. This article discusses possible caveats along the image-processing path and how to avoid them in order to obtain a reliable 3D structure. Some of these problems are well known in the community and may be described as sample-related (such as specimen denaturation at interfaces, or a non-uniform projection geometry that leaves some projection directions underrepresented). The rest are related to the algorithms used. While some of these, such as the use of an incorrect initial volume, have been discussed in depth in the literature, others have received much less attention, even though they are fundamental to any data-analysis approach. Chief among them are instabilities in the estimation of many of the key parameters required for a correct 3D reconstruction; these instabilities occur all along the processing workflow and may significantly affect the reliability of the whole process. In the field, the term overfitting has been coined to refer to some particular kinds of artifacts. It is argued here that overfitting is a statistical bias in key parameter-estimation steps of the 3D reconstruction process, including intrinsic algorithmic bias. It is also shown that the common tools (Fourier shell correlation) and strategies (gold standard) normally used to detect or prevent overfitting do not fully protect against it. Instead, it is proposed that the bias that leads to overfitting is much easier to detect at the level of parameter estimation than once the particle images have been combined into a 3D map.
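To make the point about Fourier shell correlation concrete, the following sketch (Python with NumPy) computes the FSC between two cubic maps. It then compares two kinds of half-maps: ones that differ from the true map only by independent noise, and ones that also share a common systematic error. This is an illustration under stated assumptions, not the authors' code. The names `fourier_shell_correlation` and `shared_bias_demo`, the 32-voxel box, and the use of white-noise fields for both the "true" map and the shared bias are hypothetical choices made for the example. Because FSC measures agreement between the two halves, a bias present in both halves raises the half-map FSC even though each half then agrees less with the true map.

```python
import numpy as np


def fourier_shell_correlation(map1, map2):
    """Return one FSC value per integer Fourier shell for two cubic 3D maps of equal shape."""
    if map1.ndim != 3 or map1.shape != map2.shape or len(set(map1.shape)) != 1:
        raise ValueError("maps must be cubic 3D arrays of the same shape")
    n = map1.shape[0]
    # Centre the zero frequency so that shell radii can be measured from index n // 2.
    f1 = np.fft.fftshift(np.fft.fftn(map1))
    f2 = np.fft.fftshift(np.fft.fftn(map2))
    coords = np.arange(n) - n // 2
    z, y, x = np.meshgrid(coords, coords, coords, indexing="ij")
    radius = np.rint(np.sqrt(x**2 + y**2 + z**2)).astype(int)
    fsc = np.zeros(n // 2)
    for r in range(n // 2):  # shells up to (but not including) the Nyquist radius
        shell = radius == r
        num = np.sum(f1[shell] * np.conj(f2[shell])).real
        den = np.sqrt(np.sum(np.abs(f1[shell]) ** 2) * np.sum(np.abs(f2[shell]) ** 2))
        fsc[r] = num / den if den > 0 else 0.0
    return fsc


def shared_bias_demo(n=32, seed=0):
    """Mean FSC over shells 1..n/2-1 with and without an error shared by both halves."""
    rng = np.random.default_rng(seed)
    truth = rng.normal(size=(n, n, n))
    # Hypothetical stand-in for any error common to both halves (e.g. algorithmic bias).
    shared_bias = rng.normal(size=(n, n, n))
    noise_a = rng.normal(size=(n, n, n))
    noise_b = rng.normal(size=(n, n, n))
    half_a, half_b = truth + noise_a, truth + noise_b
    biased_a, biased_b = half_a + shared_bias, half_b + shared_bias

    def mean_fsc(m1, m2):
        # Skip shell 0 (the single DC coefficient).
        return float(np.mean(fourier_shell_correlation(m1, m2)[1:]))

    return {
        "unbiased_halves": mean_fsc(half_a, half_b),
        "biased_halves": mean_fsc(biased_a, biased_b),
        "unbiased_vs_truth": mean_fsc(half_a, truth),
        "biased_vs_truth": mean_fsc(biased_a, truth),
    }


if __name__ == "__main__":
    for name, value in shared_bias_demo().items():
        print(f"{name}: {value:.3f}")
```

With unit-variance fields, one would expect the half-map FSC to rise from roughly 0.5 to roughly 0.67 when the shared bias is added. Over the same change, the correlation of a half-map with the true map drops from roughly 0.71 to roughly 0.58. The exact values vary with the random seed.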
Comparing the results from multiple algorithms (or, at least, from independent executions of the same algorithm) can reveal parameter bias. These multiple executions could then be averaged to give a lower-variance estimate of the underlying parameters.
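The consensus idea can be sketched in a few lines. The following Python/NumPy example is a hypothetical illustration, not the procedure used in the article. It works on one scalar parameter per particle, here an in-plane shift in pixels, chosen because shifts can be averaged linearly; angular assignments would need rotation-aware comparison and averaging. The example takes that parameter from several independent runs, flags particles on which the runs disagree by more than a tolerance, and averages the agreeing runs. The function names, the 3σ tolerance, the five runs and the simulated 4-pixel systematic error in one run are all assumptions made for the example.

```python
import numpy as np


def consensus(estimates, tolerance):
    """Combine per-particle estimates from several runs.

    estimates: array of shape (n_runs, n_particles), one scalar parameter per particle.
    Returns (averaged, all_agree): the mean of the runs lying within `tolerance`
    of the per-particle median, and a boolean mask of particles on which every run agrees.
    """
    estimates = np.asarray(estimates, dtype=float)
    median = np.median(estimates, axis=0)
    agree = np.abs(estimates - median) <= tolerance  # shape (n_runs, n_particles)
    counts = agree.sum(axis=0)
    sums = np.where(agree, estimates, 0.0).sum(axis=0)
    # With an even number of runs, no run may lie within tolerance; fall back to the median.
    averaged = np.where(counts > 0, sums / np.maximum(counts, 1), median)
    return averaged, agree.all(axis=0)


def simulate(n_runs=5, n_particles=10_000, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    true_shift = rng.uniform(-5.0, 5.0, size=n_particles)
    # Each run estimates the shift with an independent Gaussian error of std `sigma`.
    runs = true_shift + rng.normal(0.0, sigma, size=(n_runs, n_particles))
    # Hypothetical systematic error: run 0 is off by 4 px on the first 10% of particles.
    biased = np.zeros(n_particles, dtype=bool)
    biased[: n_particles // 10] = True
    runs[0, biased] += 4.0

    averaged, all_agree = consensus(runs, tolerance=3.0 * sigma)
    clean = ~biased & all_agree
    return {
        "flagged_biased": float(np.mean(~all_agree[biased])),
        "flagged_unbiased": float(np.mean(~all_agree[~biased])),
        "single_run_error_var": float(np.var(runs[1, ~biased] - true_shift[~biased])),
        "consensus_error_var": float(np.var(averaged[clean] - true_shift[clean])),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

Disagreement flags most of the particles affected by the biased run. Averaging K runs whose errors are independent with variance σ² brings the error variance down to roughly σ²/K. Averaging alone cannot remove a bias shared by all runs, because such runs would also agree with one another. This is one reason to compare genuinely different algorithms rather than only repeated executions of the same one.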