Authors: Lin Chen, Ruba Jebril, K. Al Nasr
Venue: Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics
Publication date: 2020-09-21
DOI: https://doi.org/10.1145/3388440.3414711
Segmentation-based Feature Extraction for Cryo-Electron Microscopy at Medium Resolution
Cryo-Electron Microscopy is a biophysics technique that produces volume images of a given molecule. It can visualize large molecules and protein complexes. At high resolution (better than 5 Å), the structure can be modeled directly. When the resolution is worse than 5 Å, computational techniques are used to overcome the inaccuracy inherent in the volume images. In this paper, we propose a segmentation-based approach that extracts important features to overcome the inherent inaccuracy of medium-resolution volume images. The features are volume components that represent local peak regions of the image. Each volume component is then classified as one of the main secondary structure elements found in protein molecules. Specifically, we built four models to classify volume components: Helix-Sheet-Loop, Helix-Binary, Sheet-Binary, and Loop-Binary. We used machine learning-based classifiers; seven classification models were used to classify the volume components. The work proposed in this paper is a preliminary approach to detecting secondary structure elements in medium-resolution volume images. The four machine-learning models were trained using authentic volume images from the Electron Microscopy Data Bank. No simulated or synthesized image was used for either training or testing. This is important because all existing methods use simulated images for training, and, owing to the noise inherent in authentic images, simulated images are not the best representatives of them. The procedure includes feature extraction, model selection, fine-tuning, and model ensembling. We tested our four models on 20% of a dataset of 3400 volume components. The models achieved 80% accuracy for the Sheet-Binary model, 77% for the Helix-Binary model, 71% for the Loop-Binary model, and 67% for the Helix-Sheet-Loop model.
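The relationship between the four classification tasks and the 20% hold-out can be illustrated with a small sketch. It is not the authors' code: it assumes that each binary task is a one-vs-rest relabeling of the three-class Helix-Sheet-Loop labels and that the hold-out is a simple random split, neither of which the abstract states explicitly. The function names (`binary_labels`, `build_tasks`, `split_indices`, `accuracy`) are hypothetical.

```python
import random

# The three secondary structure classes named in the abstract.
CLASSES = ("helix", "sheet", "loop")


def binary_labels(labels, positive):
    """One-vs-rest relabeling (assumed): 1 for `positive`, 0 otherwise."""
    return [1 if lab == positive else 0 for lab in labels]


def build_tasks(labels):
    """Return one label vector per classification task.

    "Helix-Sheet-Loop" keeps the three-class labels; the three binary
    tasks are derived from them under the one-vs-rest assumption.
    """
    tasks = {"Helix-Sheet-Loop": list(labels)}
    for cls in CLASSES:
        tasks[f"{cls.capitalize()}-Binary"] = binary_labels(labels, cls)
    return tasks


def split_indices(n, test_fraction=0.2, seed=0):
    """Shuffle indices 0..n-1 and hold out `test_fraction` for testing.

    Returns (train_indices, test_indices). A plain random split is an
    assumption; the source only says 20% of the data was used for testing.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    train_idx, test_idx = split_indices(3400)
    print(len(train_idx), len(test_idx))
```

With 3400 volume components, a 20% hold-out leaves 2720 components for training and 680 for testing, so each reported accuracy would be a fraction of 680 test components under this assumption.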