
BMC Structural Biology: Latest Articles

Enhancement of accuracy and efficiency for RNA secondary structure prediction by sequence segmentation and MapReduce
Q3 Biochemistry, Genetics and Molecular Biology Pub Date : 2013-11-08 DOI: 10.1186/1472-6807-13-S1-S3
Boyu Zhang, Daniel T Yehdego, Kyle L Johnson, Ming-Ying Leung, Michela Taufer

Ribonucleic acid (RNA) molecules play important roles in many biological processes, including gene expression and regulation. Their secondary structures are crucial to RNA function, and the prediction of these structures is widely studied. Our previous research shows that cutting long sequences into shorter chunks, predicting secondary structures of the chunks independently using thermodynamic methods, and reconstructing the entire secondary structure from the predicted chunk structures can yield better accuracy than predicting the secondary structure using the RNA sequence as a whole. The chunking, prediction, and reconstruction processes can use different methods and parameters, some of which produce more accurate predictions than others. In this paper, we study the prediction accuracy and efficiency of three different chunking methods with seven popular secondary structure prediction programs, applied to two datasets of RNA sequences with known secondary structures (covering both pseudoknotted and non-pseudoknotted sequences) and to a family of viral genome RNAs whose structures have not been predicted before. Our modularized MapReduce framework based on Hadoop allows us to study the problem in a parallel and robust environment.

On average, the maximum accuracy retention values are larger than one for our chunking methods and the seven prediction programs over 50 non-pseudoknotted sequences, meaning that the secondary structure predicted using chunking is more similar to the real structure than the secondary structure predicted by using the whole sequence. We observe similar results for the 23 pseudoknotted sequences, except for the NUPACK program using the centered chunking method. The performance analysis for 14 long RNA sequences from the Nodaviridae virus family outlines how the coarse-grained mapping of chunking and predictions in the MapReduce framework exhibits shorter turnaround times for short RNA sequences. However, as the lengths of the RNA sequences increase, the fine-grained mapping can surpass the coarse-grained mapping in performance.

By using our MapReduce framework together with statistical analysis on the accuracy retention results, we observe how the inversion-based chunking methods can outperform predictions using the whole sequence. Our chunk-based approach also enables us to predict secondary structures for very long RNA sequences, which is not feasible with traditional methods alone.
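The chunk, predict, and reconstruct pipeline and the accuracy retention measure described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the fixed-length splitter `chunk_fixed` is not one of the three chunking methods studied in the paper, `predict` stands for any per-chunk predictor that returns a dot-bracket string, reconstruction is taken to be plain concatenation, and accuracy retention is taken to be the ratio of chunk-based accuracy to whole-sequence accuracy, which matches the statement that values above one favor chunking.

```python
def chunk_fixed(seq, chunk_len):
    """Split a sequence into consecutive chunks of at most chunk_len bases.

    Hypothetical splitter for illustration; not one of the paper's methods.
    """
    if chunk_len <= 0:
        raise ValueError("chunk_len must be positive")
    return [seq[i:i + chunk_len] for i in range(0, len(seq), chunk_len)]


def predict_by_chunks(seq, chunk_len, predict):
    """Predict each chunk independently (the 'map' step) and concatenate
    the per-chunk dot-bracket strings (an assumed 'reduce' step)."""
    return "".join(predict(chunk) for chunk in chunk_fixed(seq, chunk_len))


def accuracy_retention(acc_chunked, acc_whole):
    """Assumed definition: chunk-based accuracy over whole-sequence accuracy.

    A value above 1 means the chunk-based prediction is closer to the
    known structure than the whole-sequence prediction.
    """
    if acc_whole <= 0:
        raise ValueError("acc_whole must be positive")
    return acc_chunked / acc_whole
```

Because each chunk is predicted independently, the per-chunk calls are the natural unit of work to distribute across mappers.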

Citations: 21
Elucidating the ensemble of functionally-relevant transitions in protein systems with a robotics-inspired method
Q3 Biochemistry, Genetics and Molecular Biology Pub Date : 2013-11-08 DOI: 10.1186/1472-6807-13-S1-S8
Kevin Molloy, Amarda Shehu

Many proteins tune their biological function by transitioning between different functional states, effectively acting as dynamic molecular machines. Detailed structural characterization of transition trajectories is central to understanding the relationship between protein dynamics and function. Computational approaches that build on the Molecular Dynamics framework are in principle able to model transition trajectories in great detail, but at considerable computational cost. Methods that delay consideration of dynamics and focus instead on elucidating energetically-credible conformational paths connecting two functionally-relevant structures provide a complementary approach. Effective sampling-based path planning methods originating in robotics have recently been proposed to produce conformational paths. These methods largely model short peptides or address large proteins by simplifying conformational space.

We propose a robotics-inspired method that connects two given structures of a protein by sampling conformational paths. The method focuses on small- to medium-size proteins, efficiently modeling structural deformations through the use of the molecular fragment replacement technique. In particular, the method grows a tree in conformational space rooted at the start structure, steering the tree to a goal region defined around the goal structure. We investigate various bias schemes over a progress coordinate for balance between coverage of conformational space and progress towards the goal. A geometric projection layer promotes path diversity. A reactive temperature scheme allows sampling of rare paths that cross energy barriers.

Experiments are conducted on small- to medium-size proteins of length up to 214 amino acids and with multiple known functionally-relevant states, some of which are more than 13 Å apart from each other. Analysis reveals that the method effectively obtains conformational paths connecting structural states that are significantly different. A detailed analysis of the depth and breadth of the tree suggests that a soft global bias over the progress coordinate enhances sampling and results in higher path diversity. The explicit geometric projection layer that biases the exploration away from over-sampled regions further increases coverage, often improving proximity to the goal by forcing the exploration to find new paths. The reactive temperature scheme is shown to be effective in increasing path diversity, particularly in difficult structural transitions with known high-energy barriers.
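To make the idea of growing a tree from a start structure toward a goal region concrete, here is a toy Python sketch of a generic goal-biased tree search, similar in spirit to a rapidly-exploring random tree. It is not the authors' method: points in a 2D plane stand in for protein conformations, a fixed-size straight-line step stands in for molecular fragment replacement, and the names `grow_tree`, `goal_bias`, and `goal_radius` are hypothetical. The geometric projection layer and reactive temperature scheme are not modeled.

```python
import math
import random


def grow_tree(start, goal, goal_radius=0.5, step=0.3, goal_bias=0.2,
              max_iters=5000, seed=0):
    """Grow a tree rooted at start until a node lands in the goal region.

    Returns the path from start to the first node within goal_radius of
    goal, or None if max_iters is exhausted. Toy 2D stand-in only.
    """
    rng = random.Random(seed)
    nodes = [start]
    parent = {0: None}
    for _ in range(max_iters):
        # With probability goal_bias steer toward the goal, else explore.
        if rng.random() < goal_bias:
            target = goal
        else:
            target = (rng.uniform(-5, 5), rng.uniform(-5, 5))
        # Extend the node nearest to the target by at most one step.
        i = min(range(len(nodes)), key=lambda k: math.dist(nodes[k], target))
        d = math.dist(nodes[i], target)
        if d == 0:
            continue
        t = min(1.0, step / d)
        new = tuple(a + t * (b - a) for a, b in zip(nodes[i], target))
        nodes.append(new)
        parent[len(nodes) - 1] = i
        if math.dist(new, goal) <= goal_radius:
            # Walk parent links back to the root to recover the path.
            path, k = [], len(nodes) - 1
            while k is not None:
                path.append(nodes[k])
                k = parent[k]
            return path[::-1]
    return None
```

Raising `goal_bias` trades coverage of the space for faster progress toward the goal, which is the balance the bias schemes in the abstract are meant to tune.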

Citations: 32
Four-body atomic potential for modeling protein-ligand binding affinity: application to enzyme-inhibitor binding energy prediction
Q3 Biochemistry, Genetics and Molecular Biology Pub Date : 2013-11-08 DOI: 10.1186/1472-6807-13-S1-S1
Majid Masso

Models that are capable of reliably predicting binding affinities for protein-ligand complexes play an important role in the field of structure-guided drug design.

Here, we begin by applying the computational geometry technique of Delaunay tessellation to each set of atomic coordinates for over 1400 diverse macromolecular structures, for the purpose of deriving a four-body statistical potential that serves as a topological scoring function. Next, we identify a second, independent set of three hundred protein-ligand complexes, having both high-resolution structures and known dissociation constants. Two-thirds of these complexes are randomly selected to train a predictive model of binding affinity as follows: two tessellations are generated in each case, one for the entire complex and another strictly for the isolated protein without its bound ligand, and a topological score is computed for each tessellation with the four-body potential. Predicted protein-ligand binding affinity is then based on an empirically derived linear function of the difference between both topological scores, one that appropriately scales the value of this difference.

A comparison between experimental and calculated binding affinity values over the two hundred complexes reveals a Pearson's correlation coefficient of r = 0.79 with a standard error of SE = 1.98 kcal/mol. To validate the method, we similarly generated two tessellations for each of the remaining protein-ligand complexes, computed their topological scores and the difference between the two scores for each complex, and applied the previously derived linear transformation of this topological score difference to predict binding affinities. For these one hundred complexes, we again observe a correlation of r = 0.79 (SE = 1.93 kcal/mol) between known and calculated binding affinities. Applying our model to an independent test set of high-resolution structures for three hundred diverse enzyme-inhibitor complexes, each with an experimentally known inhibition constant, also yields a correlation of r = 0.79 (SE = 2.39 kcal/mol) between experimental and calculated binding energies.

Lastly, we generate predictions with our model on a diverse test set of one hundred protein-ligand complexes previously used to benchmark 15 related methods, and our correlation of r = 0.66 between the calculated and experimental binding energies for this dataset exceeds those of the other approaches. Compared with these related prediction methods, our approach stands out for the reliability of the model combined with the speed of prediction, which takes less than one second for an average-sized complex.
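The final modeling step, a linear function of the difference between the two topological scores evaluated by Pearson's correlation, can be sketched in plain Python. This is only an illustration under stated assumptions: the topological scores themselves (which require the Delaunay tessellation and the four-body potential) are taken as given inputs, the linear function is assumed to include an intercept, and the function names are hypothetical.

```python
import math


def fit_line(x, y):
    """Least-squares slope and intercept for y ~ slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def predict_affinity(score_complex, score_protein, slope, intercept):
    """Map the topological score difference to a predicted affinity.

    score_complex and score_protein are assumed to come from the
    tessellations of the full complex and of the isolated protein.
    """
    return slope * (score_complex - score_protein) + intercept
```

In this reading, `fit_line` would be run once on the training complexes, and the resulting slope and intercept reused unchanged on the validation and test sets.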

Citations: 4
Estimating loop length from CryoEM images at medium resolutions
Q3 Biochemistry, Genetics and Molecular Biology Pub Date : 2013-11-08 DOI: 10.1186/1472-6807-13-S1-S5
Andrew McKnight, Dong Si, Kamal Al Nasr, Andrey Chernikov, Nikos Chrisochoides, Jing He

De novo protein modeling approaches utilize 3-dimensional (3D) images derived from electron cryomicroscopy (CryoEM) experiments. The skeleton connecting two secondary structures, such as α-helices, represents the loop in the 3D image. The accuracy of the skeleton and of the detected secondary structures is critical in de novo modeling. It is important to measure the length along the skeleton accurately, since the length can be used as a constraint in modeling the protein.

We have developed a novel computational geometric approach to derive a simplified curve in order to estimate the loop length along the skeleton. The method was tested using fifty simulated density images of helix-loop-helix segments of atomic structures and eighteen experimentally derived density maps from the Electron Microscopy Data Bank (EMDB). The test using simulated density maps shows that it is possible to estimate the loop length to within 0.5 Å of the expected length for 48 of the 50 cases. The experiments involving the eighteen experimentally derived CryoEM images show that twelve cases have an error within 2 Å.

The tests using both simulated and experimentally derived images show that it is possible for our proposed method to estimate the loop length along the skeleton if the secondary structure elements, such as α-helices, can be detected accurately, and there is a continuous skeleton linking the α-helices.
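A short Python sketch shows why measuring along a simplified curve matters: a skeleton traced through voxels tends to zigzag, so summing its raw segment lengths overestimates the loop length. The simplification used here (keeping every `stride`-th point plus the endpoint) is only a stand-in chosen for illustration and is not the computational geometric approach developed in the paper; both function names are hypothetical, and coordinates are assumed to be in Å.

```python
import math


def polyline_length(points):
    """Sum of Euclidean distances between consecutive 3D points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def simplify_by_stride(points, stride):
    """Keep every stride-th point and always keep the last point.

    Illustrative stand-in for a curve simplification step.
    """
    if stride <= 0:
        raise ValueError("stride must be positive")
    if not points:
        return []
    kept = list(points[::stride])
    if (len(points) - 1) % stride != 0:
        kept.append(points[-1])
    return kept
```

For a zigzag skeleton, the length of the simplified curve is shorter than the raw traced length and closer to the length of the underlying smooth path.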

Citations: 7
Unbiased, scalable sampling of protein loop conformations from probabilistic priors
Q3 Biochemistry, Genetics and Molecular Biology Pub Date : 2013-11-08 DOI: 10.1186/1472-6807-13-S1-S9
Yajia Zhang, Kris Hauser

Protein loops are flexible structures that are intimately tied to function, but understanding loop motion and generating loop conformation ensembles remain significant computational challenges. Discrete search techniques scale poorly to large loops, optimization and molecular dynamics techniques are prone to local minima, and inverse kinematics techniques can only incorporate structural preferences in an ad hoc fashion. This paper presents Sub-Loop Inverse Kinematics Monte Carlo (SLIKMC), a new Markov chain Monte Carlo algorithm for generating conformations of closed loops according to experimentally available, heterogeneous structural preferences.

Our simulation experiments demonstrate that the method computes high-scoring conformations of large loops (> 10 residues) orders of magnitude faster than standard Monte Carlo and discrete search techniques. Two new developments contribute to the scalability of the new method. First, structural preferences are specified via a probabilistic graphical model (PGM) that links conformation variables, spatial variables (e.g., atom positions), constraints and prior information in a unified framework. The method uses a sparse PGM that exploits locality of interactions between atoms and residues. Second, a novel method for sampling sub-loops is developed to generate statistically unbiased samples of probability densities restricted by loop-closure constraints.

Numerical experiments confirm that SLIKMC generates conformation ensembles that are statistically consistent with specified structural preferences. Protein conformations with 100+ residues are sampled on standard PC hardware in seconds. Application to proteins involved in ion binding demonstrates its potential as a tool for loop ensemble generation and missing structure completion.
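For readers unfamiliar with Markov chain Monte Carlo, the following Python sketch shows a plain random-walk Metropolis sampler drawing from a one-dimensional prior density. It is a generic textbook technique, not SLIKMC: there is no loop-closure constraint, no sub-loop sampling, and no probabilistic graphical model, and the single variable `x` merely stands in for a conformation variable such as a torsion angle. The function name and parameters are hypothetical.

```python
import math
import random


def metropolis(log_density, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis sampler for an unnormalized log density.

    Proposes x + N(0, step) and accepts with probability
    min(1, p(candidate) / p(current)).
    """
    rng = random.Random(seed)
    x, lp = x0, log_density(x0)
    samples = []
    for _ in range(n_samples):
        cand = x + rng.gauss(0.0, step)
        lp_cand = log_density(cand)
        # Clamp the log ratio at 0 so exp() never overflows.
        if rng.random() < math.exp(min(0.0, lp_cand - lp)):
            x, lp = cand, lp_cand
        samples.append(x)
    return samples
```

With `log_density = lambda x: -0.5 * x * x`, the samples approximate a standard normal distribution; the difficulty SLIKMC addresses is doing the analogous sampling when the variables must also satisfy loop-closure constraints.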

Citations: 7
DINC: A new AutoDock-based protocol for docking large ligands
Q3 Biochemistry, Genetics and Molecular Biology Pub Date : 2013-11-08 DOI: 10.1186/1472-6807-13-S1-S11
Ankur Dhanik, John S McMurray, Lydia E Kavraki

Computer-aided docking of small ligands with 6 or fewer rotatable bonds using the popular program AutoDock is reasonably fast and accurate. However, docking large ligands using AutoDock's recommended standard docking protocol is less accurate and computationally slow.

In our earlier work, we presented a novel AutoDock-based incremental protocol (DINC) that addresses the limitations of AutoDock's standard protocol by enabling improved docking of large ligands. Instead of docking a large ligand to a target protein in a single step, as done in the standard protocol, our protocol docks the large ligand in increments. In this paper, we present three detailed examples of docking using DINC and compare the docking results with those obtained using AutoDock's standard protocol. We summarize the docking results from an extended docking study of 73 protein-ligand complexes with large ligands. We demonstrate that DINC is up to 2 orders of magnitude faster than AutoDock's standard protocol and that it achieves this speed-up without sacrificing docking accuracy. We also show that positional restraints can be applied to the large ligand using DINC, which is useful when computing a docked conformation of the ligand. Finally, we introduce a webserver for docking large ligands using DINC.

Docking large ligands using DINC is significantly faster than AutoDock's standard protocol without any loss of accuracy. Therefore, DINC could be used as an alternative protocol for docking large ligands. DINC has been implemented as a webserver and is available at http://dinc.kavrakilab.org. Applications such as therapeutic drug design, rational vaccine design, and others involving large ligands could benefit from DINC and its webserver implementation.

Citations: 42
Modeling protein conformational transitions by a combination of coarse-grained normal mode analysis and robotics-inspired methods
Q3 Biochemistry, Genetics and Molecular Biology Pub Date : 2013-11-08 DOI: 10.1186/1472-6807-13-S1-S2
Ibrahim Al-Bluwi, Marc Vaisset, Thierry Siméon, Juan Cortés

Obtaining atomic-scale information about large-amplitude conformational transitions in proteins is a challenging problem for both experimental and computational methods. Such information is, however, important for understanding the mechanisms of interaction of many proteins.

This paper presents a computationally efficient approach, combining methods originating from robotics and computational biophysics, to model protein conformational transitions. The ability of normal mode analysis to predict directions of collective, large-amplitude motions is applied to bias the conformational exploration performed by a motion planning algorithm. To reduce the dimension of the problem, normal modes are computed for a coarse-grained elastic network model built on short fragments of three residues. Nevertheless, the validity of intermediate conformations is checked using the all-atom model, which is accurately reconstructed from the coarse-grained one using closed-form inverse kinematics.
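As a rough illustration of the elastic network step, the sketch below builds the Hessian of a generic anisotropic network model in plain Python. It assumes one bead per coordinate, a distance cutoff of 8 Å, and a uniform spring constant; these are illustrative choices, and the three-residue fragment model described in the abstract is not reproduced here. The normal modes would be the eigenvectors of this matrix, which any linear-algebra library can compute.

```python
def anm_hessian(coords, cutoff=8.0, gamma=1.0):
    """Build the 3N x 3N Hessian of a simple elastic network.

    coords: list of (x, y, z) bead positions.
    cutoff, gamma: illustrative contact distance and spring constant.
    """
    n = len(coords)
    hess = [[0.0] * (3 * n) for _ in range(3 * n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [coords[j][k] - coords[i][k] for k in range(3)]
            dist2 = sum(x * x for x in d)
            if dist2 == 0.0 or dist2 > cutoff * cutoff:
                continue  # beads are not connected by a spring
            for a in range(3):
                for b in range(3):
                    v = gamma * d[a] * d[b] / dist2
                    # Off-diagonal blocks couple beads i and j.
                    hess[3 * i + a][3 * j + b] -= v
                    hess[3 * j + a][3 * i + b] -= v
                    # Diagonal blocks accumulate the negated off-diagonal sum.
                    hess[3 * i + a][3 * i + b] += v
                    hess[3 * j + a][3 * j + b] += v
    return hess


def matvec(matrix, vector):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]
```

A quick sanity check on such a Hessian is that rigid translations cost no energy: multiplying it by a uniform displacement vector should give zero.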

Tests on a set of ten proteins demonstrate the ability of the method to model conformational transitions of proteins within a few hours of computing time on a single processor. These results also show that the computing time scales linearly with the protein size, independently of the protein topology. Further experiments on adenylate kinase show that the main features of the transition between the open and closed conformations of this protein are well captured in the computed path.

The proposed method enables the simulation of large-amplitude conformational transitions in proteins using very few computational resources. The resulting paths are a first approximation that can directly provide important information on the molecular mechanisms involved in the conformational transition. This approximation can be subsequently refined and analyzed using state-of-the-art energy models and molecular modeling methods.

Citations: 52
Quantification of the impact of PSI:Biology according to the annotations of the determined structures
Q3 Biochemistry, Genetics and Molecular Biology Pub Date : 2013-10-21 DOI: 10.1186/1472-6807-13-24
Paul J DePietro, Elchin S Julfayev, William A McLaughlin

Protein Structure Initiative:Biology (PSI:Biology) is the third phase of PSI, in which protein structures are determined in a high-throughput manner to characterize their biological functions. The transition to the third phase entailed the formation of PSI:Biology Partnerships, which are composed of structural genomics centers and biomedical science laboratories. We present a method to examine the impact of protein structures determined under the auspices of PSI:Biology by measuring their rates of annotations. The mean numbers of annotations per structure and per residue are examined. These are designed to measure the number of structure-to-function connections that can be leveraged from each structure.
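The two annotation rates can be sketched as follows. The input layout (one `(n_residues, n_annotations)` pair per structure) and the choice to pool residues across structures for the per-residue rate are assumptions; the abstract does not specify how the per-residue mean is aggregated.

```python
def annotation_rates(structures):
    """Return (mean annotations per structure, annotations per residue).

    structures: list of (n_residues, n_annotations) tuples.
    The per-residue rate pools all residues, which is an assumption.
    """
    if not structures:
        raise ValueError("at least one structure is required")
    total_residues = sum(res for res, _ in structures)
    total_annotations = sum(ann for _, ann in structures)
    per_structure = total_annotations / len(structures)
    per_residue = total_annotations / total_residues
    return per_structure, per_residue
```

Comparing these two numbers between sets of structures, for example those determined with and without partnerships, is the kind of comparison the abstract describes.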

One result is that PSI:Biology structures are found to have a higher rate of annotations than structures determined during the first two phases of PSI. A second result is that the subset of PSI:Biology structures determined through PSI:Biology Partnerships has a higher rate of annotations than the subset determined outside those partnerships. Both results hold when the annotation rates are examined either at the level of the entire protein or for annotations that are known to fall at specific residues within the portion of the protein that has a determined structure.

We conclude that, based on a broad array of biomedical annotations, PSI:Biology determines structures that are estimated to have a higher degree of biomedical interest than those determined during the first two phases of PSI. For the PSI:Biology Partnerships, we see an associated added value that represents part of the progress toward the goals of PSI:Biology. We interpret the added value to mean that team-based structural biology projects that utilize the expertise and technologies of structural genomics centers together with biological laboratories in the community are conducted in a synergistic manner. We show that the annotation rates can be used in conjunction with established metrics, i.e., the numbers of structures and the impact of publication records, to monitor the progress of PSI:Biology towards its goal of examining structure-to-function connections of high biomedical relevance. The metric provides an objective means to quantify the overall impact of PSI:Biology as it uses biomedical annotations from external sources.

Citations: 1
gEMpicker: a highly parallel GPU-accelerated particle picking tool for cryo-electron microscopy
Q3 Biochemistry, Genetics and Molecular Biology Pub Date : 2013-10-21 DOI: 10.1186/1472-6807-13-25
Thai V Hoang, Xavier Cavin, Patrick Schultz, David W Ritchie

Picking images of particles in cryo-electron micrographs is an important step in solving the 3D structures of large macromolecular assemblies. However, in order to achieve sub-nanometre resolution it is often necessary to capture and process many thousands or even several millions of 2D particle images. Thus, a computational bottleneck in reaching high resolution is the accurate and automatic picking of particles from raw cryo-electron micrographs.

We have developed “gEMpicker”, a highly parallel correlation-based particle picking tool. To our knowledge, gEMpicker is the first particle picking program to use multiple graphics processor units (GPUs) to accelerate the calculation. When tested on the publicly available keyhole limpet hemocyanin dataset, we find that gEMpicker gives similar results to the FindEM program. However, compared to calculating correlations on one core of a contemporary central processor unit (CPU), running gEMpicker on a modern GPU gives a speed-up of about 27×. To achieve even higher processing speeds, the basic correlation calculations are accelerated considerably by using a hierarchy of parallel programming techniques to distribute the calculation over multiple GPUs and CPU cores attached to multiple nodes of a computer cluster. By using a theoretically optimal reduction algorithm to collect and combine the cluster calculation results, the speed of the overall calculation scales almost linearly with the number of cluster nodes available.
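Correlation-based picking can be illustrated with a small CPU-only sketch that slides a template over an image and scores each position by normalized cross-correlation. This is a generic textbook formulation, assumed here for illustration; the abstract does not state the exact correlation function gEMpicker uses, and the function names below are hypothetical.

```python
import math


def ncc(image, template, top, left):
    """Normalized cross-correlation of `template` with one image patch."""
    h, w = len(template), len(template[0])
    patch = [image[top + r][left + c] for r in range(h) for c in range(w)]
    temp = [template[r][c] for r in range(h) for c in range(w)]
    mean_p = sum(patch) / len(patch)
    mean_t = sum(temp) / len(temp)
    num = sum((p - mean_p) * (t - mean_t) for p, t in zip(patch, temp))
    den = math.sqrt(
        sum((p - mean_p) ** 2 for p in patch)
        * sum((t - mean_t) ** 2 for t in temp)
    )
    # A flat patch or flat template has no defined correlation; score it 0.
    return num / den if den > 0 else 0.0


def pick(image, template):
    """Return (score, row, col) of the best-matching template position."""
    h, w = len(template), len(template[0])
    return max(
        (ncc(image, template, r, c), r, c)
        for r in range(len(image) - h + 1)
        for c in range(len(image[0]) - w + 1)
    )
```

Each position is scored independently of the others, which is why this kind of calculation is well suited to the GPU and cluster parallelism described above.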

The very high picking throughput that is now possible using GPU-powered workstations or computer clusters will help experimentalists to achieve higher resolution 3D reconstructions more rapidly than before.

Citations: 20
Structural and biochemical characterization of the essential DsbA-like disulfide bond forming protein from Mycobacterium tuberculosis
Q3 Biochemistry, Genetics and Molecular Biology Pub Date : 2013-10-18 DOI: 10.1186/1472-6807-13-23
Nicholas Chim, Christine A Harmston, David J Guzman, Celia W Goulding

Bacterial disulfide bond forming (Dsb) proteins facilitate proper folding and disulfide bond formation of periplasmic and secreted proteins. Previously, we have shown that Mycobacterium tuberculosis Mt-DsbE and Mt-DsbF aid in vitro oxidative folding of proteins. The M. tuberculosis proteome contains another predicted membrane-tethered Dsb protein, Mt-DsbA, which is encoded by an essential gene.

Herein, we present structural and biochemical analyses of Mt-DsbA. The X-ray crystal structure of Mt-DsbA reveals a two-domain structure, comprising a canonical thioredoxin domain with the conserved CXXC active site cysteines in their reduced form, and an inserted α-helical domain containing a structural disulfide bond. The overall fold of Mt-DsbA resembles that of other DsbA-like proteins rather than that of Mt-DsbE or Mt-DsbF. Biochemical characterization demonstrates that, unlike Mt-DsbE and Mt-DsbF, Mt-DsbA is unable to oxidatively fold reduced, denatured hirudin. Moreover, on the substrates tested in this study, Mt-DsbA has disulfide bond isomerase activity, in contrast to Mt-DsbE and Mt-DsbF.

These results suggest that Mt-DsbA acts upon a distinct subset of substrates as compared to Mt-DsbE and Mt-DsbF. One could speculate that Mt-DsbE and Mt-DsbF are functionally redundant whereas Mt-DsbA is not, offering an explanation for the essentiality of Mt-DsbA in M. tuberculosis.

Citations: 31