An algebraic geometry approach to protein structure determination from NMR data.

Proceedings. IEEE Computational Systems Bioinformatics Conference, pp. 235-46. Pub Date: 2005-01-01. DOI: 10.1109/csb.2005.11

Lincong Wang, Ramgopal R Mettu, Bruce Randall Donald


Citations: 7

Abstract

Our paper describes the first provably efficient algorithm for determining protein structures de novo, solely from experimental data. We show how the global nature of a certain kind of NMR data provides quantifiable complexity-theoretic benefits, allowing us to classify our algorithm as running in polynomial time. While our algorithm uses NMR data as input, it is the first polynomial-time algorithm to compute high-resolution structures de novo using any experimentally recorded data, from either NMR spectroscopy or X-ray crystallography. Improved algorithms for protein structure determination are needed because the process is currently expensive and time-consuming. For example, an area of intense research in NMR methodology is automated assignment of nuclear Overhauser effect (NOE) restraints, in which structure determination sits in a tight inner loop (cycle) of assignment/refinement. These algorithms are very time-consuming and typically require a large cluster. Thus, algorithms for protein structure determination that are known to run in polynomial time and provide guarantees on solution accuracy are likely to have great impact in the long term. Methods stemming from a technique called "distance geometry embedding" do come with provable guarantees, but the NP-hardness of these problem formulations implies that in the worst case these techniques cannot run in polynomial time (unless P = NP). We are able to avoid the NP-hardness by (a) making some mild assumptions about the protein being studied, (b) using residual dipolar couplings (RDCs) instead of a dense network of NOEs, and (c) developing novel algorithms and proofs that exploit the biophysical geometry of (a) and (b), drawing on a variety of computer science, computational geometry, and computational algebra techniques. In our algorithm, RDC data, which gives global restraints on the orientation of internuclear bond vectors, is used in conjunction with very sparse NOE data to obtain a polynomial-time algorithm for protein structure determination. An implementation of our algorithm has been applied to 6 different real biological NMR data sets recorded for 3 proteins. Our algorithm is combinatorially precise, runs in polynomial time, and uses much less NMR data to produce results that are as good as or better than those of previous approaches in terms of both the accuracy of the computed structure and running time. In practice, approaches such as restrained molecular dynamics and simulated annealing, which lack both combinatorial precision and guarantees on running time and solution quality, are commonly used. Our results show that by using a different "slice" of the data, one can obtain an algorithm that runs in polynomial time and has guarantees on solution quality. We believe that our techniques can be extended and generalized to other structure-determination problems, such as computing side-chain conformations and the structure of nucleic acids from experimental data.
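
The sense in which an RDC restrains the orientation of a bond vector globally can be illustrated with the standard alignment-tensor expression r = D_max * v^T S v, where v is the unit internuclear vector and S is a symmetric, traceless 3x3 Saupe matrix shared by the whole molecule. The sketch below is only an illustration of that relation, not the algorithm described in the abstract. The function name `rdc`, the tensor entries in `SAUPE`, and the constant `D_MAX` are hypothetical values chosen for the example.

```python
import math


def rdc(v, saupe, d_max=1.0):
    """Return d_max * u^T S u, where u is v scaled to unit length.

    v      -- internuclear bond vector (any non-zero length), 3 components
    saupe  -- 3x3 symmetric, traceless alignment (Saupe) matrix
    d_max  -- dipolar interaction constant (illustrative value, an assumption)
    """
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("bond vector must be non-zero")
    u = [c / norm for c in v]
    # Quadratic form u^T S u written out explicitly to avoid extra dependencies.
    return d_max * sum(
        u[i] * saupe[i][j] * u[j] for i in range(3) for j in range(3)
    )


# Hypothetical alignment tensor, already in its principal-axis frame.
# The diagonal sums to zero, as required for a traceless Saupe matrix.
SAUPE = [
    [-3.0e-4, 0.0, 0.0],
    [0.0, -5.0e-4, 0.0],
    [0.0, 0.0, 8.0e-4],
]

# Hypothetical interaction constant in Hz (an assumption for illustration).
D_MAX = 2.0e4

if __name__ == "__main__":
    bond = (0.3, -0.5, 0.8)
    flipped = tuple(-c for c in bond)
    print(rdc(bond, SAUPE, D_MAX))
    print(rdc(flipped, SAUPE, D_MAX))
```

Because the expression is quadratic in v, a bond vector and its negation give the same value, and the measurement depends only on the vector's orientation in the common alignment frame, not on where the bond sits in the chain. A single RDC therefore confines a bond vector to a curve on the unit sphere rather than fixing one direction, which is why additional data (such as the sparse NOEs mentioned above) are still needed.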


