Prediction of protein secondary structure using the 3D-1D compatibility algorithm
M Ito, Y Matsuo, K Nishikawa
Computer applications in the biosciences : CABIOS, 1997-08-01
DOI: 10.1093/bioinformatics/13.4.415 (https://doi.org/10.1093/bioinformatics/13.4.415)
Citations: 49
Abstract
A new method for predicting protein secondary structure is proposed that relies entirely on the global features of a protein. The prediction scheme is as follows. A structural library is first scanned with the query sequence using the previously developed 3D-1D compatibility method. All structures examined are sorted by compatibility score, and the top 50 are selected. The known secondary structures of these 50 proteins are then globally aligned against the query sequence according to the 3D-1D alignments. Each residue site is predicted as alpha helix, beta strand, or coil by taking the majority among the observations at that site. In addition to the 325 proteins in the structural library, 77 proteins were selected from the latest release of the Brookhaven Protein Data Bank and divided into three data sets. Data set 1 was used as a training set to optimize several adjustable parameters of the method. The final form of the method was then applied to a testing set (data set 2) containing proteins with chain lengths ≤ 400 residues. The average prediction accuracy reached 69% in the three-state assessment of alpha, beta, and coil. Data set 3, in contrast, contains only proteins longer than 400 residues, for which the present method would not work properly because of the size effect inherent in the 3D-1D compatibility method. The proteins in data set 3 were therefore subdivided into their constituent domains (data set 4) before being fed into the prediction program. The prediction accuracy for data set 4 was 66% on average, a few percent lower than that for data set 2. Possible causes for this discrepancy are discussed.
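The per-site majority vote and the three-state accuracy described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the function names `majority_vote` and `q3_accuracy`, the one-letter state codes (`H`, `E`, `C`), the gap symbol `-`, the tie-breaking order, and the fallback to coil are all hypothetical, and the optimized parameters mentioned in the abstract are not modeled. The input is assumed to be the secondary structure of each top-scoring template already projected onto the query by its 3D-1D alignment.

```python
from collections import Counter

GAP = "-"        # assumed symbol: query site not aligned to any template residue
STATES = "HEC"   # assumed codes: H = alpha helix, E = beta strand, C = coil


def majority_vote(aligned_states, default="C"):
    """Predict one state per query site by majority among the templates.

    aligned_states: list of equal-length strings, one per template, each giving
    the template's secondary structure at every query site (GAP where unaligned).
    Ties are broken in the order H, E, C, and sites with no observation fall
    back to `default`; both choices are assumptions of this sketch.
    """
    if not aligned_states:
        return ""
    length = len(aligned_states[0])
    if any(len(row) != length for row in aligned_states):
        raise ValueError("all aligned state strings must have the query length")

    prediction = []
    for column in zip(*aligned_states):
        counts = Counter(state for state in column if state in STATES)
        if not counts:
            prediction.append(default)
            continue
        # max() returns the first maximal item, so ties resolve as H > E > C.
        prediction.append(max(STATES, key=lambda state: counts[state]))
    return "".join(prediction)


def q3_accuracy(predicted, observed):
    """Fraction of sites whose predicted state matches the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not predicted:
        raise ValueError("cannot score an empty prediction")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(predicted)


if __name__ == "__main__":
    # Toy input: three templates projected onto a six-residue query.
    templates = ["HHHHCC", "HHHCCE", "-HHCEE"]
    predicted = majority_vote(templates)
    print(predicted)
    print(q3_accuracy(predicted, "HHHCCC"))
```

In the setting of the abstract, `aligned_states` would hold 50 strings, one per top-ranked library structure, and the value returned by `q3_accuracy` corresponds to the three-state figure reported per protein before averaging over a data set.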