Database search based on Bayesian alignment.

Proceedings. International Conference on Intelligent Systems for Molecular Biology. Pub Date: 1999-01-01

J Zhu, R Lüthy, C E Lawrence

Abstract

Protein sequence databases grow larger each day. A common challenge is to predict the structure or function of the sequences they contain. This is easy when a sequence shares direct similarity with a well-characterized protein. When there is no direct similarity, we must rely on a third sequence or a model as an intermediate to link the two proteins. We developed a new model-based method, called Bayesian search, to connect two distantly related proteins. We compared Bayesian search with pairwise and multiple sequence comparison methods on structural databases, using structural similarity as the criterion for relationship. The results show that Bayesian search links more distantly related sequence pairs than the other methods, both collectively and consistently across large protein families. At a threshold of one error per query on average against the SCOP database PDB40D-B, Bayesian search found 36.5% of related pairs, PSI-BLAST found 32.6%, and the Smith-Waterman method found 25%. Examples show that the alignments predicted by Bayesian search agree well with structural alignments. False positives found by Bayesian search at low cutoff values are also analyzed.
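
The comparison above is stated as the fraction of related pairs found at one error per query. The abstract does not describe how this figure was computed, so the following is only a minimal sketch of one common way to compute such a figure, assuming that hits from all queries are pooled and ranked by score and that structural similarity supplies the true/false label. The function name `coverage_at_epq` and its parameters are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def coverage_at_epq(
    hits: List[Tuple[float, bool]],
    n_queries: int,
    n_related_pairs: int,
    max_epq: float = 1.0,
) -> float:
    """Fraction of truly related pairs recovered before the pooled,
    score-ranked hit list exceeds max_epq false positives per query.

    hits: (score, is_related) for every reported query/target pair,
          pooled over all queries (assumed: higher score is better).
    """
    allowed_errors = max_epq * n_queries
    true_found = 0
    errors = 0
    # Rank all hits from every query together, best score first.
    for _score, is_related in sorted(hits, key=lambda h: h[0], reverse=True):
        if is_related:
            true_found += 1
        else:
            errors += 1
            # Stop once the error budget is exceeded.
            if errors > allowed_errors:
                break
    return true_found / n_related_pairs


# Toy data (invented for illustration, not results from the paper).
toy_hits = [(9.0, True), (8.0, True), (7.0, False),
            (6.0, True), (5.0, False), (4.0, True)]
print(coverage_at_epq(toy_hits, n_queries=1, n_related_pairs=4))
```

With the toy data, one query and an error budget of one, three of the four related pairs are ranked ahead of the second false positive. Pooling hits across queries, rather than averaging per-query coverage, is a design choice of this sketch and may differ from the procedure the authors used.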
