Functional region prediction with a set of appropriate homologous sequences-an index for sequence selection by integrating structure and sequence information with spatial statistics

Q3 Biochemistry, Genetics and Molecular Biology BMC Structural Biology Pub Date : 2012-05-29 DOI:10.1186/1472-6807-12-11
Wataru Nemoto, Hiroyuki Toh
{"title":"Functional region prediction with a set of appropriate homologous sequences-an index for sequence selection by integrating structure and sequence information with spatial statistics","authors":"Wataru Nemoto,&nbsp;Hiroyuki Toh","doi":"10.1186/1472-6807-12-11","DOIUrl":null,"url":null,"abstract":"<p>The detection of conserved residue clusters on a protein structure is one of the effective strategies for the prediction of functional protein regions. Various methods, such as Evolutionary Trace, have been developed based on this strategy. In such approaches, the conserved residues are identified through comparisons of homologous amino acid sequences. Therefore, the selection of homologous sequences is a critical step. It is empirically known that a certain degree of sequence divergence in the set of homologous sequences is required for the identification of conserved residues. However, the development of a method to select homologous sequences appropriate for the identification of conserved residues has not been sufficiently addressed. An objective and general method to select appropriate homologous sequences is desired for the efficient prediction of functional regions.</p><p>We have developed a novel index to select the sequences appropriate for the identification of conserved residues, and implemented the index within our method to predict the functional regions of a protein. The implementation of the index improved the performance of the functional region prediction. The index represents the degree of conserved residue clustering on the tertiary structure of the protein. For this purpose, the structure and sequence information were integrated within the index by the application of spatial statistics. Spatial statistics is a field of statistics in which not only the attributes but also the geometrical coordinates of the data are considered simultaneously. Higher degrees of clustering generate larger index scores. We adopted the set of homologous sequences with the highest index score, under the assumption that the best prediction accuracy is obtained when the degree of clustering is the maximum. The set of sequences selected by the index led to higher functional region prediction performance than the sets of sequences selected by other sequence-based methods.</p><p>Appropriate homologous sequences are selected automatically and objectively by the index. Such sequence selection improved the performance of functional region prediction. As far as we know, this is the first approach in which spatial statistics have been applied to protein analyses. Such integration of structure and sequence information would be useful for other bioinformatics problems.</p>","PeriodicalId":51240,"journal":{"name":"BMC Structural Biology","volume":"12 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2012-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/1472-6807-12-11","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"BMC Structural Biology","FirstCategoryId":"1085","ListUrlMain":"https://link.springer.com/article/10.1186/1472-6807-12-11","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"Biochemistry, Genetics and Molecular Biology","Score":null,"Total":0}
引用次数: 6

Abstract

The detection of conserved residue clusters on a protein structure is one of the effective strategies for the prediction of functional protein regions. Various methods, such as Evolutionary Trace, have been developed based on this strategy. In such approaches, the conserved residues are identified through comparisons of homologous amino acid sequences. Therefore, the selection of homologous sequences is a critical step. It is empirically known that a certain degree of sequence divergence in the set of homologous sequences is required for the identification of conserved residues. However, the development of a method to select homologous sequences appropriate for the identification of conserved residues has not been sufficiently addressed. An objective and general method to select appropriate homologous sequences is desired for the efficient prediction of functional regions.

We have developed a novel index to select the sequences appropriate for the identification of conserved residues, and implemented the index within our method to predict the functional regions of a protein. The implementation of the index improved the performance of the functional region prediction. The index represents the degree of conserved residue clustering on the tertiary structure of the protein. For this purpose, the structure and sequence information were integrated within the index by the application of spatial statistics. Spatial statistics is a field of statistics in which not only the attributes but also the geometrical coordinates of the data are considered simultaneously. Higher degrees of clustering generate larger index scores. We adopted the set of homologous sequences with the highest index score, under the assumption that the best prediction accuracy is obtained when the degree of clustering is the maximum. The set of sequences selected by the index led to higher functional region prediction performance than the sets of sequences selected by other sequence-based methods.

Appropriate homologous sequences are selected automatically and objectively by the index. Such sequence selection improved the performance of functional region prediction. As far as we know, this is the first approach in which spatial statistics have been applied to protein analyses. Such integration of structure and sequence information would be useful for other bioinformatics problems.

Abstract Image

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
利用一组合适的同源序列进行功能区预测——一种将结构和序列信息与空间统计相结合的序列选择指标
检测蛋白质结构上的保守残基簇是预测蛋白质功能区域的有效策略之一。各种各样的方法,如进化追踪,都是基于这一策略发展起来的。在这种方法中,保守残基是通过比较同源氨基酸序列来确定的。因此,同源序列的选择是至关重要的一步。经验表明,保守残基的识别需要同源序列集合中一定程度的序列散度。然而,一种方法的发展,以选择同源序列适当的鉴定保守残基尚未得到充分解决。为了有效地预测功能区域,需要一种客观、通用的方法来选择合适的同源序列。我们开发了一种新的索引来选择适合于鉴定保守残基的序列,并在我们的方法中实现了该索引来预测蛋白质的功能区域。该指标的实现提高了功能区预测的性能。该指数表示蛋白质三级结构上的保守残基聚类程度。为此,利用空间统计的方法将结构和序列信息整合到索引中。空间统计学是既考虑数据的属性又考虑数据的几何坐标的统计学领域。聚类程度越高,索引得分越高。我们采用指标得分最高的同源序列集合,假设聚类程度最大时预测精度最好。与其他基于序列的方法选择的序列集相比,索引选择的序列集具有更高的功能区预测性能。通过索引自动、客观地选择合适的同源序列。这样的序列选择提高了功能区预测的性能。据我们所知,这是空间统计首次应用于蛋白质分析的方法。这种结构和序列信息的整合将有助于解决其他生物信息学问题。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
CiteScore
3.60
自引率
0.00%
发文量
0
审稿时长
>12 weeks
期刊介绍: BMC Structural Biology is an open access, peer-reviewed journal that considers articles on investigations into the structure of biological macromolecules, including solving structures, structural and functional analyses, and computational modeling.
期刊最新文献
Characterization of putative proteins encoded by variable ORFs in white spot syndrome virus genome Correction to: Classification of the human THAP protein family identifies an evolutionarily conserved coiled coil region Effect of low complexity regions within the PvMSP3α block II on the tertiary structure of the protein and implications to immune escape mechanisms QRNAS: software tool for refinement of nucleic acid structures Classification of the human THAP protein family identifies an evolutionarily conserved coiled coil region
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1