Target selection for structural genomics based on combining fold recognition and crystallisation prediction methods: application to the human proteome.

James E Bray
Journal of structural and functional genomics. Published 2012-03-01 (Epub 2012-02-22). DOI: 10.1007/s10969-012-9130-x (https://doi.org/10.1007/s10969-012-9130-x)
Citations: 7

Abstract

The objective of this study is to automatically identify regions of the human proteome that are suitable for 3D structure determination by X-ray crystallography and to annotate them according to their likelihood of producing diffraction-quality crystals. The results provide a powerful tool for structural genomics laboratories that wish to select human proteins based on the statistical likelihood of crystallisation success. Combining fold recognition and crystallisation prediction algorithms enables the efficient calculation of the crystallisability of the entire human proteome. This study estimates that there are approximately 40,000 crystallisable regions in the human proteome. Currently, only 15% of these regions (approx. 6,000 sequences) have been solved to at least 95% sequence identity. The remaining unsolved regions have been categorised into 5 crystallisation classes and an integral membrane protein (IMP) class, based on established structure prediction, crystallisation prediction and transmembrane (TM) helix prediction algorithms. Approximately 750 unsolved regions (2% of the proteome) have been identified as having a PDB fold representative (template) and an 'optimal' likelihood of crystallisation. At the other end of the spectrum, more than 10,500 non-IMP regions with a PDB template are classified as 'very difficult' to crystallise (26%) and almost 2,500 regions (6%) are predicted to contain at least 3 TM helices. The 3D-SPECS (3D Structural Proteomics Explorer with Crystallisation Scores) website contains crystallisation predictions for the entire human proteome and can be found at http://www.bioinformaticsplus.org/3dspecs.
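The categorisation described in the abstract can be illustrated with a minimal Python sketch. It is not the 3D-SPECS implementation: the `Region` record, its field names, the `categorise` function and the numeric ordering of the classes (1 as 'optimal', 5 as 'very difficult') are all assumptions made for illustration. The abstract names only the two extreme classes, so classes 2 to 4 are left unnamed here. The final lines simply re-derive the quoted percentages from the approximate counts given in the abstract.

```python
from dataclasses import dataclass

# Assumption: classes are numbered 1..5 from most to least crystallisable.
# Only 'optimal' and 'very difficult' are named in the abstract.
CLASS_LABELS = {
    1: "optimal",
    2: "class 2",
    3: "class 3",
    4: "class 4",
    5: "very difficult",
}


@dataclass
class Region:
    """Hypothetical record for one crystallisable region."""
    region_id: str
    solved: bool       # already solved to at least 95% sequence identity
    tm_helices: int    # predicted transmembrane helices
    xtal_class: int    # predicted crystallisation class, 1..5


def categorise(region: Region) -> str:
    """Assign a region to 'solved', 'IMP', or a crystallisation class."""
    if region.solved:
        return "solved"
    if region.tm_helices >= 3:  # threshold stated in the abstract
        return "IMP"
    return CLASS_LABELS[region.xtal_class]


def percent(count: int, total: int = 40_000) -> int:
    """Share of the ~40,000 crystallisable regions, as a rounded percentage."""
    return round(100 * count / total)


# Approximate counts quoted in the abstract.
shares = {
    "solved": percent(6_000),
    "optimal with template": percent(750),
    "very difficult with template": percent(10_500),
    "IMP": percent(2_500),
}
```

Calling `categorise(Region("r1", solved=False, tm_helices=4, xtal_class=1))` returns `"IMP"`, because in this sketch the membrane check takes precedence over the crystallisation class. That precedence is itself an assumption, suggested by the abstract's separate reporting of 'non-IMP' regions.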
