Bridging the Gap between Sequence and Structure Classifications of Proteins with AlphaFold Models

IF 4.7 | Zone 2 (Biology) | Q1 Biochemistry & Molecular Biology | Journal of Molecular Biology | Pub Date: 2024-08-26 | DOI: 10.1016/j.jmb.2024.168764

Abstract

Classification of protein domains based on homology and structural similarity serves as a fundamental tool to gain biological insights into protein function. Recent advancements in protein structure prediction, exemplified by AlphaFold, have revolutionized the availability of protein structural data. We focus on classifying about 9000 Pfam families into ECOD (Evolutionary Classification of Domains) by using predicted AlphaFold models and the DPAM (Domain Parser for AlphaFold Models) tool. Our results offer insights into their homologous relationships and domain boundaries. More than half of these Pfam families contain DPAM domains that can be confidently assigned to the ECOD hierarchy. Most assigned domains belong to highly populated folds such as Immunoglobulin-like (IgL), Armadillo (ARM), helix-turn-helix (HTH), and Src homology 3 (SH3). A large fraction of DPAM domains, however, cannot be confidently assigned to ECOD homologous groups. These unassigned domains exhibit statistically different characteristics, including shorter average length, fewer secondary structure elements, and more abundant transmembrane segments. They could potentially define novel families remotely related to domains with known structures or novel superfamilies and folds. Manual scrutiny of a subset of these domains revealed an abundance of internal duplications and recurring structural motifs. Exploring sequence and structural features such as disulfide bond patterns, metal-binding sites, and enzyme active sites helped uncover novel structural folds as well as remote evolutionary relationships. By bridging the gap between sequence-based Pfam and structure-based ECOD domain classifications, our study contributes to a more comprehensive understanding of the protein universe by providing structural and functional insights into previously uncharacterized proteins.
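
The abstract reports that unassigned domains differ statistically from assigned ones (for example, in average length) but does not name the test used. As a minimal sketch of how such a comparison could be made, the block below runs a two-sided permutation test on the difference in mean domain length. The function name `permutation_test`, the two lists, and every length value are invented for illustration; they are not data or code from the paper, and the test shown is not necessarily the one the authors used.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign group labels by shuffling the pooled values.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Toy domain lengths in residues: made-up values, NOT taken from the paper.
assigned_lengths = [142, 180, 95, 210, 165, 130, 175, 220, 150, 190]
unassigned_lengths = [60, 85, 72, 110, 55, 90, 78, 65, 100, 70]

observed_diff, p_value = permutation_test(assigned_lengths, unassigned_lengths)
print(f"mean difference = {observed_diff:.1f} residues, p ~ {p_value:.4f}")
```

A permutation test is used here because it makes no assumption about the shape of the length distribution; the same function could be applied to other per-domain features mentioned in the abstract, such as counts of secondary structure elements.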

Source journal: Journal of Molecular Biology (Biology: Biochemistry & Molecular Biology)
CiteScore: 11.30
Self-citation rate: 1.80%
Articles published: 412
Review time: 28 days
About the journal: Journal of Molecular Biology (JMB) provides high-quality, comprehensive and broad coverage in all areas of molecular biology. The journal publishes original scientific research papers that provide mechanistic and functional insights and report a significant advance to the field. The journal encourages the submission of multidisciplinary studies that use complementary experimental and computational approaches to address challenging biological questions. Research areas include but are not limited to: Biomolecular interactions, signaling networks, systems biology; Cell cycle, cell growth, cell differentiation; Cell death, autophagy; Cell signaling and regulation; Chemical biology; Computational biology, in combination with experimental studies; DNA replication, repair, and recombination; Development, regenerative biology, mechanistic and functional studies of stem cells; Epigenetics, chromatin structure and function; Gene expression; Membrane processes, cell surface proteins and cell-cell interactions; Methodological advances, both experimental and theoretical, including databases; Microbiology, virology, and interactions with the host or environment; Microbiota mechanistic and functional studies; Nuclear organization; Post-translational modifications, proteomics; Processing and function of biologically important macromolecules and complexes; Molecular basis of disease; RNA processing, structure and functions of non-coding RNAs, transcription; Sorting, spatiotemporal organization, trafficking; Structural biology; Synthetic biology; Translation, protein folding, chaperones, protein degradation and quality control.