Author: Peter Røgen
DOI: 10.1002/prot.26753 (https://doi.org/10.1002/prot.26753)
Published: 2024-10-11 (journal article)
Sequence-Similar Protein Domain Pairs With Structural or Topological Dissimilarity.
For a variety of applications, protein structures are clustered by sequence similarity, and sequence-redundant structures are disregarded. Sequence-similar chains are likely to have similar structures, but significant structural variation, as measured by RMSD, has been documented for sequence-similar chains and usually found to have a functional explanation. Moving two neighboring stretches of backbone through each other may change the chain topology and alter the possible folding paths. The size of this motion is comparable to the variation in a flexible loop. We search for and find domains with alternate chain topology in CATH4.2 sequence families, relatively independent of sequence identity and of structural similarity as measured by RMSD. Structural, topological, and functional representative sets should therefore keep sequence-similar domains not just with structural variation but also with topological variation. We present BCAlign, which finds an Alignment and superposition of protein Backbone Curves by optimizing a user-chosen convex combination of structural deviation and deviation between the structure-based sequence alignment and an input sequence alignment. Steric and topological obstructions to deforming one curve into an aligned curve are then found by a previously developed algorithm. For highly sequence-similar domains, sequence-based structural alignment better represents the chain's motion and generally reveals larger structural and topological variation than structure-based alignment does. Fold-switching protein pairs have been reported to be most frequent between X-ray and NMR structures and are estimated to be underrepresented in the PDB, as the alternate configuration is harder to resolve. Here we similarly find that chain topology is most frequently altered between X-ray and NMR structures.
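To make the idea of a user-chosen convex combination concrete, the following is a minimal sketch and not BCAlign's actual implementation. It assumes the two chains are already superposed, measures structural deviation as RMSD over the aligned residue pairs, and measures alignment deviation as the fraction of input-alignment pairs that the structure-based alignment does not reproduce. The function names, the `weight` parameter, and both deviation definitions are illustrative assumptions; the abstract does not specify them.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) points.

    Assumption: the points are already superposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(
        (a - b) ** 2
        for p, q in zip(coords_a, coords_b)
        for a, b in zip(p, q)
    )
    return math.sqrt(total / len(coords_a))


def alignment_deviation(structural_pairs, input_pairs):
    """Fraction of input-alignment pairs absent from the structural alignment.

    Hypothetical measure chosen for illustration only.
    """
    input_set = set(input_pairs)
    if not input_set:
        return 0.0
    return len(input_set - set(structural_pairs)) / len(input_set)


def combined_score(coords_a, coords_b, structural_pairs, input_pairs, weight):
    """Convex combination: weight * structural + (1 - weight) * alignment term."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1] for a convex combination")
    structural = rmsd(
        [coords_a[i] for i, _ in structural_pairs],
        [coords_b[j] for _, j in structural_pairs],
    )
    return weight * structural + (1.0 - weight) * alignment_deviation(
        structural_pairs, input_pairs
    )


# Toy data: the third residue of chain B is displaced by 2 units along z.
chain_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
chain_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 2.0)]
sequence_alignment = [(0, 0), (1, 1), (2, 2)]   # input sequence alignment
structure_only = [(0, 0), (1, 1)]               # drops the displaced residue

score_structure = combined_score(chain_a, chain_b, structure_only, sequence_alignment, 0.5)
score_sequence = combined_score(chain_a, chain_b, sequence_alignment, sequence_alignment, 0.5)
```

In this toy case the alignment that drops the displaced residue has zero RMSD but pays an alignment penalty, while the alignment that follows the input sequence alignment pays no penalty but exposes the larger structural deviation. Note that the sketch adds a length to a unitless fraction without rescaling; a real objective would need the two terms on comparable scales.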