Tandem Repeats Provide Evidence for Convergent Evolution to Similar Protein Structures.

Genome Biology and Evolution (IF 2.8; Zone 2, Biology; JCR Q2, Evolutionary Biology). Pub Date: 2025-02-03. DOI: 10.1093/gbe/evaf013
Erik S Wright
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11812678/pdf/
Citations: 0

Abstract

Homology is a key concept underpinning the comparison of sequences across organisms. Sequence-level homology is based on a statistical framework optimized over decades of work. Recently, computational protein structure prediction has enabled large-scale homology inference beyond the limits of accurate sequence alignment. In this regime, it is possible to observe nearly identical protein structures lacking detectable sequence similarity. In the absence of a robust statistical framework for structure comparison, it is largely assumed similar structures are homologous. However, it is conceivable that matching structures could arise through convergent evolution, resulting in analogous proteins without shared ancestry. Large databases of predicted structures offer a means of determining whether analogs are present among structure matches. Here, I find that a small subset (∼2.6%) of Foldseek clusters lack sequence-level support for homology, including ∼1% of strong structure matches with template modeling score ≥ 0.5. This result by itself does not imply these structure pairs are nonhomologous, since their sequences could have diverged beyond the limits of recognition. Yet, strong matches without sequence-level support for homology are enriched in structures with predicted repeats that could induce spurious matches. Some of these structural repeats are underpinned by sequence-level tandem repeats in both matching structures. I show that many of these tandem repeat units have genealogies inconsistent with their corresponding structures sharing a common ancestor, implying these highly similar structure pairs are analogous rather than homologous. This result suggests caution is warranted when inferring homology from structural resemblance alone in the absence of sequence-level support for homology.
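
The abstract's cutoff for a "strong" structure match is a template modeling score (TM-score) of at least 0.5. As a rough illustration of what that number measures, the sketch below evaluates the standard TM-score formula for one fixed superposition, given the distances between aligned residue pairs. It is not the paper's pipeline or Foldseek's implementation: the function names, the 0.5 Å floor on d0, and the example inputs are assumptions made here for illustration, and a real TM-score is the maximum of this quantity over all superpositions.

```python
def tm_d0(l_target: int) -> float:
    """Length-dependent distance scale d0 (angstroms) from the standard TM-score definition."""
    # The cube-root formula is only meaningful for chains longer than 15 residues.
    # Flooring d0 at 0.5 for short chains is an assumption of this sketch.
    if l_target <= 15:
        return 0.5
    return max(0.5, 1.24 * (l_target - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, l_target: int) -> float:
    """TM-score for ONE fixed superposition.

    distances: distances (angstroms) between aligned residue pairs.
    l_target:  length of the structure used for normalization.
    Unaligned residues contribute nothing, so partial alignments score lower.
    """
    d0 = tm_d0(l_target)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / l_target


def is_strong_match(score: float, threshold: float = 0.5) -> bool:
    """Apply the TM-score >= 0.5 cutoff quoted in the abstract."""
    return score >= threshold


if __name__ == "__main__":
    # Hypothetical inputs: 100-residue target, 80 residues aligned at 2 angstroms each.
    score = tm_score([2.0] * 80, 100)
    print(round(score, 3), is_strong_match(score))
```

Because the score is normalized by the target length rather than by the number of aligned residues, a short but well-superposed region cannot reach the cutoff by itself, whereas a repeated structural unit that aligns along most of the chain can.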

Source journal: Genome Biology and Evolution
Subject categories: Evolutionary Biology; Genetics & Heredity
CiteScore: 5.80
Self-citation rate: 6.10%
Articles published: 169
Review time: 1 month
About the journal: Genome Biology and Evolution (GBE) publishes leading original research at the interface between evolutionary biology and genomics. Papers considered for publication report novel evolutionary findings that concern natural genome diversity, population genomics, the structure, function, organisation and expression of genomes, comparative genomics, proteomics, and environmental genomic interactions. Major evolutionary insights from the fields of computational biology, structural biology, developmental biology, and cell biology are also considered, as are theoretical advances in the field of genome evolution. GBE's scope embraces genome-wide evolutionary investigations at all taxonomic levels and for all forms of life, within populations or across domains. Its aims are to further the understanding of genomes in their evolutionary context and further the understanding of evolution from a genome-wide perspective.