High fitness paths can connect proteins with low sequence overlap.

ArXiv Pub Date : 2024-11-13
Pranav Kantroo, Günter P Wagner, Benjamin B Machta
Journal: ArXiv
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11601789/pdf/

Abstract

The structure and function of a protein are determined by its amino acid sequence. While random mutations change a protein's sequence, evolutionary forces shape its structural fold and biological activity. Studies have shown that neutral networks can connect a local region of sequence space through single-residue mutations that preserve viability. However, the larger-scale connectedness of protein morphospace remains poorly understood. Recent advances in artificial intelligence make it possible to computationally predict a protein's structure and quantify its functional plausibility. Here we build on these tools to develop an algorithm that generates viable paths between distantly related extant protein pairs. Consecutive sequences along a path differ by a single-residue change: substitutions, insertions, and deletions are all admissible moves. The fitness of each intermediate is evaluated using the protein language model ESM2 and kept as high as possible subject to the constraints of the traversal. We document the qualitative variation across paths generated between progressively divergent protein pairs, some of which do not even acquire the same structural fold. The ease of interpolating between two sequences could be used as a proxy for the likelihood of homology between them.
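
To make the idea of a single-residue path concrete, the following is a minimal sketch and not the authors' algorithm, whose details the abstract does not give. It assumes a greedy rule: at each step, among all single-residue edits that bring the sequence one edit closer to the target, take the one with the highest score. The names `greedy_path`, `single_edit_neighbors`, and `toy_fitness` are hypothetical, and `toy_fitness` is an arbitrary stand-in for an ESM2-based score. Both sequences are assumed to use only the 20 standard amino acids.

```python
from typing import Callable, List, Set

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting substitutions, insertions, and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def single_edit_neighbors(seq: str) -> Set[str]:
    """All sequences reachable from seq by exactly one residue change."""
    out: Set[str] = set()
    for i in range(len(seq)):
        out.add(seq[:i] + seq[i + 1:])                  # deletion
        for aa in AMINO_ACIDS:
            if aa != seq[i]:
                out.add(seq[:i] + aa + seq[i + 1:])     # substitution
    for i in range(len(seq) + 1):
        for aa in AMINO_ACIDS:
            out.add(seq[:i] + aa + seq[i:])             # insertion
    return out


def toy_fitness(seq: str) -> float:
    """Placeholder score (fraction of hydrophobic residues).
    Hypothetical stand-in only; the paper scores sequences with ESM2."""
    return sum(c in "AILMFVWY" for c in seq) / max(len(seq), 1)


def greedy_path(start: str, target: str,
                fitness: Callable[[str], float]) -> List[str]:
    """Greedy assumption: each step moves one edit closer to the target
    and picks the highest-scoring candidate among such moves."""
    path = [start]
    current = start
    remaining = levenshtein(current, target)
    while remaining > 0:
        candidates = [s for s in single_edit_neighbors(current)
                      if levenshtein(s, target) == remaining - 1]
        # sorted() makes tie-breaking deterministic
        current = max(sorted(candidates), key=fitness)
        path.append(current)
        remaining -= 1
    return path


if __name__ == "__main__":
    for step, seq in enumerate(greedy_path("MKTAYIAK", "MKVLAAGIK", toy_fitness)):
        print(step, seq, round(toy_fitness(seq), 3))
```

Because every step is forced to reduce the edit distance, the path has the minimum possible length. A search that is also allowed to take detours through higher-scoring sequences would need a different strategy. The exhaustive neighbor scan is only practical for short toy sequences such as these.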
