Arya Changiarath, Aayush Arya, Vasileios A. Xenidis, Jan Padeken and Lukas S. Stelzl
Faraday Discussions, volume 256, pages 235-254. Published 2024-08-03. DOI: 10.1039/D4FD00099D. https://pubs.rsc.org/en/content/articlelanding/2025/fd/d4fd00099d
Sequence determinants of protein phase separation and recognition by protein phase-separated condensates through molecular dynamics and active learning†
Elucidating how protein sequence determines the properties of disordered proteins and their phase-separated condensates is a major challenge in computational chemistry, biology, and biophysics. Quantitative molecular dynamics simulations and the free energy values derived from them can, in principle, capture how a sequence encodes the chemical and biological properties of a protein. These calculations are, however, computationally demanding, even after the representation is reduced by coarse-graining, and exploring the large space of potentially relevant sequences remains a formidable task. We employ an “active learning” scheme introduced by Yang et al. (bioRxiv, 2022, https://doi.org/10.1101/2022.08.05.502972), in which a neural network-based model suggests the most useful examples for the next training cycle, to reduce the number of labelled examples needed from simulations. Applying this Bayesian optimisation framework, we determine properties of protein sequences with coarse-grained molecular dynamics, which enables the network to establish sequence–property relationships for disordered proteins, their self-interactions, and their interactions in phase-separated condensates. We show how iterative training with second virial coefficients derived from simulations of disordered protein sequences leads to a rapid improvement in predicting peptide self-interactions. We employ this Bayesian approach to search efficiently for new sequences that bind to condensates of the disordered C-terminal domain (CTD) of RNA Polymerase II, by simulating the molecular recognition of peptides by phase-separated condensates in coarse-grained molecular dynamics. By searching for protein sequences that prefer to self-interact rather than interact with another protein sequence, we are able to shape the morphology of protein condensates and design multiphasic protein condensates.
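
The abstract uses second virial coefficients as the training label for peptide self-interactions. As a rough illustration only, the sketch below evaluates the textbook relation B22 = -2π ∫ (exp(-w(r)/kT) - 1) r² dr for a tabulated potential of mean force w(r). It assumes w(r) is already available on a radial grid. The abstract does not say how the authors extract B22 from their simulations, so the function name, the reduced units, and the toy potentials here are all hypothetical.

```python
import numpy as np


def second_virial_coefficient(r, w, kT=1.0):
    """Estimate B22 from a tabulated potential of mean force w(r).

    Uses B22 = -2*pi * integral of (exp(-w/kT) - 1) * r**2 dr, evaluated
    with the trapezoidal rule on the supplied grid. The grid is assumed to
    start at (or near) r = 0 and to extend to where w(r) has decayed to zero.
    """
    r = np.asarray(r, dtype=float)
    w = np.asarray(w, dtype=float)
    mayer = np.exp(-w / kT) - 1.0          # Mayer f-function
    integrand = mayer * r**2
    # Explicit trapezoidal rule, so the sketch does not depend on NumPy version.
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(r))
    return -2.0 * np.pi * integral


# Toy potentials in reduced units (hypothetical, not taken from the paper).
r_grid = np.linspace(0.0, 3.0, 30001)
sigma = 1.0

# Purely repulsive hard core: B22 should be positive, close to 2*pi*sigma**3/3.
w_hard = np.where(r_grid < sigma, np.inf, 0.0)
b22_hard = second_virial_coefficient(r_grid, w_hard)

# Hard core plus an attractive square well of depth 2 kT: B22 becomes negative,
# the signature of net attractive self-interaction.
w_sticky = np.where(r_grid < sigma, np.inf,
                    np.where(r_grid < 1.5 * sigma, -2.0, 0.0))
b22_sticky = second_virial_coefficient(r_grid, w_sticky)

print("repulsive B22:", b22_hard)
print("attractive B22:", b22_sticky)
```

The sign is the useful part for sequence screening: a negative B22 indicates net attraction between two chains and a positive one indicates net repulsion, which is why a single scalar of this kind can serve as a label in an active learning loop.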