Hans-Christof Gasser, Diego A. Oyarzún, Ajitha Rajan, Javier Antonio Alfaro
Immunoinformatics (Amsterdam, Netherlands), Volume 14, Article 100035. Journal article, published 2024-05-07. DOI: 10.1016/j.immuno.2024.100035
Citations: 0
Guiding a language-model based protein design method towards MHC Class-I immune-visibility targets in vaccines and therapeutics

Abstract
Proteins have an arsenal of medical applications that include disrupting protein interactions, acting as potent vaccines, and replacing genetically deficient proteins. While therapeutics must avoid triggering unwanted immune responses, vaccines should support a robust immune reaction targeting a broad range of pathogen variants. Therefore, computational methods that modify a protein's immunogenicity without disrupting its function are needed. Although many components of the immune system can be involved in a reaction, we focus on cytotoxic T-lymphocytes (CTLs). These target short peptides presented via the MHC Class I (MHC-I) pathway. To explore the limits of modifying the visibility of those peptides to CTLs within the distribution of naturally occurring sequences, we developed a novel machine learning technique, CAPE-XVAE. It combines a language model with reinforcement learning to modify a protein's immune-visibility. Our results show that CAPE-XVAE effectively modifies the visibility of the HIV Nef protein to CTLs. We contrast CAPE-XVAE with CAPE-Packer, a physics-based method we also developed. Compared to CAPE-Packer, the machine learning approach suggests sequences that draw upon local sequence similarities in the training set. This is beneficial for vaccine development, where the sequence should be representative of the real viral population. Additionally, the language model approach holds promise for preserving both known and unknown functional constraints, which is essential for the immune-modulation of therapeutic proteins. In contrast, CAPE-Packer emphasizes preserving the protein's overall fold and can reach greater extremes of immune-visibility, but falls short of capturing the sequence diversity of the viral variants available to learn from. Source code: https://github.com/hcgasser/CAPE (Tag: v1.1)
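The abstract speaks of a protein's MHC-I "immune-visibility" and of steering it up (vaccines) or down (therapeutics) without giving a formula. The sketch below is a minimal illustration under stated assumptions, not the CAPE implementation: it assumes visibility can be approximated as the number of distinct 8 to 10 residue peptides that a presentation predictor flags as presented, and that a single reward can serve both goals by flipping its sign. The names `candidate_peptides`, `immune_visibility`, `visibility_reward` and `toy_a0201` are hypothetical. `toy_a0201` is a toy rule that loosely mimics the HLA-A*02:01 anchor motif (L or M at position 2, V or L at the C-terminus); in practice a trained MHC-I presentation predictor for the relevant genotype would take its place.

```python
from typing import Callable, Iterable, Set

# A predictor maps a peptide to "presented / not presented".
# Assumption: real use would plug in a trained MHC-I presentation predictor.
PresentationPredictor = Callable[[str], bool]


def toy_a0201(peptide: str) -> bool:
    """Hypothetical toy rule loosely based on the HLA-A*02:01 anchors; NOT a real predictor."""
    return len(peptide) >= 2 and peptide[1] in "LM" and peptide[-1] in "VL"


def candidate_peptides(protein: str, lengths: Iterable[int] = (8, 9, 10)) -> Set[str]:
    """Return every distinct sub-peptide of the given lengths (MHC-I ligands are short)."""
    peptides: Set[str] = set()
    for k in lengths:
        # range() is empty when the protein is shorter than k, so no peptides are added.
        for start in range(len(protein) - k + 1):
            peptides.add(protein[start:start + k])
    return peptides


def immune_visibility(protein: str, is_presented: PresentationPredictor,
                      lengths: Iterable[int] = (8, 9, 10)) -> int:
    """Assumed visibility score: count of distinct peptides predicted to be presented."""
    return sum(1 for p in candidate_peptides(protein, lengths) if is_presented(p))


def visibility_reward(protein: str, is_presented: PresentationPredictor, goal: str) -> float:
    """Hypothetical reward: raise visibility for vaccines, lower it for therapeutics."""
    v = float(immune_visibility(protein, is_presented))
    if goal == "immunize":
        return v
    if goal == "deimmunize":
        return -v
    raise ValueError(f"unknown goal: {goal!r}")


if __name__ == "__main__":
    seq = "MLAAAAAAV"  # made-up 9-residue sequence for illustration only
    print(immune_visibility(seq, toy_a0201))                  # 1 (only the 9-mer matches)
    print(visibility_reward(seq, toy_a0201, "deimmunize"))    # -1.0
```

Counting distinct peptides (a set) means a repeated peptide contributes only once; whether repeats should count separately is a modelling choice this sketch does not settle.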