Romanos Fasoulis, Mauricio Menegatti Rigo, Dinler Amaral Antunes, Georgios Paliouras, Lydia E. Kavraki
Immunoinformatics (Amsterdam, Netherlands), Volume 13, Article 100030. Published 2023-12-21. DOI: 10.1016/j.immuno.2023.100030. Available at https://www.sciencedirect.com/science/article/pii/S2667119023000101
Transfer learning improves pMHC kinetic stability and immunogenicity predictions
The cellular immune response comprises several processes, most notably the binding of a peptide to the Major Histocompatibility Complex (MHC), the presentation of the peptide-MHC (pMHC) complex on the cell surface, and the recognition of the pMHC by the T-cell receptor. Identifying the most potent peptide targets for MHC binding, presentation, and T-cell recognition is vital for developing peptide-based vaccines and T-cell-based immunotherapies. Data-driven tools that predict each of these steps have been developed, and the availability of mass spectrometry (MS) datasets has facilitated the development of accurate Machine Learning (ML) methods for class-I pMHC binding prediction. However, the accuracy of ML-based tools for pMHC kinetic stability prediction and peptide immunogenicity prediction is uncertain, because stability and immunogenicity datasets are scarce. Here, we use transfer learning techniques to improve stability and immunogenicity predictions by taking advantage of the large number of available binding affinity and MS datasets. The resulting models, TLStab and TLImm, perform comparably to or better than state-of-the-art approaches on several stability and immunogenicity test sets, respectively. Our approach demonstrates the promise of learning from the task of peptide binding to improve predictions on downstream tasks. The source code of TLStab and TLImm is publicly available at https://github.com/KavrakiLab/TL-MHC.
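The general idea of transferring knowledge from a data-rich binding task to a data-poor stability task can be illustrated with a toy sketch. This is not the TLStab or TLImm implementation (see the repository above for that); it assumes a simple linear model over one-hot encoded 9-mers, fully synthetic labels, and a made-up relationship in which "stability" is a scaled, noisy copy of "binding". All names (`encode`, `train`, `sample`, `hidden`) are hypothetical and chosen for illustration only.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PEPTIDE_LEN = 9
N_FEATS = PEPTIDE_LEN * len(AMINO_ACIDS)

def encode(peptide):
    """Return the indices of the active one-hot features (one per position)."""
    return [pos * len(AMINO_ACIDS) + AMINO_ACIDS.index(aa)
            for pos, aa in enumerate(peptide)]

def predict(weights, feats):
    """Linear model: sum the weights of the active features."""
    return sum(weights[i] for i in feats)

def mse(weights, data):
    """Mean squared error of the model on (features, label) pairs."""
    return sum((predict(weights, f) - y) ** 2 for f, y in data) / len(data)

def train(weights, data, epochs, lr=0.05):
    """Full-batch gradient descent on MSE, starting from the given weights."""
    w = list(weights)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for feats, y in data:
            err = predict(w, feats) - y
            for i in feats:
                grad[i] += 2 * err / len(data)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

# Synthetic data (assumption): hidden per-position residue preferences drive
# both tasks; the target label is a scaled, noisy version of the source label.
rng = random.Random(0)
hidden = [rng.gauss(0, 1) for _ in range(N_FEATS)]

def sample(n, scale, noise):
    data = []
    for _ in range(n):
        peptide = "".join(rng.choice(AMINO_ACIDS) for _ in range(PEPTIDE_LEN))
        feats = encode(peptide)
        data.append((feats, scale * predict(hidden, feats) + rng.gauss(0, noise)))
    return data

binding = sample(500, 1.0, 0.1)    # large source task ("binding")
stability = sample(20, 0.8, 0.1)   # small target task ("stability")
held_out = sample(200, 0.8, 0.1)   # target-task evaluation set

zeros = [0.0] * N_FEATS
pretrained = train(zeros, binding, epochs=200)         # learn on the source task
transferred = train(pretrained, stability, epochs=20)  # fine-tune on the target
scratch = train(zeros, stability, epochs=20)           # baseline without transfer

print("held-out MSE, transfer:", round(mse(transferred, held_out), 3))
print("held-out MSE, scratch: ", round(mse(scratch, held_out), 3))
```

The only difference between the two target-task models is the starting point of `train`: the transferred model starts from weights learned on the large source set, while the baseline starts from zeros. In this toy setting the benefit depends entirely on the assumed correlation between the two tasks; real gains would need to be measured on real stability and immunogenicity data.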