用于预测 T 细胞受体与多肽结合的注意力网络可将注意力与可解释的蛋白质结构特性联系起来。

IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Frontiers in bioinformatics Pub Date : 2023-12-18 eCollection Date: 2023-01-01 DOI:10.3389/fbinf.2023.1274599

Kyohei Koyama, Kosuke Hashimoto, Chioko Nagao, Kenji Mizuguchi

{"title":"用于预测 T 细胞受体与多肽结合的注意力网络可将注意力与可解释的蛋白质结构特性联系起来。","authors":"Kyohei Koyama, Kosuke Hashimoto, Chioko Nagao, Kenji Mizuguchi","doi":"10.3389/fbinf.2023.1274599","DOIUrl":null,"url":null,"abstract":"Understanding how a T-cell receptor (TCR) recognizes its specific ligand peptide is crucial for gaining an insight into biological functions and disease mechanisms. Despite its importance, experimentally determining TCR-peptide-major histocompatibility complex (TCR-pMHC) interactions is expensive and time-consuming. To address this challenge, computational methods have been proposed, but they are typically evaluated by internal retrospective validation only, and few researchers have incorporated and tested an attention layer from language models into structural information. Therefore, in this study, we developed a machine learning model based on a modified version of Transformer, a source-target attention neural network, to predict the TCR-pMHC interaction solely from the amino acid sequences of the TCR complementarity-determining region (CDR) 3 and the peptide. This model achieved competitive performance on a benchmark dataset of the TCR-pMHC interaction, as well as on a truly new external dataset. Additionally, by analyzing the results of binding predictions, we associated the neural network weights with protein structural properties. By classifying the residues into large- and small-attention groups, we identified statistically significant properties associated with the largely attended residues such as hydrogen bonds within CDR3. The dataset that we created and the ability of our model to provide an interpretable prediction of TCR-peptide binding should increase our knowledge about molecular recognition and pave the way for designing new therapeutics.","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"3 ","pages":"1274599"},"PeriodicalIF":2.8000,"publicationDate":"2023-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10759225/pdf/","citationCount":"0","resultStr":"{\"title\":\"Attention network for predicting T-cell receptor-peptide binding can associate attention with interpretable protein structural properties.\",\"authors\":\"Kyohei Koyama, Kosuke Hashimoto, Chioko Nagao, Kenji Mizuguchi\",\"doi\":\"10.3389/fbinf.2023.1274599\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Understanding how a T-cell receptor (TCR) recognizes its specific ligand peptide is crucial for gaining an insight into biological functions and disease mechanisms. Despite its importance, experimentally determining TCR-peptide-major histocompatibility complex (TCR-pMHC) interactions is expensive and time-consuming. To address this challenge, computational methods have been proposed, but they are typically evaluated by internal retrospective validation only, and few researchers have incorporated and tested an attention layer from language models into structural information. Therefore, in this study, we developed a machine learning model based on a modified version of Transformer, a source-target attention neural network, to predict the TCR-pMHC interaction solely from the amino acid sequences of the TCR complementarity-determining region (CDR) 3 and the peptide. This model achieved competitive performance on a benchmark dataset of the TCR-pMHC interaction, as well as on a truly new external dataset. Additionally, by analyzing the results of binding predictions, we associated the neural network weights with protein structural properties. By classifying the residues into large- and small-attention groups, we identified statistically significant properties associated with the largely attended residues such as hydrogen bonds within CDR3. The dataset that we created and the ability of our model to provide an interpretable prediction of TCR-peptide binding should increase our knowledge about molecular recognition and pave the way for designing new therapeutics.\",\"PeriodicalId\":73066,\"journal\":{\"name\":\"Frontiers in bioinformatics\",\"volume\":\"3 \",\"pages\":\"1274599\"},\"PeriodicalIF\":2.8000,\"publicationDate\":\"2023-12-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10759225/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Frontiers in bioinformatics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3389/fbinf.2023.1274599\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2023/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q2\",\"JCRName\":\"MATHEMATICAL & COMPUTATIONAL BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in bioinformatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3389/fbinf.2023.1274599","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2023/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}

引用次数: 0

摘要

了解 T 细胞受体（TCR）如何识别其特定的配体肽对于深入了解生物功能和疾病机制至关重要。尽管很重要，但通过实验确定 TCR-肽-主要组织相容性复合体（TCR-pMHC）之间的相互作用既昂贵又耗时。为了应对这一挑战，人们提出了一些计算方法，但这些方法通常只通过内部回顾验证进行评估，很少有研究人员将语言模型的注意力层纳入结构信息并进行测试。因此，在本研究中，我们开发了一种基于源-目标注意神经网络 Transformer 改进版的机器学习模型，仅从 TCR 互补性决定区（CDR）3 和多肽的氨基酸序列预测 TCR-pMHC 相互作用。该模型在TCR-pMHC相互作用的基准数据集以及全新的外部数据集上都取得了具有竞争力的性能。此外，通过分析结合预测的结果，我们将神经网络权重与蛋白质结构特性联系起来。通过将残基分为大关注度组和小关注度组，我们发现了与大关注度残基（如 CDR3 中的氢键）相关的具有统计学意义的特性。我们创建的数据集和我们的模型能够提供可解释的 TCR 肽结合预测，这将增加我们对分子识别的了解，并为设计新疗法铺平道路。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Attention network for predicting T-cell receptor-peptide binding can associate attention with interpretable protein structural properties.

Understanding how a T-cell receptor (TCR) recognizes its specific ligand peptide is crucial for gaining an insight into biological functions and disease mechanisms. Despite its importance, experimentally determining TCR-peptide-major histocompatibility complex (TCR-pMHC) interactions is expensive and time-consuming. To address this challenge, computational methods have been proposed, but they are typically evaluated by internal retrospective validation only, and few researchers have incorporated and tested an attention layer from language models into structural information. Therefore, in this study, we developed a machine learning model based on a modified version of Transformer, a source-target attention neural network, to predict the TCR-pMHC interaction solely from the amino acid sequences of the TCR complementarity-determining region (CDR) 3 and the peptide. This model achieved competitive performance on a benchmark dataset of the TCR-pMHC interaction, as well as on a truly new external dataset. Additionally, by analyzing the results of binding predictions, we associated the neural network weights with protein structural properties. By classifying the residues into large- and small-attention groups, we identified statistically significant properties associated with the largely attended residues such as hydrogen bonds within CDR3. The dataset that we created and the ability of our model to provide an interpretable prediction of TCR-peptide binding should increase our knowledge about molecular recognition and pave the way for designing new therapeutics.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Frontiers in bioinformatics

CiteScore

2.60

自引率

0.00%

发文量