Attention network for predicting T-cell receptor-peptide binding can associate attention with interpretable protein structural properties.

Frontiers in Bioinformatics (IF 2.8; JCR Q2, Mathematical & Computational Biology). Published: 2023-12-18; eCollection date: 2023-01-01. DOI: 10.3389/fbinf.2023.1274599
Kyohei Koyama, Kosuke Hashimoto, Chioko Nagao, Kenji Mizuguchi
Citations: 0

Abstract

Understanding how a T-cell receptor (TCR) recognizes its specific ligand peptide is crucial for gaining insight into biological functions and disease mechanisms. However, experimentally determining TCR-peptide-major histocompatibility complex (TCR-pMHC) interactions is expensive and time-consuming. Computational methods have been proposed to address this challenge, but they are typically evaluated only by internal retrospective validation, and few researchers have incorporated an attention layer from language models and tested it against structural information. In this study, we therefore developed a machine learning model based on a modified Transformer, a source-target attention neural network, to predict TCR-pMHC interactions solely from the amino acid sequences of the TCR complementarity-determining region 3 (CDR3) and the peptide. The model achieved competitive performance on a benchmark TCR-pMHC interaction dataset as well as on a truly new external dataset. Additionally, by analyzing the binding predictions, we associated the network's attention weights with protein structural properties. By classifying residues into large- and small-attention groups, we identified statistically significant properties of the highly attended residues, such as hydrogen bonds within CDR3. The dataset we created and our model's ability to provide interpretable predictions of TCR-peptide binding should advance our understanding of molecular recognition and pave the way for designing new therapeutics.
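The abstract describes the model only as a modified Transformer with source-target attention between the CDR3 and the peptide. To show what source-target (cross-) attention computes, here is a minimal pure-Python sketch of single-head scaled dot-product attention. Everything specific in it is an assumption, not the authors' implementation: the peptide acting as the query side, one-hot residue encodings, untrained random projection matrices, `d_k=8`, and all function names (`source_target_attention`, `attention_per_source_residue`, etc.) are hypothetical.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(seq):
    """Encode a sequence as 20-dim one-hot vectors (unknown letters -> zeros)."""
    return [[1.0 if aa == ref else 0.0 for ref in AMINO_ACIDS] for aa in seq]


def random_matrix(rows, cols, rng):
    """Hypothetical projection weights; a trained model would learn these."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def matmul(a, b):
    """Multiply an n x m matrix by an m x p matrix (both given as lists of rows)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def source_target_attention(target_seq, source_seq, d_k=8, seed=0):
    """Single-head scaled dot-product attention: target queries, source keys/values.

    Returns (attention, context): attention[i][j] is the weight target residue i
    places on source residue j (each row sums to 1); context[i] is the
    attention-weighted sum of source value vectors for target residue i.
    """
    rng = random.Random(seed)  # fixed seed so the untrained weights are reproducible
    w_q = random_matrix(len(AMINO_ACIDS), d_k, rng)
    w_k = random_matrix(len(AMINO_ACIDS), d_k, rng)
    w_v = random_matrix(len(AMINO_ACIDS), d_k, rng)
    q = matmul(one_hot(target_seq), w_q)  # len(target) x d_k
    k = matmul(one_hot(source_seq), w_k)  # len(source) x d_k
    v = matmul(one_hot(source_seq), w_v)  # len(source) x d_k
    scale = math.sqrt(d_k)
    scores = [[sum(qi * ki for qi, ki in zip(q_row, k_row)) / scale for k_row in k]
              for q_row in q]
    attention = [softmax(row) for row in scores]
    context = matmul(attention, v)  # len(target) x d_k
    return attention, context


def attention_per_source_residue(attention):
    """Mean attention each source residue receives over all target queries."""
    return [sum(col) / len(attention) for col in zip(*attention)]


if __name__ == "__main__":
    peptide = "GILGFVFTL"      # well-known influenza A M1 epitope
    cdr3 = "CASSLAPGATNEKLFF"  # illustrative CDR3beta-like string
    attn, ctx = source_target_attention(peptide, cdr3)
    for residue, weight in zip(cdr3, attention_per_source_residue(attn)):
        print(f"{residue}: {weight:.3f}")
```

Because the projections here are random rather than trained on binding labels, the printed values carry no biological meaning; the sketch only shows the shape of the computation. Per-residue averages like these are the kind of quantity that could then be ranked to separate large- and small-attention residues.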

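The abstract reports that residues were split into large- and small-attention groups and that properties such as hydrogen bonds within CDR3 were statistically associated with the highly attended group, but it does not state the cut-off or the test. The sketch below assumes a uniform-baseline cut-off (a residue counts as "large" if it receives more than 1/L of the attention over a sequence of length L) and a two-sided Fisher's exact test on the resulting 2×2 table. The function names, the cut-off rule, and the demo attention values and hydrogen-bond flags are all hypothetical. It needs Python 3.8+ for `math.comb`.

```python
import math


def split_by_attention(attention, threshold=None):
    """Label each residue True (large attention) or False (small attention).

    Assumption: the default cut-off is the uniform baseline 1/L.
    """
    if threshold is None:
        threshold = 1.0 / len(attention)
    return [a > threshold for a in attention]


def contingency_table(is_large, has_property):
    """2x2 counts (a, b, c, d): rows = large/small attention, cols = property yes/no."""
    pairs = list(zip(is_large, has_property))
    a = sum(1 for large, prop in pairs if large and prop)
    b = sum(1 for large, prop in pairs if large and not prop)
    c = sum(1 for large, prop in pairs if not large and prop)
    d = sum(1 for large, prop in pairs if not large and not prop)
    return a, b, c, d


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = math.comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x property-positive residues in row 1,
        # with all margins fixed.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table no more probable than the observed one (small tolerance
    # guards against floating-point ties).
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


if __name__ == "__main__":
    # Hypothetical per-residue attention for a 10-residue CDR3 and hypothetical
    # flags marking residues that form a hydrogen bond within CDR3.
    attention = [0.02, 0.05, 0.18, 0.20, 0.15, 0.12, 0.09, 0.08, 0.06, 0.05]
    h_bond = [False, False, True, True, True, False, True, False, False, False]
    table = contingency_table(split_by_attention(attention), h_bond)
    print("table (a, b, c, d):", table)
    print("two-sided Fisher p-value:", fisher_exact_two_sided(*table))
```

Fisher's exact test is a conservative default for small 2×2 counts. A single ten-residue CDR3 gives very little statistical power, so a meaningful version of this analysis would pool residues across many TCR-pMHC complexes.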