Contrastive learning of T cell receptor representations.

Cell Systems · Pub Date: 2025-01-15 · Epub Date: 2025-01-07 · DOI: 10.1016/j.cels.2024.12.006
Yuta Nagano, Andrew G T Pyo, Martina Milighetti, James Henderson, John Shawe-Taylor, Benny Chain, Andreas Tiffeau-Mayer
{"title":"Contrastive learning of T cell receptor representations.","authors":"Yuta Nagano, Andrew G T Pyo, Martina Milighetti, James Henderson, John Shawe-Taylor, Benny Chain, Andreas Tiffeau-Mayer","doi":"10.1016/j.cels.2024.12.006","DOIUrl":null,"url":null,"abstract":"<p><p>Computational prediction of the interaction of T cell receptors (TCRs) and their ligands is a grand challenge in immunology. Despite advances in high-throughput assays, specificity-labeled TCR data remain sparse. In other domains, the pre-training of language models on unlabeled data has been successfully used to address data bottlenecks. However, it is unclear how to best pre-train protein language models for TCR specificity prediction. Here, we introduce a TCR language model called SCEPTR (simple contrastive embedding of the primary sequence of T cell receptors), which is capable of data-efficient transfer learning. Through our model, we introduce a pre-training strategy combining autocontrastive learning and masked-language modeling, which enables SCEPTR to achieve its state-of-the-art performance. In contrast, existing protein language models and a variant of SCEPTR pre-trained without autocontrastive learning are outperformed by sequence alignment-based methods. We anticipate that contrastive learning will be a useful paradigm to decode the rules of TCR specificity. A record of this paper's transparent peer review process is included in the supplemental information.</p>","PeriodicalId":93929,"journal":{"name":"Cell systems","volume":" ","pages":"101165"},"PeriodicalIF":0.0000,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Cell systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1016/j.cels.2024.12.006","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/7 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Computational prediction of the interaction of T cell receptors (TCRs) and their ligands is a grand challenge in immunology. Despite advances in high-throughput assays, specificity-labeled TCR data remain sparse. In other domains, the pre-training of language models on unlabeled data has been successfully used to address data bottlenecks. However, it is unclear how to best pre-train protein language models for TCR specificity prediction. Here, we introduce a TCR language model called SCEPTR (simple contrastive embedding of the primary sequence of T cell receptors), which is capable of data-efficient transfer learning. Through our model, we introduce a pre-training strategy combining autocontrastive learning and masked-language modeling, which enables SCEPTR to achieve its state-of-the-art performance. In contrast, existing protein language models and a variant of SCEPTR pre-trained without autocontrastive learning are outperformed by sequence alignment-based methods. We anticipate that contrastive learning will be a useful paradigm to decode the rules of TCR specificity. A record of this paper's transparent peer review process is included in the supplemental information.
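The pre-training strategy described above combines two objectives: an autocontrastive term, which pulls two perturbed views of the same receptor sequence together in embedding space while pushing apart other sequences in the batch, and a masked-language-modeling (MLM) term, which predicts masked residues from context. The sketch below illustrates how such a joint objective can be set up in PyTorch. It is a minimal, hypothetical illustration, not the authors' released SCEPTR implementation: the toy encoder, the SimCSE-style use of dropout to generate the two contrastive views, and all names and hyperparameters (`ToyTCREncoder`, `pretrain_step`, `tau`, `mask_prob`) are assumptions for exposition.

```python
# Minimal sketch of joint autocontrastive + MLM pre-training (illustrative only;
# not the authors' implementation). Assumes PyTorch and an encoder with dropout,
# so that two forward passes over the same input yield two distinct "views".
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyTCREncoder(nn.Module):
    """Tiny transformer encoder standing in for a TCR language model."""
    def __init__(self, vocab_size=25, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=128,
                                           dropout=0.1, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.mlm_head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        h = self.encoder(self.embed(tokens))     # (batch, length, d_model)
        return h.mean(dim=1), self.mlm_head(h)   # pooled embedding, MLM logits

def pretrain_step(model, tokens, mask_id, pad_id=0, tau=0.05, mask_prob=0.15):
    """One joint autocontrastive + masked-language-modeling loss computation."""
    # Autocontrastive term: two dropout-perturbed views of the same sequences
    # should embed close together, relative to the other sequences in the batch.
    z1, _ = model(tokens)
    z2, _ = model(tokens)                        # dropout yields a second view
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.T / tau                     # (batch, batch) similarities
    targets = torch.arange(tokens.size(0))       # positive pair = same index
    contrastive = F.cross_entropy(logits, targets)

    # MLM term: mask random residues and predict them from context.
    masked = tokens.clone()
    is_masked = (torch.rand(tokens.shape) < mask_prob) & (tokens != pad_id)
    masked[is_masked] = mask_id
    _, mlm_logits = model(masked)
    mlm = F.cross_entropy(mlm_logits[is_masked], tokens[is_masked])
    return contrastive + mlm

# Usage with toy data (token IDs 2..24; 0 = padding, 1 = mask):
model = ToyTCREncoder()
tokens = torch.randint(2, 25, (8, 20))           # batch of 8 toy sequences
loss = pretrain_step(model, tokens, mask_id=1)
loss.backward()
```

One design point this sketch makes concrete: because the positive pair is generated from the same sequence via dropout noise, the contrastive objective needs no specificity labels at all, which is what makes it usable on large unlabeled TCR repertoires.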
