Lessons learned from the IMMREP23 TCR-epitope prediction challenge

Morten Nielsen, Anne Eugster, Mathias Fynbo Jensen, Manisha Goel, Andreas Tiffeau-Mayer, Aurelien Pelissier, Sebastiaan Valkiers, María Rodríguez Martínez, Barthélémy Meynard-Piganeeau, Victor Greiff, Thierry Mora, Aleksandra M. Walczak, Giancarlo Croce, Dana L Moreno, David Gfeller, Pieter Meysman, Justin Barton
Immunoinformatics (Amsterdam, Netherlands), volume 16, Article 100045. Published 2024-09-28. DOI: 10.1016/j.immuno.2024.100045
https://www.sciencedirect.com/science/article/pii/S2667119024000156

Abstract

Here, we present the findings from IMMREP23, the second benchmark competition focused on predicting the specificity of TCR-pMHC interactions.
The interaction of T cell receptors (TCRs) with their pMHC targets is a cornerstone of the cellular immune system. Over the last decade, substantial progress has been made in TCR specificity prediction, providing proof of concept for predicting TCR-pMHC interactions in a narrow space of “seen” pMHC targets for which substantial training data are available. However, extending this predictive capability to novel “unseen” pMHC targets remains a significant challenge. Furthermore, the performance of proposed methods often does not hold up when they are evaluated outside the initial publication and data sets.
To address these issues, the IMMREP23 challenge invited participants to predict, for a given test set of TCR-pMHC pairs, the likelihood that each pair would bind. A total of 53 teams participated, providing 398 submissions.
The benchmark confirms that current methods achieve reasonable performance in the “seen” pMHC setting. However, most participating methods had close to random performance on the subset of “unseen” peptides, underlining that this prediction challenge remains essentially unsolved.
Finally, a further key lesson from the benchmark concerns data leakage. Specifically, the data set construction procedure employed in IMMREP23 led to biases in the negative test data set. These biases were identified by several participating teams and complicated the interpretation of the benchmark results. Based on these results, we put forward suggestions on how future competitions could avoid such data leakage and biases.
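
The contrast between “seen” and “unseen” peptides described above can be illustrated with a minimal sketch. The abstract does not state which metric the benchmark used, so ROC AUC computed per peptide and then averaged is an assumption here. The function names, the peptide labels (PEP_A, PEP_B) and the toy scores are hypothetical and are not taken from the IMMREP23 data.

```python
from collections import defaultdict


def roc_auc(labels, scores):
    """ROC AUC by pairwise comparison of positives and negatives (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined unless both classes are present
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_group(rows, seen_peptides):
    """Average the per-peptide AUC separately for seen and unseen peptides.

    rows: iterable of (peptide, label, score), with label 1 for a binding pair.
    seen_peptides: set of peptides assumed to be present in the training data.
    """
    by_peptide = defaultdict(lambda: ([], []))
    for peptide, label, score in rows:
        by_peptide[peptide][0].append(label)
        by_peptide[peptide][1].append(score)

    groups = {"seen": [], "unseen": []}
    for peptide, (labels, scores) in by_peptide.items():
        auc = roc_auc(labels, scores)
        if auc is not None:
            key = "seen" if peptide in seen_peptides else "unseen"
            groups[key].append(auc)
    return {k: (sum(v) / len(v) if v else None) for k, v in groups.items()}


# Toy predictions (hypothetical): one well-predicted peptide, one poorly predicted.
rows = [
    ("PEP_A", 1, 0.9), ("PEP_A", 0, 0.2), ("PEP_A", 1, 0.7), ("PEP_A", 0, 0.4),
    ("PEP_B", 1, 0.5), ("PEP_B", 0, 0.5), ("PEP_B", 1, 0.3), ("PEP_B", 0, 0.6),
]
result = auc_by_group(rows, seen_peptides={"PEP_A"})
print(result)
```

Reporting the two groups separately, rather than as one pooled score, keeps strong performance on well-represented peptides from masking near-random performance on novel ones.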
