Structural pre-training improves physical accuracy of antibody structure prediction using deep learning.

Jarosław Kończak, Bartosz Janusz, Jakub Młokosiewicz, Tadeusz Satława, Sonia Wróbel, Paweł Dudzic, Konrad Krawczyk
Immunoinformatics (Amsterdam, Netherlands), volume 11, Article 100028. Journal Article. Published 2023-09-01.
DOI: 10.1016/j.immuno.2023.100028
https://www.sciencedirect.com/science/article/pii/S2667119023000083

Abstract

The protein folding problem has recently obtained a practical solution, owing to advances in deep learning. There are classes of proteins, however, such as antibodies, that are structurally unique and for which the general solution still falls short. In particular, prediction of the CDR-H3 loop, which is instrumental to an antibody's antigen recognition, remains a challenge. Antibody-specific deep learning frameworks have been proposed to tackle this problem and have made great progress in both accuracy and speed. Often, though, the original networks produce physically implausible bond geometries that must then undergo time-consuming energy minimization. Here we hypothesized that pre-training the network on a large, augmented set of models with correct physical geometries, rather than on a small set of real antibody X-ray structures, would allow the network to learn better bond geometries. We show that fine-tuning such a pre-trained network on the task of shape prediction on real X-ray structures increases the number of correct peptide bond distances, abstracted as Cα distances. We further demonstrate that pre-training allows the network to produce physically plausible shapes on an artificial set of CDR-H3s, showing that it can generalize to the vast antibody sequence space. We hope that our strategy will benefit the development of deep learning antibody models that rapidly generate physically plausible geometries without the burden of time-consuming energy minimization.
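
The abstract measures physical plausibility by the number of correct peptide bond distances, abstracted as Cα distances. A minimal sketch of such a check is given below. It is not the authors' implementation: the reference distance of about 3.8 Å between consecutive Cα atoms (the typical value for a trans peptide bond), the 0.2 Å tolerance, and the function names `ca_distances` and `fraction_plausible` are all assumptions made for illustration.

```python
import math

# Assumed values, not taken from the paper: consecutive C-alpha atoms joined
# by a trans peptide bond are typically about 3.8 angstroms apart. The
# tolerance is purely illustrative. Cis peptide bonds give shorter distances
# and would be flagged as implausible by this simple check.
IDEAL_CA_CA = 3.8
TOLERANCE = 0.2


def ca_distances(coords):
    """Return distances between consecutive C-alpha atoms.

    coords is a list of (x, y, z) tuples in angstroms, ordered along the chain.
    """
    return [math.dist(a, b) for a, b in zip(coords, coords[1:])]


def fraction_plausible(coords, ideal=IDEAL_CA_CA, tol=TOLERANCE):
    """Fraction of consecutive C-alpha distances within tol of ideal."""
    dists = ca_distances(coords)
    if not dists:
        return 1.0  # fewer than two atoms: nothing to violate
    ok = sum(1 for d in dists if abs(d - ideal) <= tol)
    return ok / len(dists)


# Toy four-residue trace: the last step (6.0 angstroms) is too long,
# so two of the three consecutive distances count as plausible.
loop = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (7.6, 6.0, 0.0)]
print(fraction_plausible(loop))
```

A per-structure fraction like this could be compared between a network trained with and without pre-training, although the exact criterion used in the paper may differ.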

Source journal: Immunoinformatics (Amsterdam, Netherlands) (Immunology, Computer Science Applications)