Structural pre-training improves physical accuracy of antibody structure prediction using deep learning.

Immunoinformatics (Amsterdam, Netherlands), Volume 11, Article 100028. Published: 2023-09-01. DOI: 10.1016/j.immuno.2023.100028

Jarosław Kończak, Bartosz Janusz, Jakub Młokosiewicz, Tadeusz Satława, Sonia Wróbel, Paweł Dudzic, Konrad Krawczyk

Abstract

The protein folding problem has recently gained a practical solution, owing to advances in deep learning. There are, however, classes of proteins, such as antibodies, that are structurally unique and for which the general solution still falls short. In particular, prediction of the CDR-H3 loop, which is instrumental to an antibody's ability to recognize antigens, remains a challenge. Antibody-specific deep learning frameworks have been proposed to tackle this problem and have made great progress in both accuracy and speed. Often, though, the original networks produce physically implausible bond geometries that must then undergo a time-consuming energy minimization. Here we hypothesized that pre-training the network on a large, augmented set of models with correct physical geometries, rather than on a small set of real antibody X-ray structures, would allow the network to learn better bond geometries. We show that fine-tuning such a pre-trained network on shape prediction for real X-ray structures increases the number of correct peptide bond distances, abstracted as Cα distances. We further demonstrate that pre-training allows the network to produce physically plausible shapes for an artificial set of CDR-H3s, showing its ability to generalize to the vast antibody sequence space. We hope that our strategy will benefit the development of deep learning antibody models that rapidly generate physically plausible geometries without the burden of time-consuming energy minimization.
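The abstract measures physical plausibility by counting correct peptide bond distances, abstracted as Cα distances. A minimal sketch of such a check follows. It assumes the distances are taken between consecutive Cα atoms, that an ideal trans peptide Cα-Cα distance is about 3.8 Å, and that a hypothetical tolerance of 0.1 Å separates "correct" from "incorrect" bonds; the function names and the threshold are illustrative and not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]

# Assumed ideal distance between consecutive Cα atoms across a trans peptide bond (Å).
IDEAL_CA_CA = 3.8
# Hypothetical tolerance (Å); not the paper's stated threshold.
TOLERANCE = 0.1


def ca_ca_distances(ca_coords: Sequence[Coord]) -> List[float]:
    """Return the distance between each pair of consecutive Cα atoms in one chain."""
    return [math.dist(a, b) for a, b in zip(ca_coords, ca_coords[1:])]


def fraction_plausible_bonds(
    ca_coords: Sequence[Coord],
    ideal: float = IDEAL_CA_CA,
    tol: float = TOLERANCE,
) -> float:
    """Fraction of consecutive Cα-Cα distances lying within ideal ± tol.

    Returns 0.0 when the chain has fewer than two residues.
    """
    dists = ca_ca_distances(ca_coords)
    if not dists:
        return 0.0
    n_ok = sum(1 for d in dists if abs(d - ideal) <= tol)
    return n_ok / len(dists)


if __name__ == "__main__":
    # Toy CDR-H3 fragment: the last step is stretched to 4.4 Å, so 2 of 3 bonds pass.
    toy_loop = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (12.0, 0.0, 0.0)]
    print(fraction_plausible_bonds(toy_loop))
```

Comparing this fraction for a model's raw output before and after energy minimization, or between a network trained with and without pre-training, is one way to quantify the effect the abstract describes.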