Toward generalizable prediction of antibody thermostability using machine learning on sequence and structure features.

mAbs | IF 5.6 | Zone 2 (Medicine) | JCR Q1 (Medicine, Research & Experimental) | Pub Date: 2023-01-01 | DOI: 10.1080/19420862.2022.2163584
Ameya Harmalkar, Roshan Rao, Yuxuan Richard Xie, Jonas Honer, Wibke Deisting, Jonas Anlahr, Anja Hoenig, Julia Czwikla, Eva Sienz-Widmann, Doris Rau, Austin J Rice, Timothy P Riley, Danqing Li, Hannah B Catterall, Christine E Tinberg, Jeffrey J Gray, Kathy Y Wei
Citations: 3

Abstract

Over the last three decades, the appeal of monoclonal antibodies (mAbs) as therapeutics has steadily increased, as evidenced by the FDA's recent landmark approval of the 100th mAb. Unlike mAbs that bind to single targets, multispecific biologics (msAbs) have garnered particular interest owing to the advantage of engaging distinct targets. One important modular component of msAbs is the single-chain variable fragment (scFv). Despite the exquisite specificity and affinity of these scFv modules, their relatively poor thermostability often hampers their development as potential therapeutic drugs. In recent years, engineering antibody sequences to enhance their stability through mutations has gained considerable momentum. Because experimental methods for antibody engineering are time-intensive, laborious, and expensive, computational methods serve as a fast and inexpensive alternative to conventional routes. In this work, we show two machine learning approaches to better classify thermostable scFv variants from sequence: one uses pre-trained language models (PTLMs) that capture functional effects of sequence variation, and the other is a supervised convolutional neural network (CNN) trained with Rosetta energetic features. Both models are trained on temperature-specific data (TS50 measurements) derived from multiple libraries of scFv sequences. On out-of-distribution sequences (that is, sequences withheld from the algorithm), we show that a sufficiently simple CNN model performs better than general pre-trained language models trained on diverse protein sequences (average Spearman correlation coefficient, ρ, of 0.4 as opposed to 0.15). On the other hand, an antibody-specific language model performs comparatively better than the CNN model on the same task (ρ = 0.52). Further, we demonstrate that for an independent mAb with available thermal melting temperatures for 20 experimentally characterized thermostable mutations, these models trained on TS50 data could identify 18 residue positions and 5 identical amino-acid mutations, showing remarkable generalizability. Our results suggest that such models can be broadly applicable for improving the biological characteristics of antibodies. Further, transferring such models to alternative physicochemical properties of scFvs can have potential applications in optimizing large-scale production and delivery of mAbs or bispecific antibodies (bsAbs).
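
The abstract reports model quality as a Spearman correlation coefficient, ρ, between model outputs and measured thermostability. As a rough illustration of that metric only, the sketch below computes ρ as the Pearson correlation of average ranks. The function names and all numbers are hypothetical and invented for the example; the abstract does not say which software the authors used or how ties were handled.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(predicted, measured):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(predicted), average_ranks(measured))


# Hypothetical data: invented TS50 values (deg C) and invented model scores
# for five held-out variants. These are NOT values from the paper.
measured_ts50 = [58.0, 61.5, 63.0, 66.5, 70.0]
model_scores = [0.10, 0.40, 0.35, 0.80, 0.90]

rho = spearman_rho(model_scores, measured_ts50)
print(f"Spearman rho = {rho:.2f}")
```

Because ρ depends only on rank order, a model can score well without predicting TS50 on the correct scale, which suits a task framed as ranking or classifying variants by thermostability.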

Source journal: mAbs
Subject category: Engineering & Technology, Instruments & Instrumentation
CiteScore: 10.70
Self-citation rate: 11.30%
Articles published: 77
Review time: 6-12 weeks
About the journal: mAbs is a multi-disciplinary journal dedicated to the art and science of antibody research and development. The journal has a strong scientific and medical focus, but also strives to serve a broader readership. The articles are thus of interest to scientists, clinical researchers, and physicians, as well as the wider mAb community, including our readers involved in technology transfer, legal issues, investment, strategic planning and the regulation of therapeutics.