AbLEF: Antibody Language Ensemble Fusion for thermodynamically empowered property predictions
Zachary A Rollins, Talal Widatalla, Andrew Waight, Alan C Cheng, Essam Metwally
Bioinformatics, published 2024-04-16. DOI: 10.1093/bioinformatics/btae268
Abstract
MOTIVATION
Pre-trained protein language and/or structural models are often fine-tuned on drug development properties (i.e., developability properties) to accelerate drug discovery initiatives. However, these models generally rely on a single structural conformation and/or a single sequence as a molecular representation. We present a physics-based model in which 3D conformational ensemble representations are fused by a transformer-based architecture and concatenated to a language representation to predict antibody protein properties. AbLEF thereby infuses thermodynamic information directly into latent space, which enhances property prediction by explicitly representing the dynamic molecular behavior that occurs during experimental measurement.
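The fusion idea described above can be illustrated with a minimal sketch. This is not the AbLEF implementation (see the repository listed under Availability for that); it only assumes that each conformer in the ensemble has already been encoded as a fixed-length vector and that a language model has produced one sequence embedding. All names (`fuse_ensemble`, `predict_property`, `params`), the dimensions, the single attention head, the mean pooling, and the linear output head are hypothetical choices made for illustration, and the weights are random rather than trained.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def fuse_ensemble(conformers, w_q, w_k, w_v):
    """Fuse K conformer embeddings with one self-attention head, then mean-pool.

    conformers: array of shape (K, d), one row per conformation (assumed input).
    Returns a single fused vector of shape (d_out,).
    """
    q = conformers @ w_q
    k = conformers @ w_k
    v = conformers @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])      # (K, K) pairwise attention scores
    attended = softmax(scores, axis=-1) @ v      # (K, d_out) attended conformers
    return attended.mean(axis=0)                 # pool over the ensemble


def predict_property(conformers, language_embedding, params):
    """Concatenate the fused ensemble vector with the language embedding
    and apply a linear head (a stand-in for a real regression head)."""
    fused = fuse_ensemble(conformers, params["w_q"], params["w_k"], params["w_v"])
    joint = np.concatenate([fused, language_embedding])
    return float(joint @ params["w_out"] + params["b_out"])


# Hypothetical sizes: 8 conformers, 16-dim structural and 32-dim language embeddings.
rng = np.random.default_rng(0)
K, d_struct, d_lang = 8, 16, 32
params = {
    "w_q": rng.normal(size=(d_struct, d_struct)),
    "w_k": rng.normal(size=(d_struct, d_struct)),
    "w_v": rng.normal(size=(d_struct, d_struct)),
    "w_out": rng.normal(size=d_struct + d_lang),
    "b_out": 0.0,
}
conformers = rng.normal(size=(K, d_struct))       # placeholder ensemble embeddings
language_embedding = rng.normal(size=d_lang)      # placeholder sequence embedding

fused = fuse_ensemble(conformers, params["w_q"], params["w_k"], params["w_v"])
prediction = predict_property(conformers, language_embedding, params)
```

Because this sketch uses no positional encoding and pools with a mean, the fused vector does not depend on the order in which conformers are listed, which suits an unordered ensemble. Whether the published model makes the same choice is not stated in the abstract.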
RESULTS
We showcase the AbLEF model on two developability properties: hydrophobic interaction chromatography retention time (HIC-RT) and temperature of aggregation (Tagg). We find that (1) 3D conformational ensembles that are generated from molecular simulation can further improve antibody property prediction for small datasets, (2) the performance benefit from 3D conformational ensembles matches shallow machine learning methods in the small data regime, and (3) fine-tuned large protein language models can match smaller antibody-specific language models at predicting antibody properties.
AVAILABILITY AND IMPLEMENTATION
AbLEF codebase is available at https://github.com/merck/AbLEF.
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.