p-IgGen: a paired antibody generative language model
Oliver M Turnbull, Dino Oglic, Rebecca Croasdale-Wood, Charlotte M Deane
Bioinformatics (Oxford, England), published 2024-11-01. DOI: 10.1093/bioinformatics/btae659
Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11576349/pdf/
Abstract
Summary: A key challenge in antibody drug discovery is designing novel sequences that are free from developability issues such as aggregation, polyspecificity, poor expression, or low solubility. Here, we present p-IgGen, a protein language model for paired heavy-light chain antibody generation. The model generates diverse, antibody-like sequences with pairing properties found in natural antibodies. We also create a fine-tuned version of p-IgGen that biases the model to generate antibodies with 3D biophysical properties that fall within distributions seen in clinical-stage therapeutic antibodies.
Availability and implementation: The model and inference code are freely available at www.github.com/oxpig/p-IgGen. Cleaned training data are deposited at doi.org/10.5281/zenodo.13880874.