AB-Gen: Antibody Library Design with Generative Pre-trained Transformer and Deep Reinforcement Learning.
Authors: Xiaopeng Xu, Tiantian Xu, Juexiao Zhou, Xingyu Liao, Ruochi Zhang, Yu Wang, Lu Zhang, Xin Gao
Journal: Genomics, Proteomics & Bioinformatics, pages 1043-1053 (published 2023-10-01; Epub 2023-06-24)
DOI: https://doi.org/10.1016/j.gpb.2023.03.004
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10928431/pdf/
Antibody leads must fulfill multiple desirable properties to become clinical candidates. Primarily because experimental procedures are low-throughput, and because addressing one issue usually causes another, the need for such multi-property optimization creates a bottleneck in preclinical antibody discovery and development. We developed a reinforcement learning (RL) method, named AB-Gen, for antibody library design that uses a generative pre-trained transformer (GPT) as the policy network of the RL agent. We showed that this model can learn the antibody space of heavy chain complementarity determining region 3 (CDRH3) and generate sequences with similar property distributions. In addition, with human epidermal growth factor receptor-2 (HER2) as the target, the agent model of AB-Gen generated novel CDRH3 sequences that fulfill multi-property constraints. In total, 509 generated sequences passed all property filters, and three highly conserved residues were identified. Molecular dynamics simulations further demonstrated the importance of these residues, supporting the conclusion that the agent model captured important information in this complex optimization task. Overall, AB-Gen designs novel antibody sequences with a higher success rate than the traditional propose-then-filter approach. It has the potential to be used in practical antibody design, thus empowering the antibody discovery and development process. The source code of AB-Gen is freely available at Zenodo (https://doi.org/10.5281/zenodo.7657016) and BioCode (https://ngdc.cncb.ac.cn/biocode/tools/BT007341).
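The abstract contrasts AB-Gen with a "propose-then-filter" baseline, in which candidate sequences are generated first and only afterwards checked against every property constraint. The following is a minimal sketch of that baseline, not the authors' code: the property functions (`net_charge`, `hydrophobic_fraction`), the thresholds in `FILTERS`, the sequence length of 13, and the uniform random proposer are all hypothetical stand-ins chosen for illustration, and the abstract does not specify which filters AB-Gen uses.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVILMFWC")  # illustrative grouping, not taken from the paper


def net_charge(seq):
    """Crude net charge: +1 per K/R, -1 per D/E (ignores His and termini)."""
    return sum(seq.count(a) for a in "KR") - sum(seq.count(a) for a in "DE")


def hydrophobic_fraction(seq):
    """Fraction of residues that fall in the illustrative hydrophobic set."""
    return sum(1 for a in seq if a in HYDROPHOBIC) / len(seq)


# Hypothetical property filters; names and thresholds are assumptions.
FILTERS = {
    "charge": lambda s: -1 <= net_charge(s) <= 2,
    "hydrophobicity": lambda s: hydrophobic_fraction(s) <= 0.5,
    "no_cysteine": lambda s: "C" not in s,
}


def passes_all(seq):
    """A sequence survives only if every filter accepts it."""
    return all(check(seq) for check in FILTERS.values())


def propose(n, length=13, seed=0):
    """Stand-in proposer: uniform random sequences of a fixed length."""
    rng = random.Random(seed)
    return ["".join(rng.choice(AMINO_ACIDS) for _ in range(length)) for _ in range(n)]


def propose_then_filter(n, length=13, seed=0):
    """Generate n candidates, then keep those that pass all filters."""
    return [s for s in propose(n, length, seed) if passes_all(s)]
```

Calling `propose_then_filter(1000)` returns the subset of 1000 random proposals that satisfy all three toy filters. Because the proposer never sees the filters, each added constraint can only shrink that subset. An RL agent, as described in the abstract, instead receives feedback on the properties during generation, which is the stated reason for its higher success rate.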
About the journal:
Genomics, Proteomics and Bioinformatics (GPB) is the official journal of the Beijing Institute of Genomics, Chinese Academy of Sciences / China National Center for Bioinformation and the Genetics Society of China. It aims to disseminate new developments in the field of omics and bioinformatics, publish high-quality discoveries quickly, and promote open access and online publication. GPB welcomes submissions in all areas of life science, biology, and biomedicine, with a focus on large data acquisition, analysis, and curation. Manuscripts covering omics and related bioinformatics topics are particularly encouraged. GPB is indexed/abstracted by PubMed/MEDLINE, PubMed Central, Scopus, BIOSIS Previews, Chemical Abstracts, and CSCD, among others.