Mai Ha Vu, Philippe A. Robert, Rahmad Akbar, Bartlomiej Swiatczak, Geir Kjetil Sandve, Dag Trygve Truslew Haug, Victor Greiff
{"title":"Linguistics-based formalization of the antibody language as a basis for antibody language models","authors":"Mai Ha Vu, Philippe A. Robert, Rahmad Akbar, Bartlomiej Swiatczak, Geir Kjetil Sandve, Dag Trygve Truslew Haug, Victor Greiff","doi":"10.1038/s43588-024-00642-3","DOIUrl":null,"url":null,"abstract":"Apparent parallels between natural language and antibody sequences have led to a surge in deep language models applied to antibody sequences for predicting cognate antigen recognition. However, a linguistic formal definition of antibody language does not exist, and insight into how antibody language models capture antibody-specific binding features remains largely uninterpretable. Here we describe how a linguistic formalization of the antibody language, by characterizing its tokens and grammar, could address current challenges in antibody language model rule mining. The parallels between natural language and antibody sequences could serve as a stepping stone to using deep language models for analyzing antibody sequences. This Perspective discusses how issues in antibody language model rule mining could be addressed by linguistically formalizing the antibody language.","PeriodicalId":74246,"journal":{"name":"Nature computational science","volume":null,"pages":null},"PeriodicalIF":12.0000,"publicationDate":"2024-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Nature computational science","FirstCategoryId":"1085","ListUrlMain":"https://www.nature.com/articles/s43588-024-00642-3","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
引用次数: 0
Abstract
Apparent parallels between natural language and antibody sequences have led to a surge in deep language models applied to antibody sequences for predicting cognate antigen recognition. However, a linguistic formal definition of antibody language does not exist, and insight into how antibody language models capture antibody-specific binding features remains largely uninterpretable. Here we describe how a linguistic formalization of the antibody language, by characterizing its tokens and grammar, could address current challenges in antibody language model rule mining. The parallels between natural language and antibody sequences could serve as a stepping stone to using deep language models for analyzing antibody sequences. This Perspective discusses how issues in antibody language model rule mining could be addressed by linguistically formalizing the antibody language.