Predicting Multi-Epitope Vaccine Candidates Using Natural Language Processing and Deep Learning
Authors: Xiaozhi Yuan, Daniel Bibl, Kahlil Khan, Lei Sun
Venue: 2021 IEEE 21st International Conference on Bioinformatics and Bioengineering (BIBE)
Publication date: 2021-10-25
DOI: 10.1109/BIBE52308.2021.9635304 (https://doi.org/10.1109/BIBE52308.2021.9635304)
Citations: 0
Abstract
In silico approaches can make vaccine design more efficient and cost-effective. They complement the traditional process and are especially valuable in coping with pandemics such as COVID-19. A recent study proposed an artificial intelligence-based framework to predict and design multi-epitope vaccines for the SARS-CoV-2 virus. However, we found several issues in both its dataset design and its neural network design. To predict potential vaccine subunits more reliably, we create a larger and more reliable dataset for machine learning experiments. We apply natural language processing techniques and build neural networks composed of convolutional and recurrent layers to identify peptide sequences that are vaccine candidates. We also train a classifier on embeddings from a pre-trained Transformer protein language model, which provides a baseline for comparison. Experimental results demonstrate that our models achieve high classification accuracy and a high area under the receiver operating characteristic curve (AUC).
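To make the "natural language processing" framing concrete, the sketch below shows one common way to turn a peptide into tokens that a convolutional or recurrent network could consume: each amino acid is treated like a word and mapped to an integer ID, with padding to a fixed length. The abstract does not specify the tokenization, so the vocabulary, the padding and unknown-residue IDs, and the function names `encode_peptide` and `kmers` are all illustrative assumptions rather than the authors' method.

```python
# Hypothetical preprocessing sketch. The vocabulary layout, padding scheme,
# and function names are assumptions for illustration, not from the paper.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
PAD_ID = 0  # reserved for padding positions
UNK_ID = 1  # reserved for non-standard or ambiguous residues (e.g. X)
VOCAB = {aa: i + 2 for i, aa in enumerate(AMINO_ACIDS)}


def encode_peptide(seq, max_len):
    """Map a peptide string to a fixed-length list of integer token IDs."""
    # Truncate to max_len, then look up each residue (unknowns map to UNK_ID).
    ids = [VOCAB.get(residue, UNK_ID) for residue in seq.upper()[:max_len]]
    # Right-pad with PAD_ID so every sequence has the same length.
    return ids + [PAD_ID] * (max_len - len(ids))


def kmers(seq, k=3):
    """Split a peptide into overlapping k-mers, an alternative 'word' unit."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


if __name__ == "__main__":
    print(encode_peptide("ACDX", max_len=6))
    print(kmers("ACDEF", k=3))
```

Integer IDs of this kind would typically feed an embedding layer ahead of the convolutional and recurrent layers; overlapping k-mers are an alternative if longer "words" are preferred over single residues.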