ARNLE model identifies prevalence potential of SARS-CoV-2 variants
Authors: Yuqi Liu, Jing Li, Peihan Li, Yehong Yang, Kaiying Wang, Jinhui Li, Lang Yang, Jiangfeng Liu, Leili Jia, Aiping Wu, Juntao Yang, Peng Li, Hongbin Song
Journal: Nature Machine Intelligence (JCR Q1, Computer Science, Artificial Intelligence; impact factor 18.8)
Publication date: 2024-12-31
Publication type: Journal Article
DOI: 10.1038/s42256-024-00919-2 (https://doi.org/10.1038/s42256-024-00919-2)
Citation count: 0
Abstract
SARS-CoV-2 mutations accumulated during the COVID-19 pandemic, posing significant challenges for immune prevention. An optimistic perspective suggests that SARS-CoV-2 will become more tropic to humans, with weaker virulence and stronger infectivity. However, tracing a quantified trajectory of this process remains difficult. Here we introduce ARNLE, an attentional recurrent network based on language embedding, as a framework to analyse the shift in SARS-CoV-2 host tropism towards humans. ARNLE combines a language model trained by self-supervised learning, which captures the features of amino acid sequences, with a supervised bidirectional long short-term memory (BiLSTM) network that discerns the relationship between mutations and host tropism among coronaviruses. We identified a shift in SARS-CoV-2 human tropism from weak to strong: the virus moved from resembling a Chiroptera (bat) coronavirus to resembling a primate-tropic coronavirus. Delta variants were closer to other common primate coronaviruses than previous SARS-CoV-2 variants were. A similar phenomenon was observed among the Omicron variants. We employed a Bayesian post hoc explanation method to analyse key mutations influencing the human tropism of SARS-CoV-2. ARNLE identified pivotal mutations in the spike protein, including T478K, L452R and G142D, among others, as the top determinants of human tropism. Our findings suggest that language models like ARNLE will significantly facilitate the identification of potentially prevalent variants and provide important support for screening key mutations, aiding the timely update of vaccines to protect against future emerging SARS-CoV-2 variants.
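The abstract describes an "attentional" recurrent network but does not specify the attention mechanism. As a rough illustration only, the sketch below shows generic dot-product attention pooling over per-residue hidden states, such as a BiLSTM might produce. The function names `softmax` and `attention_pool`, the `query` vector and the toy inputs are all assumptions made for this example; this is not the ARNLE implementation.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    hidden_states: Sequence[Sequence[float]],
    query: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Collapse per-residue vectors into one sequence-level vector.

    hidden_states: one vector per amino acid position (hypothetical
                   stand-in for recurrent-network outputs).
    query:         a vector that would normally be learned (hypothetical).
    Returns the pooled vector and the per-position attention weights.
    """
    # Score each position by its dot product with the query vector.
    scores = [sum(h * q for h, q in zip(state, query)) for state in hidden_states]
    # Normalise the scores into weights that sum to 1.
    weights = softmax(scores)
    # Weighted sum of the hidden states, one dimension at a time.
    dim = len(hidden_states[0])
    pooled = [
        sum(w * state[d] for w, state in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return pooled, weights


if __name__ == "__main__":
    # Toy example: three positions with two-dimensional hidden states.
    states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    pooled, weights = attention_pool(states, query=[2.0, 0.0])
    print("weights:", weights)
    print("pooled:", pooled)
```

In a design of this kind, the attention weights give a per-position view of which residues contribute most to the pooled representation. That is separate from the Bayesian post hoc explanation method the abstract mentions for ranking mutations.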
About the journal:
Nature Machine Intelligence is a distinguished publication that presents original research and reviews on various topics in machine learning, robotics, and AI. Our focus extends beyond these fields, exploring their profound impact on other scientific disciplines, as well as societal and industrial aspects. We recognize limitless possibilities wherein machine intelligence can augment human capabilities and knowledge in domains like scientific exploration, healthcare, medical diagnostics, and the creation of safe and sustainable cities, transportation, and agriculture. Simultaneously, we acknowledge the emergence of ethical, social, and legal concerns due to the rapid pace of advancements.
To foster interdisciplinary discussions on these far-reaching implications, Nature Machine Intelligence serves as a platform for dialogue facilitated through Comments, News Features, News & Views articles, and Correspondence. Our goal is to encourage a comprehensive examination of these subjects.
Similar to all Nature-branded journals, Nature Machine Intelligence operates under the guidance of a team of skilled editors. We adhere to a fair and rigorous peer-review process, ensuring high standards of copy-editing and production, swift publication, and editorial independence.