Mutation prediction in the SARS-CoV-2 genome using attention-based neural machine translation.

IF 2.6 | Zone 4 (Engineering and Technology) | Q1 Mathematics | Mathematical Biosciences and Engineering | Pub Date: 2024-05-20 | DOI: 10.3934/mbe.2024264
Darrak Moin Quddusi, Sandesh Athni Hiremath, Naim Bajcinca
Mathematical Biosciences and Engineering, volume 21, issue 5, pages 5996-6018. https://doi.org/10.3934/mbe.2024264

Abstract

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has been evolving rapidly since it caused havoc worldwide in 2020. Because the virus mutates frequently, it has been very hard to contain. Changes in its genome lead to viral evolution, rendering it more resistant to existing vaccines and drugs. Predicting viral mutations in advance will help in preparing for more infectious and virulent versions of the virus, in turn decreasing the damage they cause. In this paper, we propose different neural machine translation (NMT) architectures based on recurrent neural networks (RNNs) to predict mutations in selected SARS-CoV-2 non-structural proteins (NSPs), i.e., NSP1, NSP3, NSP5, NSP8, NSP9, NSP13, and NSP15. First, we created and pre-processed pairs of sequences from two languages, using k-means clustering and nearest neighbors, for training a neural machine translation model. We also provide insights for training NMT models on long biological sequences. In addition, we evaluated and benchmarked our models to demonstrate their efficiency and reliability.
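
The abstract does not give the details of the pairing step, so the following is only a rough sketch of how k-means clustering and a nearest-neighbor search could be combined to build source/target sequence pairs. Everything in it is an assumption made for illustration and not the authors' method: the k-mer frequency features, the toy nucleotide alphabet, the use of two clusters as the two "languages", the farthest-first initialization, and the helper names `kmer_vector`, `kmeans`, and `make_pairs`.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"  # assumption: toy nucleotide alphabet, not taken from the paper


def kmer_vector(seq, k=2):
    """Return the frequencies of all overlapping k-mers as a fixed-length vector."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = max(sum(counts[m] for m in kmers), 1)
    return [counts[m] / total for m in kmers]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, n_clusters=2, n_iter=20):
    """Plain Lloyd's algorithm with deterministic farthest-first initialization."""
    centroids = [list(vectors[0])]
    while len(centroids) < n_clusters:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: each vector goes to its closest centroid.
        labels = [
            min(range(n_clusters), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(n_clusters):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def make_pairs(sequences, k=2):
    """Pair every sequence in cluster 0 with its nearest neighbor in cluster 1."""
    vectors = [kmer_vector(s, k) for s in sequences]
    labels = kmeans(vectors, n_clusters=2)
    sources = [i for i, lab in enumerate(labels) if lab == 0]
    targets = [i for i, lab in enumerate(labels) if lab == 1]
    pairs = []
    for i in sources:
        if not targets:
            break
        j = min(targets, key=lambda t: sq_dist(vectors[i], vectors[t]))
        pairs.append((sequences[i], sequences[j]))
    return pairs


if __name__ == "__main__":
    toy = ["AAAATTTT", "AAATTTTA", "GGGGCCCC", "GGGCCCCG"]
    for source, target in make_pairs(toy):
        print(source, "->", target)
```

In a setup like this, each `(source, target)` pair would serve as one training example for a sequence-to-sequence model. A real pipeline would need a feature representation and a distance measure suited to long protein or genome sequences.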

Source journal
Mathematical Biosciences and Engineering (Engineering and Technology: Mathematics, Interdisciplinary Applications)
CiteScore: 3.90
Self-citation rate: 7.70%
Articles published: 586
Review time: >12 weeks
About the journal: Mathematical Biosciences and Engineering (MBE) is an interdisciplinary Open Access journal promoting cutting-edge research, technology transfer and knowledge translation about complex data and information processing. MBE publishes Research articles (long and original research); Communications (short and novel research); Expository papers; Technology Transfer and Knowledge Translation reports (descriptions of new technologies and products); and Announcements and Industrial Progress and News (announcements and even advertisements, including major conferences).