FMDVSerPred: A Novel Computational Solution for Foot-and-mouth Disease Virus Classification and Serotype Prediction Prevalent in Asia using VP1 Nucleotide Sequence Data

Current Bioinformatics (IF 2.9; Zone 3, Biology; JCR Q3, Biochemical Research Methods) · Pub Date: 2024-01-29 · DOI: 10.2174/0115748936278851231213110653

Samarendra Das, Soumen Pal, Samyak Mahapatra, Jitendra K. Biswal, Sukanta K. Pradhan, Aditya P. Sahoo, Rabindra Prasad Singh


Abstract

Background: Three serotypes of foot-and-mouth disease (FMD) virus circulate in Asia and are commonly identified by serological assays. Such tests are time-consuming and require a bio-containment facility. To the best of our knowledge, no computational solution for predicting FMD virus serotypes is available in the literature, so there is an urgent need for user-friendly tools for FMD virus serotyping.

Methods: We present a computational solution based on a machine-learning model for FMD virus classification and serotype prediction. Various data pre-processing techniques are implemented in the approach to improve model prediction. We used sequence data of 2509 FMD virus isolates reported from India and seven other Asian FMD-endemic countries for model training, testing, and validation. We also studied the utility of the solution in a wet-lab setup by collecting and sequencing 12 virus isolates reported in India. The solution is implemented in two user-friendly tools: an online web-prediction server (https://nifmd-bbf.icar.gov.in/FMDVSerPred) and an R statistical software package (https://github.com/sam-dfmd/FMDVSerPred).

Results: The random forest machine learning model is implemented in the solution because it outperformed seven other machine learning models when evaluated on ten test and independent datasets. The solution provided validation accuracies of up to 99.87% on test data and, on independent data reported from Asian countries (India and its seven neighboring countries), up to 98.64% and 90.24%, respectively. In addition, the approach was successfully used to predict the serotypes of field FMD virus isolates reported from various parts of India.

Conclusion: High-throughput sequencing combined with machine learning therefore offers a promising solution for FMD virus serotyping.
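The abstract does not say which pre-processing steps or features the authors use, so the following is only a minimal sketch of one common way to turn a nucleotide sequence into a fixed-length numeric vector that a classifier such as a random forest could consume: relative k-mer frequencies. The function names `kmer_frequencies` and `accuracy`, the choice of k, and the serotype labels in the usage note are all assumptions made for illustration and are not taken from the FMDVSerPred package.

```python
from collections import Counter
from itertools import product
from typing import List

ALPHABET = "ACGT"


def kmer_frequencies(sequence: str, k: int = 3) -> List[float]:
    """Relative frequencies of all 4**k nucleotide k-mers, in a fixed order.

    Hypothetical feature encoding; the paper's actual pre-processing
    is not described in the abstract.
    """
    seq = sequence.upper().replace("U", "T")
    vocabulary = ["".join(p) for p in product(ALPHABET, repeat=k)]
    # Count only windows made purely of A/C/G/T; windows containing
    # ambiguous bases such as N are skipped.
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set(ALPHABET)
    )
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[kmer] / total for kmer in vocabulary]


def accuracy(predicted: List[str], actual: List[str]) -> float:
    """Fraction of predictions that match the true labels."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)
```

As a usage sketch, `kmer_frequencies("ACGTACGT", k=2)` yields a 16-element vector, one entry per dinucleotide, and `accuracy(["O", "A", "Asia1", "O"], ["O", "A", "O", "O"])` scores illustrative predicted labels against illustrative true labels. Vectors of this kind could then be passed to any standard random forest implementation; which implementation and settings the authors chose is not stated in the abstract.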




Source journal: Current Bioinformatics (Biology: Biochemical Research Methods)

CiteScore: 6.60

Self-citation rate: 2.50%

Review time: >12 weeks

Journal description: Current Bioinformatics aims to publish all the latest and outstanding developments in bioinformatics. Each issue contains a series of timely, in-depth/mini-reviews, research papers and guest-edited thematic issues written by leaders in the field, covering a wide range of the integration of biology with computer and information science. The journal focuses on advances in computational molecular/structural biology, encompassing areas such as computing in biomedicine and genomics, computational proteomics and systems biology, and metabolic pathway engineering. Developments in these fields have direct implications for key issues related to health care, medicine, genetic disorders, development of agricultural products, renewable energy, environmental protection, etc.