Identification of Geographic Specific SARS-Cov-2 Mutations by Random Forest Classification and Variable Selection Methods

Statistics and Applications (IF 0.3; JCR Q4, Statistics & Probability). Published 2020-07-01 (Epub 2020-06-30).
Manoj Kandpal, Ramana V Davuluri
Full-text PDF (PMC7514111): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7514111/pdf/nihms-1620642.pdf
Citations: 0

Abstract

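RNA viral genomes have very high mutation rates. As an infection spreads through host populations, different viral lineages emerge that acquire independent mutations, which can lead to varied infection and death rates in different parts of the world. By applying Random Forest classification and feature selection methods, we developed an analysis pipeline for identifying geographic-specific mutations and classifying different viral lineages, focusing on the missense variants that alter the function of the encoded proteins. We applied the pipeline to publicly available SARS-CoV-2 datasets and demonstrated that it accurately identified country- or region-specific viral lineages and the specific mutations that discriminate between lineages. The results presented here can help in designing country-specific diagnostic strategies and in prioritizing mutations for functional interpretation and experimental validation.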
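The pipeline described above rests on two ideas: classify sequences by geographic origin with a Random Forest, then rank mutations by how much they contribute to that classification. The following Python sketch illustrates both ideas on a tiny synthetic presence/absence matrix. It is not the authors' pipeline: the data, the mutation names (`mut_0` to `mut_3`), the region labels, the helper functions (`grow_tree`, `random_forest`, `predict`) and the parameters (`n_trees`, `max_depth`, `n_try`) are all hypothetical. Importance is measured here as mean decrease in Gini impurity, which is only one of several variable-importance measures, and the code only ranks mutations; how the authors select variables from such a ranking is not stated in the abstract.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure or empty node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def grow_tree(X, y, idx, depth, max_depth, n_try, rng, importance):
    """Grow one tree on 0/1 mutation features (hypothetical helper).

    Adds each split's weighted Gini decrease to importance[feature].
    Leaves are labels; internal nodes are (feature, if_absent, if_present).
    """
    labels = [y[i] for i in idx]
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    parent = gini(labels)
    best = None
    # Only a random subset of n_try mutations is considered at each split.
    for f in rng.sample(range(len(X[0])), n_try):
        absent = [i for i in idx if X[i][f] == 0]
        present = [i for i in idx if X[i][f] == 1]
        if not absent or not present:
            continue  # this mutation does not split the node
        child = (len(absent) * gini([y[i] for i in absent])
                 + len(present) * gini([y[i] for i in present])) / len(idx)
        gain = parent - child
        if best is None or gain > best[0]:
            best = (gain, f, absent, present)
    if best is None:
        return majority
    gain, f, absent, present = best
    importance[f] += gain * len(idx)
    return (f,
            grow_tree(X, y, absent, depth + 1, max_depth, n_try, rng, importance),
            grow_tree(X, y, present, depth + 1, max_depth, n_try, rng, importance))


def predict_tree(node, x):
    """Follow one tree from the root to a leaf label."""
    while isinstance(node, tuple):
        f, absent, present = node
        node = present if x[f] == 1 else absent
    return node


def random_forest(X, y, n_trees=50, max_depth=3, n_try=None, seed=0):
    """Bagged trees with random feature subsets; returns (trees, normalized importances)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    n_try = n_try or max(1, int(p ** 0.5))
    importance = [0.0] * p
    trees = []
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of sequences
        trees.append(grow_tree(X, y, boot, 0, max_depth, n_try, rng, importance))
    total = sum(importance) or 1.0
    return trees, [v / total for v in importance]


def predict(trees, x):
    """Majority vote across all trees."""
    return Counter(predict_tree(t, x) for t in trees).most_common(1)[0][0]


# Synthetic data: rows are sequences, columns are hypothetical missense mutations
# (1 = present). mut_0 tracks the region; mut_1 and mut_3 are noise; mut_2 is everywhere.
mutations = ["mut_0", "mut_1", "mut_2", "mut_3"]
X = [[1, 0, 1, 1], [1, 1, 1, 0], [1, 0, 1, 0], [1, 1, 1, 1],
     [0, 0, 1, 1], [0, 1, 1, 0], [0, 0, 1, 0], [0, 1, 1, 1]]
y = ["region_A"] * 4 + ["region_B"] * 4

trees, importance = random_forest(X, y)
for name, score in sorted(zip(mutations, importance), key=lambda t: -t[1]):
    print(f"{name}: {score:.3f}")
print(predict(trees, [1, 0, 1, 0]))
```

On this toy matrix the region-tracking mutation should dominate the ranking, while a mutation present in every sequence can never split a node and scores zero. For real data, an established implementation such as scikit-learn's `RandomForestClassifier` (whose `feature_importances_` is impurity-based) or R's `randomForest` package would be the usual choice over a hand-rolled forest.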
