Tao Zhang, Yiyun Liu, Xutong Guo, Xinran Zhang, Xinchang Zheng, Mochen Zhang, Yiming Bao
Genomics, Proteomics & Bioinformatics. Published 2024-11-14. DOI: 10.1093/gpbjnl/qzae082 (https://doi.org/10.1093/gpbjnl/qzae082)
VISTA: A Tool for Fast Taxonomic Assignment of Viral Genome Sequences.
The rapid expansion of the number of viral genome sequences in public databases necessitates a scalable, universal, and automated preliminary taxonomic framework for comprehensive virus studies. Here, we introduce VISTA (Virus Sequence-based Taxonomy Assignment), a computational tool that employs a novel pairwise sequence comparison system and an automatic demarcation threshold identification framework for virus taxonomy. Leveraging physicochemical property sequences, k-mer profiles, and machine learning techniques, VISTA constructs a robust distance-based framework for taxonomic assignment. Functionally similar to PASC (Pairwise Sequence Comparison), a widely used virus assignment tool based on pairwise sequence comparison, VISTA demonstrates superior performance by providing significantly improved separation for taxonomic groups, more objective taxonomic demarcation thresholds, greatly enhanced speed, and a wider application scope. We successfully applied VISTA to 38 virus families, as well as to the class Caudoviricetes. This demonstrates VISTA's scalability, robustness, and ability to automatically and accurately assign taxonomy to both prokaryotic and eukaryotic viruses. Furthermore, the application of VISTA to 679 unclassified prokaryotic virus genomes recovered from metagenomic data identified 46 novel virus families. VISTA is available as both a command-line tool and a user-friendly web portal at https://ngdc.cncb.ac.cn/vista.
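To make the idea of a distance-based framework with a demarcation threshold more concrete, the following is a minimal sketch and not VISTA's actual algorithm. The abstract does not specify how k-mer profiles, physicochemical property sequences, and machine learning are combined, so this example uses only k-mer frequency profiles with a cosine distance. The function names, the choice of k = 3, and the threshold values are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product
import math


def kmer_profile(seq: str, k: int = 3) -> list[float]:
    """Return normalized k-mer frequencies over the A/C/G/T alphabet.

    k = 3 is an illustrative choice, not a value taken from the paper.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only count k-mers made of unambiguous bases.
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def cosine_distance(p: list[float], q: list[float]) -> float:
    """Cosine distance between two profiles, clamped to [0, 1]."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    if norm == 0:
        return 1.0
    return min(1.0, max(0.0, 1.0 - dot / norm))


def same_taxon(seq_a: str, seq_b: str, threshold: float, k: int = 3) -> bool:
    """Hypothetical rule: two genomes share a taxon if their distance
    does not exceed a demarcation threshold (assumed to be given)."""
    distance = cosine_distance(kmer_profile(seq_a, k), kmer_profile(seq_b, k))
    return distance <= threshold
```

In this sketch the threshold is passed in by the caller. According to the abstract, VISTA instead identifies demarcation thresholds automatically, a step this example does not attempt to reproduce.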