Zi-Hao Hu, Ying Wang, Long Yang, Qing-Yi Cao, Ming Ling, Xiao-Hua Meng, Yao Chen, Shu-Jun Ni, Zhi Chen, Cheng-Zhi Liu, Kun-Kai Su
Infectious Microbes & Diseases, pages 172-179. Journal article, published 2023-11-22. DOI: 10.1097/IM9.0000000000000134 (https://doi.org/10.1097/IM9.0000000000000134)
Evaluation of 10 Different Pipelines for Bacterial Single-Nucleotide Variant Detection
Abstract: Bacterial genome sequencing is a powerful technique for studying the genetic diversity and evolution of microbial populations. However, detecting genomic variants from sequencing data is challenging because of contamination, sequencing errors and the presence of multiple strains of the same species. Several bioinformatics tools have been developed to address these issues, but their performance and accuracy have not been systematically evaluated. In this study, we compared 10 variant detection pipelines using 18 simulated and 17 real high-throughput sequencing datasets from a set of representative bacteria. We assessed the sensitivity of each pipeline under different conditions of coverage, simulation and strain diversity. We also demonstrated the application of these tools to identify consistent mutations in a dataset of Staphylococcus hominis that was sequenced 30 times. We found that HaplotypeCaller, but not Mutect2, from the GATK tool set showed the best performance in terms of accuracy and robustness. CFSAN and Snippy performed less well on several simulated and real sequencing datasets. Our results provide a comprehensive benchmark and guidance for choosing the optimal variant detection pipeline for high-throughput bacterial genome sequencing data.
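The two measurements named in the abstract, per-pipeline sensitivity and mutations that are consistent across repeated sequencing runs, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the representation of an SNV as a (contig, position, alternate base) tuple, the function names `sensitivity` and `consistent_snvs`, and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# Assumed representation: an SNV is (contig name, 1-based position, alternate base).
Snv = Tuple[str, int, str]


def sensitivity(truth: Set[Snv], called: Set[Snv]) -> float:
    """Fraction of known (e.g. simulated) SNVs recovered: TP / (TP + FN)."""
    if not truth:
        raise ValueError("truth set is empty; sensitivity is undefined")
    true_positives = len(truth & called)
    return true_positives / len(truth)


def consistent_snvs(replicates: Iterable[Set[Snv]], min_replicates: int) -> Set[Snv]:
    """SNVs called in at least `min_replicates` of the repeated runs."""
    counts = Counter(snv for calls in replicates for snv in calls)
    return {snv for snv, n in counts.items() if n >= min_replicates}


# Toy data (hypothetical): four simulated SNVs; the pipeline finds three
# of them and reports one false positive.
truth = {("chr", 10, "A"), ("chr", 25, "T"), ("chr", 40, "G"), ("chr", 90, "C")}
called = {("chr", 10, "A"), ("chr", 25, "T"), ("chr", 40, "G"), ("chr", 77, "A")}
pipeline_sensitivity = sensitivity(truth, called)

# Three hypothetical repeated runs of the same isolate.
runs = [
    {("chr", 10, "A"), ("chr", 25, "T")},
    {("chr", 10, "A"), ("chr", 25, "T"), ("chr", 61, "G")},
    {("chr", 10, "A")},
]
in_all_runs = consistent_snvs(runs, min_replicates=3)
in_most_runs = consistent_snvs(runs, min_replicates=2)
```

With the toy data, `pipeline_sensitivity` is 0.75, `in_all_runs` contains only the SNV at position 10, and `in_most_runs` also includes position 25. Sensitivity alone does not penalize the false positive at position 77, so a fuller comparison would also track precision.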