SeqMaker: A next generation sequencing simulator with variations, sequencing errors and amplification bias integrated
Shifu Chen, Yue Han, Lanting Guo, Jing-Shan Hu, Jia Gu
2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), published 2016-12-01. DOI: 10.1109/BIBM.2016.7822634 (https://doi.org/10.1109/BIBM.2016.7822634)
Tuning bioinformatics pipelines and training software parameters require sequencing data with known ground truth, which is difficult to obtain from real sequencing data. In particular, for applications that detect low-frequency variations (such as ctDNA sequencing), it is hard to tell whether a called variation is a true positive or a false positive caused by errors from sequencing or other processes. In these cases, simulated data with configured variations can be used to troubleshoot and validate bioinformatics programs. Although many next generation sequencing (NGS) simulators have already been developed, most of them cannot simulate a number of practical features, such as target capture sequencing, copy number variations, gene fusions, amplification bias and sequencing errors. In this paper, we present SeqMaker, a modern NGS simulator that can simulate different kinds of variations, with amplification bias and sequencing errors integrated. Target capture sequencing is supported through a capture panel description file. Other characteristics, such as sequencing error rate, average duplication level, DNA template length distribution and quality distribution, can be configured with a simple JSON format profile file. By integrating sequencing errors and amplification bias, SeqMaker is able to simulate more realistic next generation sequencing data. The configurable variants and capture regions make SeqMaker useful for generating data to train bioinformatics pipelines for applications such as somatic mutation calling.
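To make the idea of a profile-driven simulation concrete, the following is a minimal Python sketch, not SeqMaker's implementation. The profile key names (`seq_error_rate`, `mean_duplication`, `template_len_mean`, `template_len_sd`, `read_len`), the functions `add_errors` and `simulate`, and the geometric model for duplication are all assumptions made for illustration; the abstract does not specify SeqMaker's actual JSON schema or its bias model. The sketch spikes configured variants into templates at a given allele fraction, duplicates each template to mimic amplification, and then adds independent substitution errors to each read.

```python
import json
import random

# Hypothetical profile: these key names are illustrative assumptions,
# NOT SeqMaker's actual JSON schema.
PROFILE_JSON = """
{
  "seq_error_rate": 0.01,
  "mean_duplication": 2.0,
  "template_len_mean": 30,
  "template_len_sd": 5,
  "read_len": 20
}
"""

BASES = "ACGT"


def add_errors(seq, rate, rng):
    """Replace each base with a different base with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def simulate(reference, variants, profile, n_templates, seed=0):
    """Return a list of (start, read) tuples.

    variants maps a 0-based reference position to (alt_base, allele_fraction).
    Assumes mean_duplication >= 1.
    """
    rng = random.Random(seed)
    read_len = profile["read_len"]
    # Geometric copy number: continuing with probability q gives mean 1 / (1 - q).
    p_extra = 1.0 - 1.0 / profile["mean_duplication"]
    reads = []
    for _ in range(n_templates):
        length = int(rng.gauss(profile["template_len_mean"],
                               profile["template_len_sd"]))
        length = min(max(length, read_len), len(reference))
        start = rng.randint(0, len(reference) - length)
        template = list(reference[start:start + length])
        # Spike in each configured variant with its allele fraction.
        for pos, (alt, fraction) in variants.items():
            if start <= pos < start + length and rng.random() < fraction:
                template[pos - start] = alt
        template = "".join(template)
        # Amplification: the same template yields one or more duplicate reads.
        copies = 1
        while rng.random() < p_extra:
            copies += 1
        for _ in range(copies):
            # Sequencing errors are drawn independently for every read.
            reads.append((start, add_errors(template[:read_len],
                                            profile["seq_error_rate"], rng)))
    return reads


if __name__ == "__main__":
    profile = json.loads(PROFILE_JSON)
    reference = "".join(random.Random(42).choice(BASES) for _ in range(200))
    reads = simulate(reference, {100: ("T", 0.05)}, profile, n_templates=100)
    print(len(reads), "reads simulated")
```

Because the spiked-in variants and the random seed are known, a variant caller's output on such reads can be compared directly against the configured ground truth, which is the use case the abstract describes.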