{"title":"全基因组霰弹枪测序的数据集生成器。","authors":"G Myers","doi":"","DOIUrl":null,"url":null,"abstract":"<p><p>Simulated data sets have been found to be useful in developing software systems because (1) they allow one to study the effect of a particular phenomenon in isolation, and (2) one has complete information about the true solution against which to measure the results of the software. In developing a software suite for assembling a whole human genome shotgun data set, we have developed a simulator, celsim, that permits one to describe and stochastically generate a target DNA sequence with a variety of repeat structures, to further generate polymorphic variants if desired, and to generate a shotgun data set that might be sampled from the target sequence(s). We have found the tool invaluable and quite powerful, yet the design is extremely simple, employing a special type of stochastic grammar.</p>","PeriodicalId":79420,"journal":{"name":"Proceedings. International Conference on Intelligent Systems for Molecular Biology","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"1999-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A dataset generator for whole genome shotgun sequencing.\",\"authors\":\"G Myers\",\"doi\":\"\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Simulated data sets have been found to be useful in developing software systems because (1) they allow one to study the effect of a particular phenomenon in isolation, and (2) one has complete information about the true solution against which to measure the results of the software. In developing a software suite for assembling a whole human genome shotgun data set, we have developed a simulator, celsim, that permits one to describe and stochastically generate a target DNA sequence with a variety of repeat structures, to further generate polymorphic variants if desired, and to generate a shotgun data set that might be sampled from the target sequence(s). We have found the tool invaluable and quite powerful, yet the design is extremely simple, employing a special type of stochastic grammar.</p>\",\"PeriodicalId\":79420,\"journal\":{\"name\":\"Proceedings. International Conference on Intelligent Systems for Molecular Biology\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1999-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings. International Conference on Intelligent Systems for Molecular Biology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. International Conference on Intelligent Systems for Molecular Biology","FirstCategoryId":"1085","ListUrlMain":"","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A dataset generator for whole genome shotgun sequencing.
Simulated data sets have been found to be useful in developing software systems because (1) they allow one to study the effect of a particular phenomenon in isolation, and (2) one has complete information about the true solution against which to measure the results of the software. In developing a software suite for assembling a whole human genome shotgun data set, we have developed a simulator, celsim, that permits one to describe and stochastically generate a target DNA sequence with a variety of repeat structures, to further generate polymorphic variants if desired, and to generate a shotgun data set that might be sampled from the target sequence(s). We have found the tool invaluable and quite powerful, yet the design is extremely simple, employing a special type of stochastic grammar.