SeqyClean: A Pipeline for High-throughput Sequence Data Preprocessing
I. Zhbannikov, Samuel S. Hunter, J. Foster, M. Settles
Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics. Published 2017-08-20.
DOI: 10.1145/3107411.3107446 (https://doi.org/10.1145/3107411.3107446)
Citations: 58
Abstract
Modern high-throughput sequencing instruments produce massive amounts of data, which often contain noise in the form of sequencing errors, sequencing adaptors, and contaminating reads. This noise complicates genomics studies. Although many preprocessing tools have been developed to reduce sequence noise, many cannot handle data from multiple sequencing technologies, and few address more than one type of noise. We present SeqyClean, a comprehensive preprocessing software pipeline. SeqyClean effectively removes multiple sources of noise in high-throughput sequence data and, according to our tests, outperforms other available preprocessing tools. We show that preprocessing data with SeqyClean before downstream analysis improves both de novo genome assembly and genome mapping. We have used SeqyClean extensively in the genomics core at the Institute for Bioinformatics and Evolutionary STudies (IBEST) at the University of Idaho, so it has been validated with both test and production data. SeqyClean is available as open source software under the MIT License at http://github.com/ibest/seqyclean
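
To make the three kinds of noise named in the abstract concrete, the following is a minimal Python sketch of how a single read might be cleaned. It is an illustration under stated assumptions and is not SeqyClean's implementation: the function names, the adapter sequence, the contaminant k-mer set, and the thresholds are all hypothetical, and matching is exact rather than error-tolerant.

```python
# Illustrative sketch only; NOT SeqyClean's algorithm.
# All names, sequences, and thresholds below are assumptions.

ADAPTER = "AGATCGGAAGAGC"             # assumed adapter sequence
CONTAMINANT_KMERS = {"GGGGGGGGGGGG"}  # hypothetical contaminant 12-mers


def clip_adapter(seq, quals, adapter=ADAPTER):
    """Cut the read at the first exact occurrence of the adapter, if any."""
    pos = seq.find(adapter)
    if pos == -1:
        return seq, quals
    return seq[:pos], quals[:pos]


def trim_low_quality_tail(seq, quals, min_q=20):
    """Drop trailing bases whose Phred score is below min_q."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


def is_contaminant(seq, kmers=CONTAMINANT_KMERS, k=12):
    """Flag the read if any of its k-mers appears in the contaminant set."""
    return any(seq[i:i + k] in kmers for i in range(len(seq) - k + 1))


def preprocess(seq, quals, min_len=5):
    """Return the cleaned (seq, quals) pair, or None if the read is discarded."""
    seq, quals = clip_adapter(seq, quals)
    seq, quals = trim_low_quality_tail(seq, quals)
    if len(seq) < min_len or is_contaminant(seq):
        return None
    return seq, quals
```

Here `preprocess` takes a base string and a list of integer Phred scores of the same length. A production tool would also need to handle paired-end reads, tolerate mismatches when matching adapters, and screen against a real contaminant database; none of that is attempted in this sketch.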