Objective and comprehensive evaluation of bisulfite short read mapping tools.

Advances in Bioinformatics, vol. 2014, article 472045 (JCR Q1, Biochemistry, Genetics and Molecular Biology)
Pub Date: 2014-01-01; Epub Date: 2014-04-15
DOI: 10.1155/2014/472045
Hong Tran, Jacob Porter, Ming-An Sun, Hehuang Xie, Liqing Zhang
Citations: 47

Abstract

Background. Large-scale bisulfite treatment combined with short-read sequencing allows comprehensive estimation of the methylation states of cytosines (Cs) in the genomes of different tissues, cell types, and developmental stages. Accurate characterization of DNA methylation is essential for understanding genotype-phenotype associations, gene-environment interactions, disease, and cancer. Aligning bisulfite-treated short reads to a reference genome has been a challenging task. We compared five bisulfite short-read mapping tools, BSMAP, Bismark, BS-Seeker, BiSS, and BRAT-BW, which represent two classes of mapping algorithms (hash tables and suffix/prefix tries). Using both real and simulated reads, we examined their mapping efficiency (i.e., the percentage of reads that can be mapped to the genome), usability, running time, and the effects of changing default parameter settings. We also investigated how data preprocessing can affect mapping efficiency.

Conclusion. Among the five programs compared, Bismark achieved the highest mapping efficiency on the real data, followed by BiSS, BSMAP, and finally BRAT-BW and BS-Seeker, which performed very similarly. If CPU time is not a constraint, Bismark is a good choice for mapping bisulfite-treated short reads. Data quality has a large impact on mapping efficiency. Although allowing more mismatches can increase mapping efficiency, it also slows the programs down significantly and risks increasing the number of false positives. Users should therefore set the related parameters carefully according to the quality of their sequencing data.
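To make the mapping-efficiency metric and the mismatch trade-off concrete, here is a minimal Python sketch under simplifying assumptions. Reads and reference are converted in silico (every C becomes T, so methylation status no longer affects matching), a hash table of k-mer seeds locates candidate positions, and a read counts as mapped only if it has a unique best hit within the mismatch limit. The function names, the seed length `k`, and the "unique best hit" rule are hypothetical choices for illustration. This is not the algorithm of BSMAP, Bismark, BS-Seeker, BiSS, or BRAT-BW, and it ignores the reverse strand (G to A conversion), base qualities, and gapped alignment.

```python
from collections import defaultdict


def c_to_t(seq: str) -> str:
    """In-silico bisulfite conversion of the forward strand: every C becomes T."""
    return seq.upper().replace("C", "T")


def build_index(genome: str, k: int) -> dict[str, list[int]]:
    """Hash table from each k-mer of the C->T converted genome to its start positions."""
    converted = c_to_t(genome)
    index = defaultdict(list)
    for i in range(len(converted) - k + 1):
        index[converted[i:i + k]].append(i)
    return dict(index)


def map_read(read: str, conv_genome: str, index: dict[str, list[int]],
             k: int, max_mismatches: int):
    """Return the start of the unique best hit of `read`, or None if the read
    is unmapped (no hit within `max_mismatches`) or ambiguous (tied best hits).

    Hypothetical scheme: the first k bases are an exact-match seed; mismatches
    are then counted over the whole read in three-letter space, so a read T
    opposite a genomic C counts as a match.
    """
    conv_read = c_to_t(read)
    hits = []
    for pos in index.get(conv_read[:k], []):
        window = conv_genome[pos:pos + len(conv_read)]
        if len(window) < len(conv_read):
            continue  # read would run off the end of the reference
        mismatches = sum(a != b for a, b in zip(conv_read, window))
        if mismatches <= max_mismatches:
            hits.append((mismatches, pos))
    if not hits:
        return None
    best = min(m for m, _ in hits)
    best_positions = [p for m, p in hits if m == best]
    # Assumption: tied best hits are treated as unmapped (ambiguous).
    return best_positions[0] if len(best_positions) == 1 else None


def mapping_efficiency(reads: list[str], genome: str, k: int = 4,
                       max_mismatches: int = 0) -> float:
    """Percentage of reads that map uniquely to the reference."""
    conv_genome = c_to_t(genome)
    index = build_index(genome, k)
    mapped = sum(
        map_read(r, conv_genome, index, k, max_mismatches) is not None
        for r in reads
    )
    return 100.0 * mapped / len(reads)


if __name__ == "__main__":
    genome = "AACCGGTTACGTAGCTAGGATCCA"
    reads = [
        "ACGTAGTT",  # genome[8:16]: CpG C kept (methylated), other C read as T
        "GGGGGGGG",  # not present in the reference
        "AATTGGTA",  # genome[0:8], both Cs read as T, plus one error at the last base
    ]
    for mm in (0, 1):
        print(mm, round(mapping_efficiency(reads, genome, max_mismatches=mm), 1))
```

On this toy data the script prints `0 33.3` and `1 66.7`: allowing one mismatch rescues the read that carries a sequencing error. On a real genome, the same relaxation also lets more reads land at spurious positions and requires checking more candidates per read, which mirrors the speed and false-positive trade-off described in the abstract.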
