Denoising of Quality Scores for Boosted Inference and Reduced Storage.

Proceedings. Data Compression Conference. Pub Date: 2016-03-01. Epub Date: 2016-12-19. DOI: 10.1109/DCC.2016.92
Idoia Ochoa, Mikel Hernaez, Rachel Goldfeder, Tsachy Weissman, Euan Ashley
Volume 2016, pages 251-260. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5663231/pdf/nihms910316.pdf
Citation count: 0

Abstract


Massive amounts of sequencing data are being generated thanks to advances in sequencing technology and a dramatic drop in sequencing cost. Much of the raw data consists of nucleotides and the corresponding quality scores that indicate their reliability. The quality scores are more difficult to compress than the nucleotides and are themselves noisy. Lossless and lossy compression of the quality scores has recently been proposed to alleviate storage costs, but reducing the noise in the quality scores has remained largely unexplored. The raw data is processed to identify genetic variants, which are used in important applications such as medical decision making. Thus, improving the performance of variant calling by reducing the noise contained in the quality scores is important. We propose a denoising scheme that reduces the noise of the quality scores, and we demonstrate improved inference with the denoised data. Specifically, we show that replacing the quality scores with those generated by the proposed denoiser results in more accurate variant calling in general. Moreover, a consequence of the denoising is that the entropy of the produced quality scores is smaller, and thus significant compression can be achieved with respect to lossless compression of the original quality scores. We expect our results to provide a baseline for future research in denoising of quality scores. The code used in this work and a Supplement with all the results are available at http://web.stanford.edu/~iochoa/DCCdenoiser_CodeAndSupplement.zip.
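The storage claim rests on a simple relation: quality values with lower empirical entropy need fewer bits when stored losslessly. The Python sketch below illustrates that relation on an invented read. It is not the denoiser proposed in the paper (its code is in the linked supplement). Here, `toy_median_denoise` is a hypothetical sliding-window median filter used only as a stand-in, and the quality values are made up. The sketch also includes the standard Phred relation Q = -10 log10(p), which links a quality score to the estimated probability p that the base call is wrong.

```python
import math
from collections import Counter


def phred_to_error_prob(q):
    """Standard Phred relation Q = -10 * log10(p), solved for p:
    the estimated probability that the base call is wrong."""
    return 10 ** (-q / 10)


def empirical_entropy(symbols):
    """Order-0 empirical entropy in bits per symbol, a common proxy for how
    compactly the values can be stored by a lossless entropy coder."""
    if not symbols:
        return 0.0
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())


def toy_median_denoise(scores, radius=1):
    """HYPOTHETICAL stand-in denoiser (not the paper's method): replace each
    score by the median of a sliding window of up to 2*radius+1 scores.
    At the read ends the window is truncated; for an even-length window
    this picks the upper of the two middle values."""
    out = []
    for i in range(len(scores)):
        window = sorted(scores[max(0, i - radius): i + radius + 1])
        out.append(window[len(window) // 2])
    return out


if __name__ == "__main__":
    # Invented Phred quality values for one short read (not real data).
    scores = [30, 32, 30, 31, 12, 30, 32, 30]
    denoised = toy_median_denoise(scores)
    print("error probability at Q30:", phred_to_error_prob(30))
    print("original:", scores, f"H = {empirical_entropy(scores):.3f} bits/symbol")
    print("denoised:", denoised, f"H = {empirical_entropy(denoised):.3f} bits/symbol")
```

On this invented read, the isolated low score (12) is smoothed away, and the order-0 entropy drops from 1.75 to about 1.30 bits per symbol. Lower entropy alone is not enough. A useful denoiser must also show that the altered scores do not hurt downstream variant calling, which the abstract reports for the proposed scheme.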
