Reference-based data compression for genome in cloud

Haixiang Shi, Yongqing Zhu, J. Samsudin
Proceedings of the 2nd International Conference on Communication and Information Processing, 2016 (published 2016-11-26). DOI: 10.1145/3018009.3018030
Citations: 3

Abstract

In this paper, we propose a new reference-based data compression method for efficiently compressing genome sequencing data in FASTQ format. With advances in next-generation sequencing technology, genome data can be generated faster and more cheaply, which makes it challenging to store these data efficiently in cloud computing environments. To store such genome data efficiently in the cloud, content-aware compression methods must be developed that exploit the specific structure of these files. Compared with existing genome-specific compression methods, our proposed content-aware method focuses on achieving a high compression ratio by taking advantage of the repetitive nature of DNA sequences and by using reference genomes to compress the sequences inside FASTQ files. Benchmark results on 8 datasets show that our method achieves the highest compression ratio compared with existing FASTQ file compressors.
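The abstract does not describe the encoding itself, so the following is only a minimal sketch of the general idea behind reference-based sequence compression, not the authors' method. It assumes substitution-only differences (no insertions or deletions), ignores quality scores and read headers, and locates each read with a simple exact k-mer seed index. A mapped read is then stored as a start position plus a short list of base substitutions, and an unmapped read is kept verbatim. The names `build_kmer_index`, `encode_read`, `decode_read`, `EncodedRead`, and the `max_mismatches` parameter are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class EncodedRead:
    pos: int                       # start in the reference, or -1 if unmapped
    length: int                    # read length, needed to slice the reference on decode
    diffs: List[Tuple[int, str]] = field(default_factory=list)  # (offset in read, base)
    raw: Optional[str] = None      # raw bases, kept only for unmapped reads


def build_kmer_index(reference: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer of the reference to the positions where it starts."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def encode_read(read: str, reference: str, index: Dict[str, List[int]],
                k: int, max_mismatches: int = 4) -> EncodedRead:
    """Encode a read as a reference offset plus substitutions (hypothetical scheme)."""
    # Seed with non-overlapping k-mers of the read; each hit implies a start position.
    candidates = set()
    for s in range(0, len(read) - k + 1, k):
        for hit in index.get(read[s:s + k], []):
            candidates.add(hit - s)

    best: Optional[Tuple[int, List[Tuple[int, str]]]] = None
    for pos in sorted(candidates):
        if pos < 0 or pos + len(read) > len(reference):
            continue
        window = reference[pos:pos + len(read)]
        # Record every position where the read differs from the reference window.
        diffs = [(j, b) for j, (a, b) in enumerate(zip(window, read)) if a != b]
        if len(diffs) <= max_mismatches and (best is None or len(diffs) < len(best[1])):
            best = (pos, diffs)

    if best is None:
        return EncodedRead(pos=-1, length=len(read), raw=read)  # fall back to raw bases
    return EncodedRead(pos=best[0], length=len(read), diffs=best[1])


def decode_read(rec: EncodedRead, reference: str) -> str:
    """Rebuild the read by copying the reference window and applying substitutions."""
    if rec.raw is not None:
        return rec.raw
    bases = list(reference[rec.pos:rec.pos + rec.length])
    for j, b in rec.diffs:
        bases[j] = b
    return "".join(bases)


if __name__ == "__main__":
    ref = "ACGTACGTTAGCCGATAGGCTTACGATCGATCGGATCCA"
    idx = build_kmer_index(ref, 5)
    read = "GCCGACAGGCTTACGATCGA"  # ref[10:30] with one substitution at offset 5
    rec = encode_read(read, ref, idx, 5)
    print(rec.pos, rec.diffs)  # prints: 10 [(5, 'C')]
    assert decode_read(rec, ref) == read
```

In this toy example a 20-base read shrinks to one position and one substitution, which is the intuition behind exploiting a reference genome. A practical compressor would additionally entropy-code the positions and mismatch lists and handle indels, quality scores, and headers separately; none of that is shown here.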