基于层次树结构的DNA序列自适应差分编码。

Proceedings. Data Compression Conference Pub Date : 2013-01-01 Epub Date: 2013-03-22 DOI:10.1109/DCC.2013.45
Wenrui Dai, Hongkai Xiong, Xiaoqian Jiang, Lucila Ohno-Machado
{"title":"基于层次树结构的DNA序列自适应差分编码。","authors":"Wenrui Dai,&nbsp;Hongkai Xiong,&nbsp;Xiaoqian Jiang,&nbsp;Lucila Ohno-Machado","doi":"10.1109/DCC.2013.45","DOIUrl":null,"url":null,"abstract":"<p><p>Previous reference-based compression on DNA sequences do not fully exploit the intrinsic statistics by merely concerning the approximate matches. In this paper, an adaptive difference distribution-based coding framework is proposed by the fragments of nucleotides with a hierarchical tree structure. To keep the distribution of difference sequence from the reference and target sequences concentrated, the sub-fragment size and matching offset for predicting are flexible to the stepped size structure. The matching with approximate repeats in reference will be imposed with the Hamming-like weighted distance measure function in a local region closed to the current fragment, such that the accuracy of matching and the overhead of describing matching offset can be balanced. A well-designed coding scheme will make compact both the difference sequence and the additional parameters, e.g. sub-fragment size and matching offset. Experimental results show that the proposed scheme achieves 150% compression improvement in comparison with the best reference-based compressor GReEn.</p>","PeriodicalId":91161,"journal":{"name":"Proceedings. Data Compression Conference","volume":"2013 ","pages":"371-380"},"PeriodicalIF":0.0000,"publicationDate":"2013-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/DCC.2013.45","citationCount":"5","resultStr":"{\"title\":\"An Adaptive Difference Distribution-based Coding with Hierarchical Tree Structure for DNA Sequence Compression.\",\"authors\":\"Wenrui Dai,&nbsp;Hongkai Xiong,&nbsp;Xiaoqian Jiang,&nbsp;Lucila Ohno-Machado\",\"doi\":\"10.1109/DCC.2013.45\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Previous reference-based compression on DNA sequences do not fully exploit the intrinsic statistics by merely concerning the approximate matches. In this paper, an adaptive difference distribution-based coding framework is proposed by the fragments of nucleotides with a hierarchical tree structure. To keep the distribution of difference sequence from the reference and target sequences concentrated, the sub-fragment size and matching offset for predicting are flexible to the stepped size structure. The matching with approximate repeats in reference will be imposed with the Hamming-like weighted distance measure function in a local region closed to the current fragment, such that the accuracy of matching and the overhead of describing matching offset can be balanced. A well-designed coding scheme will make compact both the difference sequence and the additional parameters, e.g. sub-fragment size and matching offset. Experimental results show that the proposed scheme achieves 150% compression improvement in comparison with the best reference-based compressor GReEn.</p>\",\"PeriodicalId\":91161,\"journal\":{\"name\":\"Proceedings. Data Compression Conference\",\"volume\":\"2013 \",\"pages\":\"371-380\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://sci-hub-pdf.com/10.1109/DCC.2013.45\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings. Data Compression Conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/DCC.2013.45\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2013/3/22 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. Data Compression Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DCC.2013.45","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2013/3/22 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

摘要

以往的基于参考的DNA序列压缩,并没有充分利用内在统计数据,仅仅关注近似匹配。本文提出了一种基于差异分布的自适应编码框架,该框架由核苷酸片段组成,具有层次树结构。为了保持与参考序列和目标序列的差异序列分布集中,预测的子片段大小和匹配偏移量可灵活地适应步长结构。在当前片段附近的局部区域,使用haming -like加权距离度量函数对参考中近似重复的匹配进行施加,这样可以平衡匹配的精度和描述匹配偏移量的开销。设计良好的编码方案将使差异序列和附加参数(如子片段大小和匹配偏移量)都变得紧凑。实验结果表明,与基于参考的最佳压缩器GReEn相比,该方案的压缩性能提高了150%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
An Adaptive Difference Distribution-based Coding with Hierarchical Tree Structure for DNA Sequence Compression.

Previous reference-based compression on DNA sequences do not fully exploit the intrinsic statistics by merely concerning the approximate matches. In this paper, an adaptive difference distribution-based coding framework is proposed by the fragments of nucleotides with a hierarchical tree structure. To keep the distribution of difference sequence from the reference and target sequences concentrated, the sub-fragment size and matching offset for predicting are flexible to the stepped size structure. The matching with approximate repeats in reference will be imposed with the Hamming-like weighted distance measure function in a local region closed to the current fragment, such that the accuracy of matching and the overhead of describing matching offset can be balanced. A well-designed coding scheme will make compact both the difference sequence and the additional parameters, e.g. sub-fragment size and matching offset. Experimental results show that the proposed scheme achieves 150% compression improvement in comparison with the best reference-based compressor GReEn.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Faster Maximal Exact Matches with Lazy LCP Evaluation. Recursive Prefix-Free Parsing for Building Big BWTs. PHONI: Streamed Matching Statistics with Multi-Genome References. Client-Driven Transmission of JPEG2000 Image Sequences Using Motion Compensated Conditional Replenishment GeneComp, a new reference-based compressor for SAM files.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1