Improving semantic compression specification in large relational database

S. Darwish
IET Softw., pp. 108-115. Published 2016-08-01. Journal Article.
DOI: 10.1049/iet-sen.2015.0054 (https://doi.org/10.1049/iet-sen.2015.0054)
Citations: 2

Abstract

Large-scale relational databases are typically large and highly sparse, which makes database compression important for improving performance and saving storage space. Standard syntactic compression techniques such as Gzip or Zip do not take advantage of relational properties, because they do not look at the nature of the data. Semantic compression is very effective for table data because it accounts for and exploits both the meanings and dynamic ranges of error of individual attributes (lossy compression) and the existing data dependencies and correlations between attributes in the table (lossless compression). Inspired by semantic compression, this study proposes a novel independent lossless compression system that uses a data-mining model to find the frequent pattern with maximum gain (the representative row) in order to draw attribute semantics, together with a modified version of an augmented vector quantisation coder to increase the total throughput of database compression. Taking compression ratio, space, and speed into account together, the algorithm is more granular and suits every kind of massive data table. Experiments with several very large real-life datasets indicate the superiority of the system over previously known lossless semantic techniques.
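
To make the "representative row" idea in the abstract more concrete, the following is a minimal Python sketch, not the paper's algorithm. It assumes that "gain" means the total number of table cells that match the representative row, in which case the per-column most frequent value maximises the gain. The function names (representative_row, encode, decode) and the sample table are hypothetical, and the vector quantisation coder mentioned in the abstract is not modelled here.

```python
from collections import Counter


def representative_row(rows):
    """Build a row from the most frequent value of each column.

    Assumption: gain = number of cells equal to the representative row,
    so the per-column mode maximises it. This is a simple stand-in for
    the mined frequent pattern described in the abstract.
    """
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def encode(rows, rep):
    """Keep, for each row, only the (column index, value) pairs that differ from rep."""
    return [
        [(i, v) for i, (v, r) in enumerate(zip(row, rep)) if v != r]
        for row in rows
    ]


def decode(encoded, rep):
    """Rebuild the original rows by applying each row's differences to rep."""
    rows = []
    for diffs in encoded:
        row = list(rep)
        for i, v in diffs:
            row[i] = v
        rows.append(tuple(row))
    return rows


# Hypothetical sample table: (city, country, status)
table = [
    ("NY", "US", "active"),
    ("NY", "US", "closed"),
    ("LA", "US", "active"),
    ("NY", "US", "active"),
]

rep = representative_row(table)
packed = encode(table, rep)
restored = decode(packed, rep)
```

In this sketch, rows identical to the representative row are stored as empty difference lists, and decoding reproduces the table exactly, which is what makes the scheme lossless. A generic byte-level compressor such as Gzip sees no column structure and cannot exploit this directly.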