改进大型关系数据库的语义压缩规范

IET Softw. Pub Date : 2016-08-01 DOI:10.1049/iet-sen.2015.0054

S. Darwish

{"title":"改进大型关系数据库的语义压缩规范","authors":"S. Darwish","doi":"10.1049/iet-sen.2015.0054","DOIUrl":null,"url":null,"abstract":"The large-scale relational databases normally have a large size and a high degree of sparsity. This has made database compression very important to improve the performance and save storage space. Using standard compression techniques (syntactic) such as Gzip or Zip does not take advantage of the relational properties, as these techniques do not look at the nature of the data. Since semantic compression accounts for and exploits both the meanings and dynamic ranges of error for individual attributes (lossy compression); and existing data dependencies and correlations between attributes in the table (lossless compression), it is very effective for table-data compression. Inspired by semantic compression, this study proposes a novel independent lossless compression system through utilising data-mining model to find the frequent pattern with maximum gain (representative row) in order to draw attribute semantics, besides a modified version of an augmented vector quantisation coder to increase total throughput of the database compression. This algorithm enables more granular and suitable for every kind of massive data tables after synthetically considering compression ratio, space, and speed. The experimentation with several very large real-life datasets indicates the superiority of the system with respect to previously known lossless semantic techniques.","PeriodicalId":13395,"journal":{"name":"IET Softw.","volume":"1 1","pages":"108-115"},"PeriodicalIF":0.0000,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Improving semantic compression specification in large relational database\",\"authors\":\"S. Darwish\",\"doi\":\"10.1049/iet-sen.2015.0054\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The large-scale relational databases normally have a large size and a high degree of sparsity. This has made database compression very important to improve the performance and save storage space. Using standard compression techniques (syntactic) such as Gzip or Zip does not take advantage of the relational properties, as these techniques do not look at the nature of the data. Since semantic compression accounts for and exploits both the meanings and dynamic ranges of error for individual attributes (lossy compression); and existing data dependencies and correlations between attributes in the table (lossless compression), it is very effective for table-data compression. Inspired by semantic compression, this study proposes a novel independent lossless compression system through utilising data-mining model to find the frequent pattern with maximum gain (representative row) in order to draw attribute semantics, besides a modified version of an augmented vector quantisation coder to increase total throughput of the database compression. This algorithm enables more granular and suitable for every kind of massive data tables after synthetically considering compression ratio, space, and speed. The experimentation with several very large real-life datasets indicates the superiority of the system with respect to previously known lossless semantic techniques.\",\"PeriodicalId\":13395,\"journal\":{\"name\":\"IET Softw.\",\"volume\":\"1 1\",\"pages\":\"108-115\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-08-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IET Softw.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1049/iet-sen.2015.0054\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IET Softw.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1049/iet-sen.2015.0054","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 2

摘要

大型关系数据库通常具有较大的规模和高度的稀疏性。这使得数据库压缩对于提高性能和节省存储空间非常重要。使用诸如Gzip或Zip之类的标准压缩技术(语法)并不能利用关系属性，因为这些技术不考虑数据的性质。由于语义压缩考虑并利用了单个属性的含义和动态误差范围(有损压缩);对表中已有的数据依赖关系和属性之间的相关性(无损压缩)，它对表数据的压缩非常有效。受语义压缩的启发，本研究提出了一种新的独立无损压缩系统，利用数据挖掘模型寻找增益最大的频繁模式(代表行)来绘制属性语义，并改进了增广矢量量化编码器来提高数据库压缩的总吞吐量。该算法在综合考虑压缩比、空间、速度等因素后，更细化，适合各类海量数据表。对几个非常大的真实数据集的实验表明，相对于先前已知的无损语义技术，该系统具有优越性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Improving semantic compression specification in large relational database

The large-scale relational databases normally have a large size and a high degree of sparsity. This has made database compression very important to improve the performance and save storage space. Using standard compression techniques (syntactic) such as Gzip or Zip does not take advantage of the relational properties, as these techniques do not look at the nature of the data. Since semantic compression accounts for and exploits both the meanings and dynamic ranges of error for individual attributes (lossy compression); and existing data dependencies and correlations between attributes in the table (lossless compression), it is very effective for table-data compression. Inspired by semantic compression, this study proposes a novel independent lossless compression system through utilising data-mining model to find the frequent pattern with maximum gain (representative row) in order to draw attribute semantics, besides a modified version of an augmented vector quantisation coder to increase total throughput of the database compression. This algorithm enables more granular and suitable for every kind of massive data tables after synthetically considering compression ratio, space, and speed. The experimentation with several very large real-life datasets indicates the superiority of the system with respect to previously known lossless semantic techniques.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IET Softw.

自引率

0.00%

发文量