Fast Dictionary-Based Compression for Inverted Indexes

Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining. Pub Date: 2019-01-30. DOI: 10.1145/3289600.3290962

Giulio Ermanno Pibiri, M. Petri, Alistair Moffat

Citations: 18

Abstract

Dictionary-based compression schemes provide fast decoding operation, typically at the expense of reduced compression effectiveness compared to statistical or probability-based approaches. In this work, we apply dictionary-based techniques to the compression of inverted lists, showing that the high degree of regularity that these integer sequences exhibit is a good match for certain types of dictionary methods, and that an important new trade-off balance between compression effectiveness and compression efficiency can be achieved. Our observations are supported by experiments using the document-level inverted index data for two large text collections, and a wide range of other index compression implementations as reference points. Those experiments demonstrate that the gap between efficiency and effectiveness can be substantially narrowed.
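
To make the general idea concrete, the following is a minimal toy sketch of dictionary-based coding applied to a list of document identifiers. It is not the scheme evaluated in the paper: the block length `BLOCK`, the function names, and the `("D", index)` / `("L", gap)` output format are all assumptions made purely for illustration. It only shows why regular gap sequences suit a dictionary (repeated patterns collapse to a single codeword) and why decoding is cheap (a table lookup plus a running sum).

```python
from collections import Counter

BLOCK = 4  # hypothetical block length, chosen only for illustration


def to_gaps(doc_ids):
    """Turn a strictly increasing docID list into gaps (first gap is the first ID)."""
    prev, gaps = 0, []
    for d in doc_ids:
        gaps.append(d - prev)
        prev = d
    return gaps


def build_dictionary(gap_lists, size):
    """Keep the `size` most frequent BLOCK-long gap patterns (aligned blocks only)."""
    counts = Counter()
    for gaps in gap_lists:
        for i in range(0, len(gaps) - BLOCK + 1, BLOCK):
            counts[tuple(gaps[i:i + BLOCK])] += 1
    return [pattern for pattern, _ in counts.most_common(size)]


def encode(gaps, dictionary):
    """Emit ('D', index) for a dictionary hit, else ('L', gap) for a single literal gap."""
    lookup = {p: i for i, p in enumerate(dictionary)}
    out, i = [], 0
    while i < len(gaps):
        block = tuple(gaps[i:i + BLOCK])
        if len(block) == BLOCK and block in lookup:
            out.append(("D", lookup[block]))
            i += BLOCK
        else:
            out.append(("L", gaps[i]))
            i += 1
    return out


def decode(codes, dictionary):
    """Expand each code to its gaps and rebuild docIDs with a running sum."""
    doc_ids, current = [], 0
    for kind, value in codes:
        gaps = dictionary[value] if kind == "D" else (value,)
        for g in gaps:
            current += g
            doc_ids.append(current)
    return doc_ids
```

In this sketch, a run of consecutive identifiers such as 1 to 8 becomes eight gaps of 1 and is covered by two references to the pattern `(1, 1, 1, 1)`, while an irregular gap falls back to a literal. A practical coder would also pack the codes into a compact bit or byte layout, which is omitted here.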
