On Lowering Merge Costs of an LSM Tree

33rd International Conference on Scientific and Statistical Database Management
Pub Date: 2021-07-06
DOI: 10.1145/3468791.3468820

Dai Hai Ton That, Mohammad Gharehdaghi, A. Rasin, T. Malik

Citations: 1

Abstract

In column stores that ingest large amounts of data into multiple column groups, query performance deteriorates. Commercial column stores use a log-structured merge (LSM) tree on projections to ingest data rapidly. The LSM tree improves ingestion performance, but for column stores its sort-merge maintenance phase is I/O-intensive, which slows concurrent queries and reduces overall throughput. In this paper, we present a simple heuristic approach to reduce the sorting and merging costs that arise when data is ingested into column stores. We demonstrate how a Min-Max heuristic can construct buckets and identify the level of sortedness in each range of data. Filled and relatively sorted buckets are written out to disk; unfilled buckets are retained to achieve a better level of sortedness, thus avoiding the expensive sort-merge phase. We compare our Min-Max approach with the LSM tree and production columnar stores using real and synthetic datasets.
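The abstract does not spell out how the buckets are built, so the following is only a minimal sketch of the general idea and not the authors' algorithm. It assumes that buckets cover fixed key ranges, that sortedness is measured as the fraction of adjacent keys already in order, and that a full bucket below the threshold is sorted in memory before being written. The names `Bucket`, `ingest`, `boundaries`, `capacity`, and `threshold` are hypothetical.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class Bucket:
    """Keys for one key range, with min, max, and sortedness tracking."""
    capacity: int
    keys: list = field(default_factory=list)
    in_order_pairs: int = 0  # adjacent pairs with keys[i] <= keys[i + 1]

    def add(self, key):
        if self.keys and self.keys[-1] <= key:
            self.in_order_pairs += 1
        self.keys.append(key)

    @property
    def is_full(self):
        return len(self.keys) >= self.capacity

    @property
    def sortedness(self):
        # Assumed metric: fraction of adjacent pairs already in order.
        pairs = len(self.keys) - 1
        return 1.0 if pairs <= 0 else self.in_order_pairs / pairs

    @property
    def min_max(self):
        return (min(self.keys), max(self.keys)) if self.keys else None


def ingest(keys, boundaries, capacity=4, threshold=0.75):
    """Route keys to range buckets; return (flushed_runs, retained_buckets).

    `boundaries` is a sorted list of split points, giving
    len(boundaries) + 1 buckets. Each flushed run is (min, max, keys).
    """
    buckets = [Bucket(capacity) for _ in range(len(boundaries) + 1)]
    flushed = []
    for key in keys:
        idx = bisect.bisect_right(boundaries, key)
        bucket = buckets[idx]
        bucket.add(key)
        if bucket.is_full:
            run = bucket.keys
            if bucket.sortedness < threshold:
                # Assumption: poorly sorted full buckets are sorted in memory.
                run = sorted(run)
            flushed.append((min(run), max(run), run))
            buckets[idx] = Bucket(capacity)  # start a fresh bucket
    return flushed, buckets


flushed, retained = ingest([1, 2, 3, 4, 15, 12, 50], boundaries=[10, 20])
print(flushed)               # runs written out, each with its min and max
print(retained[1].min_max)   # range covered by a bucket still in memory
```

In this sketch a bucket that fills up while already well ordered is written out untouched, and partially filled buckets stay in memory where later keys may improve their ordering. That mirrors the behavior the abstract describes, under the assumptions stated above.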


