HPCache: memory-efficient OLAP through proportional caching revisited

The VLDB Journal | Pub Date: 2023-12-22 | DOI: 10.1007/s00778-023-00828-7

Hamish Nicholson, Periklis Chrysogelos, Anastasia Ailamaki


Abstract

Analytical engines rely on in-memory data caching to avoid storage accesses and provide timely responses by keeping the most frequently accessed data in memory. Purely frequency- and time-based caching decisions, however, are a proxy for the expected query execution speedup only when storage accesses are significantly slower than in-memory query processing. On the other hand, fast storage offers loading times that approach fully in-memory query response times, rendering purely frequency-based statistics incapable of capturing the impact of a caching decision on query execution. For example, caching the input of a frequent query that spends most of its time processing joins is less beneficial than caching a page for a slightly less frequent but scan-heavy query. Thus, existing caching policies waste valuable memory space to cache input data that offer little-to-no acceleration for analytics. This paper proposes HPCache, a buffer management policy that enables fast analytics on high-bandwidth storage by efficiently using the available in-memory space. HPCache caches data based on the speedup potential instead of relying on frequency-based statistics. We show that, with fast storage, the benefit of in-memory caching varies significantly across queries; therefore, we quantify the efficiency of caching decisions and formulate an optimization problem. We implement HPCache in Proteus and show that (i) estimating speedup potential improves memory space utilization, and (ii) simple runtime statistics suffice to infer speedup. We show that HPCache achieves up to a 1.75x speedup over frequency-based caching policies by caching column proportions and automatically tuning them. Overall, HPCache enables efficient use of the in-memory space for input caching in the presence of fast storage, without requiring workload predictions.
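
The contrast between frequency-based and speedup-based caching can be illustrated with a small sketch. This is not the HPCache algorithm or the Proteus implementation; it is a simple fractional-knapsack illustration under stated assumptions. The `Column` fields, the numbers, and the assumption that time saved grows linearly with the cached fraction of a column are all hypothetical and chosen only to mirror the join-heavy versus scan-heavy example in the abstract.

```python
from dataclasses import dataclass


@dataclass
class Column:
    name: str
    size_gb: float             # memory needed to cache the whole column
    freq: float                # accesses per unit time (hypothetical statistic)
    saved_s_per_access: float  # seconds saved per access if fully cached (hypothetical)


def plan(columns, budget_gb, key):
    """Greedily assign cached fractions in descending order of `key`.

    Columns may be cached partially, so the last column that fits
    receives only the remaining share of the budget.
    """
    remaining = budget_gb
    fractions = {}
    for col in sorted(columns, key=key, reverse=True):
        take = min(col.size_gb, remaining)
        fractions[col.name] = take / col.size_gb
        remaining -= take
    return fractions


def expected_saving(columns, fractions):
    """Total time saved per unit time, ASSUMING saving is linear in the cached fraction."""
    return sum(c.freq * c.saved_s_per_access * fractions[c.name] for c in columns)


# Hypothetical workload: a frequent join-heavy query whose input loads are
# cheap relative to its processing, and a slightly less frequent scan-heavy
# query that is dominated by reading its input.
columns = [
    Column("join_input", size_gb=10.0, freq=10.0, saved_s_per_access=0.1),
    Column("scan_input", size_gb=10.0, freq=8.0, saved_s_per_access=2.0),
]
budget_gb = 10.0

# Frequency-based policy: rank by access frequency alone.
freq_plan = plan(columns, budget_gb, key=lambda c: c.freq)

# Speedup-based policy: rank by expected time saved per GB of memory.
speedup_plan = plan(
    columns, budget_gb, key=lambda c: c.freq * c.saved_s_per_access / c.size_gb
)

freq_saving = expected_saving(columns, freq_plan)
speedup_saving = expected_saving(columns, speedup_plan)
```

With these made-up numbers, the frequency-based ranking spends the whole budget on `join_input`, while the ranking by saved time per GB spends it on `scan_input` and yields a larger expected saving. The actual gain in any real system depends on measured runtime statistics rather than on fixed constants like the ones used here.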


