HPCache: memory-efficient OLAP through proportional caching revisited

Hamish Nicholson, Periklis Chrysogelos, Anastasia Ailamaki
{"title":"HPCache: memory-efficient OLAP through proportional caching revisited","authors":"Hamish Nicholson, Periklis Chrysogelos, Anastasia Ailamaki","doi":"10.1007/s00778-023-00828-7","DOIUrl":null,"url":null,"abstract":"<p>Analytical engines rely on in-memory data caching to avoid storage accesses and provide timely responses by keeping the most frequently accessed data in memory. Purely frequency- and time-based caching decisions, however, are a proxy of the expected query execution speedup only when storage accesses are significantly slower than in-memory query processing. On the other hand, fast storage offers loading times that approach fully in-memory query response times, rendering purely frequency-based statistics incapable of capturing the impact of a caching decision on query execution. For example, caching the input of a frequent query that spends most of its time processing joins is less beneficial than caching a page for a slightly less frequent but scan-heavy query. Thus, existing caching policies waste valuable memory space to cache input data that offer little-to-no acceleration for analytics. This paper proposes HPCache, a buffer management policy that enables fast analytics on high-bandwidth storage by efficiently using the available in-memory space. HPCache caches data based on the speedup potential instead of relying on frequency-based statistics. We show that, with fast storage, the benefit of in-memory caching varies significantly across queries; therefore, we quantify the efficiency of caching decisions and formulate an optimization problem. We implement HPCache in Proteus and show that (i) estimating speedup potential improves memory space utilization, and (ii) simple runtime statistics suffice to infer speedup. We show that HPCache achieves up to a 1.75x speed-up over frequency-based caching policies by caching column proportions and automatically tuning them. Overall, HPCache enables efficient use of the in-memory space for input caching in the presence of fast storage, without requiring workload predictions.</p>","PeriodicalId":501532,"journal":{"name":"The VLDB Journal","volume":"74 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2023-12-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"The VLDB Journal","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s00778-023-00828-7","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Analytical engines rely on in-memory data caching to avoid storage accesses and provide timely responses by keeping the most frequently accessed data in memory. Purely frequency- and time-based caching decisions, however, are a proxy of the expected query execution speedup only when storage accesses are significantly slower than in-memory query processing. On the other hand, fast storage offers loading times that approach fully in-memory query response times, rendering purely frequency-based statistics incapable of capturing the impact of a caching decision on query execution. For example, caching the input of a frequent query that spends most of its time processing joins is less beneficial than caching a page for a slightly less frequent but scan-heavy query. Thus, existing caching policies waste valuable memory space to cache input data that offer little-to-no acceleration for analytics. This paper proposes HPCache, a buffer management policy that enables fast analytics on high-bandwidth storage by efficiently using the available in-memory space. HPCache caches data based on the speedup potential instead of relying on frequency-based statistics. We show that, with fast storage, the benefit of in-memory caching varies significantly across queries; therefore, we quantify the efficiency of caching decisions and formulate an optimization problem. We implement HPCache in Proteus and show that (i) estimating speedup potential improves memory space utilization, and (ii) simple runtime statistics suffice to infer speedup. We show that HPCache achieves up to a 1.75x speed-up over frequency-based caching policies by caching column proportions and automatically tuning them. Overall, HPCache enables efficient use of the in-memory space for input caching in the presence of fast storage, without requiring workload predictions.

Abstract Image

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
HPCache:通过比例缓存重温内存效率高的 OLAP
分析引擎依靠内存数据缓存来避免存储访问,并通过在内存中保留访问频率最高的数据来提供及时响应。然而,只有当存储访问明显慢于内存查询处理时,纯粹基于频率和时间的缓存决策才能代表预期的查询执行速度提升。另一方面,快速存储提供的加载时间接近完全内存查询响应时间,这使得纯粹基于频率的统计无法捕捉缓存决策对查询执行的影响。例如,缓存一个频繁查询的输入(该查询大部分时间用于处理连接),不如缓存一个频率稍低但扫描量大的查询页面。因此,现有的缓存策略浪费了宝贵的内存空间来缓存输入数据,对分析几乎没有任何加速作用。本文提出的 HPCache 是一种缓冲区管理策略,可通过有效利用可用的内存空间,在高带宽存储上实现快速分析。HPCache 根据加速潜力缓存数据,而不是依赖基于频率的统计数据。我们的研究表明,在快速存储的情况下,内存缓存的优势在不同查询中差别很大;因此,我们量化了缓存决策的效率,并提出了一个优化问题。我们在 Proteus 中实现了 HPCache,并证明:(i) 估算加速潜力可提高内存空间利用率;(ii) 简单的运行时统计数据足以推断出加速情况。我们表明,通过缓存列比例并自动调整它们,HPCache 比基于频率的缓存策略最多可提高 1.75 倍的速度。总之,HPCache 可以在快速存储的情况下高效利用内存空间进行输入缓存,而无需进行工作负载预测。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
A versatile framework for attributed network clustering via K-nearest neighbor augmentation Discovering critical vertices for reinforcement of large-scale bipartite networks DumpyOS: A data-adaptive multi-ary index for scalable data series similarity search Enabling space-time efficient range queries with REncoder AutoCTS++: zero-shot joint neural architecture and hyperparameter search for correlated time series forecasting
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1