Multigrain: Adaptive multilevel hot data identifier with a stack distance-based prefilter

IF 6.2; CAS Zone 2 (Computer Science); JCR Q1 (COMPUTER SCIENCE, THEORY & METHODS); Future Generation Computer Systems-The International Journal of Escience; Pub Date: 2025-02-19; DOI: 10.1016/j.future.2025.107762

Hyerim Lee, Dongchul Park

Future Generation Computer Systems-The International Journal of Escience, Volume 167, Article 107762 (Journal Article). Full text: https://www.sciencedirect.com/science/article/pii/S0167739X25000573

Citations: 0

Abstract

Many computer system applications, such as data caching and Not AND (NAND) flash memory-based storage systems, employ a hot data identification scheme. However, regardless of the workload characteristics, most existing studies have adopted only a fine-grained (i.e., block-level) hot data decision policy, causing high computational overhead and error rates. Different workloads mandate different treatments to achieve effective hot data identification. Based on our comprehensive workload studies, this paper proposes Multigrain, an adaptive multilevel hot data identification scheme that dynamically selects a coarse-grained (i.e., subrequest-level) policy or coarser-grained (i.e., request-level) policy based on the workload. The proposed Multigrain employs multiple effective bloom filters to capture frequency and recency information. Moreover, it adopts a simple and smart prefilter mechanism leveraging workload stack distance information. To our knowledge, the proposed scheme is the first multilevel coarse-grained hot data identification scheme that judiciously selects an optimal hot data decision granularity to achieve effective and accurate identification. Our extensive experiments with many realistic workloads demonstrate that our adaptive multilevel scheme significantly reduces the execution time (by an average of up to 6.9×) and error rate (by an average of up to 2.27×) using the effective coarse-grained policies and a prefiltering mechanism.
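The abstract mentions stack distance information without defining it. As a rough illustration only, the sketch below computes the commonly used LRU stack distance (the number of distinct addresses referenced since the previous access to the same address) over a trace. It is not the paper's prefilter: the function name `stack_distances`, the list-based stack, and the sample trace are all assumptions made for this example, and the abstract does not say how Multigrain turns these distances into a prefilter decision.

```python
from typing import Hashable, Iterable, List, Optional


def stack_distances(trace: Iterable[Hashable]) -> List[Optional[int]]:
    """Return the LRU stack distance of every access in `trace`.

    Hypothetical helper for illustration; not taken from the paper.
    A first-time access has no finite distance and is reported as None.
    """
    stack: List[Hashable] = []  # most recently used address sits at the end
    distances: List[Optional[int]] = []
    for addr in trace:
        if addr in stack:
            pos = stack.index(addr)
            # Entries above `pos` are the distinct addresses touched
            # since the previous access to `addr`.
            distances.append(len(stack) - 1 - pos)
            stack.pop(pos)
        else:
            distances.append(None)
        stack.append(addr)  # `addr` becomes the most recently used entry
    return distances


if __name__ == "__main__":
    # Assumed toy trace of logical block addresses.
    print(stack_distances(["A", "B", "C", "A", "B", "B"]))
```

A small distance means the address was reused after few intervening distinct addresses. The linear scan keeps the sketch short at O(n) cost per access; a tree-based structure would be the usual choice for long traces.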




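Source journal

Future Generation Computer Systems-The International Journal of Escience (Engineering & Technology; Computer Science: Theory & Methods)

CiteScore: 19.90
Self-citation rate: 2.70%
Articles published: 376
Review time: 10.6 months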

About the journal: Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications. Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration. Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.