Multigrain: Adaptive multilevel hot data identifier with a stack distance-based prefilter
Authors: Hyerim Lee, Dongchul Park
Journal: Future Generation Computer Systems-The International Journal of Escience, volume 167, Article 107762 (impact factor 6.2; JCR Q1, Computer Science, Theory & Methods)
Published: 2025-02-19
DOI: 10.1016/j.future.2025.107762
URL: https://www.sciencedirect.com/science/article/pii/S0167739X25000573
Citations: 0
Abstract
Many computer system applications, such as data caching and NAND flash memory-based storage systems, employ a hot data identification scheme. However, regardless of workload characteristics, most existing studies have adopted only a fine-grained (i.e., block-level) hot data decision policy, which causes high computational overhead and error rates. Different workloads call for different treatments to achieve effective hot data identification. Based on our comprehensive workload studies, this paper proposes Multigrain, an adaptive multilevel hot data identification scheme that dynamically selects a coarse-grained (i.e., subrequest-level) policy or a coarser-grained (i.e., request-level) policy based on the workload. Multigrain employs multiple Bloom filters to capture frequency and recency information. Moreover, it adopts a simple prefilter mechanism that leverages workload stack distance information. To our knowledge, the proposed scheme is the first multilevel coarse-grained hot data identification scheme that judiciously selects an optimal hot data decision granularity to achieve effective and accurate identification. Our extensive experiments with many realistic workloads demonstrate that, using the coarse-grained policies and the prefiltering mechanism, our adaptive multilevel scheme significantly reduces the execution time (by an average of up to 6.9×) and the error rate (by an average of up to 2.27×).
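The abstract names two building blocks without giving their details: multiple Bloom filters that capture frequency and recency, and workload stack distance. The sketch below illustrates those general ideas only. It is not Multigrain's actual design: the class names, filter sizes, decay interval, and hot threshold are all assumptions chosen for illustration, and the abstract does not say how the prefilter uses stack distance.

```python
import hashlib
from collections import OrderedDict


def stack_distances(trace):
    """LRU stack distance per access: the number of distinct addresses
    touched since the previous access to the same address (None if new)."""
    stack = OrderedDict()  # keys ordered from least to most recently used
    out = []
    for addr in trace:
        if addr in stack:
            keys = list(stack)
            out.append(len(keys) - 1 - keys.index(addr))
            stack.move_to_end(addr)
        else:
            out.append(None)
            stack[addr] = True
    return out


class BloomFilter:
    """Tiny Bloom filter; sizes are illustrative assumptions."""

    def __init__(self, num_bits=1024, num_hashes=2):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        digest = hashlib.sha256(str(key).encode()).digest()
        return [
            int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.num_bits
            for i in range(self.num_hashes)
        ]

    def add(self, key):
        for p in self._positions(key):
            self.bits[p] = 1

    def __contains__(self, key):
        return all(self.bits[p] for p in self._positions(key))

    def clear(self):
        self.bits = bytearray(self.num_bits)


class MultiBloomIdentifier:
    """Hypothetical identifier: filter count approximates frequency,
    periodic clearing of one filter approximates recency (aging)."""

    def __init__(self, num_filters=4, decay_interval=8, hot_threshold=2):
        self.filters = [BloomFilter() for _ in range(num_filters)]
        self.decay_interval = decay_interval
        self.hot_threshold = hot_threshold
        self.accesses = 0
        self.next_to_clear = 0

    def record(self, key):
        # Frequency: set the key in the first filter that lacks it.
        for f in self.filters:
            if key not in f:
                f.add(key)
                break
        self.accesses += 1
        # Recency: every decay_interval accesses, erase one filter in turn.
        if self.accesses % self.decay_interval == 0:
            self.filters[self.next_to_clear].clear()
            self.next_to_clear = (self.next_to_clear + 1) % len(self.filters)

    def is_hot(self, key):
        return sum(key in f for f in self.filters) >= self.hot_threshold


print(stack_distances(["a", "b", "c", "a", "b", "b"]))
# prints [None, None, None, 2, 2, 0]

ident = MultiBloomIdentifier()
for key in ["x", "x", "x", "y"]:
    ident.record(key)
print(ident.is_hot("x"), ident.is_hot("y"))
```

In this sketch the key passed to `record` can be a block address, a subrequest, or a whole request, which is one way to picture the granularity choice the abstract describes. A prefilter could, for example, skip accesses whose stack distance exceeds a cutoff before they reach the identifier, but that particular rule is an assumption rather than something stated in the abstract.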
About the journal:
Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications.
Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration.
Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.