
ACM Transactions on Storage: Latest Articles

ZNSwap: un-Block your Swap
IF 1.7, Zone 3 (Computer Science), Q3 Computer Science, Pub Date: 2023-03-06, DOI: https://dl.acm.org/doi/10.1145/3582434
Shai Bergman, Niklas Cassel, Matias Bjørling, Mark Silberstein

We introduce ZNSwap, a novel swap subsystem optimized for the recent Zoned Namespace (ZNS) SSDs. ZNSwap leverages ZNS’s explicit control over data management on the drive and introduces a space-efficient host-side Garbage Collector (GC) for swap storage co-designed with the OS swap logic. ZNSwap enables cross-layer optimizations, such as direct access to the in-kernel swap usage statistics by the GC to enable fine-grained swap storage management, and correct accounting of the GC bandwidth usage in the OS resource isolation mechanisms to improve performance isolation in multi-tenant environments. We evaluate ZNSwap using standard Linux swap benchmarks and two production key-value stores. ZNSwap shows significant performance benefits over the Linux swap on traditional SSDs, such as stable throughput for different memory access patterns, and 10× lower 99th percentile latency and 5× higher throughput for the memcached key-value store under realistic usage scenarios.
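To make the idea of a host-side garbage collector concrete, here is a minimal Python sketch. It is not ZNSwap's collector: it only assumes that zones are append-only and must be reset before reuse, and it uses a simple greedy policy (reclaim the full zone with the fewest live pages). The names `Zone`, `pick_victim`, and `reclaim` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Zone:
    zone_id: int
    capacity: int                              # number of page slots in the zone
    valid: set = field(default_factory=set)    # slots that still hold live swap pages
    write_ptr: int = 0                         # append-only: next slot to be written

    def is_full(self) -> bool:
        return self.write_ptr >= self.capacity


def pick_victim(zones):
    """Greedy policy (an assumption): the full zone with the fewest live pages."""
    full = [z for z in zones if z.is_full()]
    return min(full, key=lambda z: len(z.valid)) if full else None


def reclaim(victim, target):
    """Append the victim's live pages to the target zone, then reset the victim."""
    live = len(victim.valid)
    if target.capacity - target.write_ptr < live:
        raise RuntimeError("target zone has no room for the live pages")
    for _ in range(live):
        target.valid.add(target.write_ptr)     # relocated page lands at the write pointer
        target.write_ptr += 1
    victim.valid.clear()
    victim.write_ptr = 0                       # models a zone reset
    return live                                # pages copied = GC write cost
```

The fewer live pages the victim holds, the less data the collector has to copy. This is why access to the kernel's swap usage statistics, which reveal the pages that are no longer needed, can reduce GC work.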

Citations: 0
An In-depth Comparative Analysis of Cloud Block Storage Workloads: Findings and Implications
IF 1.7, Zone 3 (Computer Science), Q3 Computer Science, Pub Date: 2023-03-06, DOI: https://dl.acm.org/doi/10.1145/3572779
Jinhong Li, Qiuping Wang, Patrick P. C. Lee, Chao Shi

Cloud block storage systems support diverse types of applications in modern cloud services. Characterizing their input/output (I/O) activities is critical for guiding better system designs and optimizations. In this article, we present an in-depth comparative analysis of production cloud block storage workloads through the block-level I/O traces of billions of I/O requests collected from two production systems, Alibaba Cloud and Tencent Cloud Block Storage. We study their characteristics of load intensities, spatial patterns, and temporal patterns. We also compare the cloud block storage workloads with the notable public block-level I/O workloads from the enterprise data centers at Microsoft Research Cambridge, and we identify the commonalities and differences of the three sources of traces. To this end, we provide 6 findings through the high-level analysis and 16 findings through the detailed analysis on load intensity, spatial patterns, and temporal patterns. We discuss the implications of our findings on load balancing, cache efficiency, and storage cluster management in cloud block storage systems.

Citations: 0
Principled Schedulability Analysis for Distributed Storage Systems Using Thread Architecture Models
IF 1.7, Zone 3 (Computer Science), Q3 Computer Science, Pub Date: 2023-03-06, DOI: https://dl.acm.org/doi/10.1145/3574323
Suli Yang, Jing Liu, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau

In this article, we present an approach to systematically examine the schedulability of distributed storage systems, identify their scheduling problems, and enable effective scheduling in these systems. We use Thread Architecture Models (TAMs) to describe the behavior and interactions of different threads in a system, and show both how to construct TAMs for existing systems and utilize TAMs to identify critical scheduling problems. We specify three schedulability conditions that a schedulable TAM should satisfy: completeness, local enforceability, and independence; meeting these conditions enables a system to easily support different scheduling policies. We identify five common problems that prevent a system from satisfying the schedulability conditions, and show that these problems arise in existing systems such as HBase, Cassandra, MongoDB, and Riak, making it difficult or impossible to realize various scheduling disciplines. We demonstrate how to address these schedulability problems using both direct and indirect solutions, with different trade-offs. To show how to apply our approach to enable scheduling in realistic systems, we develop Tamed-HBase and Muzzled-HBase, sets of modifications to HBase that can realize the desired scheduling disciplines, including fairness and priority scheduling, even when presented with challenging workloads.

Citations: 0
Visibility Graph-based Cache Management for DRAM Buffer Inside Solid-state Drives
IF 1.7, Zone 3 (Computer Science), Q3 Computer Science, Pub Date: 2023-03-03, DOI: 10.1145/3586576
Zhibing Sha, Jun Li, Fengxiang Zhang, Min Huang, Zhigang Cai, François Trahay, Jianwei Liao
Most solid-state drives (SSDs) adopt an on-board Dynamic Random Access Memory (DRAM) to buffer the write data, which can significantly reduce the amount of write operations committed to the flash array of SSD if data exhibits locality in write operations. This article focuses on efficiently managing the small amount of DRAM cache inside SSDs. The basic idea is to employ the visibility graph technique to unify both temporal and spatial locality of references of I/O accesses, for directing cache management in SSDs. Specifically, we propose to adaptively generate the visibility graph of cached data pages and then support batch adjustment of adjacent or nearby (hot) cached data pages by referring to the connection situations in the visibility graph. In addition, we propose to evict the buffered data pages in batches by also referring to the connection situations, to maximize the internal flushing parallelism of SSD devices without worsening I/O congestion. The trace-driven simulation experiments show that our proposal can yield improvements on cache hits by between 0.8% and 19.8%, and the overall I/O latency by 25.6% on average, compared to state-of-the-art cache management schemes inside SSDs.
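For readers unfamiliar with visibility graphs, the sketch below builds the standard natural visibility graph of a numeric series: two points are connected when the straight line between them passes above every point in between. How the article maps cached data pages onto such a series is not stated in the abstract, so treating `series` as, say, per-page access counts is purely an assumption for illustration.

```python
def visibility_graph(series):
    """Return the edge set of the natural visibility graph of `series`.

    Points i < j are connected when every intermediate point k lies strictly
    below the straight line joining (i, series[i]) and (j, series[j]).
    """
    n = len(series)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            visible = all(
                series[k] < series[i] + (series[j] - series[i]) * (k - i) / (j - i)
                for k in range(i + 1, j)
            )
            if visible:
                edges.add((i, j))
    return edges
```

Adjacent points are always connected, so the graph captures neighborhood, while tall points "see" far across the series. That combination is one plausible reading of how a single structure can reflect both spatial and temporal locality.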
Citations: 0
Realizing Strong Determinism Contract on Log-Structured Merge Key-Value Stores
IF 1.7, Zone 3 (Computer Science), Q3 Computer Science, Pub Date: 2023-02-24, DOI: 10.1145/3582695
Miryeong Kwon, Seungjun Lee, Hyunkyu Choi, Jooyoung Hwang, Myoungsoo Jung
We propose Vigil-KV, a hardware and software co-designed framework that eliminates long-tail latency almost perfectly by introducing strong latency determinism. To make Get latency deterministic, Vigil-KV first enables a predictable latency mode (PLM) interface on a real datacenter-scale NVMe SSD, having knowledge about the nature of the underlying flash technologies. Vigil-KV at the system level then hides the non-deterministic time window (associated with SSD’s internal tasks and/or write services) by internally scheduling the different device states of PLM across multiple physical functions. Vigil-KV further schedules compaction/flush operations and client requests while being aware of PLM’s restrictions, thereby integrating strong latency determinism into LSM KVs. We implement Vigil-KV upon a 1.92TB NVMe SSD prototype and Linux 4.19.91, but other LSM KVs can adopt its concept. We evaluate diverse Facebook and Yahoo scenarios with Vigil-KV, and the results show that Vigil-KV can reduce the tail latency of a baseline KV system by 3.19× while reducing the average latency by 34%, on average.
Citations: 0
Boosting Cache Performance by Access Time Measurements
IF 1.7, Zone 3 (Computer Science), Q3 Computer Science, Pub Date: 2023-02-17, DOI: https://dl.acm.org/doi/10.1145/3572778
Gil Einziger, Omri Himelbrand, Erez Waisbard

Most modern systems utilize caches to reduce the average data access time and optimize their performance. Recently proposed policies implicitly assume uniform access times, but variable access times naturally appear in domains such as storage, web search, and DNS resolution.

Our work measures the access times for various items and exploits variations in access times as an additional signal for caching algorithms. Using such a signal, we introduce adaptive access time-aware cache policies that consistently improve the average access time compared with the best alternative in diverse workloads. Our adaptive algorithm attains an average access time reduction of up to 46% in storage workloads, up to 16% in web searches, and 8.4% on average when considering all experiments in our study.
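A toy example may help show how a measured access time can act as an extra eviction signal. The sketch below is not the adaptive policy from the article: it assumes a simple score of hit frequency multiplied by the measured miss latency and evicts the entry with the lowest score. The class name `AccessTimeAwareCache` and the scoring rule are hypothetical.

```python
class AccessTimeAwareCache:
    """Toy cache that evicts the entry whose loss would cost the least time."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.values = {}        # key -> cached value
        self.freq = {}          # key -> number of accesses seen
        self.access_time = {}   # key -> measured time to fetch on a miss (seconds)

    def get(self, key):
        if key in self.values:
            self.freq[key] += 1
            return self.values[key]
        return None             # miss: the caller fetches, measures, then calls put()

    def put(self, key, value, access_time):
        if key not in self.values and len(self.values) >= self.capacity:
            # Assumed score: expected time saved = frequency * miss latency.
            victim = min(self.values, key=lambda k: self.freq[k] * self.access_time[k])
            for table in (self.values, self.freq, self.access_time):
                del table[victim]
        self.values[key] = value
        self.freq.setdefault(key, 1)
        self.access_time[key] = access_time
```

Under a uniform-cost policy a frequently hit but cheap item would always outrank a rarely hit but expensive one. With the latency term, the expensive item can win, which is the kind of trade-off variable access times introduce.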

Citations: 0
The Design of Fast and Lightweight Resemblance Detection for Efficient Post-Deduplication Delta Compression
IF 1.7, Zone 3 (Computer Science), Q3 Computer Science, Pub Date: 2023-02-16, DOI: 10.1145/3584663
Wen Xia, Lifeng Pu, Xiangyu Zou, Philip Shilane, Shiyi Li, Haijun Zhang, Xuan Wang
Post-deduplication delta compression is a data reduction technique that calculates and stores the differences of very similar but non-duplicate chunks in storage systems, which is able to achieve a very high compression ratio. However, the low throughput of widely used resemblance detection approaches (e.g., N-Transform) usually becomes the bottleneck of delta compression systems due to introducing high computational overhead. Generally, this overhead mainly consists of two parts: ① calculating the rolling hash byte by byte across data chunks and ② applying multiple transforms on all of the calculated rolling hash values. In this article, we propose Odess, a fast and lightweight resemblance detection approach, that greatly reduces the computational overhead for resemblance detection while achieving high detection accuracy and a high compression ratio. Odess first utilizes a novel Subwindow-based Parallel Rolling (SWPR) hash method using Single Instruction Multiple Data [1] (SIMD) to accelerate calculation of rolling hashes (corresponding to the first part of the overhead). Odess then uses a novel Content-Defined Sampling method to generate a much smaller proxy hash set from the whole rolling hash set and quickly applies transforms on this small hash set for resemblance detection (corresponding to the second part of the overhead). Evaluation results show that during the stage of resemblance detection, the Odess approach is ∼31.4× and ∼7.9× faster than the state-of-the-art N-Transform and Finesse (a recent variant of N-Transform [39]), respectively. When considering an end-to-end data reduction storage system, the Odess-based system’s throughput is about 3.20× and 1.41× higher than the N-Transform- and Finesse-based systems’ throughput, respectively, while maintaining the high compression ratio of N-Transform and achieving ∼1.22× higher compression ratio over Finesse.
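The two cost components named above, and the idea of transforming only a sampled subset of hashes, can be sketched as follows. This is an illustration under stated assumptions, not the Odess implementation: the Gear-style rolling hash, the random table `GEAR`, the sampling mask `SAMPLE_MASK`, and the four linear transforms in `TRANSFORMS` are all hypothetical choices, and the SIMD-based SWPR step is not modeled.

```python
import random

MASK64 = (1 << 64) - 1
_rng = random.Random(0)                        # fixed seed so features are reproducible
GEAR = [_rng.getrandbits(64) for _ in range(256)]
SAMPLE_MASK = 0x7F << 40                       # assumed mask: keeps about 1 hash in 128
TRANSFORMS = [(_rng.getrandbits(64) | 1, _rng.getrandbits(64)) for _ in range(4)]


def rolling_hashes(chunk):
    """Part 1 of the overhead: one rolling hash value per byte of the chunk."""
    h = 0
    for byte in chunk:
        h = ((h << 1) + GEAR[byte]) & MASK64
        yield h


def features(chunk):
    """Part 2, applied only to hashes selected by their own content."""
    sampled = [h for h in rolling_hashes(chunk) if (h & SAMPLE_MASK) == 0]
    if not sampled:
        return [0] * len(TRANSFORMS)
    # One feature per linear transform: the maximum transformed hash value.
    return [max((m * h + a) & MASK64 for h in sampled) for m, a in TRANSFORMS]
```

Because sampling depends on the hash values themselves rather than on byte offsets, two similar chunks tend to select the same hashes even after small insertions. Chunks that share feature values can then be treated as candidates for delta compression.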
Citations: 0
TriCache: A User-Transparent Block Cache Enabling High-Performance Out-of-Core Processing with In-Memory Programs
IF 1.7, Zone 3 (Computer Science), Q3 Computer Science, Pub Date: 2023-02-13, DOI: 10.1145/3583139
Guan Feng, Huanqi Cao, Xiaowei Zhu, Bowen Yu, Yuanwei Wang, Zixuan Ma, Shengqi Chen, Wenguang Chen
Out-of-core systems rely on high-performance cache sub-systems to reduce the number of I/O operations. Although the page cache in modern operating systems enables transparent access to memory and storage devices, it suffers from efficiency and scalability issues on cache misses, forcing out-of-core systems to design and implement their own cache components, which is a non-trivial task. This study proposes TriCache, a cache mechanism that enables in-memory programs to efficiently process out-of-core datasets without requiring any code rewrite. It provides a virtual memory interface on top of the conventional block interface to simultaneously achieve user transparency and sufficient out-of-core performance. A multi-level block cache design is proposed to address the challenge of per-access address translations required by a memory interface. It can exploit spatial and temporal localities in memory or storage accesses to render storage-to-memory address translation and page-level concurrency control adequately efficient for the virtual memory interface. Our evaluation shows that in-memory systems operating on top of TriCache can outperform Linux OS page cache by more than one order of magnitude, and can deliver performance comparable to or even better than that of corresponding counterparts designed specifically for out-of-core scenarios.
Citations: 2
ZNSwap: un-Block your Swap
IF 1.7 Zone 3 Computer Science Q3 Computer Science Pub Date: 2023-02-01 DOI: 10.1145/3582434
Shai Bergman, Niklas Cassel, Matias Bjørling, M. Silberstein
We introduce ZNSwap, a novel swap subsystem optimized for the recent Zoned Namespace (ZNS) SSDs. ZNSwap leverages ZNS’s explicit control over data management on the drive and introduces a space-efficient host-side Garbage Collector (GC) for swap storage co-designed with the OS swap logic. ZNSwap enables cross-layer optimizations, such as direct access to the in-kernel swap usage statistics by the GC to enable fine-grained swap storage management, and correct accounting of the GC bandwidth usage in the OS resource isolation mechanisms to improve performance isolation in multi-tenant environments. We evaluate ZNSwap using standard Linux swap benchmarks and two production key-value stores. ZNSwap shows significant performance benefits over the Linux swap on traditional SSDs, such as stable throughput for different memory access patterns, and 10× lower 99th percentile latency and 5× higher throughput for the memcached key-value store under realistic usage scenarios.
Citations: 4
CacheSack: Theory and Experience of Google’s Admission Optimization for Datacenter Flash Caches
IF 1.7 Zone 3 Computer Science Q3 Computer Science Pub Date: 2023-01-21 DOI: 10.1145/3582014
Tzu-Wei Yang, Seth Pollen, Mustafa Uysal, A. Merchant, H. Wolfmeister, Junaid Khalid
This article describes the algorithm, implementation, and deployment experience of CacheSack, the admission algorithm for Google datacenter flash caches. CacheSack minimizes the dominant costs of Google’s datacenter flash caches: disk IO and flash footprint. CacheSack partitions cache traffic into disjoint categories, analyzes the observed cache benefit of each subset, and formulates a knapsack problem to assign the optimal admission policy to each subset. Prior to this work, Google datacenter flash cache admission policies were optimized manually, with most caches using the Lazy Adaptive Replacement Cache algorithm. Production experiments showed that CacheSack significantly outperforms the prior static admission policies for a 7.7% improvement of the total cost of ownership, as well as significant improvements in disk reads (9.5% reduction) and flash wearout (17.8% reduction).
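As a rough illustration of the knapsack idea in the abstract above, the sketch below chooses one admission policy per traffic category so that total disk reads are minimized under a flash budget. The abstract does not give the exact formulation, so this uses a plain multiple-choice knapsack solved by dynamic programming. The function name `assign_policies`, the policy names, and all cost figures are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# (policy name, flash units consumed, disk reads still incurred).
# Names and numbers are illustrative assumptions, not taken from the article.
Policy = Tuple[str, int, int]


def assign_policies(
    categories: Dict[str, List[Policy]], flash_budget: int
) -> Tuple[float, Dict[str, str]]:
    """Pick exactly one policy per category, minimizing total disk reads
    while using at most `flash_budget` flash units (multiple-choice knapsack)."""
    inf = float("inf")
    # best[b] = (fewest disk reads, chosen policies) using at most b flash units
    # for the categories processed so far.
    best: List[Tuple[float, Dict[str, str]]] = [(0, {})] * (flash_budget + 1)
    for category, policies in categories.items():
        nxt: List[Tuple[float, Dict[str, str]]] = [(inf, {})] * (flash_budget + 1)
        for b in range(flash_budget + 1):
            for name, flash, reads in policies:
                if flash > b:
                    continue  # this policy alone does not fit in b units
                prev_reads, prev_choice = best[b - flash]
                if prev_reads + reads < nxt[b][0]:
                    nxt[b] = (prev_reads + reads, {**prev_choice, category: name})
        best = nxt
    return best[flash_budget]


# Hypothetical per-category measurements.
categories = {
    "A": [("never", 0, 100), ("admit_on_second_miss", 2, 40), ("admit_on_miss", 5, 10)],
    "B": [("never", 0, 50), ("admit_on_second_miss", 1, 30), ("admit_on_miss", 3, 5)],
}
reads, choice = assign_policies(categories, flash_budget=5)
print(reads, choice)
```

With these made-up inputs, the budget of 5 flash units is split between the two categories rather than spent entirely on the most aggressive policy for one of them, which is the kind of trade-off a per-category knapsack assignment is meant to capture.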
Citations: 0