
Latest Publications in ACM Transactions on Storage

The Design of Fast and Lightweight Resemblance Detection for Efficient Post-Deduplication Delta Compression
IF 1.7 · CAS Tier 3 (Computer Science) · Q3 Computer Science · Pub Date: 2023-06-19 · DOI: https://dl.acm.org/doi/10.1145/3584663
Wen Xia, Lifeng Pu, Xiangyu Zou, Philip Shilane, Shiyi Li, Haijun Zhang, Xuan Wang

Post-deduplication delta compression is a data reduction technique that calculates and stores the differences between very similar but non-duplicate chunks in storage systems, and it can achieve a very high compression ratio. However, widely used resemblance detection approaches (e.g., N-Transform) introduce high computational overhead, and their low throughput often becomes the bottleneck of delta compression systems. This overhead mainly consists of two parts: ① calculating the rolling hash byte by byte across data chunks and ② applying multiple transforms to all of the calculated rolling hash values.

In this article, we propose Odess, a fast and lightweight resemblance detection approach that greatly reduces the computational overhead of resemblance detection while achieving high detection accuracy and a high compression ratio. Odess first utilizes a novel Subwindow-based Parallel Rolling (SWPR) hash method using Single Instruction Multiple Data (SIMD) [1] to accelerate the calculation of rolling hashes (addressing the first part of the overhead). Odess then uses a novel Content-Defined Sampling method to generate a much smaller proxy hash set from the whole rolling hash set and quickly applies transforms to this small hash set for resemblance detection (addressing the second part of the overhead).
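
The core of the speedup is easy to see in miniature. Below is a minimal Python sketch, not the authors' implementation, of the sampling-then-transform pipeline: a Gear-style rolling hash is computed across a chunk, a content-defined condition keeps roughly one hash in 256 as a proxy set, and N-Transform-style linear transforms are applied only to that proxy set. The gear table, sampling mask, and transform constants are illustrative assumptions.

```python
import random

random.seed(42)
GEAR = [random.getrandbits(64) for _ in range(256)]  # random byte -> hash table
MASK = (1 << 8) - 1                                  # hypothetical ~1/256 sampling rate

def features(chunk: bytes, n_transforms: int = 12) -> list:
    h = 0
    proxy = []                        # content-defined sample of rolling hashes
    for b in chunk:
        h = ((h << 1) + GEAR[b]) & 0xFFFFFFFFFFFFFFFF  # Gear rolling hash step
        if h & MASK == 0:             # content-defined sampling condition
            proxy.append(h)
    rng = random.Random(7)            # fixed transform constants for repeatability
    feats = []
    for _ in range(n_transforms):     # transforms applied only to the small proxy set
        m, a = rng.getrandbits(32) | 1, rng.getrandbits(32)
        feats.append(min(((m * v + a) & 0xFFFFFFFF) for v in proxy) if proxy else 0)
    return feats
```

Chunks that share most of their resulting features are candidates for delta compression against each other; the savings come from transforming only the sampled proxy set rather than every rolling hash.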

Evaluation results show that during resemblance detection, Odess is ∼31.4× and ∼7.9× faster than the state-of-the-art N-Transform and Finesse (a recent variant of N-Transform [39]), respectively. In an end-to-end data reduction storage system, the Odess-based system's throughput is about 3.20× and 1.41× higher than that of the N-Transform- and Finesse-based systems, respectively, while maintaining N-Transform's high compression ratio and achieving a ∼1.22× higher compression ratio than Finesse.

Citations: 0
KVRangeDB: Range Queries for a Hash-based Key–Value Device
IF 1.7 · CAS Tier 3 (Computer Science) · Q3 Computer Science · Pub Date: 2023-06-19 · DOI: https://dl.acm.org/doi/10.1145/3582013
Mian Qin, Qing Zheng, Jason Lee, Bradley Settlemyer, Fei Wen, Narasimha Reddy, Paul Gratz

Key–value (KV) software has proven useful to a wide variety of applications, including analytics, time-series databases, and distributed file systems. To satisfy the requirements of diverse workloads, KV stores have been carefully tailored to best match the performance characteristics of the underlying solid-state block devices. Emerging KV storage devices are a promising technology both for simplifying the KV software stack and for improving the performance of persistent storage-based applications. However, while providing fast, predictable put and get operations, existing KV storage devices do not natively support range queries, which are critical to all three types of applications described above.

In this article, we present KVRangeDB, a software layer that enables range queries on existing hash-based KV solid-state disks (KVSSDs). To adapt to the performance characteristics of emerging KVSSDs, KVRangeDB implements a log-structured merge-tree key index that reduces compaction I/O, merges keys when possible, and provides separate caches for indexes and values. We evaluated KVRangeDB under a set of representative workloads and compared its performance with two existing database solutions: a RocksDB variant ported to work with the KVSSD, and Wisckey, a key–value database carefully tuned for conventional block devices. On filesystem aging workloads, KVRangeDB outperforms Wisckey by 23.7× in terms of throughput and reduces CPU usage and external write amplification by 14.3× and 9.8×, respectively.
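
To illustrate the layering, here is a hedged Python sketch, not KVRangeDB's code, of how an ordered host-side key index turns a hash-based device that supports only point operations into one that answers range scans; a plain sorted list via the standard bisect module stands in for the LSM-tree index the paper describes.

```python
import bisect

class RangeKV:
    def __init__(self, device: dict):
        self.device = device        # stand-in for the hash-based KVSSD (point ops only)
        self.index = []             # sorted keys; an LSM tree in the real system

    def put(self, key: bytes, value: bytes) -> None:
        i = bisect.bisect_left(self.index, key)
        if i == len(self.index) or self.index[i] != key:
            self.index.insert(i, key)   # keep the host-side index ordered
        self.device[key] = value        # device point-put

    def scan(self, lo: bytes, hi: bytes):
        i = bisect.bisect_left(self.index, lo)
        while i < len(self.index) and self.index[i] < hi:
            key = self.index[i]
            yield key, self.device[key]  # one device point-get per indexed key
            i += 1

# usage: db = RangeKV({}); db.put(b"user:ann", b"1"); db.put(b"user:bob", b"2")
#        list(db.scan(b"user:a", b"user:c"))  # ordered point-gets on the device
```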

Citations: 0
Localized Validation Accelerates Distributed Transactions on Disaggregated Persistent Memory
IF 1.7 · CAS Tier 3 (Computer Science) · Q3 Computer Science · Pub Date: 2023-06-19 · DOI: https://dl.acm.org/doi/10.1145/3582012
Ming Zhang, Yu Hua, Pengfei Zuo, Lurong Liu

Persistent memory (PM) disaggregation significantly improves resource utilization and failure isolation, enabling a scalable and cost-effective remote memory pool in modern data centers. However, existing distributed transaction schemes, designed for legacy DRAM-based monolithic servers, fail to work efficiently on disaggregated PM: the PM pool offers limited computing power, and these schemes overlook the bandwidth and persistence properties of real PMs. In this article, we propose FORD, a Fast One-sided RDMA-based Distributed transaction system for the new disaggregated PM architecture. FORD thoroughly leverages one-sided remote direct memory access to handle transactions while bypassing the remote CPU in the PM pool. To reduce round trips, FORD batches the read and lock operations into one request to eliminate extra locking and validations for the read-write data. To accelerate the transaction commit, FORD updates all remote replicas in a single round trip with parallel undo logging and data visibility control. Moreover, considering the limited PM bandwidth, FORD allows the backup replicas to be read to alleviate the load on the primary replicas, thus improving throughput. To efficiently guarantee remote data persistency in the PM pool, FORD selectively flushes data to the backup replicas to mitigate network overheads. Nevertheless, the original FORD wastes some validation round trips when the read-only data are not modified by other transactions. Hence, we further propose a localized validation scheme that moves the validation operations for read-only data from the remote side to the local side as much as possible, reducing round trips. Experimental results demonstrate that FORD improves the transaction throughput by up to 3× and decreases the latency by up to 87.4% compared with state-of-the-art systems.
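
The localized validation idea can be sketched in a few lines. The following Python fragment is a hedged illustration, not FORD's wire protocol: in optimistic concurrency control, each read-only item must be re-checked at commit, and a locally maintained version table lets the committer skip the remote check whenever it already proves the item unchanged; only the remaining items cost a round trip. All names here are hypothetical.

```python
def validate(read_set, local_versions, remote_read_version):
    """read_set: {key: version observed at read time}.
    local_versions: versions this node knows to be current (e.g., for items
    whose latest update it performed or observed locally).
    remote_read_version: fallback one-sided read of the remote version."""
    remote_round_trips = 0
    for key, seen in read_set.items():
        if local_versions.get(key) == seen:
            continue                         # validated locally; no network cost
        remote_round_trips += 1              # must fetch the current version remotely
        if remote_read_version(key) != seen:
            return False, remote_round_trips # version changed: abort the transaction
    return True, remote_round_trips
```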

Citations: 0
Visibility Graph-based Cache Management for DRAM Buffer Inside Solid-state Drives
IF 1.7 · CAS Tier 3 (Computer Science) · Q3 Computer Science · Pub Date: 2023-06-19 · DOI: https://dl.acm.org/doi/10.1145/3586576
Zhibing Sha, Jun Li, Fengxiang Zhang, Min Huang, Zhigang Cai, Francois Trahay, Jianwei Liao

Most solid-state drives (SSDs) adopt on-board Dynamic Random Access Memory (DRAM) to buffer write data, which can significantly reduce the number of write operations committed to the SSD's flash array when writes exhibit locality. This article focuses on efficiently managing the small DRAM cache inside SSDs. The basic idea is to employ the visibility graph technique to unify both the temporal and the spatial locality of I/O accesses for directing cache management in SSDs. Specifically, we propose to adaptively generate the visibility graph of cached data pages and then support batch adjustment of adjacent or nearby (hot) cached data pages by referring to the connectivity in the visibility graph. In addition, we propose to evict buffered data pages in batches, also guided by that connectivity, to maximize the internal flushing parallelism of SSD devices without worsening I/O congestion. Trace-driven simulation experiments show that our proposal improves cache hits by 0.8% to 19.8% and the overall I/O latency by 25.6% on average, compared to state-of-the-art cache management schemes inside SSDs.
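
For readers unfamiliar with visibility graphs: given a sequence of values (here, per-page access counts), two positions are connected if the straight line between their value bars clears every bar in between, so runs of similarly hot pages form densely connected neighborhoods. The sketch below follows the standard natural-visibility construction; treating access counts as the input series is our illustrative assumption, not the paper's exact pipeline.

```python
def visibility_edges(counts):
    """Natural visibility graph over a series of per-page access counts.
    O(n^3) brute force; fine for a sketch, not for production."""
    n, edges = len(counts), []
    for i in range(n):
        for j in range(i + 1, n):
            # i and j see each other iff every bar between them lies strictly
            # below the straight line joining (i, counts[i]) and (j, counts[j])
            if all(counts[k] < counts[i] + (counts[j] - counts[i]) * (k - i) / (j - i)
                   for k in range(i + 1, j)):
                edges.append((i, j))
    return edges

# e.g. visibility_edges([3, 1, 2, 5, 1]) links the hot pages 0 and 3 directly,
# exposing a connected neighborhood that can be adjusted or evicted as a batch
```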

Citations: 0
A Universal SMR-aware Cache Framework with Deep Optimization for DM-SMR and HM-SMR Disks
IF 1.7 · CAS Tier 3 (Computer Science) · Q3 Computer Science · Pub Date: 2023-06-19 · DOI: https://dl.acm.org/doi/10.1145/3588442
Diansen Sun, Ruixiong Tan, Yunpeng Chai

To satisfy the enormous storage capacities required for big data, data centers have been adopting high-density shingled magnetic recording (SMR) disks. However, the weak fine-grained random write performance of SMR disks, caused by their inherent write amplification and unbalanced read–write performance, poses a severe challenge. Many studies have proposed solid-state drive (SSD) cache systems to address this issue. However, existing cache algorithms, such as the least recently used (LRU) algorithm, which optimizes for cache popularity, and the MOST algorithm, which optimizes for the write amplification factor, cannot exploit the full performance of the proposed cache systems because of their inappropriate optimization objectives. This article proposes a new SMR-aware cache framework called SAC+ to improve SMR-based hybrid storage. SAC+ integrates the two dominant types of SMR drives, namely drive-managed and host-managed SMR drives, and provides a universal framework implementation. In addition, SAC+ integrally combines the drive characteristics to optimize I/O performance. Evaluations conducted using real-world traces indicate that SAC+ reduces I/O time by 36–93% compared with state-of-the-art algorithms.
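
As a hedged illustration of why a combined objective helps, the sketch below, our assumption and not SAC+'s actual policy, scores each dirty SMR band by mixing an LRU-style staleness term with a MOST-style amortization term (how many cached blocks one band rewrite would flush at once), then evicts the highest-scoring band.

```python
from collections import defaultdict

def pick_victim_band(cache, now, alpha=0.5):
    """cache: {block_id: (band_id, last_access_time)}; returns the SMR band
    whose batch eviction looks best under a combined objective."""
    bands = defaultdict(list)
    for block, (band, last) in cache.items():
        bands[band].append(now - last)       # staleness of each dirty block

    def band_score(ages):
        staleness = sum(ages) / len(ages)    # LRU-like component: cold bands first
        amortization = len(ages)             # MOST-like component: one band rewrite
        return alpha * staleness + (1 - alpha) * amortization

    return max(bands, key=lambda b: band_score(bands[b]))

# pure LRU is alpha=1 (ignores rewrite cost); pure MOST is alpha=0
# (ignores popularity); a blend trades the two objectives off
```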

Citations: 0
CostCounter: A Better Method for Collision Mitigation in Cuckoo Hashing
IF 1.7 · CAS Tier 3 (Computer Science) · Q3 Computer Science · Pub Date: 2023-05-12 · DOI: 10.1145/3596910
Haonan Wu, Shuxian Wang, Zhanfeng Jin, Yuhang Zhang, Ruyun Ma, Sijin Fan, Ruili Chao
Hardware is often required to support fast search and high-throughput applications. Consequently, the performance of search algorithms is limited by storage bandwidth, and the search algorithm must be optimized accordingly. We propose a CostCounter (CC) algorithm based on cuckoo hashing and an Improved CostCounter (ICC) algorithm. When collisions occur, a better displacement path can be selected by using a cost counter to record the kick-out history. Our simulation results indicate that the CC and ICC algorithms achieve more significant performance improvements than Random Walk (RW), Breadth First Search (BFS), and MinCounter (MC). With two buckets and two slots per bucket, at 95% of the maximum memory load rate, CC and ICC reduce read–write times by over 20% and 80% compared to MC and BFS, respectively. Furthermore, the CC and ICC algorithms achieve a slight improvement in storage efficiency compared with MC. In addition, we implement RW, MC, and the proposed algorithms using fine-grained locking to support a high throughput rate. Tests on field-programmable gate arrays verify the simulation results: at 95% of memory capacity, our algorithms improve the maximum throughput by over 23% compared to RW and 9% compared to MC. The test results indicate that our CC and ICC algorithms achieve better hardware bandwidth and memory load efficiency without incurring a significant resource cost.
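
The cost-counter idea is straightforward to express in software, even though the paper targets hardware. The sketch below is an assumption-laden Python model, not the paper's RTL: every slot carries a counter of past kick-outs, and when both candidate buckets of a key are full, the occupant in the slot with the lowest accumulated cost is displaced, steering eviction paths away from frequently contended slots.

```python
class CuckooCC:
    def __init__(self, n_buckets=1024, slots=2, max_kicks=64):
        self.n, self.slots, self.max_kicks = n_buckets, slots, max_kicks
        self.table = [[None] * slots for _ in range(n_buckets)]
        self.cost = [[0] * slots for _ in range(n_buckets)]  # kick-outs per slot

    def _buckets(self, key):
        # two independent candidate buckets per key (illustrative hash mixing)
        return hash(("h1", key)) % self.n, hash(("h2", key)) % self.n

    def insert(self, key) -> bool:
        for _ in range(self.max_kicks):
            b1, b2 = self._buckets(key)
            for b in (b1, b2):
                for s in range(self.slots):
                    if self.table[b][s] is None:
                        self.table[b][s] = key       # empty slot found
                        return True
            # both buckets full: displace the occupant of the cheapest slot
            b, s = min(((b, s) for b in (b1, b2) for s in range(self.slots)),
                       key=lambda bs: self.cost[bs[0]][bs[1]])
            key, self.table[b][s] = self.table[b][s], key
            self.cost[b][s] += 1                     # record the kick-out
        return False                                 # give up; caller rehashes/stashes
```
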
Citations: 0
Hybrid Block Storage for Efficient Cloud Volume Service
IF 1.7 · CAS Tier 3 (Computer Science) · Q3 Computer Science · Pub Date: 2023-05-08 · DOI: 10.1145/3596446
Yiming Zhang, Huiba Li, Shengyun Liu, Peng Huang
The migration of traditional desktop and server applications to the cloud brings the challenges of high performance, high reliability, and low cost to the underlying cloud storage. To satisfy these requirements, this paper proposes a hybrid cloud-scale block storage system called Ursa. Trace analysis shows that the I/O patterns served by block storage have only limited locality to exploit. Therefore, instead of using SSDs as a cache layer, Ursa adopts an SSD-HDD-hybrid storage structure that stores primary replicas directly on SSDs and replicates backup replicas on HDDs. At the core of Ursa's hybrid storage design is an adaptive journal that bridges the performance gap between primary SSDs and backup HDDs for random writes by transforming small backup writes into journal appends, which are then asynchronously replayed and merged to the backup HDDs. To index the journal efficiently, we design a novel range-optimized merge-tree (ROMT) structure that combines a continuous range of keys into a single composite key {offset,length}. Ursa integrates the hybrid structure with designs for high reliability, scalability, and availability. Experiments show that Ursa in its hybrid mode achieves almost the same performance as in its SSD-only mode (storing all replicas on SSDs), and outperforms other block stores (Ceph and Sheepdog) even in their SSD-only mode while achieving much higher CPU efficiency (IOPS and throughput per core).
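
The composite-key idea can be shown with a small extent-merge routine. The following Python sketch is illustrative, not Ursa's ROMT internals: journal entries are indexed by extent, and an insert is coalesced with any adjacent or overlapping extents, so a continuous run of block offsets collapses into a single {offset, length} key.

```python
def insert_extent(extents, offset, length):
    """extents: sorted list of (offset, length) pairs; returns a new sorted
    list with the inserted run merged into touching or overlapping neighbors."""
    merged = []
    for o, l in sorted(extents + [(offset, length)]):
        if merged and o <= merged[-1][0] + merged[-1][1]:    # touches/overlaps previous
            po, pl = merged.pop()
            merged.append((po, max(po + pl, o + l) - po))    # coalesce into one extent
        else:
            merged.append((o, l))
    return merged

# insert_extent([(0, 4), (8, 4)], 4, 4) -> [(0, 12)]: three runs become one
# composite {offset, length} key, so a range lookup touches a single index entry
```
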
Citations: 0
Derrick: A Three-layer Balancer for Self-managed Continuous Scalability
IF 1.7 · CAS Tier 3 (Computer Science) · Q3 Computer Science · Pub Date: 2023-04-28 · DOI: 10.1145/3594543
Andrzej Jackowski, Leszek Gryz, Michal Welnicki, C. Dubnicki, K. Iwanicki
Data arrangement determines the capacity, resilience, and performance of a distributed storage system. A scalable self-managed system must place its data efficiently not only during stable operation but also after an expansion, a planned downscaling, or device failures. In this article, we present Derrick, a data balancing algorithm addressing these needs, developed for HYDRAstor, a highly scalable commercial storage system. Derrick makes its decisions quickly in case of failures but takes additional time to find a nearly optimal data arrangement, and a plan for reaching it, when the device population changes. Compared to the balancing algorithms in two other state-of-the-art systems, Derrick provides better capacity utilization, reduced data movement, and improved performance. Moreover, it can easily be adapted to meet custom placement requirements.
Citations: 0
Introduction to the Special Section on USENIX ATC 2022
IF 1.7 · CAS Tier 3 (Computer Science) · Q3 Computer Science · Pub Date: 2023-04-08 · DOI: https://dl.acm.org/doi/10.1145/3582557
Jiri Schindler, Noa Zilberman

No abstract available.

Citations: 0