ACM Transactions on Storage最新文献

LVMT: An Efficient Authenticated Storage for Blockchain LVMT：区块链的高效认证存储

IF 1.7 3区计算机科学 Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

ACM Transactions on Storage

Pub Date : 2024-05-16 DOI: 10.1145/3664818

Chenxing Li, Sidi Mohamed Beillahi, Guang Yang, Ming Wu, Wei Xu, Fan Long

Authenticated storage access is the performance bottleneck of a blockchain, because each access can be amplified to potentially O(log n) disk I/O operations in the standard Merkle Patricia Trie (MPT) storage structure. In this paper, we propose a multi-Layer Versioned Multipoint Trie (LVMT), a novel high-performance blockchain storage with significantly reduced I/O amplifications. LVMT uses the authenticated multipoint evaluation tree (AMT) vector commitment protocol to update commitment proofs in constant time. LVMT adopts a multi-layer design to support unlimited key-value pairs and stores version numbers instead of value hashes to avoid costly elliptic curve multiplication operations. In our experiment, LVMT outperforms the MPT in real Ethereum traces, delivering read and write operations six times faster. It also boosts blockchain system execution throughput by up to 2.7 times.

认证存储访问是区块链的性能瓶颈，因为在标准的 Merkle Patricia Trie（MPT）存储结构中，每次访问都可能被放大到潜在的 O(log n) 磁盘 I/O 操作。在本文中，我们提出了一种多层版本化多点三角形（LVMT），这是一种新型高性能区块链存储，可显著减少 I/O 放大。LVMT 使用认证多点评估树（AMT）向量承诺协议，在恒定时间内更新承诺证明。LVMT 采用多层设计，支持无限键值对，并存储版本号而非值哈希值，以避免昂贵的椭圆曲线乘法运算。在我们的实验中，LVMT 在实际以太坊跟踪中的表现优于 MPT，其读写操作速度是 MPT 的六倍。它还将区块链系统的执行吞吐量提高了 2.7 倍。

引用次数: 0

The Design of Fast Delta Encoding for Delta Compression Based Storage Systems 基于德尔塔压缩的存储系统的快速德尔塔编码设计

IF 1.7 3区计算机科学 Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

ACM Transactions on Storage

Pub Date : 2024-05-14 DOI: 10.1145/3664817

Haoliang Tan, Wen Xia, Xiangyu Zou, Cai Deng, Qing Liao, Zhaoquan Gu

Delta encoding is a data reduction technique capable of calculating the differences (i.e., delta) among very similar files and chunks. It is widely used for various applications, such as synchronization replication, backup/archival storage, cache compression, etc. However, delta encoding is computationally costly due to its time-consuming word-matching operations for delta calculation. Existing delta encoding approaches either run at a slow encoding speed, such as Xdelta and Zdelta, or at a low compression ratio, such as Ddelta and Edelta. In this paper, we propose Gdelta, a fast delta encoding approach with a high compression ratio. The key idea behind Gdelta is the combined use of five techniques: (1) employing an improved Gear-based rolling hash to replace Adler32 hash for fast scanning overlapping words of similar chunks, (2) adopting a quick array-based indexing for word-matching, (3) applying a sampling indexing scheme to reduce the cost of traditional building full indexes for base chunks’ words, (4) skipping unmatched words to accelerate delta encoding through non-redundant areas, and (5) last but not least, after word-matching, further batch compressing the remainder to improve the compression ratio. Our evaluation results driven by seven real-world datasets suggest that Gdelta achieves encoding/decoding speedups of 3.5X ∼ 25X over the classic Xdelta and Zdelta approaches while increasing the compression ratio by about 10% ∼ 240%.

三角洲编码是一种数据缩减技术，能够计算非常相似的文件和数据块之间的差异（即三角洲）。它被广泛用于各种应用，如同步复制、备份/存档存储、缓存压缩等。然而，由于 delta 编码的计算需要耗费大量时间进行单词匹配操作，因此计算成本很高。现有的 delta 编码方法要么编码速度慢，如 Xdelta 和 Zdelta，要么压缩率低，如 Ddelta 和 Edelta。在本文中，我们提出了一种具有高压缩比的快速三角编码方法 Gdelta。Gdelta 背后的关键理念是综合利用五种技术：(1) 采用改进的基于 Gear 的滚动哈希取代 Adler32 哈希，以快速扫描相似数据块的重叠词；(2) 采用基于数组的快速索引进行词匹配；(3) 采用抽样索引方案，以降低为基础数据块的词建立完整索引的传统成本；(4) 跳过不匹配的词，通过非冗余区域加速 delta 编码；(5) 最后但并非最不重要的是，在词匹配后，进一步批量压缩剩余部分，以提高压缩率。我们通过七个真实世界数据集得出的评估结果表明，与经典的 Xdelta 和 Zdelta 方法相比，Gdelta 的编码/解码速度提高了 3.5 倍 ∼ 25 倍，同时压缩率提高了约 10% ∼ 240%。

{"title":"The Design of Fast Delta Encoding for Delta Compression Based Storage Systems","authors":"Haoliang Tan, Wen Xia, Xiangyu Zou, Cai Deng, Qing Liao, Zhaoquan Gu","doi":"10.1145/3664817","DOIUrl":"https://doi.org/10.1145/3664817","url":null,"abstract":"Delta encoding is a data reduction technique capable of calculating the differences (i.e., delta) among very similar files and chunks. It is widely used for various applications, such as synchronization replication, backup/archival storage, cache compression, etc. However, delta encoding is computationally costly due to its time-consuming word-matching operations for delta calculation. Existing delta encoding approaches either run at a slow encoding speed, such as Xdelta and Zdelta, or at a low compression ratio, such as Ddelta and Edelta. In this paper, we propose Gdelta, a fast delta encoding approach with a high compression ratio. The key idea behind Gdelta is the combined use of five techniques: (1) employing an improved Gear-based rolling hash to replace Adler32 hash for fast scanning overlapping words of similar chunks, (2) adopting a quick array-based indexing for word-matching, (3) applying a sampling indexing scheme to reduce the cost of traditional building full indexes for base chunks’ words, (4) skipping unmatched words to accelerate delta encoding through non-redundant areas, and (5) last but not least, after word-matching, further batch compressing the remainder to improve the compression ratio. Our evaluation results driven by seven real-world datasets suggest that Gdelta achieves encoding/decoding speedups of 3.5X ∼ 25X over the classic Xdelta and Zdelta approaches while increasing the compression ratio by about 10% ∼ 240%.","PeriodicalId":49113,"journal":{"name":"ACM Transactions on Storage","volume":"4 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140928573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

A Memory-Disaggregated Radix Tree 内存分解的 Radix 树

IF 1.7 3区计算机科学 Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

ACM Transactions on Storage

Pub Date : 2024-05-08 DOI: 10.1145/3664289

Xuchuan Luo, Pengfei Zuo, Jiacheng Shen, Jiazhen Gu, Xin Wang, Michael Lyu, Yangfan Zhou

Disaggregated memory (DM) is an increasingly prevalent architecture with high resource utilization. It separates computing and memory resources into two pools and interconnects them with fast networks. Existing range indexes on DM are based on B+ trees, which suffer from large inherent read and write amplifications. The read and write amplifications rapidly saturate the network bandwidth, resulting in low request throughput and high access latency of B+ trees on DM.

In this paper, we propose that the radix tree is more suitable for DM than the B+ tree due to smaller read and write amplifications. However, constructing a radix tree on DM is challenging due to the costly lock-based concurrency control, the bounded memory-side IOPS, and the complicated computing-side cache validation. To address these challenges, we design SMART, the first radix tree for disaggregated memory with high performance. Specifically, we leverage 1) a hybrid concurrency control scheme including lock-free internal nodes and fine-grained lock-based leaf nodes to reduce lock overhead, 2) a computing-side read-delegation and write-combining technique to break through the IOPS upper bound by reducing redundant I/Os, and 3) a simple yet effective reverse check mechanism for computing-side cache validation. Experimental results show that SMART achieves 6.1 × higher throughput under typical write-intensive workloads and 2.8 × higher throughput under read-only workloads in YCSB benchmarks, compared with state-of-the-art B+ trees on DM.

分解内存（DM）是一种日益流行的高资源利用率架构。它将计算和内存资源分成两个池，并通过快速网络互连。DM 上现有的范围索引基于 B+ 树，这种索引存在较大的固有读写放大。读写放大会使网络带宽迅速饱和，导致 DM 上 B+ 树的请求吞吐量低、访问延迟高。本文提出，由于读写放大较小，弧度树比 B+ 树更适合 DM。然而，由于基于锁的并发控制成本高昂、内存侧 IOPS 受限以及计算侧缓存验证复杂，在 DM 上构建弧度树具有挑战性。为了应对这些挑战，我们设计了 SMART，这是第一个用于高性能分解内存的弧度树。具体来说，我们利用：1）一种混合并发控制方案，包括无锁内部节点和基于细粒度锁的叶节点，以减少锁开销；2）一种计算侧读委托和写合并技术，通过减少冗余 I/O 来突破 IOPS 上限；3）一种简单而有效的反向检查机制，用于计算侧缓存验证。实验结果表明，在 YCSB 基准中，与 DM 上最先进的 B+ 树相比，SMART 在典型写密集型工作负载下的吞吐量提高了 6.1 倍，在只读工作负载下的吞吐量提高了 2.8 倍。

{"title":"A Memory-Disaggregated Radix Tree","authors":"Xuchuan Luo, Pengfei Zuo, Jiacheng Shen, Jiazhen Gu, Xin Wang, Michael Lyu, Yangfan Zhou","doi":"10.1145/3664289","DOIUrl":"https://doi.org/10.1145/3664289","url":null,"abstract":"Disaggregated memory (DM) is an increasingly prevalent architecture with high resource utilization. It separates computing and memory resources into two pools and interconnects them with fast networks. Existing range indexes on DM are based on B+ trees, which suffer from large inherent read and write amplifications. The read and write amplifications rapidly saturate the network bandwidth, resulting in low request throughput and high access latency of B+ trees on DM. In this paper, we propose that the radix tree is more suitable for DM than the B+ tree due to smaller read and write amplifications. However, constructing a radix tree on DM is challenging due to the costly lock-based concurrency control, the bounded memory-side IOPS, and the complicated computing-side cache validation. To address these challenges, we design SMART, the first radix tree for disaggregated memory with high performance. Specifically, we leverage 1) a hybrid concurrency control scheme including lock-free internal nodes and fine-grained lock-based leaf nodes to reduce lock overhead, 2) a computing-side read-delegation and write-combining technique to break through the IOPS upper bound by reducing redundant I/Os, and 3) a simple yet effective reverse check mechanism for computing-side cache validation. Experimental results show that SMART achieves 6.1 × higher throughput under typical write-intensive workloads and 2.8 × higher throughput under read-only workloads in YCSB benchmarks, compared with state-of-the-art B+ trees on DM.","PeriodicalId":49113,"journal":{"name":"ACM Transactions on Storage","volume":"41 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140928577","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Fastmove: A Comprehensive Study of On-Chip DMA and its Demonstration for Accelerating Data Movement in NVM-based Storage Systems Fastmove：片上 DMA 综合研究及其在基于 NVM 的存储系统中加速数据移动的演示

IF 1.7 3区计算机科学 Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

ACM Transactions on Storage

Pub Date : 2024-05-06 DOI: 10.1145/3656477

Jiahao Li, Jingbo Su, Luofan Chen, Cheng Li, Kai Zhang, Liang Yang, Sam Noh, Yinlong Xu

Data-intensive applications executing on NVM-based storage systems experience serious bottlenecks when moving data between DRAM and NVM. We advocate for the use of the long-existing but recently neglected on-chip DMA to expedite data movement with three contributions. First, we explore new latency-oriented optimization directions, driven by a comprehensive DMA study, to design a high-performance DMA module, which significantly lowers the I/O size threshold to observe benefits. Second, we propose a new data movement engine, Fastmove, that coordinates the use of the DMA along with the CPU with DDIO-aware strategies, judicious scheduling and load splitting such that the DMA’s limitations are compensated, and the overall gains are maximized. Finally, with a general kernel-based design, simple APIs, and DAX file system integration, Fastmove allows applications to transparently exploit the DMA and its new features without code change. We run three data-intensive applications MySQL, GraphWalker, and Filebench atop NOVA, ext4-DAX, and XFS-DAX, with standard benchmarks like TPC-C, and popular graph algorithms like PageRank. Across single- and multi-socket settings, compared to the conventional CPU-only NVM accesses, Fastmove introduces to TPC-C with MySQL 1.13-2.16 × speedups of peak throughput, reduces the average latency by 17.7-60.8%, and saves 37.1-68.9% CPU usage spent in data movement. It also shortens the execution time of graph algorithms with GraphWalker by 39.7-53.4%, and introduces 1.01-1.48 × throughput speedups for Filebench.

在基于 NVM 的存储系统上执行的数据密集型应用在 DRAM 和 NVM 之间移动数据时会遇到严重的瓶颈。我们主张利用存在已久但最近被忽视的片上 DMA 来加速数据移动，并为此做出了三项贡献。首先，我们在全面的 DMA 研究的推动下，探索了新的面向延迟的优化方向，设计出了高性能 DMA 模块，大大降低了 I/O 大小门槛，从而观察到效益。其次，我们提出了一种新的数据移动引擎 Fastmove，它通过 DDIO 感知策略、明智的调度和负载分流来协调 DMA 和 CPU 的使用，从而弥补 DMA 的局限性，实现整体收益最大化。最后，通过基于内核的通用设计、简单的应用程序接口和 DAX 文件系统集成，Fastmove 允许应用程序在不修改代码的情况下透明地利用 DMA 及其新功能。我们在NOVA、ext4-DAX和XFS-DAX上运行了三个数据密集型应用程序MySQL、GraphWalker和Filebench，并进行了TPC-C等标准基准测试和PageRank等流行图形算法测试。在单插槽和多插槽设置中，与传统的仅使用 CPU 的 NVM 访问相比，Fastmove 将峰值吞吐量提高了 1.13-2.16 倍，将平均延迟降低了 17.7-60.8%，并节省了 37.1-68.9% 用于数据移动的 CPU 占用率。它还将使用 GraphWalker 的图形算法的执行时间缩短了 39.7-53.4%，并将 Filebench 的吞吐量速度提高了 1.01-1.48倍。

{"title":"Fastmove: A Comprehensive Study of On-Chip DMA and its Demonstration for Accelerating Data Movement in NVM-based Storage Systems","authors":"Jiahao Li, Jingbo Su, Luofan Chen, Cheng Li, Kai Zhang, Liang Yang, Sam Noh, Yinlong Xu","doi":"10.1145/3656477","DOIUrl":"https://doi.org/10.1145/3656477","url":null,"abstract":"Data-intensive applications executing on NVM-based storage systems experience serious bottlenecks when moving data between DRAM and NVM. We advocate for the use of the long-existing but recently neglected on-chip DMA to expedite data movement with three contributions. First, we explore new latency-oriented optimization directions, driven by a comprehensive DMA study, to design a high-performance DMA module, which significantly lowers the I/O size threshold to observe benefits. Second, we propose a new data movement engine, <monospace>Fastmove</monospace>, that coordinates the use of the DMA along with the CPU with DDIO-aware strategies, judicious scheduling and load splitting such that the DMA’s limitations are compensated, and the overall gains are maximized. Finally, with a general kernel-based design, simple APIs, and DAX file system integration, <monospace>Fastmove</monospace> allows applications to transparently exploit the DMA and its new features without code change. We run three data-intensive applications MySQL, GraphWalker, and Filebench atop <monospace>NOVA</monospace>, <monospace>ext4-DAX</monospace>, and <monospace>XFS-DAX</monospace>, with standard benchmarks like TPC-C, and popular graph algorithms like PageRank. Across single- and multi-socket settings, compared to the conventional CPU-only NVM accesses, <monospace>Fastmove</monospace> introduces to TPC-C with MySQL 1.13-2.16 × speedups of peak throughput, reduces the average latency by 17.7-60.8%, and saves 37.1-68.9% CPU usage spent in data movement. It also shortens the execution time of graph algorithms with GraphWalker by 39.7-53.4%, and introduces 1.01-1.48 × throughput speedups for Filebench.","PeriodicalId":49113,"journal":{"name":"ACM Transactions on Storage","volume":"61 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140881717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

FSDedup: Feature-Aware and Selective Deduplication for Improving Performance of Encrypted Non-Volatile Main Memory FSDedup：提高加密非易失性主存储器性能的特征感知和选择性重复数据删除技术

IF 1.7 3区计算机科学 Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

ACM Transactions on Storage

Pub Date : 2024-05-01 DOI: 10.1145/3662736

Chunfeng Du, Zihang Lin, Suzhen Wu, Yifei Chen, Jiapeng Wu, Shengzhe Wang, Weichun Wang, Qingfeng Wu, Bo Mao

Enhancing the endurance, performance, and energy efficiency of encrypted Non-Volatile Main Memory (NVMM) can be achieved by minimizing written data through inline deduplication. However, existing approaches applying inline deduplication to encrypted NVMM suffer from substantial performance degradation due to high computing, memory footprint, and index-lookup overhead to generate, store, and query the cryptographic hash (fingerprint). In the preliminary ESD [14], we proposed the Error Correcting Code (ECC) assisted selective deduplication scheme, utilizing the ECC information as a fingerprint to identify similar data effectively and then leveraging the selective deduplication technique to eliminate a large amount of redundant data with high reference counts. In this paper, we proposed FSDedup. Compared with ESD, FSDedup could leverage the prefetch cache to reduce the read overhead during similarity comparison and utilize the cache refresh mechanism to identify further and eliminate more redundant data. Extensive experimental evaluations demonstrate that FSDedup can enhance the performance of the NVMM system further than the ESD. Experimental results show that FSDedup can improve both write and read speed by up to 1.8 ×, enhance Instructions Per Cycle (IPC) by up to 1.5 ×, and reduce energy consumption by up to 2.0 ×, compared to ESD.

通过内联重复数据删除来减少写入数据，可以提高加密非易失性主存储器（NVMM）的耐用性、性能和能效。然而，由于生成、存储和查询加密哈希值（指纹）所需的计算、内存占用和索引查找开销较高，因此将内联重复数据删除应用于加密非易失性主存储器的现有方法存在性能大幅下降的问题。在最初的 ESD [14]中，我们提出了纠错码（ECC）辅助选择性重复数据删除方案，利用 ECC 信息作为指纹有效识别相似数据，然后利用选择性重复数据删除技术消除大量具有高参考计数的冗余数据。本文提出了 FSDedup。与 ESD 相比，FSDedup 可以利用预取缓存减少相似性比较过程中的读取开销，并利用缓存刷新机制进一步识别和消除更多冗余数据。广泛的实验评估证明，与 ESD 相比，FSDedup 可以进一步提高 NVMM 系统的性能。实验结果表明，与 ESD 相比，FSDedup 可将写入和读取速度提高 1.8 倍，将每周期指令数（IPC）提高 1.5 倍，将能耗降低 2.0 倍。

{"title":"FSDedup: Feature-Aware and Selective Deduplication for Improving Performance of Encrypted Non-Volatile Main Memory","authors":"Chunfeng Du, Zihang Lin, Suzhen Wu, Yifei Chen, Jiapeng Wu, Shengzhe Wang, Weichun Wang, Qingfeng Wu, Bo Mao","doi":"10.1145/3662736","DOIUrl":"https://doi.org/10.1145/3662736","url":null,"abstract":"Enhancing the endurance, performance, and energy efficiency of encrypted Non-Volatile Main Memory (NVMM) can be achieved by minimizing written data through inline deduplication. However, existing approaches applying inline deduplication to encrypted NVMM suffer from substantial performance degradation due to high computing, memory footprint, and index-lookup overhead to generate, store, and query the cryptographic hash (fingerprint). In the preliminary ESD [14], we proposed the Error Correcting Code (ECC) assisted selective deduplication scheme, utilizing the ECC information as a fingerprint to identify similar data effectively and then leveraging the selective deduplication technique to eliminate a large amount of redundant data with high reference counts. In this paper, we proposed FSDedup. Compared with ESD, FSDedup could leverage the prefetch cache to reduce the read overhead during similarity comparison and utilize the cache refresh mechanism to identify further and eliminate more redundant data. Extensive experimental evaluations demonstrate that FSDedup can enhance the performance of the NVMM system further than the ESD. Experimental results show that FSDedup can improve both write and read speed by up to 1.8 ×, enhance Instructions Per Cycle (IPC) by up to 1.5 ×, and reduce energy consumption by up to 2.0 ×, compared to ESD.","PeriodicalId":49113,"journal":{"name":"ACM Transactions on Storage","volume":"26 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140840473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Design and Implementation of Deduplication on F2FS F2FS 重复数据删除的设计与实施

IF 1.7 3区计算机科学 Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

ACM Transactions on Storage

Pub Date : 2024-04-29 DOI: 10.1145/3662735

Tiangmeng Zhang, Renhui Chen, Zijing Li, Congming Gao, Chengke Wang, Jiwu Shu

Data deduplication technology has gained popularity in modern file systems due to its ability to eliminate redundant writes and improve storage space efficiency. In recent years, the flash-friendly file system (F2FS) has been widely adopted in flash memory based storage devices, including smartphones, fast-speed servers and Internet of Things. In this paper, we propose F2DFS (deduplication-based F2FS), which introduces three main design contributions. First, F2DFS integrates inline and offline hybrid deduplication. Inline deduplication eliminates redundant writes and enhances flash device endurance, while offline deduplication mitigates the negative I/O performance impact and saves more storage space. Second, F2DFS follows the file system coupling design principle, effectively leveraging the potentials and benefits of both deduplication and native F2FS. Also, with the aid of this principle, F2DFS achieves high-performance and space-efficient incremental deduplication. Third, F2DFS adopts virtual indexing to mitigate deduplication-induced many-to-one mapping updates during the segment cleaning. We conducted comprehensive experimental comparisons between F2DFS, native F2FS, and other state-of-the-art deduplication schemes, using both synthetic and real-world workloads. For inline deduplication, F2DFS outperforms SmartDedup, Dmdedup, and ZFS, in terms of both I/O bandwidth performance and deduplication rates. And for offline deduplication, compared to SmartDedup, XFS and BtrFS, F2DFS shows higher execution efficiency, lower resource usage and greater storage space savings. Moreover, F2DFS demonstrates more efficient segment cleanings than native F2FS.

重复数据删除技术能够消除冗余写入并提高存储空间效率，因此在现代文件系统中越来越受欢迎。近年来，基于闪存的存储设备（包括智能手机、高速服务器和物联网）广泛采用了闪存友好型文件系统（F2FS）。本文提出的 F2DFS（基于重复数据删除的 F2FS）主要有三个设计贡献。首先，F2DFS 集成了在线和离线混合重复数据删除功能。内联重复数据删除消除了冗余写入，增强了闪存设备的耐用性，而离线重复数据删除则减轻了对 I/O 性能的负面影响，节省了更多存储空间。其次，F2DFS 遵循文件系统耦合设计原则，有效利用了重复数据删除和本地 F2FS 的潜力和优势。同时，借助这一原理，F2DFS 实现了高性能和空间效率高的增量重复数据删除。第三，F2DFS 采用了虚拟索引技术，以减轻段清理过程中重复数据删除引起的多对一映射更新。我们使用合成和实际工作负载对 F2DFS、本地 F2FS 和其他最先进的重复数据删除方案进行了全面的实验比较。就在线重复数据删除而言，F2DFS 在 I/O 带宽性能和重复数据删除率方面都优于 SmartDedup、Dmdedup 和 ZFS。在离线重复数据删除方面，与 SmartDedup、XFS 和 BtrFS 相比，F2DFS 表现出更高的执行效率、更低的资源使用率和更大的存储空间节省。此外，与本地 F2FS 相比，F2DFS 的段清理效率更高。

{"title":"Design and Implementation of Deduplication on F2FS","authors":"Tiangmeng Zhang, Renhui Chen, Zijing Li, Congming Gao, Chengke Wang, Jiwu Shu","doi":"10.1145/3662735","DOIUrl":"https://doi.org/10.1145/3662735","url":null,"abstract":"Data deduplication technology has gained popularity in modern file systems due to its ability to eliminate redundant writes and improve storage space efficiency. In recent years, the flash-friendly file system (F2FS) has been widely adopted in flash memory based storage devices, including smartphones, fast-speed servers and Internet of Things. In this paper, we propose F2DFS (deduplication-based F2FS), which introduces three main design contributions. First, F2DFS integrates inline and offline hybrid deduplication. Inline deduplication eliminates redundant writes and enhances flash device endurance, while offline deduplication mitigates the negative I/O performance impact and saves more storage space. Second, F2DFS follows the file system coupling design principle, effectively leveraging the potentials and benefits of both deduplication and native F2FS. Also, with the aid of this principle, F2DFS achieves high-performance and space-efficient incremental deduplication. Third, F2DFS adopts virtual indexing to mitigate deduplication-induced many-to-one mapping updates during the segment cleaning. We conducted comprehensive experimental comparisons between F2DFS, native F2FS, and other state-of-the-art deduplication schemes, using both synthetic and real-world workloads. For inline deduplication, F2DFS outperforms SmartDedup, Dmdedup, and ZFS, in terms of both I/O bandwidth performance and deduplication rates. And for offline deduplication, compared to SmartDedup, XFS and BtrFS, F2DFS shows higher execution efficiency, lower resource usage and greater storage space savings. Moreover, F2DFS demonstrates more efficient segment cleanings than native F2FS.","PeriodicalId":49113,"journal":{"name":"ACM Transactions on Storage","volume":"39 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140812433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Index Shipping for Efficient Replication in LSM Key-Value Stores with Hybrid KV Placement 在混合 KV 放置的 LSM 键值存储中实现高效复制的索引运输

IF 1.7 3区计算机科学 Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

ACM Transactions on Storage

Pub Date : 2024-04-16 DOI: 10.1145/3658672

Giorgos Stilianakis, Giorgos Saloustros, Orestis Chiotakis, Giorgos Xanthakis, Angelos Bilas

Key-value (KV) stores based on LSM tree have become a foundational layer in the storage stack of datacenters and cloud services. Current approaches for achieving reliability and availability favor reducing network traffic and send to replicas only new KV pairs. As a result, they perform costly compactions to reorganize data in both the primary and backup nodes, which increases device I/O traffic and CPU overhead, and eventually hurts overall system performance. In this paper we describe Tebis, an efficient LSM-based KV store that reduces I/O amplification and CPU overhead for maintaining the replica index. We use a primary-backup replication scheme that performs compactions only on the primary nodes and sends pre-built indexes to backup nodes, avoiding all compactions in backup nodes. Our approach includes an efficient mechanism to deal with pointer translation across nodes in the pre-built region index. Our results show that Tebisreduces resource utilization on backup nodes compared to performing full compactions: Throughput is increased by 1.06 − 2.90 ×, CPU efficiency is increased by 1.21 − 2.78 ×, and I/O amplification is reduced by 1.7 − 3.27 ×, while network traffic increases by up to 1.32 − 3.76 ×.

基于 LSM 树的键值（KV）存储已成为数据中心和云服务存储栈的基础层。当前实现可靠性和可用性的方法倾向于减少网络流量，只向副本发送新的 KV 对。因此，它们在主节点和备份节点中执行代价高昂的压缩来重组数据，这增加了设备 I/O 流量和 CPU 开销，最终损害了系统的整体性能。在本文中，我们介绍了基于 LSM 的高效 KV 存储 Tebis，它可以减少 I/O 放大和维护副本索引的 CPU 开销。我们采用主-备份复制方案，只在主节点上执行编译，并将预建索引发送到备份节点，避免在备份节点上进行所有编译。我们的方法包括一种高效机制，用于处理预建区域索引中跨节点的指针转换。我们的结果表明，与执行完全编译相比，Tebis 降低了备份节点的资源利用率：吞吐量提高了 1.06 - 2.90 倍，CPU 效率提高了 1.21 - 2.78 倍，I/O 放大减少了 1.7 - 3.27 倍，而网络流量最多增加了 1.32 - 3.76 倍。

{"title":"Index Shipping for Efficient Replication in LSM Key-Value Stores with Hybrid KV Placement","authors":"Giorgos Stilianakis, Giorgos Saloustros, Orestis Chiotakis, Giorgos Xanthakis, Angelos Bilas","doi":"10.1145/3658672","DOIUrl":"https://doi.org/10.1145/3658672","url":null,"abstract":"Key-value (KV) stores based on LSM tree have become a foundational layer in the storage stack of datacenters and cloud services. Current approaches for achieving reliability and availability favor reducing network traffic and send to replicas only new KV pairs. As a result, they perform costly compactions to reorganize data in both the primary and backup nodes, which increases device I/O traffic and CPU overhead, and eventually hurts overall system performance. In this paper we describe Tebis, an efficient LSM-based KV store that reduces I/O amplification and CPU overhead for maintaining the replica index. We use a primary-backup replication scheme that performs compactions only on the primary nodes and sends pre-built indexes to backup nodes, avoiding all compactions in backup nodes. Our approach includes an efficient mechanism to deal with pointer translation across nodes in the pre-built region index. Our results show that Tebis\u0000reduces resource utilization on backup nodes compared to performing full compactions: Throughput is increased by 1.06 − 2.90 ×, CPU efficiency is increased by 1.21 − 2.78 ×, and I/O amplification is reduced by 1.7 − 3.27 ×, while network traffic increases by up to 1.32 − 3.76 ×.","PeriodicalId":49113,"journal":{"name":"ACM Transactions on Storage","volume":"6 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140588933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

eZNS: Elastic Zoned Namespace for Enhanced Performance Isolation and Device Utilization eZNS：增强性能隔离和设备利用率的弹性分区命名空间

IF 1.7 3区计算机科学 Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

ACM Transactions on Storage

Pub Date : 2024-04-12 DOI: 10.1145/3653716

Jaehong Min, Chenxingyu Zhao, Ming Liu, Arvind Krishnamurthy

Emerging Zoned Namespace (ZNS) SSDs, providing the coarse-grained zone abstraction, hold the potential to significantly enhance the cost-efficiency of future storage infrastructure and mitigate performance unpredictability. However, existing ZNS SSDs have a static zoned interface, making them in-adaptable to workload runtime behavior, unscalable to underlying hardware capabilities, and interfering with co-located zones. Applications either under-provision the zone resources yielding unsatisfied throughput, create over-provisioned zones and incur costs, or experience unexpected I/O latencies.

We propose eZNS, an elastic-zoned namespace interface that exposes an adaptive zone with predictable characteristics. eZNS comprises two major components: a zone arbiter that manages zone allocation and active resources on the control plane, a hierarchical I/O scheduler with read congestion control, and write admission control on the data plane. Together, eZNS enables the transparent use of a ZNS SSD and closes the gap between application requirements and zone interface properties. Our evaluations over RocksDB demonstrate that eZNS outperforms a static zoned interface by 17.7% and 80.3% in throughput and tail latency, respectively, at most.

新兴的分区命名空间（ZNS）固态硬盘提供了粗粒度分区抽象，有望显著提高未来存储基础架构的成本效益，并降低性能不可预测性。然而，现有的 ZNS 固态硬盘具有静态分区接口，无法适应工作负载运行时的行为，无法扩展到底层硬件能力，并且会干扰同地分区。应用程序要么对分区资源配置不足，导致吞吐量无法满足要求；要么创建过度配置的分区，导致成本增加；要么出现意外的 I/O 延迟。我们提出的 eZNS 是一种弹性分区命名空间接口，可提供具有可预测特性的自适应分区。eZNS 由两个主要部分组成：在控制平面上管理分区分配和活动资源的分区仲裁器；在数据平面上具有读取拥塞控制和写入许可控制功能的分层 I/O 调度器。eZNS 可实现对 ZNS 固态硬盘的透明使用，缩小应用需求与区域接口属性之间的差距。我们在 RocksDB 上进行的评估表明，eZNS 在吞吐量和尾部延迟方面分别比静态分区接口最多高出 17.7% 和 80.3%。

{"title":"eZNS: Elastic Zoned Namespace for Enhanced Performance Isolation and Device Utilization","authors":"Jaehong Min, Chenxingyu Zhao, Ming Liu, Arvind Krishnamurthy","doi":"10.1145/3653716","DOIUrl":"https://doi.org/10.1145/3653716","url":null,"abstract":"Emerging Zoned Namespace (ZNS) SSDs, providing the coarse-grained zone abstraction, hold the potential to significantly enhance the cost-efficiency of future storage infrastructure and mitigate performance unpredictability. However, existing ZNS SSDs have a static zoned interface, making them in-adaptable to workload runtime behavior, unscalable to underlying hardware capabilities, and interfering with co-located zones. Applications either under-provision the zone resources yielding unsatisfied throughput, create over-provisioned zones and incur costs, or experience unexpected I/O latencies. We propose eZNS, an elastic-zoned namespace interface that exposes an adaptive zone with predictable characteristics. eZNS comprises two major components: a zone arbiter that manages zone allocation and active resources on the control plane, a hierarchical I/O scheduler with read congestion control, and write admission control on the data plane. Together, eZNS enables the transparent use of a ZNS SSD and closes the gap between application requirements and zone interface properties. Our evaluations over RocksDB demonstrate that eZNS outperforms a static zoned interface by 17.7% and 80.3% in throughput and tail latency, respectively, at most.","PeriodicalId":49113,"journal":{"name":"ACM Transactions on Storage","volume":"52 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140576562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

A Contract-Aware and Cost-effective LSM Store for Cloud Storage with Low Latency Spikes 面向低延迟峰值云存储的合约感知型低成本高效率 LSM 存储器

IF 1.7 3区计算机科学 Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

ACM Transactions on Storage

Pub Date : 2024-02-20 DOI: 10.1145/3643851

Yuanhui Zhou, Jian Zhou, Kai Lu, Ling Zhan, Peng Xu, Peng Wu, Shuning Chen, Xian Liu, Jiguang Wan

Cloud storage is gaining popularity because features such as pay-as-you-go significantly reduce storage costs. However, the community has not sufficiently explored its contract model and latency characteristics. As LSM-Tree-based key-value stores (LSM stores) become the building block for numerous cloud applications, how cloud storage would impact the performance of key-value accesses is vital. This study reveals the significant latency variances of Amazon Elastic Block Store (EBS) under various I/O pressures, which challenges LSM store read performance on cloud storage. To reduce the corresponding tail latency, we propose Calcspar, a contract-aware LSM store for cloud storage, which efficiently addresses the challenges by regulating the rate of I/O requests to cloud storage and absorbing surplus I/O requests with the data cache. We specifically developed a fluctuation-aware cache to lower the high latency brought on by workload fluctuations. Additionally, we build a congestion-aware IOPS allocator to reduce the impact of LSM store internal operations on read latency. We evaluated Calcspar on EBS with different real-world workloads and compared it to the cutting-edge LSM stores. The results show that Calcspar can significantly reduce tail latency while maintaining regular read and write performance, keeping the 99^th percentile latency under 550μs and reducing average latency by 66%. In addition, Calcspar has lower write prices and average latency compared to Cloud NoSQL services offered by cloud vendors.

由于 "即用即付 "等功能大大降低了存储成本，云存储越来越受欢迎。然而，业界对其合同模式和延迟特性的探讨还不够充分。随着基于 LSM 树的键值存储（LSM 存储）成为众多云应用的基石，云存储如何影响键值访问的性能至关重要。本研究揭示了亚马逊弹性块存储（EBS）在各种 I/O 压力下的显著延迟差异，这对云存储上的 LSM 存储读取性能提出了挑战。为了减少相应的尾部延迟，我们提出了用于云存储的合约感知 LSM 存储 Calcspar，它通过调节云存储的 I/O 请求速率和利用数据缓存吸收多余的 I/O 请求来有效地应对挑战。我们专门开发了波动感知缓存，以降低工作负载波动带来的高延迟。此外，我们还开发了拥塞感知 IOPS 分配器，以降低 LSM 存储内部操作对读取延迟的影响。我们在 EBS 上用不同的实际工作负载对 Calcspar 进行了评估，并将其与最先进的 LSM 存储进行了比较。结果表明，Calcspar 可以在保持常规读写性能的同时显著降低尾部延迟，将第 99 百分位数延迟保持在 550μs 以下，并将平均延迟降低 66%。此外，与云供应商提供的云 NoSQL 服务相比，Calcspar 的写入价格和平均延迟都更低。

{"title":"A Contract-Aware and Cost-effective LSM Store for Cloud Storage with Low Latency Spikes","authors":"Yuanhui Zhou, Jian Zhou, Kai Lu, Ling Zhan, Peng Xu, Peng Wu, Shuning Chen, Xian Liu, Jiguang Wan","doi":"10.1145/3643851","DOIUrl":"https://doi.org/10.1145/3643851","url":null,"abstract":"Cloud storage is gaining popularity because features such as pay-as-you-go significantly reduce storage costs. However, the community has not sufficiently explored its contract model and latency characteristics. As LSM-Tree-based key-value stores (LSM stores) become the building block for numerous cloud applications, how cloud storage would impact the performance of key-value accesses is vital. This study reveals the significant latency variances of Amazon Elastic Block Store (EBS) under various I/O pressures, which challenges LSM store read performance on cloud storage. To reduce the corresponding tail latency, we propose Calcspar, a contract-aware LSM store for cloud storage, which efficiently addresses the challenges by regulating the rate of I/O requests to cloud storage and absorbing surplus I/O requests with the data cache. We specifically developed a fluctuation-aware cache to lower the high latency brought on by workload fluctuations. Additionally, we build a congestion-aware IOPS allocator to reduce the impact of LSM store internal operations on read latency. We evaluated Calcspar on EBS with different real-world workloads and compared it to the cutting-edge LSM stores. The results show that Calcspar can significantly reduce tail latency while maintaining regular read and write performance, keeping the 99th percentile latency under 550μs and reducing average latency by 66%. In addition, Calcspar has lower write prices and average latency compared to Cloud NoSQL services offered by cloud vendors.","PeriodicalId":49113,"journal":{"name":"ACM Transactions on Storage","volume":"36 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139956219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Introduction to the Special Section on USENIX ATC 2023 USENIX ATC 2023 特别分会简介

IF 1.7 3区计算机科学 Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

ACM Transactions on Storage

Pub Date : 2024-02-19 DOI: 10.1145/3635156

Dan Williams, Julia Lawall

No abstract available.

无摘要。

引用次数: 0