ACM Transactions on Storage: Latest Articles (Page 5)

A Universal SMR-aware Cache Framework with Deep Optimization for DM-SMR and HM-SMR Disks

IF 1.7; Zone 3 (Computer Science); Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

ACM Transactions on Storage

Pub Date : 2023-06-19 DOI: https://dl.acm.org/doi/10.1145/3588442

Diansen Sun, Ruixiong Tan, Yunpeng Chai

To satisfy the enormous storage capacities required for big data, data centers have been adopting high-density shingled magnetic recording (SMR) disks. However, the weak fine-grained random write performance of SMR disks caused by their inherent write amplification and unbalanced read–write performance poses a severe challenge. Many studies have proposed solid-state drive (SSD) cache systems to address this issue. However, existing cache algorithms, such as the least recently used (LRU) algorithm, which is used to optimize cache popularity, and the MOST algorithm, which is used to optimize the write amplification factor, cannot exploit the full performance of the proposed cache systems because of their inappropriate optimization objectives. This article proposes a new SMR-aware cache framework called SAC+ to improve SMR-based hybrid storage. SAC+ integrates the two dominant types of SMR drives—namely, drive-managed and host-managed SMR drives—and provides a universal framework implementation. In addition, SAC+ integrally combines the drive characteristics to optimize I/O performance. The results of evaluations conducted using real-world traces indicate that SAC+ reduces the I/O time by 36–93% compared with state-of-the-art algorithms.

Citations: 0

CostCounter: A Better Method for Collision Mitigation in Cuckoo Hashing

IF 1.7; Zone 3 (Computer Science); Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

ACM Transactions on Storage

Pub Date : 2023-05-12 DOI: 10.1145/3596910

Haonan Wu, Shuxian Wang, Zhanfeng Jin, Yuhang Zhang, Ruyun Ma, Sijin Fan, Ruili Chao

Hardware is often required to support fast search and high-throughput applications. Consequently, the performance of search algorithms is limited by storage bandwidth. Hence, the search algorithm must be optimized accordingly. We propose a CostCounter (CC) algorithm based on cuckoo hashing and an Improved CostCounter (ICC) algorithm. A better path can be selected when collisions occur using a cost counter to record the kick-out situation. Our simulation results indicate that the CC and ICC algorithms can achieve more significant performance improvements than Random Walk (RW), Breadth First Search (BFS), and MinCounter (MC). With two buckets and two slots per bucket, under the 95% memory load rate of the maximum load rate, CC and ICC are optimized on read-write times over 20% and 80% compared to MC and BFS, respectively. Furthermore, the CC and ICC algorithms achieve a slight improvement in storage efficiency compared with MC. In addition, we implement RW, MC, and the proposed algorithms using fine-grained locking to support a high throughput rate. From the test on field programmable gate arrays, we verify the simulation results and our algorithms optimize the maximum throughput over 23% compared to RW and 9% compared to MC under 95% of the memory capacity. The test results indicate that our CC and ICC algorithms can achieve better performance in terms of hardware bandwidth and memory load efficiency without incurring a significant resource cost.
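The idea of recording kick-outs to steer eviction can be illustrated with a small sketch. This is not the authors' CC or ICC algorithm, whose details the abstract does not give; it is a toy two-choice cuckoo table in which every slot carries a kick-out counter and, on a collision, the occupant of the least-kicked candidate slot is displaced. The class name `CounterCuckoo`, the `stash` overflow dictionary, and all parameters are assumptions made for illustration.

```python
import random


class CounterCuckoo:
    """Toy 2-choice cuckoo table; each slot carries a kick-out counter (illustrative only)."""

    def __init__(self, num_buckets=8, slots_per_bucket=2, max_kicks=64, seed=0):
        self.num_buckets = num_buckets
        self.slots = slots_per_bucket
        self.max_kicks = max_kicks
        # Each slot holds [key, value] or None; kicks[b][s] counts evictions from that slot.
        self.table = [[None] * slots_per_bucket for _ in range(num_buckets)]
        self.kicks = [[0] * slots_per_bucket for _ in range(num_buckets)]
        self.stash = {}  # hypothetical overflow area so a failed insert never loses data
        rng = random.Random(seed)
        self.salts = (rng.getrandbits(32), rng.getrandbits(32))

    def _buckets(self, key):
        # Two candidate buckets, one per salted hash.
        return [hash((salt, key)) % self.num_buckets for salt in self.salts]

    def get(self, key):
        for b in self._buckets(key):
            for entry in self.table[b]:
                if entry is not None and entry[0] == key:
                    return entry[1]
        return self.stash.get(key)

    def put(self, key, value):
        # Update in place if the key is already stored.
        if key in self.stash:
            self.stash[key] = value
            return True
        for b in self._buckets(key):
            for entry in self.table[b]:
                if entry is not None and entry[0] == key:
                    entry[1] = value
                    return True
        item = [key, value]
        for _ in range(self.max_kicks):
            candidates = [(b, s) for b in self._buckets(item[0]) for s in range(self.slots)]
            for b, s in candidates:
                if self.table[b][s] is None:
                    self.table[b][s] = item
                    return True
            # All candidates are full: displace the occupant of the least-kicked slot.
            b, s = min(candidates, key=lambda bs: self.kicks[bs[0]][bs[1]])
            self.kicks[b][s] += 1
            item, self.table[b][s] = self.table[b][s], item
        # Kick budget exhausted: park the last displaced item in the stash.
        self.stash[item[0]] = item[1]
        return False
```

Because a slot's counter rises each time it is used for a kick-out, a displaced item tends to avoid bouncing straight back to the slot it came from, which is the intuition behind counter-guided path selection.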

Citations: 0

Hybrid Block Storage for Efficient Cloud Volume Service

IF 1.7; Zone 3 (Computer Science); Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

ACM Transactions on Storage

Pub Date : 2023-05-08 DOI: 10.1145/3596446

Yiming Zhang, Huiba Li, Shengyun Liu, Peng Huang

The migration of traditional desktop and server applications to the cloud brings challenge of high performance, high reliability and low cost to the underlying cloud storage. To satisfy the requirement, this paper proposes a hybrid cloud-scale block storage system called Ursa. Trace analysis shows that the I/O patterns served by block storage have only limited locality to exploit. Therefore, instead of using SSDs as a cache layer, Ursa proposes an SSD-HDD-hybrid storage structure that directly stores primary replicas on SSDs and replicates backup replicas on HDDs. At the core of Ursa’s hybrid storage design is an adaptive journal that can bridge the performance gap between primary SSDs and backup HDDs for random writes, by transforming small backup writes into journal appends which are then asynchronously replayed and merged to backup HDDs. To efficiently index the journal, we design a novel range-optimized merge-tree (ROMT) structure that combines a continuous range of keys into a single composite key {offset,length}. Ursa integrates the hybrid structure with designs for high reliability, scalability, and availability. Experiments show that Ursa in its hybrid mode achieves almost the same performance as in its SSD-only mode (storing all replicas on SSDs), and outperforms other block stores (Ceph and Sheepdog) even in their SSD-only mode while achieving much higher CPU efficiency (IOPS and throughput per core).
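To make the composite key {offset,length} concrete, here is a minimal sketch of an index that stores a continuous run of block offsets as a single (offset, length) entry and merges entries that touch or overlap. It is not the ROMT structure itself: the abstract does not describe its tree layout, and a real journal index would also map each range to a location in the journal, which is omitted here. The class name `RangeIndex` and its methods are assumptions for illustration.

```python
import bisect


class RangeIndex:
    """Keeps disjoint, sorted (offset, length) ranges and merges touching ones (illustrative only)."""

    def __init__(self):
        self.starts = []   # sorted start offsets
        self.lengths = []  # lengths[i] belongs to starts[i]

    def add(self, offset, length):
        end = offset + length
        # Find the first existing range that could touch [offset, end).
        i = bisect.bisect_left(self.starts, offset)
        if i > 0 and self.starts[i - 1] + self.lengths[i - 1] >= offset:
            i -= 1
        # Absorb every range that starts at or before the new end.
        j = i
        while j < len(self.starts) and self.starts[j] <= end:
            offset = min(offset, self.starts[j])
            end = max(end, self.starts[j] + self.lengths[j])
            j += 1
        # Replace the absorbed ranges with one composite entry.
        self.starts[i:j] = [offset]
        self.lengths[i:j] = [end - offset]

    def covers(self, offset):
        # True if some stored range contains this offset.
        i = bisect.bisect_right(self.starts, offset) - 1
        return i >= 0 and offset < self.starts[i] + self.lengths[i]

    def ranges(self):
        return list(zip(self.starts, self.lengths))
```

With this representation, a burst of small adjacent writes costs one index entry rather than one entry per block, which is the space saving the composite key is meant to provide.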

Citations: 0

Derrick: A Three-layer Balancer for Self-managed Continuous Scalability

IF 1.7; Zone 3 (Computer Science); Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

ACM Transactions on Storage

Pub Date : 2023-04-28 DOI: 10.1145/3594543

Andrzej Jackowski, Leszek Gryz, Michal Welnicki, C. Dubnicki, K. Iwanicki

Data arrangement determines the capacity, resilience, and performance of a distributed storage system. A scalable self-managed system must place its data efficiently not only during stable operation but also after an expansion, planned downscaling, or device failures. In this article, we present Derrick, a data balancing algorithm addressing these needs, which has been developed for HYDRAstor, a highly scalable commercial storage system. Derrick makes its decisions quickly in case of failures but takes additional time to find a nearly optimal data arrangement and a plan for reaching it when the device population changes. Compared to balancing algorithms in two other state-of-the-art systems, Derrick provides better capacity utilization, reduced data movement, and improved performance. Moreover, it can be easily adapted to meet custom placement requirements.

Citations: 0

Introduction to the Special Section on USENIX ATC 2022

IF 1.7; Zone 3 (Computer Science); Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

ACM Transactions on Storage

Pub Date : 2023-04-08 DOI: https://dl.acm.org/doi/10.1145/3582557

Jiri Schindler, Noa Zilberman

No abstract available.

Citations: 0

Introduction to the Special Section on USENIX ATC 2022

IF 1.7; Zone 3 (Computer Science); Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

ACM Transactions on Storage

Pub Date : 2023-04-08 DOI: 10.1145/3582557

J. Schindler, Noa Zilberman

The USENIX Annual Technical Conference (ATC) publishes current computer systems research across system disciplines including networking, storage, security, operating systems, databases, and machine learning. This special section of the ACM Transactions on Storage presents some highlights from the storage-related papers published in the USENIX ATC in 2022. A large proportion of ATC papers have traditionally been related to storage. ATC ’22 has continued this trend. Out of 393 submissions, the authors tagged 124 (32%) with one or more topic labels related to Storage, File Systems, Key-Value Stores, and Data Management Systems. The conference accepted 14 storage-related works (22% of all published submissions). We selected three storage papers. They have been expanded since their publication and re-reviewed by several of their original ATC ’22 reviewers. Collectively, they represent the mission of the USENIX organization: to bring together researchers from academia and systems practitioners working on production systems and/or large installations of cloud services providers. The ATC complements other USENIX venues including the premier research conference on Operating Systems Design and Implementation (OSDI) as well as storage- and networked-systems-focused conferences of File and Storage Technologies (FAST) and Networked Systems Design and Implementation (USENIX NSDI), respectively. We are pleased to present these papers representing this cross section in their expanded form. The "Realizing Strong Determinism Contract on Log-Structured Merge Key-Value Stores" paper advocates for a hardware and software co-designed framework that advances the state of the art of a widely used persistent data structure of log-structured merge trees for NVMe SSDs. The "ZNSwap: un-Block your Swap" paper presents a new approach for Zoned Namespace SSDs that significantly improves the performance of Linux memory swap on SSD devices. Finally, the "CacheSack: Theory and Experience of Google’s Admission Optimization for Datacenter Flash Caches" paper, submitted to the ATC Operational Systems Track, describes the design of using Flash caches to lower I/O access latency, drawing on years of research and experiences of the authors. We hope that you will find new insights into the complex world of storage by reading them.

Citations: 0

Realizing Strong Determinism Contract on Log-Structured Merge Key-Value Stores

IF 1.7; Zone 3 (Computer Science); Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

ACM Transactions on Storage

Pub Date : 2023-03-25 DOI: https://dl.acm.org/doi/10.1145/3582695

Miryeong Kwon, Seungjun Lee, Hyunkyu Choi, Jooyoung Hwang, Myoungsoo Jung

We propose Vigil-KV, a hardware and software co-designed framework that eliminates long-tail latency almost perfectly by introducing strong latency determinism. To make Get latency deterministic, Vigil-KV first enables a predictable latency mode (PLM) interface on a real datacenter-scale NVMe SSD, having knowledge about the nature of the underlying flash technologies. Vigil-KV at the system-level then hides the non-deterministic time window (associated with SSD’s internal tasks and/or write services) by internally scheduling the different device states of PLM across multiple physical functions. Vigil-KV further schedules compaction/flush operations and client requests being aware of PLM’s restrictions thereby integrating strong latency determinism into LSM KVs. We implement Vigil-KV upon a 1.92TB NVMe SSD prototype and Linux 4.19.91, but other LSM KVs can adopt its concept. We evaluate diverse Facebook and Yahoo scenarios with Vigil-KV, and the results show that Vigil-KV can reduce the tail latency of a baseline KV system by 3.19× while reducing the average latency by 34%, on average.

Citations: 0

TriCache: A User-Transparent Block Cache Enabling High-Performance Out-of-Core Processing with In-Memory Programs

IF 1.7; Zone 3 (Computer Science); Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

ACM Transactions on Storage

Pub Date : 2023-03-22 DOI: https://dl.acm.org/doi/10.1145/3583139

Guanyu Feng, Huanqi Cao, Xiaowei Zhu, Bowen Yu, Yuanwei Wang, Zixuan Ma, Shengqi Chen, Wenguang Chen

Out-of-core systems rely on high-performance cache sub-systems to reduce the number of I/O operations. Although the page cache in modern operating systems enables transparent access to memory and storage devices, it suffers from efficiency and scalability issues on cache misses, forcing out-of-core systems to design and implement their own cache components, which is a non-trivial task.

This study proposes TriCache, a cache mechanism that enables in-memory programs to efficiently process out-of-core datasets without requiring any code rewrite. It provides a virtual memory interface on top of the conventional block interface to simultaneously achieve user transparency and sufficient out-of-core performance. A multi-level block cache design is proposed to address the challenge of per-access address translations required by a memory interface. It can exploit spatial and temporal localities in memory or storage accesses to render storage-to-memory address translation and page-level concurrency control adequately efficient for the virtual memory interface.

Our evaluation shows that in-memory systems operating on top of TriCache can outperform Linux OS page cache by more than one order of magnitude, and can deliver performance comparable to or even better than that of corresponding counterparts designed specifically for out-of-core scenarios.
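The cost of per-access address translation, and how locality can hide it, can be sketched with a two-level lookup. This is only an illustration of the general idea, not TriCache's actual design, which the abstract does not detail: a small direct-mapped table of recent translations sits in front of a shared LRU block cache, so repeated accesses to the same block skip the slower shared lookup. The class name `TwoLevelBlockCache`, the 4096-byte `BLOCK_SIZE`, and the read-only interface are assumptions.

```python
from collections import OrderedDict

BLOCK_SIZE = 4096  # assumed block size in bytes


class TwoLevelBlockCache:
    """Read-only toy cache: fast translation table over a shared LRU block cache."""

    def __init__(self, backing, capacity_blocks, fast_entries=4):
        self.backing = backing          # bytes object standing in for a storage device
        self.capacity = capacity_blocks
        self.shared = OrderedDict()     # block_id -> bytearray, kept in LRU order
        self.fast = {}                  # slot -> (block_id, buffer), direct-mapped
        self.fast_entries = fast_entries
        self.fast_hits = self.shared_hits = self.misses = 0

    def _translate(self, block_id):
        slot = block_id % self.fast_entries
        cached = self.fast.get(slot)
        if cached is not None and cached[0] == block_id:
            # Fast path: no shared lookup. Note that LRU order is not refreshed here.
            self.fast_hits += 1
            return cached[1]
        if block_id in self.shared:
            self.shared_hits += 1
            self.shared.move_to_end(block_id)
        else:
            self.misses += 1
            if len(self.shared) >= self.capacity:
                victim, _ = self.shared.popitem(last=False)
                # Drop any stale fast-path translation for the evicted block.
                vslot = victim % self.fast_entries
                if self.fast.get(vslot, (None,))[0] == victim:
                    del self.fast[vslot]
            start = block_id * BLOCK_SIZE
            self.shared[block_id] = bytearray(self.backing[start:start + BLOCK_SIZE])
        buf = self.shared[block_id]
        self.fast[slot] = (block_id, buf)
        return buf

    def read_byte(self, addr):
        # Translate a flat "virtual" address into (cached block, offset within block).
        buf = self._translate(addr // BLOCK_SIZE)
        return buf[addr % BLOCK_SIZE]
```

Sequential or repeated accesses within one block hit the fast table, so only the first access to each block pays for the shared lookup or the load from storage.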

Citations: 0

A Universal SMR-aware Cache Framework with Deep Optimization for DM-SMR and HM-SMR Disks

IF 1.7 | Zone 3, Computer Science | Q3 | COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

ACM Transactions on Storage

Pub Date : 2023-03-21 DOI: 10.1145/3588442

Diansen Sun, Ruixiong Tan, Yunpeng Chai

To satisfy the enormous storage capacities required for big data, data centers have been adopting high-density shingled magnetic recording (SMR) disks. However, the weak fine-grained random write performance of SMR disks caused by their inherent write amplification and unbalanced read–write performance poses a severe challenge. Many studies have proposed solid-state drive (SSD) cache systems to address this issue. However, existing cache algorithms, such as the least recently used (LRU) algorithm, which is used to optimize cache popularity, and the MOST algorithm, which is used to optimize the write amplification factor, cannot exploit the full performance of the proposed cache systems because of their inappropriate optimization objectives. This article proposes a new SMR-aware cache framework called SAC+ to improve SMR-based hybrid storage. SAC+ integrates the two dominant types of SMR drives—namely, drive-managed and host-managed SMR drives—and provides a universal framework implementation. In addition, SAC+ integrally combines the drive characteristics to optimize I/O performance. The results of evaluations conducted using real-world traces indicate that SAC+ reduces the I/O time by 36–93% compared with state-of-the-art algorithms.
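To make the tension between cache popularity and write amplification concrete, here is a toy replay model. It is neither MOST nor SAC+; it assumes that writing back any evicted block costs one full band rewrite on the SMR disk, and it compares plain LRU eviction with a variant that flushes every cached block of the victim's band in the same rewrite. `BLOCKS_PER_BAND` and `band_rewrites` are hypothetical names.

```python
from collections import OrderedDict

BLOCKS_PER_BAND = 8  # assumed band size in blocks (hypothetical)


def band_rewrites(writes, cache_blocks, flush_whole_band):
    """Replay block writes through an LRU write-back cache.

    Returns the number of band rewrites, assuming each eviction
    triggers one read-modify-write of the victim's band.
    """
    cache = OrderedDict()  # block id -> None, ordered from LRU to MRU
    rewrites = 0
    for block in writes:
        if block in cache:
            cache.move_to_end(block)  # refresh recency on a hit
            continue
        if len(cache) >= cache_blocks:
            victim, _ = cache.popitem(last=False)  # least recently used block
            rewrites += 1
            if flush_whole_band:
                band = victim // BLOCKS_PER_BAND
                # Write back every cached block of the same band in this rewrite.
                for other in [b for b in cache if b // BLOCKS_PER_BAND == band]:
                    del cache[other]
        cache[block] = None
    return rewrites
```

For the write sequence `[0, 1, 2, 3, 8, 16]` with a four-block cache, plain LRU performs two band rewrites in this model, while the band-flushing variant performs one.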


Citation: Diansen Sun, Ruixiong Tan, Yunpeng Chai. A Universal SMR-aware Cache Framework with Deep Optimization for DM-SMR and HM-SMR Disks. ACM Transactions on Storage, vol. 19, no. 1, pp. 1-35. Published 2023-03-21. DOI: 10.1145/3588442

Citations: 0