Pub Date: 2023-06-19. DOI: https://dl.acm.org/doi/10.1145/3588442
Diansen Sun, Ruixiong Tan, Yunpeng Chai
To satisfy the enormous storage capacities required for big data, data centers have been adopting high-density shingled magnetic recording (SMR) disks. However, the weak fine-grained random write performance of SMR disks caused by their inherent write amplification and unbalanced read–write performance poses a severe challenge. Many studies have proposed solid-state drive (SSD) cache systems to address this issue. However, existing cache algorithms, such as the least recently used (LRU) algorithm, which is used to optimize cache popularity, and the MOST algorithm, which is used to optimize the write amplification factor, cannot exploit the full performance of the proposed cache systems because of their inappropriate optimization objectives. This article proposes a new SMR-aware cache framework called SAC+ to improve SMR-based hybrid storage. SAC+ integrates the two dominant types of SMR drives—namely, drive-managed and host-managed SMR drives—and provides a universal framework implementation. In addition, SAC+ integrally combines the drive characteristics to optimize I/O performance. The results of evaluations conducted using real-world traces indicate that SAC+ reduces the I/O time by 36–93% compared with state-of-the-art algorithms.
{"title":"A Universal SMR-aware Cache Framework with Deep Optimization for DM-SMR and HM-SMR Disks","authors":"Diansen Sun, Ruixiong Tan, Yunpeng Chai","doi":"https://dl.acm.org/doi/10.1145/3588442","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3588442","url":null,"abstract":"<p>To satisfy the enormous storage capacities required for big data, data centers have been adopting high-density shingled magnetic recording (SMR) disks. However, the weak fine-grained random write performance of SMR disks caused by their inherent write amplification and unbalanced read–write performance poses a severe challenge. Many studies have proposed solid-state drive (SSD) cache systems to address this issue. However, existing cache algorithms, such as the least recently used (LRU) algorithm, which is used to optimize cache popularity, and the MOST algorithm, which is used to optimize the write amplification factor, cannot exploit the full performance of the proposed cache systems because of their inappropriate optimization objectives. This article proposes a new SMR-aware cache framework called SAC+ to improve SMR-based hybrid storage. SAC+ integrates the two dominant types of SMR drives—namely, drive-managed and host-managed SMR drives—and provides a universal framework implementation. In addition, SAC+ integrally combines the drive characteristics to optimize I/O performance. The results of evaluations conducted using real-world traces indicate that SAC+ reduces the I/O time by 36–93% compared with state-of-the-art algorithms.</p>","PeriodicalId":49113,"journal":{"name":"ACM Transactions on Storage","volume":"2011 2","pages":""},"PeriodicalIF":1.7,"publicationDate":"2023-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138526362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hardware is often required to support fast search in high-throughput applications, so the performance of search algorithms is limited by storage bandwidth and the search algorithm must be optimized accordingly. We propose a CostCounter (CC) algorithm based on cuckoo hashing and an Improved CostCounter (ICC) algorithm. By using a cost counter to record the kick-out history, a better displacement path can be selected when collisions occur. Our simulation results indicate that the CC and ICC algorithms achieve more significant performance improvements than Random Walk (RW), Breadth First Search (BFS), and MinCounter (MC). With two buckets and two slots per bucket, at 95% of the maximum memory load rate, CC and ICC improve read–write performance by more than 20% and 80% relative to MC and BFS, respectively. Furthermore, the CC and ICC algorithms achieve a slight improvement in storage efficiency compared with MC. In addition, we implement RW, MC, and the proposed algorithms using fine-grained locking to support a high throughput rate. Tests on field-programmable gate arrays confirm the simulation results: at 95% of the memory capacity, our algorithms improve the maximum throughput by over 23% compared to RW and 9% compared to MC. The results indicate that the CC and ICC algorithms achieve better hardware bandwidth and memory load efficiency without incurring a significant resource cost.
{"title":"CostCounter: A Better Method for Collision Mitigation in Cuckoo Hashing","authors":"Haonan Wu, Shuxian Wang, Zhanfeng Jin, Yuhang Zhang, Ruyun Ma, Sijin Fan, Ruili Chao","doi":"10.1145/3596910","DOIUrl":"https://doi.org/10.1145/3596910","url":null,"abstract":"Hardware is often required to support fast search and high-throughput applications. Consequently, the performance of search algorithms is limited by storage bandwidth. Hence, the search algorithm must be optimized accordingly. We propose a CostCounter (CC) algorithm based on cuckoo hashing and an Improved CostCounter (ICC) algorithm. A better path can be selected when collisions occur using a cost counter to record the kick-out situation. Our simulation results indicate that the CC and ICC algorithms can achieve more significant performance improvements than Random Walk (RW), Breadth First Search (BFS), and MinCounter (MC). With two buckets and two slots per bucket, under the 95% memory load rate of the maximum load rate, CC and ICC are optimized on read-write times over 20% and 80% compared to MC and BFS, respectively. Furthermore, the CC and ICC algorithms achieve a slight improvement in storage efficiency compared with MC. In addition, we implement RW, MC, and the proposed algorithms using fine-grained locking to support a high throughput rate. From the test on field programmable gate arrays, we verify the simulation results and our algorithms optimize the maximum throughput over 23% compared to RW and 9% compared to MC under 95% of the memory capacity. The test results indicate that our CC and ICC algorithms can achieve better performance in terms of hardware bandwidth and memory load efficiency without incurring a significant resource cost.","PeriodicalId":49113,"journal":{"name":"ACM Transactions on Storage","volume":"19 1","pages":"1 - 24"},"PeriodicalIF":1.7,"publicationDate":"2023-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47925919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The migration of traditional desktop and server applications to the cloud brings challenges of high performance, high reliability, and low cost to the underlying cloud storage. To satisfy these requirements, this paper proposes a hybrid cloud-scale block storage system called Ursa. Trace analysis shows that the I/O patterns served by block storage have only limited locality to exploit. Therefore, instead of using SSDs as a cache layer, Ursa adopts an SSD-HDD-hybrid storage structure that directly stores primary replicas on SSDs and replicates backup replicas on HDDs. At the core of Ursa’s hybrid storage design is an adaptive journal that bridges the performance gap between primary SSDs and backup HDDs for random writes by transforming small backup writes into journal appends, which are then asynchronously replayed and merged onto the backup HDDs. To efficiently index the journal, we design a novel range-optimized merge-tree (ROMT) structure that combines a continuous range of keys into a single composite key {offset,length}. Ursa integrates the hybrid structure with designs for high reliability, scalability, and availability. Experiments show that Ursa in its hybrid mode achieves almost the same performance as in its SSD-only mode (storing all replicas on SSDs), and outperforms other block stores (Ceph and Sheepdog) even in their SSD-only mode while achieving much higher CPU efficiency (IOPS and throughput per core).
{"title":"Hybrid Block Storage for Efficient Cloud Volume Service","authors":"Yiming Zhang, Huiba Li, Shengyun Liu, Peng Huang","doi":"10.1145/3596446","DOIUrl":"https://doi.org/10.1145/3596446","url":null,"abstract":"The migration of traditional desktop and server applications to the cloud brings challenge of high performance, high reliability and low cost to the underlying cloud storage. To satisfy the requirement, this paper proposes a hybrid cloud-scale block storage system called Ursa. Trace analysis shows that the I/O patterns served by block storage have only limited locality to exploit. Therefore, instead of using SSDs as a cache layer, Ursa proposes an SSD-HDD-hybrid storage structure that directly stores primary replicas on SSDs and replicates backup replicas on HDDs. At the core of Ursa’s hybrid storage design is an adaptive journal that can bridge the performance gap between primary SSDs and backup HDDs for random writes, by transforming small backup writes into journal appends which are then asynchronously replayed and merged to backup HDDs. To efficiently index the journal, we design a novel range-optimized merge-tree (ROMT) structure that combines a continuous range of keys into a single composite key {offset,length}. Ursa integrates the hybrid structure with designs for high reliability, scalability, and availability. Experiments show that Ursa in its hybrid mode achieves almost the same performance as in its SSD-only mode (storing all replicas on SSDs), and outperforms other block stores (Ceph and Sheepdog) even in their SSD-only mode while achieving much higher CPU efficiency (IOPS and throughput per core).","PeriodicalId":49113,"journal":{"name":"ACM Transactions on Storage","volume":" ","pages":""},"PeriodicalIF":1.7,"publicationDate":"2023-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45670248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Andrzej Jackowski, Leszek Gryz, Michal Welnicki, C. Dubnicki, K. Iwanicki
Data arrangement determines the capacity, resilience, and performance of a distributed storage system. A scalable self-managed system must place its data efficiently not only during stable operation but also after an expansion, planned downscaling, or device failures. In this article, we present Derrick, a data balancing algorithm addressing these needs, which has been developed for HYDRAstor, a highly scalable commercial storage system. Derrick makes its decisions quickly in case of failures but takes additional time to find a nearly optimal data arrangement and a plan for reaching it when the device population changes. Compared to balancing algorithms in two other state-of-the-art systems, Derrick provides better capacity utilization, reduced data movement, and improved performance. Moreover, it can be easily adapted to meet custom placement requirements.
{"title":"Derrick: A Three-layer Balancer for Self-managed Continuous Scalability","authors":"Andrzej Jackowski, Leszek Gryz, Michal Welnicki, C. Dubnicki, K. Iwanicki","doi":"10.1145/3594543","DOIUrl":"https://doi.org/10.1145/3594543","url":null,"abstract":"Data arrangement determines the capacity, resilience, and performance of a distributed storage system. A scalable self-managed system must place its data efficiently not only during stable operation but also after an expansion, planned downscaling, or device failures. In this article, we present Derrick, a data balancing algorithm addressing these needs, which has been developed for HYDRAstor, a highly scalable commercial storage system. Derrick makes its decisions quickly in case of failures but takes additional time to find a nearly optimal data arrangement and a plan for reaching it when the device population changes. Compared to balancing algorithms in two other state-of-the-art systems, Derrick provides better capacity utilization, reduced data movement, and improved performance. Moreover, it can be easily adapted to meet custom placement requirements.","PeriodicalId":49113,"journal":{"name":"ACM Transactions on Storage","volume":"19 1","pages":"1 - 34"},"PeriodicalIF":1.7,"publicationDate":"2023-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43674372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-04-08. DOI: https://dl.acm.org/doi/10.1145/3582557
Jiri Schindler, Noa Zilberman
The USENIX Annual Technical Conference (ATC) publishes current computer systems research across system disciplines including networking, storage, security, operating systems, databases, and machine learning. This special section of the ACM Transactions on Storage presents some highlights from the storage-related papers published in the USENIX ATC in 2022. A large proportion of ATC papers have traditionally been related to storage. ATC ’22 has continued this trend. Out of 393 submissions, the authors tagged 124 (32%) with one or more topic labels related to Storage, File Systems, Key-Value Stores, and Data Management Systems. The conference accepted 14 storage-related works (22% of all published submissions).

We selected three storage papers. They have been expanded since their publication and re-reviewed by several of their original ATC ’22 reviewers. Collectively, they represent the mission of the USENIX organization: to bring together researchers from academia and systems practitioners working on production systems and/or large installations of cloud services providers. The ATC complements other USENIX venues, including the premier research conference on Operating Systems Design and Implementation (OSDI) as well as the storage- and networked-systems-focused conferences on File and Storage Technologies (FAST) and Networked Systems Design and Implementation (USENIX NSDI), respectively. We are pleased to present these papers representing this cross section in their expanded form.

The Realizing Strong Determinism Contract on Log-Structured Merge Key-Value Stores paper advocates for a hardware and software co-designed framework that advances the state of the art of a widely used persistent data structure, the log-structured merge tree, for NVMe SSDs. The ZNSwap: un-Block your Swap paper presents a new approach for Zoned Namespace SSDs that significantly improves the performance of Linux memory swap on SSD devices. Finally, the CacheSack: Theory and Experience of Google’s Admission Optimization for Datacenter Flash Caches paper, submitted to the ATC Operational Systems Track, describes the design of using Flash caches to lower I/O access latency, drawing on years of research and experience of the authors. We hope that you will find new insights into the complex world of storage by reading them.
{"title":"Introduction to the Special Section on USENIX ATC 2022","authors":"J. Schindler, Noa Zilberman","doi":"10.1145/3582557","DOIUrl":"https://doi.org/10.1145/3582557","url":null,"abstract":"The USENIX Annual Technical Conference (ATC) publishes current computer systems research across system disciplines including networking, storage, security, operating systems, databases, and machine learning. This special section of the ACM Transactions on Storage presents some highlights from the storage-related papers published in the USENIX ATC in 2022. A large proportion of ATC papers have traditionally been related to storage. ATC ’22 has continued this trend. Out of 393 submissions, the authors tagged 124 (32%) with one or more topic labels related to Storage, File Systems, Key-Value Stores, and Data Management Systems. The conference accepted 14 storage-related works (22% of all published submissions). We selected three storage papers. They have been expanded since their publication and rereviewed by several of their original ATC ’22 reviewers. Collectively, they represent the mission of the USENIX organization: to bring together researchers from academia and systems practitioners working on production systems and/or large installations of cloud services providers. The ATC complements other USENIX venues including the premier research conference on Operating Systems Design and Implementation (OSDI) as well as storageand networked-systems-focused conferences of File and Storage Technologies (FAST) and Networked Systems Design and Implementation (USENIX NSDI), respectively. We are pleased to present these papers representing this cross section in their expanded form. The Realizing Strong Determinism Contract on Log-Structured Merge Key-Value Stores paper advocates for a hardware and software co-designed framework that advances the state-of-the-art of a widely used persistent data structure of log-structured merge trees for NVMe SSDs. The ZNSwap: un-Block your Swap paper presents a new approach for Zoned Namespace SSDs that significantly improves the performance of Linux memory swap on SSD devices. Finally, the CacheSack: Theory and Experience of Google’s Admission Optimization for Datacenter Flash Caches paper, submitted to the ATC Operational Systems Track, describes the design of using Flash caches to lower I/O access latency, drawing on years of research and experiences of the authors. We hope that you will find new insights into the complex world of storage by reading them.","PeriodicalId":49113,"journal":{"name":"ACM Transactions on Storage","volume":" ","pages":"1 - 1"},"PeriodicalIF":1.7,"publicationDate":"2023-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42438967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We propose Vigil-KV, a hardware and software co-designed framework that eliminates long-tail latency almost perfectly by introducing strong latency determinism. To make Get latency deterministic, Vigil-KV first enables a predictable latency mode (PLM) interface on a real datacenter-scale NVMe SSD, drawing on knowledge of the underlying flash technologies. At the system level, Vigil-KV then hides the non-deterministic time window (associated with the SSD’s internal tasks and/or write services) by internally scheduling the different PLM device states across multiple physical functions. Vigil-KV further schedules compaction/flush operations and client requests with awareness of PLM’s restrictions, thereby integrating strong latency determinism into LSM KVs. We implement Vigil-KV on a 1.92 TB NVMe SSD prototype and Linux 4.19.91, but other LSM KVs can adopt its concept. We evaluate diverse Facebook and Yahoo scenarios with Vigil-KV, and the results show that Vigil-KV can reduce the tail latency of a baseline KV system by 3.19× while reducing the average latency by 34%, on average.
{"title":"Realizing Strong Determinism Contract on Log-Structured Merge Key-Value Stores","authors":"Miryeong Kwon, Seungjun Lee, Hyunkyu Choi, Jooyoung Hwang, Myoungsoo Jung","doi":"https://dl.acm.org/doi/10.1145/3582695","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3582695","url":null,"abstract":"<p>We propose <i>Vigil-KV</i>, a hardware and software co-designed framework that eliminates long-tail latency almost perfectly by introducing strong latency determinism. To make Get latency deterministic, Vigil-KV first enables a predictable latency mode (PLM) interface on a real datacenter-scale NVMe SSD, having knowledge about the nature of the underlying flash technologies. Vigil-KV at the system-level then hides the non-deterministic time window (associated with SSD’s internal tasks and/or write services) by internally scheduling the different device states of PLM across multiple physical functions. Vigil-KV further schedules compaction/flush operations and client requests being aware of PLM’s restrictions thereby integrating strong latency determinism into LSM KVs. We implement Vigil-KV upon a 1.92TB NVMe SSD prototype and Linux 4.19.91, but other LSM KVs can adopt its concept. We evaluate diverse Facebook and Yahoo scenarios with Vigil-KV, and the results show that Vigil-KV can reducethe tail latency of a baseline KV system by 3.19× while reducing the average latency by 34%, on average.</p>","PeriodicalId":49113,"journal":{"name":"ACM Transactions on Storage","volume":"20 8","pages":""},"PeriodicalIF":1.7,"publicationDate":"2023-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138526309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Out-of-core systems rely on high-performance cache sub-systems to reduce the number of I/O operations. Although the page cache in modern operating systems enables transparent access to memory and storage devices, it suffers from efficiency and scalability issues on cache misses, forcing out-of-core systems to design and implement their own cache components, which is a non-trivial task.
This study proposes TriCache, a cache mechanism that enables in-memory programs to efficiently process out-of-core datasets without requiring any code rewrite. It provides a virtual memory interface on top of the conventional block interface to simultaneously achieve user transparency and sufficient out-of-core performance. A multi-level block cache design is proposed to address the challenge of per-access address translations required by a memory interface. It can exploit spatial and temporal localities in memory or storage accesses to render storage-to-memory address translation and page-level concurrency control adequately efficient for the virtual memory interface.
Our evaluation shows that in-memory systems operating on top of TriCache can outperform Linux OS page cache by more than one order of magnitude, and can deliver performance comparable to or even better than that of corresponding counterparts designed specifically for out-of-core scenarios.
{"title":"TriCache: A User-Transparent Block Cache Enabling High-Performance Out-of-Core Processing with In-Memory Programs","authors":"Guanyu Feng, Huanqi Cao, Xiaowei Zhu, Bowen Yu, Yuanwei Wang, Zixuan Ma, Shengqi Chen, Wenguang Chen","doi":"https://dl.acm.org/doi/10.1145/3583139","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3583139","url":null,"abstract":"<p>Out-of-core systems rely on high-performance cache sub-systems to reduce the number of I/O operations. Although the page cache in modern operating systems enables transparent access to memory and storage devices, it suffers from efficiency and scalability issues on cache misses, forcing out-of-core systems to design and implement their own cache components, which is a non-trivial task.</p><p>This study proposes TriCache, a cache mechanism that enables in-memory programs to efficiently process out-of-core datasets without requiring any code rewrite. It provides a virtual memory interface on top of the conventional block interface to simultaneously achieve user transparency and sufficient out-of-core performance. A multi-level block cache design is proposed to address the challenge of per-access address translations required by a memory interface. It can exploit spatial and temporal localities in memory or storage accesses to render storage-to-memory address translation and page-level concurrency control adequately efficient for the virtual memory interface.</p><p>Our evaluation shows that in-memory systems operating on top of TriCache can outperform Linux OS page cache by more than one order of magnitude, and can deliver performance comparable to or even better than that of corresponding counterparts designed specifically for out-of-core scenarios.</p>","PeriodicalId":49113,"journal":{"name":"ACM Transactions on Storage","volume":"12 6","pages":""},"PeriodicalIF":1.7,"publicationDate":"2023-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138526287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}