Pub Date: 2023-07-07 DOI: https://dl.acm.org/doi/10.1145/3607536
Suzhen Wu, Zhanhong Tu, Yuxuan Zhou, Zuocheng Wang, Zhirong Shen, Wei Chen, Wei Wang, Weichun Wang, Bo Mao
More and more data are stored in cloud storage, which brings two major challenges. First, modified files in the cloud should be synchronized quickly to ensure data consistency; e.g., delta synchronization (sync) achieves efficient cloud sync by synchronizing only the updated part of a file. Second, the huge volume of data in the cloud needs to be deduplicated and encrypted; e.g., Message-Locked Encryption (MLE) enables deduplication of encrypted data across different users by deriving the encryption key from the content itself. However, when the two are combined, a few updates to the content can cause large sync traffic amplification for both keys and ciphertext in MLE-based cloud storage, significantly degrading cloud sync efficiency. A feature-based encryption sync scheme, FeatureSync, has been proposed to address this delta amplification problem. However, as network bandwidth continues to improve, the performance of FeatureSync stagnates. In our preliminary experimental evaluations, we find that computational overhead is the main bottleneck of FeatureSync in high-bandwidth network environments. In this paper, we propose an enhanced feature-based encryption sync scheme, FASTSync, to optimize the performance of FeatureSync in high-bandwidth network environments. Performance evaluations on a lightweight prototype implementation show that FASTSync reduces the cloud sync time by 70.3% and the encryption time by 37.3% on average, compared with FeatureSync.
{"title":"FASTSync: a FAST Delta Sync Scheme for Encrypted Cloud Storage in High-Bandwidth Network Environments","authors":"Suzhen Wu, Zhanhong Tu, Yuxuan Zhou, Zuocheng Wang, Zhirong Shen, Wei Chen, Wei Wang, Weichun Wang, Bo Mao","doi":"https://dl.acm.org/doi/10.1145/3607536","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3607536","url":null,"abstract":"<p>More and more data are stored in cloud storage which brings two major challenges. First, the modified files in the cloud should be quickly synchronized to ensure data consistency, e.g., delta synchronization (sync) achieves efficient cloud sync by synchronizing only the updated part of the file. Second, the huge data in the cloud needs to be deduplicated and encrypted, e.g., Message-Locked Encryption (MLE) implements data deduplication by encrypting the content among different users. However, when combined, a few updates in the content can cause large sync traffic amplification for both keys and ciphertext in the MLE-based cloud storage, significantly degrading the cloud sync efficiency. A feature-based encryption sync scheme, FeatureSync, is proposed to address the delta amplification problem. However, with further improvement of the network bandwidth, the performance of FeatureSync stagnates. In our preliminary experimental evaluations, we find that the bottleneck of the computational overhead in the high-bandwidth network environments is the main bottleneck in FeatureSync. In this paper, we propose an enhanced feature-based encryption sync scheme FASTSync to optimize the performance of FeatureSync in high-bandwidth network environments. 
The performance evaluations on a lightweight prototype implementation of FASTSync show that FASTSync reduces the cloud sync time by 70.3% and the encryption time by 37.3% on average, compared with FeatureSync.</p>","PeriodicalId":49113,"journal":{"name":"ACM Transactions on Storage","volume":"46 4","pages":""},"PeriodicalIF":1.7,"publicationDate":"2023-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138512818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
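The delta amplification problem described in this abstract can be illustrated with a small sketch. It assumes a convergent-encryption style of MLE (key = hash of the chunk) and swaps in a toy XOR keystream cipher built from SHA-256 so that it runs with the standard library only; the function names, the 4 KiB chunk size, and the cipher are illustrative assumptions, not the construction used by FeatureSync or FASTSync, and the toy cipher is not secure.

```python
import hashlib


def mle_key(chunk: bytes) -> bytes:
    """Derive the key from the content itself (convergent-encryption style)."""
    return hashlib.sha256(chunk).digest()


def toy_encrypt(key: bytes, chunk: bytes) -> bytes:
    """XOR the chunk with a SHA-256 counter keystream (toy cipher, NOT secure)."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(chunk):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    # zip() stops at the chunk length, so surplus keystream bytes are ignored.
    return bytes(a ^ b for a, b in zip(chunk, stream))


def mle_encrypt(chunk: bytes):
    """Return (key, ciphertext) for one chunk."""
    key = mle_key(chunk)
    return key, toy_encrypt(key, chunk)


def differing_bytes(a: bytes, b: bytes) -> int:
    """Count the byte positions at which two equal-length buffers differ."""
    return sum(x != y for x, y in zip(a, b))


original = bytes(range(256)) * 16            # one hypothetical 4 KiB chunk
edited = bytearray(original)
edited[100] ^= 0x01                          # flip a single bit
updated = bytes(edited)

key_a, ct_a = mle_encrypt(original)
key_b, ct_b = mle_encrypt(updated)
key_c, ct_c = mle_encrypt(original)          # a second user with identical content

plain_delta = differing_bytes(original, updated)
cipher_delta = differing_bytes(ct_a, ct_b)
print("identical content deduplicates:", ct_a == ct_c)
print("plaintext bytes changed:", plain_delta)
print("ciphertext bytes changed:", cipher_delta)
```

Identical content yields the same key and ciphertext, which is what makes deduplication possible, while a one-bit edit changes the key and nearly every ciphertext byte of the chunk, so a plain delta sync over ciphertext would have to transfer the whole chunk again.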
Pub Date: 2023-07-05 DOI: https://dl.acm.org/doi/10.1145/3607255
Qiuyun Tong, Xinghua Li, Yinbin Miao, Yunwei Wang, Ximeng Liu, Robert H. Deng
Symmetric Searchable Encryption (SSE), as an ideal primitive, can ensure data privacy while supporting retrieval over encrypted data. However, existing multi-user SSE schemes require the data owner either to share the secret key with all query users or to stay online at all times to generate search tokens. While there are some solutions to this problem, each has at least one weakness, such as lack of support for conjunctive queries, reliance on the data owner to help decrypt results, or unauthorized access. To solve these issues, we propose an Owner-free Distributed Symmetric searchable encryption scheme supporting Conjunctive queries (ODiSC). Specifically, we first evaluate a Learning-Parity-with-Noise weak Pseudorandom Function (LPN-wPRF) in a dual-cloud architecture to generate search tokens, so that the data owner neither shares the key nor needs to be online. Then, we provide fine-grained conjunctive queries in the distributed architecture using additive secret sharing and symmetric-key hidden vector encryption. Finally, formal security analysis and empirical performance evaluation demonstrate that ODiSC is adaptively simulation-secure and efficient.
{"title":"Owner-Free Distributed Symmetric Searchable Encryption Supporting Conjunctive Queries","authors":"Qiuyun Tong, Xinghua Li, Yinbin Miao, Yunwei Wang, Ximeng Liu, Robert H. Deng","doi":"https://dl.acm.org/doi/10.1145/3607255","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3607255","url":null,"abstract":"<p>Symmetric Searchable Encryption (SSE), as an ideal primitive, can ensure data privacy while supporting retrieval over encrypted data. However, existing multi-user SSE schemes require the data owner to share the secret key with all query users or always be online to generate search tokens. While there are some solutions to this problem, they have at least one weakness, such as non-supporting conjunctive query, result decryption assistance of the data owner, and unauthorized access. To solve the above issues, we propose an <underline>O</underline>wner-free <underline>Di</underline>stributed <underline>S</underline>ymmetric searchable encryption supporting <underline>C</underline>onjunctive query (ODiSC). Specifically, we first evaluate Learning-Parity-with-Noise weak Pseudorandom Function (LPN-wPRF) in dual-cloud architecture to generate search tokens with the data owner free from sharing key and being online. Then, we provide fine-grained conjunctive query in the distributed architecture using additive secret sharing and symmetric-key hidden vector encryption. 
Finally, formal security analysis and empirical performance evaluation demonstrate that ODiSC is adaptively simulation-secure and efficient.</p>","PeriodicalId":49113,"journal":{"name":"ACM Transactions on Storage","volume":"74 11","pages":""},"PeriodicalIF":1.7,"publicationDate":"2023-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138512867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hardware is often required to support fast search and high-throughput applications. Consequently, the performance of search algorithms is limited by storage bandwidth, and the search algorithm must be optimized accordingly. We propose a CostCounter (CC) algorithm based on cuckoo hashing and an Improved CostCounter (ICC) algorithm. By using a cost counter to record kick-outs, a better path can be selected when collisions occur. Our simulation results indicate that the CC and ICC algorithms achieve more significant performance improvements than Random Walk (RW), Breadth First Search (BFS), and MinCounter (MC). With two buckets and two slots per bucket, at a memory load of 95% of the maximum load rate, CC and ICC improve on read-write times by over 20% and 80% compared to MC and BFS, respectively. Furthermore, the CC and ICC algorithms achieve a slight improvement in storage efficiency compared with MC. In addition, we implement RW, MC, and the proposed algorithms using fine-grained locking to support a high throughput rate. Tests on field programmable gate arrays verify the simulation results: our algorithms improve the maximum throughput by over 23% compared to RW and by 9% compared to MC at 95% of the memory capacity. The test results indicate that our CC and ICC algorithms can achieve better performance in terms of hardware bandwidth and memory load efficiency without incurring a significant resource cost.
{"title":"CostCounter: A Better Method for Collision Mitigation in Cuckoo Hashing","authors":"Haonan Wu, Shuxian Wang, Zhanfeng Jin, Yuhang Zhang, Ruyun Ma, Sijin Fan, Ruili Chao","doi":"https://dl.acm.org/doi/10.1145/3596910","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3596910","url":null,"abstract":"<p>Hardware is often required to support fast search and high-throughput applications. Consequently, the performance of search algorithms is limited by storage bandwidth. Hence, the search algorithm must be optimized accordingly. We propose a CostCounter (CC) algorithm based on cuckoo hashing and an Improved CostCounter (ICC) algorithm. A better path can be selected when collisions occur using a cost counter to record the kick-out situation. Our simulation results indicate that the CC and ICC algorithms can achieve more significant performance improvements than Random Walk (RW), Breadth First Search (BFS), and MinCounter (MC). With two buckets and two slots per bucket, under the 95% memory load rate of the maximum load rate, CC and ICC are optimized on read-write times over 20% and 80% compared to MC and BFS, respectively. Furthermore, the CC and ICC algorithms achieve a slight improvement in storage efficiency compared with MC. In addition, we implement RW, MC, and the proposed algorithms using fine-grained locking to support a high throughput rate. From the test on field programmable gate arrays, we verify the simulation results and our algorithms optimize the maximum throughput over 23% compared to RW and 9% compared to MC under 95% of the memory capacity. 
The test results indicate that our CC and ICC algorithms can achieve better performance in terms of hardware bandwidth and memory load efficiency without incurring a significant resource cost.</p>","PeriodicalId":49113,"journal":{"name":"ACM Transactions on Storage","volume":"31 7","pages":""},"PeriodicalIF":1.7,"publicationDate":"2023-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138526294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
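To make the idea of counter-guided collision handling in cuckoo hashing concrete, here is a minimal sketch with two candidate buckets per key and two slots per bucket. The abstract does not give the exact cost rule of CC or ICC, so the rule used here (evict the occupant of the candidate slot with the fewest recorded kick-outs) is an assumption chosen for illustration; the class and method names are hypothetical as well.

```python
import hashlib


class CounterCuckooTable:
    """Cuckoo hash set with per-slot kick-out counters (illustrative only)."""

    SLOTS_PER_BUCKET = 2

    def __init__(self, num_buckets, max_kicks=500):
        self.num_buckets = num_buckets
        self.max_kicks = max_kicks
        self.slots = [[None] * self.SLOTS_PER_BUCKET for _ in range(num_buckets)]
        self.kick_counts = [[0] * self.SLOTS_PER_BUCKET for _ in range(num_buckets)]

    def _candidate_buckets(self, key):
        # Two bucket indexes taken from disjoint parts of one SHA-256 digest.
        digest = hashlib.sha256(repr(key).encode()).digest()
        first = int.from_bytes(digest[:8], "big") % self.num_buckets
        second = int.from_bytes(digest[8:16], "big") % self.num_buckets
        return first, second

    def contains(self, key):
        return any(key in self.slots[b] for b in self._candidate_buckets(key))

    def insert(self, key):
        """Insert a non-None key; return False if the kick budget runs out."""
        if self.contains(key):
            return True
        for _ in range(self.max_kicks + 1):
            candidates = [(b, s)
                          for b in self._candidate_buckets(key)
                          for s in range(self.SLOTS_PER_BUCKET)]
            for b, s in candidates:
                if self.slots[b][s] is None:
                    self.slots[b][s] = key
                    return True
            # Collision: every candidate slot is taken. Evict from the slot
            # that has been kicked least often (assumed cost rule).
            b, s = min(candidates, key=lambda pos: self.kick_counts[pos[0]][pos[1]])
            self.kick_counts[b][s] += 1
            key, self.slots[b][s] = self.slots[b][s], key
        # Budget exhausted: the item currently held in `key` is left unplaced,
        # and it may be an earlier key rather than the one passed in.
        return False


table = CounterCuckooTable(num_buckets=256)
results = [table.insert(f"key-{i}") for i in range(100)]
print("all inserted:", all(results))
```

A real design would rehash or grow the table when `insert` returns False; the sketch only shows where a counter enters the eviction decision.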
Pub Date: 2023-06-19 DOI: https://dl.acm.org/doi/10.1145/3580281
Jiaxin Li, Yiming Zhang, Shan Lu, Haryadi S. Gunawi, Xiaohui Gu, Feng Huang, Dongsheng Li
This article systematically studies 99 distributed performance bugs from five widely deployed distributed storage and computing systems (Cassandra, HBase, HDFS, Hadoop MapReduce, and ZooKeeper). We present the TaxPerf database, which collectively organizes the analysis results as over 400 classification labels and over 2,500 lines of bug re-description. TaxPerf classifies the bugs into six categories (and 18 subcategories) by their root causes: resource, blocking, synchronization, optimization, configuration, and logic. TaxPerf can be used as a benchmark for performance bug studies and debug tool designs. Although it is impractical to automatically detect all categories of performance bugs in TaxPerf, we find that an important category of blocking bugs can be effectively addressed by analysis tools. We analyze the cascading nature of blocking bugs and design an automatic detection tool called PCatch, which (i) performs program analysis to identify code regions whose execution time can potentially increase dramatically with the workload size; (ii) adapts the traditional happens-before model to reason about software resource contention and performance dependency relationships; and (iii) uses dynamic tracking to identify whether the slowdown propagation is contained in one job. Evaluation shows that PCatch can accurately detect blocking bugs in representative distributed storage and computing systems by observing system executions under small-scale workloads.
{"title":"Performance Bug Analysis and Detection for Distributed Storage and Computing Systems","authors":"Jiaxin Li, Yiming Zhang, Shan Lu, Haryadi S. Gunawi, Xiaohui Gu, Feng Huang, Dongsheng Li","doi":"https://dl.acm.org/doi/10.1145/3580281","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3580281","url":null,"abstract":"<p>This article systematically studies 99 distributed performance bugs from five widely deployed distributed storage and computing systems (Cassandra, HBase, HDFS, Hadoop MapReduce and ZooKeeper). We present the <i>TaxPerf</i> database, which collectively organizes the analysis results as over 400 classification labels and over 2,500 lines of bug re-description. TaxPerf is classified into six bug categories (and 18 bug subcategories) by their root causes; resource, blocking, synchronization, optimization, configuration, and logic. TaxPerf can be used as a benchmark for performance bug studies and debug tool designs. Although it is impractical to automatically detect all categories of performance bugs in TaxPerf, we find that an important category of blocking bugs can be effectively solved by analysis tools. We analyze the cascading nature of blocking bugs and design an automatic detection tool called <i>PCatch</i>, which (i) performs program analysis to identify code regions whose execution time can potentially increase dramatically with the workload size; (ii) adapts the traditional happens-before model to reason about software resource contention and performance dependency relationship; and (iii) uses dynamic tracking to identify whether the slowdown propagation is contained in one job. 
Evaluation shows that PCatch can accurately detect blocking bugs of representative distributed storage and computing systems by observing system executions under small-scale workloads.</p>","PeriodicalId":49113,"journal":{"name":"ACM Transactions on Storage","volume":"58 11","pages":""},"PeriodicalIF":1.7,"publicationDate":"2023-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138526283","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-06-19 DOI: https://dl.acm.org/doi/10.1145/3594543
Andrzej Jackowski, Leszek Gryz, Michał Wełnicki, Cezary Dubnicki, Konrad Iwanicki
Data arrangement determines the capacity, resilience, and performance of a distributed storage system. A scalable self-managed system must place its data efficiently not only during stable operation but also after an expansion, planned downscaling, or device failures. In this article, we present Derrick, a data balancing algorithm addressing these needs, which has been developed for HYDRAstor, a highly scalable commercial storage system. Derrick makes its decisions quickly in case of failures but takes additional time to find a nearly optimal data arrangement and a plan for reaching it when the device population changes. Compared to balancing algorithms in two other state-of-the-art systems, Derrick provides better capacity utilization, reduced data movement, and improved performance. Moreover, it can be easily adapted to meet custom placement requirements.
{"title":"Derrick: A Three-layer Balancer for Self-managed Continuous Scalability","authors":"Andrzej Jackowski, Leszek Gryz, Michał Wełnicki, Cezary Dubnicki, Konrad Iwanicki","doi":"https://dl.acm.org/doi/10.1145/3594543","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3594543","url":null,"abstract":"<p>Data arrangement determines the capacity, resilience, and performance of a distributed storage system. A scalable self-managed system must place its data efficiently not only during stable operation but also after an expansion, planned downscaling, or device failures. In this article, we present Derrick, a data balancing algorithm addressing these needs, which has been developed for HYDRAstor, a highly scalable commercial storage system. Derrick makes its decisions quickly in case of failures but takes additional time to find a nearly optimal data arrangement and a plan for reaching it when the device population changes. Compared to balancing algorithms in two other state-of-the-art systems, Derrick provides better capacity utilization, reduced data movement, and improved performance. Moreover, it can be easily adapted to meet custom placement requirements.</p>","PeriodicalId":49113,"journal":{"name":"ACM Transactions on Storage","volume":"2006 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2023-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138526295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-06-19 DOI: https://dl.acm.org/doi/10.1145/3582013
Mian Qin, Qing Zheng, Jason Lee, Bradley Settlemyer, Fei Wen, Narasimha Reddy, Paul Gratz
Key–value (KV) software has proven useful to a wide variety of applications including analytics, time-series databases, and distributed file systems. To satisfy the requirements of diverse workloads, KV stores have been carefully tailored to best match the performance characteristics of underlying solid-state block devices. Emerging KV storage devices are a promising technology for both simplifying the KV software stack and improving the performance of persistent storage-based applications. However, while providing fast, predictable put and get operations, existing KV storage devices do not natively support range queries, which are critical to all three types of applications described above.
In this article, we present KVRangeDB, a software layer that enables processing range queries for existing hash-based KV solid-state disks (KVSSDs). In an effort to adapt to the performance characteristics of emerging KVSSDs, KVRangeDB implements a log-structured merge tree key index that reduces compaction I/O, merges keys when possible, and provides separate caches for indexes and values. We evaluated KVRangeDB under a set of representative workloads and compared its performance with two existing database solutions: a RocksDB variant ported to work with the KVSSD, and WiscKey, a key–value database that is carefully tuned for conventional block devices. On filesystem aging workloads, KVRangeDB outperforms WiscKey by 23.7× in terms of throughput and reduces CPU usage and external write amplification by 14.3× and 9.8×, respectively.
{"title":"KVRangeDB: Range Queries for a Hash-based Key–Value Device","authors":"Mian Qin, Qing Zheng, Jason Lee, Bradley Settlemyer, Fei Wen, Narasimha Reddy, Paul Gratz","doi":"https://dl.acm.org/doi/10.1145/3582013","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3582013","url":null,"abstract":"<p>Key–value (KV) software has proven useful to a wide variety of applications including analytics, time-series databases, and distributed file systems. To satisfy the requirements of diverse workloads, KV stores have been carefully tailored to best match the performance characteristics of underlying solid-state block devices. Emerging KV storage device is a promising technology for both simplifying the KV software stack and improving the performance of persistent storage-based applications. However, while providing fast, predictable put and get operations, existing KV storage devices do not natively support range queries that are critical to all three types of applications described above.</p><p>In this article, we present KVRangeDB, a software layer that enables processing range queries for existing hash-based KV solid-state disks (KVSSDs). As an effort to adapt to the performance characteristics of emerging KVSSDs, KVRangeDB implements log-structured merge tree key index that reduces compaction I/O, merges keys when possible, and provides separate caches for indexes and values. We evaluated the KVRangeDB under a set of representative workloads, and compared its performance with two existing database solutions: a Rocksdb variant ported to work with the KVSSD, and Wisckey, a key–value database that is carefully tuned for conventional block devices. 
On filesystem aging workloads, KVRangeDB outperforms Wisckey by 23.7× in terms of throughput and reduce CPU usage and external write amplifications by 14.3× and 9.8×, respectively.</p>","PeriodicalId":49113,"journal":{"name":"ACM Transactions on Storage","volume":"2011 10","pages":""},"PeriodicalIF":1.7,"publicationDate":"2023-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138526293","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
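The core idea in this abstract, layering an ordered key index over a device that only offers hashed put and get, can be sketched in a few lines. The sketch below swaps the log-structured merge tree index described in the abstract for a plain in-memory sorted list, and models the device as a dictionary; `HashKVDevice`, `RangeLayer`, and their methods are hypothetical names, not the KVRangeDB or KVSSD API.

```python
import bisect


class HashKVDevice:
    """Stand-in for a hash-based KV device: point put/get only, no key order."""

    def __init__(self):
        self._data = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: bytes):
        return self._data.get(key)


class RangeLayer:
    """Software layer that keeps keys sorted so range scans become possible."""

    def __init__(self, device: HashKVDevice):
        self.device = device
        self._sorted_keys = []   # simplified key index (assumption: fits in memory)

    def put(self, key: bytes, value: bytes) -> None:
        pos = bisect.bisect_left(self._sorted_keys, key)
        if pos == len(self._sorted_keys) or self._sorted_keys[pos] != key:
            self._sorted_keys.insert(pos, key)   # index only stores keys
        self.device.put(key, value)              # values live on the device

    def get(self, key: bytes):
        return self.device.get(key)

    def range(self, start: bytes, end: bytes):
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._sorted_keys, start)
        hi = bisect.bisect_left(self._sorted_keys, end)
        return [(k, self.device.get(k)) for k in self._sorted_keys[lo:hi]]


db = RangeLayer(HashKVDevice())
for name in (b"delta", b"alpha", b"charlie", b"bravo", b"echo"):
    db.put(name, name.upper())
scan = db.range(b"bravo", b"delta")
print(scan)
```

Keeping only keys in the index and leaving values on the device mirrors the key-value separation the abstract alludes to; a persistent index would additionally need the compaction and caching the abstract mentions.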
Pub Date: 2023-06-19 DOI: https://dl.acm.org/doi/10.1145/3582012
Ming Zhang, Yu Hua, Pengfei Zuo, Lurong Liu
Persistent memory (PM) disaggregation significantly improves resource utilization and failure isolation to build a scalable and cost-effective remote memory pool in modern data centers. However, because the PM pool offers limited computing power and existing distributed transaction schemes, which are designed for legacy DRAM-based monolithic servers, overlook the bandwidth and persistence properties of real PMs, these schemes fail to work efficiently on disaggregated PM. In this article, we propose FORD, a Fast One-sided RDMA-based Distributed transaction system for the new disaggregated PM architecture. FORD thoroughly leverages one-sided remote direct memory access to handle transactions while bypassing the remote CPU in the PM pool. To reduce round trips, FORD batches the read and lock operations into one request to eliminate extra locking and validations for the read-write data. To accelerate transaction commit, FORD updates all remote replicas in a single round trip with parallel undo logging and data visibility control. Moreover, considering the limited PM bandwidth, FORD enables the backup replicas to be read to alleviate the load on the primary replicas, thus improving throughput. To efficiently guarantee remote data persistency in the PM pool, FORD selectively flushes data to the backup replicas to mitigate network overheads. Nevertheless, the original FORD wastes some validation round trips if the read-only data are not modified by other transactions. Hence, we further propose a localized validation scheme that moves the validation operations for read-only data from remote to local as much as possible to reduce round trips. Experimental results demonstrate that FORD significantly improves transaction throughput by up to 3× and decreases latency by up to 87.4% compared with state-of-the-art systems.
{"title":"Localized Validation Accelerates Distributed Transactions on Disaggregated Persistent Memory","authors":"Ming Zhang, Yu Hua, Pengfei Zuo, Lurong Liu","doi":"https://dl.acm.org/doi/10.1145/3582012","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3582012","url":null,"abstract":"<p>Persistent memory (PM) disaggregation significantly improves the resource utilization and failure isolation to build a scalable and cost-effective remote memory pool in modern data centers. However, due to offering limited computing power and overlooking the bandwidth and persistence properties of real PMs, existing distributed transaction schemes, which are designed for legacy DRAM-based monolithic servers, fail to efficiently work on the disaggregated PM. In this article, we propose FORD, a <i>F</i>ast <i>O</i>ne-sided <i>R</i>DMA-based <i>D</i>istributed transaction system for the new disaggregated PM architecture. FORD thoroughly leverages one-sided remote direct memory access to handle transactions for bypassing the remote CPU in the PM pool. To reduce the round trips, FORD batches the read and lock operations into one request to eliminate extra locking and validations for the read-write data. To accelerate the transaction commit, FORD updates all remote replicas in a single round trip with parallel undo logging and data visibility control. Moreover, considering the limited PM bandwidth, FORD enables the backup replicas to be read to alleviate the load on the primary replicas, thus improving the throughput. To efficiently guarantee the remote data persistency in the PM pool, FORD selectively flushes data to the backup replicas to mitigate the network overheads. Nevertheless, the original FORD wastes some validation round trips if the read-only data are not modified by other transactions. 
Hence, we further propose a localized validation scheme to transfer the validation operations for the read-only data from remote to local as much as possible to reduce the round trips. Experimental results demonstrate that FORD significantly improves the transaction throughput by up to 3× and decreases the latency by up to 87.4% compared with state-of-the-art systems.</p>","PeriodicalId":49113,"journal":{"name":"ACM Transactions on Storage","volume":"41 5","pages":""},"PeriodicalIF":1.7,"publicationDate":"2023-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138526304","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-06-19 DOI: https://dl.acm.org/doi/10.1145/3584663
Wen Xia, Lifeng Pu, Xiangyu Zou, Philip Shilane, Shiyi Li, Haijun Zhang, Xuan Wang
Post-deduplication delta compression is a data reduction technique that calculates and stores the differences between very similar but non-duplicate chunks in storage systems, and it can achieve a very high compression ratio. However, the low throughput of widely used resemblance detection approaches (e.g., N-Transform) usually becomes the bottleneck of delta compression systems because they introduce high computational overhead. Generally, this overhead mainly consists of two parts: ① calculating the rolling hash byte by byte across data chunks and ② applying multiple transforms on all of the calculated rolling hash values.
In this article, we propose Odess, a fast and lightweight resemblance detection approach that greatly reduces the computational overhead of resemblance detection while achieving high detection accuracy and a high compression ratio. Odess first utilizes a novel Subwindow-based Parallel Rolling (SWPR) hash method using Single Instruction Multiple Data [1] (SIMD) to accelerate the calculation of rolling hashes (corresponding to the first part of the overhead). Odess then uses a novel Content-Defined Sampling method to generate a much smaller proxy hash set from the whole rolling hash set and quickly applies transforms on this small hash set for resemblance detection (corresponding to the second part of the overhead).
Evaluation results show that during the stage of resemblance detection, the Odess approach is ∼31.4× and ∼7.9× faster than the state-of-the-art N-Transform and Finesse (a recent variant of N-Transform [39]), respectively. When considering an end-to-end data reduction storage system, the Odess-based system’s throughput is about 3.20× and 1.41× higher than the N-Transform- and Finesse-based systems’ throughput, respectively, while maintaining the high compression ratio of N-Transform and achieving ∼1.22× higher compression ratio over Finesse.
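The two-stage idea described in the abstract (compute rolling hashes, then apply the transforms only to a sampled subset of them) can be sketched roughly as follows. This is an illustrative simplification, not the authors' implementation: the Gear-style rolling hash, the sampling mask, the random transform coefficients, and the names `features`, `super_features`, and `resembles` are all assumptions made for this sketch, and the SIMD-based SWPR step is omitted entirely.

```python
import random

MASK32 = 0xFFFFFFFF
_rng = random.Random(42)  # fixed seed so the tables are reproducible

# Hypothetical Gear-style table: one random 32-bit value per byte value.
GEAR = [_rng.getrandbits(32) for _ in range(256)]

# Hypothetical linear transforms (a, c): t = (a * h + c) mod 2^32.
TRANSFORMS = [(_rng.getrandbits(32) | 1, _rng.getrandbits(32)) for _ in range(12)]

# Assumed sampling mask: 4 bits set, so roughly 1 in 16 hashes is sampled.
SAMPLE_MASK = 0x78000000


def features(chunk: bytes) -> list[int]:
    """Return one feature per transform, computed over sampled hashes only."""
    feats = [0] * len(TRANSFORMS)
    h = 0
    for b in chunk:
        # Rolling hash update, one byte at a time.
        h = ((h << 1) + GEAR[b]) & MASK32
        # Content-defined sampling: skip most hash values.
        if h & SAMPLE_MASK:
            continue
        # Apply every transform to the sampled hash and keep the maximum.
        for i, (a, c) in enumerate(TRANSFORMS):
            t = (a * h + c) & MASK32
            if t > feats[i]:
                feats[i] = t
    return feats


def super_features(feats: list[int], group: int = 4) -> list[tuple[int, ...]]:
    """Group consecutive features into super-features."""
    return [tuple(feats[i:i + group]) for i in range(0, len(feats), group)]


def resembles(a: bytes, b: bytes) -> bool:
    """Treat two chunks as similar if any super-feature matches."""
    sa = super_features(features(a))
    sb = super_features(features(b))
    return any(x == y for x, y in zip(sa, sb))
```

In this sketch the inner transform loop runs only for the sampled hashes, which is where the saving on the second part of the overhead would come from. The sampling rate and group size are arbitrary choices here, not values taken from the article.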
Title: The Design of Fast and Lightweight Resemblance Detection for Efficient Post-Deduplication Delta Compression. Journal: ACM Transactions on Storage.
Pub Date: 2023-06-19 DOI: https://dl.acm.org/doi/10.1145/3586576
Zhibing Sha, Jun Li, Fengxiang Zhang, Min Huang, Zhigang Cai, Francois Trahay, Jianwei Liao
Most solid-state drives (SSDs) adopt an on-board Dynamic Random Access Memory (DRAM) to buffer write data, which can significantly reduce the number of write operations committed to the flash array of the SSD if the data exhibits locality in write operations. This article focuses on efficiently managing the small amount of DRAM cache inside SSDs. The basic idea is to employ the visibility graph technique to unify the temporal and spatial locality of I/O accesses in order to direct cache management in SSDs. Specifically, we propose to adaptively generate the visibility graph of cached data pages and then support batch adjustment of adjacent or nearby (hot) cached data pages by referring to the connections in the visibility graph. In addition, we propose to evict the buffered data pages in batches, again by referring to these connections, to maximize the internal flushing parallelism of SSD devices without worsening I/O congestion. The trace-driven simulation experiments show that our proposal improves cache hits by between 0.8% and 19.8% and reduces the overall I/O latency by 25.6% on average, compared to state-of-the-art cache management schemes inside SSDs.
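For readers unfamiliar with the visibility graph technique, the sketch below builds a standard natural visibility graph from a numeric series. It is only a generic illustration: the abstract does not say how the series of cached pages is constructed, so the input (for example, a per-page access count ordered by page address) and the function name `visibility_graph` are assumptions, not the authors' design.

```python
def visibility_graph(series: list[float]) -> dict[int, set[int]]:
    """Build a natural visibility graph over the points (i, series[i]).

    Two points a < b are connected when every point c strictly between
    them lies below the straight line joining a and b.
    """
    n = len(series)
    adj: dict[int, set[int]] = {i: set() for i in range(n)}
    for a in range(n):
        for b in range(a + 1, n):
            ya, yb = series[a], series[b]
            # Height of the line from a to b at position c, compared with series[c].
            visible = all(
                series[c] < yb + (ya - yb) * (b - c) / (b - a)
                for c in range(a + 1, b)
            )
            if visible:
                adj[a].add(b)
                adj[b].add(a)
    return adj
```

Under the assumed input, a node's neighbor set could serve as the group of pages that are adjusted or evicted together, but that mapping is a guess at how the connections might be used rather than something stated in the abstract. The construction above is quadratic in the number of points at best, so a real cache would need an incremental variant.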
Title: Visibility Graph-based Cache Management for DRAM Buffer Inside Solid-state Drives. Journal: ACM Transactions on Storage.