Information Leakage in Encrypted Deduplication via Frequency Analysis
Jingwei Li, P. Lee, Chufeng Tan, Chuan Qin, Xiaosong Zhang
Encrypted deduplication combines encryption and deduplication to simultaneously achieve both data security and storage efficiency. State-of-the-art encrypted deduplication systems mainly build on deterministic encryption to preserve deduplication effectiveness. However, such deterministic encryption reveals the underlying frequency distribution of the original plaintext chunks. This allows an adversary to launch frequency analysis against the ciphertext chunks and infer the content of the original plaintext chunks. In this article, we study how frequency analysis affects information leakage in encrypted deduplication, from both attack and defense perspectives. Specifically, we target backup workloads and propose a new inference attack that exploits chunk locality to increase the coverage of inferred chunks. We further combine the new inference attack with the knowledge of chunk sizes and show its attack effectiveness against variable-size chunks. We conduct trace-driven evaluation on both real-world and synthetic datasets and show that our proposed attacks infer a significant fraction of plaintext chunks under backup workloads. To defend against frequency analysis, we present two defense approaches, namely MinHash encryption and scrambling. Our trace-driven evaluation shows that our combined MinHash encryption and scrambling scheme effectively mitigates the severity of the inference attacks, while maintaining high storage efficiency and incurring limited metadata access overhead.
{"title":"Information Leakage in Encrypted Deduplication via Frequency Analysis","authors":"Jingwei Li, P. Lee, Chufeng Tan, Chuan Qin, Xiaosong Zhang","doi":"10.1145/3365840","DOIUrl":"https://doi.org/10.1145/3365840","url":null,"abstract":"Encrypted deduplication combines encryption and deduplication to simultaneously achieve both data security and storage efficiency. State-of-the-art encrypted deduplication systems mainly build on deterministic encryption to preserve deduplication effectiveness. However, such deterministic encryption reveals the underlying frequency distribution of the original plaintext chunks. This allows an adversary to launch frequency analysis against the ciphertext chunks and infer the content of the original plaintext chunks. In this article, we study how frequency analysis affects information leakage in encrypted deduplication, from both attack and defense perspectives. Specifically, we target backup workloads and propose a new inference attack that exploits chunk locality to increase the coverage of inferred chunks. We further combine the new inference attack with the knowledge of chunk sizes and show its attack effectiveness against variable-size chunks. We conduct trace-driven evaluation on both real-world and synthetic datasets and show that our proposed attacks infer a significant fraction of plaintext chunks under backup workloads. To defend against frequency analysis, we present two defense approaches, namely MinHash encryption and scrambling. Our trace-driven evaluation shows that our combined MinHash encryption and scrambling scheme effectively mitigates the severity of the inference attacks, while maintaining high storage efficiency and incurring limited metadata access overhead.","PeriodicalId":273014,"journal":{"name":"ACM Transactions on Storage (TOS)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129215553","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Introduction to the Special Issue on ACM International Systems and Storage Conference (SYSTOR) 2018
G. Yadgar, Donald E. Porter
This issue of ACM Transactions on Storage brings some of the highlights of the 11th ACM International Systems and Storage Conference (SYSTOR’18), held in Haifa, Israel, during June 2018. SYSTOR is an international forum for interaction across the systems research community, attracting authors and participants from all over the world. Of the 44 submitted papers, 10 were accepted, and we invited the authors of two notable papers to extend them for publication in this special issue. The first article, which was selected as the best paper at the conference, “Lerna: Parallelizing Dependent Loops Using Speculation,” by Mohamed M. Saad, Roberto Palmieri, and Binoy Ravindran, presents a technique for using speculation to parallelize code containing data dependencies. Lerna uses a combination of static analysis and profiling to rewrite a sequential program into a parallel one and manages the concurrent execution of jobs using transactional memory. It is the first framework to carry out this process automatically, without any input from the programmer. The second article, “REGISTOR: A Platform for Unstructured Data Processing Inside SSD Storage,” by Shuyi Pei, Jing Yang, and Qing Yang, presents the design, implementation, and evaluation of an in-device regular-expression engine for SSDs. The resulting smart storage avoids transferring data over the low-bandwidth I/O bus, increasing the application-perceived throughput by up to 10×. This is a significant step toward addressing the challenges of the Big Data era.
{"title":"Introduction to the Special Issue on ACM International Systems and Storage Conference (SYSTOR) 2018","authors":"G. Yadgar, Donald E. Porter","doi":"10.1145/3313898","DOIUrl":"https://doi.org/10.1145/3313898","url":null,"abstract":"This issue of ACM Transactions on Storage brings some of the highlights of the 11th ACM International Systems and Storage Conference (SYSTOR’18), held in Haifa, Israel, during June 2018. SYSTOR is an international forum for interaction across the systems research community, attracting authors and participants from all over the world. Of the 44 submitted and 10 accepted papers, we invited the authors of two notable papers to extend them for publication in this special issue. The first article, which was selected as the best paper at the conference, “Lerna: Parallelizing Dependent Loops Using Speculation,” by Mohamed M. Saad, Roberto Palmieri, and Binoy Ravindran, presents a technique for using speculation to parallelize code containing data dependencies. Lerna uses a combination of static analysis and profiling to rewrite a sequential program into a parallel one and manages the concurrent execution of jobs using transactional memory. This is the first framework for executing such process automatically, without any input from the programmers. The second article, “REGISTOR: A Platform for Unstructured Data Processing Inside SSD Storage,” by Shuyi Pei, Jing Yang, and Qing Yang, presents the design, implementation, and evaluation of an in-device regular-expression engine for SSDs. The resulting smart storage avoids transfer of data through the low-bandwidth I/O bus, increasing the application-perceived throughput by up to 10×. This is a significant step toward addressing the challenges of the Big Data era.","PeriodicalId":273014,"journal":{"name":"ACM Transactions on Storage (TOS)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121836669","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ACM TOS Distinguished Reviewers
S. Noh
In this first issue of 2019, there are a couple of announcements that I would like to make. First, Associate Editors Salima Benbernou, Haryadi Gunawi, Jiri Schindler, and Thomas Schwarz are stepping down after many years of devoted service. I applaud and thank them for their devotion and hard work, which have allowed Transactions on Storage (TOS) to continue and maintain the high quality of publications it offers today. Second, our ACM TOS team would like to express our appreciation to the numerous academic reviewers who contributed to the peer-review process these past years. Our entire community is indebted to these volunteers, who have generously given their time and expertise to thoroughly review the papers that were submitted. In the past two years, ACM TOS received the help of over 30 editorial board members along with 180 reviewers to curate nearly 250 submissions and publish more than 70 articles with meaningful and impactful results. While all of the reviewers are listed on our website (https://tos.acm.org/), I take this opportunity to acknowledge our distinguished reviewers, who went out of their way to provide careful, thorough, and timely reviews that stood out. These names are based on the recommendations of the Associate Editors through whom the reviews were solicited. Again, we thank all the reviewers for their dedication and support for ACM TOS and the computer system storage community as a whole. Thank you all.
{"title":"ACM TOS Distinguished Reviewers","authors":"S. Noh","doi":"10.1145/3313879","DOIUrl":"https://doi.org/10.1145/3313879","url":null,"abstract":"In this first issue of 2019, there are a couple of announcements that I would like to make. First, Associate Editors Salima Benbernou, Haryadi Gunawi, Jiri Schindler, and Thomas Schwarz are stepping down after many years of devoted service. I applaud and thank them for their devotion and hard work that has allowed Transactions on Storage (TOS) to continue and maintain the high quality of publications it is offering today. Second, our ACM TOS team would like to express our appreciation to the numerous academic reviewers who contributed to the peer-review process these past years. Our entire community is indebted to these volunteers, who have generously given their time and expertise to thoroughly review the papers that were submitted. In the recent two years, ACM TOS received the help of over 30 editorial board members along with 180 reviewers to curate nearly 250 submissions and publish more than 70 articles with meaningful and impactful results. While the list of all the reviewers have been listed in our website (https://tos.acm.org/), I take this opportunity to list out our distinguished reviewers who went out of their way to provide careful, thorough, and timely reviews that stood out among the reviews. These names are based on the recommendations of the Associate Editors through whom the reviews were solicited. Again, we thank all the reviewers for your dedication and support for ACM TOS and the computer system storage community as a whole. Thank you all.","PeriodicalId":273014,"journal":{"name":"ACM Transactions on Storage (TOS)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132502774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On the Lifecycle of the File
Michael J. May, Etamar Laron, Khalid Zoabi, Havah Gerhardt
Users and Operating Systems (OSs) have vastly different views of files. OSs use files to persist data and structured information. To accomplish this, OSs treat files as named collections of bytes managed in hierarchical file systems. Despite their critical role in computing, little attention is paid to the lifecycle of the file, the evolution of file contents, or the evolution of file metadata. In contrast, users have rich mental models of files: they group files into projects, send data repositories to others, work on documents over time, and stash them aside for future use. Current OSs and Revision Control Systems ignore such mental models, persisting a selective, manually designated history of revisions. Preserving the mental model allows applications to better match how users view their files, making file processing and archiving tools more effective. We propose two mechanisms that OSs can adopt to better preserve the mental model: File Lifecycle Events (FLEs) that record a file’s progression and Complex File Events (CFEs) that combine them into meaningful patterns. We present the Complex File Events Engine (CoFEE), which uses file system monitoring and an extensible rulebase (Drools) to detect FLEs and convert them into complex ones. CFEs are persisted in NoSQL stores for later querying.
{"title":"On the Lifecycle of the File","authors":"Michael J. May, Etamar Laron, Khalid Zoabi, Havah Gerhardt","doi":"10.1145/3295463","DOIUrl":"https://doi.org/10.1145/3295463","url":null,"abstract":"Users and Operating Systems (OSs) have vastly different views of files. OSs use files to persist data and structured information. To accomplish this, OSs treat files as named collections of bytes managed in hierarchical file systems. Despite their critical role in computing, little attention is paid to the lifecycle of the file, the evolution of file contents, or the evolution of file metadata. In contrast, users have rich mental models of files: they group files into projects, send data repositories to others, work on documents over time, and stash them aside for future use. Current OSs and Revision Control Systems ignore such mental models, persisting a selective, manually designated history of revisions. Preserving the mental model allows applications to better match how users view their files, making file processing and archiving tools more effective. We propose two mechanisms that OSs can adopt to better preserve the mental model: File Lifecycle Events (FLEs) that record a file’s progression and Complex File Events (CFEs) that combine them into meaningful patterns. We present the Complex File Events Engine (CoFEE), which uses file system monitoring and an extensible rulebase (Drools) to detect FLEs and convert them into complex ones. CFEs are persisted in NoSQL stores for later querying.","PeriodicalId":273014,"journal":{"name":"ACM Transactions on Storage (TOS)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-02-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132414643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Leveraging Glocality for Fast Failure Recovery in Distributed RAM Storage
Yiming Zhang, Dongsheng Li, Ling Liu
Distributed RAM storage aggregates the RAM of servers in data center networks (DCN) to provide extremely high I/O performance for large-scale cloud systems. For quick recovery from storage server failures, MemCube [53] exploits the proximity of the BCube network to limit the recovery traffic to the recovery servers’ 1-hop neighborhood. However, the previous design is applicable only to the symmetric BCube(n,k) network with n^(k+1) nodes and has suboptimal recovery performance due to congestion and contention. To address these problems, in this article, we propose CubeX, which (i) generalizes the “1-hop” principle of MemCube to arbitrary cube-based networks and (ii) improves the throughput and recovery performance of a RAM-based key-value (KV) store via cross-layer optimizations. The core idea of CubeX is to leverage the glocality (= globality + locality) of cube-based networks: it scatters backup data across a large number of disks globally distributed throughout the cube and restricts all recovery traffic to the small local range of each server node. Our evaluation shows that CubeX not only efficiently supports a RAM-based KV store on cube-based networks but also significantly outperforms MemCube and RAMCloud in both throughput and recovery time.
{"title":"Leveraging Glocality for Fast Failure Recovery in Distributed RAM Storage","authors":"Yiming Zhang, Dongsheng Li, Ling Liu","doi":"10.1145/3289604","DOIUrl":"https://doi.org/10.1145/3289604","url":null,"abstract":"Distributed RAM storage aggregates the RAM of servers in data center networks (DCN) to provide extremely high I/O performance for large-scale cloud systems. For quick recovery of storage server failures, MemCube [53] exploits the proximity of the BCube network to limit the recovery traffic to the recovery servers’ 1-hop neighborhood. However, the previous design is applicable only to the symmetric BCube(n,k) network with nk+1 nodes and has suboptimal recovery performance due to congestion and contention. To address these problems, in this article, we propose CubeX, which (i) generalizes the “1-hop” principle of MemCube for arbitrary cube-based networks and (ii) improves the throughput and recovery performance of RAM-based key-value (KV) store via cross-layer optimizations. At the core of CubeX is to leverage the glocality (= globality + locality) of cube-based networks: It scatters backup data across a large number of disks globally distributed throughout the cube and restricts all recovery traffic within the small local range of each server node. Our evaluation shows that CubeX not only efficiently supports RAM-based KV store for cube-based networks but also significantly outperforms MemCube and RAMCloud in both throughput and recovery time.","PeriodicalId":273014,"journal":{"name":"ACM Transactions on Storage (TOS)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-02-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127002535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
TDDFS: A Tier-aware Data Deduplication-based File System
Zhichao Cao, Hao Wen, Xiongzi Ge, Jingwei Ma, Jim Diehl, D. Du
With the rapid increase in the amount of data produced and the development of new types of storage devices, storage tiering continues to be a popular way to achieve a good tradeoff between performance and cost-effectiveness. In a basic two-tier storage system, a storage tier with higher performance and typically higher cost (the fast tier) is used to store frequently-accessed (active) data while a large amount of less-active data are stored in the lower-performance and low-cost tier (the slow tier). Data are migrated between these two tiers according to their activity. In this article, we propose a Tier-aware Data Deduplication-based File System, called TDDFS, which can operate efficiently on top of a two-tier storage environment. Specifically, to achieve better performance, nearly all file operations are performed in the fast tier. To achieve higher cost-effectiveness, files are migrated from the fast tier to the slow tier if they are no longer active, and this migration is done with data deduplication. The distinctiveness of our design is that it maintains the non-redundant (unique) chunks produced by data deduplication in both tiers if possible. When a file is reloaded (called a reloaded file) from the slow tier to the fast tier, if some data chunks of the file already exist in the fast tier, then the data migration of these chunks from the slow tier can be avoided. Our evaluation shows that TDDFS achieves close to the best overall performance among various file-tiering designs for two-tier storage systems.
{"title":"TDDFS","authors":"Zhichao Cao, Hao Wen, Xiongzi Ge, Jingwei Ma, Jim Diehl, D. Du","doi":"10.1145/3295461","DOIUrl":"https://doi.org/10.1145/3295461","url":null,"abstract":"With the rapid increase in the amount of data produced and the development of new types of storage devices, storage tiering continues to be a popular way to achieve a good tradeoff between performance and cost-effectiveness. In a basic two-tier storage system, a storage tier with higher performance and typically higher cost (the fast tier) is used to store frequently-accessed (active) data while a large amount of less-active data are stored in the lower-performance and low-cost tier (the slow tier). Data are migrated between these two tiers according to their activity. In this article, we propose a Tier-aware Data Deduplication-based File System, called TDDFS, which can operate efficiently on top of a two-tier storage environment. Specifically, to achieve better performance, nearly all file operations are performed in the fast tier. To achieve higher cost-effectiveness, files are migrated from the fast tier to the slow tier if they are no longer active, and this migration is done with data deduplication. The distinctiveness of our design is that it maintains the non-redundant (unique) chunks produced by data deduplication in both tiers if possible. When a file is reloaded (called a reloaded file) from the slow tier to the fast tier, if some data chunks of the file already exist in the fast tier, then the data migration of these chunks from the slow tier can be avoided. Our evaluation shows that TDDFS achieves close to the best overall performance among various file-tiering designs for two-tier storage systems.","PeriodicalId":273014,"journal":{"name":"ACM Transactions on Storage (TOS)","volume":"132 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114017768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Exploiting Internal Parallelism for Address Translation in Solid-State Drives
Wei Xie, Yong Chen, P. Roth
Solid-State Drives (SSDs) have changed the landscape of storage systems and present a promising storage solution for data-intensive applications due to their low latency, high bandwidth, and low power consumption compared to traditional hard disk drives. SSDs achieve these desirable characteristics using internal parallelism (parallel access to multiple internal flash memory chips) and a Flash Translation Layer (FTL) that determines where data are stored on those chips so that they do not wear out prematurely. However, current state-of-the-art cache-based FTLs like the Demand-based Flash Translation Layer (DFTL) do not allow I/O schedulers to take full advantage of internal parallelism, because they impose a tight coupling between the logical-to-physical address translation and the data access. To address this limitation, we introduce a new FTL design called Parallel-DFTL that works with DFTL to decouple address translation operations from data accesses. Parallel-DFTL separates address translation and data access operations into different queues, allowing the SSD to use concurrent flash accesses for both types of operations. We also present a Parallel-LRU cache replacement algorithm to improve the concurrency of address translation operations. To compare Parallel-DFTL against existing FTL approaches, we present a Parallel-DFTL performance model and compare its predictions against those for DFTL and an ideal page-mapping approach. We also implemented the Parallel-DFTL approach in an SSD simulator using real device parameters and used trace-driven simulation to evaluate Parallel-DFTL’s efficacy. Our evaluation results show that Parallel-DFTL improved overall performance by up to 32% for the real I/O workloads we tested, and by up to two orders of magnitude with synthetic test workloads. We also found that Parallel-DFTL is able to achieve reasonable performance with a very small cache size and that it provides the best benefit for workloads with large request sizes or high write ratios.
{"title":"Exploiting Internal Parallelism for Address Translation in Solid-State Drives","authors":"Wei Xie, Yong Chen, P. Roth","doi":"10.1145/3239564","DOIUrl":"https://doi.org/10.1145/3239564","url":null,"abstract":"Solid-state Drives (SSDs) have changed the landscape of storage systems and present a promising storage solution for data-intensive applications due to their low latency, high bandwidth, and low power consumption compared to traditional hard disk drives. SSDs achieve these desirable characteristics using internal parallelism—parallel access to multiple internal flash memory chips—and a Flash Translation Layer (FTL) that determines where data are stored on those chips so that they do not wear out prematurely. However, current state-of-the-art cache-based FTLs like the Demand-based Flash Translation Layer (DFTL) do not allow IO schedulers to take full advantage of internal parallelism, because they impose a tight coupling between the logical-to-physical address translation and the data access. To address this limitation, we introduce a new FTL design called Parallel-DFTL that works with the DFTL to decouple address translation operations from data accesses. Parallel-DFTL separates address translation and data access operations into different queues, allowing the SSD to use concurrent flash accesses for both types of operations. We also present a Parallel-LRU cache replacement algorithm to improve the concurrency of address translation operations. To compare Parallel-DFTL against existing FTL approaches, we present a Parallel-DFTL performance model and compare its predictions against those for DFTL and an ideal page-mapping approach. We also implemented the Parallel-DFTL approach in an SSD simulator using real device parameters, and used trace-driven simulation to evaluate Parallel-DFTL’s efficacy. Our evaluation results show that Parallel-DFTL improved the overall performance by up to 32% for the real IO workloads we tested, and by up to two orders of magnitude with synthetic test workloads. We also found that Parallel-DFTL is able to achieve reasonable performance with a very small cache size and that it provides the best benefit for those workloads with large request size or with high write ratio.","PeriodicalId":273014,"journal":{"name":"ACM Transactions on Storage (TOS)","volume":"41 10","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131851931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Introduction to the Special Issue on SYSTOR 2017","authors":"Peter Desnoyers, E. D. Lara","doi":"10.1145/3287097","DOIUrl":"https://doi.org/10.1145/3287097","url":null,"abstract":"","PeriodicalId":273014,"journal":{"name":"ACM Transactions on Storage (TOS)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131644172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
FlashNet
A. Trivedi, Nikolas Ioannou, B. Metzler, Patrick Stuedi, Jonas Pfefferle, K. Kourtis, Ioannis Koltsidas, T. Gross
During the past decade, network and storage devices have undergone rapid performance improvements, delivering ultra-low latency and several Gbps of bandwidth. Nevertheless, current network and storage stacks fail to deliver this hardware performance to the applications, often due to the loss of I/O efficiency from stalled CPU performance. While many efforts attempt to address this issue solely on either the network or the storage stack, achieving high performance for networked-storage applications requires a holistic approach that considers both. In this article, we present FlashNet, a software I/O stack that unifies high-performance network properties with flash storage access and management. FlashNet builds on RDMA principles and abstractions to provide a direct, asynchronous, end-to-end data path between a client and remote flash storage. The key insight behind FlashNet is to co-design the stack’s components (an RDMA controller, a flash controller, and a file system) to enable cross-stack optimizations and maximize I/O efficiency. In micro-benchmarks, FlashNet improves 4kB network I/O operations per second (IOPS) by 38.6% to 1.22M, decreases access latency by 43.5% to 50.4μs, and prolongs the flash lifetime by 1.6-5.9× for writes. We illustrate the capabilities of FlashNet by building a Key-Value store and porting a distributed data store that uses RDMA onto it. The use of FlashNet’s RDMA API improves the performance of the KV store by 2× and requires only minimal changes for the ported data store to access remote flash devices.
{"title":"FlashNet","authors":"A. Trivedi, Nikolas Ioannou, B. Metzler, Patrick Stuedi, Jonas Pfefferle, K. Kourtis, Ioannis Koltsidas, T. Gross","doi":"10.1145/3239562","DOIUrl":"https://doi.org/10.1145/3239562","url":null,"abstract":"During the past decade, network and storage devices have undergone rapid performance improvements, delivering ultra-low latency and several Gbps of bandwidth. Nevertheless, current network and storage stacks fail to deliver this hardware performance to the applications, often due to the loss of I/O efficiency from stalled CPU performance. While many efforts attempt to address this issue solely on either the network or the storage stack, achieving high-performance for networked-storage applications requires a holistic approach that considers both. In this article, we present FlashNet, a software I/O stack that unifies high-performance network properties with flash storage access and management. FlashNet builds on RDMA principles and abstractions to provide a direct, asynchronous, end-to-end data path between a client and remote flash storage. The key insight behind FlashNet is to co-design the stack’s components (an RDMA controller, a flash controller, and a file system) to enable cross-stack optimizations and maximize I/O efficiency. In micro-benchmarks, FlashNet improves 4kB network I/O operations per second (IOPS by 38.6% to 1.22M, decreases access latency by 43.5% to 50.4μs, and prolongs the flash lifetime by 1.6-5.9× for writes. We illustrate the capabilities of FlashNet by building a Key-Value store and porting a distributed data store that uses RDMA on it. The use of FlashNet’s RDMA API improves the performance of KV store by 2× and requires minimum changes for the ported data store to access remote flash devices.","PeriodicalId":273014,"journal":{"name":"ACM Transactions on Storage (TOS)","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123985344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Performance Characterization of NVMe-over-Fabrics Storage Disaggregation
Zvika Guz, Harry Li, A. Shayesteh, V. Balakrishnan
Storage disaggregation separates compute and storage onto different nodes to allow for independent resource scaling and, thus, better hardware resource utilization. While disaggregation of hard-drive storage is a common practice, NVMe-SSD (i.e., PCIe-based SSD) disaggregation is considered more challenging. This is because SSDs are significantly faster than hard drives, so the latency overheads (due to both network and CPU processing), as well as the extra compute cycles needed for the offloading stack, become much more pronounced. In this work, we characterize the overheads of NVMe-SSD disaggregation. We show that NVMe-over-Fabrics (NVMe-oF), a recently released remote storage protocol specification, reduces the overheads of remote access to a bare minimum, thus greatly increasing the cost-efficiency of Flash disaggregation. Specifically, while recent work showed that SSD storage disaggregation via iSCSI degrades application-level throughput by 20%, we report negligible performance degradation with NVMe-oF, both when using stress tests and with a more realistic KV-store workload.
{"title":"Performance Characterization of NVMe-over-Fabrics Storage Disaggregation","authors":"Zvika Guz, Harry Li, A. Shayesteh, V. Balakrishnan","doi":"10.1145/3239563","DOIUrl":"https://doi.org/10.1145/3239563","url":null,"abstract":"Storage disaggregation separates compute and storage to different nodes to allow for independent resource scaling and, thus, better hardware resource utilization. While disaggregation of hard-drives storage is a common practice, NVMe-SSD (i.e., PCIe-based SSD) disaggregation is considered more challenging. This is because SSDs are significantly faster than hard drives, so the latency overheads (due to both network and CPU processing) as well as the extra compute cycles needed for the offloading stack become much more pronounced. In this work, we characterize the overheads of NVMe-SSD disaggregation. We show that NVMe-over-Fabrics (NVMe-oF)—a recently released remote storage protocol specification—reduces the overheads of remote access to a bare minimum, thus greatly increasing the cost-efficiency of Flash disaggregation. Specifically, while recent work showed that SSD storage disaggregation via iSCSI degrades application-level throughput by 20%, we report on negligible performance degradation with NVMe-oF—both when using stress-tests as well as with a more-realistic KV-store workload.","PeriodicalId":273014,"journal":{"name":"ACM Transactions on Storage (TOS)","volume":"533 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133355014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}