Hyojun Kim, S. Seshadri, Clem Dickey, Lawrence Chiu
Storage devices based on Phase Change Memory (PCM) devices are beginning to generate considerable attention in both industry and academic communities. But whether the technology in its current state will be a commercially and technically viable alternative to entrenched technologies such as flash-based SSDs still remains unanswered. To address this it is important to consider PCM SSD devices not just from a device standpoint, but also from a holistic perspective. This paper presents the results of our performance measurement study of a recent all-PCM SSD prototype. The average latency for 4 KB random read is 6.7 μs, which is about 16x faster than a comparable eMLC flash SSD. The distribution of I/O response times is also much narrower than the flash SSD for both reads and writes. Based on real-world workload traces, we model a hypothetical storage device which consists of flash, HDD, and PCM to identify the combinations of device types that offer the best performance within cost constraints. Our results show that - even at current price points - PCM storage devices show promise as a new component in multi-tiered enterprise storage systems.
{"title":"Phase change memory in enterprise storage systems: silver bullet or snake oil?","authors":"Hyojun Kim, S. Seshadri, Clem Dickey, Lawrence Chiu","doi":"10.1145/2527792.2527794","DOIUrl":"https://doi.org/10.1145/2527792.2527794","url":null,"abstract":"Storage devices based on Phase Change Memory (PCM) devices are beginning to generate considerable attention in both industry and academic communities. But whether the technology in its current state will be a commercially and technically viable alternative to entrenched technologies such as flash-based SSDs still remains unanswered. To address this it is important to consider PCM SSD devices not just from a device standpoint, but also from a holistic perspective.\u0000 This paper presents the results of our performance measurement study of a recent all-PCM SSD prototype. The average latency for 4 KB random read is 6.7 μs, which is about 16x faster than a comparable eMLC flash SSD. The distribution of I/O response times is also much narrower than the flash SSD for both reads and writes. Based on real-world workload traces, we model a hypothetical storage device which consists of flash, HDD, and PCM to identify the combinations of device types that offer the best performance within cost constraints. Our results show that - even at current price points - PCM storage devices show promise as a new component in multi-tiered enterprise storage systems.","PeriodicalId":404573,"journal":{"name":"INFLOW '13","volume":"80 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130128514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jithin Jose, M. Banikazemi, W. Belluomini, Chet Murthy, D. Panda
Storage Class Memory (SCM) blends the best properties of main memory and hard disk drives. It offers non-volatility and byte addressability, and promises short access times with low cost per bit. Earlier research in this field explored designs exploiting SCM features and used either simulations or theoretical models for evaluations. In this work, we explore the design challenges for achieving non-volatility using real SCM hardware that is available now: Flash-Backed DRAM. We present performance analysis of flash-backed DRAM and describe the system issues involved in achieving true non-volatility using the system memory hierarchy which was designed assuming that data is volatile. We present software abstractions which allow applications to be redesigned easily using SCM features, without having to worry about system issues. Furthermore, we present case studies using two applications with different characteristics: an SSD-based caching layer used in enterprise storage (Flash Cache) and an in-memory database (SolidDB), and redesign them using software abstractions. Our performance evaluations reveal that SCM aware Flash Cache design could enable persistence with less than 2% degradation in performance. Similarly, redesigning SolidDB persistence layer using SCM improved the performance by a factor of two. To the best of our knowledge, this is the first work that evaluates SCM performance and demonstrates application redesign using real SCM hardware.
{"title":"MetaData persistence using storage class memory: experiences with flash-backed DRAM","authors":"Jithin Jose, M. Banikazemi, W. Belluomini, Chet Murthy, D. Panda","doi":"10.1145/2527792.2527800","DOIUrl":"https://doi.org/10.1145/2527792.2527800","url":null,"abstract":"Storage Class Memory (SCM) blends the best properties of main memory and hard disk drives. It offers non-volatility and byte addressability, and promises short access times with low cost per bit. Earlier research in this field explored designs exploiting SCM features and used either simulations or theoretical models for evaluations. In this work, we explore the design challenges for achieving non-volatility using real SCM hardware that is available now: Flash-Backed DRAM. We present performance analysis of flash-backed DRAM and describe the system issues involved in achieving true non-volatility using the system memory hierarchy which was designed assuming that data is volatile. We present software abstractions which allow applications to be redesigned easily using SCM features, without having to worry about system issues. Furthermore, we present case studies using two applications with different characteristics: an SSD-based caching layer used in enterprise storage (Flash Cache) and an in-memory database (SolidDB), and redesign them using software abstractions. Our performance evaluations reveal that SCM aware Flash Cache design could enable persistence with less than 2% degradation in performance. Similarly, redesigning SolidDB persistence layer using SCM improved the performance by a factor of two. To the best of our knowledge, this is the first work that evaluates SCM performance and demonstrates application redesign using real SCM hardware.","PeriodicalId":404573,"journal":{"name":"INFLOW '13","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122576886","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Emerging non-volatile storage (e.g., Phase Change Memory, STT-RAM) allow access to persistent data at latencies an order of magnitude lower than SSDs. The density and price gap between NVMs and denser storage make NVM economically most suitable as a cache for larger, more conventional storage (i.e., NAND flash-based SSDs and disks). Existing storage caching architectures (even those that use fast flash-based SSDs) introduce significant software overhead that can obscure the performance benefits of faster memories. We propose Bankshot, a caching architecture that allows cache hits to bypass the OS (and the associated software overheads) entirely, while relying on the OS for heavy-weight operations like servicing misses and performing write backs. We evaluate several design decisions in Bankshot including different cache management policies and different levels of hardware, software support for tracking dirty data and maintaining meta-data. We find that with hardware support Bankshot can offer upto 5× speedup over conventional caching systems.
{"title":"Bankshot: caching slow storage in fast non-volatile memory","authors":"Meenakshi Sundaram Bhaskaran, Jian Xu, S. Swanson","doi":"10.1145/2527792.2527793","DOIUrl":"https://doi.org/10.1145/2527792.2527793","url":null,"abstract":"Emerging non-volatile storage (e.g., Phase Change Memory, STT-RAM) allow access to persistent data at latencies an order of magnitude lower than SSDs. The density and price gap between NVMs and denser storage make NVM economically most suitable as a cache for larger, more conventional storage (i.e., NAND flash-based SSDs and disks). Existing storage caching architectures (even those that use fast flash-based SSDs) introduce significant software overhead that can obscure the performance benefits of faster memories. We propose Bankshot, a caching architecture that allows cache hits to bypass the OS (and the associated software overheads) entirely, while relying on the OS for heavy-weight operations like servicing misses and performing write backs. We evaluate several design decisions in Bankshot including different cache management policies and different levels of hardware, software support for tracking dirty data and maintaining meta-data. We find that with hardware support Bankshot can offer upto 5× speedup over conventional caching systems.","PeriodicalId":404573,"journal":{"name":"INFLOW '13","volume":"148 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122064206","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dimitris Skourtis, D. Achlioptas, C. Maltzahn, S. Brandt
Solid-state drives are becoming increasingly popular in enterprise storage systems, playing the role of large caches and permanent storage. Although SSDs provide faster random access than hard-drives, their performance under read/write workloads is highly variable often exceeding that of hard-drives (e.g., taking 100ms for a single read). Many systems with mixed workloads have low latency requirements, or require predictable performance and guarantees. In such cases, the performance variance of SSDs becomes a problem for both predictability and raw performance. In this paper, we propose a design based on redundancy, which provides high performance and low latency for reads under read/write workloads by physically separating reads from writes. More specifically, reads achieve read-only performance while writes perform at least as good as before. We evaluate our design using micro-benchmarks and real traces, illustrating the performance benefits of read/write separation in solid-state drives.
{"title":"High performance & low latency in solid-state drives through redundancy","authors":"Dimitris Skourtis, D. Achlioptas, C. Maltzahn, S. Brandt","doi":"10.1145/2527792.2527798","DOIUrl":"https://doi.org/10.1145/2527792.2527798","url":null,"abstract":"Solid-state drives are becoming increasingly popular in enterprise storage systems, playing the role of large caches and permanent storage. Although SSDs provide faster random access than hard-drives, their performance under read/write workloads is highly variable often exceeding that of hard-drives (e.g., taking 100ms for a single read). Many systems with mixed workloads have low latency requirements, or require predictable performance and guarantees. In such cases, the performance variance of SSDs becomes a problem for both predictability and raw performance.\u0000 In this paper, we propose a design based on redundancy, which provides high performance and low latency for reads under read/write workloads by physically separating reads from writes. More specifically, reads achieve read-only performance while writes perform at least as good as before. We evaluate our design using micro-benchmarks and real traces, illustrating the performance benefits of read/write separation in solid-state drives.","PeriodicalId":404573,"journal":{"name":"INFLOW '13","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116545584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper proposes a cost-effective and reliable SSD host cache solution that we call SRC (SSD RAID Cache). Cost-effectiveness is brought about by using multiple low-cost SSDs and reliability is enhanced through RAID-based data redundancy. RAID, however, is managed in a log-structured manner on multiple SSDs effectively eliminating the detrimental read-modify-write operations found in conventional RAID-5. Within the proposed framework, we also propose to eliminate parity blocks for stripes that are composed of clean blocks as the original data resides in primary storage. We also propose the use of destaging, instead of garbage collection, to make space in the cache when the SSD cache is full. We show that the proposed techniques have significant implications on the performance of the cache and lifetime of the SSDs that comprise the cache. Finally, we study various ways in which stripes can be formed based on data and parity block allocation policies. Our experimental results using different realistic I/O workloads show using the SRC scheme is on average 59% better than the conventional SSD cache scheme supporting RAID-5. In case of lifetime, our results show that SRC reduces the erase count of the SSD drives by an average of 47% compared to the RAID-5 scheme.
{"title":"Improving performance and lifetime of the SSD RAID-based host cache through a log-structured approach","authors":"Y. Oh, Jongmoo Choi, Donghee Lee, S. Noh","doi":"10.1145/2527792.2527795","DOIUrl":"https://doi.org/10.1145/2527792.2527795","url":null,"abstract":"This paper proposes a cost-effective and reliable SSD host cache solution that we call SRC (SSD RAID Cache). Cost-effectiveness is brought about by using multiple low-cost SSDs and reliability is enhanced through RAID-based data redundancy. RAID, however, is managed in a log-structured manner on multiple SSDs effectively eliminating the detrimental read-modify-write operations found in conventional RAID-5. Within the proposed framework, we also propose to eliminate parity blocks for stripes that are composed of clean blocks as the original data resides in primary storage. We also propose the use of destaging, instead of garbage collection, to make space in the cache when the SSD cache is full. We show that the proposed techniques have significant implications on the performance of the cache and lifetime of the SSDs that comprise the cache. Finally, we study various ways in which stripes can be formed based on data and parity block allocation policies. Our experimental results using different realistic I/O workloads show using the SRC scheme is on average 59% better than the conventional SSD cache scheme supporting RAID-5. In case of lifetime, our results show that SRC reduces the erase count of the SSD drives by an average of 47% compared to the RAID-5 scheme.","PeriodicalId":404573,"journal":{"name":"INFLOW '13","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126111682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Browser caches are typically designed to maximize hit rates---if all other factors are held equal, then the highest hit rate will result in the highest performance. However, if the performance of the underlying cache storage (i.e. the file system) varies with differing workloads, then these other factors may in fact not be equal when comparing different cache strategies. Mobile systems such as smart phones are typically equipped with low-speed flash storage, and suffer severe degradation in file system performance under sufficiently random write workloads. A cache implementation which performs random writes will thus spend more time reading and writing its cache, possibly resulting in lower overall system performance than a lower-hit-rate implementation which achieves higher storage performance. We present a log-structured browser cache, generating almost purely sequential writes, and in which cleaning is efficiently performed by cache eviction. An implementation of this cache for the Chromium browser on Android was developed; using captured user browsing traces we test the log-structured cache and compare its performance to the existing Chromium implementation. We achieve a ten-fold performance improvement in basic cache operations (as measured on a Nexus 7 tablet), while in the worst case increasing miss rate by less than 3% (from 65% to 68%). For network bandwidths of 1Mb/s or higher the increased cache performance more than makes up for the decrease in hit rate; the effect is more pronounced when examining 95th percentile delays.
{"title":"Log-structured cache: trading hit-rate for storage performance (and winning) in mobile devices","authors":"Abutalib Aghayev, Peter Desnoyers","doi":"10.1145/2527792.2527797","DOIUrl":"https://doi.org/10.1145/2527792.2527797","url":null,"abstract":"Browser caches are typically designed to maximize hit rates---if all other factors are held equal, then the highest hit rate will result in the highest performance. However, if the performance of the underlying cache storage (i.e. the file system) varies with differing workloads, then these other factors may in fact not be equal when comparing different cache strategies.\u0000 Mobile systems such as smart phones are typically equipped with low-speed flash storage, and suffer severe degradation in file system performance under sufficiently random write workloads. A cache implementation which performs random writes will thus spend more time reading and writing its cache, possibly resulting in lower overall system performance than a lower-hit-rate implementation which achieves higher storage performance.\u0000 We present a log-structured browser cache, generating almost purely sequential writes, and in which cleaning is efficiently performed by cache eviction. An implementation of this cache for the Chromium browser on Android was developed; using captured user browsing traces we test the log-structured cache and compare its performance to the existing Chromium implementation.\u0000 We achieve a ten-fold performance improvement in basic cache operations (as measured on a Nexus 7 tablet), while in the worst case increasing miss rate by less than 3% (from 65% to 68%). For network bandwidths of 1Mb/s or higher the increased cache performance more than makes up for the decrease in hit rate; the effect is more pronounced when examining 95th percentile delays.","PeriodicalId":404573,"journal":{"name":"INFLOW '13","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131100388","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sudarsun Kannan, Ada Gavrilovska, K. Schwan, Sanjay Kumar
The growth in browser-based computations is raising the need for efficient local storage for browser-based applications. A standard approach to control how such applications access and manipulate the underlying platform resources, is to run in-browser applications in a sandbox environment. Sandboxing works by static code analysis and system call interception, and as a result, the performance of browser applications making frequent I/O calls can be severely impacted. To address this, we explore the utility of next generation non-volatile memories (NVM) in client platforms. By using NVM as virtual memory, and integrating NVM support for browser applications with byte-addressable I/O interfaces, our approach shows up to 3.5x reduction in sandboxing cost and around 3x reduction in serialization overheads for browser-based applications, and improved application performance.
{"title":"NVM heaps for accelerating browser-based applications","authors":"Sudarsun Kannan, Ada Gavrilovska, K. Schwan, Sanjay Kumar","doi":"10.1145/2527792.2527796","DOIUrl":"https://doi.org/10.1145/2527792.2527796","url":null,"abstract":"The growth in browser-based computations is raising the need for efficient local storage for browser-based applications. A standard approach to control how such applications access and manipulate the underlying platform resources, is to run in-browser applications in a sandbox environment. Sandboxing works by static code analysis and system call interception, and as a result, the performance of browser applications making frequent I/O calls can be severely impacted. To address this, we explore the utility of next generation non-volatile memories (NVM) in client platforms. By using NVM as virtual memory, and integrating NVM support for browser applications with byte-addressable I/O interfaces, our approach shows up to 3.5x reduction in sandboxing cost and around 3x reduction in serialization overheads for browser-based applications, and improved application performance.","PeriodicalId":404573,"journal":{"name":"INFLOW '13","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131904657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Katelin Bailey, Peter Hornyack, L. Ceze, S. Gribble, H. Levy
In the near future, new storage-class memory (SCM) technologies -- such as phase-change memory and memristors -- will radically change the nature of long-term storage. These devices will be cheap, non-volatile, byte addressable, and near DRAM density and speed. While SCM offers enormous opportunities, profiting from them will require new storage systems specifically designed for SCM's properties. This paper presents Echo, a persistent key-value storage system designed to leverage the advantages and address the challenges of SCM. The goals of Echo include high performance for both small and large data objects, recoverability after failure, and scalability on multicore systems. Echo achieves its goals through the use of a two-level memory design targeted for memory systems containing both DRAM and SCM, exploitation of SCM's byte addressability for fine-grained transactions in non-volatile memory, and the use of snapshot isolation for concurrency, consistency, and versioning. Our evaluation demonstrates that Echo's SCM-centric design achieves the durability guarantees of the best disk-based stores with the performance characteristics approaching the best in-memory key-value stores.
{"title":"Exploring storage class memory with key value stores","authors":"Katelin Bailey, Peter Hornyack, L. Ceze, S. Gribble, H. Levy","doi":"10.1145/2527792.2527799","DOIUrl":"https://doi.org/10.1145/2527792.2527799","url":null,"abstract":"In the near future, new storage-class memory (SCM) technologies -- such as phase-change memory and memristors -- will radically change the nature of long-term storage. These devices will be cheap, non-volatile, byte addressable, and near DRAM density and speed. While SCM offers enormous opportunities, profiting from them will require new storage systems specifically designed for SCM's properties.\u0000 This paper presents Echo, a persistent key-value storage system designed to leverage the advantages and address the challenges of SCM. The goals of Echo include high performance for both small and large data objects, recoverability after failure, and scalability on multicore systems. Echo achieves its goals through the use of a two-level memory design targeted for memory systems containing both DRAM and SCM, exploitation of SCM's byte addressability for fine-grained transactions in non-volatile memory, and the use of snapshot isolation for concurrency, consistency, and versioning. Our evaluation demonstrates that Echo's SCM-centric design achieves the durability guarantees of the best disk-based stores with the performance characteristics approaching the best in-memory key-value stores.","PeriodicalId":404573,"journal":{"name":"INFLOW '13","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121792741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}