Automatic generation of behavioral hard disk drive access time models
Pub Date: 2014-06-02 | DOI: 10.1109/MSST.2014.6855553
A. Crume, C. Maltzahn, L. Ward, Thomas M. Kroeger, M. Curry
Predicting access times is a crucial part of predicting hard disk drive performance. Existing approaches use white-box modeling and require intimate knowledge of the internal layout of the drive, which can take months to extract. Automatically learning this behavior is a much more desirable approach, requiring less expert knowledge, fewer assumptions, and less time. While previous research has created black-box models of hard disk drive performance, none has shown low per-request errors. A barrier to machine learning of access times has been the existence of periodic behavior with high, unknown frequencies. We identify these high frequencies with Fourier analysis and include them explicitly as input to the model. In this paper we focus on the simulation of access times for random read workloads within a single zone. We are able to automatically generate and tune request-level access time models with a mean absolute error of less than 0.15 ms. To our knowledge, this is the first time such fidelity has been achieved on modern disk drives using machine learning. We are confident that our approach forms the core of automatic generation of access time models that cover other workloads and span entire disk drives, but more work remains.
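To make the Fourier step concrete, here is a minimal Python sketch of the general idea: detect the dominant periodic frequencies in measured access times and hand them to a learned regressor as explicit sine/cosine features. The function names, the use of scikit-learn's MLPRegressor, and the feature construction are illustrative assumptions, not the authors' implementation.

```python
# Sketch only: find dominant periodicities in access times with an FFT and
# expose them to a regression model as explicit sin/cos features.
import numpy as np
from sklearn.neural_network import MLPRegressor  # assumed model choice

def dominant_frequencies(distances, access_times, n_freqs=3):
    """distances, access_times: 1-D numpy arrays from a random-read trace.
    Returns the n_freqs strongest frequencies of access time vs. distance."""
    order = np.argsort(distances)
    spectrum = np.abs(np.fft.rfft(access_times[order] - access_times.mean()))
    step = max(float(np.median(np.diff(distances[order]))), 1e-9)
    freqs = np.fft.rfftfreq(len(distances), d=step)
    return freqs[np.argsort(spectrum)[-n_freqs:]]

def fourier_features(distances, freqs):
    """Augment the raw distance with sin/cos terms at the detected frequencies."""
    cols = [distances]
    for f in freqs:
        cols += [np.sin(2 * np.pi * f * distances), np.cos(2 * np.pi * f * distances)]
    return np.column_stack(cols)

# Hypothetical usage on (seek distance -> access time) pairs:
# distances, times = load_trace(...)            # load_trace is a placeholder
# freqs = dominant_frequencies(distances, times)
# model = MLPRegressor(hidden_layer_sizes=(64, 64)).fit(
#     fourier_features(distances, freqs), times)
```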
{"title":"Automatic generation of behavioral hard disk drive access time models","authors":"A. Crume, C. Maltzahn, L. Ward, Thomas M. Kroeger, M. Curry","doi":"10.1109/MSST.2014.6855553","DOIUrl":"https://doi.org/10.1109/MSST.2014.6855553","url":null,"abstract":"Predicting access times is a crucial part of predicting hard disk drive performance. Existing approaches use white-box modeling and require intimate knowledge of the internal layout of the drive, which can take months to extract. Automatically learning this behavior is a much more desirable approach, requiring less expert knowledge, fewer assumptions, and less time. While previous research has created black-box models of hard disk drive performance, none have shown low per-request errors. A barrier to machine learning of access times has been the existence of periodic behavior with high, unknown frequencies. We identify these high frequencies with Fourier analysis and include them explicitly as input to the model. In this paper we focus on the simulation of access times for random read workloads within a single zone. We are able to automatically generate and tune request-level access time models with mean absolute error less than 0.15 ms. To our knowledge this is the first time such a fidelity has been achieved with modern disk drives using machine learning. We are confident that our approach forms the core for automatic generation of access time models that include other workloads and span across entire disk drives, but more work remains.","PeriodicalId":188071,"journal":{"name":"2014 30th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"193 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115184106","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The case for sampling on very large file systems
Pub Date: 2014-06-02 | DOI: 10.1109/MSST.2014.6855542
George Goldberg, Danny Harnik, D. Sotnikov
Sampling has long been a prominent tool in statistics and analytics, first and foremost when very large amounts of data are involved. In the realm of very large file systems (and hierarchical data stores in general), however, sampling has mostly been ignored, and for several good reasons: mainly, running sampling in such an environment introduces technical challenges that can make the entire sampling process non-beneficial. In this work we demonstrate that there are cases for which sampling is very worthwhile in very large file systems. We address this topic in two aspects: (a) the technical side, where we design and implement an efficient weighted sampling solution that is distributed, one-pass, and addresses multiple efficiency concerns; and (b) the usability aspect, in which we demonstrate several use cases where weighted sampling over large file systems is extremely beneficial. In particular, we show use cases regarding estimation of compression ratios, testing and auditing, and offline collection of statistics on very large data stores.
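A standard building block for one-pass, distributed, weighted sampling is Efraimidis-Spirakis weighted reservoir sampling, where each item gets the key u^(1/w) and the k largest keys are kept; per-node results merge trivially. The sketch below illustrates that technique as background; it is not necessarily the exact algorithm used in the paper, and the file-walking helper in the usage comment is hypothetical.

```python
# Weighted reservoir sampling (Efraimidis-Spirakis "A-Res"): one pass,
# mergeable across nodes, so it suits large distributed file-system scans.
import heapq
import random

def weighted_reservoir_sample(stream, k):
    """stream yields (item, weight); keep the k items with the largest keys
    u**(1/weight), which is a weighted sample without replacement."""
    heap = []  # min-heap of (key, item)
    for item, weight in stream:
        key = random.random() ** (1.0 / weight)
        if len(heap) < k:
            heapq.heappush(heap, (key, item))
        elif key > heap[0][0]:
            heapq.heapreplace(heap, (key, item))
    return heap

def merge_samples(partial_heaps, k):
    """Distributed variant: each node samples locally; the global sample is
    simply the k entries with the largest keys across all partial results."""
    return heapq.nlargest(k, (entry for heap in partial_heaps for entry in heap))

# Hypothetical usage, weighting files by size to estimate compression ratio:
# sample = weighted_reservoir_sample(walk_files("/fs"), k=10_000)  # walk_files is a placeholder
```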
{"title":"The case for sampling on very large file systems","authors":"George Goldberg, Danny Harnik, D. Sotnikov","doi":"10.1109/MSST.2014.6855542","DOIUrl":"https://doi.org/10.1109/MSST.2014.6855542","url":null,"abstract":"Sampling has long been a prominent tool in statistics and analytics, first and foremost when very large amounts of data are involved. In the realm of very large file systems (and hierarchical data stores in general), however, sampling has mostly been ignored and for several good reasons. Mainly, running sampling in such an environment introduces technical challenges that make the entire sampling process non-beneficial. In this work we demonstrate that there are cases for which sampling is very worthwhile in very large file systems. We address this topic in two aspect: (a) the technical side where we design and implement solutions to efficient weighted sampling that is also distributed, one-pass and addresses multiple efficiency aspects; and (b) the usability aspect in which we demonstrate several use-cases in which weighted sampling over large file systems is extremely beneficial. In particular, we show use-cases regarding estimation of compression ratios, testing and auditing and offline collection of statistics on very large data stores.","PeriodicalId":188071,"journal":{"name":"2014 30th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127319897","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Analytical modeling of garbage collection algorithms in hotness-aware flash-based solid state drives
Pub Date: 2014-06-02 | DOI: 10.1109/MSST.2014.6855534
Yue Yang, Jianwen Zhu
Garbage collection plays a central role in the performance of flash-based solid state drives and, in particular, in their endurance. Analytical modeling is an indispensable instrument for design improvement, as it reveals the relationship between SSD endurance, manifested as write amplification, and the algorithmic design variables as well as workload characteristics. In this paper, we improve upon recent advances in using mean field analysis as a tool for performance analysis, targeting hotness-aware flash management algorithms. We show that even under a generic workload model, the system dynamics can be captured by a system of ordinary differential equations, and the steady-state write amplification can be predicted for a variety of practical garbage collection algorithms, including the d-Choice algorithm. Furthermore, the analytical model is validated against a large collection of real and synthetic traces, and prediction errors against these simulations are shown to be within 5%.
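The ODE derivation and the hotness-aware model are in the paper itself; as a hedged companion, the following Monte-Carlo sketch estimates steady-state write amplification for a d-choices garbage collector under uniform random writes, using the common round-based model (one victim block is reclaimed per round, its live pages are relocated, and the rest of the fresh block is filled with user writes). All parameters are illustrative; this is not the authors' simulator.

```python
# Round-based Monte-Carlo estimate of write amplification for d-choices GC
# under uniform random page writes (d=1 degenerates to random victim selection).
import random

def write_amplification(d=4, nblocks=1024, ppb=64, spare=0.10, rounds=50_000):
    nlogical = int(nblocks * ppb * (1 - spare))
    live = [set() for _ in range(nblocks)]   # live logical pages per block
    where = {}                               # logical page -> block holding it
    for lpn in range(nlogical):              # initial sequential fill
        live[lpn // ppb].add(lpn)
        where[lpn] = lpn // ppb
    user_writes = relocations = 0
    for _ in range(rounds):
        # d-choices: sample d blocks, reclaim the one with the fewest live pages
        victim = min(random.sample(range(nblocks), d), key=lambda b: len(live[b]))
        relocations += len(live[victim])     # copy out still-valid pages
        for _ in range(ppb - len(live[victim])):   # fill the rest with user writes
            lpn = random.randrange(nlogical)
            live[where[lpn]].discard(lpn)    # invalidate the old copy
            live[victim].add(lpn)
            where[lpn] = victim
            user_writes += 1
    return (user_writes + relocations) / user_writes

# print(write_amplification(d=4), write_amplification(d=1))
```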
{"title":"Analytical modeling of garbage collection algorithms in hotness-aware flash-based solid state drives","authors":"Yue Yang, Jianwen Zhu","doi":"10.1109/MSST.2014.6855534","DOIUrl":"https://doi.org/10.1109/MSST.2014.6855534","url":null,"abstract":"Garbage collection plays a central role of flash-based solid state drive performance, in particular, its endurance. Analytical modeling is an indispensable instrument for design improvement as it demonstrates the relationship between SSD endurance, manifested as write amplification, and the algorithmic design variables, as well as workload characteristics. In this paper, we improve recent advances in using the mean field analysis as a tool for performance analysis and target hotness-aware flash management algorithms. We show that even under a generic workload model, the system dynamics can be captured by a system of ordinary differential equations, and the steady-state write amplification can be predicted for a variety of practical garbage collection algorithms, including the d-Choice algorithm. Furthermore, the analytical model is validated by a large collection of real and synthetic traces, and prediction errors against these simulations are shown to be within 5%.","PeriodicalId":188071,"journal":{"name":"2014 30th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133662621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A protected block device for Persistent Memory
Pub Date: 2014-06-02 | DOI: 10.1109/MSST.2014.6855541
Feng Chen, M. Mesnier, Scott Hahn
Persistent Memory (PM) technologies, such as Phase Change Memory, STT-RAM, and memristors, are receiving increasing interest in academia and industry. PM provides many attractive features, such as DRAM-like speed and storage-like persistence. Yet, because it blurs the line between memory and storage, neither a memory-based nor a storage-based model is a natural fit. How best to integrate PM into existing systems has become a challenge and is now a top priority for many. In this paper we share our initial approach to integrating PM into computer systems with minimal impact on the core operating system. By adopting a hybrid storage model, all of our changes are confined to a block storage driver, called PMBD, which directly accesses PM attached to the memory bus and exposes a logical block I/O interface to users. We explore the design space by examining a variety of options to achieve performance, protection from stray writes, ordered persistence, and compatibility with legacy file systems and applications. All told, we find that by using a combination of existing OS mechanisms (per-core page table mappings, non-temporal store instructions, memory fences, and I/O barriers), we are able to achieve each of these goals with small performance overhead for both micro-benchmarks and real-world applications (e.g., file server and database workloads). Our experience suggests that determining the right combination of existing platform and OS mechanisms is a non-trivial exercise. In this paper, we share both our failed and successful attempts. The final solution that we propose represents an evolution of our initial approach. We have also open-sourced our software prototype with all attempted design options to encourage further research in this area.
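PMBD itself is a Linux kernel block driver built on per-core page-table mappings, non-temporal stores, and fences, which have no direct user-space Python equivalent. Purely as a loose illustration of the hybrid model described above (a logical block read/write interface over byte-addressable memory, with the target region mapped only for the duration of a write and flushed before completion), the sketch below uses a file-backed mmap and msync as stand-ins. The class name, block size, and mechanisms here are assumptions for illustration, not the PMBD design.

```python
# Loose user-space analogue of a "protected block device over PM": map only
# the block being written, for only as long as the write lasts, and flush
# (msync) before returning so the write is durable and ordered.
import mmap
import os

BLOCK = 4096
assert BLOCK % mmap.ALLOCATIONGRANULARITY == 0  # offsets must be page-aligned

class PMBlockDev:
    def __init__(self, path, nblocks):
        self.path, self.nblocks = path, nblocks
        with open(path, "wb") as f:
            f.truncate(nblocks * BLOCK)          # file stands in for the PM region

    def write_block(self, bno, data: bytes):
        assert len(data) == BLOCK and 0 <= bno < self.nblocks
        fd = os.open(self.path, os.O_RDWR)
        try:
            m = mmap.mmap(fd, BLOCK, offset=bno * BLOCK)  # temporary mapping window
            m[:] = data
            m.flush()                            # msync: durable before completion
            m.close()
        finally:
            os.close(fd)                         # region is unmapped outside writes

    def read_block(self, bno) -> bytes:
        with open(self.path, "rb") as f:
            f.seek(bno * BLOCK)
            return f.read(BLOCK)

# dev = PMBlockDev("/tmp/pm.img", nblocks=1024)
# dev.write_block(7, b"\xab" * BLOCK)
```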
{"title":"A protected block device for Persistent Memory","authors":"Feng Chen, M. Mesnier, Scott Hahn","doi":"10.1109/MSST.2014.6855541","DOIUrl":"https://doi.org/10.1109/MSST.2014.6855541","url":null,"abstract":"Persistent Memory (PM) technologies, such as Phase Change Memory, STT-RAM, and memristors, are receiving increasingly high interest in academia and industry. PM provides many attractive features, such as DRAM-like speed and storage-like persistence. Yet, because it draws a blurry line between memory and storage, neither a memory- or storage-based model is a natural fit. Best integrating PM into existing systems has become challenging and is now a top priority for many. In this paper we share our initial approach to integrating PM into computer systems, with minimal impact to the core operating system. By adopting a hybrid storage model, all of our changes are confined to a block storage driver, called PMBD, which directly accesses PM attached to the memory bus and exposes a logical block I/O interface to users. We explore the design space by examining a variety of options to achieve performance, protection from stray writes, ordered persistence, and compatibility for legacy file systems and applications. All told, we find that by using a combination of existing OS mechanisms (per-core page table mappings, non-temporal store instructions, memory fences, and I/O barriers), we are able to achieve each of these goals with small performance overhead for both micro-benchmarks and real world applications (e.g., file server and database workloads). Our experience suggests that determining the right combination of existing platform and OS mechanisms is a non-trivial exercise. In this paper, we share both our failed and successful attempts. The final solution that we propose represents an evolution of our initial approach. We have also open-sourced our software prototype with all attempted design options to encourage further research in this area.","PeriodicalId":188071,"journal":{"name":"2014 30th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"11 7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125007252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CR5M: A mirroring-powered channel-RAID5 architecture for an SSD
Pub Date: 2014-06-02 | DOI: 10.1109/MSST.2014.6855547
Yu Wang, Wei Wang, T. Xie, Wen Pan, Yanyan Gao, Yiming Ouyang
Manufacturers are continuously pushing NAND flash memory into smaller geometries and forcing each cell to store multiple bits in order to greatly reduce its cost. Unfortunately, these scaling techniques inherently degrade the endurance and reliability of flash memory. As a result, permanent errors such as block or die failures can occur with higher probability. While most transient errors, like programming errors, can be fixed by an ECC (error correction code) scheme, rectifying permanent errors requires a data redundancy mechanism like RAID (redundant array of independent disks) within a single SSD, where multiple channels work in parallel. To enhance the reliability of a solid-state drive (SSD) while maintaining its performance, we first implement several common RAID structures at the channel level of a single SSD to understand their impact on the SSD's performance. Next, we propose a new data redundancy architecture called CR5M (Channel-RAID5 with Mirroring), which can be applied to one SSD for mission-critical applications. CR5M utilizes hidden mirror chips to accelerate the performance of small writes. Finally, we conduct extensive simulations using real-world traces and synthetic benchmarks on a validated simulator to evaluate CR5M. Experimental results demonstrate that, compared with CR5 (Channel-RAID5), CR5M decreases mean response time by up to 25.8%. In addition, it reduces the average number of writes per channel by up to 23.6%.
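For background on why small writes are the pain point CR5M targets, the sketch below shows the standard RAID5 parity math applied at channel level: a full-stripe write computes parity directly, while a small write needs a read-modify-write of old data and old parity. This illustrates the baseline cost only, not CR5M's mirroring mechanism; chunk sizes and names are illustrative.

```python
# Standard RAID5 parity math (baseline cost that channel-RAID5 inherits).
from functools import reduce

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def full_stripe_parity(chunks):
    """Full-stripe write: parity is the XOR of all data chunks, no extra reads."""
    return reduce(xor, chunks)

def small_write_parity(old_data, new_data, old_parity):
    """Small (sub-stripe) write: read-modify-write -- two extra reads (old data,
    old parity) plus a parity write for every updated chunk."""
    return xor(old_parity, xor(old_data, new_data))

# chunks = [bytes([i]) * 4096 for i in range(4)]        # one chunk per channel
# parity = full_stripe_parity(chunks)
# parity = small_write_parity(chunks[0], b"\xff" * 4096, parity)
```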
{"title":"CR5M: A mirroring-powered channel-RAID5 architecture for an SSD","authors":"Yu Wang, Wei Wang, T. Xie, Wen Pan, Yanyan Gao, Yiming Ouyang","doi":"10.1109/MSST.2014.6855547","DOIUrl":"https://doi.org/10.1109/MSST.2014.6855547","url":null,"abstract":"Manufacturers are continuously pushing NAND flash memory into smaller geometries and enforce each cell to store multiple bits in order to largely reduce its cost. Unfortunately, these scaling down techniques inherently degrade the endurance and reliability of flash memory. As a result, permanent errors such as block or die failures could occur with a higher possibility. While most transient errors like programming errors can be fixed by an ECC (error correction code) scheme, rectifying permanent errors requires a data redundancy mechanism like RAID (redundant array of independent disks) in a single SSD where multiple channels work in parallel. To enhance the reliability of a solid-state drive (SSD) while maintaining its performance, we first implement several common RAID structures in the channel level of a single SSD to understand their impact on an SSD's performance. Next, we propose a new data redundancy architecture called CR5M (Channel-RAID5 with Mirroring), which can be applied to one SSD for mission-critical applications. CR5M utilizes hidden mirror chips to accelerate the performance of small writes. Finally, we conduct extensive simulations using real-world traces and synthetic benchmarks on a validated simulator to evaluate CR5M. Experimental results demonstrate that compared with CR5 (Channel-RAID5) CR5M decreases mean response time by up to 25.8%. Besides, it reduces the average writes per channel by up to 23.6%.","PeriodicalId":188071,"journal":{"name":"2014 30th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128050163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CBM: A cooperative buffer management for SSD
Pub Date: 2014-06-02 | DOI: 10.1109/MSST.2014.6855545
Q. Wei, Cheng Chen, Jun Yang
Random writes significantly limit the applicability of solid state drives (SSDs) in I/O-intensive applications such as scientific computing, Web services, and databases. While several buffer management algorithms have been proposed to reduce random writes, their ability to deal with workloads that mix sequential and random accesses is limited. In this paper, we propose a cooperative buffer management scheme, referred to as CBM, which coordinates the write buffer and read cache to fully exploit temporal and spatial locality in I/O-intensive workloads. To improve both the buffer hit rate and destage sequentiality, CBM divides the write buffer space into a Page Region and a Block Region. Randomly written data is placed in the Page Region at page granularity, while sequentially written data is stored in the Block Region at block granularity. CBM leverages threshold-based migration to dynamically distinguish random writes from sequential writes. When a block is evicted from the write buffer, CBM merges the dirty pages in the write buffer with the clean pages in the read cache belonging to the evicted block, to maximize the chance of forming a full block write. CBM has been extensively evaluated with simulation and a real implementation on OpenSSD. Our testing results conclusively demonstrate that CBM can achieve up to 84% performance improvement and 85% garbage collection overhead reduction compared to existing buffer management schemes.
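The eviction-time merge is the part of CBM that most directly turns partial writes into full block writes; here is a hedged sketch of that single step. The data structures, names, and page-per-block count are illustrative, not the paper's implementation.

```python
# Sketch of CBM-style destaging: combine the evicted block's dirty pages from
# the write buffer with clean pages of the same block from the read cache, so
# the device ideally receives one full, sequential block write.
PAGES_PER_BLOCK = 64   # illustrative geometry

def destage_block(block_id, write_buffer, read_cache):
    """write_buffer / read_cache: dict keyed by (block_id, page_no) -> page data.
    Returns the merged pages and whether they form a full block write."""
    merged = {}
    for (blk, pno), data in write_buffer.items():    # dirty pages take priority
        if blk == block_id:
            merged[pno] = data
    for (blk, pno), data in read_cache.items():      # fill gaps with clean cached pages
        if blk == block_id and pno not in merged:
            merged[pno] = data
    for pno in list(merged):                         # destaged dirty pages leave the buffer
        write_buffer.pop((block_id, pno), None)
    return merged, len(merged) == PAGES_PER_BLOCK    # caller issues one block write if full
```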
{"title":"CBM: A cooperative buffer management for SSD","authors":"Q. Wei, Cheng Chen, Jun Yang","doi":"10.1109/MSST.2014.6855545","DOIUrl":"https://doi.org/10.1109/MSST.2014.6855545","url":null,"abstract":"Random writes significantly limit the application of Solid State Drive (SSD) in the I/O intensive applications such as scientific computing, Web services, and database. While several buffer management algorithms are proposed to reduce random writes, their ability to deal with workloads mixed with sequential and random accesses is limited. In this paper, we propose a cooperative buffer management scheme referred to as CBM, which coordinates write buffer and read cache to fully exploit temporal and spatial localities among I/O intensive workload. To improve both buffer hit rate and destage sequentiality, CBM divides write buffer space into Page Region and Block Region. Randomly written data is put in the Page Region at page granularity, while sequentially written data is stored in the Block Region at block granularity. CBM leverages threshold-based migration to dynamically classify random write from sequential writes. When a block is evicted from write buffer, CBM merges the dirty pages in write buffer and the clean pages in read cache belonging to the evicted block to maximize the possibility of forming full block write. CBM has been extensively evaluated with simulation and real implementation on OpenSSD. Our testing results conclusively demonstrate that CBM can achieve up to 84% performance improvement and 85% garbage collection overhead reduction compared to existing buffer management schemes.","PeriodicalId":188071,"journal":{"name":"2014 30th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121229534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Exploiting parallelism in I/O scheduling for access conflict minimization in flash-based solid state drives
Pub Date: 2014-06-02 | DOI: 10.1109/MSST.2014.6855544
Congming Gao, Liang Shi, Mengying Zhao, C. Xue, Kaijie Wu, E. Sha
Solid state drives (SSDs) have been widely deployed in personal computers, data centers, and cloud storage. In order to improve performance, SSDs are usually constructed with a number of channels, with each channel connecting to a number of NAND flash chips. Despite the rich parallelism offered by multiple channels and multiple chips per channel, recent studies show that the utilization of flash chips (i.e., the number of flash chips being accessed simultaneously) is seriously low. Our study shows that this low chip utilization is caused by access conflicts among I/O requests. In this work, we propose Parallel Issue Queuing (PIQ), a novel I/O scheduler at the host system, to minimize access conflicts between I/O requests. The proposed PIQ schedules I/O requests without conflicts into the same batch and I/O requests with conflicts into different batches, so that the multiple I/O requests in one batch can be fulfilled simultaneously by exploiting the rich parallelism of the SSD. Because PIQ is implemented at the host side, it can take advantage of the rich resources of the host system, such as main memory and CPU, which makes its overhead negligible. Extensive experimental results show that PIQ delivers significant performance improvement for applications that have heavy access conflicts.
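To illustrate the batching idea in the abstract, here is a small sketch that places requests touching disjoint flash chips into the same batch and conflicting requests into later batches. The page-to-chip mapping, single-page requests, and first-fit placement are assumptions for illustration; the real PIQ scheduler also has to handle multi-page requests and request ordering.

```python
# Conflict-aware batching sketch: requests in one batch touch distinct chips
# and can be issued in parallel; same-chip requests land in separate batches.
def chip_of(lpn, nchannels=8, chips_per_channel=4):
    """Illustrative static mapping from logical page number to (channel, chip)."""
    return (lpn % nchannels, (lpn // nchannels) % chips_per_channel)

def build_batches(requests):
    """requests: iterable of (lpn, op). First-fit each request into the earliest
    batch whose claimed chips do not conflict with it."""
    batches = []                      # list of (claimed_chip_set, request_list)
    for req in requests:
        chip = chip_of(req[0])
        for claimed, batch in batches:
            if chip not in claimed:   # no conflict: join this batch
                claimed.add(chip)
                batch.append(req)
                break
        else:                         # conflicts with every open batch: start a new one
            batches.append(({chip}, [req]))
    return [batch for _, batch in batches]

# for batch in build_batches(pending_requests):   # pending_requests is hypothetical
#     issue_in_parallel(batch)
```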
{"title":"Exploiting parallelism in I/O scheduling for access conflict minimization in flash-based solid state drives","authors":"Congming Gao, Liang Shi, Mengying Zhao, C. Xue, Kaijie Wu, E. Sha","doi":"10.1109/MSST.2014.6855544","DOIUrl":"https://doi.org/10.1109/MSST.2014.6855544","url":null,"abstract":"Solid state drives (SSDs) have been widely deployed in personal computers, data centers, and cloud storages. In order to improve performance, SSDs are usually constructed with a number of channels with each channel connecting to a number of NAND flash chips. Despite the rich parallelism offered by multiple channels and multiple chips per channel, recent studies show that the utilization of flash chips (i.e. the number of flash chips being accessed simultaneously) is seriously low. Our study shows that the low chip utilization is caused by the access conflict among I/O requests. In this work, we propose Parallel Issue Queuing (PIQ), a novel I/O scheduler at the host system, to minimize the access conflicts between I/O requests. The proposed PIQ schedules I/O requests without conflicts into the same batch and I/O requests with conflicts into different batches. Hence the multiple I/O requests in one batch can be fulfilled simultaneously by exploiting the rich parallelism of SSD. And because PIQ is implemented at the host side, it can take advantage of rich resource at host system such as main memory and CPU, which makes the overhead negligible. Extensive experimental results show that PIQ delivers significant performance improvement to the applications that have heavy access conflicts.","PeriodicalId":188071,"journal":{"name":"2014 30th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121495601","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
H-ARC: A non-volatile memory based cache policy for solid state drives
Pub Date: 2014-06-02 | DOI: 10.1109/MSST.2014.6855546
Ziqi Fan, D. Du, Doug Voigt
With the rapid development of new types of non-volatile memory (NVM), one of these technologies may replace DRAM as the main memory in the near future. Some drawbacks of DRAM, such as data loss due to power failure or a system crash, can be remedied by NVM's non-volatile nature. In the meantime, solid state drives (SSDs) are becoming widely deployed as storage devices thanks to their faster random access compared with traditional hard disk drives (HDDs). For applications demanding higher reliability and better performance, using NVM as the main memory and SSDs as storage devices becomes a promising architecture. Although SSDs have better performance than HDDs, SSDs cannot support in-place updates (i.e., an erase operation has to be performed before a page can be updated) and suffer from a low-endurance problem: each unit wears out after a certain number of erase operations. In an NVM-based main memory, updated pages, called dirty pages, can be kept longer without the urgent need to be flushed to SSDs. This difference opens an opportunity to design new cache policies that help extend the lifespan of SSDs by wisely choosing cache eviction victims to decrease storage write traffic. However, it is very challenging to design a policy that also increases the cache hit ratio for better system performance. Most existing DRAM-based cache policies have mainly concentrated on the recency or frequency status of a page. On the other hand, most existing NVM-based cache policies have mainly focused on the dirty or clean status of a page. In this paper, by extending the concept of the Adaptive Replacement Cache (ARC), we propose a Hierarchical Adaptive Replacement Cache (H-ARC) policy that considers all four factors of a page's status: dirty, clean, recency, and frequency. Specifically, at the higher level, H-ARC adaptively splits the whole cache space into a dirty-page cache and a clean-page cache. At the lower level, inside the dirty-page cache and the clean-page cache, H-ARC splits each into a recency-page cache and a frequency-page cache. During the page eviction process, all parts of the cache are balanced towards their desired sizes.
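For concreteness, here is a structural Python sketch of the two-level split the abstract describes: a dirty/clean partition at the top and a recency/frequency list inside each partition. It only shows the bookkeeping; the ghost lists and the adaptation rules that tune the targets online are omitted, and the fixed 50% dirty target and eviction order are simplifying assumptions, not the H-ARC algorithm itself.

```python
# Structural sketch of H-ARC's hierarchy (illustrative only).
from collections import OrderedDict

class HARCSketch:
    def __init__(self, capacity):
        self.cap = capacity
        self.target_dirty = capacity // 2          # adapted online in the real policy
        self.lists = {(k, l): OrderedDict()        # (dirty/clean) x (recency T1 / frequency T2)
                      for k in ("dirty", "clean") for l in ("T1", "T2")}

    def _size(self, kind):
        return len(self.lists[(kind, "T1")]) + len(self.lists[(kind, "T2")])

    def access(self, page, is_write):
        for (k, _), lst in self.lists.items():     # hit: promote to a frequency list
            if page in lst:
                del lst[page]
                kind = "dirty" if (is_write or k == "dirty") else "clean"
                self.lists[(kind, "T2")][page] = True
                return None
        victim = None
        if self._size("dirty") + self._size("clean") >= self.cap:
            victim = self._evict()                 # dirty victims must be flushed to the SSD
        kind = "dirty" if is_write else "clean"
        self.lists[(kind, "T1")][page] = True      # miss: insert into the recency list
        return victim

    def _evict(self):
        # Evict from the partition exceeding its target, so clean pages are
        # preferred while the dirty partition stays within budget (reducing
        # SSD write traffic).
        kind = "dirty" if self._size("dirty") > self.target_dirty else "clean"
        if self._size(kind) == 0:
            kind = "clean" if kind == "dirty" else "dirty"
        for l in ("T1", "T2"):                     # simplified: recency list first
            if self.lists[(kind, l)]:
                page, _ = self.lists[(kind, l)].popitem(last=False)
                return (page, kind == "dirty")
```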
{"title":"H-ARC: A non-volatile memory based cache policy for solid state drives","authors":"Ziqi Fan, D. Du, Doug Voigt","doi":"10.1109/MSST.2014.6855546","DOIUrl":"https://doi.org/10.1109/MSST.2014.6855546","url":null,"abstract":"With the rapid development of new types of nonvolatile memory (NVM), one of these technologies may replace DRAM as the main memory in the near future. Some drawbacks of DRAM, such as data loss due to power failure or a system crash can be remedied by NVM's non-volatile nature. In the meantime, solid state drives (SSDs) are becoming widely deployed as storage devices for faster random access speed compared with traditional hard disk drives (HDDs). For applications demanding higher reliability and better performance, using NVM as the main memory and SSDs as storage devices becomes a promising architecture. Although SSDs have better performance than HDDs, SSDs cannot support in-place updates (i.e., an erase operation has to be performed before a page can be updated) and suffer from a low endurance problem that each unit will wear out after certain number of erase operations. In an NVM based main memory, any updated pages called dirty pages can be kept longer without the urgent need to be flushed to SSDs. This difference opens an opportunity to design new cache policies that help extend the lifespan of SSDs by wisely choosing cache eviction victims to decrease storage write traffic. However, it is very challenging to design a policy that can also increase the cache hit ratio for better system performance. Most existing DRAM-based cache policies have mainly concentrated on the recency or frequency status of a page. On the other hand, most existing NVM-based cache policies have mainly focused on the dirty or clean status of a page. In this paper, by extending the concept of the Adaptive Replacement Cache (ARC), we propose a Hierarchical Adaptive Replacement Cache (H-ARC) policy that considers all four factors of a page's status: dirty, clean, recency, and frequency. Specifically, at the higher level, H-ARC adaptively splits the whole cache space into a dirty-page cache and a clean-page cache. At the lower level, inside the dirty-page cache and the clean-page cache, H-ARC splits them into a recency-page cache and a frequency-page cache separately. During the page eviction process, all parts of the cache will be balanced towards to their desired sizes.","PeriodicalId":188071,"journal":{"name":"2014 30th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121852377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Anode: Empirical detection of performance problems in storage systems using time-series analysis of periodic measurements
Pub Date: 2014-06-02 | DOI: 10.1109/MSST.2014.6855551
Vipul Mathur, Cijo George, J. Basak
Performance problems are particularly hard to detect and diagnose in most computer systems, since there is no clear failure apart from the system being slow. In this paper, we present an empirical, data-driven methodology for detecting performance problems in data storage systems and aiding in quick diagnosis once a problem is detected. The key feature of our solution is that it uses a combination of time-series analysis, domain knowledge, and expert inputs to improve overall efficacy. Our solution learns from a system's own history to establish a baseline of normal behavior; hence it is not necessary to determine static trigger levels for metrics to raise alerts. Static triggers are ineffective because each system and its workloads differ from others. The method presented here (a) gives accurate indications of the time period when something goes wrong in a system, and (b) helps pinpoint the most affected parts of the system to aid in diagnosis. Validation on more than 400 actual field support cases shows about an 85% true positive rate with less than a 10% false positive rate in identifying time periods of performance impact before or during the time a case was open. Results in a controlled lab environment are even better.
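As a hedged illustration of the "baseline from the system's own history" idea (not Anode's actual algorithm), the sketch below learns a per-hour-of-week mean and standard deviation for a metric such as latency and flags sustained deviations from that learned normal range. The weekly seasonality, the 3-sigma threshold, and the run-length filter are illustrative assumptions.

```python
# Baseline-from-history anomaly flagging sketch for a periodic storage metric.
import numpy as np

def learn_baseline(hour_of_week, values, periods=168):
    """hour_of_week, values: 1-D numpy arrays of equal length (training history)."""
    means, stds = np.zeros(periods), np.ones(periods)
    for p in range(periods):
        sample = values[hour_of_week == p]
        if len(sample) > 1:
            means[p], stds[p] = sample.mean(), sample.std() + 1e-9
    return means, stds

def flag_anomalies(hour_of_week, values, means, stds, k=3.0, min_run=3):
    """Flag time periods whose z-score exceeds k for at least min_run samples,
    which suppresses one-off spikes and highlights sustained impact windows."""
    z = np.abs(values - means[hour_of_week]) / stds[hour_of_week]
    hot = z > k
    flags = np.zeros_like(hot)
    run = 0
    for i, h in enumerate(hot):
        run = run + 1 if h else 0
        if run >= min_run:
            flags[i - min_run + 1:i + 1] = True
    return flags
```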
{"title":"Anode: Empirical detection of performance problems in storage systems using time-series analysis of periodic measurements","authors":"Vipul Mathur, Cijo George, J. Basak","doi":"10.1109/MSST.2014.6855551","DOIUrl":"https://doi.org/10.1109/MSST.2014.6855551","url":null,"abstract":"Performance problems are particularly hard to detect and diagnose in most computer systems, since there is no clear failure apart from the system being slow. In this paper, we present an empirical, data-driven methodology for detecting performance problems in data storage systems, and aiding in quick diagnosis once a problem is detected. The key feature of our solution is that it uses a combination of time-series analysis, domain knowledge and expert inputs to improve the overall efficacy. Our solution learns from a system's own history to establish the baseline of normal behavior. Hence it is not necessary to determine any static trigger-levels for metrics to raise alerts. Static triggers are ineffective since each system and its workloads are different from others. The method presented here (a) gives accurate indications of the time period when something goes wrong in a system, and (b) helps pin-point the most affected parts of the system to aid in diagnosis. Validation on more than 400 actual field support cases shows about 85% true positive rate with less than 10% false positive rate in identifying time periods of performance impact before or during the time a case was open. Results in a controlled lab environment are even better.","PeriodicalId":188071,"journal":{"name":"2014 30th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116010282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
HiSMRfs: A high performance file system for shingled storage array
Pub Date: 2014-06-02 | DOI: 10.1109/MSST.2014.6855539
Chao Jin, Weiya Xi, Z. Ching, Feng Huo, Chun-Teck Lim
HiSMRfs, a file system with a standard POSIX interface suitable for Shingled Magnetic Recording (SMR) drives, has been designed and developed. HiSMRfs can manage raw SMR drives and support random writes without a remapping layer implemented inside the SMR drives. To achieve high performance, HiSMRfs separates data and metadata storage and manages them differently. Metadata is managed using in-memory tree structures and stored in a high-performance random-write area, such as an SSD. Data is written in a sequential, append-only style and stored on an SMR drive. HiSMRfs includes a file/object-based RAID module for SMR/HDD arrays. The RAID module computes parity for individual files/objects and guarantees that data and parity writes are 100% sequential and full-stripe. HiSMRfs is also suitable for a hybrid storage system with conventional HDDs and SSDs. Two prototype systems with HiSMRfs have been developed, and their performance has been tested and compared with SMRfs and Flashcache. The experimental tests show that HiSMRfs performs 25% better than SMRfs and 11% better than the Flashcache system.
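The core layout decision, appending data sequentially on the SMR area while keeping metadata on a fast random-write device, can be illustrated with a toy sketch. This is only a user-space illustration under assumed names and formats; the real HiSMRfs uses in-memory tree structures, a POSIX file-system interface, and a file-level RAID module that are not shown here.

```python
# Toy data/metadata separation: append-only data log (SMR stand-in) plus a
# small metadata index persisted with random writes (SSD stand-in).
import json
import os

class HiSMRSketch:
    def __init__(self, data_path, meta_path):
        self.data = open(data_path, "ab")       # SMR area: strictly append-only
        self.meta_path = meta_path              # SSD area: small random writes are cheap
        self.index = {}                         # path -> list of [offset, length] extents

    def append(self, path, buf: bytes):
        offset = self.data.seek(0, os.SEEK_END) # next sequential write position
        self.data.write(buf)                    # data only ever appended
        self.data.flush()
        self.index.setdefault(path, []).append([offset, len(buf)])
        with open(self.meta_path, "w") as f:    # metadata updated on the fast device
            json.dump(self.index, f)

    def read(self, path) -> bytes:
        out = bytearray()
        with open(self.data.name, "rb") as f:
            for offset, length in self.index.get(path, []):
                f.seek(offset)
                out += f.read(length)
        return bytes(out)

# fs = HiSMRSketch("/tmp/smr.log", "/tmp/meta.json")
# fs.append("file1", b"hello"); print(fs.read("file1"))
```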
{"title":"HiSMRfs: A high performance file system for shingled storage array","authors":"Chao Jin, Weiya Xi, Z. Ching, Feng Huo, Chun-Teck Lim","doi":"10.1109/MSST.2014.6855539","DOIUrl":"https://doi.org/10.1109/MSST.2014.6855539","url":null,"abstract":"HiSMRfs a file system with standard POSIX interface suitable for Shingled Magnetic Recording (SMR) drives, has been designed and developed. HiSMRfs can manage raw SMR drives and support random writes without remapping layer implemented inside SMR drives. To achieve high performance, HiSMRfs separates data and metadata storage, and manages them differently. Metadata is managed using in-memory tree structures and stored in a high performance random write area such as in a SSD. Data writing is done through sequential appending style and store in a SMR drive. HiSMRfs includes a file/object-based RAID module for SMR/HDD arrays. The RAID module computes parity for individual files/objects and guarantees that data and parity writing are 100% in sequential and in full stripe. HiSMRfs is also suitable for a hybrid storage system with conventional HDDs and SSDs. Two prototype systems with HiSMRfs have been developed. The performance has been tested and compared with SMRfs and Flashcache. The experimental tests show that HiSMRfs performs 25% better than SMRfs, and 11% better than Flashcache system.","PeriodicalId":188071,"journal":{"name":"2014 30th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115692368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}