Automatic generation of behavioral hard disk drive access time models
Pub Date: 2014-06-02 | DOI: 10.1109/MSST.2014.6855553
A. Crume, C. Maltzahn, L. Ward, Thomas M. Kroeger, M. Curry
Predicting access times is a crucial part of predicting hard disk drive performance. Existing approaches use white-box modeling and require intimate knowledge of the internal layout of the drive, which can take months to extract. Automatically learning this behavior is a much more desirable approach, requiring less expert knowledge, fewer assumptions, and less time. While previous research has created black-box models of hard disk drive performance, none has shown low per-request errors. A barrier to machine learning of access times has been the existence of periodic behavior with high, unknown frequencies. We identify these high frequencies with Fourier analysis and include them explicitly as input to the model. In this paper we focus on the simulation of access times for random read workloads within a single zone. We are able to automatically generate and tune request-level access time models with a mean absolute error of less than 0.15 ms. To our knowledge, this is the first time such fidelity has been achieved on modern disk drives using machine learning. We are confident that our approach forms the core of automatic generation of access time models that cover other workloads and span entire disk drives, but more work remains.
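To make the Fourier step concrete, here is a minimal Python sketch of the general idea: detect the dominant periodic frequencies in measured access times and hand them to a learned regressor as explicit sine/cosine features. The function names, the use of scikit-learn's MLPRegressor, and the feature construction are illustrative assumptions, not the authors' implementation.

```python
# Sketch only: find dominant periodicities in access times with an FFT and
# expose them to a regression model as explicit sin/cos features.
import numpy as np
from sklearn.neural_network import MLPRegressor  # assumed model choice

def dominant_frequencies(distances, access_times, n_freqs=3):
    """distances, access_times: 1-D numpy arrays from a random-read trace.
    Returns the n_freqs strongest frequencies of access time vs. distance."""
    order = np.argsort(distances)
    spectrum = np.abs(np.fft.rfft(access_times[order] - access_times.mean()))
    step = max(float(np.median(np.diff(distances[order]))), 1e-9)
    freqs = np.fft.rfftfreq(len(distances), d=step)
    return freqs[np.argsort(spectrum)[-n_freqs:]]

def fourier_features(distances, freqs):
    """Augment the raw distance with sin/cos terms at the detected frequencies."""
    cols = [distances]
    for f in freqs:
        cols += [np.sin(2 * np.pi * f * distances), np.cos(2 * np.pi * f * distances)]
    return np.column_stack(cols)

# Hypothetical usage on (seek distance -> access time) pairs:
# distances, times = load_trace(...)            # load_trace is a placeholder
# freqs = dominant_frequencies(distances, times)
# model = MLPRegressor(hidden_layer_sizes=(64, 64)).fit(
#     fourier_features(distances, freqs), times)
```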
{"title":"Automatic generation of behavioral hard disk drive access time models","authors":"A. Crume, C. Maltzahn, L. Ward, Thomas M. Kroeger, M. Curry","doi":"10.1109/MSST.2014.6855553","DOIUrl":"https://doi.org/10.1109/MSST.2014.6855553","url":null,"abstract":"Predicting access times is a crucial part of predicting hard disk drive performance. Existing approaches use white-box modeling and require intimate knowledge of the internal layout of the drive, which can take months to extract. Automatically learning this behavior is a much more desirable approach, requiring less expert knowledge, fewer assumptions, and less time. While previous research has created black-box models of hard disk drive performance, none have shown low per-request errors. A barrier to machine learning of access times has been the existence of periodic behavior with high, unknown frequencies. We identify these high frequencies with Fourier analysis and include them explicitly as input to the model. In this paper we focus on the simulation of access times for random read workloads within a single zone. We are able to automatically generate and tune request-level access time models with mean absolute error less than 0.15 ms. To our knowledge this is the first time such a fidelity has been achieved with modern disk drives using machine learning. We are confident that our approach forms the core for automatic generation of access time models that include other workloads and span across entire disk drives, but more work remains.","PeriodicalId":188071,"journal":{"name":"2014 30th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"193 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115184106","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The case for sampling on very large file systems
Pub Date: 2014-06-02 | DOI: 10.1109/MSST.2014.6855542
George Goldberg, Danny Harnik, D. Sotnikov
Sampling has long been a prominent tool in statistics and analytics, first and foremost when very large amounts of data are involved. In the realm of very large file systems (and hierarchical data stores in general), however, sampling has mostly been ignored, and for several good reasons: mainly, running sampling in such an environment introduces technical challenges that can make the entire sampling process non-beneficial. In this work we demonstrate that there are cases for which sampling is very worthwhile in very large file systems. We address this topic in two aspects: (a) the technical side, where we design and implement an efficient weighted sampling solution that is distributed, one-pass, and addresses multiple efficiency concerns; and (b) the usability aspect, in which we demonstrate several use cases where weighted sampling over large file systems is extremely beneficial. In particular, we show use cases regarding estimation of compression ratios, testing and auditing, and offline collection of statistics on very large data stores.
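A standard building block for one-pass, distributed, weighted sampling is Efraimidis-Spirakis weighted reservoir sampling, where each item gets the key u^(1/w) and the k largest keys are kept; per-node results merge trivially. The sketch below illustrates that technique as background; it is not necessarily the exact algorithm used in the paper, and the file-walking helper in the usage comment is hypothetical.

```python
# Weighted reservoir sampling (Efraimidis-Spirakis "A-Res"): one pass,
# mergeable across nodes, so it suits large distributed file-system scans.
import heapq
import random

def weighted_reservoir_sample(stream, k):
    """stream yields (item, weight); keep the k items with the largest keys
    u**(1/weight), which is a weighted sample without replacement."""
    heap = []  # min-heap of (key, item)
    for item, weight in stream:
        key = random.random() ** (1.0 / weight)
        if len(heap) < k:
            heapq.heappush(heap, (key, item))
        elif key > heap[0][0]:
            heapq.heapreplace(heap, (key, item))
    return heap

def merge_samples(partial_heaps, k):
    """Distributed variant: each node samples locally; the global sample is
    simply the k entries with the largest keys across all partial results."""
    return heapq.nlargest(k, (entry for heap in partial_heaps for entry in heap))

# Hypothetical usage, weighting files by size to estimate compression ratio:
# sample = weighted_reservoir_sample(walk_files("/fs"), k=10_000)  # walk_files is a placeholder
```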
{"title":"The case for sampling on very large file systems","authors":"George Goldberg, Danny Harnik, D. Sotnikov","doi":"10.1109/MSST.2014.6855542","DOIUrl":"https://doi.org/10.1109/MSST.2014.6855542","url":null,"abstract":"Sampling has long been a prominent tool in statistics and analytics, first and foremost when very large amounts of data are involved. In the realm of very large file systems (and hierarchical data stores in general), however, sampling has mostly been ignored and for several good reasons. Mainly, running sampling in such an environment introduces technical challenges that make the entire sampling process non-beneficial. In this work we demonstrate that there are cases for which sampling is very worthwhile in very large file systems. We address this topic in two aspect: (a) the technical side where we design and implement solutions to efficient weighted sampling that is also distributed, one-pass and addresses multiple efficiency aspects; and (b) the usability aspect in which we demonstrate several use-cases in which weighted sampling over large file systems is extremely beneficial. In particular, we show use-cases regarding estimation of compression ratios, testing and auditing and offline collection of statistics on very large data stores.","PeriodicalId":188071,"journal":{"name":"2014 30th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127319897","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Analytical modeling of garbage collection algorithms in hotness-aware flash-based solid state drives
Pub Date: 2014-06-02 | DOI: 10.1109/MSST.2014.6855534
Yue Yang, Jianwen Zhu
Garbage collection plays a central role in the performance of flash-based solid state drives and, in particular, in their endurance. Analytical modeling is an indispensable instrument for design improvement, as it reveals the relationship between SSD endurance, manifested as write amplification, and the algorithmic design variables as well as workload characteristics. In this paper, we improve upon recent advances in using mean field analysis as a tool for performance analysis, targeting hotness-aware flash management algorithms. We show that even under a generic workload model, the system dynamics can be captured by a system of ordinary differential equations, and the steady-state write amplification can be predicted for a variety of practical garbage collection algorithms, including the d-Choice algorithm. Furthermore, the analytical model is validated against a large collection of real and synthetic traces, and prediction errors against these simulations are shown to be within 5%.
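The ODE derivation and the hotness-aware model are in the paper itself; as a hedged companion, the following Monte-Carlo sketch estimates steady-state write amplification for a d-choices garbage collector under uniform random writes, using the common round-based model (one victim block is reclaimed per round, its live pages are relocated, and the rest of the fresh block is filled with user writes). All parameters are illustrative; this is not the authors' simulator.

```python
# Round-based Monte-Carlo estimate of write amplification for d-choices GC
# under uniform random page writes (d=1 degenerates to random victim selection).
import random

def write_amplification(d=4, nblocks=1024, ppb=64, spare=0.10, rounds=50_000):
    nlogical = int(nblocks * ppb * (1 - spare))
    live = [set() for _ in range(nblocks)]   # live logical pages per block
    where = {}                               # logical page -> block holding it
    for lpn in range(nlogical):              # initial sequential fill
        live[lpn // ppb].add(lpn)
        where[lpn] = lpn // ppb
    user_writes = relocations = 0
    for _ in range(rounds):
        # d-choices: sample d blocks, reclaim the one with the fewest live pages
        victim = min(random.sample(range(nblocks), d), key=lambda b: len(live[b]))
        relocations += len(live[victim])     # copy out still-valid pages
        for _ in range(ppb - len(live[victim])):   # fill the rest with user writes
            lpn = random.randrange(nlogical)
            live[where[lpn]].discard(lpn)    # invalidate the old copy
            live[victim].add(lpn)
            where[lpn] = victim
            user_writes += 1
    return (user_writes + relocations) / user_writes

# print(write_amplification(d=4), write_amplification(d=1))
```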
{"title":"Analytical modeling of garbage collection algorithms in hotness-aware flash-based solid state drives","authors":"Yue Yang, Jianwen Zhu","doi":"10.1109/MSST.2014.6855534","DOIUrl":"https://doi.org/10.1109/MSST.2014.6855534","url":null,"abstract":"Garbage collection plays a central role of flash-based solid state drive performance, in particular, its endurance. Analytical modeling is an indispensable instrument for design improvement as it demonstrates the relationship between SSD endurance, manifested as write amplification, and the algorithmic design variables, as well as workload characteristics. In this paper, we improve recent advances in using the mean field analysis as a tool for performance analysis and target hotness-aware flash management algorithms. We show that even under a generic workload model, the system dynamics can be captured by a system of ordinary differential equations, and the steady-state write amplification can be predicted for a variety of practical garbage collection algorithms, including the d-Choice algorithm. Furthermore, the analytical model is validated by a large collection of real and synthetic traces, and prediction errors against these simulations are shown to be within 5%.","PeriodicalId":188071,"journal":{"name":"2014 30th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133662621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A protected block device for Persistent Memory
Pub Date: 2014-06-02 | DOI: 10.1109/MSST.2014.6855541
Feng Chen, M. Mesnier, Scott Hahn
Persistent Memory (PM) technologies, such as Phase Change Memory, STT-RAM, and memristors, are receiving increasing interest in academia and industry. PM provides many attractive features, such as DRAM-like speed and storage-like persistence. Yet, because it blurs the line between memory and storage, neither a memory-based nor a storage-based model is a natural fit. How best to integrate PM into existing systems has become a challenge and is now a top priority for many. In this paper we share our initial approach to integrating PM into computer systems with minimal impact on the core operating system. By adopting a hybrid storage model, all of our changes are confined to a block storage driver, called PMBD, which directly accesses PM attached to the memory bus and exposes a logical block I/O interface to users. We explore the design space by examining a variety of options to achieve performance, protection from stray writes, ordered persistence, and compatibility with legacy file systems and applications. All told, we find that by using a combination of existing OS mechanisms (per-core page table mappings, non-temporal store instructions, memory fences, and I/O barriers), we are able to achieve each of these goals with small performance overhead for both micro-benchmarks and real-world applications (e.g., file server and database workloads). Our experience suggests that determining the right combination of existing platform and OS mechanisms is a non-trivial exercise. In this paper, we share both our failed and successful attempts. The final solution that we propose represents an evolution of our initial approach. We have also open-sourced our software prototype with all attempted design options to encourage further research in this area.
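PMBD itself is a Linux kernel block driver built on per-core page-table mappings, non-temporal stores, and fences, which have no direct user-space Python equivalent. Purely as a loose illustration of the hybrid model described above (a logical block read/write interface over byte-addressable memory, with the target region mapped only for the duration of a write and flushed before completion), the sketch below uses a file-backed mmap and msync as stand-ins. The class name, block size, and mechanisms here are assumptions for illustration, not the PMBD design.

```python
# Loose user-space analogue of a "protected block device over PM": map only
# the block being written, for only as long as the write lasts, and flush
# (msync) before returning so the write is durable and ordered.
import mmap
import os

BLOCK = 4096
assert BLOCK % mmap.ALLOCATIONGRANULARITY == 0  # offsets must be page-aligned

class PMBlockDev:
    def __init__(self, path, nblocks):
        self.path, self.nblocks = path, nblocks
        with open(path, "wb") as f:
            f.truncate(nblocks * BLOCK)          # file stands in for the PM region

    def write_block(self, bno, data: bytes):
        assert len(data) == BLOCK and 0 <= bno < self.nblocks
        fd = os.open(self.path, os.O_RDWR)
        try:
            m = mmap.mmap(fd, BLOCK, offset=bno * BLOCK)  # temporary mapping window
            m[:] = data
            m.flush()                            # msync: durable before completion
            m.close()
        finally:
            os.close(fd)                         # region is unmapped outside writes

    def read_block(self, bno) -> bytes:
        with open(self.path, "rb") as f:
            f.seek(bno * BLOCK)
            return f.read(BLOCK)

# dev = PMBlockDev("/tmp/pm.img", nblocks=1024)
# dev.write_block(7, b"\xab" * BLOCK)
```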
{"title":"A protected block device for Persistent Memory","authors":"Feng Chen, M. Mesnier, Scott Hahn","doi":"10.1109/MSST.2014.6855541","DOIUrl":"https://doi.org/10.1109/MSST.2014.6855541","url":null,"abstract":"Persistent Memory (PM) technologies, such as Phase Change Memory, STT-RAM, and memristors, are receiving increasingly high interest in academia and industry. PM provides many attractive features, such as DRAM-like speed and storage-like persistence. Yet, because it draws a blurry line between memory and storage, neither a memory- or storage-based model is a natural fit. Best integrating PM into existing systems has become challenging and is now a top priority for many. In this paper we share our initial approach to integrating PM into computer systems, with minimal impact to the core operating system. By adopting a hybrid storage model, all of our changes are confined to a block storage driver, called PMBD, which directly accesses PM attached to the memory bus and exposes a logical block I/O interface to users. We explore the design space by examining a variety of options to achieve performance, protection from stray writes, ordered persistence, and compatibility for legacy file systems and applications. All told, we find that by using a combination of existing OS mechanisms (per-core page table mappings, non-temporal store instructions, memory fences, and I/O barriers), we are able to achieve each of these goals with small performance overhead for both micro-benchmarks and real world applications (e.g., file server and database workloads). Our experience suggests that determining the right combination of existing platform and OS mechanisms is a non-trivial exercise. In this paper, we share both our failed and successful attempts. The final solution that we propose represents an evolution of our initial approach. We have also open-sourced our software prototype with all attempted design options to encourage further research in this area.","PeriodicalId":188071,"journal":{"name":"2014 30th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"11 7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125007252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CR5M: A mirroring-powered channel-RAID5 architecture for an SSD
Pub Date: 2014-06-02 | DOI: 10.1109/MSST.2014.6855547
Yu Wang, Wei Wang, T. Xie, Wen Pan, Yanyan Gao, Yiming Ouyang
Manufacturers are continuously pushing NAND flash memory into smaller geometries and forcing each cell to store multiple bits in order to greatly reduce its cost. Unfortunately, these scaling techniques inherently degrade the endurance and reliability of flash memory. As a result, permanent errors such as block or die failures can occur with higher probability. While most transient errors, like programming errors, can be fixed by an ECC (error correction code) scheme, rectifying permanent errors requires a data redundancy mechanism like RAID (redundant array of independent disks) within a single SSD, where multiple channels work in parallel. To enhance the reliability of a solid-state drive (SSD) while maintaining its performance, we first implement several common RAID structures at the channel level of a single SSD to understand their impact on the SSD's performance. Next, we propose a new data redundancy architecture called CR5M (Channel-RAID5 with Mirroring), which can be applied to one SSD for mission-critical applications. CR5M utilizes hidden mirror chips to accelerate the performance of small writes. Finally, we conduct extensive simulations using real-world traces and synthetic benchmarks on a validated simulator to evaluate CR5M. Experimental results demonstrate that, compared with CR5 (Channel-RAID5), CR5M decreases mean response time by up to 25.8%. In addition, it reduces the average number of writes per channel by up to 23.6%.
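For background on why small writes are the pain point CR5M targets, the sketch below shows the standard RAID5 parity math applied at channel level: a full-stripe write computes parity directly, while a small write needs a read-modify-write of old data and old parity. This illustrates the baseline cost only, not CR5M's mirroring mechanism; chunk sizes and names are illustrative.

```python
# Standard RAID5 parity math (baseline cost that channel-RAID5 inherits).
from functools import reduce

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def full_stripe_parity(chunks):
    """Full-stripe write: parity is the XOR of all data chunks, no extra reads."""
    return reduce(xor, chunks)

def small_write_parity(old_data, new_data, old_parity):
    """Small (sub-stripe) write: read-modify-write -- two extra reads (old data,
    old parity) plus a parity write for every updated chunk."""
    return xor(old_parity, xor(old_data, new_data))

# chunks = [bytes([i]) * 4096 for i in range(4)]        # one chunk per channel
# parity = full_stripe_parity(chunks)
# parity = small_write_parity(chunks[0], b"\xff" * 4096, parity)
```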
{"title":"CR5M: A mirroring-powered channel-RAID5 architecture for an SSD","authors":"Yu Wang, Wei Wang, T. Xie, Wen Pan, Yanyan Gao, Yiming Ouyang","doi":"10.1109/MSST.2014.6855547","DOIUrl":"https://doi.org/10.1109/MSST.2014.6855547","url":null,"abstract":"Manufacturers are continuously pushing NAND flash memory into smaller geometries and enforce each cell to store multiple bits in order to largely reduce its cost. Unfortunately, these scaling down techniques inherently degrade the endurance and reliability of flash memory. As a result, permanent errors such as block or die failures could occur with a higher possibility. While most transient errors like programming errors can be fixed by an ECC (error correction code) scheme, rectifying permanent errors requires a data redundancy mechanism like RAID (redundant array of independent disks) in a single SSD where multiple channels work in parallel. To enhance the reliability of a solid-state drive (SSD) while maintaining its performance, we first implement several common RAID structures in the channel level of a single SSD to understand their impact on an SSD's performance. Next, we propose a new data redundancy architecture called CR5M (Channel-RAID5 with Mirroring), which can be applied to one SSD for mission-critical applications. CR5M utilizes hidden mirror chips to accelerate the performance of small writes. Finally, we conduct extensive simulations using real-world traces and synthetic benchmarks on a validated simulator to evaluate CR5M. Experimental results demonstrate that compared with CR5 (Channel-RAID5) CR5M decreases mean response time by up to 25.8%. Besides, it reduces the average writes per channel by up to 23.6%.","PeriodicalId":188071,"journal":{"name":"2014 30th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128050163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CBM: A cooperative buffer management for SSD
Pub Date: 2014-06-02 | DOI: 10.1109/MSST.2014.6855545
Q. Wei, Cheng Chen, Jun Yang
Random writes significantly limit the applicability of solid state drives (SSDs) in I/O-intensive applications such as scientific computing, Web services, and databases. While several buffer management algorithms have been proposed to reduce random writes, their ability to deal with workloads that mix sequential and random accesses is limited. In this paper, we propose a cooperative buffer management scheme, referred to as CBM, which coordinates the write buffer and read cache to fully exploit temporal and spatial locality in I/O-intensive workloads. To improve both the buffer hit rate and destage sequentiality, CBM divides the write buffer space into a Page Region and a Block Region. Randomly written data is placed in the Page Region at page granularity, while sequentially written data is stored in the Block Region at block granularity. CBM leverages threshold-based migration to dynamically distinguish random writes from sequential writes. When a block is evicted from the write buffer, CBM merges the dirty pages in the write buffer with the clean pages in the read cache belonging to the evicted block, to maximize the chance of forming a full block write. CBM has been extensively evaluated with simulation and a real implementation on OpenSSD. Our testing results conclusively demonstrate that CBM can achieve up to 84% performance improvement and 85% garbage collection overhead reduction compared to existing buffer management schemes.
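The eviction-time merge is the part of CBM that most directly turns partial writes into full block writes; here is a hedged sketch of that single step. The data structures, names, and page-per-block count are illustrative, not the paper's implementation.

```python
# Sketch of CBM-style destaging: combine the evicted block's dirty pages from
# the write buffer with clean pages of the same block from the read cache, so
# the device ideally receives one full, sequential block write.
PAGES_PER_BLOCK = 64   # illustrative geometry

def destage_block(block_id, write_buffer, read_cache):
    """write_buffer / read_cache: dict keyed by (block_id, page_no) -> page data.
    Returns the merged pages and whether they form a full block write."""
    merged = {}
    for (blk, pno), data in write_buffer.items():    # dirty pages take priority
        if blk == block_id:
            merged[pno] = data
    for (blk, pno), data in read_cache.items():      # fill gaps with clean cached pages
        if blk == block_id and pno not in merged:
            merged[pno] = data
    for pno in list(merged):                         # destaged dirty pages leave the buffer
        write_buffer.pop((block_id, pno), None)
    return merged, len(merged) == PAGES_PER_BLOCK    # caller issues one block write if full
```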
{"title":"CBM: A cooperative buffer management for SSD","authors":"Q. Wei, Cheng Chen, Jun Yang","doi":"10.1109/MSST.2014.6855545","DOIUrl":"https://doi.org/10.1109/MSST.2014.6855545","url":null,"abstract":"Random writes significantly limit the application of Solid State Drive (SSD) in the I/O intensive applications such as scientific computing, Web services, and database. While several buffer management algorithms are proposed to reduce random writes, their ability to deal with workloads mixed with sequential and random accesses is limited. In this paper, we propose a cooperative buffer management scheme referred to as CBM, which coordinates write buffer and read cache to fully exploit temporal and spatial localities among I/O intensive workload. To improve both buffer hit rate and destage sequentiality, CBM divides write buffer space into Page Region and Block Region. Randomly written data is put in the Page Region at page granularity, while sequentially written data is stored in the Block Region at block granularity. CBM leverages threshold-based migration to dynamically classify random write from sequential writes. When a block is evicted from write buffer, CBM merges the dirty pages in write buffer and the clean pages in read cache belonging to the evicted block to maximize the possibility of forming full block write. CBM has been extensively evaluated with simulation and real implementation on OpenSSD. Our testing results conclusively demonstrate that CBM can achieve up to 84% performance improvement and 85% garbage collection overhead reduction compared to existing buffer management schemes.","PeriodicalId":188071,"journal":{"name":"2014 30th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121229534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Exploiting parallelism in I/O scheduling for access conflict minimization in flash-based solid state drives
Pub Date: 2014-06-02 | DOI: 10.1109/MSST.2014.6855544
Congming Gao, Liang Shi, Mengying Zhao, C. Xue, Kaijie Wu, E. Sha
Solid state drives (SSDs) have been widely deployed in personal computers, data centers, and cloud storage. In order to improve performance, SSDs are usually constructed with a number of channels, with each channel connecting to a number of NAND flash chips. Despite the rich parallelism offered by multiple channels and multiple chips per channel, recent studies show that the utilization of flash chips (i.e., the number of flash chips being accessed simultaneously) is seriously low. Our study shows that this low chip utilization is caused by access conflicts among I/O requests. In this work, we propose Parallel Issue Queuing (PIQ), a novel I/O scheduler at the host system, to minimize access conflicts between I/O requests. The proposed PIQ schedules I/O requests without conflicts into the same batch and I/O requests with conflicts into different batches, so that the multiple I/O requests in one batch can be fulfilled simultaneously by exploiting the rich parallelism of the SSD. Because PIQ is implemented at the host side, it can take advantage of the rich resources of the host system, such as main memory and CPU, which makes its overhead negligible. Extensive experimental results show that PIQ delivers significant performance improvement for applications that have heavy access conflicts.
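To illustrate the batching idea in the abstract, here is a small sketch that places requests touching disjoint flash chips into the same batch and conflicting requests into later batches. The page-to-chip mapping, single-page requests, and first-fit placement are assumptions for illustration; the real PIQ scheduler also has to handle multi-page requests and request ordering.

```python
# Conflict-aware batching sketch: requests in one batch touch distinct chips
# and can be issued in parallel; same-chip requests land in separate batches.
def chip_of(lpn, nchannels=8, chips_per_channel=4):
    """Illustrative static mapping from logical page number to (channel, chip)."""
    return (lpn % nchannels, (lpn // nchannels) % chips_per_channel)

def build_batches(requests):
    """requests: iterable of (lpn, op). First-fit each request into the earliest
    batch whose claimed chips do not conflict with it."""
    batches = []                      # list of (claimed_chip_set, request_list)
    for req in requests:
        chip = chip_of(req[0])
        for claimed, batch in batches:
            if chip not in claimed:   # no conflict: join this batch
                claimed.add(chip)
                batch.append(req)
                break
        else:                         # conflicts with every open batch: start a new one
            batches.append(({chip}, [req]))
    return [batch for _, batch in batches]

# for batch in build_batches(pending_requests):   # pending_requests is hypothetical
#     issue_in_parallel(batch)
```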
{"title":"Exploiting parallelism in I/O scheduling for access conflict minimization in flash-based solid state drives","authors":"Congming Gao, Liang Shi, Mengying Zhao, C. Xue, Kaijie Wu, E. Sha","doi":"10.1109/MSST.2014.6855544","DOIUrl":"https://doi.org/10.1109/MSST.2014.6855544","url":null,"abstract":"Solid state drives (SSDs) have been widely deployed in personal computers, data centers, and cloud storages. In order to improve performance, SSDs are usually constructed with a number of channels with each channel connecting to a number of NAND flash chips. Despite the rich parallelism offered by multiple channels and multiple chips per channel, recent studies show that the utilization of flash chips (i.e. the number of flash chips being accessed simultaneously) is seriously low. Our study shows that the low chip utilization is caused by the access conflict among I/O requests. In this work, we propose Parallel Issue Queuing (PIQ), a novel I/O scheduler at the host system, to minimize the access conflicts between I/O requests. The proposed PIQ schedules I/O requests without conflicts into the same batch and I/O requests with conflicts into different batches. Hence the multiple I/O requests in one batch can be fulfilled simultaneously by exploiting the rich parallelism of SSD. And because PIQ is implemented at the host side, it can take advantage of rich resource at host system such as main memory and CPU, which makes the overhead negligible. Extensive experimental results show that PIQ delivers significant performance improvement to the applications that have heavy access conflicts.","PeriodicalId":188071,"journal":{"name":"2014 30th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121495601","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
H-ARC: A non-volatile memory based cache policy for solid state drives
Pub Date: 2014-06-02 | DOI: 10.1109/MSST.2014.6855546
Ziqi Fan, D. Du, Doug Voigt
With the rapid development of new types of non-volatile memory (NVM), one of these technologies may replace DRAM as the main memory in the near future. Some drawbacks of DRAM, such as data loss due to power failure or a system crash, can be remedied by NVM's non-volatile nature. In the meantime, solid state drives (SSDs) are becoming widely deployed as storage devices thanks to their faster random access compared with traditional hard disk drives (HDDs). For applications demanding higher reliability and better performance, using NVM as the main memory and SSDs as storage devices becomes a promising architecture. Although SSDs have better performance than HDDs, SSDs cannot support in-place updates (i.e., an erase operation has to be performed before a page can be updated) and suffer from a low-endurance problem: each unit wears out after a certain number of erase operations. In an NVM-based main memory, updated pages, called dirty pages, can be kept longer without the urgent need to be flushed to SSDs. This difference opens an opportunity to design new cache policies that help extend the lifespan of SSDs by wisely choosing cache eviction victims to decrease storage write traffic. However, it is very challenging to design a policy that also increases the cache hit ratio for better system performance. Most existing DRAM-based cache policies have mainly concentrated on the recency or frequency status of a page. On the other hand, most existing NVM-based cache policies have mainly focused on the dirty or clean status of a page. In this paper, by extending the concept of the Adaptive Replacement Cache (ARC), we propose a Hierarchical Adaptive Replacement Cache (H-ARC) policy that considers all four factors of a page's status: dirty, clean, recency, and frequency. Specifically, at the higher level, H-ARC adaptively splits the whole cache space into a dirty-page cache and a clean-page cache. At the lower level, inside the dirty-page cache and the clean-page cache, H-ARC splits each into a recency-page cache and a frequency-page cache. During the page eviction process, all parts of the cache are balanced towards their desired sizes.
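For concreteness, here is a structural Python sketch of the two-level split the abstract describes: a dirty/clean partition at the top and a recency/frequency list inside each partition. It only shows the bookkeeping; the ghost lists and the adaptation rules that tune the targets online are omitted, and the fixed 50% dirty target and eviction order are simplifying assumptions, not the H-ARC algorithm itself.

```python
# Structural sketch of H-ARC's hierarchy (illustrative only).
from collections import OrderedDict

class HARCSketch:
    def __init__(self, capacity):
        self.cap = capacity
        self.target_dirty = capacity // 2          # adapted online in the real policy
        self.lists = {(k, l): OrderedDict()        # (dirty/clean) x (recency T1 / frequency T2)
                      for k in ("dirty", "clean") for l in ("T1", "T2")}

    def _size(self, kind):
        return len(self.lists[(kind, "T1")]) + len(self.lists[(kind, "T2")])

    def access(self, page, is_write):
        for (k, _), lst in self.lists.items():     # hit: promote to a frequency list
            if page in lst:
                del lst[page]
                kind = "dirty" if (is_write or k == "dirty") else "clean"
                self.lists[(kind, "T2")][page] = True
                return None
        victim = None
        if self._size("dirty") + self._size("clean") >= self.cap:
            victim = self._evict()                 # dirty victims must be flushed to the SSD
        kind = "dirty" if is_write else "clean"
        self.lists[(kind, "T1")][page] = True      # miss: insert into the recency list
        return victim

    def _evict(self):
        # Evict from the partition exceeding its target, so clean pages are
        # preferred while the dirty partition stays within budget (reducing
        # SSD write traffic).
        kind = "dirty" if self._size("dirty") > self.target_dirty else "clean"
        if self._size(kind) == 0:
            kind = "clean" if kind == "dirty" else "dirty"
        for l in ("T1", "T2"):                     # simplified: recency list first
            if self.lists[(kind, l)]:
                page, _ = self.lists[(kind, l)].popitem(last=False)
                return (page, kind == "dirty")
```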
{"title":"H-ARC: A non-volatile memory based cache policy for solid state drives","authors":"Ziqi Fan, D. Du, Doug Voigt","doi":"10.1109/MSST.2014.6855546","DOIUrl":"https://doi.org/10.1109/MSST.2014.6855546","url":null,"abstract":"With the rapid development of new types of nonvolatile memory (NVM), one of these technologies may replace DRAM as the main memory in the near future. Some drawbacks of DRAM, such as data loss due to power failure or a system crash can be remedied by NVM's non-volatile nature. In the meantime, solid state drives (SSDs) are becoming widely deployed as storage devices for faster random access speed compared with traditional hard disk drives (HDDs). For applications demanding higher reliability and better performance, using NVM as the main memory and SSDs as storage devices becomes a promising architecture. Although SSDs have better performance than HDDs, SSDs cannot support in-place updates (i.e., an erase operation has to be performed before a page can be updated) and suffer from a low endurance problem that each unit will wear out after certain number of erase operations. In an NVM based main memory, any updated pages called dirty pages can be kept longer without the urgent need to be flushed to SSDs. This difference opens an opportunity to design new cache policies that help extend the lifespan of SSDs by wisely choosing cache eviction victims to decrease storage write traffic. However, it is very challenging to design a policy that can also increase the cache hit ratio for better system performance. Most existing DRAM-based cache policies have mainly concentrated on the recency or frequency status of a page. On the other hand, most existing NVM-based cache policies have mainly focused on the dirty or clean status of a page. In this paper, by extending the concept of the Adaptive Replacement Cache (ARC), we propose a Hierarchical Adaptive Replacement Cache (H-ARC) policy that considers all four factors of a page's status: dirty, clean, recency, and frequency. Specifically, at the higher level, H-ARC adaptively splits the whole cache space into a dirty-page cache and a clean-page cache. At the lower level, inside the dirty-page cache and the clean-page cache, H-ARC splits them into a recency-page cache and a frequency-page cache separately. During the page eviction process, all parts of the cache will be balanced towards to their desired sizes.","PeriodicalId":188071,"journal":{"name":"2014 30th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121852377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Anode: Empirical detection of performance problems in storage systems using time-series analysis of periodic measurements
Pub Date: 2014-06-02 | DOI: 10.1109/MSST.2014.6855551
Vipul Mathur, Cijo George, J. Basak
Performance problems are particularly hard to detect and diagnose in most computer systems, since there is no clear failure apart from the system being slow. In this paper, we present an empirical, data-driven methodology for detecting performance problems in data storage systems and aiding in quick diagnosis once a problem is detected. The key feature of our solution is that it uses a combination of time-series analysis, domain knowledge, and expert inputs to improve overall efficacy. Our solution learns from a system's own history to establish a baseline of normal behavior; hence it is not necessary to determine static trigger levels for metrics to raise alerts. Static triggers are ineffective because each system and its workloads differ from others. The method presented here (a) gives accurate indications of the time period when something goes wrong in a system, and (b) helps pinpoint the most affected parts of the system to aid in diagnosis. Validation on more than 400 actual field support cases shows about an 85% true positive rate with less than a 10% false positive rate in identifying time periods of performance impact before or during the time a case was open. Results in a controlled lab environment are even better.
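As a hedged illustration of the "baseline from the system's own history" idea (not Anode's actual algorithm), the sketch below learns a per-hour-of-week mean and standard deviation for a metric such as latency and flags sustained deviations from that learned normal range. The weekly seasonality, the 3-sigma threshold, and the run-length filter are illustrative assumptions.

```python
# Baseline-from-history anomaly flagging sketch for a periodic storage metric.
import numpy as np

def learn_baseline(hour_of_week, values, periods=168):
    """hour_of_week, values: 1-D numpy arrays of equal length (training history)."""
    means, stds = np.zeros(periods), np.ones(periods)
    for p in range(periods):
        sample = values[hour_of_week == p]
        if len(sample) > 1:
            means[p], stds[p] = sample.mean(), sample.std() + 1e-9
    return means, stds

def flag_anomalies(hour_of_week, values, means, stds, k=3.0, min_run=3):
    """Flag time periods whose z-score exceeds k for at least min_run samples,
    which suppresses one-off spikes and highlights sustained impact windows."""
    z = np.abs(values - means[hour_of_week]) / stds[hour_of_week]
    hot = z > k
    flags = np.zeros_like(hot)
    run = 0
    for i, h in enumerate(hot):
        run = run + 1 if h else 0
        if run >= min_run:
            flags[i - min_run + 1:i + 1] = True
    return flags
```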
{"title":"Anode: Empirical detection of performance problems in storage systems using time-series analysis of periodic measurements","authors":"Vipul Mathur, Cijo George, J. Basak","doi":"10.1109/MSST.2014.6855551","DOIUrl":"https://doi.org/10.1109/MSST.2014.6855551","url":null,"abstract":"Performance problems are particularly hard to detect and diagnose in most computer systems, since there is no clear failure apart from the system being slow. In this paper, we present an empirical, data-driven methodology for detecting performance problems in data storage systems, and aiding in quick diagnosis once a problem is detected. The key feature of our solution is that it uses a combination of time-series analysis, domain knowledge and expert inputs to improve the overall efficacy. Our solution learns from a system's own history to establish the baseline of normal behavior. Hence it is not necessary to determine any static trigger-levels for metrics to raise alerts. Static triggers are ineffective since each system and its workloads are different from others. The method presented here (a) gives accurate indications of the time period when something goes wrong in a system, and (b) helps pin-point the most affected parts of the system to aid in diagnosis. Validation on more than 400 actual field support cases shows about 85% true positive rate with less than 10% false positive rate in identifying time periods of performance impact before or during the time a case was open. Results in a controlled lab environment are even better.","PeriodicalId":188071,"journal":{"name":"2014 30th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116010282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
HiSMRfs: A high performance file system for shingled storage array
Pub Date: 2014-06-02 | DOI: 10.1109/MSST.2014.6855539
Chao Jin, Weiya Xi, Z. Ching, Feng Huo, Chun-Teck Lim
HiSMRfs, a file system with a standard POSIX interface suitable for Shingled Magnetic Recording (SMR) drives, has been designed and developed. HiSMRfs can manage raw SMR drives and support random writes without a remapping layer implemented inside the SMR drives. To achieve high performance, HiSMRfs separates data and metadata storage and manages them differently. Metadata is managed using in-memory tree structures and stored in a high-performance random-write area, such as an SSD. Data is written in a sequential, append-only style and stored on an SMR drive. HiSMRfs includes a file/object-based RAID module for SMR/HDD arrays. The RAID module computes parity for individual files/objects and guarantees that data and parity writes are 100% sequential and full-stripe. HiSMRfs is also suitable for a hybrid storage system with conventional HDDs and SSDs. Two prototype systems with HiSMRfs have been developed, and their performance has been tested and compared with SMRfs and Flashcache. The experimental tests show that HiSMRfs performs 25% better than SMRfs and 11% better than the Flashcache system.
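The core layout decision, appending data sequentially on the SMR area while keeping metadata on a fast random-write device, can be illustrated with a toy sketch. This is only a user-space illustration under assumed names and formats; the real HiSMRfs uses in-memory tree structures, a POSIX file-system interface, and a file-level RAID module that are not shown here.

```python
# Toy data/metadata separation: append-only data log (SMR stand-in) plus a
# small metadata index persisted with random writes (SSD stand-in).
import json
import os

class HiSMRSketch:
    def __init__(self, data_path, meta_path):
        self.data = open(data_path, "ab")       # SMR area: strictly append-only
        self.meta_path = meta_path              # SSD area: small random writes are cheap
        self.index = {}                         # path -> list of [offset, length] extents

    def append(self, path, buf: bytes):
        offset = self.data.seek(0, os.SEEK_END) # next sequential write position
        self.data.write(buf)                    # data only ever appended
        self.data.flush()
        self.index.setdefault(path, []).append([offset, len(buf)])
        with open(self.meta_path, "w") as f:    # metadata updated on the fast device
            json.dump(self.index, f)

    def read(self, path) -> bytes:
        out = bytearray()
        with open(self.data.name, "rb") as f:
            for offset, length in self.index.get(path, []):
                f.seek(offset)
                out += f.read(length)
        return bytes(out)

# fs = HiSMRSketch("/tmp/smr.log", "/tmp/meta.json")
# fs.append("file1", b"hello"); print(fs.read("file1"))
```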
{"title":"HiSMRfs: A high performance file system for shingled storage array","authors":"Chao Jin, Weiya Xi, Z. Ching, Feng Huo, Chun-Teck Lim","doi":"10.1109/MSST.2014.6855539","DOIUrl":"https://doi.org/10.1109/MSST.2014.6855539","url":null,"abstract":"HiSMRfs a file system with standard POSIX interface suitable for Shingled Magnetic Recording (SMR) drives, has been designed and developed. HiSMRfs can manage raw SMR drives and support random writes without remapping layer implemented inside SMR drives. To achieve high performance, HiSMRfs separates data and metadata storage, and manages them differently. Metadata is managed using in-memory tree structures and stored in a high performance random write area such as in a SSD. Data writing is done through sequential appending style and store in a SMR drive. HiSMRfs includes a file/object-based RAID module for SMR/HDD arrays. The RAID module computes parity for individual files/objects and guarantees that data and parity writing are 100% in sequential and in full stripe. HiSMRfs is also suitable for a hybrid storage system with conventional HDDs and SSDs. Two prototype systems with HiSMRfs have been developed. The performance has been tested and compared with SMRfs and Flashcache. The experimental tests show that HiSMRfs performs 25% better than SMRfs, and 11% better than Flashcache system.","PeriodicalId":188071,"journal":{"name":"2014 30th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115692368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}