
Latest publications: 2012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST)

NANDFlashSim: Intrinsic latency variation aware NAND flash memory system modeling and simulation at microarchitecture level
Pub Date : 2012-04-16 DOI: 10.1109/MSST.2012.6232389
Myoungsoo Jung, E. Wilson, D. Donofrio, J. Shalf, M. Kandemir
As NAND flash memory becomes popular in diverse areas ranging from embedded systems to high performance computing, exposing and understanding flash memory's performance, energy consumption, and reliability becomes increasingly important. Moreover, with an increasing trend towards multiple-die, multiple-plane architectures and high speed interfaces, high performance NAND flash memory systems are expected to continue to scale. This scaling should further reduce costs and thereby widen proliferation of devices based on the technology. However, when designing NAND flash-based devices, making decisions about the optimal system configuration is non-trivial because NAND flash is sensitive to a large number of parameters, and some parameters exhibit significant latency variations. Such parameters include varying architectures such as multi-die and multi-plane, and a host of factors that affect performance, energy consumption, diverse node technology, and reliability. Unfortunately, no public domain tools exist for high-fidelity, microarchitecture level NAND flash memory simulation to assist with making such decisions. Therefore, we introduce NANDFlashSim, a latency variation-aware, detailed, and highly configurable NAND flash simulation model. NANDFlashSim implements a detailed timing model for operations in sixteen state-of-the-art NAND flash operation mode combinations. In addition, NANDFlashSim models energies and reliability of NAND flash memory based on statistics.
From our comprehensive experiments using NANDFlashSim, we found that 1) most read cases were unable to leverage the highly-parallel internal architecture of NAND flash regardless of the NAND flash operation mode, 2) the main source of this performance bottleneck is I/O bus activity, not NAND flash activity itself, 3) multi-level-cell NAND flash provides lower I/O bus resource contention than single-level-cell NAND flash, but the resource contention becomes a serious problem as the number of die increases, and 4) preference to employ many dies rather than to employ many planes promises better performance in disk-friendly real workloads. The simulator can be downloaded from http://www.cse.psu.edu/~mqj5086/nfs.
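The bus-bottleneck finding above can be illustrated with a toy timing model (ours, not NANDFlashSim's; the latencies are invented round numbers): cell-array reads overlap across dies, but every page still crosses one shared I/O bus, so adding dies stops helping once the bus saturates.

```python
# Toy model (not the NANDFlashSim timing model): estimate total time for
# `num_pages` page reads spread across `dies`, where cell reads proceed in
# parallel per die but bus transfers are fully serialized. Times are in
# microseconds and purely illustrative.
def read_time_us(num_pages, dies, t_cell=25, t_bus=50):
    per_die = -(-num_pages // dies)   # ceil: pages served by the busiest die
    cell_time = per_die * t_cell      # cell-array reads overlap across dies
    bus_time = num_pages * t_bus      # every page crosses the shared bus
    return max(cell_time, bus_time)

# With a slow bus, quadrupling the die count does not help at all:
print(read_time_us(8, 1))   # 400
print(read_time_us(8, 4))   # 400
```

Only when the bus is fast relative to the cell read (e.g. `t_bus=5`) does die-level parallelism pay off, which mirrors finding 2) above.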
Citations: 50
Valmar: High-bandwidth real-time streaming data management
Pub Date : 2012-04-16 DOI: 10.1109/MSST.2012.6232387
David O. Bigelow, S. Brandt, John Bent, Hsing-bung Chen
In applications ranging from radio telescopes to Internet traffic monitoring, our ability to generate data has outpaced our ability to effectively capture, mine, and manage it. These ultra-high-bandwidth data streams typically contain little useful information and most of the data can be safely discarded. Periodically, however, an event of interest is observed and a large segment of the data must be preserved, including data preceding detection of the event. Doing so requires guaranteed data capture at source rates, line speed filtering to detect events and data points of interest, and TiVo-like ability to save past data once an event has been detected. We present Valmar, a system for guaranteed capture, indexing, and storage of ultra-high-bandwidth data streams. Our results show that Valmar performs at nearly full disk bandwidth, up to several orders of magnitude faster than flat file and database systems, works well with both small and large data elements, and allows concurrent read and search access without compromising data capture guarantees.
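The "TiVo-like" retention the abstract describes can be sketched with a ring buffer (class and method names are ours, not Valmar's API): recent samples are kept in a bounded buffer, and when an event fires the buffer is snapshotted so data preceding the event survives.

```python
from collections import deque

# Minimal sketch of event-triggered retention: keep the last `capacity`
# samples; on an event, freeze the buffer so pre-event history is preserved.
class RingCapture:
    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)  # oldest samples fall off automatically
        self.preserved = []

    def ingest(self, sample):
        self.buf.append(sample)

    def on_event(self):
        self.preserved.append(list(self.buf))  # snapshot pre-event history

cap = RingCapture(capacity=4)
for i in range(10):
    cap.ingest(i)
    if i == 6:                      # event detected while sample 6 arrives
        cap.on_event()
print(cap.preserved[0])             # [3, 4, 5, 6]
```

A real system must of course do this at source rates on disk rather than in memory, which is where Valmar's guaranteed-capture machinery comes in.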
Citations: 1
ADAPT: Efficient workload-sensitive flash management based on adaptation, prediction and aggregation
Pub Date : 2012-04-16 DOI: 10.1109/MSST.2012.6232388
Chundong Wang, W. Wong
Solid-state drives (SSDs) made of flash memory are widely utilized in enterprise servers nowadays. Internally, the management of flash memory resources is done by an embedded software known as the flash translation layer (FTL). One important function of the FTL is to map logical addresses issued by the operating system into physical flash addresses. The efficiency of this address mapping in the FTL directly impacts the performance of SSDs. In this paper, we propose a hybrid mapping FTL scheme, called Aggregated Data movement Augmenting Predictive Transfers (ADAPT). ADAPT observes access behaviors online to handle both sequential and random write requests efficiently. It also takes advantage of locality revealed in the history of recent accesses to avoid unnecessary data movements in the required merge process. More importantly, by these mechanisms, ADAPT can adapt to various workloads to achieve good performance. Experimental results show that ADAPT is as much as 35.4%, 44.2% and 23.5% faster than a state-of-the-art hybrid mapping scheme, a prevalent page-based mapping scheme, and a latest workload-adaptive mapping scheme, respectively, with a small increase in space requirement.
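The FTL address mapping the abstract builds on can be sketched in a few lines (a generic page-mapping FTL, not ADAPT's hybrid scheme): logical pages map to physical pages, and because flash cannot overwrite in place, a rewrite goes out-of-place and invalidates the old physical page for later garbage collection.

```python
# Generic page-mapping FTL sketch (not ADAPT itself): logical page numbers
# (LPNs) map to physical page numbers (PPNs); rewrites are out-of-place.
class PageFTL:
    def __init__(self, num_phys_pages):
        self.map = {}                          # LPN -> PPN
        self.free = list(range(num_phys_pages))
        self.invalid = set()                   # PPNs awaiting garbage collection

    def write(self, lpn):
        if lpn in self.map:                    # out-of-place update
            self.invalid.add(self.map[lpn])
        ppn = self.free.pop(0)
        self.map[lpn] = ppn
        return ppn

    def read(self, lpn):
        return self.map[lpn]

ftl = PageFTL(8)
ftl.write(0); ftl.write(1); ftl.write(0)       # third write rewrites LPN 0
print(ftl.read(0), sorted(ftl.invalid))        # 2 [0]
```

Hybrid schemes like ADAPT trade the large RAM footprint of such a per-page map against the merge costs of coarser block-level mapping.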
Citations: 18
On the speedup of single-disk failure recovery in XOR-coded storage systems: Theory and practice
Pub Date : 2012-04-16 DOI: 10.1109/MSST.2012.6232371
Yunfeng Zhu, P. Lee, Yuchong Hu, Liping Xiang, Yinlong Xu
Modern storage systems stripe redundant data across multiple disks to provide availability guarantees against disk failures. One form of data redundancy is based on XOR-based erasure codes, which use only XOR operations for encoding and decoding. In addition to providing failure tolerance, a storage system must also provide fast failure recovery to avoid data unavailability. We consider the problem of speeding up the recovery of a single-disk failure for arbitrary XOR-based erasure codes. We address this problem from both theoretical and practical perspectives. We propose a replace recovery algorithm, which uses a hill-climbing technique to search for a fast recovery solution, such that the solution search can be completed within a short time period. We further implement our replace recovery algorithm atop a parallelized architecture to justify its practicality. We experiment our replace recovery algorithm and its parallelized implementation on a networked storage system testbed, and demonstrate that our replace recovery algorithm uses less recovery time than the conventional approach.
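The identity every XOR-coded recovery relies on is that a lost strip equals the XOR of the surviving strips in its parity group. The paper's contribution is choosing *which* symbols to read (via hill climbing); the sketch below shows only the underlying XOR reconstruction, in a RAID-5-like single-parity layout.

```python
from functools import reduce

# XOR a list of equal-length byte strips column-wise. With single parity,
# parity = d0 ^ d1 ^ ... ^ dn, so any one lost strip is the XOR of the rest.
def xor_strips(strips):
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*strips))

data = [b"\x01\x02", b"\x04\x08", b"\x10\x20"]
parity = xor_strips(data)

# Disk 1 fails; rebuild its strip from the survivors plus parity.
rebuilt = xor_strips([data[0], data[2], parity])
print(rebuilt == data[1])   # True
```

For general XOR codes with multiple parity groups, many such reconstruction equations exist per lost strip, and picking the combination that minimizes total reads is exactly the search problem the replace recovery algorithm tackles.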
Citations: 40
A new high-performance, energy-efficient replication storage system with reliability guarantee
Pub Date : 2012-04-16 DOI: 10.1109/MSST.2012.6232373
Ji-guang Wan, Chao Yin, Jun Wang, C. Xie
In modern replication storage systems where data carries two or more copies, a primary group of disks is always kept up to service incoming requests while other disks are often spun down to sleep states to save energy during slack periods. However, since new writes cannot be immediately synchronized onto all disks, system reliability is degraded. This paper develops PERAID, a new high-performance, energy-efficient replication storage system, which aims to improve both performance and energy efficiency without compromising reliability. It employs a parity software RAID as a virtual write buffer disk at the front end to absorb new writes. Since extra parity redundancy supplies two or more copies, PERAID guarantees comparable reliability with that of a replication storage system. In addition, PERAID offers better write performance compared to the replication system by avoiding the classical small-write problem in traditional parity RAID: buffering many small random writes into few large writes and writing to storage in a parallel fashion. By evaluating our PERAID prototype using two benchmarks and two real-life traces, we found that PERAID significantly improves write performance and saves more energy than existing solutions such as GRAID and eRAID.
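The small-write coalescing idea can be sketched as follows (our simplification, not PERAID's on-disk layout or parity scheme): absorb random writes in a front-end buffer, overwrite in place while buffered, and flush one large ordered batch once a threshold is reached.

```python
# Sketch of write coalescing: many small random writes are absorbed and
# later flushed as a single large, ordered batch. The threshold policy is
# invented for illustration.
class WriteBuffer:
    def __init__(self, flush_threshold):
        self.pending = {}                  # block number -> latest payload
        self.flush_threshold = flush_threshold
        self.flushes = []                  # record of flushed batches

    def write(self, block, payload):
        self.pending[block] = payload      # an overwrite costs nothing extra
        if len(self.pending) >= self.flush_threshold:
            self.flushes.append(sorted(self.pending))  # one big ordered batch
            self.pending = {}

wb = WriteBuffer(flush_threshold=3)
for blk in (42, 7, 42, 19):               # four writes, one is an overwrite
    wb.write(blk, b"x")
print(wb.flushes)                         # [[7, 19, 42]]
```

PERAID additionally protects the buffered writes with parity so that the buffering step itself does not weaken the replication system's reliability guarantee.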
Citations: 7
Mercury: Host-side flash caching for the data center
Pub Date : 2012-04-16 DOI: 10.1109/MSST.2012.6232368
Steve Byan, J. Lentini, Anshul Madan, Luis Pabon
The adoption of flash memory in high volume consumer products such as cell phones, tablet computers, digital cameras, and portable music players has driven down flash costs and increased flash quality. This trend is pushing flash memory into new applications, including enterprise computing. In enterprise data centers, servers containing flash-based Solid-State Drives (SSDs) are becoming common. However, data center architects prefer to deploy shared storage over direct-attached storage (DAS). Shared storage offers superior manageability, availability, and scalability compared to DAS. For these reasons, system designers want to reap the benefits of direct attached flash memory without decreasing the value of shared storage systems. Our solution is Mercury, a persistent, write-through host-side cache for flash memory. By designing Mercury as a hypervisor cache, we simplify integration and deployment into host environments. This paper presents our experience building a host-side flash cache, an architectural analysis of possible cache attachment points, and a performance evaluation using enterprise workloads.
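The write-through property that keeps shared storage authoritative can be shown in a minimal cache sketch (our illustration; LRU eviction is our choice, not a detail from the abstract): every write lands on backing storage before the cache, so evicting or losing the cache never loses data.

```python
from collections import OrderedDict

# Minimal write-through read cache: writes go to backing storage first;
# the cache only accelerates reads and may be discarded at any time.
class WriteThroughCache:
    def __init__(self, capacity, backing):
        self.capacity, self.backing = capacity, backing
        self.cache = OrderedDict()            # insertion order tracks recency

    def _insert(self, key, value):
        self.cache[key] = value
        self.cache.move_to_end(key)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)    # evict LRU; backing copy persists

    def write(self, key, value):
        self.backing[key] = value             # write-through: durable first
        self._insert(key, value)

    def read(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)
            return self.cache[key]
        value = self.backing[key]             # miss: fetch and populate
        self._insert(key, value)
        return value

store = {}
c = WriteThroughCache(2, store)
c.write("a", 1); c.write("b", 2); c.write("c", 3)   # "a" evicted from cache
print("a" in c.cache, store["a"])                    # False 1
```

This is why a host-side write-through cache composes cleanly with shared storage: the array still sees every write, so array-side snapshots, replication, and failover keep working.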
Citations: 132
Flashy prefetching for high-performance flash drives
Pub Date : 2012-04-16 DOI: 10.1109/MSST.2012.6232367
Ahsen J. Uppal, R. C. Chiang, H. H. Huang
While hard drives hold on to the capacity advantage, flash-based solid-state drives (SSD) with high bandwidth and low latency have become good alternatives for I/O-intensive applications. Traditional data prefetching has been primarily designed to improve I/O performance on hard drives. The same techniques, if applied unchanged on flash drives, are likely to either fail to fully utilize SSDs, or interfere with application I/O requests, both of which could result in undesirable application performance. In this work, we demonstrate that data prefetching, when effectively harnessing the high performance of SSDs, can provide significant performance benefits for a wide range of data-intensive applications. The new technique, flashy prefetching, consists of accurate prediction of application needs in runtime and adaptive feedback-directed prefetching that scales with application needs, while being considerate to underlying storage devices. We have implemented a real system in Linux and evaluated it on four different SSDs. The results show 65-70% prefetching accuracy and an average 20% speedup on LFS, web search engine traces, BLAST, and TPC-H like benchmarks across various storage drives.
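The feedback-directed scaling the abstract describes can be sketched as a simple accuracy-driven controller (thresholds, step sizes, and the function name are invented for illustration): grow the prefetch window while predictions hit, shrink it when they miss, so the prefetcher backs off before it interferes with application I/O.

```python
# Feedback-directed prefetch-depth control (illustrative policy, not the
# paper's exact algorithm): double the window when prediction accuracy is
# high, halve it when accuracy is low, hold steady in between.
def adjust_depth(depth, hits, issued, lo=0.4, hi=0.8, max_depth=32):
    accuracy = hits / issued if issued else 0.0
    if accuracy >= hi:
        return min(depth * 2, max_depth)   # predictions pay off: be aggressive
    if accuracy < lo:
        return max(depth // 2, 1)          # polluting I/O: back off
    return depth

d = 4
d = adjust_depth(d, hits=9, issued=10)     # 90% accurate -> 8
d = adjust_depth(d, hits=1, issued=10)     # 10% accurate -> back to 4
print(d)                                   # 4
```

On SSDs this feedback loop matters more than on hard drives: the device is fast enough that an over-aggressive prefetcher competes directly with demand reads rather than hiding behind seek latency.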
Citations: 17
Deduplication in SSDs: Model and quantitative analysis
Pub Date : 2012-04-16 DOI: 10.1109/MSST.2012.6232379
Jong-Hun Kim, Choonghyun Lee, Sang Yup Lee, Ikjoon Son, Jongmoo Choi, Sungroh Yoon, Hu-ung Lee, Sooyong Kang, Y. Won, Jaehyuk Cha
In NAND Flash-based SSDs, deduplication can provide an effective resolution of three critical issues: cell lifetime, write performance, and garbage collection overhead. However, deduplication at SSD device level distinguishes itself from the one at enterprise storage systems in many aspects, whose success lies in proper exploitation of underlying very limited hardware resources and workload characteristics of SSDs. In this paper, we develop a novel deduplication framework elaborately tailored for SSDs. We first mathematically develop an analytical model that enables us to calculate the minimum required duplication rate in order to achieve performance gain given deduplication overhead. Then, we explore a number of design choices for implementing deduplication components by hardware or software. As a result, we propose two acceleration techniques: sampling-based filtering and recency-based fingerprint management. The former selectively applies deduplication based upon sampling and the latter effectively exploits limited controller memory while maximizing the deduplication ratio. We prototype the proposed deduplication framework in three physical hardware platforms and investigate deduplication efficiency according to various CPU capabilities and hardware/software alternatives. Experimental results have shown that we achieve the duplication rate ranging from 4% to 51%, with an average of 17%, for the nine workloads considered in this work. The response time of a write request can be improved by up to 48% with an average of 15%, while the lifespan of SSDs is expected to increase up to 4.1 times with an average of 2.4 times.
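The core write path of any fingerprint-based deduplicator can be sketched as follows (a generic illustration, not the paper's sampling- or recency-based design): hash each incoming page; if the fingerprint is already stored, point the logical page at the existing physical copy and skip the flash write entirely.

```python
import hashlib

# Content-addressed write path: identical pages share one physical copy,
# so duplicate writes cost no flash program operation and no wear.
class DedupStore:
    def __init__(self):
        self.by_fp = {}    # fingerprint -> physical page payload
        self.map = {}      # logical page number -> fingerprint

    def write(self, lpn, page):
        fp = hashlib.sha1(page).digest()
        duplicate = fp in self.by_fp        # True: flash write avoided
        self.by_fp.setdefault(fp, page)
        self.map[lpn] = fp
        return duplicate

    def read(self, lpn):
        return self.by_fp[self.map[lpn]]

s = DedupStore()
print(s.write(0, b"A" * 4096))   # False  (new content, really written)
print(s.write(1, b"A" * 4096))   # True   (deduplicated)
print(len(s.by_fp))              # 1
```

Inside an SSD controller, the cost of computing and looking up these fingerprints with scarce CPU and RAM is exactly the overhead term in the paper's break-even model.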
{"title":"Deduplication in SSDs: Model and quantitative analysis","authors":"Jong-Hun Kim, Choonghyun Lee, Sang Yup Lee, Ikjoon Son, Jongmoo Choi, Sungroh Yoon, Hu-ung Lee, Sooyong Kang, Y. Won, Jaehyuk Cha","doi":"10.1109/MSST.2012.6232379","DOIUrl":"https://doi.org/10.1109/MSST.2012.6232379","url":null,"abstract":"In NAND Flash-based SSDs, deduplication can provide an effective resolution of three critical issues: cell lifetime, write performance, and garbage collection overhead. However, deduplication at SSD device level distinguishes itself from the one at enterprise storage systems in many aspects, whose success lies in proper exploitation of underlying very limited hardware resources and workload characteristics of SSDs. In this paper, we develop a novel deduplication framework elaborately tailored for SSDs. We first mathematically develop an analytical model that enables us to calculate the minimum required duplication rate in order to achieve performance gain given deduplication overhead. Then, we explore a number of design choices for implementing deduplication components by hardware or software. As a result, we propose two acceleration techniques: sampling-based filtering and recency-based fingerprint management. The former selectively applies deduplication based upon sampling and the latter effectively exploits limited controller memory while maximizing the deduplication ratio. We prototype the proposed deduplication framework in three physical hardware platforms and investigate deduplication efficiency according to various CPU capabilities and hardware/software alternatives. Experimental results have shown that we achieve the duplication rate ranging from 4% to 51%, with an average of 17%, for the nine workloads considered in this work. The response time of a write request can be improved by up to 48% with an average of 15%, while the lifespan of SSDs is expected to increase up to 4.1 times with an average of 2.4 times.","PeriodicalId":348234,"journal":{"name":"2012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"11 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130227993","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
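The analytical model itself is not reproduced on this listing page, but the break-even idea it formalizes can be illustrated with a back-of-envelope derivation (my own simplification, not the paper's model). Let $t_f$ be the per-write fingerprinting and lookup overhead, $t_w$ the flash program time avoided on a duplicate hit, and $d$ the duplication rate. Deduplication pays off when the average write cost with it is below the cost without it:

```latex
t_f + (1 - d)\,t_w < t_w
\quad\Longrightarrow\quad
d_{\min} = \frac{t_f}{t_w}
```

With assumed values $t_f = 50\,\mu\mathrm{s}$ (controller-side hashing) and $t_w = 1.3\,\mathrm{ms}$ (MLC page program), $d_{\min} \approx 3.8\%$, suggesting that under assumptions like these even single-digit duplication rates can be worthwhile.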
Citations: 68
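The two acceleration techniques named in the abstract can be sketched in a few lines. The following is my own minimal Python illustration under assumed parameters (page-sized writes, a SHA-1 fingerprint, a 1-in-16 hash sample), not the paper's implementation: sampling-based filtering lets most pages bypass the dedup lookup, and recency-based fingerprint management keeps only the most recently used fingerprints in an LRU table sized to the controller's memory.

```python
import hashlib
from collections import OrderedDict

class DedupFilter:
    """Toy sketch of SSD-side dedup with sampling-based filtering and
    recency-based (LRU) fingerprint management. Illustrative only; the
    paper's actual framework and parameters differ."""

    def __init__(self, max_fingerprints=4096, sample_mask=0x0F):
        self.table = OrderedDict()      # fingerprint -> physical page number
        self.max_fingerprints = max_fingerprints
        self.sample_mask = sample_mask  # 0x0F => attempt dedup on ~1/16 of pages

    def write(self, page_data, phys_page):
        """Returns (mapped physical page, was_duplicate)."""
        fp = hashlib.sha1(page_data).digest()
        # Sampling-based filtering: most pages skip the fingerprint lookup,
        # bounding the controller CPU and memory cost of deduplication.
        if fp[-1] & self.sample_mask:
            return phys_page, False
        if fp in self.table:
            self.table.move_to_end(fp)       # refresh recency on a hit
            return self.table[fp], True      # remap; no flash program needed
        self.table[fp] = phys_page           # remember the new fingerprint
        if len(self.table) > self.max_fingerprints:
            self.table.popitem(last=False)   # evict the least recently used
        return phys_page, False
```

Lowering `sample_mask` widens the sampled fraction and raises the achievable dedup ratio at the cost of more lookups; the LRU bound plays the role the abstract assigns to recency-based management, favoring recently written content within limited controller memory.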
vPFS: Bandwidth virtualization of parallel storage systems
Pub Date : 2012-04-16 DOI: 10.1109/MSST.2012.6232370
Yiqi Xu, D. Arteaga, Ming Zhao, Yonggang Liu, R. Figueiredo, Seetharami R. Seelam
Existing parallel file systems are unable to differentiate I/O requests from concurrent applications and meet per-application bandwidth requirements. This limitation prevents applications from meeting their desired Quality of Service (QoS) as high-performance computing (HPC) systems continue to scale up. This paper presents vPFS, a new solution to address this challenge through a bandwidth virtualization layer for parallel file systems. vPFS employs user-level parallel file system proxies to interpose requests between native clients and servers and to schedule parallel I/Os from different applications based on configurable bandwidth management policies. vPFS is designed to be generic enough to support various scheduling algorithms and parallel file systems. Its utility and performance are studied with a prototype which virtualizes PVFS2, a widely used parallel file system. Enhanced proportional sharing schedulers are enabled based on the unique characteristics (parallel striped I/Os) and requirement (high throughput) of parallel storage systems. The enhancements include new threshold- and layout-driven scheduling synchronization schemes which reduce global communication overhead while delivering total-service fairness. An experimental evaluation using typical HPC benchmarks (IOR, NPB BTIO) shows that the throughput overhead of vPFS is small (<3% for write, <1% for read). It also shows that vPFS can achieve good proportional bandwidth sharing (>96% of target sharing ratio) for competing applications with diverse I/O patterns.
{"title":"vPFS: Bandwidth virtualization of parallel storage systems","authors":"Yiqi Xu, D. Arteaga, Ming Zhao, Yonggang Liu, R. Figueiredo, Seetharami R. Seelam","doi":"10.1109/MSST.2012.6232370","DOIUrl":"https://doi.org/10.1109/MSST.2012.6232370","url":null,"abstract":"Existing parallel file systems are unable to differentiate I/O requests from concurrent applications and meet per-application bandwidth requirements. This limitation prevents applications from meeting their desired Quality of Service (QoS) as high-performance computing (HPC) systems continue to scale up. This paper presents vPFS, a new solution to address this challenge through a bandwidth virtualization layer for parallel file systems. vPFS employs user-level parallel file system proxies to interpose requests between native clients and servers and to schedule parallel I/Os from different applications based on configurable bandwidth management policies. vPFS is designed to be generic enough to support various scheduling algorithms and parallel file systems. Its utility and performance are studied with a prototype which virtualizes PVFS2, a widely used parallel file system. Enhanced proportional sharing schedulers are enabled based on the unique characteristics (parallel striped I/Os) and requirement (high throughput) of parallel storage systems. The enhancements include new threshold- and layout-driven scheduling synchronization schemes which reduce global communication overhead while delivering total-service fairness. An experimental evaluation using typical HPC benchmarks (IOR, NPB BTIO) shows that the throughput overhead of vPFS is small (<3% for write, <1% for read). It also shows that vPFS can achieve good proportional bandwidth sharing (>96% of target sharing ratio) for competing applications with diverse I/O patterns.","PeriodicalId":348234,"journal":{"name":"2012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130922215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 19
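The proportional bandwidth sharing described above can be sketched with a start-time-fair-queuing-style virtual-clock queue. This is a generic textbook-flavored illustration I am supplying, not vPFS's code: vPFS's enhanced schedulers additionally coordinate striped I/O across parallel servers with threshold- and layout-driven synchronization, which this single-queue sketch omits.

```python
import heapq

class ProportionalScheduler:
    """Minimal virtual-time scheduler: each application accrues virtual time
    at a rate inversely proportional to its weight, so a weight-3 app is
    dispatched roughly three times as often as a weight-1 app."""

    def __init__(self, weights):
        self.weights = dict(weights)            # app -> share weight
        self.vtime = {a: 0.0 for a in self.weights}
        self.queue = []                         # (start_tag, seq, app, request)
        self.seq = 0                            # tie-breaker for equal tags

    def submit(self, app, request, cost):
        # Start tag is the app's current virtual time; finishing the request
        # advances that clock by cost/weight, so heavier-weighted apps fall
        # behind in virtual time and get dispatched proportionally more.
        start = self.vtime[app]
        self.vtime[app] = start + cost / self.weights[app]
        heapq.heappush(self.queue, (start, self.seq, app, request))
        self.seq += 1

    def dispatch(self):
        """Pop the request with the smallest start tag, or None if idle."""
        if not self.queue:
            return None
        _, _, app, request = heapq.heappop(self.queue)
        return app, request
```

Submitting equal-cost requests from two applications weighted 3:1 yields roughly a 3:1 dispatch interleave, which is the per-application bandwidth differentiation the abstract says existing parallel file systems lack.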
On the role of burst buffers in leadership-class storage systems
Pub Date : 2012-04-16 DOI: 10.1109/MSST.2012.6232369
Ning Liu, Jason Cope, P. Carns, C. Carothers, R. Ross, G. Grider, A. Crume, C. Maltzahn
The largest-scale high-performance computing (HPC) systems are stretching parallel file systems to their limits in terms of aggregate bandwidth and numbers of clients. To further sustain the scalability of these file systems, researchers and HPC storage architects are exploring various storage system designs. One proposed storage system design integrates a tier of solid-state burst buffers into the storage system to absorb application I/O requests. In this paper, we simulate and explore this storage system design for use by large-scale HPC systems. First, we examine application I/O patterns on an existing large-scale HPC system to identify common burst patterns. Next, we describe enhancements to the CODES storage system simulator to enable our burst buffer simulations. These enhancements include the integration of a burst buffer model into the I/O forwarding layer of the simulator, the development of an I/O kernel description language and interpreter, the development of a suite of I/O kernels that are derived from observed I/O patterns, and fidelity improvements to the CODES models. We evaluate the I/O performance for a set of multiapplication I/O workloads and burst buffer configurations. We show that burst buffers can accelerate the application perceived throughput to the external storage system and can reduce the amount of external storage bandwidth required to meet a desired application perceived throughput goal.
{"title":"On the role of burst buffers in leadership-class storage systems","authors":"Ning Liu, Jason Cope, P. Carns, C. Carothers, R. Ross, G. Grider, A. Crume, C. Maltzahn","doi":"10.1109/MSST.2012.6232369","DOIUrl":"https://doi.org/10.1109/MSST.2012.6232369","url":null,"abstract":"The largest-scale high-performance computing (HPC) systems are stretching parallel file systems to their limits in terms of aggregate bandwidth and numbers of clients. To further sustain the scalability of these file systems, researchers and HPC storage architects are exploring various storage system designs. One proposed storage system design integrates a tier of solid-state burst buffers into the storage system to absorb application I/O requests. In this paper, we simulate and explore this storage system design for use by large-scale HPC systems. First, we examine application I/O patterns on an existing large-scale HPC system to identify common burst patterns. Next, we describe enhancements to the CODES storage system simulator to enable our burst buffer simulations. These enhancements include the integration of a burst buffer model into the I/O forwarding layer of the simulator, the development of an I/O kernel description language and interpreter, the development of a suite of I/O kernels that are derived from observed I/O patterns, and fidelity improvements to the CODES models. We evaluate the I/O performance for a set of multiapplication I/O workloads and burst buffer configurations. We show that burst buffers can accelerate the application perceived throughput to the external storage system and can reduce the amount of external storage bandwidth required to meet a desired application perceived throughput goal.","PeriodicalId":348234,"journal":{"name":"2012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128678540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 340
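The bandwidth argument in the burst-buffer abstract can be captured with a toy capacity/bandwidth model. This is my own back-of-envelope sketch with assumed figures, unrelated to the CODES simulator: the application only waits for the burst to be absorbed at burst-buffer speed, the drain to external storage overlaps the following compute phase, and external bandwidth therefore only needs to keep up with the burst-to-burst interval.

```python
def burst_buffer_model(burst_gb, bb_bw_gbs, ext_bw_gbs, compute_gap_s):
    """Toy model of a checkpoint burst absorbed by a burst-buffer tier.

    Returns (application-visible absorb time in s, background drain time
    in s, external bandwidth in GB/s needed to finish draining before the
    next burst arrives)."""
    absorb_s = burst_gb / bb_bw_gbs            # what the application perceives
    drain_s = burst_gb / ext_bw_gbs            # hidden behind the compute phase
    required_ext_bw = burst_gb / compute_gap_s # break-even drain rate
    return absorb_s, drain_s, required_ext_bw
```

For example, with an assumed 100 GB burst, a 50 GB/s burst-buffer tier, a 5 GB/s external file system, and a 60 s compute phase between bursts, the application waits 2 s instead of 20 s, and any external bandwidth above roughly 1.7 GB/s suffices to drain in time — the two effects the abstract reports: higher perceived throughput and reduced required external bandwidth.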