
2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST): latest publications

Write amplification reduction in NAND Flash through multi-write coding
Pub Date : 2010-05-03 DOI: 10.1109/MSST.2010.5496985
A. Jagmohan, M. Franceschini, L. Lastras
The block erase requirement in NAND Flash devices leads to the need for garbage collection. Garbage collection results in write amplification, that is, an increase in the number of physical page programming operations. Write amplification adversely impacts the limited lifetime of a NAND Flash device, and can add significant system overhead unless a large spare factor is maintained. This paper proposes a NAND Flash system which uses multi-write coding to reduce write amplification. Multi-write coding allows a NAND Flash page to be written more than once without requiring an intervening block erase. We present a novel two-write coding technique based on enumerative coding, which achieves linear coding rates with low computational complexity. The proposed technique also seeks to minimize memory wear by reducing the number of programmed cells per page write. We describe a system which uses lossless data compression in conjunction with multi-write coding, and show through simulations that the proposed system significantly reduces write amplification and memory wear.
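The paper's enumerative two-write construction is not reproduced in the abstract; the classic Rivest-Shamir write-once-memory (WOM) code conveys the core idea of multi-write coding: two 2-bit messages written into three one-way cells (0 to 1 transitions only) without an intervening erase. A minimal sketch:

```python
# First-generation codewords: 2 bits -> 3 cells, at most one cell programmed.
FIRST = {0b00: 0b000, 0b01: 0b001, 0b10: 0b010, 0b11: 0b100}
# Second-generation codewords are the bitwise complements of the first.
SECOND = {m: c ^ 0b111 for m, c in FIRST.items()}

def write1(msg):
    return FIRST[msg]

def write2(cells, msg):
    """Second write: only sets cells (0 -> 1), never needs an erase."""
    if decode(cells) == msg:
        return cells                 # nothing to change
    new = SECOND[msg]
    assert cells & ~new == 0         # never clears an already-programmed cell
    return new

def decode(cells):
    if bin(cells).count("1") <= 1:   # at most one cell set: first generation
        return {c: m for m, c in FIRST.items()}[cells]
    return {c: m for m, c in SECOND.items()}[cells]
```

Any second message can follow any first message, so a page holds two logical writes per physical erase, which is exactly the mechanism that trades coding rate for reduced write amplification.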
Citations: 39
A content-aware block placement algorithm for reducing PRAM storage bit writes
Pub Date : 2010-05-03 DOI: 10.1109/MSST.2010.5496996
B. Wongchaowart, M. Iskander, Sangyeun Cho
Phase-change random access memory (PRAM) is a promising storage-class memory technology that has the potential to replace flash memory and DRAM in many applications. Because individual cells in a PRAM can be written independently, only data cells whose current values differ from the corresponding bits in a write request need to be updated. Furthermore, when a block write request is received, the PRAM may contain many free blocks that are available for overwriting, and these free blocks will generally have different contents. For this reason, the number of bit programming operations required to write new data to the PRAM (and consequently power consumption and write bandwidth) depends on the location that is chosen to be overwritten. This paper describes a block placement algorithm for reducing PRAM bit writes based on the idea of indexing free blocks using a content-based signature; computing the signature value of a new block of data to be written allows a free block with similar contents to be located quickly. While the benefit that can be realized by the use of any block placement algorithm is heavily dependent on the workload, our evaluation results show that block placement using content-based signatures is able to reduce the number of PRAM bit programming operations by as much as an order of magnitude.
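The abstract does not specify the signature function, so the sketch below uses a toy signature (a few evenly sampled bytes, a hypothetical choice) to index free blocks, then picks the candidate whose current contents require the fewest bit programming operations:

```python
from collections import defaultdict

def signature(block, n=4):
    # Toy content signature: sample n evenly spaced bytes (hypothetical choice,
    # standing in for whatever signature the paper's algorithm uses).
    step = max(1, len(block) // n)
    return bytes(block[::step][:n])

def bit_flips(a, b):
    """Number of PRAM cells that must be reprogrammed to turn a into b."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

class FreeBlockIndex:
    def __init__(self):
        self.by_sig = defaultdict(list)

    def add(self, addr, contents):
        self.by_sig[signature(contents)].append((addr, contents))

    def place(self, data):
        """Pick the free block needing the fewest bit writes for this data."""
        candidates = self.by_sig.get(signature(data))
        if not candidates:   # no similar block indexed: fall back to scanning all
            candidates = [c for lst in self.by_sig.values() for c in lst]
        addr, old = min(candidates, key=lambda c: bit_flips(c[1], data))
        self.by_sig[signature(old)].remove((addr, old))
        return addr, bit_flips(old, data)
```

The signature lookup is what makes the search fast; the exact signature design governs how often a genuinely similar block is found.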
Citations: 15
Disk-enabled authenticated encryption
Pub Date : 2010-05-03 DOI: 10.1109/MSST.2010.5496979
Kevin R. B. Butler, Stephen E. McLaughlin, P. Mcdaniel
Storage is increasingly becoming a vector for data compromise. Solutions for protecting on-disk data confidentiality and integrity to date have been limited in their effectiveness. Providing authenticated encryption, or simultaneous encryption with integrity information, is important to protect data at rest. In this paper, we propose that disks augmented with non-volatile storage (e.g., hybrid hard disks) and cryptographic processors (e.g., FDE drives) may provide a solution for authenticated encryption, storing security metadata within the drive itself to eliminate dependencies on other parts of the system. We augment the DiskSim simulator with a flash simulator to evaluate the costs associated with managing operational overheads. These experiments show that proper tuning of system parameters can eliminate many of the costs associated with managing security metadata, with less than a 2% decrease in IOPS versus regular disks.
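As a rough illustration of authenticated encryption (not the drive-internal scheme the paper proposes), the sketch below seals each sector with encrypt-then-MAC, the HMAC tag playing the role of the per-sector security metadata kept on the drive. The counter-mode SHA-256 keystream is a toy stand-in for a real cipher and is not vetted cryptography:

```python
import hashlib
import hmac

def keystream(key, nonce, length):
    # Toy stream cipher: SHA-256 in counter mode. Illustrative only.
    out, ctr = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:length]

def seal(enc_key, mac_key, nonce, sector):
    """Encrypt-then-MAC: returns ciphertext plus an integrity tag."""
    ct = bytes(a ^ b for a, b in zip(sector, keystream(enc_key, nonce, len(sector))))
    tag = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    return ct, tag

def open_(enc_key, mac_key, nonce, ct, tag):
    """Verify the tag before decrypting; reject any tampered sector."""
    expect = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expect):
        raise ValueError("integrity check failed")
    return bytes(a ^ b for a, b in zip(ct, keystream(enc_key, nonce, len(ct))))
```

Storing the tag alongside the sector inside the drive is what removes the dependence on host-side integrity structures.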
Citations: 0
Mahanaxar: Quality of service guarantees in high-bandwidth, real-time streaming data storage
Pub Date : 2010-05-03 DOI: 10.1109/MSST.2010.5496975
David O. Bigelow, S. Brandt, John Bent, Hsing-bung Chen
Large radio telescopes, cyber-security systems monitoring real-time network traffic, and others have specialized data storage needs: guaranteed capture of an ultra-high-bandwidth data stream, retention of the data long enough to determine what is “interesting,” retention of interesting data indefinitely, and concurrent read/write access to determine what data is interesting, without interrupting the ongoing capture of incoming data. Mahanaxar addresses this problem. Mahanaxar guarantees streaming real-time data capture at (nearly) the full rate of the raw device, allows concurrent read and write access to the device on a best-effort basis without interrupting the data capture, and retains data as long as possible given the available storage. It has built-in mechanisms for reliability and indexing, can scale to meet arbitrary bandwidth requirements, and handles both small and large data elements equally well. Results from our prototype implementation show that Mahanaxar provides both better guarantees and better performance than traditional file systems.
Citations: 7
Leveraging disk drive acoustic modes for power management
Pub Date : 2010-05-03 DOI: 10.1109/MSST.2010.5496993
Doron Chen, George Goldberg, R. Kahn, Ronen I. Kat, K. Meth
Reduction of disk drive power consumption is a challenging task, particularly since the most prevalent way of achieving it, powering down idle disks, has many undesirable side-effects. Some hard disk drives support acoustic modes, meaning they can be configured to reduce the acceleration and velocity of the disk head. This reduces instantaneous power consumption but sacrifices performance. As a result, input/output (I/O) operations run longer at reduced power. This is useful for power capping since it causes significant reduction in peak power consumption of the disks. We conducted experiments on several disk drives that support acoustic management. Most of these disk drives support only two modes, quiet and normal. We ran different I/O workloads, including SPC-1 to simulate a real-world online transaction processing workload. We found that the reduction in peak power can reach up to 23% when using quiet mode. We show that for some workloads this translates into a reduction of 12.5% in overall energy consumption. In other workloads we encountered the opposite phenomenon: an increase of more than 6% in the overall energy consumption.
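The trade-off the abstract describes can be made concrete with a one-line energy model: quiet mode scales instantaneous power down but stretches runtime, so net energy can move either way. The slowdown figures below are illustrative, not taken from the paper:

```python
def energy_change(peak_cut, slowdown):
    """Relative change in energy for the active portion of a workload:
    power scales by (1 - peak_cut), runtime by (1 + slowdown)."""
    return (1 - peak_cut) * (1 + slowdown) - 1

# Using the paper's 23% peak-power reduction; slowdowns are illustrative.
assert energy_change(0.23, 0.10) < 0   # modest slowdown: net energy savings
assert energy_change(0.23, 0.35) > 0   # large slowdown: energy consumption rises
```

This matches the abstract's observation that quiet mode saved 12.5% energy on some workloads yet cost more than 6% on others, depending on how much the workload was slowed.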
Citations: 11
Enabling active storage on parallel I/O software stacks
Pub Date : 2010-05-03 DOI: 10.1109/MSST.2010.5496981
S. Son, S. Lang, P. Carns, R. Ross, R. Thakur, Berkin Özisikyilmaz, Prabhat Kumar, W. Liao, A. Choudhary
As data sizes continue to increase, the concept of active storage is well suited to many data analysis kernels. Nevertheless, while this concept has been investigated and deployed in a number of forms, enabling it from the parallel I/O software stack has been largely unexplored. In this paper, we propose and evaluate an active storage system that allows data analysis, mining, and statistical operations to be executed from within a parallel I/O interface. In our proposed scheme, common analysis kernels are embedded in parallel file systems. We expose the semantics of these kernels to parallel file systems through an enhanced runtime interface so that execution of embedded kernels is possible on the server. In order to allow complete server-side operations without file format or layout manipulation, our scheme adjusts the file I/O buffer to the computational unit boundary on the fly. Our scheme also uses server-side collective communication primitives for reduction and aggregation using interserver communication. We have implemented a prototype of our active storage system and demonstrate its benefits using four data analysis benchmarks. Our experimental results show that our proposed system improves the overall performance of all four benchmarks by 50.9% on average and that the compute-intensive portion of the k-means clustering kernel can be improved by 58.4% through GPU offloading when executed with a larger computational load. We also show that our scheme consistently outperforms the traditional storage model with a wide variety of input dataset sizes, number of nodes, and computational loads.
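A minimal sketch of the server-side idea, with hypothetical kernel names: the client ships a kernel identifier instead of reading the raw data, and the storage server runs the embedded kernel over the file's chunks and returns only the reduced result:

```python
# Hypothetical kernel registry; the paper embeds analysis kernels in the
# parallel file system servers, exposed through a runtime interface.
KERNELS = {
    "sum": lambda acc, chunk: (acc or 0) + sum(chunk),
    "max": lambda acc, chunk: max(chunk) if acc is None else max(acc, max(chunk)),
}

def server_run(kernel, chunks):
    """Execute the named kernel server-side over the file's chunks,
    shipping back the reduced value rather than the data itself."""
    acc = None
    for chunk in chunks:
        acc = KERNELS[kernel](acc, chunk)
    return acc

chunks = [[1, 5, 2], [9, 3], [4]]
assert server_run("sum", chunks) == 24
assert server_run("max", chunks) == 9
```

The bandwidth saving comes from returning one value per kernel invocation instead of streaming every chunk to the client.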
Citations: 77
Exporting kernel page caching for efficient user-level I/O
Pub Date : 2010-05-03 DOI: 10.1109/MSST.2010.5496973
Richard P. Spillane, S. Dixit, Shrikar Archak, Saumitra Bhanage, E. Zadok
The modern file system is still implemented in the kernel, and is statically linked with other kernel components. This architecture has brought performance and efficient integration with memory management. However, kernel development is slow, and modern storage systems must support an array of features, including distribution across a network, tagging, searching, deduplication, checksumming, snapshotting, file pre-allocation, real-time I/O guarantees for media, and more. Moving complex components into user level, however, requires an efficient mechanism for handling page faulting and zero-copy caching, write ordering, synchronous flushes, interaction with the kernel page write-back thread, and secure shared memory. We implement such a system, and experiment with a user-level object store built on top. Our object store is a complete re-design of the traditional storage stack and demonstrates the efficiency of our technique, and the flexibility it grants to user-level storage systems. Our current prototype file system incurs between a 1% and 6% overhead on the default native file system Ext3 for in-cache system workloads, where the native kernel file system design has traditionally found its primary motivation.
Citations: 4
A study of self-similarity in parallel I/O workloads
Pub Date : 2010-05-03 DOI: 10.1109/MSST.2010.5496978
Qiang Zoll, Yifeng Zhu, D. Feng
A challenging issue in the performance evaluation of parallel storage systems through trace-driven simulation is to accurately characterize and emulate I/O behaviors in real applications. A correlation study of inter-arrival times between I/O requests, with an emphasis on I/O-intensive scientific applications, shows the necessity of further studying the self-similarity of parallel I/O arrivals. This paper analyzes several I/O traces collected on large-scale supercomputers and concludes that parallel I/Os exhibit statistically self-similar behavior. Instead of a Markov model, a new stochastic model is proposed and validated to accurately model parallel I/O burstiness. This model can be used to predict I/O workloads in real systems and to generate reliable synthetic I/O sequences in simulation studies.
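Self-similarity is commonly quantified by the Hurst parameter H. One standard estimator, aggregated variance (not necessarily the paper's method), fits the decay of the variance of block means, which scales as m^(2H-2) for block size m:

```python
import math
import random

def hurst_aggvar(x, scales=(1, 2, 4, 8, 16)):
    """Estimate H from Var(X^(m)) ~ m^(2H-2): the log-log slope gives 2H-2."""
    pts = []
    for m in scales:
        means = [sum(x[i:i + m]) / m for i in range(0, len(x) - m + 1, m)]
        mu = sum(means) / len(means)
        var = sum((v - mu) ** 2 for v in means) / len(means)
        pts.append((math.log(m), math.log(var)))
    # Least-squares slope of log(var) against log(m).
    n = len(pts)
    sx = sum(p[0] for p in pts); sy = sum(p[1] for p in pts)
    sxx = sum(p[0] ** 2 for p in pts); sxy = sum(p[0] * p[1] for p in pts)
    slope = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    return 1 + slope / 2

random.seed(1)
iid = [random.random() for _ in range(4096)]
H = hurst_aggvar(iid)   # independent arrivals give H near 0.5;
                        # self-similar, bursty traffic gives H > 0.5
```

An estimate of H significantly above 0.5 on real inter-arrival traces is the usual statistical evidence for the self-similar burstiness the abstract describes.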
Citations: 12
The Hadoop Distributed File System
Pub Date : 2010-05-03 DOI: 10.1109/MSST.2010.5496972
K. Shvachko, Hairong Kuang, S. Radia, R. Chansler
The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably, and to stream those data sets at high bandwidth to user applications. In a large cluster, thousands of servers both host directly attached storage and execute user application tasks. By distributing storage and computation across many servers, the resource can grow with demand while remaining economical at every size. We describe the architecture of HDFS and report on experience using HDFS to manage 25 petabytes of enterprise data at Yahoo!.
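HDFS's default three-replica placement policy, described in the paper, puts the first copy on the writer's node, the second on a node in a different rack, and the third on another node in that second rack. A sketch with an illustrative rack map:

```python
import random

def place_replicas(writer_node, racks):
    """Sketch of HDFS's default 3-replica placement; the rack map is
    illustrative, not an HDFS API."""
    rack_of = {n: r for r, nodes in racks.items() for n in nodes}
    first = writer_node
    remote_racks = [r for r in racks if r != rack_of[first]]
    second_rack = random.choice(remote_racks)          # cross-rack for fault tolerance
    second = random.choice(racks[second_rack])
    third = random.choice([n for n in racks[second_rack] if n != second])
    return [first, second, third]

racks = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"], "rack3": ["n5", "n6"]}
replicas = place_replicas("n1", racks)
```

The policy trades one cross-rack transfer per block for tolerance of a whole-rack failure, while keeping two copies rack-local for cheaper reads and re-replication.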
引用次数: 5169
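The namenode/datanode split described in the abstract above can be illustrated with a minimal Python sketch. `ToyNameNode`, `create_file`, and the random replica placement are hypothetical illustrations, not HDFS's actual interfaces; the 64 MB block size and 3-way replication match common HDFS defaults of that era.

```python
import itertools
import random

class ToyNameNode:
    """Toy model of HDFS metadata (illustrative, not the real API):
    the namenode maps each file to a list of blocks, and each block
    to the datanodes holding its replicas."""

    def __init__(self, datanodes, block_size=64 * 2**20, replication=3):
        self.datanodes = list(datanodes)
        self.block_size = block_size      # 64 MB, a common HDFS default
        self.replication = replication    # HDFS default: 3 replicas
        self.files = {}                   # path -> list of block ids
        self.block_map = {}               # block id -> datanodes holding it
        self._next_id = itertools.count()

    def create_file(self, path, size_bytes):
        n_blocks = -(-size_bytes // self.block_size)  # ceiling division
        block_ids = []
        for _ in range(n_blocks):
            bid = next(self._next_id)
            # Place replicas on distinct datanodes chosen at random
            # (real HDFS placement is rack-aware; this sketch is not).
            self.block_map[bid] = random.sample(self.datanodes, self.replication)
            block_ids.append(bid)
        self.files[path] = block_ids

nn = ToyNameNode([f"dn{i}" for i in range(10)])
nn.create_file("/logs/day1", 200 * 2**20)   # 200 MB -> 4 blocks of 64 MB
print(len(nn.files["/logs/day1"]))          # 4
```

Because only block-to-datanode mappings live on the namenode, adding datanodes grows capacity and bandwidth without changing the metadata scheme — the "grow with demand" property the abstract describes.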
An adaptive partitioning scheme for DRAM-based cache in Solid State Drives 一种基于dram的固态硬盘缓存自适应分区方案
Pub Date : 2010-05-03 DOI: 10.1109/MSST.2010.5496995
Hyotaek Shim, Bon-Keun Seo, Jin-Soo Kim, S. Maeng
Recently, NAND flash-based Solid State Drives (SSDs) have been rapidly adopted in laptops, desktops, and server storage systems because their performance is superior to that of traditional magnetic disks. However, NAND flash memory has some limitations such as out-of-place updates, bulk erase operations, and a limited number of write operations. To alleviate these unfavorable characteristics, various techniques for improving internal software and hardware components have been devised. In particular, the internal device cache of SSDs has a significant impact on the performance. The device cache is used for two main purposes: to absorb frequent read/write requests and to store logical-to-physical address mapping information. In the device cache, we observed that the optimal ratio of the data buffering and the address mapping space changes according to workload characteristics. To achieve optimal performance in SSDs, the device cache should be appropriately partitioned between the two main purposes. In this paper, we propose an adaptive partitioning scheme, which is based on a ghost caching mechanism, to adaptively tune the ratio of the buffering and the mapping space in the device cache according to the workload characteristics. The simulation results demonstrate that the performance of the proposed scheme approximates the best performance.
近年来,基于NAND闪存的固态硬盘(ssd)由于其性能优于传统磁盘,在笔记本电脑、台式机和服务器存储系统中得到了迅速的应用。然而,NAND闪存有一些限制,比如不在位置的更新、批量擦除操作和有限数量的写操作。为了减轻这些不利的特性,已经设计了各种改进内部软件和硬件组件的技术。其中,ssd盘内部设备缓存对性能影响较大。设备缓存主要用于两个目的:吸收频繁的读/写请求和存储逻辑到物理地址的映射信息。在设备缓存中,我们观察到数据缓冲和地址映射空间的最佳比例根据工作负载特征而变化。为了在ssd中实现最佳性能,设备缓存应该在两个主要用途之间进行适当的分区。在本文中,我们提出了一种基于幽灵缓存机制的自适应分区方案,可以根据工作负载的特点自适应地调整设备缓存中的缓冲空间和映射空间的比例。仿真结果表明,所提方案的性能接近最佳性能。
{"title":"An adaptive partitioning scheme for DRAM-based cache in Solid State Drives","authors":"Hyotaek Shim, Bon-Keun Seo, Jin-Soo Kim, S. Maeng","doi":"10.1109/MSST.2010.5496995","DOIUrl":"https://doi.org/10.1109/MSST.2010.5496995","url":null,"abstract":"Recently, NAND flash-based Solid State Drives (SSDs) have been rapidly adopted in laptops, desktops, and server storage systems because their performance is superior to that of traditional magnetic disks. However, NAND flash memory has some limitations such as out-of-place updates, bulk erase operations, and a limited number of write operations. To alleviate these unfavorable characteristics, various techniques for improving internal software and hardware components have been devised. In particular, the internal device cache of SSDs has a significant impact on the performance. The device cache is used for two main purposes: to absorb frequent read/write requests and to store logical-to-physical address mapping information. In the device cache, we observed that the optimal ratio of the data buffering and the address mapping space changes according to workload characteristics. To achieve optimal performance in SSDs, the device cache should be appropriately partitioned between the two main purposes. In this paper, we propose an adaptive partitioning scheme, which is based on a ghost caching mechanism, to adaptively tune the ratio of the buffering and the mapping space in the device cache according to the workload characteristics. The simulation results demonstrate that the performance of the proposed scheme approximates the best performance.","PeriodicalId":350968,"journal":{"name":"2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131042506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 34
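The ghost-caching mechanism that this abstract refers to can be sketched in Python. This is a simplified adaptive split in the spirit of ARC-style ghost lists, not the authors' exact algorithm; `AdaptiveCachePair` and all names are illustrative. Each partition (data buffer vs. mapping cache) keeps a "ghost" list of recently evicted keys; a hit in a partition's ghost list is evidence that it was sized too small, so the target split shifts one slot toward it.

```python
from collections import OrderedDict

class AdaptiveCachePair:
    """Two LRU partitions (index 0: data buffer, index 1: mapping cache)
    sharing `total` slots.  Each partition keeps a ghost list of recently
    evicted keys; a ghost hit grows that partition's target share."""

    def __init__(self, total):
        self.total = total
        self.target = total // 2              # target size of partition 0
        self.real = [OrderedDict(), OrderedDict()]
        self.ghost = [OrderedDict(), OrderedDict()]

    def access(self, part, key):
        """Record an access to `key` in partition `part`; True on a hit."""
        cache = self.real[part]
        if key in cache:
            cache.move_to_end(key)            # LRU refresh
            return True
        if key in self.ghost[part]:
            # Recently evicted from this partition: it was too small.
            del self.ghost[part][key]
            if part == 0:
                self.target = min(self.total, self.target + 1)
            else:
                self.target = max(0, self.target - 1)
        cache[key] = None
        self._rebalance()
        return False

    def _rebalance(self):
        # Evict from whichever partition exceeds its target share,
        # remembering victims in that partition's ghost list.
        while len(self.real[0]) + len(self.real[1]) > self.total:
            victim = 0 if len(self.real[0]) > self.target else 1
            k, _ = self.real[victim].popitem(last=False)
            self.ghost[victim][k] = None
            if len(self.ghost[victim]) > self.total:
                self.ghost[victim].popitem(last=False)

c = AdaptiveCachePair(total=8)
for k in (100, 101, 102, 103):      # mapping-cache traffic
    c.access(1, k)
for _ in range(3):                  # buffer working set larger than its half
    for k in range(6):
        c.access(0, k)
print(c.target)                     # grows past the initial 4
```

Repeated ghost hits on the buffer side pull `target` above the even split, so the buffer partition absorbs its larger working set — the workload-driven tuning the paper evaluates.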