
ACM Transactions on Storage (TOS): Latest Publications

Bringing Order to Chaos
Pub Date : 2018-10-03 DOI: 10.1145/3242091
Y. Won, Joontaek Oh, Jaemin Jung, Gyeongyeol Choi, Seongbae Son, J. Hwang, Sangyeun Cho
This work is dedicated to eliminating the overhead required to guarantee the storage order in the modern IO stack. The existing block device adopts a prohibitively expensive approach to ensuring the storage order among write requests: interleaving the write requests with Transfer-and-Flush. To exploit the cache barrier command of flash storage, we overhaul the IO scheduler, the dispatch module, and the filesystem so that these layers are orchestrated to preserve the ordering condition, imposed by the application, under which the associated data blocks are made durable. The key ingredients of the Barrier-Enabled IO stack are Epoch-based IO scheduling, Order-Preserving Dispatch, and Dual-Mode Journaling. The barrier-enabled IO stack can control the storage order without Transfer-and-Flush overhead. We implement the barrier-enabled IO stack on server as well as mobile platforms. SQLite performance increases by 270% and 75% on the server and the smartphone, respectively. In server storage, BarrierFS brings as much as 43× and 73× performance gains in MySQL and SQLite, respectively, over EXT4 by relaxing the durability of a transaction.
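The ordering guarantee behind epoch-based IO scheduling can be illustrated with a toy scheduler (a hypothetical sketch, not the authors' kernel implementation): writes within one epoch may be reordered freely, but a later epoch is never dispatched ahead of an earlier one.

```python
# Toy sketch of epoch-based ordering: a barrier closes the current epoch,
# and dispatch preserves epoch order while reordering freely within an epoch
# (here: sorting by block number to mimic a seek-optimizing scheduler).

class EpochScheduler:
    def __init__(self):
        self.epochs = [[]]              # list of epochs, each a list of blocks

    def submit(self, block):
        self.epochs[-1].append(block)   # writes join the current (open) epoch

    def barrier(self):
        self.epochs.append([])          # barrier: later writes start a new epoch

    def dispatch(self):
        order = []
        for epoch in self.epochs:
            order.extend(sorted(epoch)) # intra-epoch reordering is allowed
        return order

s = EpochScheduler()
s.submit(9)
s.submit(3)
s.barrier()
s.submit(1)
print(s.dispatch())   # [3, 9, 1] -- block 1 never precedes the first epoch
```

Unlike Transfer-and-Flush, nothing here waits for a flush to complete; ordering is enforced purely by dispatch order.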
Citations: 1
Fail-Slow at Scale
Pub Date : 2018-10-03 DOI: 10.1145/3242086
Haryadi S. Gunawi, Riza O. Suminto, R. Sears, Casey Golliher, S. Sundararaman, Xing Lin, Tim Emami, Weiguang Sheng, N. Bidokhti, C. McCaffrey, Deepthi Srinivasan, Biswaranjan Panda, A. Baptist, G. Grider, P. Fields, K. Harms, R. Ross, Andree Jacobson, R. Ricci, Kirk Webb, P. Alvaro, H. Runesha, M. Hao, Huaicheng Li
Fail-slow hardware is an under-studied failure mode. We present a study of 114 reports of fail-slow hardware incidents, collected from large-scale cluster deployments in 14 institutions. We show that all hardware types, including disk, SSD, CPU, memory, and network components, can exhibit performance faults. We make several important observations: faults convert from one form to another, the chains of cascading root causes and impacts can be long, and fail-slow faults can have varying symptoms. From this study, we make suggestions to vendors, operators, and systems designers.
Citations: 45
M-CLOCK
Pub Date : 2018-10-03 DOI: 10.1145/3216730
Minhoe Lee, Donghyun Kang, Y. Eom
Phase Change Memory (PCM) has drawn great attention as a main memory due to its attractive characteristics such as non-volatility, byte-addressability, and in-place update. However, since the capacity of PCM is not yet fully mature, a hybrid memory architecture consisting of DRAM and PCM has been suggested as main memory. Page replacement algorithms for this hybrid architecture are also being actively studied, because existing algorithms do not consider the two weaknesses of PCM, high write latency and low endurance, and therefore cannot be used on hybrid memory architectures as-is. In this article, to mitigate these hardware limitations of PCM, we revisit the page cache layer for the hybrid memory architecture and propose a novel page replacement algorithm, called M-CLOCK, to improve the performance of the hybrid memory architecture and the lifespan of PCM. In particular, M-CLOCK aims to reduce the number of PCM writes that negatively affect the performance of the hybrid memory architecture. Experimental results clearly show that M-CLOCK outperforms state-of-the-art page replacement algorithms in terms of the number of PCM writes and effective memory access time by up to 98% and 9.4 times, respectively.
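The idea of steering evictions away from costly write-backs can be sketched with the classic enhanced-second-chance policy (CLOCK with a dirty-bit preference); this is an illustrative stand-in, not the exact M-CLOCK algorithm from the paper.

```python
class CleanFirstClock:
    """CLOCK variant that prefers evicting clean pages, so dirty pages
    (whose eviction would cost a PCM write-back) stay cached longer."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = []           # entries: [page_id, referenced, dirty]
        self.hand = 0
        self.pcm_writebacks = 0   # dirty evictions = write-backs to PCM

    def _victim(self):
        while True:
            # Pass 1: look for (ref=0, dirty=0) without touching any bits.
            for _ in range(len(self.pages)):
                _, ref, dirty = self.pages[self.hand]
                if ref == 0 and dirty == 0:
                    return self.hand
                self.hand = (self.hand + 1) % len(self.pages)
            # Pass 2: accept (ref=0, dirty=1), clearing reference bits en route.
            for _ in range(len(self.pages)):
                if self.pages[self.hand][1] == 0:
                    return self.hand
                self.pages[self.hand][1] = 0
                self.hand = (self.hand + 1) % len(self.pages)

    def access(self, page_id, write=False):
        for entry in self.pages:
            if entry[0] == page_id:            # hit: set reference/dirty bits
                entry[1] = 1
                entry[2] = entry[2] or int(write)
                return "hit"
        frame = [page_id, 1, int(write)]
        if len(self.pages) < self.capacity:
            self.pages.append(frame)
        else:
            i = self._victim()
            self.pcm_writebacks += self.pages[i][2]
            self.pages[i] = frame              # replace victim in place
            self.hand = (i + 1) % len(self.pages)
        return "miss"

c = CleanFirstClock(2)
c.access("a", write=True)   # dirty page
c.access("b")               # clean page
c.access("c")               # evicts clean "b", keeping dirty "a" cached
```

Evicting the clean page costs no write-back, which is the behavior a PCM-aware replacement policy wants to encourage.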
Citations: 3
Protocol-Aware Recovery for Consensus-Based Distributed Storage
Pub Date : 2018-10-03 DOI: 10.1145/3241062
R. Alagappan, Aishwarya Ganesan, Eric Lee, Aws Albarghouthi, Vijay Chidambaram, A. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
We introduce protocol-aware recovery (Par), a new approach that exploits protocol-specific knowledge to correctly recover from storage faults in distributed systems. We demonstrate the efficacy of Par through the design and implementation of corruption-tolerant replication (Ctrl), a Par mechanism specific to replicated state machine (RSM) systems. We experimentally show that the Ctrl versions of two systems, LogCabin and ZooKeeper, safely recover from storage faults and provide high availability, while the unmodified versions can lose data or become unavailable. We also show that the Ctrl versions achieve this reliability with little performance overhead.
Citations: 7
Lerna
Pub Date : 2018-06-04 DOI: 10.1145/3310368
Mohamed M. Saad, R. Palmieri, B. Ravindran
We present Lerna, an end-to-end tool that automatically and transparently detects and extracts parallelism from data-dependent sequential loops. Lerna uses speculation combined with a set of techniques including code profiling, dependency analysis, instrumentation, and adaptive execution. Speculation is needed to avoid conservative actions and to detect actual conflicts. Lerna targets applications that are hard to parallelize due to data dependencies. Our experimental study involves the parallelization of 13 applications with data dependencies. Results on a 24-core machine show an average speedup of 2.7× for the micro-benchmarks and 2.5× for the macro-benchmarks.
Citations: 1
REGISTOR
Pub Date : 2018-06-04 DOI: 10.1145/3310149
Shuyi Pei, Jing Yang, Qing Yang
This article presents REGISTOR, a platform for regular expression grabbing inside storage. The main idea of Registor is to accelerate regular expression (regex) search inside the storage where large data sets reside, eliminating the I/O bottleneck. A special hardware engine for regex search is designed and embedded inside a flash SSD; it processes data on the fly during data transmission from NAND flash to the host. To make the speed of regex search match the internal bus speed of a modern SSD, a deep pipeline structure is designed in the Registor hardware, consisting of a file-semantics extractor, a matching-candidate finder, regex matching units (REMUs), and a results organizer. Furthermore, each stage of the pipeline exploits maximal parallelism. To make Registor readily usable by high-level applications, we have developed a set of APIs and libraries in Linux that allow Registor to process files in the SSD by efficiently recombining separate data blocks into files. A working prototype of Registor has been built in our newly designed NVMe-SSD. Extensive experiments and analyses show that Registor achieves high throughput, reduces the I/O bandwidth requirement by up to 97%, and reduces CPU utilization by as much as 82% for regex search in large datasets.
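The streaming flavor of in-storage regex matching can be mimicked in host software (a toy analogue; the paper's engine runs in SSD hardware): data arrives block by block, and a small tail of each block is carried over so matches spanning a block boundary are not missed.

```python
import re

def stream_grep(chunks, pattern, overlap=32):
    """Match `pattern` over a stream of data blocks. Keeps the last `overlap`
    characters of each window so boundary-spanning matches are found; assumes
    no match is longer than `overlap`. Results are (absolute_offset, text)."""
    regex = re.compile(pattern)
    carry, carry_off = "", 0      # carried tail and its absolute offset
    hits, reported = [], set()
    for chunk in chunks:
        window = carry + chunk
        for m in regex.finditer(window):
            abs_start = carry_off + m.start()
            if abs_start not in reported:   # dedup matches re-seen via the carry
                reported.add(abs_start)
                hits.append((abs_start, m.group()))
        carry = window[-overlap:]
        carry_off += len(window) - len(carry)
    return hits

# "hello" split across two blocks is still found, at its absolute offset.
print(stream_grep(["hello wo", "rld hello"], r"hello"))
```

The hardware version pipelines this scan against the NAND-to-host transfer, so the host only ever receives the matches.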
Citations: 4
Cluster and Single-Node Analysis of Long-Term Deduplication Patterns
Pub Date : 2018-05-11 DOI: 10.1145/3183890
Zhen Sun, G. Kuenning, Sonam Mandal, Philip Shilane, Vasily Tarasov, Nong Xiao, E. Zadok
Deduplication has become essential in disk-based backup systems, but there have been few long-term studies of backup workloads. Most past studies either were of a small static snapshot or covered only a short period that was not representative of how a backup system evolves over time. For this article, we first collected 21 months of data from a shared user file system; 33 users and over 4,000 snapshots are covered. We then analyzed the dataset, examining a variety of essential characteristics across two dimensions: single-node deduplication and cluster deduplication. For single-node deduplication analysis, our primary focus was individual-user data. Despite apparently similar roles and behavior among all of our users, we found significant differences in their deduplication ratios. Moreover, the data that some users share with others had a much higher deduplication ratio than average. For cluster deduplication analysis, we implemented seven published data-routing algorithms and created a detailed comparison of their performance with respect to deduplication ratio, load distribution, and communication overhead. We found that per-file routing achieves a higher deduplication ratio than routing by super-chunk (multiple consecutive chunks), but it also leads to high data skew (imbalance of space usage across nodes). We also found that large chunking sizes are better for cluster deduplication, as they significantly reduce data-routing overhead, while their negative impact on deduplication ratios is small and acceptable. We draw interesting conclusions from both single-node and cluster deduplication analysis and make recommendations for future deduplication systems design.
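The deduplication ratio the study measures can be computed with a minimal fixed-size-chunking sketch (the study also covers variable chunk sizes and cluster routing policies, which this toy omits): fingerprint each chunk and store each distinct fingerprint's data only once.

```python
import hashlib

def dedup_ratio(data: bytes, chunk_size: int) -> float:
    """Ratio of logical bytes to bytes actually stored after duplicate
    fixed-size chunks are removed (higher = more redundancy eliminated)."""
    seen = set()
    stored = 0
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha1(chunk).digest()   # chunk fingerprint
        if digest not in seen:                  # first occurrence: store it
            seen.add(digest)
            stored += len(chunk)
    return len(data) / stored if stored else 1.0

# Two identical 4KiB blocks deduplicate to one stored copy: ratio 2.0.
print(dedup_ratio(b"x" * 8192, 4096))   # 2.0
```

Larger chunk sizes shrink the fingerprint index (less routing metadata in a cluster) at the cost of missing some duplicate data, which is the trade-off the article quantifies.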
Citations: 10
Empirical Evaluation and Enhancement of Enterprise Storage System Request Scheduling
Pub Date : 2018-04-27 DOI: 10.1145/3193741
Deng Zhou, Vania Fang, T. Xie, Wen Pan, R. Kesavan, Tony Lin, N. Patel
Since little has been reported in the literature on enterprise storage system file-level request scheduling, we do not have enough knowledge about how various scheduling factors affect performance. Moreover, we lack a good understanding of how to enhance request scheduling to adapt to the changing characteristics of workloads and hardware resources. To answer these questions, we first build a request scheduler prototype based on WAFL®, a mainstream file system running on numerous enterprise storage systems worldwide. Next, we use the prototype to quantitatively measure the impact of various scheduling configurations on the performance of a NetApp® enterprise-class storage system. Several observations have been made. For example, we discover that, to improve performance, the priority of write requests and non-preempted restarted requests should be boosted in some workloads. Inspired by these observations, we further propose two scheduling enhancement heuristics called SORD (size-oriented request dispatching) and QATS (queue-depth-aware time slicing). Finally, we evaluate them by conducting a wide range of experiments using workloads generated by SPC-1 and SFS2014 on both HDD-based and all-flash platforms. Experimental results show that the combination of the two can noticeably reduce average request latency under some workloads.
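The priority-boosting observation can be illustrated with a minimal two-level dispatcher (a hypothetical sketch inspired by the observation above; SORD and QATS themselves are more involved): boosted request kinds are served first, FIFO within each level.

```python
import heapq
import itertools

class BoostingScheduler:
    """Tiny priority dispatcher: requests of a boosted kind (e.g. writes)
    are dispatched before others; ties break in submission (FIFO) order."""

    def __init__(self, boosted_kinds=("write",)):
        self.boosted = set(boosted_kinds)
        self.heap = []
        self.seq = itertools.count()   # FIFO tie-breaker within a priority

    def submit(self, kind, name):
        prio = 0 if kind in self.boosted else 1   # lower value = served first
        heapq.heappush(self.heap, (prio, next(self.seq), kind, name))

    def dispatch(self):
        # Drain the queue in (priority, arrival) order.
        return [name for _, _, _, name in
                (heapq.heappop(self.heap) for _ in range(len(self.heap)))]

s = BoostingScheduler()
s.submit("read", "r1")
s.submit("write", "w1")
s.submit("read", "r2")
print(s.dispatch())   # ['w1', 'r1', 'r2'] -- the write jumps the queue
```

A real scheduler would bound how long non-boosted requests can starve; this sketch only shows the ordering effect of the boost.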
Citations: 0
Fast Miss Ratio Curve Modeling for Storage Cache
Pub Date : 2018-04-12 DOI: 10.1145/3185751
Xiameng Hu, Xiaolin Wang, Lan Zhou, Yingwei Luo, Zhenlin Wang, C. Ding, Chencheng Ye
The reuse distance (least-recently-used (LRU) stack distance) is an essential metric for performance prediction and optimization of storage caches. Over the past four decades, there have been steady improvements in the algorithmic efficiency of reuse distance measurement. This progress has accelerated in recent years, both in theory and in practical implementation. In this article, we present a kinetic model of LRU cache memory, based on the average eviction time (AET) of the cached data. The AET model enables fast measurement and the use of low-cost sampling. It can produce the miss ratio curve in linear time with extremely low space costs. On storage trace benchmarks, AET reduces the time and space costs compared to previous techniques. Furthermore, AET is a composable model that can characterize shared cache behavior by sampling and modeling individual programs or traces.
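The metric being modeled can be shown directly with a brute-force miss-ratio-curve computation from LRU stack distances (quadratic in trace length; the AET model's point is to approximate this far more cheaply from samples):

```python
def lru_miss_ratio_curve(trace, max_size):
    """Exact LRU miss ratio for every cache size 1..max_size, computed from
    stack distances: a size-c cache misses exactly the accesses whose
    stack distance is >= c (cold misses count as infinite distance)."""
    stack = []                        # LRU stack, most recently used first
    distances = []
    for addr in trace:
        if addr in stack:
            d = stack.index(addr)     # number of more-recent distinct addresses
            stack.remove(addr)
        else:
            d = float("inf")          # cold miss
        distances.append(d)
        stack.insert(0, addr)
    return [sum(d >= c for d in distances) / len(distances)
            for c in range(1, max_size + 1)]

trace = ["a", "b", "a", "c", "b", "a"]
print(lru_miss_ratio_curve(trace, 3))
```

Each point of this curve is what AET predicts from average eviction times without replaying the full trace per cache size.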
Citations: 29
Workload Characterization for Enterprise Disk Drives
Pub Date : 2018-04-12 DOI: 10.1145/3151847
A. Kashyap
The article presents an analysis of drive workloads from enterprise storage systems. The drive workloads are obtained from field-return units from a cross-section of enterprise storage system vendors and thus provide a view of workload characteristics across a wide spectrum of end-user applications. The workload parameters characterized include transfer lengths, access patterns, throughput, and utilization. The study shows that reads are the dominant workload, accounting for 80% of the accesses to the drive. Writes are dominated by short-block random accesses, while reads range from random to highly sequential. A trend analysis over the period 2010–2014 shows that the workload has remained fairly constant even as the capacities of the drives shipped have steadily increased. The study shows that the data stored on disk drives is relatively cold: on average, less than 4% of the drive capacity is accessed in a given 2-hour interval.
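Two of the statistics this kind of study reports, read fraction and sequentiality, can be computed from a trace with a few lines (a sketch over a hypothetical trace format of `(op, lba, length)` tuples, not the study's actual tooling):

```python
def characterize(trace):
    """Read fraction, plus the fraction of requests whose start LBA
    immediately follows the previous request's end (i.e. sequential)."""
    reads = sum(1 for op, _, _ in trace if op == "R")
    seq = 0
    prev_end = None
    for _, lba, length in trace:
        if prev_end is not None and lba == prev_end:
            seq += 1
        prev_end = lba + length
    return {"read_fraction": reads / len(trace),
            "sequential_fraction": seq / len(trace)}

trace = [("R", 0, 8), ("R", 8, 8), ("W", 100, 4), ("R", 16, 8)]
print(characterize(trace))   # {'read_fraction': 0.75, 'sequential_fraction': 0.25}
```

Transfer-length histograms and utilization follow the same pattern: one pass over the trace, accumulating per-request counters.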
{"title":"Workload Characterization for Enterprise Disk Drives","authors":"A. Kashyap","doi":"10.1145/3151847","DOIUrl":"https://doi.org/10.1145/3151847","url":null,"abstract":"The article presents an analysis of drive workloads from enterprise storage systems. The drive workloads are obtained from field return units from a cross-section of enterprise storage system vendors and thus provides a view of the workload characteristics over a wide spectrum of end-user applications. The workload parameters that have been characterized include transfer lengths, access patterns, throughput, and utilization. The study shows that reads are the dominant workload accounting for 80% of the accesses to the drive. Writes are dominated by short block random accesses while reads range from random to highly sequential. A trend analysis over the period 2010–2014 shows that the workload has remained fairly constant even as the capacities of the drives shipped has steadily increased. The study shows that the data stored on disk drives is relatively cold—on average less than 4% of the drive capacity is accessed in a given 2h interval.","PeriodicalId":273014,"journal":{"name":"ACM Transactions on Storage (TOS)","volume":"131 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114827710","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
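The study's headline statistics — read ratio and the cold-data footprint per interval — can be reproduced from any block-level trace. The sketch below is illustrative only: the `(op, lba)` tuple format and the function name are assumptions, and it bins by access count rather than the paper's 2-hour wall-clock intervals.

```python
def workload_stats(trace, capacity_blocks, window):
    """Read ratio and mean fraction of capacity touched per window.

    `trace` is a list of (op, lba) tuples with op in {'R', 'W'} —
    a hypothetical format; the paper bins by 2-hour wall-clock
    intervals, while this sketch bins by access count.
    """
    read_ratio = sum(1 for op, _ in trace if op == 'R') / len(trace)
    fracs = []
    for start in range(0, len(trace), window):
        # distinct logical block addresses touched in this window
        unique = {lba for _, lba in trace[start:start + window]}
        fracs.append(len(unique) / capacity_blocks)
    return read_ratio, sum(fracs) / len(fracs)
```

On a real trace, a mean touched fraction well under 4% of `capacity_blocks` would match the cold-data observation reported in the article.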