Virtualization-aware access control for multitenant filesystems
Pub Date: 2014-06-02 | DOI: 10.1109/MSST.2014.6855543
Giorgos Kappes, A. Hatzieleftheriou, S. Anastasiadis
In a virtualization environment that serves multiple tenants, storage consolidation at the filesystem level is desirable because it enables data sharing, administration efficiency, and performance optimizations. The scalable deployment of filesystems in such environments is challenging due to the intermediate translation layers required for networked file access or identity management. First, we present several security requirements of multitenant filesystems. Then, we introduce the design of the Dike authorization architecture, which combines native access control with tenant namespace isolation and compatibility with object-based filesystems. We experimentally evaluate a prototype implementation of Dike on a public cloud. At several thousand tenants, our prototype incurs a limited performance overhead of up to 16%, unlike an existing solution whose multitenancy overhead approaches 84% in some cases.
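To make the combination of native access control and tenant namespace isolation concrete, here is a minimal sketch of a tenant-scoped permission check. The structures and names are illustrative assumptions for this listing, not Dike's actual data layout or API.

```python
# Minimal sketch of tenant-scoped authorization in the spirit of Dike's
# design (native ACLs plus tenant namespace isolation). All names and
# structures are illustrative assumptions, not the paper's actual API.
from dataclasses import dataclass, field

@dataclass
class ObjectACL:
    tenant_id: int                             # tenant that owns the object
    perms: dict = field(default_factory=dict)  # principal -> set of ops

def authorize(acl: ObjectACL, tenant_id: int, principal: str, op: str) -> bool:
    # Namespace isolation: a request from another tenant never even
    # reaches a permission check on this object.
    if acl.tenant_id != tenant_id:
        return False
    # Native access control: per-principal permission bits stored
    # alongside the object, checked without an external identity gateway.
    return op in acl.perms.get(principal, set())

acl = ObjectACL(tenant_id=7, perms={"alice": {"r", "w"}})
assert authorize(acl, tenant_id=7, principal="alice", op="r")
assert not authorize(acl, tenant_id=8, principal="alice", op="r")  # cross-tenant
```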
{"title":"Virtualization-aware access control for multitenant filesystems","authors":"Giorgos Kappes, A. Hatzieleftheriou, S. Anastasiadis","doi":"10.1109/MSST.2014.6855543","DOIUrl":"https://doi.org/10.1109/MSST.2014.6855543","url":null,"abstract":"In a virtualization environment that serves multiple tenants, storage consolidation at the filesystem level is desirable because it enables data sharing, administration efficiency, and performance optimizations. The scalable deployment of filesystems in such environments is challenging due to intermediate translation layers required for networked file access or identity management. First we present several security requirements in multitenant filesystems. Then we introduce the design of the Dike authorization architecture. It combines native access control with tenant namespace isolation and compatibility to object-based filesystems. We use a public cloud to experimentally evaluate a prototype implementation of Dike that we developed. At several thousand tenants, our prototype incurs limited performance overhead up to 16%, unlike an existing solution whose multitenancy overhead approaches 84% in some cases.","PeriodicalId":188071,"journal":{"name":"2014 30th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127301539","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tyche: An efficient Ethernet-based protocol for converged networked storage
Pub Date: 2014-06-02 | DOI: 10.1109/MSST.2014.6855540
Pilar González-Férez, A. Bilas
Current technology trends for the efficient use of infrastructures dictate that storage converges with computation by placing storage devices, such as NVM-based cards and drives, in the servers themselves. With converged storage, the interconnect among servers becomes more important for achieving high I/O throughput. Given that Ethernet is emerging as the dominant technology for datacenters, it becomes imperative to examine how to reduce protocol overheads for accessing remote storage over Ethernet interconnects. In this paper we propose Tyche, a network storage protocol that runs directly on top of Ethernet and does not require any hardware support from the network interface. Therefore, Tyche can be deployed in existing infrastructures and can co-exist with other Ethernet-based protocols. Tyche presents remote storage as a local block device and can support any existing filesystem. Our approach rests on two main axes: reducing host-level overheads and scaling with the number of cores and network interfaces in a server, both targeting high I/O throughput in future servers. We reduce overheads via a copy-reduction technique, storage-specific packet processing, memory pre-allocation, and RDMA-like operations that require no hardware support. We transparently handle multiple NICs and offer improved scaling with the number of links and cores via reduced synchronization, proper packet queue design, and NUMA affinity management. Our results show that Tyche achieves scalable I/O throughput, up to 6.4 GB/s for reads and 6.8 GB/s for writes with six 10 GigE NICs. Our analysis shows that although multiple aspects of the protocol play a role in performance, NUMA affinity is particularly important. Compared to NBD, Tyche performs better by up to an order of magnitude.
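As a sketch of what "a storage protocol directly on top of Ethernet" means in practice, the snippet below frames a block-I/O request straight into an Ethernet payload, with no TCP/IP in the path. The field layout is an assumption made for illustration; it is not Tyche's actual wire format.

```python
# Illustrative framing of a block-I/O request directly in an Ethernet
# frame. The header fields below are assumptions, not Tyche's format.
import struct

ETHERTYPE_STORAGE = 0x88B5  # IEEE "local experimental" EtherType
HDR = struct.Struct("!6s6sH B B H Q I")  # dst MAC, src MAC, EtherType,
                                         # opcode, flags, tag, LBA, length

def build_request(dst_mac: bytes, src_mac: bytes, opcode: int,
                  tag: int, lba: int, nblocks: int) -> bytes:
    # No TCP/IP headers: per-packet processing and protocol overhead
    # stay low, which is the point of running storage natively on Ethernet.
    return HDR.pack(dst_mac, src_mac, ETHERTYPE_STORAGE,
                    opcode, 0, tag, lba, nblocks)

frame = build_request(b"\x02\x00\x00\x00\x00\x01", b"\x02\x00\x00\x00\x00\x02",
                      opcode=1, tag=42, lba=4096, nblocks=8)
assert HDR.unpack(frame)[6] == 4096  # LBA round-trips through the header
```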
{"title":"Tyche: An efficient Ethernet-based protocol for converged networked storage","authors":"Pilar González-Férez, A. Bilas","doi":"10.1109/MSST.2014.6855540","DOIUrl":"https://doi.org/10.1109/MSST.2014.6855540","url":null,"abstract":"Current technology trends for efficient use of infrastructures dictate that storage converges with computation by placing storage devices, such as NVM-based cards and drives, in the servers themselves. With converged storage the role of the interconnect among servers becomes more important for achieving high I/O throughput. Given that Ethernet is emerging as the dominant technology for datacenters, it becomes imperative to examine how to reduce protocol overheads for accessing remote storage over Ethernet interconnects. In this paper we propose Tyche, a network storage protocol directly on top of Ethernet, which does not require any hardware support from the network interface. Therefore, Tyche can be deployed in existing infrastructures and to co-exist with other Ethernet-based protocols. Tyche presents remote storage as a local block device and can support any existing filesystem. At the heart of our approach, there are two main axis: reduction of host-level overheads and scaling with the number of cores and network interfaces in a server. Both target at achieving high I/O throughput in future servers. We reduce overheads via a copy-reduction technique, storage-specific packet processing, pre-allocation of memory, and using RDMA-like operations without requiring hardware support. We transparently handle multiple NICs and offer improved scaling with the number of links and cores via reduced synchronization, proper packet queue design, and NUMA affinity management. Our results show that Tyche achieves scalable I/O throughput, up to 6.4 GB/s for reads and 6.8 GB/s for writes with 6 × 10 GigE NICs. Our analysis shows that although multiple aspects of the protocol play a role for performance, NUMA affinity is particularly important. When comparing to NBD, Tyche performs better by up to one order of magnitude.","PeriodicalId":188071,"journal":{"name":"2014 30th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130966115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PLC-cache: Endurable SSD cache for deduplication-based primary storage
Pub Date: 2014-06-02 | DOI: 10.1109/MSST.2014.6855536
Jian Liu, Yunpeng Chai, X. Qin, Y. Xiao
Data deduplication techniques improve cost efficiency by dramatically reducing the space needs of storage systems. SSD-based data caches have been adopted to remedy the declining I/O performance induced by deduplication operations in latency-sensitive primary storage. Unfortunately, the frequent data updates caused by classical cache algorithms (e.g., FIFO, LRU, and LFU) inevitably slow down SSDs' I/O processing speed while significantly shortening their lifetime. To address this problem, we propose a new approach, PLC-Cache, that greatly improves both the I/O performance and the write durability of SSDs. PLC-Cache amplifies the proportion of Popular and Long-term Cached (PLC) data, i.e., data that is written infrequently and kept in the SSD cache over a long period to catalyze cache hits, within the entire set of data written to the SSD. PLC-Cache advocates a two-phase approach. First, non-popular data is ruled out from being written to SSDs. Second, PLC-Cache converts as much of the SSD-written data as possible into PLC data. Our experimental results based on a practical deduplication system indicate that, compared with existing caching schemes, PLC-Cache shortens data access latency by an average of 23.4%. Importantly, PLC-Cache improves the lifetime of SSD-based caches by reducing the amount of data written to SSDs by a factor of 15.7.
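A minimal sketch of the first phase, the admission filter, is shown below: a block is written to the SSD only after it has proven popular, so flash writes are spent on blocks likely to stay cached. The threshold and data structures are illustrative assumptions, not the paper's exact algorithm.

```python
# Admission-filtered SSD cache sketch: unpopular blocks never reach the
# SSD. Threshold and structures are assumptions, not PLC-Cache's exact design.
from collections import OrderedDict, defaultdict

class AdmissionFilteredCache:
    def __init__(self, capacity: int, admit_after: int = 2):
        self.capacity = capacity
        self.admit_after = admit_after
        self.ssd = OrderedDict()      # block -> data, kept in LRU order
        self.seen = defaultdict(int)  # access counts for non-cached blocks
        self.flash_writes = 0

    def access(self, block: int, data: bytes) -> bool:
        if block in self.ssd:
            self.ssd.move_to_end(block)   # refresh recency
            return True                   # cache hit
        self.seen[block] += 1
        if self.seen[block] >= self.admit_after:  # popular enough: admit
            if len(self.ssd) >= self.capacity:
                self.ssd.popitem(last=False)      # evict coldest block
            self.ssd[block] = data
            self.flash_writes += 1        # only admitted blocks cost a write
            del self.seen[block]
        return False                      # cache miss
```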
{"title":"PLC-cache: Endurable SSD cache for deduplication-based primary storage","authors":"Jian Liu, Yunpeng Chai, X. Qin, Y. Xiao","doi":"10.1109/MSST.2014.6855536","DOIUrl":"https://doi.org/10.1109/MSST.2014.6855536","url":null,"abstract":"Data deduplication techniques improve cost efficiency by dramatically reducing space needs of storage systems. SSD-based data cache has been adopted to remedy the declining I/O performance induced by deduplication operations in the latency-sensitive primary storage. Unfortunately, frequent data updates caused by classical cache algorithms (e.g., FIFO, LRU, and LFU) inevitably slow down SSDs' I/O processing speed while significantly shortening SSDs' lifetime. To address this problem, we propose a new approach-PLC-Cache-to greatly improve the I/O performance as well as write durability of SSDs. PLC-Cache is conducive to amplifying the proportion of the Popular and Long-term Cached (PLC) data, which is infrequently written and kept in SSD cache in a long time period to catalyze cache hits, in an entire SSD written data set. PLC-Cache advocates a two-phase approach. First, non-popular data are ruled out from being written into SSDs. Second, PLC-Cache makes an effort to convert SSD written data into PLC-data as much as possible. Our experimental results based on a practical deduplication system indicate that compared with the existing caching schemes, PLC-Cache shortens data access latency by an average of 23.4%. Importantly, PLC-Cache improves the lifetime of SSD-based caches by reducing the amount of data written to SSDs by a factor of 15.7.","PeriodicalId":188071,"journal":{"name":"2014 30th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131084956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Toward I/O-efficient protection against silent data corruptions in RAID arrays
Pub Date: 2014-06-02 | DOI: 10.1109/MSST.2014.6855548
Mingqiang Li, P. Lee
Although RAID is a well-known technique to protect data against disk errors, it is vulnerable to silent data corruptions that cannot be detected by disk drives. Existing integrity protection schemes designed for RAID arrays often introduce high I/O overhead. Our key insight is that by properly designing an integrity protection scheme that adapts to the read/write characteristics of storage workloads, the I/O overhead can be significantly mitigated. Accordingly, this paper presents a systematic study of I/O-efficient integrity protection against silent data corruptions in RAID arrays. We formalize an integrity checking model and show that a large proportion of disk reads can be checked with simpler and more I/O-efficient integrity checking mechanisms. Based on this model, we construct two integrity protection schemes that provide complementary performance advantages for storage workloads with different user write sizes. We further propose a quantitative method for choosing between the two schemes in real-world scenarios. Our trace-driven simulation results show that with the appropriate integrity protection scheme, we can reduce the I/O overhead to below 15%.
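The basic mechanism behind any such scheme is checksum verification on the read path: a mismatch exposes corruption the drive itself did not report. The sketch below illustrates that general mechanism only; it does not reproduce the paper's two schemes or its checking model.

```python
# Read-time integrity checking sketch: each block carries a checksum
# stored apart from the data, and a mismatch on read signals silent
# corruption. Illustrates the general mechanism, not the paper's schemes.
import zlib

class ChecksummedStore:
    def __init__(self):
        self.blocks = {}   # lba -> data
        self.sums = {}     # lba -> crc32, kept apart from the data path

    def write(self, lba: int, data: bytes) -> None:
        self.blocks[lba] = data
        self.sums[lba] = zlib.crc32(data)

    def read(self, lba: int) -> bytes:
        data = self.blocks[lba]
        if zlib.crc32(data) != self.sums[lba]:
            # Silent corruption detected: a real RAID layer would now
            # reconstruct the block from parity instead of returning it.
            raise IOError(f"silent corruption at LBA {lba}")
        return data

store = ChecksummedStore()
store.write(0, b"payload")
store.blocks[0] = b"pAyload"   # simulate a bit flip the disk missed
try:
    store.read(0)
except IOError as e:
    print(e)
```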
{"title":"Toward I/O-efficient protection against silent data corruptions in RAID arrays","authors":"Mingqiang Li, P. Lee","doi":"10.1109/MSST.2014.6855548","DOIUrl":"https://doi.org/10.1109/MSST.2014.6855548","url":null,"abstract":"Although RAID is a well-known technique to protect data against disk errors, it is vulnerable to silent data corruptions that cannot be detected by disk drives. Existing integrity protection schemes designed for RAID arrays often introduce high I/O overhead. Our key insight is that by properly designing an integrity protection scheme that adapts to the read/write characteristics of storage workloads, the I/O overhead can be significantly mitigated. In view of this, this paper presents a systematic study on I/O-efficient integrity protection against silent data corruptions in RAID arrays. We formalize an integrity checking model, and justify that a large proportion of disk reads can be checked with simpler and more I/O-efficient integrity checking mechanisms. Based on this integrity checking model, we construct two integrity protection schemes that provide complementary performance advantages for storage workloads with different user write sizes. We further propose a quantitative method for choosing between the two schemes in real-world scenarios. Our trace-driven simulation results show that with the appropriate integrity protection scheme, we can reduce the I/O overhead to below 15%.","PeriodicalId":188071,"journal":{"name":"2014 30th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127066673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Advanced magnetic tape technology for linear tape systems: Barium ferrite technology beyond the limitation of metal particulate media
Pub Date: 2014-06-02 | DOI: 10.1109/MSST.2014.6855556
O. Shimizu, T. Harasawa, H. Noguchi
We survey the history of using metal particulate media in linear tape systems to enhance cartridge capacity, discuss the limitations of metal particulate media, and introduce advanced magnetic tape technology based on barium ferrite particulate media, focusing on the choice of magnetic particles, surface profile design, and particle orientation control. The increase in cartridge capacity has been accelerated by combining barium ferrite particles with ultrathin-layer coating technology and by controlling the barium ferrite particle orientation and surface asperities, which reduce the surface frictional force without increasing the head-to-media spacing.
{"title":"Advanced magnetic tape technology for linear tape systems: Barium ferrite technology beyond the limitation of metal particulate media","authors":"O. Shimizu, T. Harasawa, H. Noguchi","doi":"10.1109/MSST.2014.6855556","DOIUrl":"https://doi.org/10.1109/MSST.2014.6855556","url":null,"abstract":"We surveyed the history of using metal particulate media in linear tape systems to enhance cartridge capacity, discussed the metal particulate media limitations, and introduced advanced barium-ferrite-particulate-media-based magnetic tape technology, focusing on the use of magnetic particles, surface profile design, and particle orientation control. The increase in cartridge capacity has been accelerated by combining barium ferrite particles with ultrathin layer coating technology and by controlling the barium ferrite particle orientation and surface asperities, which reduce the surface frictional force without increasing the head-to-media spacing.","PeriodicalId":188071,"journal":{"name":"2014 30th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115436274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NAND flash architectures reducing write amplification through multi-write codes
Pub Date: 2014-06-02 | DOI: 10.1109/MSST.2014.6855549
S. Odeh, Yuval Cassuto
Multi-write codes hold great promise to reduce write amplification in flash-based storage devices. In this work we propose two novel mapping architectures that show a clear advantage over known schemes that use multi-write codes, as well as over schemes that do not use such codes. We demonstrate the advantage of the proposed architectures by evaluating them with industry-accepted benchmark traces. The results show write amplification savings of double-digit percentages for over-provisioning as low as 10%. Beyond showing the superiority of the new architectures on real-world workloads, the paper studies write-amplification performance on synthetically generated workloads with time locality. Finally, we provide analytical insight to assist the deployment of the architectures in real storage devices with varying device parameters.
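For readers unfamiliar with multi-write codes, the worked example below shows the classic Rivest-Shamir write-once-memory (WOM) code: 2 bits are written twice into 3 one-way cells (bits may only go 0 to 1 between erases), so two logical updates cost one erase. This is the textbook construction, shown for intuition; the paper's mapping architectures build on such codes rather than on this exact code.

```python
# Rivest-Shamir 2-write WOM code: 2 data bits, 3 cells, 2 writes per erase.
FIRST  = {0b00: 0b000, 0b01: 0b100, 0b10: 0b010, 0b11: 0b001}
SECOND = {d: c ^ 0b111 for d, c in FIRST.items()}   # bitwise complements
DECODE = {**{c: d for d, c in FIRST.items()},
          **{c: d for d, c in SECOND.items()}}

def wom_write(cells: int, data: int) -> int:
    """Return a new cell state encoding `data`, only ever setting bits."""
    if DECODE.get(cells) == data:
        return cells                          # already encodes the data
    target = FIRST[data] if cells == 0 else SECOND[data]
    assert target & cells == cells            # 0 -> 1 transitions only
    return target

cells = 0b000
cells = wom_write(cells, 0b01)   # first write:  100
cells = wom_write(cells, 0b10)   # second write: 101, no erase needed
assert DECODE[cells] == 0b10
```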
{"title":"NAND flash architectures reducing write amplification through multi-write codes","authors":"S. Odeh, Yuval Cassuto","doi":"10.1109/MSST.2014.6855549","DOIUrl":"https://doi.org/10.1109/MSST.2014.6855549","url":null,"abstract":"Multi-write codes hold great promise to reduce write amplification in flash-based storage devices. In this work we propose two novel mapping architectures that show clear advantage over known schemes using multi-write codes, and over schemes not using such codes. We demonstrate the advantage of the proposed architectures by evaluating them with industry-accepted benchmark traces. The results show write amplification savings of double-digit percentages, for as low as 10% over-provisioning. In addition to showing the superiority of the new architectures on real-world workloads, the paper includes a study of the write-amplification performance on synthetically-generated workloads with time locality. In addition, some analytical insight is provided to assist the deployment of the architectures in real storage devices with varying device parameters.","PeriodicalId":188071,"journal":{"name":"2014 30th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115525001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DedupT: Deduplication for tape systems
Pub Date: 2014-06-02 | DOI: 10.1109/MSST.2014.6855555
Abdullah Gharaibeh, C. Constantinescu, Maohua Lu, R. Routray, Anurag Sharma, P. Sarkar, David A. Pease, M. Ripeanu
Deduplication is a commonly used technique in disk-based storage pools. However, deduplication has not been used for tape-based pools: tape characteristics such as high mount and seek times, combined with the data fragmentation that deduplication introduces, lead to unacceptably high retrieval times. This work proposes DedupT, a system that efficiently supports deduplication on tape pools. This paper (i) details the main challenges in enabling efficient deduplication on tape libraries; (ii) presents a class of solutions based on graph modeling of the similarity between data items, which enables efficient placement on tapes; and (iii) presents the design and evaluation of novel cross-tape and on-tape chunk placement algorithms that alleviate tape mount time overhead and reduce on-tape data fragmentation. Using 4.5 TB of real-world workloads, we show that DedupT retains at least 95% of the deduplication efficiency. We also show that DedupT mitigates major retrieval time overheads and, because it reads less data, offers better restore performance than restoring non-deduplicated data.
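The following sketch gives a feel for the graph-modeling idea: files are nodes, edge weights count the deduplicated chunks two files share, and files are greedily grouped onto tapes with their strongest neighbors so a restore touches few tapes. It is a simplified illustration under those assumptions, not DedupT's actual placement algorithms.

```python
# Similarity-graph placement sketch: co-locate files that share chunks.
from itertools import combinations

def build_similarity_graph(file_chunks: dict) -> dict:
    graph = {}
    for (f1, c1), (f2, c2) in combinations(file_chunks.items(), 2):
        shared = len(set(c1) & set(c2))   # edge weight = shared chunks
        if shared:
            graph[(f1, f2)] = shared
    return graph

def assign_tapes(file_chunks: dict, tape_capacity: int) -> list:
    graph = build_similarity_graph(file_chunks)
    tapes, placed = [], set()
    # Strongest edges first: the most similar files land together.
    for (f1, f2), _ in sorted(graph.items(), key=lambda kv: -kv[1]):
        for f in (f1, f2):
            if f not in placed:
                for tape in tapes:
                    if len(tape) < tape_capacity and tape & {f1, f2}:
                        tape.add(f); placed.add(f); break
                else:
                    tapes.append({f}); placed.add(f)
    for f in file_chunks:   # files sharing nothing get their own tape
        if f not in placed:
            tapes.append({f})
    return tapes

chunks = {"a": [1, 2, 3], "b": [2, 3, 4], "c": [9]}
print(assign_tapes(chunks, tape_capacity=2))   # e.g. [{'a', 'b'}, {'c'}]
```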
{"title":"DedupT: Deduplication for tape systems","authors":"Abdullah Gharaibeh, C. Constantinescu, Maohua Lu, R. Routray, Anurag Sharma, P. Sarkar, David A. Pease, M. Ripeanu","doi":"10.1109/MSST.2014.6855555","DOIUrl":"https://doi.org/10.1109/MSST.2014.6855555","url":null,"abstract":"Deduplication is a commonly-used technique on disk-based storage pools. However, deduplication has not been used for tape-based pools: tape characteristics, such as high mount and seek times combined with data fragmentation resulting from deduplication create a toxic combination that leads to unacceptably high retrieval times. This work proposes DedupT, a system that efficiently supports deduplication on tape pools. This paper (i) details the main challenges to enable efficient deduplication on tape libraries, (ii) presents a class of solutions based on graph-modeling of similarity between data items that enables efficient placement on tapes; and (iii) presents the design and evaluation of novel cross-tape and on-tape chunk placement algorithms that alleviate tape mount time overhead and reduce on-tape data fragmentation. Using 4.5 TB of real-world workloads, we show that DedupT retains at least 95% of the deduplication efficiency. We show that DedupT mitigates major retrieval time overheads, and, due to reading less data, is able to offer better restore performance compared to the case of restoring non-deduplicated data.","PeriodicalId":188071,"journal":{"name":"2014 30th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"17 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120845366","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SSD-optimized workload placement with adaptive learning and classification in HPC environments
Pub Date: 2014-06-02 | DOI: 10.1109/MSST.2014.6855552
Lipeng Wan, Zheng Lu, Qing Cao, Feiyi Wang, S. Oral, B. Settlemyer
In recent years, non-volatile memory devices such as SSDs have emerged as a viable storage solution due to their increasing capacity and decreasing cost. Given the unique capability and capacity requirements of large-scale HPC (High Performance Computing) storage environments, a hybrid configuration (SSD and HDD) may represent one of the most available and balanced solutions with respect to cost and performance. Under this setting, effective data placement, as well as data movement with controlled overhead, becomes a pressing challenge. In this paper, we propose an integrated object placement and movement framework with adaptive learning algorithms to address these issues. Specifically, we present a method that shuffles data objects across storage tiers to optimize data access performance. The method integrates an adaptive learning algorithm in which real-time classification is employed to predict the popularity of data object accesses, so that objects can be placed on, or migrated between, SSD and HDD drives in the most efficient manner. We discuss preliminary results obtained with a simulator we developed, showing that the proposed methods can dynamically adapt storage placement as workloads and access patterns evolve, achieving the best system-level performance, such as throughput.
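As a concrete stand-in for the adaptive learning component, the sketch below trains a tiny online logistic classifier on recency/frequency features and uses its prediction to pick a tier. The features, learning rule, and thresholds are illustrative assumptions, not the paper's classifier.

```python
# Toy popularity classifier driving SSD/HDD placement. Features and
# thresholds are assumptions made for illustration.
import math

class TierPlacer:
    def __init__(self, lr: float = 0.1):
        self.lr = lr
        self.w = [0.0, 0.0, 0.0]   # weights: bias, frequency, recency

    def _predict(self, freq: float, recency: float) -> float:
        z = self.w[0] + self.w[1] * freq + self.w[2] * recency
        return 1.0 / (1.0 + math.exp(-z))    # P(object is hot)

    def observe(self, freq: float, recency: float, was_hot: bool) -> None:
        # Online logistic-regression update from the realized outcome.
        err = (1.0 if was_hot else 0.0) - self._predict(freq, recency)
        for i, x in enumerate((1.0, freq, recency)):
            self.w[i] += self.lr * err * x

    def place(self, freq: float, recency: float) -> str:
        return "SSD" if self._predict(freq, recency) > 0.5 else "HDD"

placer = TierPlacer()
for _ in range(200):   # learn: frequent and recent objects are hot
    placer.observe(freq=1.0, recency=1.0, was_hot=True)
    placer.observe(freq=0.1, recency=0.0, was_hot=False)
print(placer.place(freq=0.9, recency=1.0))   # likely "SSD"
```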
{"title":"SSD-optimized workload placement with adaptive learning and classification in HPC environments","authors":"Lipeng Wan, Zheng Lu, Qing Cao, Feiyi Wang, S. Oral, B. Settlemyer","doi":"10.1109/MSST.2014.6855552","DOIUrl":"https://doi.org/10.1109/MSST.2014.6855552","url":null,"abstract":"In recent years, non-volatile memory devices such as SSD drives have emerged as a viable storage solution due to their increasing capacity and decreasing cost. Due to the unique capability and capacity requirements in large scale HPC (High Performance Computing) storage environment, a hybrid configuration (SSD and HDD) may represent one of the most available and balanced solutions considering the cost and performance. Under this setting, effective data placement as well as movement with controlled overhead become a pressing challenge. In this paper, we propose an integrated object placement and movement framework and adaptive learning algorithms to address these issues. Specifically, we present a method that shuffle data objects across storage tiers to optimize the data access performance. The method also integrates an adaptive learning algorithm where realtime classification is employed to predict the popularity of data object accesses, so that they can be placed on, or migrate between SSD or HDD drives in the most efficient manner. We discuss preliminary results based on this approach using a simulator we developed to show that the proposed methods can dynamically adapt storage placements and access pattern as workloads evolve to achieve the best system level performance such as throughput.","PeriodicalId":188071,"journal":{"name":"2014 30th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124030793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jericho: Achieving scalability through optimal data placement on multicore systems
Pub Date: 2014-06-02 | DOI: 10.1109/MSST.2014.6855538
Stelios Mavridis, Yannis Sfakianakis, Anastasios Papagiannis, M. Marazakis, A. Bilas
Achieving high I/O throughput on modern servers presents significant challenges. With increasing core counts, server memory architectures become less uniform, in terms of both latency and bandwidth. In particular, the bandwidth of the interconnect among NUMA nodes is limited compared to local memory bandwidth. Moreover, interconnect congestion and contention introduce additional latency on remote accesses. These challenges severely limit the maximum achievable storage throughput and IOPS rate. Therefore, data and thread placement are critical for data-intensive applications running on NUMA architectures. In this paper we present Jericho, a new I/O stack for the Linux kernel that improves affinity between application threads, kernel threads, and buffers in the storage I/O path. Jericho consists of a NUMA-aware filesystem and a DRAM cache organized in slices mapped to NUMA nodes. The Jericho filesystem implements our task placement policy by dynamically migrating application threads that issue I/Os based on the location of the corresponding I/O buffers. The Jericho DRAM I/O cache, a replacement for the Linux page cache, splits buffer memory into slices and uses per-slice kernel I/O threads for I/O request processing. Our evaluation shows that running the FIO microbenchmark on a modern 64-core server with an unmodified Linux kernel results in only 5% of memory accesses being served by local memory. With Jericho, more than 95% of accesses become local, with a corresponding 2x performance improvement.
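A rough user-space analogue of the placement policy is sketched below: move the thread issuing an I/O onto the CPUs of the NUMA node that owns the I/O buffer, so processing stays node-local. The node lookup is a stub and the node-to-CPU map is a hypothetical example; Jericho itself does this inside the kernel on page locations.

```python
# User-space analogue of migrating an I/O-issuing thread to its buffer's
# NUMA node. NODE_CPUS and numa_node_of are illustrative stand-ins.
import os

# Hypothetical node -> CPU map, as one might read from
# /sys/devices/system/node/node*/cpulist on Linux.
NODE_CPUS = {0: {0, 1, 2, 3}, 1: {4, 5, 6, 7}}

def numa_node_of(buffer_id: int) -> int:
    return buffer_id % len(NODE_CPUS)   # stand-in for a real page lookup

def migrate_issuer_to_buffer(buffer_id: int) -> None:
    node = numa_node_of(buffer_id)
    # Restrict to CPUs that actually exist on this machine (Linux-only call).
    cpus = NODE_CPUS[node] & os.sched_getaffinity(0)
    if cpus:
        # Pin the calling thread to the buffer's node so its memory
        # accesses avoid the inter-node interconnect.
        os.sched_setaffinity(0, cpus)

migrate_issuer_to_buffer(buffer_id=5)   # issuing thread now runs on node 1
print(os.sched_getaffinity(0))
```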
{"title":"Jericho: Achieving scalability through optimal data placement on multicore systems","authors":"Stelios Mavridis, Yannis Sfakianakis, Anastasios Papagiannis, M. Marazakis, A. Bilas","doi":"10.1109/MSST.2014.6855538","DOIUrl":"https://doi.org/10.1109/MSST.2014.6855538","url":null,"abstract":"Achieving high I/O throughput on modern servers presents significant challenges. With increasing core counts, server memory architectures become less uniform, both in terms of latency as well as bandwidth. In particular, the bandwidth of the interconnect among NUMA nodes is limited compared to local memory bandwidth. Moreover, interconnect congestion and contention introduce additional latency on remote accesses. These challenges severely limit the maximum achievable storage throughput and IOPS rate. Therefore, data and thread placement are critical for data-intensive applications running on NUMA architectures. In this paper we present Jericho, a new I/O stack for the Linux kernel that improves affinity between application threads, kernel threads, and buffers in the storage I/O path. Jericho consists of a NUMA-aware filesystem and a DRAM cache organized in slices mapped to NUMA nodes. The Jericho filesystem implements our task placement policy by dynamically migrating application threads that issue I/Os based on the location of the corresponding I/O buffers. The Jericho DRAM I/O cache, a replacement for the Linux page-cache, splits buffer memory in slices, and uses per-slice kernel I/O threads for I/O request processing. Our evaluation shows that running the FIO microbenchmark on a modern 64-core server with an unmodified Linux kernel results in only 5% of the memory accesses being served by local memory. With Jericho, more than 95% of accesses become local, with a corresponding 2x performance improvement.","PeriodicalId":188071,"journal":{"name":"2014 30th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121081522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Client-aware cloud storage
Pub Date: 2014-06-02 | DOI: 10.1109/MSST.2014.6855554
Feng Chen, M. Mesnier, Scott Hahn
Cloud storage is receiving significant interest from both academia and industry. As a new storage model, it provides many attractive features, such as high availability, resilience, and cost efficiency. Yet cloud storage also brings many new challenges. In particular, it widens the already-significant semantic gap between applications, which generate data, and storage systems, which manage data. This widening semantic gap makes end-to-end differentiated services extremely difficult. In this paper, we present a client-aware cloud storage framework that allows semantic information to flow from clients, across multiple intermediate layers, to the cloud storage system. In turn, the storage system can differentiate between data classes and enforce predefined policies. We showcase the effectiveness of such client awareness by using Intel's Differentiated Storage Services (DSS) to enhance persistent disk caching and to control I/O traffic to different storage devices. We find that we can significantly outperform LRU-style caching, improving upload bandwidth by 5x and download bandwidth by 1.6x. Further, we can achieve 85% of the performance of a full-SSD solution at only a fraction (14%) of the cost.
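The sketch below shows the client-aware flow in miniature: the client tags each I/O with a semantic class, the tag travels with the request, and the backend applies a per-class policy such as which tier caches the data. The class names and policy table are illustrative assumptions, not DSS's actual taxonomy or interface.

```python
# Class-tagged I/O with per-class backend policy. Classes and policies
# here are assumptions for illustration, not Intel DSS's actual taxonomy.
from dataclasses import dataclass

POLICY = {                     # class -> (cache tier, priority)
    "metadata":  ("ssd", 0),   # small and hot: always cache on flash
    "journal":   ("ssd", 1),
    "user_data": ("hdd", 2),
    "backup":    ("none", 3),  # never pollute the cache
}

@dataclass
class TaggedIO:
    lba: int
    data: bytes
    io_class: str              # semantic hint set by the client

def handle_write(io: TaggedIO) -> str:
    tier, _prio = POLICY.get(io.io_class, ("hdd", 2))
    # The backend differentiates classes instead of treating all blocks
    # uniformly, closing the client/storage semantic gap.
    return f"LBA {io.lba}: cached on {tier}"

print(handle_write(TaggedIO(lba=100, data=b"inode", io_class="metadata")))
```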
{"title":"Client-aware cloud storage","authors":"Feng Chen, M. Mesnier, Scott Hahn","doi":"10.1109/MSST.2014.6855554","DOIUrl":"https://doi.org/10.1109/MSST.2014.6855554","url":null,"abstract":"Cloud storage is receiving high interest in both academia and industry. As a new storage model, it provides many attractive features, such as high availability, resilience, and cost efficiency. Yet, cloud storage also brings many new challenges. In particular, it widens the already-significant semantic gap between applications, which generate data, and storage systems, which manage data. This widening semantic gap makes end-to-end differentiated services extremely difficult. In this paper, we present a client-aware cloud storage framework, which allows semantic information to flow from clients, across multiple intermediate layers, to the cloud storage system. In turn, the storage system can differentiate various data classes and enforce predefined policies. We showcase the effectiveness of enabling such client awareness by using Intel's Differentiated Storage Services (DSS) to enhance persistent disk caching and to control I/O traffic to different storage devices. We find that we can significantly outperform LRU-style caching, improving upload bandwidth by 5x and download bandwidth by 1.6x. Further, we can achieve 85% of the performance of a full-SSD solution at only a fraction (14%) of the cost.","PeriodicalId":188071,"journal":{"name":"2014 30th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121038819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}