Shortcut-JFS: A write efficient journaling file system for phase change memory
Pub Date: 2012-04-16 | DOI: 10.1109/MSST.2012.6232378
Eunji Lee, S. Yoo, Jee-Eun Jang, H. Bahn
Journaling file systems are widely used in modern computer systems because they provide high reliability with reasonable performance. However, existing journaling file systems are not efficient for emerging PCM (Phase Change Memory) storage. Specifically, the large amount of write traffic generated by journaling seriously degrades the performance of PCM storage, which has long write latency. In this paper, we present a new journaling file system for PCM, called Shortcut-JFS, that reduces the write volume of journaling by more than half by exploiting the byte-addressability of PCM. Specifically, Shortcut-JFS employs two novel schemes: 1) differential logging, which journals only the modified bytes, and 2) in-place checkpointing, which removes unnecessary block copy overhead. We implemented Shortcut-JFS on Linux 2.6 and measured its performance against the legacy journaling schemes used in ext3. The results show that Shortcut-JFS outperforms ext3 by 40% on average.
{"title":"Shortcut-JFS: A write efficient journaling file system for phase change memory","authors":"Eunji Lee, S. Yoo, Jee-Eun Jang, H. Bahn","doi":"10.1109/MSST.2012.6232378","DOIUrl":"https://doi.org/10.1109/MSST.2012.6232378","url":null,"abstract":"Journaling file systems are widely used in modern computer systems as it provides high reliability with reasonable performance. However, existing journaling file systems are not efficient for emerging PCM (Phase Change Memory) storage. Specifically, a large amount of write operations performed by journaling incur serious performance degradation of PCM storage as it has long write latency. In this paper, we present a new journaling file system for PCM, called Shortcut-JFS, that reduces write amount of journaling by more than a half exploiting the byte-accessibility of PCM. Specifically, Shortcut-JFS performs two novel schemes, 1) differential logging that performs journaling only for modified bytes and 2) in-place checkpointing that removes unnecessary block copy overhead. We implemented Shortcut-JFS on Linux 2.6, and measured the performance of Shortcut-JFS and legacy journaling schemes used in ext 3. The results show that the performance improvement of Shortcut-JFS against ext 3 is 40% on average.","PeriodicalId":348234,"journal":{"name":"012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117100981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
HRAID6ML: A hybrid RAID6 storage architecture with mirrored logging
Pub Date: 2012-04-16 | DOI: 10.1109/MSST.2012.6232374
Lingfang Zeng, D. Feng, Jianxi Chen, Q. Wei, B. Veeravalli, Wenguo Liu
RAID6 provides high reliability through double parity updates, at the cost of a high write penalty. In this paper, we propose HRAID6ML, a new logging architecture for RAID6 systems that enhances energy efficiency, performance, and reliability. HRAID6ML combines a group of Solid State Drives (SSDs) and Hard Disk Drives (HDDs): two HDDs (the parity disks) and several SSDs form the RAID6 array. The free space of the two parity disks serves as a mirrored log region for the whole system, absorbing writes. The mirrored logging policy helps the system recover from a parity disk failure, and mirrored logging introduces no noticeable performance overhead. HRAID6ML avoids additional hardware and energy costs, a potential single point of failure, and a performance bottleneck. Furthermore, HRAID6ML prolongs the lifetime of the SSDs and improves the system's energy efficiency by reducing the SSDs' write frequency. We have implemented the proposed HRAID6ML, and extensive trace-driven evaluations demonstrate its advantages over both traditional SSD-based and HDD-based RAID6 systems.
{"title":"HRAID6ML: A hybrid RAID6 storage architecture with mirrored logging","authors":"Lingfang Zeng, D. Feng, Jianxi Chen, Q. Wei, B. Veeravalli, Wenguo Liu","doi":"10.1109/MSST.2012.6232374","DOIUrl":"https://doi.org/10.1109/MSST.2012.6232374","url":null,"abstract":"The RAID6 provides high reliability using double-parity-update at cost of high write penalty. In this paper, we propose HRAID6ML, a new logging architecture for RAID6 systems for enhanced energy efficiency, performance and reliability. HRAID6ML explores a group of Solid State Drives (SSDs) and Hard Disk Drives (HDDs): Two HDDs (parity disks) and several SSDs form RAID6. The free space of the two parity disks is used as mirrored log region of the whole system to absorb writes. The mirrored logging policy helps to recover system from parity disk failure. Mirrored logging operation does not introduce noticeable performance overhead to the whole system. HRAID6ML eliminates the additional hardware and energy costs, potential single point of failure and performance bottleneck. Furthermore, HRAID6ML prolongs the lifecycle of the SSDs and improves the systems energy efficiency by reducing the SSDs write frequency. We have implemented proposed HRAID6ML. Extensive trace-driven evaluations demonstrate the advantages of the HRAID6ML system over both traditional SSD-based RAID6 system and HDD-based RAID6 system.","PeriodicalId":348234,"journal":{"name":"012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115236945","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SLO-aware hybrid store
Pub Date: 2012-04-16 | DOI: 10.1109/MSST.2012.6232385
Priya Sehgal, K. Voruganti, R. Sundaram
In the past, storage vendors used different types of storage media depending on the workload: for example, Solid State Drives (SSDs) or FC hard disks (HDDs) for online transaction processing, and SATA disks for archival workloads. Recently, however, many storage vendors have been designing hybrid SSD/HDD systems that can satisfy the service level objectives (SLOs) of multiple workloads placed together in one storage box, at better cost points. The combination is achieved by using SSDs as a read-write cache and HDDs as the permanent store. In this paper we present an SLO-based resource management algorithm that controls the amount of SSD given to a particular workload. The algorithm solves the following problems: 1) it ensures that workloads do not interfere with each other; 2) it ensures that we do not overprovision (cost-wise) the amount of SSD allocated to a workload to satisfy its SLO (latency requirement); and 3) it dynamically adjusts the SSD allocation in light of changing workload characteristics (i.e., it provides only the required amount of SSD). We have implemented our algorithm in a prototype hybrid store and tested its efficacy using many real workloads. Our algorithm almost always satisfies latency SLOs while utilizing close to the optimal amount of SSD, saving 6-50% of SSD space compared to a naïve algorithm.
{"title":"SLO-aware hybrid store","authors":"Priya Sehgal, K. Voruganti, R. Sundaram","doi":"10.1109/MSST.2012.6232385","DOIUrl":"https://doi.org/10.1109/MSST.2012.6232385","url":null,"abstract":"In the past storage vendors used different types of storage depending upon the type of workload. For example, they used Solid State Drives (SSDs) or FC hard disks (HDD) for online transaction, while SATA for archival type workloads. However, recently many storage vendors are designing hybrid SSD/HDD based systems that can satisfy multiple service level objectives (SLOs) of different workloads all placed together in one storage box, at better cost points. The combination is achieved by using SSDs as a read-write cache while HDD as a permanent store. In this paper we present an SLO based resource management algorithm that controls the amount of SSD given to a particular workload. This algorithm solves following problems: 1) it ensures that workloads do not interfere with each other 2) it ensure that we do not overprovision (cost wise) the amount of SSD allocated to a workload to satisfy its SLO (latency requirement) and 3) dynamically adjust SSD allocated in light of changing workload characteristics (i.e., provide only required amount of SSD). We have implemented our algorithm in a prototype Hybrid Store, and have tested its efficacy using many real workloads. Our algorithm satisfies latency SLOs almost always by utilizing close to optimal amount of SSD and saving 6-50% of SSD space compared to the naïve algorithm.","PeriodicalId":348234,"journal":{"name":"012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122312781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Design of an exact data deduplication cluster
Pub Date: 2012-04-16 | DOI: 10.1109/MSST.2012.6232380
J. Kaiser, Dirk Meister, A. Brinkmann, S. Effert
Data deduplication is an important component of enterprise storage environments. The throughput and capacity limitations of single-node solutions have led to the development of clustered deduplication systems. Most implemented clustered inline solutions trade deduplication ratio for performance and are willing to miss opportunities to detect redundant data that a single-node system would detect. We present an inline deduplication cluster with a joint distributed chunk index, which is able to detect as much redundancy as a single-node solution. The use of locality and load-balancing paradigms enables the nodes to minimize information exchange. We are thus able to show that, despite contrary claims in previous papers, it is possible to combine exact deduplication, small chunk sizes, and scalability within one environment using only a commodity GBit Ethernet interconnect. Additionally, we investigate the throughput and scalability limitations, with a special focus on intra-node communication.
{"title":"Design of an exact data deduplication cluster","authors":"J. Kaiser, Dirk Meister, A. Brinkmann, S. Effert","doi":"10.1109/MSST.2012.6232380","DOIUrl":"https://doi.org/10.1109/MSST.2012.6232380","url":null,"abstract":"Data deduplication is an important component of enterprise storage environments. The throughput and capacity limitations of single node solutions have led to the development of clustered deduplication systems. Most implemented clustered inline solutions are trading deduplication ratio versus performance and are willing to miss opportunities to detect redundant data, which a single node system would detect. We present an inline deduplication cluster with a joint distributed chunk index, which is able to detect as much redundancy as a single node solution. The use of locality and load balancing paradigms enables the nodes to minimize information exchange. Therefore, we are able to show that, despite different claims in previous papers, it is possible to combine exact deduplication, small chunk sizes, and scalability within one environment using only a commodity GBit Ethernet interconnect. Additionally, we investigate the throughput and scalability limitations with a special focus on the intra-node communication.","PeriodicalId":348234,"journal":{"name":"012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126277134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Estimation of deduplication ratios in large data sets
Pub Date: 2012-04-16 | DOI: 10.1109/MSST.2012.6232381
Danny Harnik, Oded Margalit, D. Naor, D. Sotnikov, G. Vernik
We study the problem of accurately estimating the data reduction ratio achieved by deduplication and compression on a specific data set. This turns out to be a challenging task: it has been shown both empirically and analytically that essentially all of the data at hand must be inspected to produce an accurate estimate when deduplication is involved. Moreover, even when permitted to inspect all the data, there are challenges in devising an efficient yet accurate method, where efficiency refers to the demanding CPU, memory, and disk usage associated with deduplication and compression. Our study focuses on what can be done when scanning the entire data set. We present a novel two-phase framework for such estimations. Our techniques are provably accurate, yet run with very low memory requirements and avoid the overheads of maintaining large deduplication tables. We give formal proofs of the correctness of our algorithm, compare it to existing techniques from the database and streaming literature, and evaluate it on a number of real-world workloads. For example, we estimate the data reduction ratio of a 7 TB data set with accuracy guarantees of at most a 1% relative error while using as little as 1 MB of RAM (and no additional disk access). In the interesting case of full-file deduplication, our framework readily accepts optimizations that allow estimation on a large data set without reading most of the actual data; for one of the workloads used in this work we achieved an accuracy guarantee of 2% relative error while reading only 27% of the data from disk. Our technique is practical, simple to implement, and useful for multiple scenarios, including estimating the number of disks to buy, choosing a deduplication technique, deciding whether or not to deduplicate, and conducting large-scale academic studies of deduplication ratios.
{"title":"Estimation of deduplication ratios in large data sets","authors":"Danny Harnik, Oded Margalit, D. Naor, D. Sotnikov, G. Vernik","doi":"10.1109/MSST.2012.6232381","DOIUrl":"https://doi.org/10.1109/MSST.2012.6232381","url":null,"abstract":"We study the problem of accurately estimating the data reduction ratio achieved by deduplication and compression on a specific data set. This turns out to be a challenging task - It has been shown both empirically and analytically that essentially all of the data at hand needs to be inspected in order to come up with a accurate estimation when deduplication is involved. Moreover, even when permitted to inspect all the data, there are challenges in devising an efficient, yet accurate, method. Efficiency in this case refers to the demanding CPU, memory and disk usage associated with deduplication and compression. Our study focuses on what can be done when scanning the entire data set. We present a novel two-phased framework for such estimations. Our techniques are provably accurate, yet run with very low memory requirements and avoid overheads associated with maintaining large deduplication tables. We give formal proofs of the correctness of our algorithm, compare it to existing techniques from the database and streaming literature and evaluate our technique on a number of real world workloads. For example, we estimate the data reduction ratio of a 7 TB data set with accuracy guarantees of at most a 1% relative error while using as little as 1 MB of RAM (and no additional disk access). In the interesting case of full-file deduplication, our framework readily accepts optimizations that allow estimation on a large data set without reading most of the actual data. For one of the workloads we used in this work we achieved accuracy guarantee of 2% relative error while reading only 27% of the data from disk. Our technique is practical, simple to implement, and useful for multiple scenarios, including estimating the number of disks to buy, choosing a deduplication technique, deciding whether to dedupe or not dedupe and conducting large-scale academic studies related to deduplication ratios.","PeriodicalId":348234,"journal":{"name":"012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"01 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128275127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Write amplification due to ECC on flash memory or leave those bit errors alone
Pub Date: 2012-04-16 | DOI: 10.1109/MSST.2012.6232375
Sangwhan Moon, A. Reddy
While flash memory is receiving significant attention because of its many attractive properties, concerns about write endurance delay its wider deployment. This paper analyzes the effectiveness of protection schemes designed for flash memory, such as ECC and scrubbing. The bit error rate of flash memory is a function of the number of program-erase cycles a cell has gone through, making reliability dependent on time and workload. Moreover, some protection schemes require additional write operations, which themselves degrade flash memory's reliability. These issues complicate the relationship between the protection schemes and flash memory's lifetime. In this paper, a Markov-model-based analysis of the protection schemes is presented. Our model considers the time-varying reliability of flash memory as well as the write amplification of various protection schemes such as ECC. Our study shows that write amplification from these various sources can significantly affect the benefits of these schemes in improving lifetime. Based on the results of our analysis, we propose that bit errors within a page be left uncorrected until a threshold number of errors has accumulated. We show that such an approach can improve lifetime by up to 40%.
{"title":"Write amplification due to ECC on flash memory or leave those bit errors alone","authors":"Sangwhan Moon, A. Reddy","doi":"10.1109/MSST.2012.6232375","DOIUrl":"https://doi.org/10.1109/MSST.2012.6232375","url":null,"abstract":"While flash memory is receiving significant attention because of many attractive properties, concerns about write endurance delay the wider deployment of the flash memory. This paper analyzes the effectiveness of protection schemes designed for flash memory, such as ECC and scrubbing. The bit error rate of flash memory is a function of the number of program-erase cycles the cell has gone through, making the reliability dependent on time and workload. Moreover, some of the protection schemes require additional write operations, which degrade flash memory's reliability. These issues make it more complex to reveal the relationship between the protection schemes and flash memory's lifetime. In this paper, a Markov model based analysis of the protection schemes is presented. Our model considers the time varying reliability of flash memory as well as write amplification of various protection schemes such as ECC. Our study shows that write amplification from these various sources can significantly affect the benefits of these schemes in improving the lifetime. Based on the results from our analysis, we propose that bit errors within a page be left uncorrected until a threshold of errors are accumulated. We show that such an approach can significantly improve lifetimes by up to 40%.","PeriodicalId":348234,"journal":{"name":"012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125066163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A QoS aware non-work-conserving disk scheduler
Pub Date: 2012-04-16 | DOI: 10.1109/MSST.2012.6232386
Pedro Eugenio Rocha, L. C. E. Bona
Disk schedulers should provide QoS guarantees to applications, sharing the storage resource proportionally and enforcing performance isolation. At the same time, disk schedulers must execute requests in an efficient order to prevent poor disk usage. Non-work-conserving disk schedulers help increase disk throughput by predicting the arrival of future requests and thereby exploiting disk spatial locality. Previous work is limited to either providing QoS guarantees or exploiting disk spatial locality. In this paper, we propose a new non-work-conserving disk scheduler called the High-throughput Token Bucket Scheduler (HTBS), which provides both QoS guarantees and high throughput by (a) assigning tags to requests in a fair-queuing-like fashion and (b) predicting the arrival of future requests. We show through experiments with our Linux kernel implementation that HTBS outperforms the throughput of previous QoS-aware work-conserving disk schedulers while also providing tight QoS guarantees, unlike other non-work-conserving algorithms.
{"title":"A QoS aware non-work-conserving disk scheduler","authors":"Pedro Eugenio Rocha, L. C. E. Bona","doi":"10.1109/MSST.2012.6232386","DOIUrl":"https://doi.org/10.1109/MSST.2012.6232386","url":null,"abstract":"Disk schedulers should provide QoS guarantees to applications, thus sharing proportionally the storage resource and enforcing performance isolation. Disk schedulers must execute requests in an efficient order though, preventing poor disk usage. Non-work-conserving disk schedulers help to increase disk throughput by predicting future requests' arrival and therefore exploiting disk spatial locality. Previous work are limited to either provide QoS guarantees or exploit disk spatial locality. In this paper, we propose a new non-work-conserving disk scheduler called High-throughput Token Bucket Scheduler (HTBS), which can provide both QoS guarantees and high throughput by (a) assigning tags to requests in a fair queuing-like fashion and (b) predicting future requests' arrival. We show through experiments with our Linux Kernel implementation that HTBS outperforms previous QoS aware work-conserving disk schedulers throughput as well as provides tight QoS guarantees, unlike other non-work-conserving algorithms.","PeriodicalId":348234,"journal":{"name":"012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127750618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An active storage framework for object storage devices
Pub Date: 2012-04-16 | DOI: 10.1109/MSST.2012.6232372
Michael T. Runde, W. G. Stevens, Paul A. Wortman, J. Chandy
In this paper, we present the design and implementation of an active storage framework for object storage devices. The framework is based on the use of virtual machines/execution engines to execute function code downloaded from client applications. We investigate the issues involved in supporting multiple execution engines. Allowing user-downloadable code fragments introduces potential safety and security considerations, and we study their effect on these engines. In particular, we examine various remote procedure execution mechanisms and their efficiency and safety. Finally, we present performance results of the active storage framework on a variety of applications.
{"title":"An active storage framework for object storage devices","authors":"Michael T. Runde, W. G. Stevens, Paul A. Wortman, J. Chandy","doi":"10.1109/MSST.2012.6232372","DOIUrl":"https://doi.org/10.1109/MSST.2012.6232372","url":null,"abstract":"In this paper, we present the design and implementation of an active storage framework for object storage devices. The framework is based on the use of virtual machines/execution engines to execute function code downloaded from client applications. We investigate the issues involved in supporting multiple execution engines. Allowing user-downloadable code fragments introduces potential safety and security considerations, and we study the effect of these considerations on these engines. In particular, we look at various remote procedure execution mechanisms and the efficiency and safety of these mechanisms. Finally, we present performance results of the active storage framework on a variety of applications.","PeriodicalId":348234,"journal":{"name":"012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126227082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Active Flash: Out-of-core data analytics on flash storage
Pub Date: 2012-04-16 | DOI: 10.1109/MSST.2012.6232366
Simona Boboila, Youngjae Kim, Sudharshan S. Vazhkudai, Peter Desnoyers, G. Shipman
Next-generation science will increasingly rely on the ability to perform efficient, on-the-fly analytics of data generated by high-performance computing (HPC) simulations modeling complex physical phenomena. Scientific computing workflows are stymied by the traditional chaining of simulation and data analysis, which creates multiple rounds of redundant reads and writes to the storage system; this cost grows with the ever-increasing gap between compute and storage speeds in HPC clusters. Recent HPC acquisitions have introduced compute-node-local flash storage as a means to alleviate this I/O bottleneck. We propose a novel approach, Active Flash, that expedites data analysis pipelines by migrating computation to the location of the data: the flash device itself. We argue that Active Flash has the potential to enable true out-of-core data analytics by freeing up both the compute core and the associated main memory. By performing analysis locally, dependence on limited bandwidth to a central storage system is reduced, while the analysis proceeds in parallel with the main application. In addition, offloading work from the host to the more power-efficient controller reduces peak system power usage, which is already in the megawatt range and poses a major barrier to HPC system scalability. We propose an architecture for Active Flash, explore the energy and performance trade-offs of moving computation from host to storage, demonstrate the ability of appropriate embedded controllers to perform data analysis and reduction tasks at speeds sufficient for this application, and present a simulation study of Active Flash scheduling policies. These results show the viability of the Active Flash model and its potential for a transformative impact on scientific data analysis.
{"title":"Active Flash: Out-of-core data analytics on flash storage","authors":"Simona Boboila, Youngjae Kim, Sudharshan S. Vazhkudai, Peter Desnoyers, G. Shipman","doi":"10.1109/MSST.2012.6232366","DOIUrl":"https://doi.org/10.1109/MSST.2012.6232366","url":null,"abstract":"Next generation science will increasingly come to rely on the ability to perform efficient, on-the-fly analytics of data generated by high-performance computing (HPC) simulations, modeling complex physical phenomena. Scientific computing workflows are stymied by the traditional chaining of simulation and data analysis, creating multiple rounds of redundant reads and writes to the storage system, which grows in cost with the ever-increasing gap between compute and storage speeds in HPC clusters. Recent HPC acquisitions have introduced compute node-local flash storage as a means to alleviate this I/O bottleneck. We propose a novel approach, Active Flash, to expedite data analysis pipelines by migrating to the location of the data, the flash device itself. We argue that Active Flash has the potential to enable true out-of-core data analytics by freeing up both the compute core and the associated main memory. By performing analysis locally, dependence on limited bandwidth to a central storage system is reduced, while allowing this analysis to proceed in parallel with the main application. In addition, offloading work from the host to the more power-efficient controller reduces peak system power usage, which is already in the megawatt range and poses a major barrier to HPC system scalability. We propose an architecture for Active Flash, explore energy and performance trade-offs in moving computation from host to storage, demonstrate the ability of appropriate embedded controllers to perform data analysis and reduction tasks at speeds sufficient for this application, and present a simulation study of Active Flash scheduling policies. These results show the viability of the Active Flash model, and its capability to potentially have a transformative impact on scientific data analysis.","PeriodicalId":348234,"journal":{"name":"012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126527699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Integrating flash-based SSDs into the storage stack
Pub Date: 2012-04-16 | DOI: 10.1109/MSST.2012.6232365
Raja Appuswamy, D. V. Moolenbroek, A. Tanenbaum
Over the past few years, hybrid storage architectures that use high-performance SSDs in concert with high-density HDDs have received significant interest from both industry and academia, due to their ability to improve performance while reducing capital and operating costs. These hybrid architectures differ in how they integrate SSDs into the traditional HDD-based storage stack. Of the several possible integrations, two have seen widespread adoption: Caching and Dynamic Storage Tiering (DST). Although the effectiveness of these architectures under certain workloads is well understood, a systematic side-by-side analysis of the two approaches remains difficult, given the range of design alternatives and configuration parameters involved. Such a study is needed now more than ever to design effective hybrid storage solutions for increasingly virtualized modern storage installations, which blend several workloads into a single stream. In this paper, we first present our extensions to the Loris storage stack that transform it into a framework for designing hybrid storage systems. We then illustrate the flexibility of the framework by designing several Caching- and DST-based hybrid systems. Following this, we present a systematic side-by-side analysis of these systems under a range of individual workload types and offer insights into the advantages and disadvantages of each architecture. Finally, we discuss the ramifications of our findings for the design of future hybrid storage systems, in light of recent changes in the hardware landscape and in application workloads.
{"title":"Integrating flash-based SSDs into the storage stack","authors":"Raja Appuswamy, D. V. Moolenbroek, A. Tanenbaum","doi":"10.1109/MSST.2012.6232365","DOIUrl":"https://doi.org/10.1109/MSST.2012.6232365","url":null,"abstract":"Over the past few years, hybrid storage architectures that use high-performance SSDs in concert with high-density HDDs have received significant interest from both industry and academia, due to their capability to improve performance while reducing capital and operating costs. These hybrid architectures differ in their approach to integrating SSDs into the traditional HDD-based storage stack. Of several such possible integrations, two have seen widespread adoption: Caching and Dynamic Storage Tiering. Although the effectiveness of these architectures under certain workloads is well understood, a systematic side-by-side analysis of these approaches remains difficult due to the range of design alternatives and configuration parameters involved. Such a study is required now more than ever to be able to design effective hybrid storage solutions for deployment in increasingly virtualized modern storage installations that blend several workloads into a single stream. In this paper, we first present our extensions to the Loris storage stack that transform it into a framework for designing hybrid storage systems. We then illustrate the flexibility of the framework by designing several Caching and DST-based hybrid systems. Following this, we present a systematic side-by-side analysis of these systems under a range of individual workload types and offer insights into the advantages and disadvantages of each architecture. Finally, we discuss the ramifications of our findings on the design of future hybrid storage systems in the light of recent changes in hardware landscape and application workloads.","PeriodicalId":348234,"journal":{"name":"012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124973034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}