Shortcut-JFS: A write efficient journaling file system for phase change memory
Pub Date: 2012-04-16 | DOI: 10.1109/MSST.2012.6232378
Eunji Lee, S. Yoo, Jee-Eun Jang, H. Bahn
Journaling file systems are widely used in modern computer systems because they provide high reliability with reasonable performance. However, existing journaling file systems are not efficient for emerging PCM (Phase Change Memory) storage. Specifically, the large amount of write traffic generated by journaling seriously degrades the performance of PCM storage, which has long write latency. In this paper, we present a new journaling file system for PCM, called Shortcut-JFS, that reduces the write volume of journaling by more than half by exploiting the byte-addressability of PCM. Specifically, Shortcut-JFS employs two novel schemes: 1) differential logging, which journals only the modified bytes, and 2) in-place checkpointing, which removes unnecessary block copy overhead. We implemented Shortcut-JFS on Linux 2.6 and measured its performance against the legacy journaling schemes used in ext3. The results show that Shortcut-JFS outperforms ext3 by 40% on average.
{"title":"Shortcut-JFS: A write efficient journaling file system for phase change memory","authors":"Eunji Lee, S. Yoo, Jee-Eun Jang, H. Bahn","doi":"10.1109/MSST.2012.6232378","DOIUrl":"https://doi.org/10.1109/MSST.2012.6232378","url":null,"abstract":"Journaling file systems are widely used in modern computer systems as it provides high reliability with reasonable performance. However, existing journaling file systems are not efficient for emerging PCM (Phase Change Memory) storage. Specifically, a large amount of write operations performed by journaling incur serious performance degradation of PCM storage as it has long write latency. In this paper, we present a new journaling file system for PCM, called Shortcut-JFS, that reduces write amount of journaling by more than a half exploiting the byte-accessibility of PCM. Specifically, Shortcut-JFS performs two novel schemes, 1) differential logging that performs journaling only for modified bytes and 2) in-place checkpointing that removes unnecessary block copy overhead. We implemented Shortcut-JFS on Linux 2.6, and measured the performance of Shortcut-JFS and legacy journaling schemes used in ext 3. The results show that the performance improvement of Shortcut-JFS against ext 3 is 40% on average.","PeriodicalId":348234,"journal":{"name":"012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117100981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
HRAID6ML: A hybrid RAID6 storage architecture with mirrored logging
Pub Date: 2012-04-16 | DOI: 10.1109/MSST.2012.6232374
Lingfang Zeng, D. Feng, Jianxi Chen, Q. Wei, B. Veeravalli, Wenguo Liu
RAID6 provides high reliability through double parity updates, at the cost of a high write penalty. In this paper, we propose HRAID6ML, a new logging architecture for RAID6 systems that enhances energy efficiency, performance, and reliability. HRAID6ML combines a group of Solid State Drives (SSDs) and Hard Disk Drives (HDDs): two HDDs (the parity disks) and several SSDs form the RAID6 array. The free space of the two parity disks serves as a mirrored log region for the whole system, absorbing writes. The mirrored logging policy helps the system recover from a parity disk failure, and mirrored logging introduces no noticeable performance overhead. HRAID6ML avoids additional hardware and energy costs, a potential single point of failure, and a performance bottleneck. Furthermore, HRAID6ML prolongs the lifetime of the SSDs and improves the system's energy efficiency by reducing the SSDs' write frequency. We have implemented the proposed HRAID6ML, and extensive trace-driven evaluations demonstrate its advantages over both traditional SSD-based and HDD-based RAID6 systems.
{"title":"HRAID6ML: A hybrid RAID6 storage architecture with mirrored logging","authors":"Lingfang Zeng, D. Feng, Jianxi Chen, Q. Wei, B. Veeravalli, Wenguo Liu","doi":"10.1109/MSST.2012.6232374","DOIUrl":"https://doi.org/10.1109/MSST.2012.6232374","url":null,"abstract":"The RAID6 provides high reliability using double-parity-update at cost of high write penalty. In this paper, we propose HRAID6ML, a new logging architecture for RAID6 systems for enhanced energy efficiency, performance and reliability. HRAID6ML explores a group of Solid State Drives (SSDs) and Hard Disk Drives (HDDs): Two HDDs (parity disks) and several SSDs form RAID6. The free space of the two parity disks is used as mirrored log region of the whole system to absorb writes. The mirrored logging policy helps to recover system from parity disk failure. Mirrored logging operation does not introduce noticeable performance overhead to the whole system. HRAID6ML eliminates the additional hardware and energy costs, potential single point of failure and performance bottleneck. Furthermore, HRAID6ML prolongs the lifecycle of the SSDs and improves the systems energy efficiency by reducing the SSDs write frequency. We have implemented proposed HRAID6ML. Extensive trace-driven evaluations demonstrate the advantages of the HRAID6ML system over both traditional SSD-based RAID6 system and HDD-based RAID6 system.","PeriodicalId":348234,"journal":{"name":"012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115236945","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SLO-aware hybrid store
Pub Date: 2012-04-16 | DOI: 10.1109/MSST.2012.6232385
Priya Sehgal, K. Voruganti, R. Sundaram
In the past, storage vendors used different types of storage media depending on the workload: for example, Solid State Drives (SSDs) or FC hard disks (HDDs) for online transaction processing, and SATA disks for archival workloads. Recently, however, many storage vendors have been designing hybrid SSD/HDD systems that can satisfy the service level objectives (SLOs) of multiple workloads placed together in one storage box, at better cost points. The combination is achieved by using SSDs as a read-write cache and HDDs as the permanent store. In this paper we present an SLO-based resource management algorithm that controls the amount of SSD given to a particular workload. The algorithm solves the following problems: 1) it ensures that workloads do not interfere with each other; 2) it ensures that we do not overprovision (cost-wise) the amount of SSD allocated to a workload to satisfy its SLO (latency requirement); and 3) it dynamically adjusts the SSD allocation in light of changing workload characteristics (i.e., it provides only the required amount of SSD). We have implemented our algorithm in a prototype hybrid store and tested its efficacy using many real workloads. Our algorithm almost always satisfies latency SLOs while utilizing close to the optimal amount of SSD, saving 6-50% of SSD space compared to a naïve algorithm.
{"title":"SLO-aware hybrid store","authors":"Priya Sehgal, K. Voruganti, R. Sundaram","doi":"10.1109/MSST.2012.6232385","DOIUrl":"https://doi.org/10.1109/MSST.2012.6232385","url":null,"abstract":"In the past storage vendors used different types of storage depending upon the type of workload. For example, they used Solid State Drives (SSDs) or FC hard disks (HDD) for online transaction, while SATA for archival type workloads. However, recently many storage vendors are designing hybrid SSD/HDD based systems that can satisfy multiple service level objectives (SLOs) of different workloads all placed together in one storage box, at better cost points. The combination is achieved by using SSDs as a read-write cache while HDD as a permanent store. In this paper we present an SLO based resource management algorithm that controls the amount of SSD given to a particular workload. This algorithm solves following problems: 1) it ensures that workloads do not interfere with each other 2) it ensure that we do not overprovision (cost wise) the amount of SSD allocated to a workload to satisfy its SLO (latency requirement) and 3) dynamically adjust SSD allocated in light of changing workload characteristics (i.e., provide only required amount of SSD). We have implemented our algorithm in a prototype Hybrid Store, and have tested its efficacy using many real workloads. Our algorithm satisfies latency SLOs almost always by utilizing close to optimal amount of SSD and saving 6-50% of SSD space compared to the naïve algorithm.","PeriodicalId":348234,"journal":{"name":"012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122312781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Design of an exact data deduplication cluster
Pub Date: 2012-04-16 | DOI: 10.1109/MSST.2012.6232380
J. Kaiser, Dirk Meister, A. Brinkmann, S. Effert
Data deduplication is an important component of enterprise storage environments. The throughput and capacity limitations of single-node solutions have led to the development of clustered deduplication systems. Most implemented clustered inline solutions trade deduplication ratio for performance and are willing to miss opportunities to detect redundant data that a single-node system would detect. We present an inline deduplication cluster with a joint distributed chunk index, which is able to detect as much redundancy as a single-node solution. The use of locality and load-balancing paradigms enables the nodes to minimize information exchange. We are thus able to show that, despite contrary claims in previous papers, it is possible to combine exact deduplication, small chunk sizes, and scalability within one environment using only a commodity GBit Ethernet interconnect. Additionally, we investigate the throughput and scalability limitations, with a special focus on intra-node communication.
{"title":"Design of an exact data deduplication cluster","authors":"J. Kaiser, Dirk Meister, A. Brinkmann, S. Effert","doi":"10.1109/MSST.2012.6232380","DOIUrl":"https://doi.org/10.1109/MSST.2012.6232380","url":null,"abstract":"Data deduplication is an important component of enterprise storage environments. The throughput and capacity limitations of single node solutions have led to the development of clustered deduplication systems. Most implemented clustered inline solutions are trading deduplication ratio versus performance and are willing to miss opportunities to detect redundant data, which a single node system would detect. We present an inline deduplication cluster with a joint distributed chunk index, which is able to detect as much redundancy as a single node solution. The use of locality and load balancing paradigms enables the nodes to minimize information exchange. Therefore, we are able to show that, despite different claims in previous papers, it is possible to combine exact deduplication, small chunk sizes, and scalability within one environment using only a commodity GBit Ethernet interconnect. Additionally, we investigate the throughput and scalability limitations with a special focus on the intra-node communication.","PeriodicalId":348234,"journal":{"name":"012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126277134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Estimation of deduplication ratios in large data sets
Pub Date: 2012-04-16 | DOI: 10.1109/MSST.2012.6232381
Danny Harnik, Oded Margalit, D. Naor, D. Sotnikov, G. Vernik
We study the problem of accurately estimating the data reduction ratio achieved by deduplication and compression on a specific data set. This turns out to be a challenging task: it has been shown both empirically and analytically that essentially all of the data at hand must be inspected to produce an accurate estimate when deduplication is involved. Moreover, even when permitted to inspect all the data, there are challenges in devising an efficient yet accurate method, where efficiency refers to the demanding CPU, memory, and disk usage associated with deduplication and compression. Our study focuses on what can be done when scanning the entire data set. We present a novel two-phase framework for such estimations. Our techniques are provably accurate, yet run with very low memory requirements and avoid the overheads of maintaining large deduplication tables. We give formal proofs of the correctness of our algorithm, compare it to existing techniques from the database and streaming literature, and evaluate it on a number of real-world workloads. For example, we estimate the data reduction ratio of a 7 TB data set with accuracy guarantees of at most a 1% relative error while using as little as 1 MB of RAM (and no additional disk access). In the interesting case of full-file deduplication, our framework readily accepts optimizations that allow estimation on a large data set without reading most of the actual data; for one of the workloads used in this work we achieved an accuracy guarantee of 2% relative error while reading only 27% of the data from disk. Our technique is practical, simple to implement, and useful for multiple scenarios, including estimating the number of disks to buy, choosing a deduplication technique, deciding whether or not to deduplicate, and conducting large-scale academic studies of deduplication ratios.
{"title":"Estimation of deduplication ratios in large data sets","authors":"Danny Harnik, Oded Margalit, D. Naor, D. Sotnikov, G. Vernik","doi":"10.1109/MSST.2012.6232381","DOIUrl":"https://doi.org/10.1109/MSST.2012.6232381","url":null,"abstract":"We study the problem of accurately estimating the data reduction ratio achieved by deduplication and compression on a specific data set. This turns out to be a challenging task - It has been shown both empirically and analytically that essentially all of the data at hand needs to be inspected in order to come up with a accurate estimation when deduplication is involved. Moreover, even when permitted to inspect all the data, there are challenges in devising an efficient, yet accurate, method. Efficiency in this case refers to the demanding CPU, memory and disk usage associated with deduplication and compression. Our study focuses on what can be done when scanning the entire data set. We present a novel two-phased framework for such estimations. Our techniques are provably accurate, yet run with very low memory requirements and avoid overheads associated with maintaining large deduplication tables. We give formal proofs of the correctness of our algorithm, compare it to existing techniques from the database and streaming literature and evaluate our technique on a number of real world workloads. For example, we estimate the data reduction ratio of a 7 TB data set with accuracy guarantees of at most a 1% relative error while using as little as 1 MB of RAM (and no additional disk access). In the interesting case of full-file deduplication, our framework readily accepts optimizations that allow estimation on a large data set without reading most of the actual data. For one of the workloads we used in this work we achieved accuracy guarantee of 2% relative error while reading only 27% of the data from disk. Our technique is practical, simple to implement, and useful for multiple scenarios, including estimating the number of disks to buy, choosing a deduplication technique, deciding whether to dedupe or not dedupe and conducting large-scale academic studies related to deduplication ratios.","PeriodicalId":348234,"journal":{"name":"012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"01 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128275127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Write amplification due to ECC on flash memory or leave those bit errors alone
Pub Date: 2012-04-16 | DOI: 10.1109/MSST.2012.6232375
Sangwhan Moon, A. Reddy
While flash memory is receiving significant attention because of its many attractive properties, concerns about write endurance delay its wider deployment. This paper analyzes the effectiveness of protection schemes designed for flash memory, such as ECC and scrubbing. The bit error rate of flash memory is a function of the number of program-erase cycles a cell has gone through, making reliability dependent on time and workload. Moreover, some protection schemes require additional write operations, which themselves degrade flash memory's reliability. These issues complicate the relationship between the protection schemes and flash memory's lifetime. In this paper, a Markov-model-based analysis of the protection schemes is presented. Our model considers the time-varying reliability of flash memory as well as the write amplification of various protection schemes such as ECC. Our study shows that write amplification from these various sources can significantly affect the benefits of these schemes in improving lifetime. Based on the results of our analysis, we propose that bit errors within a page be left uncorrected until a threshold number of errors has accumulated. We show that such an approach can improve lifetime by up to 40%.
{"title":"Write amplification due to ECC on flash memory or leave those bit errors alone","authors":"Sangwhan Moon, A. Reddy","doi":"10.1109/MSST.2012.6232375","DOIUrl":"https://doi.org/10.1109/MSST.2012.6232375","url":null,"abstract":"While flash memory is receiving significant attention because of many attractive properties, concerns about write endurance delay the wider deployment of the flash memory. This paper analyzes the effectiveness of protection schemes designed for flash memory, such as ECC and scrubbing. The bit error rate of flash memory is a function of the number of program-erase cycles the cell has gone through, making the reliability dependent on time and workload. Moreover, some of the protection schemes require additional write operations, which degrade flash memory's reliability. These issues make it more complex to reveal the relationship between the protection schemes and flash memory's lifetime. In this paper, a Markov model based analysis of the protection schemes is presented. Our model considers the time varying reliability of flash memory as well as write amplification of various protection schemes such as ECC. Our study shows that write amplification from these various sources can significantly affect the benefits of these schemes in improving the lifetime. Based on the results from our analysis, we propose that bit errors within a page be left uncorrected until a threshold of errors are accumulated. We show that such an approach can significantly improve lifetimes by up to 40%.","PeriodicalId":348234,"journal":{"name":"012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125066163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A QoS aware non-work-conserving disk scheduler
Pub Date: 2012-04-16 | DOI: 10.1109/MSST.2012.6232386
Pedro Eugenio Rocha, L. C. E. Bona
Disk schedulers should provide QoS guarantees to applications, sharing the storage resource proportionally and enforcing performance isolation. At the same time, disk schedulers must execute requests in an efficient order to prevent poor disk usage. Non-work-conserving disk schedulers help increase disk throughput by predicting the arrival of future requests and thereby exploiting disk spatial locality. Previous work is limited to either providing QoS guarantees or exploiting disk spatial locality. In this paper, we propose a new non-work-conserving disk scheduler called the High-throughput Token Bucket Scheduler (HTBS), which provides both QoS guarantees and high throughput by (a) assigning tags to requests in a fair-queuing-like fashion and (b) predicting the arrival of future requests. We show through experiments with our Linux kernel implementation that HTBS outperforms the throughput of previous QoS-aware work-conserving disk schedulers while also providing tight QoS guarantees, unlike other non-work-conserving algorithms.
{"title":"A QoS aware non-work-conserving disk scheduler","authors":"Pedro Eugenio Rocha, L. C. E. Bona","doi":"10.1109/MSST.2012.6232386","DOIUrl":"https://doi.org/10.1109/MSST.2012.6232386","url":null,"abstract":"Disk schedulers should provide QoS guarantees to applications, thus sharing proportionally the storage resource and enforcing performance isolation. Disk schedulers must execute requests in an efficient order though, preventing poor disk usage. Non-work-conserving disk schedulers help to increase disk throughput by predicting future requests' arrival and therefore exploiting disk spatial locality. Previous work are limited to either provide QoS guarantees or exploit disk spatial locality. In this paper, we propose a new non-work-conserving disk scheduler called High-throughput Token Bucket Scheduler (HTBS), which can provide both QoS guarantees and high throughput by (a) assigning tags to requests in a fair queuing-like fashion and (b) predicting future requests' arrival. We show through experiments with our Linux Kernel implementation that HTBS outperforms previous QoS aware work-conserving disk schedulers throughput as well as provides tight QoS guarantees, unlike other non-work-conserving algorithms.","PeriodicalId":348234,"journal":{"name":"012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127750618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An active storage framework for object storage devices
Pub Date: 2012-04-16 | DOI: 10.1109/MSST.2012.6232372
Michael T. Runde, W. G. Stevens, Paul A. Wortman, J. Chandy
In this paper, we present the design and implementation of an active storage framework for object storage devices. The framework is based on the use of virtual machines/execution engines to execute function code downloaded from client applications. We investigate the issues involved in supporting multiple execution engines. Allowing user-downloadable code fragments introduces potential safety and security considerations, and we study their effect on these engines. In particular, we examine various remote procedure execution mechanisms and their efficiency and safety. Finally, we present performance results of the active storage framework on a variety of applications.
{"title":"An active storage framework for object storage devices","authors":"Michael T. Runde, W. G. Stevens, Paul A. Wortman, J. Chandy","doi":"10.1109/MSST.2012.6232372","DOIUrl":"https://doi.org/10.1109/MSST.2012.6232372","url":null,"abstract":"In this paper, we present the design and implementation of an active storage framework for object storage devices. The framework is based on the use of virtual machines/execution engines to execute function code downloaded from client applications. We investigate the issues involved in supporting multiple execution engines. Allowing user-downloadable code fragments introduces potential safety and security considerations, and we study the effect of these considerations on these engines. In particular, we look at various remote procedure execution mechanisms and the efficiency and safety of these mechanisms. Finally, we present performance results of the active storage framework on a variety of applications.","PeriodicalId":348234,"journal":{"name":"012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126227082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Active Flash: Out-of-core data analytics on flash storage
Pub Date: 2012-04-16 | DOI: 10.1109/MSST.2012.6232366
Simona Boboila, Youngjae Kim, Sudharshan S. Vazhkudai, Peter Desnoyers, G. Shipman
Next-generation science will increasingly rely on the ability to perform efficient, on-the-fly analytics of data generated by high-performance computing (HPC) simulations modeling complex physical phenomena. Scientific computing workflows are stymied by the traditional chaining of simulation and data analysis, which creates multiple rounds of redundant reads and writes to the storage system; this cost grows with the ever-increasing gap between compute and storage speeds in HPC clusters. Recent HPC acquisitions have introduced compute-node-local flash storage as a means to alleviate this I/O bottleneck. We propose a novel approach, Active Flash, that expedites data analysis pipelines by migrating computation to the location of the data: the flash device itself. We argue that Active Flash has the potential to enable true out-of-core data analytics by freeing up both the compute core and the associated main memory. By performing analysis locally, dependence on limited bandwidth to a central storage system is reduced, while the analysis proceeds in parallel with the main application. In addition, offloading work from the host to the more power-efficient controller reduces peak system power usage, which is already in the megawatt range and poses a major barrier to HPC system scalability. We propose an architecture for Active Flash, explore the energy and performance trade-offs of moving computation from host to storage, demonstrate the ability of appropriate embedded controllers to perform data analysis and reduction tasks at speeds sufficient for this application, and present a simulation study of Active Flash scheduling policies. These results show the viability of the Active Flash model and its potential for a transformative impact on scientific data analysis.
{"title":"Active Flash: Out-of-core data analytics on flash storage","authors":"Simona Boboila, Youngjae Kim, Sudharshan S. Vazhkudai, Peter Desnoyers, G. Shipman","doi":"10.1109/MSST.2012.6232366","DOIUrl":"https://doi.org/10.1109/MSST.2012.6232366","url":null,"abstract":"Next generation science will increasingly come to rely on the ability to perform efficient, on-the-fly analytics of data generated by high-performance computing (HPC) simulations, modeling complex physical phenomena. Scientific computing workflows are stymied by the traditional chaining of simulation and data analysis, creating multiple rounds of redundant reads and writes to the storage system, which grows in cost with the ever-increasing gap between compute and storage speeds in HPC clusters. Recent HPC acquisitions have introduced compute node-local flash storage as a means to alleviate this I/O bottleneck. We propose a novel approach, Active Flash, to expedite data analysis pipelines by migrating to the location of the data, the flash device itself. We argue that Active Flash has the potential to enable true out-of-core data analytics by freeing up both the compute core and the associated main memory. By performing analysis locally, dependence on limited bandwidth to a central storage system is reduced, while allowing this analysis to proceed in parallel with the main application. In addition, offloading work from the host to the more power-efficient controller reduces peak system power usage, which is already in the megawatt range and poses a major barrier to HPC system scalability. We propose an architecture for Active Flash, explore energy and performance trade-offs in moving computation from host to storage, demonstrate the ability of appropriate embedded controllers to perform data analysis and reduction tasks at speeds sufficient for this application, and present a simulation study of Active Flash scheduling policies. These results show the viability of the Active Flash model, and its capability to potentially have a transformative impact on scientific data analysis.","PeriodicalId":348234,"journal":{"name":"012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126527699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Integrating flash-based SSDs into the storage stack
Pub Date: 2012-04-16 | DOI: 10.1109/MSST.2012.6232365
Raja Appuswamy, D. V. Moolenbroek, A. Tanenbaum
Over the past few years, hybrid storage architectures that use high-performance SSDs in concert with high-density HDDs have received significant interest from both industry and academia, due to their ability to improve performance while reducing capital and operating costs. These hybrid architectures differ in how they integrate SSDs into the traditional HDD-based storage stack. Of the several possible integrations, two have seen widespread adoption: Caching and Dynamic Storage Tiering (DST). Although the effectiveness of these architectures under certain workloads is well understood, a systematic side-by-side analysis of the two approaches remains difficult, given the range of design alternatives and configuration parameters involved. Such a study is needed now more than ever to design effective hybrid storage solutions for increasingly virtualized modern storage installations, which blend several workloads into a single stream. In this paper, we first present our extensions to the Loris storage stack that transform it into a framework for designing hybrid storage systems. We then illustrate the flexibility of the framework by designing several Caching- and DST-based hybrid systems. Following this, we present a systematic side-by-side analysis of these systems under a range of individual workload types and offer insights into the advantages and disadvantages of each architecture. Finally, we discuss the ramifications of our findings for the design of future hybrid storage systems, in light of recent changes in the hardware landscape and in application workloads.
{"title":"Integrating flash-based SSDs into the storage stack","authors":"Raja Appuswamy, D. V. Moolenbroek, A. Tanenbaum","doi":"10.1109/MSST.2012.6232365","DOIUrl":"https://doi.org/10.1109/MSST.2012.6232365","url":null,"abstract":"Over the past few years, hybrid storage architectures that use high-performance SSDs in concert with high-density HDDs have received significant interest from both industry and academia, due to their capability to improve performance while reducing capital and operating costs. These hybrid architectures differ in their approach to integrating SSDs into the traditional HDD-based storage stack. Of several such possible integrations, two have seen widespread adoption: Caching and Dynamic Storage Tiering. Although the effectiveness of these architectures under certain workloads is well understood, a systematic side-by-side analysis of these approaches remains difficult due to the range of design alternatives and configuration parameters involved. Such a study is required now more than ever to be able to design effective hybrid storage solutions for deployment in increasingly virtualized modern storage installations that blend several workloads into a single stream. In this paper, we first present our extensions to the Loris storage stack that transform it into a framework for designing hybrid storage systems. We then illustrate the flexibility of the framework by designing several Caching and DST-based hybrid systems. Following this, we present a systematic side-by-side analysis of these systems under a range of individual workload types and offer insights into the advantages and disadvantages of each architecture. Finally, we discuss the ramifications of our findings on the design of future hybrid storage systems in the light of recent changes in hardware landscape and application workloads.","PeriodicalId":348234,"journal":{"name":"012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124973034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}