Concentric Layout, a New Scientific Data Distribution Scheme in Hadoop File System
Lu Cheng, Pengju Shang, S. Sehrish, Grant Mackey, Jun Wang
The volume of data generated by scientific simulations, sensors, monitors, and optical telescopes is growing at a dramatic rate. To analyze such raw data quickly and space-efficiently, the data must be preprocessed so that the analysis phase performs well. Current research shows an increasing trend of adopting the MapReduce framework for large-scale data processing. However, the data access patterns commonly applied to scientific data sets are not directly supported by the current MapReduce framework. This gap between the requirements of analytics applications and the properties of MapReduce motivates us to support these access patterns within the framework. In this work, we studied the data access patterns of matrix files and proposed a new concentric data layout to facilitate matrix data access and analysis in the MapReduce framework. The concentric data layout is a hierarchical layout that preserves the dimensional properties of large data sets. In contrast to the contiguous layout adopted by the current Hadoop framework, the concentric layout stores the data of each sub-matrix in one chunk and then arranges the chunks symmetrically at the next level up, which matches matrix-style computation well. The layout preprocesses the data in advance and thereby optimizes subsequent runs of MapReduce applications. Experiments show that the concentric data layout improves overall performance, reducing execution time by about 38% when reading a 64 GB file. It also mitigates the overhead of reading unused data, increasing useful-data efficiency by 32% on average.
{"title":"Concentric Layout, a New Scientific Data Distribution Scheme in Hadoop File System","authors":"Lu Cheng, Pengju Shang, S. Sehrish, Grant Mackey, Jun Wang","doi":"10.1109/NAS.2010.59","DOIUrl":"https://doi.org/10.1109/NAS.2010.59","url":null,"abstract":"The data generated by scientific simulation, sensor, monitor or optical telescope has increased with dramatic speed. In order to analyze the raw data fast and space efficiently, data pre-process operation is needed to achieve better performance in data analysis phase. Current research shows an increasing tread of adopting MapReduce framework for large scale data processing. However, the data access patterns which generally applied to scientific data set are not supported by current MapReduce framework directly. The gap between the requirement from analytics application and the property of MapReduce framework motivates us to provide support for these data access patterns in MapReduce framework. In our work, we studied the data access patterns in matrix files and proposed a new concentric data layout solution to facilitate matrix data access and analysis in MapReduce framework. Concentric data layout is a hierarchical data layout which maintains the dimensional property in large data sets. Contrary to the continuous data layout adopted in current Hadoop framework, concentric data layout stores the data from the same sub-matrix into one chunk, and then stores chunks symmetrically in a higher level. This matches well with the matrix like computation. The concentric data layout preprocesses the data beforehand, and optimizes the afterward run of MapReduce application. The experiments show that the concentric data layout improves the overall performance, reduces the execution time by about 38% when reading a 64 GB file. It also mitigates the unused data read overhead and increases the useful data efficiency by 32% on average.","PeriodicalId":284549,"journal":{"name":"2010 IEEE Fifth International Conference on Networking, Architecture, and Storage","volume":"103 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115455510","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An ESB Based Micro-scale Urban Air Quality Monitoring System
Tu Quach Ngoc, Jonghyun Lee, Kyung Jun Gil, Karpjoo Jeong, S. Lim
In this paper, we present a novel approach to micro-scale air quality monitoring for urban areas. The approach is based on two major technologies: wireless sensor networks (WSN) and service-oriented architecture (SOA). We discuss technical issues such as architectural design, system integration, and user interfaces, and we present a prototype system developed for Konkuk University that uses an Enterprise Service Bus (ESB) called ServiceMix.
{"title":"An ESB Based Micro-scale Urban Air Quality Monitoring System","authors":"Tu Quach Ngoc, Jonghyun Lee, Kyung Jun Gil, Karpjoo Jeong, S. Lim","doi":"10.1109/NAS.2010.60","DOIUrl":"https://doi.org/10.1109/NAS.2010.60","url":null,"abstract":"In this paper, we present a novel approach to micro-scale air quality monitoring for urban areas. This approach is based on two major technologies: wireless sensor networks (WSN) and service-oriented architecture (SOA). We discuss technical issues such as architectural designs, system integration, and user interfaces. We present a prototype system developed for the Konkuk University which uses an Enterprise Service Bus (ESB) system called ServiceMix.","PeriodicalId":284549,"journal":{"name":"2010 IEEE Fifth International Conference on Networking, Architecture, and Storage","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125158926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Towards Fast De-duplication Using Low Energy Coprocessor
Liang Ma, Caijun Zhen, Bin Zhao, Jingwei Ma, G. Wang, X. Liu
Backup technology based on data de-duplication has become a hot research topic. To obtain better performance, traditional research focuses mainly on decreasing disk access time. In this paper, we consider the computational cost in a data de-duplication system and try to improve system performance by reducing computing time. We place the computing tasks on a commodity coprocessor to speed up the computing process. Compared with general-purpose processors, commodity coprocessors have lower energy consumption and lower cost. Experimental results show that they achieve equal or even better performance than general-purpose processors.
{"title":"Towards Fast De-duplication Using Low Energy Coprocessor","authors":"Liang Ma, Caijun Zhen, Bin Zhao, Jingwei Ma, G. Wang, X. Liu","doi":"10.1109/NAS.2010.29","DOIUrl":"https://doi.org/10.1109/NAS.2010.29","url":null,"abstract":"Backup technology based on data de-duplication has become a hot topic in nowadays. In order to get a better performance, traditional research is mainly focused on decreasing the disk access time. In this paper, we consider computing complexity problem in data de-duplication system, and try to improve system performance by reducing computing time. We put computing tasks on commodity coprocessor to speed up the computing process. Compared with general-purpose processors, commodity coprocessors have lower energy consumption and lower cost. Experimental results show that they have equal or even better performance compared with general-purpose processors.","PeriodicalId":284549,"journal":{"name":"2010 IEEE Fifth International Conference on Networking, Architecture, and Storage","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126096072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DAM: A DataOwnership-Aware Multi-layered De-duplication Scheme
Yujuan Tan, D. Feng, Zhichao Yan, Guohui Zhou
Beyond the storage savings brought by chunk-level de-duplication in backup and archiving systems, a prominent challenge facing this technology is how to efficiently and effectively identify duplicate chunks. Most of the chunk fingerprints used to identify individual chunks are stored on disk due to limited main-memory capacity, and checking for a fingerprint match on disk for every input chunk is known to be a severe performance bottleneck for the backup process. On the other hand, both our intuition and our analyses of real backup data indicate that duplicate chunks tend to concentrate strongly according to data ownership. Motivated by this observation, and to avoid or alleviate the aforementioned bottleneck, we propose DAM, a data-ownership-aware multi-layered de-duplication scheme that exploits the chunks' ownership and uses a tri-layered de-duplication approach to narrow the search space for duplicate chunks and reduce the total number of disk accesses. Our experimental results with real-world datasets show that DAM reduces disk accesses by an average of 60.8% and shortens de-duplication time by an average of 46.3%.
{"title":"DAM: A DataOwnership-Aware Multi-layered De-duplication Scheme","authors":"Yujuan Tan, D. Feng, Zhichao Yan, Guohui Zhou","doi":"10.1109/NAS.2010.57","DOIUrl":"https://doi.org/10.1109/NAS.2010.57","url":null,"abstract":"Beyond the storage savings brought by chunk-level de-duplication in backup and archiving systems, a prominent challenge facing this technology is how to efficiently and effectively identify the duplicate chunks. Most of the chunk fingerprints used to identify individual chunks are stored on disks due to the limited main memory capacity. Checking for chunk fingerprint match on disk for every input chunk is known to be a severe performance bottleneck for the backup process. On the other hand, our intuitions and analyses of real backup data both indicate that duplicate chunks tend to strongly concentrate according to the data ownership. Motivated by this observation and to avoid or alleviate the aforementioned backup performance bottleneck, we propose DAM, a dataownership-aware multi-layered de-duplication scheme that exploits the data chunks’ ownership and uses a tri-layered de-duplication approach to narrow the search space for duplicate chunks to reduce the total disk accesses. Our experimental results with real world datasets on DAM show it reduces the disk accesses by an average of 60.8% and shortens the de-duplication time by an average of 46.3%.","PeriodicalId":284549,"journal":{"name":"2010 IEEE Fifth International Conference on Networking, Architecture, and Storage","volume":"464 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123928352","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Modelling Speculative Prefetching for Hybrid Storage Systems
Mais Nijim
Parallel storage systems are highly scalable and widely used to support data-intensive applications. For future systems that must process and store massive data, hybrid storage systems are a natural solution for fulfilling a variety of demands such as large storage capacity, high I/O performance, and low cost. Hybrid storage systems (HSS) contain both high-end storage components (e.g., solid-state disks and hard disk drives) to guarantee performance, and low-end components (e.g., tapes) to reduce cost. In an HSS, transferring data back and forth among solid-state disks (SSDs), hard disk drives (HDDs), and tapes plays a critical role in achieving high I/O performance. Prefetching is a promising way to reduce the latency of these transfers. However, prefetching in the context of HSS is technically challenging due to an interesting dilemma: aggressive prefetching is required to effectively reduce I/O latency, whereas overaggressive prefetching wastes I/O bandwidth by transferring useless data from HDDs to SSDs or from tapes to HDDs. To address this problem, we propose a multi-layer prefetching algorithm that speculatively prefetches data from tapes to HDDs and from HDDs to SSDs. To evaluate the algorithm we develop an analytical model, and the experimental results reveal that our prefetching algorithm improves performance in hybrid storage systems.
{"title":"Modelling Speculative Prefetching for Hybrid Storage Systems","authors":"Mais Nijim","doi":"10.1109/NAS.2010.27","DOIUrl":"https://doi.org/10.1109/NAS.2010.27","url":null,"abstract":"Parallel storage systems have been highly scalable and widely used in support of data-intensive applications. In future systems with the nature of massive data processing and storing, hybrid storage systems opt for a solution to fulfill a variety of demands such as large storage capacity, high I/O performance and low cost. Hybrid storage systems (HSS) contain both high-end storage components (e.g. solid-state disks and hard disk drives) to guarantee performance, and low-end storage components (e.g. tapes) to reduce cost. In HSS, transferring data back and forth among solid-state disks (SSDs), hard disk drives (HDDs), and tapes plays a critical role in achieving high I/O performance. Prefetching is a promising solution to reduce the latency of data transferring in HSS. However, prefetching in the context of HSS is technically challenging due to an interesting dilemma: aggressive prefetching is required to efficiently reduce I/O latency, whereas overaggressive prefetching may waste I/O bandwidth by transferring useless data from HDDs to SSDs or from tapes to HDDs. To address this problem, we propose a multi-layer prefetching algorithm that can speculatively prefetch data from tapes to HDDs and from HDDs to SSDs. To evaluate our algorithm, we develop an analytical model and the experimental results reveal that our prefetching algorithm improves the performance in hybrid storage systems.","PeriodicalId":284549,"journal":{"name":"2010 IEEE Fifth International Conference on Networking, Architecture, and Storage","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114814472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Trust Aware Grid Access Control Architecture Based on ABAC
Tiezhu Zhao, Shoubin Dong
Grid systems face great security challenges, access control among them. The attribute-based access control (ABAC) model has many merits: it is flexible, fine-grained, and dynamically suited to grid environments. As an important factor in grid security, trust is increasingly applied to security management, especially access control. This paper puts forward a novel trust model for multi-domain grid environments and introduces a trust factor into the grid access control architecture to extend the classic ABAC model. By extending the XACML authorization architecture, an extended ABAC-based access control architecture for the grid is presented. In our experiments, the increase and decrease of trust are asymmetric and the trust model is sensitive to malicious attacks: it effectively controls the trust changes of different nodes and reduces the damage of vicious attacks.
{"title":"A Trust Aware Grid Access Control Architecture Based on ABAC","authors":"Tiezhu Zhao, Shoubin Dong","doi":"10.1109/NAS.2010.18","DOIUrl":"https://doi.org/10.1109/NAS.2010.18","url":null,"abstract":"Grid system has many great security challenges such as access control. The attribute-based access control model (ABAC) has much merits that are more flexible, fine-grained and dynamically suitable to grid environment. As an important factor in grid security, trust is increasingly applied to management of security, especially in access control. This paper puts forward a novel trust model in multi-domain grid environment and trust factor was originally introduced into access control architecture of grid to extend classic ABAC model. By extending the authorization architecture of XACML, extended ABAC based access control architecture for grid was submitted. In our experiment, the increase and decrease of trust are non-symmetrical and the trust model is sensitive to the malicious attacks. It can effectively control the trust change of different nodes and the trust model can reduce effectively the damage of vicious attack.","PeriodicalId":284549,"journal":{"name":"2010 IEEE Fifth International Conference on Networking, Architecture, and Storage","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130763091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A MAP Fitting Approach with Joint Approximation Oriented to the Dynamic Resource Provisioning in Shared Data Centres
Xiuwen Wang, Haiping Qu, Lu Xu, Xiaoming Han, Jiangang Zhang
In shared data centres, accurate workload models are indispensable for autonomic resource scheduling. Facing the problem of parameterizing the vast space of large MAPs (Markovian arrival processes) to fit real workload traces with time-varying characteristics, in this paper we propose JAMC, a MAP fitting approach with joint approximation of the order moments and the lag correlations. Building on the state-of-the-art fitting method KPC, JAMC uses a similar divide-and-conquer approach to simplify the fitting problem and uses optimization to explore the best solution. Our experiments show that JAMC is simple yet sufficient to effectively predict the behavior of queueing systems, and a fitting time of a few minutes is acceptable for a shared data centre. Through a sensitivity analysis of the fitted orders, we find that higher orders do not necessarily give better results: for the Bellcore Aug89 trace, the appropriate fitted orders for the moments and autocorrelations lie in the ranges 10~20 and 10000~30000, respectively.
{"title":"A MAP Fitting Approach with Joint Approximation Oriented to the Dynamic Resource Provisioning in Shared Data Centres","authors":"Xiuwen Wang, Haiping Qu, Lu Xu, Xiaoming Han, Jiangang Zhang","doi":"10.1109/NAS.2010.39","DOIUrl":"https://doi.org/10.1109/NAS.2010.39","url":null,"abstract":"In shared data centres, accurate models of workloads are indispensable in the process of autonomic resource scheduling. Facing the problem of parameterizing the vast space of big MAPs in order to fit the real workload traces with time-varying characteristics, in this paper we propose a MAP fitting approach JAMC with joint approximation of the order moment and the lag correlation. Based on the state-of-the-art fitting method KPC, JAMC uses a similar divide and conquer approach to simplify the fitting problem and uses optimization to explore the best solution. Our experiments show that JAMC is simple and sufficient enough to effectively predict the behavior of the queueing systems, and the fitting time cost of a few minutes is acceptable for shared data center. Through the analysis of the sensitivity to the orders fitted, we deduce that it is not the case that the higher orders have better results. In the case of Bellcore Aug89, the appropriate fitted orders for the moments and autocorrelations should be respectively on a set of 10~20 and 10000~30000.","PeriodicalId":284549,"journal":{"name":"2010 IEEE Fifth International Conference on Networking, Architecture, and Storage","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125788256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Low Cost and Inner-round Pipelined Design of ECB-AES-256 Crypto Engine for Solid State Disk
Fei Wu, Liang Wang, Ji-guang Wan
Solid-state disks (SSDs) are widely used in government and security departments owing to their faster data access, greater durability, higher shock and drop tolerance, silent operation, lower power consumption, and lighter weight compared with magnetic disks. This creates a demand for securing the stored data. The Advanced Encryption Standard (AES) is today's key standard for protecting data, but a high-speed AES encryption engine consumes a large amount of hardware resources. This paper presents a low-cost, inner-round pipelined ECB-AES-256 encryption engine. By sharing resources between the AES encryption and decryption modules and using lookup tables for the SubBytes and InvSubBytes operations, the logic resources are largely reduced; by using loop rolling and inner-round pipelining, high encryption and decryption throughput is achieved. A throughput of 1.986 Gbit/s at a clock frequency of 232.748 MHz is achieved using 614 slices of a Xilinx xc6slx45-3fgg484. Simulation results show that the AES crypto design meets the read and write speed of the SATA 1.0 interface.
{"title":"A Low Cost and Inner-round Pipelined Design of ECB-AES-256 Crypto Engine for Solid State Disk","authors":"Fei Wu, Liang Wang, Ji-guang Wan","doi":"10.1109/NAS.2010.40","DOIUrl":"https://doi.org/10.1109/NAS.2010.40","url":null,"abstract":"Solid-State Disks (SSD) are widely used in government and security departments owing to its faster speed of data access, more durability, more shock and drop, no noise, lower power consumption, lighter weight compared with Magnetic disk. As a result, the demand of security for storing data has been generated. The Advanced Encryption Standard (AES) is today's key data encryption standard for protecting data, but the implementation of high-speed AES encryption engine needs to consume a large number of hardware resources. This paper presents a low-cost and inner-round pipelined ECB-256-AES encryption engine. Through sharing the resources between the AES encryption module and the AES decryption module and using the look-up table for the SubBytes and InvSubBytes operations, the logic resources have been largely reduced; by using loop rolling and inner-round pipelined techniques, a high throughput of encryption and decryption operations is achieved. A 1.986Gbits/s throughput and 232.748MHz clock frequency are achieved using 614 slices of the Xilinx xc6slx45-3fgg484. The simulation results show that the AES crypto design is able to meet the read and write speed of SATA 1.0 interface.","PeriodicalId":284549,"journal":{"name":"2010 IEEE Fifth International Conference on Networking, Architecture, and Storage","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127538725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Simple Group Key Management Approach for Mobile Ad Hoc Networks
Bing Wu, Yuhong Dong
Securing communications among a group of nodes in mobile ad hoc networks (MANETs) is challenging due to the lack of trusted infrastructure. Group key management is one of the basic building blocks of secure group communication. A group key is a common secret used in cryptographic algorithms, and group key management involves creating and distributing this common secret to all group members. A change of membership requires the group key to be refreshed to ensure backward and forward secrecy. In this paper, we extend our previous work with new protocols. The basic idea is that each group member can deduce the group key locally without needing to order intermediate keys. A multicast tree is formed for efficient and reliable message dissemination.
{"title":"A Simple Group Key Management Approach for Mobile Ad Hoc Networks","authors":"Bing Wu, Yuhong Dong","doi":"10.1109/NAS.2010.20","DOIUrl":"https://doi.org/10.1109/NAS.2010.20","url":null,"abstract":"Securing communications among a group of nodes in mobile ad hoc networks (MANETs) is challenging due to the lack of trusted infrastructure. Group key management is one of the basic building blocks in securing group communications. A group key is a common secret used in cryptographic algorithms. Group key management involves creating and distributing the common secret for all group members. Change of membership requires the group key being refreshed to ensure backward and forward secrecy. In this paper, we extend our previous work with new protocols. Our basic idea is that each group member does not need to order intermediate keys and can deduce the group key locally. A multicast tree is formed for efficient and reliable message dissemination.*****","PeriodicalId":284549,"journal":{"name":"2010 IEEE Fifth International Conference on Networking, Architecture, and Storage","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122455612","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
RAF: A Random Access First Cache Management to Improve SSD-Based Disk Cache
Yang Liu, Jianzhong Huang, C. Xie, Q. Cao
Offering better random-access performance than conventional hard disks and larger capacity at lower cost than DRAM, NAND-flash-based SSDs are integrated into the server storage hierarchy as a second tier of disk cache between DRAM and disks, caching more data from disks to meet increasingly intensive I/O demands. Unfortunately, existing hybrid storage architectures cannot fully exploit SSDs' potential because they absorb too much of the disk tier's workload, which results in excessive wear and the performance degradation associated with internal garbage collection. In this paper, we propose RAF (Random Access First), a hybrid storage architecture that combines an SSD-based disk cache with a disk drive subsystem. RAF focuses on extending the SSD's lifetime while improving system performance by giving caching priority to randomly accessed data. In detail, RAF splits the flash cache into a read cache and a write cache to service read and write requests respectively. The read cache holds only random-access data evicted from the file cache, to reduce flash wear and write hits. The write cache operates as a circular write-through log, improving system response time and simplifying garbage collection. Like the read cache, the write cache holds only random-access data, flushing it to the hard disks immediately; sequential accesses are serviced by the hard disks directly, balancing the workload between the SSD and disk storage. RAF is implemented in Linux kernel 2.6.30.10. Experimental results show that RAF significantly reduces flash wear and improves performance compared with the state-of-the-art FlashCache architecture.
{"title":"RAF: A Random Access First Cache Management to Improve SSD-Based Disk Cache","authors":"Yang Liu, Jianzhong Huang, C. Xie, Q. Cao","doi":"10.1109/NAS.2010.9","DOIUrl":"https://doi.org/10.1109/NAS.2010.9","url":null,"abstract":"Offering better performance for random access compared to conventional hard disks and providing larger capacity and lower cost than DRAM, NAND flash based SSDsare integrated in server storage hierarchy as a second tier of disk cache between DRAM and disks for caching more data from disks to meet the increasingly intensive I/O demands. Unfortunately, available hybrid storage architectures cannot fully exploit SSDs’ potentials due to absorbing too much workload of disk tier, which results in excessive wear and performance degradation associated with internel garbage collection. In this paper, we propose RAF (Random Access First), an hybrid storage architecture that combines both of an SSD based disk cache and a disk drive subsystem. RAF focuses on extending the lifetime of SSD while improving system performance through providing priority to caching random-access data. In detail, RAF splits flash cache into read and write cache to service read/write requests respectively. Read cache only holds random-access data that are evicted from file cache to reduce flash wear and write hits. Write cache performs as a circular write-through log so as to improve system response time and simplify garbage collection. Similar to read cache, write cache only caches random-access data and flushes them to hard disks immediately. Note that, sequential access are serviced by hard disks directly to even the full workload between SSD and disk storage. RAF is implemented in Linux kernel 2.6.30.10. The results of experiments show that RAF can significantly reduce flash wear and improve performance compared with the state-of-art FlashCache architecture.","PeriodicalId":284549,"journal":{"name":"2010 IEEE Fifth International Conference on Networking, Architecture, and Storage","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121101679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}