Optimal Footprint Symbiosis in Shared Cache. Xiaolin Wang, Yechen Li, Yingwei Luo, Xiameng Hu, Jacob Brock, C. Ding, Zhenlin Wang. CCGrid 2015, pp. 412-422. doi:10.1109/CCGrid.2015.153
On multicore processors, co-running applications share the cache. This paper presents an online optimization that collocates applications to minimize cache interference and thereby maximize performance. The paper formulates the optimization problem and its solution, presents a new sampling technique for locality analysis, and evaluates both in an exhaustive test of 12,870 cases. For locality analysis, previous sampling was two orders of magnitude faster than full-trace analysis; the new sampling reduces the cost by another two orders of magnitude. The best prior work improves co-run performance by 56% on average; the new optimization improves it by another 29%. When sampling and optimization are combined, the paper shows that less than 0.1 seconds of analysis per program suffices to obtain a co-run within 1.5% of the best possible performance.
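The core of the optimization above is choosing which programs share a cache so that predicted interference is minimized. The sketch below illustrates only that formulation, not the paper's implementation: it brute-forces the minimum-cost pairing given a table of predicted pairwise co-run slowdowns, where `slowdown` (keyed by program pairs) is a hypothetical stand-in for a footprint-based miss-ratio predictor.

```python
# Illustrative sketch, not the paper's algorithm: exhaustive search for the
# pairing of programs to shared caches that minimizes total predicted
# slowdown. Assumes an even number of programs, two programs per cache.

def best_pairing(progs, slowdown):
    """progs: list of program names; slowdown: dict mapping
    frozenset({a, b}) -> predicted combined degradation when a and b co-run
    (a hypothetical predictor stands in for the paper's footprint model)."""
    if not progs:
        return 0.0, []
    head, rest = progs[0], progs[1:]
    best_cost, best_match = float("inf"), None
    for i, partner in enumerate(rest):
        sub_cost, sub_match = best_pairing(rest[:i] + rest[i + 1:], slowdown)
        cost = slowdown[frozenset((head, partner))] + sub_cost
        if cost < best_cost:
            best_cost, best_match = cost, [(head, partner)] + sub_match
    return best_cost, best_match
```

Exhaustive search over pairings is only affordable because each per-pair prediction is cheap, which is exactly what fast locality sampling provides.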
Cloud-Based OLAP over Big Data: Application Scenarios and Performance Analysis. A. Cuzzocrea, Rim Moussa, Guandong Xu, G. Grasso. CCGrid 2015, pp. 921-927. doi:10.1109/CCGrid.2015.174
Following our previous research results, in this paper we present two representative application scenarios that build on top of OLAP*, a middleware for parallel processing of OLAP queries that realizes effective and efficient OLAP over Big Data. We provide two case studies, namely parallel OLAP data cube processing and virtual OLAP data cube design, for which we also propose a comprehensive performance evaluation and analysis. The derived analysis clearly confirms the benefits of the proposed framework.
Highly Available Cloud-Based Cluster Management. Dmitry Duplyakin, Matthew Haney, H. Tufo. CCGrid 2015, pp. 1201-1204. doi:10.1109/CCGrid.2015.125
We present an architecture that increases the persistence and reliability of automated infrastructure management in hybrid cluster-cloud environments. We describe a highly available implementation that builds upon the Chef configuration management system and infrastructure-as-a-service cloud resources from Amazon Web Services. We summarize our experience managing a 20-node Linux cluster with this implementation. Our analysis of the utilization and cost of the necessary cloud resources indicates that the designed system is a low-cost alternative to acquiring additional physical hardware for hardening cluster management.
BigDataDIRAC: Deploying Distributed Big Data Applications. Víctor Fernández, V. Muñoz, T. F. Pena. CCGrid 2015, pp. 1177-1180. doi:10.1109/CCGrid.2015.109
The Distributed Infrastructure with Remote Agent Control (DIRAC) software framework allows a user community to manage computing activities in different grid and cloud environments. Many communities from several fields (LHCb, Belle II, Creatis, the DIRAC4EGI multi-community portal, etc.) use DIRAC to run jobs in distributed environments. Google created the MapReduce programming model, which offers an efficient way of performing distributed computation over large data sets, and several enterprises now provide Hadoop-based cloud resources to their users and are trying to simplify the use of Hadoop in the cloud. Based on these two robust technologies, we have created BigDataDIRAC, a solution that gives users access to multiple Big Data resources scattered across different geographical areas, much as they access grid resources. This approach opens the possibility of offering users not only grid and cloud resources but also Big Data resources from the same DIRAC environment. A proof of concept is demonstrated using four Hadoop clusters at three computing centers in two countries. Our results demonstrate the ability of BigDataDIRAC to manage jobs driven by the location of datasets stored in the Hadoop File System (HDFS) of the distributed Hadoop clusters. DIRAC is used to monitor execution, collect the necessary statistics, and upload results from the remote HDFS to the SandBox Storage machine. The tests produced the equivalent of 5 days of continuous processing.
A Data Placement Strategy for Data-Intensive Scientific Workflows in Cloud. Qing Zhao, Congcong Xiong, Xi Zhao, Ce Yu, Jian Xiao. CCGrid 2015, pp. 928-934. doi:10.1109/CCGrid.2015.72
With the arrival of cloud computing and Big Data, many scientific applications with large amounts of data can be abstracted as scientific workflows and run in a cloud environment. Distributing their datasets intelligently can significantly decrease data transfer during workflow execution. In this paper, we propose a two-stage data placement strategy. In the initial stage, we cluster the datasets based on their correlation and allocate these clusters onto data centers. Compared with existing work, we incorporate data size into the correlation calculation and propose a new type of data correlation for intermediate data, named the "first order conduction correlation"; hence the data transmission cost can be measured more accurately. In the runtime stage, a re-distribution algorithm adjusts the data layout according to changing factors, and the overhead of the re-layout itself is also accounted for. Simulation results show that, compared with previous work, our strategy effectively reduces the time spent on data movement during workflow execution.
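As a hedged sketch of the initial stage only: datasets frequently used by the same tasks are treated as correlated, with the correlation weighted by dataset size as the abstract describes, and correlated datasets are greedily co-located. The paper's first order conduction correlation for intermediate data is not reproduced; `correlation` below is an illustrative stand-in and `place` a hypothetical greedy allocator.

```python
# Illustrative sketch, not the paper's algorithm: size-weighted dataset
# correlation plus greedy co-location onto data centers.

def correlation(tasks_using, size, d1, d2):
    # Datasets used together by many tasks, and large datasets, are costly
    # to separate, so weight shared usage by combined size (a stand-in for
    # the paper's correlation measure).
    shared = len(tasks_using[d1] & tasks_using[d2])
    return shared * (size[d1] + size[d2])

def place(datasets, tasks_using, size, centers, capacity):
    """Greedy: co-locate the most correlated pairs first (assumes total
    capacity suffices for all datasets)."""
    location, used = {}, {c: 0 for c in centers}
    pairs = sorted(
        ((correlation(tasks_using, size, a, b), a, b)
         for i, a in enumerate(datasets)
         for b in datasets[i + 1:]),
        reverse=True)
    for _, a, b in pairs:
        for d, partner in ((a, b), (b, a)):
            if d in location:
                continue
            # Join the partner's center if it fits, else the least-loaded.
            target = location.get(partner)
            if target is None or used[target] + size[d] > capacity[target]:
                target = min(centers, key=lambda c: used[c])
            location[d] = target
            used[target] += size[d]
    return location
```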
Log-Structured Global Array for Efficient Multi-Version Snapshots. H. Fujita, N. Dun, Z. Rubenstein, A. Chien. CCGrid 2015, pp. 281-291. doi:10.1109/CCGrid.2015.80
In exascale systems, the increasing error rate -- particularly silent data corruption -- is a major concern. The Global View Resilience (GVR) system builds a new model of application resilience on versioned global arrays, which can be exploited for flexible, application-specific error checking and recovery. We explore a fundamental challenge to the GVR model: the cost of versioning. We propose a novel log-structured implementation that appends new data to an update log, simultaneously tracking modified regions and versioning incrementally. We compare the performance of log-structured arrays to traditional flat arrays using micro-benchmarks and three full applications, and show that versioning can be more than 10x faster while reducing memory cost significantly. Further, in future systems with NVRAM, a log-structured approach is more tolerant of limitations such as write bandwidth and wear-out.
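A minimal sketch of the log-structured versioning idea, assuming simplified single-node semantics rather than GVR's actual distributed API: writes append to a log instead of mutating a flat array, so taking a version costs O(1) (recording the current log length), and any earlier version can be read back by replaying a log prefix.

```python
# Minimal sketch (assumed semantics, not GVR's API) of a log-structured
# versioned array.
class LogStructuredArray:
    def __init__(self, base):
        self.base = list(base)   # version 0 contents
        self.log = []            # append-only (index, value) updates
        self.versions = [0]      # log length at each snapshot

    def write(self, index, value):
        # No in-place mutation; just append the update record.
        self.log.append((index, value))

    def snapshot(self):
        # Versioning is O(1): remember how much of the log belongs here.
        self.versions.append(len(self.log))
        return len(self.versions) - 1   # version id

    def read(self, version, index):
        # Replay the log prefix belonging to `version`; a real
        # implementation would index modified regions instead.
        value = self.base[index]
        for i, v in self.log[:self.versions[version]]:
            if i == index:
                value = v
        return value
```

Even in this toy, the appeal for NVRAM is visible: the log is written sequentially and never overwritten in place, which suits limited write bandwidth and wear-out budgets.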
A Parallel Algorithm for Clipping Polygons with Improved Bounds and a Distributed Overlay Processing System Using MPI. S. Puri, S. Prasad. CCGrid 2015, pp. 576-585. doi:10.1109/CCGrid.2015.43
Clipping arbitrary polygons is one of the complex operations in computer graphics and computational geometry, applied in many fields such as Geographic Information Systems (GIS) and VLSI CAD. We have two significant results to report. Our first result is the effective parallelization of the classic, highly sequential Greiner-Hormann algorithm, which yields the first output-sensitive CREW PRAM algorithm for a pair of simple polygons and can perform clipping in O(log n) time using O(n + k) processors, where n is the total number of vertices and k is the number of edge intersections. This improves upon our previous clipping algorithm, based on the parallelization of Vatti's sweepline algorithm, which requires O(n + k + k') processors to achieve logarithmic time complexity, where k' can be O(n^2). It also improves upon an O(log n)-time algorithm by Karinthi, Srinivas, and Almasi which, unlike our algorithm, does not handle self-intersecting polygons, is not output-sensitive, and must employ O(n^2) processors. We also study multi-core and many-core implementations of our parallel Greiner-Hormann algorithm. Our second result is a practical, parallel GIS system, MPI-GIS, for polygon overlay processing of two GIS layers containing large numbers of polygons over a cluster of compute nodes. It employs an R-tree for efficient indexing and identification of potentially intersecting sets of polygons across the two input layers. Spatial data files tend to be large (gigabytes in size), and the underlying overlay computation is highly irregular and compute intensive. The system achieves a 44x speedup on NERSC's 32-node CARVER cluster, processing about 600K polygons in two GIS layers within 19 seconds, a computation that takes over 13 minutes on the state-of-the-art ArcGIS system.
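To make the role of the R-tree concrete, here is a hedged sketch of the filter step in an overlay pipeline, under simplified assumptions (polygons as plain coordinate lists, a quadratic scan standing in for the R-tree index): only pairs whose minimum bounding rectangles overlap are handed to the exact, expensive clipping stage.

```python
# Sketch of the filter-and-refine pattern (assumed, simplified): find
# candidate polygon pairs across two layers whose minimum bounding
# rectangles (MBRs) overlap; only those pairs proceed to exact clipping.

def mbr(polygon):
    # polygon: list of (x, y) vertices
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)

def overlaps(a, b):
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1

def candidate_pairs(layer1, layer2):
    # Quadratic scan for clarity; an R-tree makes this step far cheaper
    # and is what MPI-GIS uses for indexing.
    boxes2 = [mbr(p) for p in layer2]
    for i, p in enumerate(layer1):
        b1 = mbr(p)
        for j, b2 in enumerate(boxes2):
            if overlaps(b1, b2):
                yield i, j
```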
A Priority-Based Scheduling Heuristic to Maximize Parallelism of Ready Tasks for DAG Applications. Wei Zheng, Lu Tang, R. Sakellariou. CCGrid 2015, pp. 596-605. doi:10.1109/CCGrid.2015.97
In practical cloud/grid computing systems, DAG scheduling may face challenges arising from severe uncertainty about the underlying platform. For instance, explicit information about task execution times and/or resource availability may be hard to obtain, and both may change dynamically in hard-to-predict ways. In such a setting, just-in-time scheduling schemes, which aim to maximize the parallelism of the DAG's ready tasks, are a promising approach to coping with the lack of environment information and achieving efficient DAG execution. Although many attempts have been made to develop such just-in-time scheduling heuristics, most are based on DAG decomposition, which results in complicated and suboptimal solutions for general DAGs. This paper presents a priority-based heuristic that is not only easy to apply to arbitrary DAGs but also exhibits comparable or better performance than existing solutions.
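The abstract does not spell out the priority function, so the following is only an illustrative list-scheduling skeleton under an assumed proxy: among ready tasks, dispatch the one with the largest out-degree, i.e. the one whose completion can release the most new ready tasks.

```python
# Illustrative skeleton, not the paper's heuristic: just-in-time dispatch
# of ready tasks, prioritized by an assumed proxy (out-degree).
from collections import defaultdict

def schedule(deps):
    """deps maps every task to its (possibly empty) set of predecessors.
    Yields one serial dispatch order; a real runtime would instead hand
    every ready task to idle workers as it appears."""
    succs = defaultdict(set)
    for task, preds in deps.items():
        for p in preds:
            succs[p].add(task)
    waiting = {t: set(ps) for t, ps in deps.items()}
    ready = {t for t, ps in waiting.items() if not ps}
    while ready:
        # Assumed priority: prefer the ready task whose completion can
        # unlock the most successors, maximizing future parallelism.
        task = max(ready, key=lambda t: len(succs[t]))
        ready.remove(task)
        yield task
        for s in succs[task]:
            waiting[s].discard(task)
            if not waiting[s]:
                ready.add(s)
```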
A Deep Learning Prediction Process Accelerator Based FPGA. Qi Yu, Chao Wang, Xiang Ma, Xi Li, Xuehai Zhou. CCGrid 2015, pp. 1159-1162. doi:10.1109/CCGrid.2015.114
Machine learning is now widely used in applications and cloud services, and deep learning, as an emerging field of machine learning, shows excellent ability in solving complex learning problems. To give users a better experience, high-performance implementations of deep learning applications are very important. As a common means of accelerating algorithms, FPGAs offer high performance, low power consumption, and small size, among other characteristics. We therefore use an FPGA to design a deep learning accelerator that focuses on the implementation of the prediction process, data access optimization, and pipeline structure. Compared with a 2.3 GHz Core 2 CPU, our accelerator achieves promising results.
A Resource Allocation Model for Hybrid Storage Systems. Hui Wang, P. Varman. CCGrid 2015, pp. 91-100. doi:10.1109/CCGrid.2015.132
Providing QoS guarantees for hybrid storage systems made up of both solid-state drives (SSDs) and hard disks (HDs) is a challenging problem. Since HDs and SSDs have widely different IOPS capacities, it is not sensible to treat the storage system as a monolithic black box; instead, a useful QoS model must differentiate the IOs made to different device types. Traditional storage resource allocation models have largely been designed to provide QoS for a single resource type, and they result in poor utilization and fairness when applied to multiple coupled resources. In this paper, we present a new resource allocation model for hybrid storage systems using a multi-resource framework. The model supports reservations and shares for clients sharing the storage system: reservations specify the minimum throughput (IOPS) that a client must receive, while shares reflect a client's weight relative to other clients bottlenecked on the same device. We present a formal multi-resource allocation model for assigning IOPS to clients, together with an IO scheduling algorithm that maximizes system throughput. The model and algorithms are validated with empirical results.
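As a hedged, single-device illustration of the reservation-plus-shares semantics (an assumed simplification, not the paper's multi-resource algorithm): every client first receives its reserved IOPS, and the remaining capacity is split in proportion to shares.

```python
# Hedged sketch of the reservation/share split on one device (assumed
# semantics, not the paper's algorithm).
def allocate(capacity, reservations, shares):
    alloc = dict(reservations)            # guaranteed minimums first
    surplus = capacity - sum(alloc.values())
    assert surplus >= 0, "reservations must be admission-controlled"
    total_shares = sum(shares.values())
    for client, s in shares.items():
        # Surplus divided in proportion to each client's weight.
        alloc[client] += surplus * s / total_shares
    return alloc

# e.g. allocate(1000, {"a": 200, "b": 100}, {"a": 1, "b": 3})
# -> a: 200 + 175 = 375, b: 100 + 525 = 625
```

In the paper's full setting the same split must be coordinated across the SSD and HD, since a client bottlenecked on one device should not waste capacity on the other; the sketch shows only the per-device arithmetic.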