Modeling Operational Fairness of Hybrid Cloud Brokerage
Pub Date: 2018-05-01 | DOI: 10.1109/CCGRID.2018.00083
Sreekrishnan Venkateswaran, S. Sarkar
Cloud service brokerage is an emerging technology that attempts to simplify the consumption and operation of hybrid clouds. Today's cloud brokers attempt to insulate consumers from the vagaries of multiple clouds. To achieve this insulation, the modern cloud broker needs to disguise itself as the end provider to consumers by creating and operating a virtual data center construct that we call a "meta-cloud", which is assembled on top of a set of participating supplier clouds. It is crucial for such a cloud broker to be considered a trusted partner both by cloud consumers and by the underpinning cloud suppliers. A fundamental tenet of brokerage trust is vendor neutrality. On the one hand, cloud consumers will be comfortable if a cloud broker guarantees that they will not be steered along a preferred path. On the other hand, cloud suppliers will be more interested in partnering with a cloud broker who promises a fair apportioning of client provisioning requests. Because consumer and supplier trust in a meta-cloud broker stems from the assumption that it is agnostic to supplier clouds, there is a need for a test strategy that verifies the fairness of cloud brokerage. In this paper, we propose a calculus of fairness that defines the rules for determining the operational behavior of a cloud broker. The calculus uses temporal logic to model the fact that fairness is a trait that has to be ascertained over time; it is not a characteristic that can be judged at the level of individual request fulfillments. Using our temporal calculus of fairness as the basis, we propose an algorithm that determines the fairness of a broker probabilistically, based on its observed request-apportioning policies. Our model of cloud broker fairness also factors in inter-provider variables such as cost divergence and capacity variance. We empirically validate our approach by constructing a meta-cloud from AWS, Azure and IBM, in addition to leveraging a cloud simulator. Our industrial engagements with large enterprises also validate the need for such cloud brokerage with verifiable fairness.
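The abstract does not reproduce the calculus itself, but its core intuition (fairness is judged over a window of apportioning decisions, normalized by supplier capacity, rather than per request) can be illustrated with a minimal sketch. The capacity weights, observation window and tolerance below are illustrative assumptions, not the authors' formulation.

```python
from collections import Counter

def fairness_score(decisions, capacity, tolerance=0.05):
    """Compare a broker's observed apportioning over a time window against
    capacity-weighted expected shares.  Returns (max_deviation, is_fair)."""
    total_capacity = sum(capacity.values())
    expected = {s: c / total_capacity for s, c in capacity.items()}

    counts = Counter(decisions)
    n = len(decisions)
    observed = {s: counts.get(s, 0) / n for s in capacity}

    # Fairness is a property of the whole window, not of any single request:
    # one skewed assignment is acceptable, a persistent skew is not.
    max_dev = max(abs(observed[s] - expected[s]) for s in capacity)
    return max_dev, max_dev <= tolerance

capacity = {"aws": 40, "azure": 35, "ibm": 25}            # relative supplier capacities
window = ["aws"] * 40 + ["azure"] * 34 + ["ibm"] * 26     # one observation window
print(fairness_score(window, capacity))                   # small deviation -> judged fair
```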
RideMatcher: Peer-to-Peer Matching of Passengers for Efficient Ridesharing
Pub Date: 2018-05-01 | DOI: 10.1109/CCGRID.2018.00041
N. V. Bozdog, M. Makkes, A. V. Halteren, H. Bal
The daily home-office commute of millions of people in crowded cities degrades air quality, lengthens travel times and adds noise pollution. This is especially problematic in western cities, where commuters' cars and taxis run with low occupancy. To reduce these problems, authorities often encourage commuters to share their rides, a practice also known as carpooling or ridesharing. To increase ridesharing usage, it is essential that commuters are matched efficiently. In this paper we present RideMatcher, a novel peer-to-peer system for matching car rides based on their routes and travel times. Unlike other ridesharing systems, RideMatcher is completely decentralized, which makes it possible to deploy it on distributed infrastructures using fog and edge computing. Despite being decentralized, our system is able to efficiently match ridesharing users in near real time. Our evaluation on a dataset of 34,837 real taxi trips from New York shows that RideMatcher is able to reduce the number of taxi trips by up to 65%, the distance traveled by taxi cabs by up to 64%, and the cost of the trips by up to 66%.
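RideMatcher itself is peer-to-peer and decentralized; the following is only a centralized, greedy sketch of the matching criterion the abstract describes (route overlap within a departure-time window). The thresholds and the segment-set route representation are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Ride:
    rider: str
    depart: float      # departure time in minutes from midnight
    route: frozenset   # ids of the road segments the trip traverses

def match_rides(rides, max_wait=10.0, min_overlap=0.7):
    """Pair rides whose departure times are within max_wait minutes and whose
    routes share at least min_overlap of their segments (greedy, centralized)."""
    unmatched, pairs = list(rides), []
    while unmatched:
        a = unmatched.pop(0)
        best, best_j = None, None
        for j, b in enumerate(unmatched):
            if abs(a.depart - b.depart) > max_wait:
                continue
            overlap = len(a.route & b.route) / min(len(a.route), len(b.route))
            if overlap >= min_overlap and (best is None or overlap > best):
                best, best_j = overlap, j
        if best_j is not None:
            pairs.append((a, unmatched.pop(best_j)))
    return pairs

rides = [Ride("r1", 480, frozenset("abcde")),
         Ride("r2", 485, frozenset("abcdf")),
         Ride("r3", 600, frozenset("xyz"))]
print([(a.rider, b.rider) for a, b in match_rides(rides)])   # r1 and r2 share a ride
```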
Experimental Study on the Performance and Resource Utilization of Data Streaming Frameworks
Pub Date: 2018-05-01 | DOI: 10.1109/CCGRID.2018.00029
Subarna Chatterjee, C. Morin
With the advent of the Internet of Things (IoT), data stream processing has gained increased attention due to the ever-increasing need to process heterogeneous and voluminous data streams. This work addresses the problem of selecting the right stream processing framework for a given application to be executed within a specific physical infrastructure. For this purpose, we focus on a thorough comparative analysis of three data stream processing platforms – Apache Flink, Apache Storm, and Twitter Heron (the enhanced version of Apache Storm) – that are chosen based on their potential to process both streams and batches in real time. The goal of the work is to help cloud clients and cloud providers choose the resource-efficient and requirement-adaptive streaming platform for a given application, so that they can plan accordingly when allocating or assigning Virtual Machines for application execution. For the comparative performance analysis of the chosen platforms, we experimented using 8-node clusters on the Grid'5000 testbed and selected a wide variety of applications, ranging from a conventional benchmark to a sensor-based IoT application and a statistical batch-processing application. In addition to the various performance metrics related to the elasticity and resource usage of the platforms, this work presents a comparative study of the “green-ness” of the streaming platforms by analyzing their power consumption – one of the first attempts of its kind. The obtained results are thoroughly analyzed to illustrate the functional behavior of these platforms under different computing scenarios.
Stocator: Providing High Performance and Fault Tolerance for Apache Spark Over Object Storage
Pub Date: 2018-05-01 | DOI: 10.1109/CCGRID.2018.00073
G. Vernik, M. Factor, E. K. Kolodner, P. Michiardi, Effi Ofer, Francesco Pace
Until now, object storage has not been a first-class citizen of the Apache Hadoop ecosystem, including Apache Spark. Hadoop connectors to object storage have been based on file semantics, an impedance mismatch that leads to low performance and the need for an additional consistent storage system to achieve fault tolerance. In particular, Hadoop depends on its underlying storage system and its associated connector for fault tolerance and for allowing speculative execution. However, these characteristics are obtained through file operations that are not native to object storage and are both costly and non-atomic. As a result, these connectors are not efficient and, more importantly, they cannot help with fault tolerance for object storage. We introduce Stocator, whose novel algorithm achieves both high performance and fault tolerance by taking advantage of object storage semantics. This greatly decreases the number of operations on object storage and enables a much simpler approach to dealing with the eventually consistent semantics typical of object storage. We have implemented Stocator and shared it in open source. Performance testing with Apache Spark shows that it can be 18 times faster for write-intensive workloads and can perform 30 times fewer operations on object storage than the legacy Hadoop connectors, reducing costs both for the client and for the object storage service provider.
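Stocator is implemented as a Hadoop-compatible connector; the sketch below only illustrates, in Python and against a hypothetical in-memory object store, the idea the abstract relies on: each task attempt writes its output directly to a final, uniquely named object (no temporary directory plus rename, which object stores emulate with a costly copy and delete), and committing merely records which attempts succeeded so that readers ignore failed or speculative ones.

```python
class InMemoryStore:
    """Stand-in for an object store client; a real connector talks to S3, Swift, etc."""
    def __init__(self):
        self._objects = {}
    def put_object(self, key, data):
        self._objects[key] = data
    def get_object(self, key):
        return self._objects[key]
    def list_objects(self, prefix):
        return [k for k in self._objects if k.startswith(prefix)]

def output_key(base, part_id, attempt_id):
    # Each task attempt writes straight to a final, uniquely named object:
    # no temporary directory and no rename.
    return f"{base}/part-{part_id:05d}-attempt_{attempt_id}"

def commit_job(store, base, winning_attempts):
    # "Committing" only records which attempt won each part; nothing is moved.
    manifest = "\n".join(output_key(base, p, a) for p, a in winning_attempts)
    store.put_object(f"{base}/_SUCCESS", manifest)

def read_committed(store, base):
    committed = set(store.get_object(f"{base}/_SUCCESS").splitlines())
    return [store.get_object(k) for k in store.list_objects(base) if k in committed]

store, base = InMemoryStore(), "results/job42"
store.put_object(output_key(base, 0, 1), "rows from attempt 1")
store.put_object(output_key(base, 0, 2), "rows from a speculative duplicate")
commit_job(store, base, winning_attempts=[(0, 1)])
print(read_committed(store, base))   # only the winning attempt's output is read
```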
ApproxG: Fast Approximate Parallel Graphlet Counting Through Accuracy Control
Pub Date: 2018-05-01 | DOI: 10.1109/CCGRID.2018.00080
Daniel Mawhirter, Bo Wu, D. Mehta, Chao Ai
Graphlet counting is a methodology for detecting local structural properties of large graphs that has been in use for over a decade. Despite tremendous effort in optimizing its performance, even 3- and 4-node graphlet counting routines may run for hours or days on highly optimized systems. In this paper, we describe how a synergistic combination of approximate computing and parallel computing can yield multiplicative improvements in graphlet counting runtimes with minimal and controllable loss of accuracy. Specifically, we describe two novel techniques, multi-phased sampling for statistical accuracy guarantees and cost-aware sampling to further improve performance on multi-machine runs, which together reduce the query time on large graphs from tens of hours to several minutes or seconds with only <1% relative error.
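ApproxG's multi-phased and cost-aware sampling schemes are not detailed in the abstract; as a stand-in, the sketch below shows the general accuracy-controlled sampling idea for the simplest graphlet (the triangle): sample vertex triples, scale the hit rate, and report a confidence half-width that shrinks as more samples are drawn. The sample count and the normal-approximation interval are illustrative choices.

```python
import random
from math import comb, sqrt

def estimate_triangles(adj, samples=20000, z=1.96, seed=0):
    """Estimate the number of triangles (3-node graphlets) by sampling random
    vertex triples; returns (estimate, confidence_interval_half_width)."""
    rng = random.Random(seed)
    nodes = list(adj)
    hits = 0
    for _ in range(samples):
        a, b, c = rng.sample(nodes, 3)
        if b in adj[a] and c in adj[a] and c in adj[b]:
            hits += 1
    p = hits / samples                      # fraction of sampled triples that close
    total_triples = comb(len(nodes), 3)
    half_width = z * sqrt(p * (1 - p) / samples) * total_triples
    return p * total_triples, half_width

# toy graph: a 5-clique contains exactly C(5,3) = 10 triangles
adj = {i: {j for j in range(5) if j != i} for i in range(5)}
print(estimate_triangles(adj, samples=5000))
```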
Enhancing Efficiency of Hybrid Transactional Memory Via Dynamic Data Partitioning Schemes
Pub Date: 2018-05-01 | DOI: 10.1109/CCGRID.2018.00020
Pedro Raminhas, S. Issa, P. Romano
Transactional Memory (TM) is an emerging paradigm that promises to significantly ease the development of parallel programs. Hybrid TM (HyTM) is probably the most promising implementation of the TM abstraction, as it seeks to combine the high efficiency of hardware implementations (HTM) with the robustness and flexibility of software-based ones (STM). Unfortunately, though, existing HyTM systems are known to suffer from high overheads to guarantee correct synchronization between concurrent transactions executing in hardware and software. This article introduces DMP-TM (Dynamic Memory Partitioning-TM), a novel HyTM algorithm that exploits, to the best of our knowledge for the first time in the literature, the idea of leveraging operating system-level memory protection mechanisms to detect conflicts between HTM and STM transactions. This innovative design allows for employing highly scalable STM implementations while avoiding instrumentation on the HTM path. As a result, DMP-TM achieves up to ~20× speedups compared to state-of-the-art HyTM solutions in uncontended workloads. Further, thanks to the use of simple and lightweight self-tuning mechanisms, DMP-TM achieves robust performance even in unfavourable workloads that exhibit high contention between the STM and HTM paths.
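The real system combines hardware TM with OS memory-protection mechanisms such as mprotect; the toy Python model below only simulates the partitioning idea: each page is owned by either the HTM or the STM side, and a cross-partition access raises a fault that stands in for the protection trap, after which ownership migrates. All class and method names are hypothetical.

```python
class ProtectionFault(Exception):
    pass

class PartitionedHeap:
    """Toy model of dynamic memory partitioning: pages are owned by the HTM or the
    STM side, and touching a page owned by the other side raises a 'fault' that
    stands in for the OS-level memory-protection trap."""
    def __init__(self, pages):
        self.owner = {p: "HTM" for p in range(pages)}   # start fully HTM-owned

    def access(self, page, side):
        if self.owner[page] != side:
            raise ProtectionFault(f"{side} touched page {page} owned by {self.owner[page]}")
        return True

    def migrate(self, page, side):
        # On a fault, the runtime re-assigns the page to the faulting side and retries.
        self.owner[page] = side

heap = PartitionedHeap(pages=4)
try:
    heap.access(2, "STM")
except ProtectionFault as fault:
    heap.migrate(2, "STM")        # conflict detected lazily, only on cross-partition access
    heap.access(2, "STM")
    print("handled:", fault)
```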
Real-Time Graph Partition and Embedding of Large Network
Pub Date: 2018-05-01 | DOI: 10.1109/CCGRID.2018.00070
Wenqi Liu, Hongxiang Li, Bin Xie
Recently, large-scale networks have attracted significant attention as a means of analyzing and extracting the hidden information in big data. Toward this end, graph embedding is a method for embedding a high-dimensional graph into a much lower-dimensional vector space while maximally preserving the structural information of the original network. However, effective graph embedding is particularly challenging when massive graph data are generated and processed for real-time applications. In this paper, we address this challenge and propose a new real-time and distributed graph embedding algorithm (RTDGE) that is capable of distributively embedding a large-scale graph in a streaming fashion. Specifically, RTDGE consists of the following components: (1) a graph partition scheme that divides all edges into distinct subgraphs, where vertices are associated with edges and may belong to several subgraphs; (2) a dynamic negative sampling (DNS) method that updates the embedded vectors in real time; and (3) an unsupervised global aggregation scheme that combines all locally embedded vectors into a global vector space. Furthermore, we build a real-time distributed graph embedding platform based on Apache Kafka and Apache Storm. Extensive experimental results show that RTDGE outperforms existing solutions in terms of graph embedding efficiency and accuracy.
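The RTDGE platform runs on Apache Kafka and Apache Storm; the fragment below is only a single-process sketch of two of the listed ingredients: hashing edges into partitions (so a vertex can appear in several subgraphs) and one skip-gram-style update with negative samples on an embedding table. Dimensions, learning rate and the hash-based partitioner are assumptions for illustration.

```python
import math
import random

def partition_edges(edges, k):
    """Edge partitioning: each edge lands in exactly one subgraph, so a vertex whose
    edges hash to different partitions is replicated across several subgraphs."""
    parts = [[] for _ in range(k)]
    for u, v in edges:
        parts[hash((u, v)) % k].append((u, v))
    return parts

def sgns_update(emb, ctx, u, v, negatives, lr=0.025):
    """One skip-gram-with-negative-sampling step for the observed edge (u, v)."""
    def step(target, label):
        eu, ct = emb[u], ctx[target]
        score = sum(a * b for a, b in zip(eu, ct))
        grad = lr * (label - 1.0 / (1.0 + math.exp(-score)))
        emb[u] = [a + grad * b for a, b in zip(eu, ct)]
        ctx[target] = [b + grad * a for a, b in zip(eu, ct)]
    step(v, 1.0)                     # positive sample: the edge itself
    for n in negatives:
        step(n, 0.0)                 # negative samples: randomly drawn non-neighbours

dim, nodes = 8, ["a", "b", "c", "d"]
rng = random.Random(0)
emb = {n: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for n in nodes}
ctx = {n: [0.0] * dim for n in nodes}
for part in partition_edges([("a", "b"), ("b", "c"), ("c", "d")], k=2):
    for u, v in part:
        candidates = [n for n in nodes if n not in (u, v)]
        sgns_update(emb, ctx, u, v, negatives=rng.sample(candidates, 2))
print(emb["a"][:3])   # a few coordinates of one learned vector
```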
Optimizing Data Transfers for Improved Performance on Shared GPUs Using Reinforcement Learning
Pub Date: 2018-05-01 | DOI: 10.1109/CCGRID.2018.00061
R. Luley, Qinru Qiu
Optimizing resource utilization is a critical issue in cloud and cluster-based computing systems. In such systems, computing resources often include one or more GPU devices, and much research has already been conducted on maximizing compute resources through shared execution strategies. However, one of the most severe resource constraints in these scenarios is the data transfer channel between the host (i.e., CPU) and the device (i.e., GPU). Data transfer contention has been shown to have a significant impact on performance, yet methods for mitigating such contention have not been thoroughly studied, and the techniques that have been examined make assumptions that limit their effectiveness in the general case. In this paper, we introduce a heuristic that selectively aggregates transfers in order to maximize system performance by optimizing the use of the transfer channel bandwidth. We compare this heuristic to the traditional first-come-first-served approach, and apply Monte Carlo reinforcement learning to find an optimal policy for message aggregation. Finally, we evaluate the performance of Monte Carlo reinforcement learning with an arbitrarily initialized policy and demonstrate its effectiveness in learning an optimal data transfer policy without detailed system characterization, which will enable a generally adaptable solution for resource management of future systems.
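The paper's state, action and reward definitions are not given in the abstract; the sketch below therefore uses a toy model (state is the number of queued transfer requests, actions are sending the aggregated batch now or waiting) and tabular every-visit Monte Carlo control to learn when aggregation pays off. The cost constants and arrival process are invented for illustration.

```python
import random
from collections import defaultdict

ACTIONS = ("send", "wait")
FIXED_COST, WAIT_COST, MAX_QUEUE, STEPS = 5.0, 1.0, 8, 30

def step(queue, action, rng):
    """Toy channel model: one aggregated send pays a fixed per-transfer overhead,
    while waiting pays a latency cost for every queued message."""
    queue = min(MAX_QUEUE, queue + rng.randint(0, 2))    # new transfer requests arrive
    if action == "send" and queue > 0:
        return 0, -FIXED_COST                            # the whole batch goes out at once
    return queue, -WAIT_COST * queue                     # messages keep waiting

def mc_control(episodes=5000, eps=0.1, seed=1):
    rng = random.Random(seed)
    Q, counts = defaultdict(float), defaultdict(int)
    for _ in range(episodes):
        queue, trajectory = 0, []
        for _ in range(STEPS):
            a = (rng.choice(ACTIONS) if rng.random() < eps
                 else max(ACTIONS, key=lambda x: Q[(queue, x)]))
            new_queue, r = step(queue, a, rng)
            trajectory.append((queue, a, r))
            queue = new_queue
        G = 0.0
        for s, a, r in reversed(trajectory):             # every-visit Monte Carlo returns
            G += r
            counts[(s, a)] += 1
            Q[(s, a)] += (G - Q[(s, a)]) / counts[(s, a)]
    return Q

Q = mc_control()
policy = {q: max(ACTIONS, key=lambda a: Q[(q, a)]) for q in range(MAX_QUEUE + 1)}
print(policy)   # typically: wait while the queue is small, send once it grows
```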
Data Analysis of a Google Data Center
Pub Date: 2018-05-01 | DOI: 10.1109/CCGRID.2018.00049
P. Minet, É. Renault, I. Khoufi, S. Boumerdassi
Data collected from an operational Google data center over 29 days represent a very rich and very useful source of information for understanding the main features of a data center. In this paper, we highlight the strong heterogeneity of jobs. The distribution of job execution durations shows a high disparity, as does the distribution of job waiting times before being scheduled. The resource requests in terms of CPU and memory are also analyzed. Knowledge of all these features is needed to design models of jobs, machines and resource requests that are representative of a real data center.
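The Google trace schema is not reproduced here; the pandas fragment below, with hypothetical column names and made-up values, shows the kind of per-job statistics the paper reports: scheduling wait time, execution duration, and CPU/memory requests, summarized by percentiles to expose their disparity.

```python
import pandas as pd

# Hypothetical columns; the real trace spreads this information across
# job_events, task_events and task_usage tables.
jobs = pd.DataFrame({
    "job_id":      [1, 2, 3, 4],
    "submit_ts":   [0.0, 5.0, 7.0, 9.0],
    "schedule_ts": [1.0, 5.5, 30.0, 9.2],
    "finish_ts":   [2.0, 400.0, 31.0, 9000.0],
    "cpu_req":     [0.05, 0.5, 0.02, 0.9],
    "mem_req":     [0.01, 0.25, 0.01, 0.6],
})

jobs["wait"] = jobs["schedule_ts"] - jobs["submit_ts"]       # time queued before scheduling
jobs["duration"] = jobs["finish_ts"] - jobs["schedule_ts"]   # execution duration

# Percentile summaries expose the heterogeneity the paper highlights.
print(jobs[["wait", "duration", "cpu_req", "mem_req"]]
      .describe(percentiles=[0.5, 0.9, 0.99]))
```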
Parallel Low Discrepancy Parameter Sweep for Public Health Policy
Pub Date: 2018-04-28 | DOI: 10.1109/CCGRID.2018.00044
Sudheer Chunduri, Meysam Ghaffari, M. S. Lahijani, A. Srinivasan, S. Namilae
Numerical simulations are used to analyze the effectiveness of alternate public policy choices in limiting the spread of infections. In practice, it is usually not feasible to predict their precise impacts due to inherent uncertainties, especially at the early stages of an epidemic. One option is to parameterize the sources of uncertainty and carry out a parameter sweep to assess the robustness of the policy choices under a variety of possible scenarios. The Self Propelled Entity Dynamics (SPED) model has used this approach successfully to analyze the robustness of different airline boarding and deplaning procedures. However, the time taken by this approach is too large to answer questions raised during the course of a decision meeting. In this paper, we use a modified approach that pre-computes simulations of passenger movement, performing only the disease-specific analysis in real time. A novel contribution of this paper lies in using a low discrepancy sequence (LDS) in the parameter sweep, and demonstrating that it can reduce analysis time by one to three orders of magnitude over the conventional lattice-based parameter sweep. However, its parallelization suffers from greater load imbalance than the conventional approach. We examine this imbalance, relate it to number-theoretic properties of the LDS, and then propose solutions to this problem. Our approach and analysis are applicable to other parameter sweep problems too. The primary contributions of this paper lie in the new approach of low-discrepancy parameter sweep and in exploring solutions to the challenges in its parallelization, evaluated in the context of an important public health application.
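As an illustration of the LDS idea (not of the SPED model or the paper's parallelization), the sketch below generates a 2-D Halton sequence over two made-up epidemic parameters and assigns the points round-robin to workers; the parameter ranges, point count and worker count are arbitrary.

```python
def halton(index, base):
    """Van der Corput radical inverse: the 1-D building block of a Halton sequence."""
    f, result = 1.0, 0.0
    while index > 0:
        f /= base
        result += f * (index % base)
        index //= base
    return result

def halton_sweep(n, ranges, bases=(2, 3)):
    """Generate n low-discrepancy points scaled into the given parameter ranges."""
    for i in range(1, n + 1):
        yield tuple(lo + halton(i, b) * (hi - lo)
                    for (lo, hi), b in zip(ranges, bases))

def assign(points, workers):
    """Round-robin assignment of sweep points; how consecutive LDS indices are grouped
    is what the paper links to load imbalance via number-theoretic structure."""
    buckets = [[] for _ in range(workers)]
    for i, p in enumerate(points):
        buckets[i % workers].append(p)
    return buckets

ranges = [(0.5, 3.0),    # e.g. an infectivity parameter
          (0.1, 1.0)]    # e.g. a contact-radius parameter
points = list(halton_sweep(64, ranges))
for w, chunk in enumerate(assign(points, workers=4)):
    print(f"worker {w}: {len(chunk)} simulations, first point {chunk[0]}")
```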