A Scalable Unified Model for Dynamic Data Structures in Message Passing (Clusters) and Shared Memory (Multicore CPUs) Computing Environments
Pub Date: 2018-05-01 | DOI: 10.1109/CCGRID.2018.00007
G. Laccetti, M. Lapegna, R. Montella
Concurrent data structures are widely used at many levels of the software stack, ranging from high-level parallel scientific applications to low-level operating systems. The key issue with these objects is their concurrent use by several computing units (threads or processes), which makes their design much more difficult than that of their sequential counterparts: their extremely dynamic nature requires protocols to ensure data consistency, with a significant cost overhead. In this regard, several studies emphasize a tension between the sequential correctness of concurrent data structures and the scalability of the algorithms, and in many cases there is a clear need to rethink the data structure design, using approaches based on randomization and/or redistribution techniques in order to fully exploit the computational power of recent computing environments. The problem has grown in importance with the new generation of High Performance Computing systems aimed at extreme performance. Such systems are based on heterogeneous architectures integrating several independent nodes in the form of clusters or MPP systems, where each node is composed of powerful computing elements (CPU cores, GPUs or other acceleration devices) sharing the resources of a single node. These systems therefore make massive use of communication libraries to exchange data among the nodes, as well as other tools for managing the shared resources inside a single node. For this reason, developing algorithms and scientific software for dynamic data structures on these heterogeneous systems requires a suitable combination of several methodologies and tools to deal with the different kinds of parallelism corresponding to each specific device, so as to be aware of the underlying platform. The present work introduces a scalable model to manage a special class of dynamic data structures, the heap-based priority queue (or simply heap), on these heterogeneous architectures. A heap is generally used when an application needs a set of data that does not require a complete ordering, but only access to some items tagged with high priority. To ensure a tradeoff between correct access to high-priority items by the several computing units and low communication and synchronization overhead, a suitable reorganization of the heap is needed. More precisely, we introduce a unified scalable model that can be used, with no modifications, to redeploy the items of a heap both in message passing environments (such as clusters or MPP multicomputers with several nodes) and in shared memory environments (such as CPUs and multiprocessors with several cores), with an overhead independent of the number of computing units. Computational results from applying the proposed strategy to some numerical case studies are presented for different types of computing environments.
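The abstract does not describe the redistribution protocol itself, so the following is only a rough Python sketch of the general idea of a heap spread over several computing units: each unit owns a local heap, and a periodic rebalancing step pools a bounded number of top items and deals them back so every unit again holds some of the globally high-priority items. The class name, the parameter k and the round-robin rule are assumptions made for illustration, not the authors' model.

```python
import heapq

class LocalHeap:
    """Min-heap of (priority, item) pairs owned by one computing unit."""
    def __init__(self, items=()):
        self.h = list(items)
        heapq.heapify(self.h)

    def pop_k_best(self, k):
        return [heapq.heappop(self.h) for _ in range(min(k, len(self.h)))]

    def push_many(self, items):
        for it in items:
            heapq.heappush(self.h, it)

def rebalance(heaps, k):
    """Toy redistribution: pool the k best items of every local heap and deal
    the pooled items back round-robin, so each unit again holds some of the
    globally best items. Only k items per unit are exchanged, never whole heaps."""
    pool = []
    for h in heaps:
        pool.extend(h.pop_k_best(k))
    pool.sort()                                  # small pool: k items per unit
    for i, item in enumerate(pool):
        heaps[i % len(heaps)].push_many([item])

# Usage: 4 "computing units", each with a private heap of tagged items.
heaps = [LocalHeap((p, f"task{p}") for p in range(u, 100, 4)) for u in range(4)]
rebalance(heaps, k=8)
print([h.h[0] for h in heaps])                   # each unit now holds a top item
```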
{"title":"A Scalable Unified Model for Dynamic Data Structures in Message Passing (Clusters) and Shared Memory (multicore CPUs) Computing environments","authors":"G. Laccetti, M. Lapegna, R. Montella","doi":"10.1109/CCGRID.2018.00007","DOIUrl":"https://doi.org/10.1109/CCGRID.2018.00007","url":null,"abstract":"Concurrent data structures are widely used in many software stack levels, ranging from high level parallel scientific applications to low level operating systems. The key issue of these objects is their concurrent use by several computing units (threads or process) so that the design of these structures is much more difficult compared to their sequential counterpart, because of their extremely dynamic nature requiring protocols to ensure data consistency, with a significant cost overhead. At this regard, several studies emphasize a tension between the needs of sequential correctness of the concurrent data structures and scalability of the algorithms, and in many cases it is evident the need to rethink the data structure design, using approaches based on randomization and/or redistribution techniques in order to fully exploit the computational power of the recent computing environments. The problem is grown in importance with the new generation High Performance Computing systems aimed to achieve extreme performance. It is easy to observe that such systems are based on heterogeneous architectures integrating several independent nodes in the form of clusters or MPP systems, where each node is composed by powerful computing elements (CPU core, GPUs or other acceleration devices) sharing resources in a single node. These systems therefore make massive use of communication libraries to exchange data among the nodes, as well as other tools for the management of the shared resources inside a single node. For such a reason, the development of algorithms and scientific software for dynamic data structures on these heterogeneous systems implies a suitable combination of several methodologies and tools to deal with the different kinds of parallelism corresponding to each specific device, so that to be aware of the underlying platform. The present work is aimed to introduce a scalable model to manage a special class of dynamic data structure known as heap based priority queue (or simply heap) on these heterogeneous architectures. A heap is generally used when the applications needs set of data not requiring a complete ordering, but only the access to some items tagged with high priority. In order to ensure a tradeoff between the correct access to high priority items by the several computing units with a low communication and synchronization overhead, a suitable reorganization of the heap is needed. More precisely we introduce a unified scalable model that can be used, with no modifications, to redeploy the items of a heap both in message passing environments (such as clusters and or MMP multicomputers with several nodes) as well as in shared memory environments (such as CPUs and multiprocessors with several cores) with an overhead independent of the number of computing units. 
Computational results related to the application of the proposed strategy on some numerical case studies are presented for different types of computing environments.","PeriodicalId":321027,"journal":{"name":"2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)","volume":"378 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126972784","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Adaptive Communication for Distributed Deep Learning on Commodity GPU Cluster
Pub Date: 2018-05-01 | DOI: 10.1109/CCGRID.2018.00043
Li-Yung Ho, Jan-Jan Wu, Pangfeng Liu
Deep learning is now the most promising approach to developing human-level intelligent computer systems. To speed up the development of neural networks, researchers have designed many distributed learning algorithms to facilitate the training process. In these algorithms, a constant is used as the communication period for model/gradient exchange. We find that this type of communication pattern can incur unnecessary and inefficient data transmission for some training methods, e.g., elastic SGD and gossiping SGD. In this paper, we propose an adaptive communication method to improve the performance of gossiping SGD. Instead of using a fixed period for model exchange, we exchange models with other machines according to the change of the local model. This makes the communication more efficient and thus improves performance. The experimental results show that our method reduces communication traffic by 92%, which results in a 52% reduction in training time while preserving prediction accuracy compared with gossiping SGD.
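The paper states only that exchanges are triggered "according to the change of the local model"; the sketch below assumes an L2-norm drift threshold as that trigger. The threshold tau, the peer.exchange() helper and the averaging rule are hypothetical stand-ins, not the authors' implementation.

```python
import random
import numpy as np

def train_worker(model, data_iter, grad_fn, peers, lr=0.01, tau=0.05):
    """Gossip-style SGD loop that contacts a random peer only when the local
    model has drifted enough since the last exchange, instead of exchanging
    on a fixed period."""
    last_sent = model.copy()
    for batch in data_iter:
        model = model - lr * grad_fn(model, batch)    # local SGD step
        drift = np.linalg.norm(model - last_sent)     # how far we moved since last exchange
        if drift > tau:                               # adaptive trigger
            peer = random.choice(peers)
            peer_model = peer.exchange(model)         # hypothetical send/receive call
            model = 0.5 * (model + peer_model)        # gossip averaging
            last_sent = model.copy()
    return model
```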
{"title":"Adaptive Communication for Distributed Deep Learning on Commodity GPU Cluster","authors":"Li-Yung Ho, Jan-Jan Wu, Pangfeng Liu","doi":"10.1109/CCGRID.2018.00043","DOIUrl":"https://doi.org/10.1109/CCGRID.2018.00043","url":null,"abstract":"Deep learning is now the most promising approach to develop human-intelligent computer systems. To speedup the development of neural networks, researchers have designed many distributed learning algorithms to facilitate the training process. In these algorithms, people use a constant to indicate the communication period for model/gradient exchange. We find that this type of communication pattern could incur unnecessary and inefficient data transmission for some training methods e.g., elastic SGD and gossiping SGD. In this paper, we propose an adaptive communication method to improve the performance of gossiping SGD. Instead of using a fixed period for model exchange, we exchange the models with other machines according to the change of the local model. This makes the communication more efficient and thus improves the performance. The experiment results show that our method reduces the communication traffic by 92%, which results in 52% reduction in training time while preserving the prediction accuracy compared with gossiping SGD.","PeriodicalId":321027,"journal":{"name":"2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121177898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Experimental Study on the Performance and Resource Utilization of Data Streaming Frameworks
Pub Date: 2018-05-01 | DOI: 10.1109/CCGRID.2018.00029
Subarna Chatterjee, C. Morin
With the advent of the Internet of Things (IoT), data stream processing has gained increased attention due to the ever-increasing need to process heterogeneous and voluminous data streams. This work addresses the problem of selecting the right stream processing framework for a given application to be executed within a specific physical infrastructure. For this purpose, we focus on a thorough comparative analysis of three data stream processing platforms – Apache Flink, Apache Storm, and Twitter Heron (the enhanced version of Apache Storm) – chosen for their ability to process both streams and batches in real time. The goal of the work is to give cloud clients and cloud providers the knowledge needed to choose a resource-efficient and requirement-adaptive streaming platform for a given application, so that they can plan the allocation or assignment of Virtual Machines for application execution. For the comparative performance analysis of the chosen platforms, we experimented with 8-node clusters on the Grid5000 experimentation testbed and selected a wide variety of applications, ranging from a conventional benchmark to a sensor-based IoT application and a statistical batch processing application. In addition to various performance metrics related to the elasticity and resource usage of the platforms, this work presents a comparative study of the "green-ness" of the streaming platforms by analyzing their power consumption – one of the first attempts of its kind. The obtained results are thoroughly analyzed to illustrate the functional behavior of these platforms under different computing scenarios.
{"title":"Experimental Study on the Performance and Resource Utilization of Data Streaming Frameworks","authors":"Subarna Chatterjee, C. Morin","doi":"10.1109/CCGRID.2018.00029","DOIUrl":"https://doi.org/10.1109/CCGRID.2018.00029","url":null,"abstract":"With the advent of the Internet of Things (IoT), data stream processing have gained increased attention due to the ever-increasing need to process heterogeneous and voluminous data streams. This work addresses the problem of selecting a correct stream processing framework for a given application to be executed within a specific physical infrastructure. For this purpose, we focus on a thorough comparative analysis of three data stream processing platforms – Apache Flink, Apache Storm, and Twitter Heron (the enhanced version of Apache Storm), that are chosen based on their potential to process both streams and batches in real-time. The goal of the work is to enlighten the cloud-clients and the cloud-providers with the knowledge of the choice of the resource-efficient and requirement-adaptive streaming platform for a given application so that they can plan during allocation or assignment of Virtual Machines for application execution. For the comparative performance analysis of the chosen platforms, we have experimented using 8-node clusters on Grid5000 experimentation testbed and have selected a wide variety of applications ranging from a conventional benchmark to sensor-based IoT application and statistical batch processing application. In addition to the various performance metrics related to the elasticity and resource usage of the platforms, this work presents a comparative study of the “green-ness” of the streaming platforms by analyzing their power consumption – one of the first attempts of its kind. The obtained results are thoroughly analyzed to illustrate the functional behavior of these platforms under different computing scenarios.","PeriodicalId":321027,"journal":{"name":"2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)","volume":"111 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123834208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stocator: Providing High Performance and Fault Tolerance for Apache Spark Over Object Storage
Pub Date: 2018-05-01 | DOI: 10.1109/CCGRID.2018.00073
G. Vernik, M. Factor, E. K. Kolodner, P. Michiardi, Effi Ofer, Francesco Pace
Until now, object storage has not been a first-class citizen of the Apache Hadoop ecosystem, including Apache Spark. Hadoop connectors to object storage have been based on file semantics, an impedance mismatch that leads to low performance and the need for an additional consistent storage system to achieve fault tolerance. In particular, Hadoop depends on its underlying storage system and its associated connector for fault tolerance and for allowing speculative execution. However, these characteristics are obtained through file operations that are not native to object storage and are both costly and non-atomic. As a result, these connectors are not efficient and, more importantly, they cannot help with fault tolerance for object storage. We introduce Stocator, whose novel algorithm achieves both high performance and fault tolerance by taking advantage of object storage semantics. This greatly decreases the number of operations on object storage and enables a much simpler approach to dealing with the eventually consistent semantics typical of object storage. We have implemented Stocator and shared it in open source. Performance testing with Apache Spark shows that it can be 18 times faster for write-intensive workloads and can perform 30 times fewer operations on object storage than the legacy Hadoop connectors, reducing costs both for the client and for the object storage service provider.
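The abstract names the problem (rename-based, non-atomic commit on object stores) but not Stocator's exact protocol; the toy below only illustrates the general direction of writing each task attempt straight to an attempt-tagged final object name and "committing" by listing, with no temporary directories or renames. The in-memory ObjectStore class and the naming scheme are assumptions made for this sketch.

```python
class ObjectStore:
    """Toy eventually consistent object store: flat namespace, PUT/LIST only."""
    def __init__(self):
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = data

    def list(self, prefix):
        return [k for k in self.objects if k.startswith(prefix)]

def write_part(store, output, part, attempt, data):
    # Write straight to an attempt-tagged final name: one PUT, no temp dir, no rename.
    store.put(f"{output}/part-{part:05d}-attempt-{attempt}", data)

def committed_parts(store, output, successful_attempts):
    # "Commit" = list the output prefix and keep only objects whose attempt id
    # is known (e.g. from the driver) to have succeeded; duplicates left by
    # failed or speculative attempts are simply ignored, never renamed or deleted.
    keep = {}
    for key in store.list(output + "/part-"):
        part, attempt = key.rsplit("-attempt-", 1)
        if attempt in successful_attempts:
            keep[part] = key
    return sorted(keep.values())

store = ObjectStore()
write_part(store, "results", 0, "a1", b"rows 0-99")      # failed attempt
write_part(store, "results", 0, "a2", b"rows 0-99")      # speculative retry, succeeded
write_part(store, "results", 1, "a3", b"rows 100-199")
print(committed_parts(store, "results", {"a2", "a3"}))
```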
{"title":"Stocator: Providing High Performance and Fault Tolerance for Apache Spark Over Object Storage","authors":"G. Vernik, M. Factor, E. K. Kolodner, P. Michiardi, Effi Ofer, Francesco Pace","doi":"10.1109/CCGRID.2018.00073","DOIUrl":"https://doi.org/10.1109/CCGRID.2018.00073","url":null,"abstract":"Until now object storage has not been a first-class citizen of the Apache Hadoop ecosystem including Apache Spark. Hadoop connectors to object storage have been based on file semantics, an impedance mismatch, which leads to low performance and the need for an additional consistent storage system to achieve fault tolerance. In particular, Hadoop depends on its underlying storage system and its associated connector for fault tolerance and allowing speculative execution. However, these characteristics are obtained through file operations that are not native for object storage, and are both costly and not atomic. As a result these connectors are not efficient and more importantly they cannot help with fault tolerance for object storage. We introduce Stocator, whose novel algorithm achieves both high performance and fault tolerance by taking advantage of object storage semantics. This greatly decreases the number of operations on object storage as well as enabling a much simpler approach to dealing with the eventually consistent semantics typical of object storage. We have implemented Stocator and shared it in open source. Performance testing with Apache Spark shows that it can be 18 times faster for write intensive workloads and can perform 30 times fewer operations on object storage than the legacy Hadoop connectors, reducing costs both for the client and the object storage service provider.","PeriodicalId":321027,"journal":{"name":"2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)","volume":"197 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121107609","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ApproxG: Fast Approximate Parallel Graphlet Counting Through Accuracy Control
Pub Date: 2018-05-01 | DOI: 10.1109/CCGRID.2018.00080
Daniel Mawhirter, Bo Wu, D. Mehta, Chao Ai
Graphlet counting is a methodology for detecting local structural properties of large graphs that has been in use for over a decade. Despite tremendous effort in optimizing its performance, even 3- and 4-node graphlet counting routines may run for hours or days on highly optimized systems. In this paper, we describe how a synergistic combination of approximate computing with parallel computing can result in multiplicative performance improvements in graphlet counting runtimes with minimal and controllable loss of accuracy. Specifically, we describe two novel techniques, multi-phased sampling for statistical accuracy guarantees and cost-aware sampling to further improve performance on multi-machine runs, which reduce the query time on large graphs from tens of hours to several minutes or seconds with only <1% relative error.
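Multi-phased and cost-aware sampling are only named here, so the snippet below shows a generic accuracy-controlled estimator in the same spirit: edge sampling for the 3-node clique graphlet (triangle) count with a normal-approximation stopping rule. The batch size, confidence constant and stopping rule are assumptions, not ApproxG's algorithm.

```python
import random
from collections import defaultdict

def approx_triangles(edges, rel_err=0.01, conf_z=1.96, batch=10_000, max_samples=2_000_000):
    """Estimate the triangle count by sampling edges uniformly: for a sampled
    edge (u, v), the number of common neighbours of u and v is an unbiased
    estimator of 3T/m. Sampling stops once a normal confidence interval is
    within the requested relative error (or a sample cap is reached)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    m = len(edges)
    n, total, total_sq = 0, 0.0, 0.0
    while True:
        for _ in range(batch):
            u, v = random.choice(edges)
            x = len(adj[u] & adj[v])                # triangles containing this edge
            total, total_sq, n = total + x, total_sq + x * x, n + 1
        mean = total / n
        var = max(total_sq / n - mean * mean, 1e-12)
        estimate = mean * m / 3.0                   # every triangle owns 3 edges
        half_width = conf_z * (var / n) ** 0.5 * m / 3.0
        if n >= max_samples or (estimate > 0 and half_width / estimate < rel_err):
            return estimate
```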
{"title":"ApproxG: Fast Approximate Parallel Graphlet Counting Through Accuracy Control","authors":"Daniel Mawhirter, Bo Wu, D. Mehta, Chao Ai","doi":"10.1109/CCGRID.2018.00080","DOIUrl":"https://doi.org/10.1109/CCGRID.2018.00080","url":null,"abstract":"Graphlet counting is a methodology for detecting local structural properties of large graphs that has been in use for over a decade. Despite tremendous effort in optimizing its performance, even 3- and 4-node graphlet counting routines may run for hours or days on highly optimized systems. In this paper, we describe how a synergistic combination of approximate computing with parallel computing can result in multiplicative performance improvements in graphlet counting runtimes with minimal and controllable loss of accuracy. Specifically, we describe two novel techniques, multi-phased sampling for statistical accuracy guarantees and cost-aware sampling to further improve performance on multi-machine runs, which reduce the query time on large graphs from tens of hours to several minutes or seconds with only <1% relative error.","PeriodicalId":321027,"journal":{"name":"2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)","volume":"325 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123213988","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enhancing Efficiency of Hybrid Transactional Memory Via Dynamic Data Partitioning Schemes
Pub Date: 2018-05-01 | DOI: 10.1109/CCGRID.2018.00020
Pedro Raminhas, S. Issa, P. Romano
Transactional Memory (TM) is an emerging paradigm that promises to significantly ease the development of parallel programs. Hybrid TM (HyTM) is probably the most promising implementation of the TM abstraction, as it seeks to combine the high efficiency of hardware implementations (HTM) with the robustness and flexibility of software-based ones (STM). Unfortunately, though, existing Hybrid TM systems are known to suffer from high overheads to guarantee correct synchronization between concurrent transactions executing in hardware and software. This article introduces DMP-TM (Dynamic Memory Partitioning-TM), a novel HyTM algorithm that exploits, to the best of our knowledge for the first time in the literature, the idea of leveraging operating system-level memory protection mechanisms to detect conflicts between HTM and STM transactions. This innovative design allows for employing highly scalable STM implementations while avoiding instrumentation on the HTM path. This allows DMP-TM to achieve up to ~20× speedups compared to state-of-the-art Hybrid TM solutions in uncontended workloads. Further, thanks to the use of simple and lightweight self-tuning mechanisms, DMP-TM achieves robust performance even in unfavourable workloads that exhibit high contention between the STM and HTM paths.
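DMP-TM relies on OS page protection and its fault handling to catch cross-domain accesses, which cannot be reproduced faithfully in a few lines; the following is only a toy ownership table that mimics the dynamic-partitioning idea. Every page belongs to the HTM or the STM side, a same-domain access needs no instrumentation, and a cross-domain access is treated as the event that a protection fault would signal. All names are invented for the illustration.

```python
from enum import Enum

class Domain(Enum):
    HTM = "htm"
    STM = "stm"

class PartitionTable:
    """Toy model of the dynamic-partitioning idea: every page is owned by the
    HTM or the STM side, and a cross-domain access is detected (in the real
    system via OS page protection) and resolved by migrating the page."""
    def __init__(self, page_size=4096):
        self.page_size = page_size
        self.owner = {}                           # page number -> owning Domain

    def access(self, addr, domain):
        p = addr // self.page_size
        current = self.owner.setdefault(p, domain)
        if current is domain:
            return "fast path"                    # same domain: no instrumentation
        # Cross-domain access: in DMP-TM this is where a protection fault would
        # fire; here we simply migrate page ownership to the accessing domain.
        self.owner[p] = domain
        return f"fault: page {p} migrated {current.value} -> {domain.value}"

table = PartitionTable()
print(table.access(0x1000, Domain.HTM))   # first touch: page owned by the HTM side
print(table.access(0x1008, Domain.STM))   # cross-domain access: fault + migration
```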
{"title":"Enhancing Efficiency of Hybrid Transactional Memory Via Dynamic Data Partitioning Schemes","authors":"Pedro Raminhas, S. Issa, P. Romano","doi":"10.1109/CCGRID.2018.00020","DOIUrl":"https://doi.org/10.1109/CCGRID.2018.00020","url":null,"abstract":"Transactional Memory (TM) is an emerging paradigm that promises to significantly ease the development of parallel programs. Hybrid TM (HyTM) is probably the most promising implementation of the TM abstraction, which seeks to combine the high efficiency of hardware implementations (HTM) with the robustness and flexibility of software-based ones (STM). Unfortunately, though, existing Hybrid TM systems are known to suffer from high overheads to guarantee correct synchronization between concurrent transactions executing in hardware and software. This article introduces DMP-TM (Dynamic Memory Partitioning-TM), a novel HyTM algorithm that exploits, to the best of our knowledge for the first time in the literature, the idea of leveraging operating system-level memory protection mechanisms to detect conflicts between HTM and STM transactions. This innovative design allows for employing highly scalable STM implementations, while avoiding instrumentation on the HTM path. This allows DMP-TM to achieve up to ~ 20× speedups compared to state of the art Hybrid TM solutions in uncontended workloads. Further, thanks to the use of simple and lightweight self-tuning mechanisms, DMP-TM achieves robust performance even in unfavourable workload that exhibits high contention between the STM and HTM path.","PeriodicalId":321027,"journal":{"name":"2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128486893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Real-Time Graph Partition and Embedding of Large Network
Pub Date: 2018-05-01 | DOI: 10.1109/CCGRID.2018.00070
Wenqi Liu, Hongxiang Li, Bin Xie
Recently, large-scale networks have attracted significant attention as a means to analyze and extract the hidden information of big data. Toward this end, graph embedding is a method to embed a high-dimensional graph into a much lower-dimensional vector space while maximally preserving the structural information of the original network. However, effective graph embedding is particularly challenging when massive graph data are generated and processed for real-time applications. In this paper, we address this challenge and propose a new real-time and distributed graph embedding algorithm (RTDGE) that is capable of distributively embedding a large-scale graph in a streaming fashion. Specifically, RTDGE consists of the following components: (1) a graph partition scheme that divides all edges into distinct subgraphs, where vertices are associated with edges and may belong to several subgraphs; (2) a dynamic negative sampling (DNS) method that updates the embedded vectors in real time; and (3) an unsupervised global aggregation scheme that combines all locally embedded vectors into a global vector space. Furthermore, we build a real-time distributed graph embedding platform based on Apache Kafka and Apache Storm. Extensive experimental results show that RTDGE outperforms existing solutions in terms of graph embedding efficiency and accuracy.
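Of the three components, only the edge-partition step is simple enough to sketch briefly; the snippet below uses a hash-based vertex-cut assignment as an assumed placeholder for the paper's partition scheme, and it does not cover dynamic negative sampling or the global aggregation.

```python
from collections import defaultdict

def partition_edges(edges, n_parts):
    """Toy vertex-cut partitioning: every edge is assigned to exactly one
    subgraph, so a vertex whose edges land in different subgraphs is
    replicated in each of them (as the abstract describes)."""
    parts = [defaultdict(set) for _ in range(n_parts)]      # adjacency per subgraph
    replicas = defaultdict(set)                             # vertex -> subgraph ids
    for u, v in edges:
        p = hash((min(u, v), max(u, v))) % n_parts          # assumed assignment rule
        parts[p][u].add(v)
        parts[p][v].add(u)
        replicas[u].add(p)
        replicas[v].add(p)
    return parts, replicas

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
parts, replicas = partition_edges(edges, n_parts=2)
print({v: sorted(ps) for v, ps in replicas.items()})        # subgraphs holding each vertex
```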
{"title":"Real-Time Graph Partition and Embedding of Large Network","authors":"Wenqi Liu, Hongxiang Li, Bin Xie","doi":"10.1109/CCGRID.2018.00070","DOIUrl":"https://doi.org/10.1109/CCGRID.2018.00070","url":null,"abstract":"Recently, large-scale networks attract significant attention to analyze and extract the hidden information of big data. Toward this end, graph embedding is a method to embed a high dimensional graph into a much lower dimensional vector space while maximally preserving the structural information of the original network. However, effective graph embedding is particularly challenging when massive graph data are generated and processed for real-time applications. In this paper, we address this challenge and propose a new real-time and distributed graph embedding algorithm (RTDGE) that is capable of distributively embedding a large-scale graph in a streaming fashion. Specifically, our RTDGE consists of the following components: (1) a graph partition scheme that divides all edges into distinct subgraphs, where vertices are associated with edges and may belong to several subgraphs; (2) a dynamic negative sampling (DNS) method that updates the embedded vectors in real-time; and (3) an unsupervised global aggregation scheme that combines all locally embedded vectors into a global vector space. Furthermore, we also build a real-time distributed graph embedding platform based on Apache Kafka and Apache Storm. Extensive experimental results show that RTDGE outperforms existing solutions in terms of graph embedding efficiency and accuracy.","PeriodicalId":321027,"journal":{"name":"2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125326117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimizing Data Transfers for Improved Performance on Shared GPUs Using Reinforcement Learning
Pub Date: 2018-05-01 | DOI: 10.1109/CCGRID.2018.00061
R. Luley, Qinru Qiu
Optimizing resource utilization is a critical issue in cloud and cluster-based computing systems. In such systems, computing resources often consist of one or more GPU devices, and much research has already been conducted on maximizing compute resources through shared execution strategies. However, one of the most severe resource constraints in these scenarios is the data transfer channel between the host (i.e., CPU) and the device (i.e., GPU). Data transfer contention has been shown to have a significant impact on performance, yet methods for optimizing such contention have not been thoroughly studied, and the techniques that have been examined make assumptions which limit their effectiveness in the general case. In this paper, we introduce a heuristic which selectively aggregates transfers in order to maximize system performance by optimizing the transfer channel bandwidth. We compare this heuristic to a traditional first-come-first-served approach, and apply Monte Carlo reinforcement learning to find an optimal policy for message aggregation. Finally, we evaluate the performance of Monte Carlo reinforcement learning with an arbitrarily initialized policy. We demonstrate its effectiveness in learning an optimal data transfer policy without detailed system characterization, which will enable a general, adaptable solution for resource management of future systems.
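Neither the heuristic nor the reward formulation is detailed in the abstract; as a stand-in, the sketch below models a transfer as a fixed launch latency plus size over bandwidth, and greedily merges consecutive small transfers while the modelled batched time beats sending them individually. The latency, bandwidth and batch-size constants are illustrative assumptions, and the Monte Carlo RL policy is not shown.

```python
def transfer_time(size_bytes, latency_s=10e-6, bw_bytes_s=12e9):
    """Simple PCIe-like cost model: fixed launch latency plus size/bandwidth."""
    return latency_s + size_bytes / bw_bytes_s

def aggregate_queue(pending, max_batch_bytes=1 << 20):
    """Greedy heuristic: merge consecutive pending transfer sizes into one
    batch while the modelled batched time beats sending them one by one and
    the batch stays under a size cap."""
    batches, current = [], []
    for size in pending:
        candidate = current + [size]
        separate = sum(transfer_time(s) for s in candidate)   # sent individually
        merged = transfer_time(sum(candidate))                # sent as one batch
        if sum(candidate) <= max_batch_bytes and merged < separate:
            current = candidate
        else:
            if current:
                batches.append(current)
            current = [size]
    if current:
        batches.append(current)
    return batches

# Small transfers get batched; the 4 MB transfer exceeds the cap and goes alone.
print(aggregate_queue([4096, 8192, 2048, 4 << 20, 1024]))
```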
{"title":"Optimizing Data Transfers for Improved Performance on Shared GPUs Using Reinforcement Learning","authors":"R. Luley, Qinru Qiu","doi":"10.1109/CCGRID.2018.00061","DOIUrl":"https://doi.org/10.1109/CCGRID.2018.00061","url":null,"abstract":"Optimizing resource utilization is a critical issue in cloud and cluster-based computing systems. In such systems, computing resources often consist of one or more GPU devices, and much research has already been conducted on means for maximizing compute resources through shared execution strategies. However, one of the most severe resource constraints in these scenarios is the data transfer channel between the host (i.e., CPU) and the device (i.e., GPU). Data transfer contention has been shown to have a significant impact on performance, yet methods for optimizing such contention have not been thoroughly studied. Techniques that have been examined make certain assumptions which limit effectiveness in the general case. In this paper, we introduce a heuristic which selectively aggregates transfers in order to maximize system performance by optimizing the transfer channel bandwidth. We compare this heuristic to traditional first-come-first-served approach, and apply Monte Carlo reinforcement learning to find an optimal policy for message aggregation. Finally, we evaluate the performance of Monte Carlo reinforcement learning with an arbitrarily-initialized policy. We demonstrate its effectiveness in learning optimal data transfer policy without detailed system characterization, which will enable a general adaptable solution for resource management of future systems.","PeriodicalId":321027,"journal":{"name":"2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)","volume":"137 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116722027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data Analysis of a Google Data Center
Pub Date: 2018-05-01 | DOI: 10.1109/CCGRID.2018.00049
P. Minet, É. Renault, I. Khoufi, S. Boumerdassi
Data collected from an operational Google data center over 29 days represent a very rich and useful source of information for understanding the main features of a data center. In this paper, we highlight the strong heterogeneity of jobs. The distribution of job execution durations shows a high disparity, as does the job waiting time before being scheduled. The resource requests in terms of CPU and memory are also analyzed. Knowledge of all these features is needed to design models of jobs, machines and resource requests that are representative of a real data center.
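As an illustration of the kind of analysis described, the snippet below computes per-job waiting and execution times from a simplified event log; the three-event CSV schema is an assumption for the sketch and is not the schema of the actual Google trace.

```python
import csv
from statistics import median, quantiles

def job_durations(events_csv):
    """From an assumed job event log with columns job_id,timestamp,event
    (event in {SUBMIT, SCHEDULE, FINISH}), compute per-job waiting time
    (SUBMIT -> SCHEDULE) and execution time (SCHEDULE -> FINISH)."""
    t = {}
    with open(events_csv) as f:
        for row in csv.DictReader(f):
            t.setdefault(row["job_id"], {})[row["event"]] = float(row["timestamp"])
    waits, runs = [], []
    for ev in t.values():
        if {"SUBMIT", "SCHEDULE", "FINISH"} <= ev.keys():     # job fully observed
            waits.append(ev["SCHEDULE"] - ev["SUBMIT"])
            runs.append(ev["FINISH"] - ev["SCHEDULE"])
    return waits, runs

# Example use (hypothetical file name):
# waits, runs = job_durations("job_events.csv")
# print(median(runs), quantiles(runs, n=100)[98])   # median vs 99th percentile: disparity
```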
{"title":"Data Analysis of a Google Data Center","authors":"P. Minet, É. Renault, I. Khoufi, S. Boumerdassi","doi":"10.1109/CCGRID.2018.00049","DOIUrl":"https://doi.org/10.1109/CCGRID.2018.00049","url":null,"abstract":"Data collected from an operational Google data center during 29 days represent a very rich and very useful source of information for understanding the main features of a data center. In this paper, we highlight the strong heterogeneity of jobs. The distribution of job execution duration shows a high disparity, as well as the job waiting time before being scheduled. The resource requests in terms of CPU and memory are also analyzed. The knowledge of all these features is needed to design models of jobs, machines and resource requests that are representative of a real data center.","PeriodicalId":321027,"journal":{"name":"2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)","volume":"14 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116822815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Parallel Low Discrepancy Parameter Sweep for Public Health Policy
Pub Date: 2018-04-28 | DOI: 10.1109/CCGRID.2018.00044
Sudheer Chunduri, Meysam Ghaffari, M. S. Lahijani, A. Srinivasan, S. Namilae
Numerical simulations are used to analyze the effectiveness of alternate public policy choices in limiting the spread of infections. In practice, it is usually not feasible to predict their precise impacts due to inherent uncertainties, especially at the early stages of an epidemic. One option is to parameterize the sources of uncertainty and carry out a parameter sweep to identify their robustness under a variety of possible scenarios. The Self Propelled Entity Dynamics (SPED) model has used this approach successfully to analyze the robustness of different airline boarding and deplaning procedures. However, the time taken by this approach is too large to answer questions raised during the course of a decision meeting. In this paper, we use a modified approach that pre-computes simulations of passenger movement, performing only the disease-specific analysis in real time. A novel contribution of this paper lies in using a low discrepancy sequence (LDS) in the parameter sweep, and demonstrating that it can lead to a reduction in analysis time by one to three orders of magnitude over the conventional lattice-based parameter sweep. However, its parallelization suffers from greater load imbalance than the conventional approach. We examine this and relate it to number-theoretic properties of the LDS. We then propose solutions to this problem. Our approach and analysis are applicable to other parameter sweep problems too. The primary contributions of this paper lie in the new approach of low discrepancy parameter sweep and in exploring solutions to challenges in its parallelization, evaluated in the context of an important public health application.
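As a sketch of the low discrepancy parameter sweep itself (not of the SPED model or the disease-specific analysis), the snippet below draws Sobol points with SciPy, scales them to the parameter box and statically assigns them to workers; the naive static slicing also hints at where the load imbalance studied in the paper can appear. Function and parameter names are assumptions.

```python
from scipy.stats import qmc   # Sobol low discrepancy sequence generator

def lds_sweep(simulate, bounds, n_points=256, workers=8):
    """Parameter sweep over a Sobol (low discrepancy) sequence instead of a
    regular lattice: the quasi-random points cover the box `bounds` far more
    evenly than the same number of lattice points, so far fewer runs are
    needed to probe the scenario space."""
    sampler = qmc.Sobol(d=len(bounds), scramble=True)
    unit = sampler.random(n_points)                         # points in [0, 1)^d
    lows = [lo for lo, hi in bounds]
    highs = [hi for lo, hi in bounds]
    points = qmc.scale(unit, lows, highs)
    # Naive static assignment of points to workers (evaluated sequentially
    # here); the paper shows such assignments can be badly load-imbalanced
    # for an LDS, which is the parallelization problem it then addresses.
    chunks = [points[i::workers] for i in range(workers)]
    return [[simulate(p) for p in chunk] for chunk in chunks]

# Usage with a stand-in simulation over two uncertain parameters:
# results = lds_sweep(lambda p: p[0] * p[1], bounds=[(0.1, 0.9), (0.0, 2.0)])
```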
{"title":"Parallel Low Discrepancy Parameter Sweep for Public Health Policy","authors":"Sudheer Chunduri, Meysam Ghaffari, M. S. Lahijani, A. Srinivasan, S. Namilae","doi":"10.1109/CCGRID.2018.00044","DOIUrl":"https://doi.org/10.1109/CCGRID.2018.00044","url":null,"abstract":"Numerical simulations are used to analyze the effectiveness of alternate public policy choices in limiting the spread of infections. In practice, it is usually not feasible to predict their precise impacts due to inherent uncertainties, especially at the early stages of an epidemic. One option is to parameterize the sources of uncertainty and carry out a parameter sweep to identify their robustness under a variety of possible scenarios. The Self Propelled Entity Dynamics (SPED) model has used this approach successfully to analyze the robustness of different airline boarding and deplaning procedures. However, the time taken by this approach is too large to answer questions raised during the course of a decision meeting. In this paper, we use a modified approach that pre-computes simulations of passenger movement, performing only the disease-specific analysis in real time. A novel contribution of this paper lies in using a low discrepancy sequence (LDS) in the parameter sweep, and demonstrating that it can lead to a reduction in analysis time by one to three orders of magnitude over the conventional lattice-based parameter sweep. However, its parallelization suffers from greater load imbalance than the conventional approach. We examine this and relate it to number-theoretic properties of the LDS. We then propose solutions to this problem. Our approach and analysis are applicable to other parameter sweep problems too. The primary contributions of this paper lie in the new approach of low discrepancy parameter sweep and in exploring solutions to challenges in its parallelization, evaluated in the context of an important public health application.","PeriodicalId":321027,"journal":{"name":"2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133561074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}