David Katz, A. Barbalace, Saif Ansary, A. Ravichandran, B. Ravindran
Chip manufacturers continue to increase the number of cores per chip while balancing requirements for low power consumption. This drives a need for simpler cores and hardware caches. Because of these trends, the scalability of existing shared memory system software is in question. Traditional operating systems (OS) for multiprocessors are based on shared memory communication between cores and are symmetric (SMP). Contention in SMP OSes over shared data structures is increasingly significant in newer generations of many-core processors. We propose the use of the replicated-kernel OS design to improve scalability over the traditional SMP OS. Our replicated-kernel design is an extension of the concept of the multikernel. While a multikernel appears to application software as a distributed network of cooperating micro kernels, we provide the appearance of a monolithic, single-system image, task-based OS in which application software is unaware of the distributed nature of the underlying OS. In this paper we tackle the problem of thread migration between kernels in a replicated-kernel OS. We focus on distributed thread group creation, context migration, and address space consistency for threads that execute on different kernels, but belong to the same distributed thread group. This concept is embodied in our prototype OS, called Popcorn Linux, which runs on multicore x86 machines and presents a Linux-like interface to application software that is indistinguishable from the SMP Linux interface. By doing this, we are able to leverage the wealth of existing Linux software for use on our platform while demonstrating the characteristics of the underlying replicated-kernel OS. We show that a replicated-kernel OS scales as well as a multikernel OS by removing the contention on shared data structures. Popcorn, Barr elfish, and SMP Linux are compared on selected benchmarks. Popcorn is shown to be competitive to SMP Linux, and up to 40% faster.
{"title":"Thread Migration in a Replicated-Kernel OS","authors":"David Katz, A. Barbalace, Saif Ansary, A. Ravichandran, B. Ravindran","doi":"10.1109/ICDCS.2015.36","DOIUrl":"https://doi.org/10.1109/ICDCS.2015.36","url":null,"abstract":"Chip manufacturers continue to increase the number of cores per chip while balancing requirements for low power consumption. This drives a need for simpler cores and hardware caches. Because of these trends, the scalability of existing shared memory system software is in question. Traditional operating systems (OS) for multiprocessors are based on shared memory communication between cores and are symmetric (SMP). Contention in SMP OSes over shared data structures is increasingly significant in newer generations of many-core processors. We propose the use of the replicated-kernel OS design to improve scalability over the traditional SMP OS. Our replicated-kernel design is an extension of the concept of the multikernel. While a multikernel appears to application software as a distributed network of cooperating micro kernels, we provide the appearance of a monolithic, single-system image, task-based OS in which application software is unaware of the distributed nature of the underlying OS. In this paper we tackle the problem of thread migration between kernels in a replicated-kernel OS. We focus on distributed thread group creation, context migration, and address space consistency for threads that execute on different kernels, but belong to the same distributed thread group. This concept is embodied in our prototype OS, called Popcorn Linux, which runs on multicore x86 machines and presents a Linux-like interface to application software that is indistinguishable from the SMP Linux interface. By doing this, we are able to leverage the wealth of existing Linux software for use on our platform while demonstrating the characteristics of the underlying replicated-kernel OS. 
We show that a replicated-kernel OS scales as well as a multikernel OS by removing the contention on shared data structures. Popcorn, Barr elfish, and SMP Linux are compared on selected benchmarks. Popcorn is shown to be competitive to SMP Linux, and up to 40% faster.","PeriodicalId":129182,"journal":{"name":"2015 IEEE 35th International Conference on Distributed Computing Systems","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132181753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lu Chao, Chundian Li, Fan Liang, Xiaoyi Lu, Zhiwei Xu
Data warehouse systems, like Apache Hive, have been widely used in the distributed computing field. However, current generation data warehouse systems have not fully embraced High Performance Computing (HPC) technologies even though the trend of converging Big Data and HPC is emerging. For example, in traditional HPC field, Message Passing Interface (MPI) libraries have been optimized for HPC applications during last decades to deliver ultra-high data movement performance. Recent studies, like DataMPI, are extending MPI for Big Data applications to bridge these two fields. This trend motivates us to explore whether MPI can benefit data warehouse systems, such as Apache Hive. In this paper, we propose a novel design to accelerate Apache Hive by utilizing DataMPI. We further optimize the DataMPI engine by introducing enhanced non-blocking communication and parallelism mechanisms for typical Hive workloads based on their communication characteristics. Our design can fully and transparently support Hive workloads like Intel HiBench and TPC-H with high productivity. Performance evaluation with Intel HiBench shows that with the help of light-weight DataMPI library design, efficient job start up and data movement mechanisms, Hive on DataMPI performs 30% faster than Hive on Hadoop averagely. And the experiments on TPC-H with ORCFile show that the performance of Hive on DataMPI can improve 32% averagely and 53% at most more than that of Hive on Hadoop. To the best of our knowledge, Hive on DataMPI is the first attempt to propose a general design for fully supporting and accelerating data warehouse systems with MPI.
{"title":"Accelerating Apache Hive with MPI for Data Warehouse Systems","authors":"Lu Chao, Chundian Li, Fan Liang, Xiaoyi Lu, Zhiwei Xu","doi":"10.1109/ICDCS.2015.73","DOIUrl":"https://doi.org/10.1109/ICDCS.2015.73","url":null,"abstract":"Data warehouse systems, like Apache Hive, have been widely used in the distributed computing field. However, current generation data warehouse systems have not fully embraced High Performance Computing (HPC) technologies even though the trend of converging Big Data and HPC is emerging. For example, in traditional HPC field, Message Passing Interface (MPI) libraries have been optimized for HPC applications during last decades to deliver ultra-high data movement performance. Recent studies, like DataMPI, are extending MPI for Big Data applications to bridge these two fields. This trend motivates us to explore whether MPI can benefit data warehouse systems, such as Apache Hive. In this paper, we propose a novel design to accelerate Apache Hive by utilizing DataMPI. We further optimize the DataMPI engine by introducing enhanced non-blocking communication and parallelism mechanisms for typical Hive workloads based on their communication characteristics. Our design can fully and transparently support Hive workloads like Intel HiBench and TPC-H with high productivity. Performance evaluation with Intel HiBench shows that with the help of light-weight DataMPI library design, efficient job start up and data movement mechanisms, Hive on DataMPI performs 30% faster than Hive on Hadoop averagely. And the experiments on TPC-H with ORCFile show that the performance of Hive on DataMPI can improve 32% averagely and 53% at most more than that of Hive on Hadoop. 
To the best of our knowledge, Hive on DataMPI is the first attempt to propose a general design for fully supporting and accelerating data warehouse systems with MPI.","PeriodicalId":129182,"journal":{"name":"2015 IEEE 35th International Conference on Distributed Computing Systems","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127559272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The flexibility of the current Domain Name System (DNS) has been stretched to its limits to accommodate new applications such as content delivery networks and dynamic DNS. In particular, maintaining cache consistency has become a much larger problem, as emerging technologies require increasingly-frequent updates to DNS records. Though Time-To-Live (TTL) is the most widely used method of controlling cache consistency, it does not offer the fine-grained control necessary for handling these frequent changes. In addition, TTLs are too static to handle sudden changes in traffic caused by Internet failures or social media trends, demonstrating their inflexibility in the face of unforeseen events. To address these problems, we first propose a metric called Expected Aggregate Inconsistency (EAI), which allows us to consider important factors such as a record's update frequency and popularity when quantitatively measuring inconsistency. We then design ECO-DNS, a lightweight system that leverages the information provided by EAI to optimize a record's TTL. This value can be tuned to individual cache servers' preferences between better consistency and bandwidth overhead. Further-more, our optimization model's flexibility allows us to easily adapt ECO-DNS to handle various caching hierarchies such as multi-level caching while considering the trade off among consistency, overhead, latency, and server load.
{"title":"ECO-DNS: Expected Consistency Optimization for DNS","authors":"Chen Chen, S. Matsumoto, A. Perrig","doi":"10.1109/ICDCS.2015.34","DOIUrl":"https://doi.org/10.1109/ICDCS.2015.34","url":null,"abstract":"The flexibility of the current Domain Name System (DNS) has been stretched to its limits to accommodate new applications such as content delivery networks and dynamic DNS. In particular, maintaining cache consistency has become a much larger problem, as emerging technologies require increasingly-frequent updates to DNS records. Though Time-To-Live (TTL) is the most widely used method of controlling cache consistency, it does not offer the fine-grained control necessary for handling these frequent changes. In addition, TTLs are too static to handle sudden changes in traffic caused by Internet failures or social media trends, demonstrating their inflexibility in the face of unforeseen events. To address these problems, we first propose a metric called Expected Aggregate Inconsistency (EAI), which allows us to consider important factors such as a record's update frequency and popularity when quantitatively measuring inconsistency. We then design ECO-DNS, a lightweight system that leverages the information provided by EAI to optimize a record's TTL. This value can be tuned to individual cache servers' preferences between better consistency and bandwidth overhead. 
Further-more, our optimization model's flexibility allows us to easily adapt ECO-DNS to handle various caching hierarchies such as multi-level caching while considering the trade off among consistency, overhead, latency, and server load.","PeriodicalId":129182,"journal":{"name":"2015 IEEE 35th International Conference on Distributed Computing Systems","volume":"138 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126332137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yiqing Hu, Yan Xiong, Wenchao Huang, Xiangyang Li, Yanan Zhang, Xufei Mao, Panlong Yang, Caimei Wang
In this paper, we propose a novel indoor localization scheme, Lightitude, by exploiting ubiquitous visible lights, which are necessarily and densely deployed in almost all indoor environments. Different from existing positioning systems that exploit special LEDs, ubiquitous visible lights lack fingerprints that can uniquely identify the light source, which results in an ambiguity problem that an RLS may correspond to multiple candidate positions. Moreover, received light strength (RLS) is not only determined by device's position, but also seriously affected by its orientation, which causes great complexity in site-survey. To address these challenges, we first propose and validate a realistic light strength model to avoid the expensive site-survey, then harness user's mobility to generate spatial-related RLS to tackle single RLS's position-ambiguity problem. Experiment results show that Lightitude achieves mean accuracy 1.93m and 2.24m in office (720m2) and library scenario (960m2) respectively.
{"title":"Lightitude: Indoor Positioning Using Ubiquitous Visible Lights and COTS Devices","authors":"Yiqing Hu, Yan Xiong, Wenchao Huang, Xiangyang Li, Yanan Zhang, Xufei Mao, Panlong Yang, Caimei Wang","doi":"10.1109/ICDCS.2015.82","DOIUrl":"https://doi.org/10.1109/ICDCS.2015.82","url":null,"abstract":"In this paper, we propose a novel indoor localization scheme, Lightitude, by exploiting ubiquitous visible lights, which are necessarily and densely deployed in almost all indoor environments. Different from existing positioning systems that exploit special LEDs, ubiquitous visible lights lack fingerprints that can uniquely identify the light source, which results in an ambiguity problem that an RLS may correspond to multiple candidate positions. Moreover, received light strength (RLS) is not only determined by device's position, but also seriously affected by its orientation, which causes great complexity in site-survey. To address these challenges, we first propose and validate a realistic light strength model to avoid the expensive site-survey, then harness user's mobility to generate spatial-related RLS to tackle single RLS's position-ambiguity problem. Experiment results show that Lightitude achieves mean accuracy 1.93m and 2.24m in office (720m2) and library scenario (960m2) respectively.","PeriodicalId":129182,"journal":{"name":"2015 IEEE 35th International Conference on Distributed Computing Systems","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115803743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
L. Gąsieniec, T. Jurdzinski, R. Martin, Grzegorz Stachowiak
We study a distributed coordination mechanism for uniform agents located on a circle. The agents perform their actions in synchronised rounds. At the beginning of each round an agent chooses the direction of its movement from clockwise, anticlockwise, or idle, and moves at unit speed during this round. Agents are not allowed to overpass, i.e., When an agent collides with another it instantly starts moving with the same speed in the opposite direction (without exchanging any information with the other agent). However, at the end of each round each agent has access to limited information regarding its trajectory of movement during this round. We assume that n mobile agents are initially located on a circle unit circumference at arbitrary but distinct positions unknown to other agents. The agents are equipped with unique identifiers from a fixed range. The location discovery task to be performed by each agent is to determine the initial position of every other agent. Our main result states that, if the only available information about movement in a round is limited to distance between the initial and the final position, then there is a superlinear lower bound on time needed to solve the location discovery problem. Interestingly, this result corresponds to a combinatorial symmetry breaking problem, which might be of independent interest. If, on the other hand, an agent has access to the distance to its first collision with another agent in a round, we design an asymptotically efficient and close to optimal solution for the location discovery problem.
{"title":"Deterministic Symmetry Breaking in Ring Networks","authors":"L. Gąsieniec, T. Jurdzinski, R. Martin, Grzegorz Stachowiak","doi":"10.1109/ICDCS.2015.59","DOIUrl":"https://doi.org/10.1109/ICDCS.2015.59","url":null,"abstract":"We study a distributed coordination mechanism for uniform agents located on a circle. The agents perform their actions in synchronised rounds. At the beginning of each round an agent chooses the direction of its movement from clockwise, anticlockwise, or idle, and moves at unit speed during this round. Agents are not allowed to overpass, i.e., When an agent collides with another it instantly starts moving with the same speed in the opposite direction (without exchanging any information with the other agent). However, at the end of each round each agent has access to limited information regarding its trajectory of movement during this round. We assume that n mobile agents are initially located on a circle unit circumference at arbitrary but distinct positions unknown to other agents. The agents are equipped with unique identifiers from a fixed range. The location discovery task to be performed by each agent is to determine the initial position of every other agent. Our main result states that, if the only available information about movement in a round is limited to distance between the initial and the final position, then there is a superlinear lower bound on time needed to solve the location discovery problem. Interestingly, this result corresponds to a combinatorial symmetry breaking problem, which might be of independent interest. 
If, on the other hand, an agent has access to the distance to its first collision with another agent in a round, we design an asymptotically efficient and close to optimal solution for the location discovery problem.","PeriodicalId":129182,"journal":{"name":"2015 IEEE 35th International Conference on Distributed Computing Systems","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126176218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lin Wang, K. Zheng, Baohua Yang, Yi Sun, Yue Zhang, S. Uhlig
The advent of software defined networking enables flexible, reliable and feature-rich control planes for data center networks. However, the tight coupling of centralized control and complete visibility leads to a wide range of issues among which scalability has risen to prominence. We observe that data center traffic is usually highly skewed and thus edge switches can be grouped according to traffic locality. As a result, the workload of the central controller could be highly reduced if we carry out distributed control inside those groups. Based on the above observation, we present LazyCtrl, a novel hybrid control plane design for data center networks. LazyCtrl aims at bringing laziness to the central controller by dynamically devolving most of the control tasks to independent switch groups to process frequent intra-group events using distributed control mechanisms, while handling rare inter-group or other specified events by the controller. We implement LazyCtrl and build a prototype based on Open vSwich and Floodlight. Trace-driven experiments on our prototype show that an effective switch grouping is easy to maintain in multi-tenant clouds and the central controller can be significantly shielded by staying lazy, with its workload reduced by up to 82%.
{"title":"Lazy Ctrl: Scalable Network Control for Cloud Data Centers","authors":"Lin Wang, K. Zheng, Baohua Yang, Yi Sun, Yue Zhang, S. Uhlig","doi":"10.1109/ICDCS.2015.110","DOIUrl":"https://doi.org/10.1109/ICDCS.2015.110","url":null,"abstract":"The advent of software defined networking enables flexible, reliable and feature-rich control planes for data center networks. However, the tight coupling of centralized control and complete visibility leads to a wide range of issues among which scalability has risen to prominence. We observe that data center traffic is usually highly skewed and thus edge switches can be grouped according to traffic locality. As a result, the workload of the central controller could be highly reduced if we carry out distributed control inside those groups. Based on the above observation, we present LazyCtrl, a novel hybrid control plane design for data center networks. LazyCtrl aims at bringing laziness to the central controller by dynamically devolving most of the control tasks to independent switch groups to process frequent intra-group events using distributed control mechanisms, while handling rare inter-group or other specified events by the controller. We implement LazyCtrl and build a prototype based on Open vSwich and Floodlight. 
Trace-driven experiments on our prototype show that an effective switch grouping is easy to maintain in multi-tenant clouds and the central controller can be significantly shielded by staying lazy, with its workload reduced by up to 82%.","PeriodicalId":129182,"journal":{"name":"2015 IEEE 35th International Conference on Distributed Computing Systems","volume":"145 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123192603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
T. Fu, Jianbing Ding, Richard T. B. Ma, M. Winslett, Y. Yang, Zhenjie Zhang
In a data stream management system (DSMS), users register continuous queries, and receive result updates as data arrive and expire. We focus on applications with real-time constraints, in which the user must receive each result update within a given period after the update occurs. To handle fast data, the DSMS is commonly placed on top of a cloud infrastructure. Because stream properties such as arrival rates can fluctuate unpredictably, cloud resources must be dynamically provisioned and scheduled accordingly to ensure real-time response. It is essential, for the existing systems or future developments, to possess the ability of scheduling resources dynamically according to the current workload, in order to avoid wasting resources, or failing in delivering correct results on time. Motivated by this, we propose DRS, a novel dynamic resource scheduler for cloud-based DSMSs. DRS overcomes three fundamental challenges: (a) how to model the relationship between the provisioned resources and query response time (b) where to best place resources, and (c) how to measure system load with minimal overhead. In particular, DRS includes an accurate performance model based on the theory of Jackson open queueing networks and is capable of handling arbitrary operator topologies, possibly with loops, splits and joins. Extensive experiments with real data confirm that DRS achieves real-time response with close to optimal resource consumption.
{"title":"DRS: Dynamic Resource Scheduling for Real-Time Analytics over Fast Streams","authors":"T. Fu, Jianbing Ding, Richard T. B. Ma, M. Winslett, Y. Yang, Zhenjie Zhang","doi":"10.1109/ICDCS.2015.49","DOIUrl":"https://doi.org/10.1109/ICDCS.2015.49","url":null,"abstract":"In a data stream management system (DSMS), users register continuous queries, and receive result updates as data arrive and expire. We focus on applications with real-time constraints, in which the user must receive each result update within a given period after the update occurs. To handle fast data, the DSMS is commonly placed on top of a cloud infrastructure. Because stream properties such as arrival rates can fluctuate unpredictably, cloud resources must be dynamically provisioned and scheduled accordingly to ensure real-time response. It is essential, for the existing systems or future developments, to possess the ability of scheduling resources dynamically according to the current workload, in order to avoid wasting resources, or failing in delivering correct results on time. Motivated by this, we propose DRS, a novel dynamic resource scheduler for cloud-based DSMSs. DRS overcomes three fundamental challenges: (a) how to model the relationship between the provisioned resources and query response time (b) where to best place resources, and (c) how to measure system load with minimal overhead. In particular, DRS includes an accurate performance model based on the theory of Jackson open queueing networks and is capable of handling arbitrary operator topologies, possibly with loops, splits and joins. 
Extensive experiments with real data confirm that DRS achieves real-time response with close to optimal resource consumption.","PeriodicalId":129182,"journal":{"name":"2015 IEEE 35th International Conference on Distributed Computing Systems","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133810433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jihun Hamm, Adam C. Champion, Guoxing Chen, M. Belkin, D. Xuan
Smart devices with built-in sensors, computational capabilities, and network connectivity have become increasingly pervasive. Crowds of smart devices offer opportunities to collectively sense and perform computing tasks at an unprecedented scale. This paper presents Crowd-ML, a privacy-preserving machine learning framework for a crowd of smart devices, which can solve a wide range of learning problems for crowd sensing data with differential privacy guarantees. Crowd-ML endows a crowd sensing system with the ability to learn classifiers or predictors online from crowd sensing data privately with minimal computational overhead on devices and servers, suitable for practical large-scale use of the framework. We analyze the performance and scalability of Crowd-ML and implement the system with off-the-shelf smartphones as a proof of concept. We demonstrate the advantages of Crowd-ML with real and simulated experiments under various conditions.
{"title":"Crowd-ML: A Privacy-Preserving Learning Framework for a Crowd of Smart Devices","authors":"Jihun Hamm, Adam C. Champion, Guoxing Chen, M. Belkin, D. Xuan","doi":"10.1109/ICDCS.2015.10","DOIUrl":"https://doi.org/10.1109/ICDCS.2015.10","url":null,"abstract":"Smart devices with built-in sensors, computational capabilities, and network connectivity have become increasingly pervasive. Crowds of smart devices offer opportunities to collectively sense and perform computing tasks at an unprecedented scale. This paper presents Crowd-ML, a privacy-preserving machine learning framework for a crowd of smart devices, which can solve a wide range of learning problems for crowd sensing data with differential privacy guarantees. Crowd-ML endows a crowd sensing system with the ability to learn classifiers or predictors online from crowd sensing data privately with minimal computational overhead on devices and servers, suitable for practical large-scale use of the framework. We analyze the performance and scalability of Crowd-ML and implement the system with off-the-shelf smartphones as a proof of concept. We demonstrate the advantages of Crowd-ML with real and simulated experiments under various conditions.","PeriodicalId":129182,"journal":{"name":"2015 IEEE 35th International Conference on Distributed Computing Systems","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123773940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Job scheduling for a MapReduce cluster has been an active research topic in recent years. However, measurement traces from real-world production environment show that the duration of tasks within a job vary widely. The overall elapsed time of a job, i.e. The so-called flow time, is often dictated by one or few slowly-running tasks within a job, generally referred as the "stragglers". The cause of stragglers include tasks running on partially/intermittently failing machines or the existence of some localized resource bottleneck(s) within a MapReduce cluster. To tackle this online job scheduling challenge, we adopt the task cloning approach and design the corresponding scheduling algorithms which aim at minimizing the weighted sum of job flow times in a MapReduce cluster based on the Shortest Remaining Processing Time scheduler (SRPT). To be more specific, we first design a 2-competitive offline algorithm when the variance of task-duration is negligible. We then extend this offline algorithm to yield the so-called SRPTMS+C algorithm for the online case and show that SRPTMS+C is (1 + ϵ) - speed o (1/ϵ2) - competitive in reducing the weighted sum of job flow times within a cluster. Both of the algorithms explicitly consider the precedence constraints between the two phases within the MapReduce framework. We also demonstrate via trace-driven simulations that SRPTMS+C can significantly reduce the weighted/unweighted sum of job flow times by cutting down the elapsed time of small jobs substantially. In particular, SRPTMS+C beats the Microsoft Mantri scheme by nearly 25% according to this metric.
"Task-Cloning Algorithms in a MapReduce Cluster with Competitive Performance Bounds" by Huanle Xu and W. Lau. 2015 IEEE 35th International Conference on Distributed Computing Systems. DOI: 10.1109/ICDCS.2015.42. Published 2015-01-10.
Hoda Akbari, P. Berenbrink, Robert Elsässer, Dominik Kaaser
In this paper we consider a wide class of discrete diffusion load balancing algorithms. The problem is defined as follows. We are given an interconnection network and a number of load items that are arbitrarily distributed among the nodes of the network. The goal is to redistribute the load in iterative discrete steps so that at the end each node holds (almost) the same number of items. In diffusion load balancing, nodes are only allowed to balance their load with their direct neighbors. We show three main results. First, we present a general framework for randomly rounding the flow generated by continuous diffusion schemes over the edges of a graph in order to obtain corresponding discrete schemes. Compared to the results of Rabani, Sinclair, and Wanka (FOCS'98), which are only valid for the class of homogeneous first-order schemes, our framework can be used to analyze a larger class of diffusion algorithms, such as algorithms for heterogeneous networks and second-order schemes. Second, we bound the deviation between randomized second-order schemes and their continuous counterparts. Finally, we give a bound on the minimum initial load in a network that suffices to prevent negative load at any node during the execution of second-order diffusion schemes. Our theoretical results are complemented by extensive simulations on different graph classes. We show empirically that second-order schemes, which are usually much faster than first-order schemes, will not balance the load completely on a number of networks within reasonable time. However, the maximum load difference at the end appears to be bounded by a constant, which can be decreased further by applying a first-order scheme once the second-order scheme has reached this value.
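A single first-order diffusion step with the randomized flow-rounding idea can be sketched as follows. This is our illustrative code under simplifying assumptions (homogeneous network, one step, hypothetical diffusion parameter `alpha`), not the paper's framework.

```python
import math
import random

def diffusion_step(load, adj, alpha=0.25, rng=random):
    """One first-order diffusion step with randomized flow rounding.

    load: dict node -> integer item count; adj: dict node -> neighbor
    list (undirected, so v in adj[u] iff u in adj[v]). The continuous
    scheme sends alpha * (load[u] - load[v]) items over each edge
    (u, v); here that flow is rounded up or down at random so only
    whole items move, with expectation equal to the continuous flow.
    """
    flow = {}
    for u in adj:
        for v in adj[u]:
            if u < v:                 # handle each undirected edge once
                f = alpha * (load[u] - load[v])
                base = math.floor(f)
                flow[(u, v)] = base + (1 if rng.random() < f - base else 0)
    new = dict(load)
    for (u, v), items in flow.items():  # apply the rounded flows
        new[u] -= items
        new[v] += items
    return new
```

Iterating this step conserves the total load but moves only whole items, which is what makes the scheme discrete; the second-order variant studied in the paper additionally reuses the previous step's flow, which is what can drive a node's load negative without the paper's minimum-initial-load bound.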
"Discrete Load Balancing in Heterogeneous Networks with a Focus on Second-Order Diffusion" by Hoda Akbari, P. Berenbrink, Robert Elsässer, and Dominik Kaaser. 2015 IEEE 35th International Conference on Distributed Computing Systems. DOI: 10.1109/ICDCS.2015.57. Published 2014-12-22.