Poster: Approximate Caching for Mobile Image Recognition
Pub Date: 2021-07-01 | DOI: 10.1109/ICDCS51616.2021.00125
James Mariani, Yongqi Han, Li Xiao
Many emerging mobile applications rely heavily upon image recognition of both static images and live video streams. Image recognition is commonly achieved using deep neural networks (DNNs) which can achieve high accuracy but also incur significant computation latency and energy consumption on resource-constrained smartphones. We introduce an in-memory caching paradigm that supports infrastructure-less collaborative computation reuse in smartphone image recognition. We propose using the inertial movement of smartphones, the locality inherent in video streams, as well as information from nearby, peer-to-peer devices to maximize the computation reuse opportunities in mobile image recognition. Experimental results show that our system lowers the average latency of standard mobile neural network image recognition applications by up to 94% with minimal loss of recognition accuracy.
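The paper does not publish an API, but the core reuse idea can be pictured with a minimal sketch: cache recognition results keyed by a compact image feature vector and return a cached label when a new frame's features fall within a similarity threshold, falling back to the DNN on a miss. The class name, `feature` representation, and threshold value below are illustrative assumptions, not the authors' implementation.

```python
# Minimal illustrative sketch of approximate result caching for image
# recognition (assumed interface; not the authors' implementation).
import numpy as np

class ApproximateCache:
    def __init__(self, threshold=0.92):
        self.threshold = threshold      # cosine-similarity reuse threshold (assumed value)
        self.entries = []               # list of (feature_vector, cached_label)

    def lookup(self, feat):
        """Return a cached label if a stored frame is similar enough, else None."""
        for stored, label in self.entries:
            sim = float(np.dot(stored, feat) /
                        (np.linalg.norm(stored) * np.linalg.norm(feat) + 1e-9))
            if sim >= self.threshold:
                return label            # computation reuse: skip the DNN
        return None

    def insert(self, feat, label):
        self.entries.append((feat, label))

def recognize(frame_feat, cache, run_dnn):
    """Consult the cache first; fall back to the full DNN on a miss."""
    label = cache.lookup(frame_feat)
    if label is None:
        label = run_dnn(frame_feat)     # expensive on-device inference
        cache.insert(frame_feat, label)
    return label
```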
{"title":"Poster: Approximate Caching for Mobile Image Recognition","authors":"James Mariani, Yongqi Han, Li Xiao","doi":"10.1109/ICDCS51616.2021.00125","DOIUrl":"https://doi.org/10.1109/ICDCS51616.2021.00125","url":null,"abstract":"Many emerging mobile applications rely heavily upon image recognition of both static images and live video streams. Image recognition is commonly achieved using deep neural networks (DNNs) which can achieve high accuracy but also incur significant computation latency and energy consumption on resource-constrained smartphones. We introduce an in-memory caching paradigm that supports infrastructure-less collaborative computation reuse in smartphone image recognition. We propose using the inertial movement of smartphones, the locality inherent in video streams, as well as information from nearby, peer-to-peer devices to maximize the computation reuse opportunities in mobile image recognition. Experimental results show that our system lowers the average latency of standard mobile neural network image recognition applications by up to 94% with minimal loss of recognition accuracy.","PeriodicalId":222376,"journal":{"name":"2021 IEEE 41st International Conference on Distributed Computing Systems (ICDCS)","volume":"5 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132365633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PupilMeter: Modeling User Preference with Time-Series Features of Pupillary Response
Pub Date: 2021-07-01 | DOI: 10.1109/ICDCS51616.2021.00102
Hongbo Jiang, Xiangyu Shen, Daibo Liu
Modeling user preferences is a challenging problem in the wide application of recommendation services. Existing methods mainly exploit activities that are only loosely related to users' inner feelings to build preference models, which can raise model uncertainty and introduce prediction error. In this paper, we present PupilMeter, the first system to explore the correlation between user preference and the instant pupillary response. Specifically, we conduct extensive experiments to investigate the generic physiological process of the pupillary response while users view specific content on smart devices, and identify six key time-series features relevant to users' preference degree using a Random Forest. However, the diversity of pupillary responses caused by inherent individual differences poses significant challenges to the generality of the learned model. To solve this problem, we use a Multilayer Perceptron to automatically train and adjust the importance of the key features for each individual and then generate a personalized user preference model associated with the user's pupillary response. We have prototyped PupilMeter and conducted both controlled experiments and in-the-wild studies with 30 recruited volunteers to comprehensively evaluate its effectiveness. Experimental results demonstrate that PupilMeter can accurately identify users' preferences.
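As a rough sketch of the two-stage pipeline described above (feature ranking with a Random Forest, then a per-user Multilayer Perceptron), the following uses scikit-learn; the data shapes, random placeholder labels, and hyperparameters are assumptions for illustration only, not the paper's configuration.

```python
# Illustrative two-stage sketch (assumed data and hyperparameters):
# 1) rank candidate time-series pupil features with a Random Forest,
# 2) fit a per-user MLP on the top-ranked features.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))      # 200 viewing sessions, 12 candidate features (placeholder)
y = rng.uniform(0, 1, size=200)     # preference degree in [0, 1] (placeholder labels)

# Stage 1: feature relevance via Random Forest importances.
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
top6 = np.argsort(rf.feature_importances_)[::-1][:6]   # six key features, as in the paper

# Stage 2: personalized model on the selected features for one user.
mlp = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0)
mlp.fit(X[:, top6], y)
print("selected feature indices:", top6)
```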
{"title":"PupilMeter: Modeling User Preference with Time-Series Features of Pupillary Response","authors":"Hongbo Jiang, Xiangyu Shen, Daibo Liu","doi":"10.1109/ICDCS51616.2021.00102","DOIUrl":"https://doi.org/10.1109/ICDCS51616.2021.00102","url":null,"abstract":"Modeling user preferences is a challenging problem in the wide application of recommendation services. Existing methods mainly exploit multiple activities irrelevant to user's inner feeling to build user preference model, which may raise model uncertainty and bring about prediction error. In this paper, we present PupilMeter - the first system that moves one step forward towards exploring the correlation between user preference and the instant pupillary response. Specifically, we conduct extensive experiments to dig into the generic physiological process of pupillary response while viewing specific content on smart devices, and further figure out six key time-series features relevant to users' preference degree by using Random Forest. However, the diversity of pupillary responses caused by inherent individual difference poses significant challenges to the generality of learned model. To solve this problem, we use Multilayer Perceptron to automatically train and adjust the importance of key features for each individual and then generate a personalized user preference model associated with user's pupillary response. We have prototyped PupilMeter and conducted both test experiments and in-the-wild studies to comprehensively evaluate the effectiveness of PupilMeter by recruiting 30 volunteers. Experimental results demonstrate that PupilMeter can accurately identify users' preference.","PeriodicalId":222376,"journal":{"name":"2021 IEEE 41st International Conference on Distributed Computing Systems (ICDCS)","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127861257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Demo: Software-defined Virtual Networking Across Multiple Edge and Cloud Providers with EdgeVPN.io
Pub Date: 2021-07-01 | DOI: 10.1109/ICDCS51616.2021.00107
R. Figueiredo, Kensworth C. Subratie
This demonstration will showcase EdgeVPN.io, an open-source software-defined virtual private network (VPN) that enables the creation of scalable layer-2 virtual networks across multiple providers - including scenarios where devices are behind Network Address Translation (NAT) and firewall middleboxes. Its architecture combines a distributed software-defined networking (SDN) control plane and a scalable structured peer-to-peer overlay of Internet tunnels that form its datapath. EdgeVPN.io provides a foundation for the deployment of virtual networks that enable research and development in distributed computing. The demonstration will include a brief overview of the architecture, and will show step-by-step how a researcher can deploy EdgeVPN.io networks on devices including Raspberry Pis, Jetson Nanos, and VMs/Docker containers in the cloud. Attendees will be provided with trial resources to allow them to follow the demonstration hands-on if they so desire.
{"title":"Demo: Software-defined Virtual Networking Across Multiple Edge and Cloud Providers with EdgeVPN.io","authors":"R. Figueiredo, Kensworth C. Subratie","doi":"10.1109/ICDCS51616.2021.00107","DOIUrl":"https://doi.org/10.1109/ICDCS51616.2021.00107","url":null,"abstract":"This demonstration will showcase EdgeVPN.io, an open-source software-defined virtual private network (VPN) that enables the creation of scalable layer-2 virtual networks across multiple providers - including scenarios where devices are behind Network Address Translation (NAT) and firewall middleboxes. Its architecture combines a distributed software-defined networking (SDN) control plane and a scalable structured peer-to-peer overlay of Internet tunnels that form its datapath. EdgeVPN.io provides a foundation for the deployment of virtual networks that enable research and development in distributed computing. The demonstration will include a brief overview of the architecture, and will show step-by-step how a researcher can deploy EdgeVPN.io networks on devices including Raspberry Pis, Jetson Nanos, and VMs/Docker containers in the cloud. Attendees will be provided with trial resources to allow them to follow the demonstration hands-on if they so desire.","PeriodicalId":222376,"journal":{"name":"2021 IEEE 41st International Conference on Distributed Computing Systems (ICDCS)","volume":"10 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129195917","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Statistical Tail-Latency Bounded QoS Provisioning for Parallel and Distributed Data Centers
Pub Date: 2021-07-01 | DOI: 10.1109/ICDCS51616.2021.00078
Xi Zhang, Qixuan Zhu
Large-scale interactive services distribute clients' requests across a large number of physical machines in data center architectures to enhance quality-of-service (QoS) performance. In parallel and distributed data center architectures, even a temporary spike in the latency of any service component can significantly impact the end-to-end delay. Besides the average latency, the tail latency (i.e., worst-case latency) of a service has also attracted considerable research attention. Tail latency is a critical performance metric in data centers, where long tail latencies refer to the higher percentiles of latency (such as the 98th or 99th) in comparison to the average. While the statistical delay-bounded QoS provisioning theory has been shown to be a powerful technique and useful performance metric for supporting time-sensitive multimedia transmissions over mobile computing networks, how to efficiently extend and implement this technique and performance metric to statistically bound the tail latency in data center networks has neither been well understood nor thoroughly studied. In this paper, we model and characterize the tail-latency distribution in a three-layer parallel and distributed data center architecture, where clients request different types of services and then download the requested data packets from the data center through a first-come-first-served M/M/1 queueing system. We first define statistical tail-latency bounded QoS, and investigate the tail-latency problem through generalized extreme value (GEV) theory and generalized Pareto distribution (GPD) theory. Then, we propose a scheme to identify the dominant sources of latency variance in a semantic context, so that we can optimize the instructions of those sources to reduce the latency tail. Finally, using numerical analyses, we validate and evaluate the developed modeling techniques and schemes for characterizing tail-latency QoS provisioning in data center networks.
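One concrete way to ground the GPD-based tail characterization is the standard peaks-over-threshold fit: simulate M/M/1 waiting times with the Lindley recursion, fit a generalized Pareto distribution to exceedances over a high threshold, and read off an extreme percentile. The arrival rate, service rate, and threshold below are assumptions chosen for illustration, not the paper's parameters or its exact model.

```python
# Peaks-over-threshold sketch: GPD fit to the tail of simulated M/M/1 waits
# (illustrative rates and threshold; not the paper's configuration).
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(1)
lam, mu, n = 0.8, 1.0, 200_000            # arrival rate, service rate, number of requests
inter = rng.exponential(1 / lam, n)
serv = rng.exponential(1 / mu, n)

# Lindley recursion for the waiting time in a first-come-first-served M/M/1 queue.
w = np.zeros(n)
for i in range(1, n):
    w[i] = max(0.0, w[i - 1] + serv[i - 1] - inter[i])

u = np.quantile(w, 0.95)                  # high threshold for exceedances
exc = w[w > u] - u
xi, _, sigma = genpareto.fit(exc, floc=0.0)

# Tail-latency estimate, e.g. the 99.9th percentile via the fitted GPD tail.
p, p_u = 0.999, np.mean(w > u)
q = u + genpareto.ppf(1 - (1 - p) / p_u, xi, loc=0.0, scale=sigma)
print(f"empirical p99.9 = {np.quantile(w, p):.2f}, GPD estimate = {q:.2f}")
```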
{"title":"Statistical Tail-Latency Bounded QoS Provisioning for Parallel and Distributed Data Centers","authors":"Xi Zhang, Qixuan Zhu","doi":"10.1109/ICDCS51616.2021.00078","DOIUrl":"https://doi.org/10.1109/ICDCS51616.2021.00078","url":null,"abstract":"The large-scale interactive services distribute clients' requests across a large number of physical machine in data center architectures to enhance the quality-of-service (QoS) performance. In parallel and distributed data center architecture, even a temporary spike in latency of any service component can significantly impact the end-to-end delay. Besides the average latency, tail-latency (i.e., worst case latency) of a service has also attracted a lot of research attentions. The tail-latency is a critical performance metric in data centers, where long tail latencies refer to the higher percentiles (such as 98th, 99th) of latency in comparison to the average latency time. While the statistical delay-bounded QoS provisioning theory has been shown to be a powerful technique and useful performance metric for supporting time-sensitive multimedia transmissions over mobile computing networks, how to efficiently extend and implement this technique/performance-metric for statistically bounding the tail-latency for data center networks has neither been well understood nor thoroughly studied. In this paper, we model and characterize the tail-latency distribution in a three-layer parallel and distributed data center architecture, where clients request different types of services and ten download their requested data packets from data center through a first-come-first-serve M/M/1 queueing system. We first define the statistical tail-latency bounded QoS, and investigate the tail-latency problem through generalized extreme value (GEV) theory and generalized Pareto distribution (GPD) theory. Then, we propose a scheme to identify the dominant sources of latency variance in a semantic context, so that we are able to optimize the instructions of those sources to reduce the latency tail. Finally, using numerical analyses we validate and evaluate our developed modeling techniques and schemes for characterizing the tail-latency QoS provisioning theories in supporting data center networks.","PeriodicalId":222376,"journal":{"name":"2021 IEEE 41st International Conference on Distributed Computing Systems (ICDCS)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116918338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Poster: Multi-agent Combinatorial Bandits with Moving Arms
Pub Date: 2021-07-01 | DOI: 10.1109/ICDCS51616.2021.00126
Zhiming Huang, Bingshan Hu, Jianping Pan
In this paper, we study a distributed stochastic multi-armed bandit problem that can address many real-world problems such as task assignment across multiple crowdsourcing platforms, traffic scheduling in wireless networks with multiple access points, and caching at the cellular network edge. We propose an efficient algorithm called multi-agent combinatorial upper confidence bound (MACUCB) with provable performance guarantees and low communication overhead. Furthermore, we perform extensive experiments to show the effectiveness of the proposed algorithm.
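The poster gives no pseudocode, so the following is only a generic combinatorial UCB sketch of the kind of index policy MACUCB builds on: in each round the agent plays the k arms with the largest upper confidence bounds and updates their empirical means from the observed rewards. All names, reward distributions, and parameters are illustrative, and the multi-agent communication aspect is omitted.

```python
# Generic combinatorial UCB sketch (illustrative; not the MACUCB algorithm itself).
import math
import random

def combinatorial_ucb(true_means, k, horizon, seed=0):
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, horizon + 1):
        # UCB index: empirical mean + exploration bonus (unplayed arms first).
        ucb = [means[a] + math.sqrt(2 * math.log(t) / counts[a]) if counts[a] > 0
               else float("inf") for a in range(n_arms)]
        chosen = sorted(range(n_arms), key=lambda a: ucb[a], reverse=True)[:k]
        for a in chosen:                  # play the super-arm, observe Bernoulli rewards
            r = 1.0 if rng.random() < true_means[a] else 0.0
            counts[a] += 1
            means[a] += (r - means[a]) / counts[a]
    return counts

print(combinatorial_ucb([0.2, 0.5, 0.8, 0.4], k=2, horizon=5000))
```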
{"title":"Poster: Multi-agent Combinatorial Bandits with Moving Arms","authors":"Zhiming Huang, Bingshan Hu, Jianping Pan","doi":"10.1109/ICDCS51616.2021.00126","DOIUrl":"https://doi.org/10.1109/ICDCS51616.2021.00126","url":null,"abstract":"In this paper, we study a distributed stochastic multi-armed bandit problem that can address many real-world problems such as task assignment for multiple crowdsourcing platforms, traffic scheduling in wireless networks with multiple access points and caching at cellular network edge. We propose an efficient algorithm called multi-agent combinatorial upper confidence bound (MACUCB) with provable performance guarantees and low communication overhead. Furthermore, we perform extensive experiments to show the effectiveness of the proposed algorithm.","PeriodicalId":222376,"journal":{"name":"2021 IEEE 41st International Conference on Distributed Computing Systems (ICDCS)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127403598","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
FASTBLOCK: Accelerating Blockchains via Hardware Transactional Memory
Pub Date: 2021-07-01 | DOI: 10.1109/ICDCS51616.2021.00032
Yue Li, Han Liu, Yuanliang Chen, Jianbo Gao, Zhenhao Wu, Zhi Guan, Zhong Chen
The efficiency of the block lifecycle determines the performance of a blockchain, and it is critically affected by the execution, mining, and validation steps of that lifecycle. To accelerate blockchains, most prior work focuses on optimizing the mining step while ignoring the other steps. In this paper, we propose a novel blockchain framework, FastBlock, to speed up the execution and validation steps by introducing efficient concurrency. To prevent potential concurrency violations, FastBlock uses symbolic execution to identify minimal atomic sections in each transaction and guarantees the atomicity of these sections during the execution step via an efficient concurrency control mechanism, hardware transactional memory (HTM). To enable a deterministic validation step, FastBlock concurrently re-executes transactions based on a happen-before graph without increasing the block size. Finally, we implement FastBlock and evaluate it in terms of conflicting-transaction rate, number of transactions per block, and varying thread counts. Our results indicate that FastBlock is efficient: with eight concurrent threads, the execution and validation steps achieve average speedups of 3.0x and 2.3x, respectively, over the original serial model.
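A minimal way to picture the deterministic re-execution step is to schedule transactions topologically over a happen-before graph, so a transaction only runs once all of its predecessors have finished; Python's graphlib expresses exactly that ordering. The graph and transaction names below are placeholders, and the real system would execute each ready batch concurrently under HTM rather than print it.

```python
# Sketch of deterministic replay along a happen-before graph (placeholder
# transactions; the real validation step executes ready batches concurrently).
from graphlib import TopologicalSorter

# happen_before[t] = set of transactions that must commit before t re-executes.
happen_before = {"tx1": set(), "tx2": {"tx1"}, "tx3": {"tx1"}, "tx4": {"tx2", "tx3"}}

ts = TopologicalSorter(happen_before)
ts.prepare()
while ts.is_active():
    ready = ts.get_ready()      # these transactions have no pending predecessors
    for tx in ready:            # they could safely be re-executed in parallel
        print("re-executing", tx)
        ts.done(tx)
```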
{"title":"FASTBLOCK: Accelerating Blockchains via Hardware Transactional Memory","authors":"Yue Li, Han Liu, Yuanliang Chen, Jianbo Gao, Zhenhao Wu, Zhi Guan, Zhong Chen","doi":"10.1109/ICDCS51616.2021.00032","DOIUrl":"https://doi.org/10.1109/ICDCS51616.2021.00032","url":null,"abstract":"The efficiency of block lifecycle determines the performance of blockchain, which is critically affected by the execution, mining and validation steps in blockchain lifecycle. To accelerate blockchains, many works focus on optimizing the mining step while ignoring other steps. In this paper, we propose a novel blockchain framework-FastBlock to speed up the execution and validation steps by introducing efficient concurrency. To efficiently prevent the potential concurrency violations, FastBlock utilizes symbolic execution to identify minimal atomic sections in each transaction and guarantees the atomicity of these sections in execution step via an efficient concurrency control mechanism-hardware transactional memory (HTM). To enable a deterministic validation step, FastBlock concurrently re-executes transactions based on a happen-before graph without increasing block size. Finally, we implement FastBlock and evaluate it in terms of conflicting transactions rate, number of transactions per block, and varying thread number. Our results indicate that FastBlock is efficient: the execution step and validation step speed up to 3.0x and 2.3x on average over the original serial model respectively with eight concurrent threads.","PeriodicalId":222376,"journal":{"name":"2021 IEEE 41st International Conference on Distributed Computing Systems (ICDCS)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126139293","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BiCord: Bidirectional Coordination among Coexisting Wireless Devices
Pub Date: 2021-07-01 | DOI: 10.1109/ICDCS51616.2021.00037
Zihao Yu, Pengyu Li, C. Boano, Yuan He, Meng Jin, Xiuzhen Guo, Xiaolong Zheng
Cross-technology interference is a major threat to the dependability of low-power wireless communications. Due to power and bandwidth asymmetries, technologies such as Wi-Fi tend to dominate the RF channel and unintentionally destroy low-power wireless communications from resource-constrained technologies such as ZigBee, leading to severe coexistence issues. To address these issues, existing schemes make ZigBee nodes individually assess the RF channel's availability or let Wi-Fi appliances blindly reserve the medium for the transmissions of low-power devices. Without a two-way interaction between devices using different wireless technologies, these approaches apply only to limited scenarios or achieve inefficient network performance. This paper presents BiCord, a bidirectional coordination scheme in which resource-constrained wireless devices such as ZigBee nodes and powerful Wi-Fi appliances coordinate their activities to improve coexistence and enhance network performance. Specifically, in BiCord, ZigBee nodes directly request channel resources from Wi-Fi devices, which then reserve the channel for ZigBee transmissions on demand. This interaction continues until the transmission requirement of the ZigBee nodes is both fulfilled and understood by the Wi-Fi devices. In this way, BiCord avoids unnecessary channel allocations, maximizes the availability of the spectrum, and minimizes transmission delays. We evaluate BiCord on off-the-shelf Wi-Fi and ZigBee devices, demonstrating its effectiveness experimentally. Among other findings, our results show that BiCord increases channel utilization by up to 50.6% and reduces the average transmission delay of ZigBee nodes by 84.2% compared to state-of-the-art approaches.
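The abstract describes a request/reserve exchange rather than a concrete message format, so the toy model below only illustrates the two-way interaction: a ZigBee node asks for a specific amount of airtime and the Wi-Fi device grants a reservation sized exactly to that request. The message fields, units, and scheduling policy are assumptions, not BiCord's actual protocol.

```python
# Toy model of BiCord-style two-way coordination (assumed message fields).
from dataclasses import dataclass

@dataclass
class ChannelRequest:
    node_id: str
    airtime_ms: float        # how long the ZigBee node needs the channel

@dataclass
class ChannelGrant:
    node_id: str
    start_ms: float
    duration_ms: float

class WiFiCoordinator:
    """Grants back-to-back reservations sized exactly to each request."""
    def __init__(self):
        self.next_free_ms = 0.0

    def handle(self, req: ChannelRequest) -> ChannelGrant:
        grant = ChannelGrant(req.node_id, self.next_free_ms, req.airtime_ms)
        self.next_free_ms += req.airtime_ms     # reserve only what was asked for
        return grant

coord = WiFiCoordinator()
for req in [ChannelRequest("zigbee-1", 4.0), ChannelRequest("zigbee-2", 2.5)]:
    print(coord.handle(req))
```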
{"title":"BiCord: Bidirectional Coordination among Coexisting Wireless Devices","authors":"Zihao Yu, Pengyu Li, C. Boano, Yuan He, Meng Jin, Xiuzhen Guo, Xiaolong Zheng","doi":"10.1109/ICDCS51616.2021.00037","DOIUrl":"https://doi.org/10.1109/ICDCS51616.2021.00037","url":null,"abstract":"Cross-technology interference is a major threat to the dependability of low-power wireless communications. Due to power and bandwidth asymmetries, technologies such as Wi-Fi tend to dominate the RF channel and unintentionally destroy low-power wireless communications from resource-constrained technologies such as ZigBee, leading to severe coexistence issues. To address these issues, existing schemes make ZigBee nodes individually assess the RF channel's availability or let Wi-Fi appliances blindly reserve the medium for the transmissions of low-power devices. Without a two-way interaction between devices making use of different wireless technologies, these approaches have limited scenarios or achieve inefficient network performance. This paper presents BiCord, a bidirectional coordination scheme in which resource-constrained wireless devices such as ZigBee nodes and powerful Wi-Fi appliances coordinate their activities to increase coexistence and enhance network performance. Specifically, in BiCord, ZigBee nodes directly request channel resources from Wi-Fi devices, who then reserve the channel for ZigBee transmissions on-demand. This interaction continues until the transmission requirement of ZigBee nodes is both fulfilled and understood by Wi-Fi devices. This way, BiCord avoids unnecessary channel allocations, maximizes the availability of the spectrum, and minimizes transmission delays. We evaluate BiCord on off-the-shelf Wi-Fi and ZigBee devices, demonstrating its effectiveness experimentally. Among others, our results show that BiCord increases channel utilization by up to 50.6% and reduces the average transmission delay of ZigBee nodes by 84.2% compared to state-of-the-art approaches.","PeriodicalId":222376,"journal":{"name":"2021 IEEE 41st International Conference on Distributed Computing Systems (ICDCS)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128045945","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
INT-probe: Lightweight In-band Network-Wide Telemetry with Stationary Probes
Pub Date: 2021-07-01 | DOI: 10.1109/ICDCS51616.2021.00090
Tian Pan, Xingchen Lin, Haoyu Song, Enge Song, Zizheng Bian, Hao Li, Jiao Zhang, Fuliang Li, Tao Huang, Chenhao Jia, Bin Liu
Visibility is essential for operating and troubleshooting intricate networks. In-band Network Telemetry (INT) has been embedded in the latest merchant silicon to offer high-precision device- and traffic-state visibility. INT itself is an underlying technique, and each INT instance covers only one monitoring path; network-wide measurement coverage therefore requires higher-level orchestration to provision multiple INT paths. Optimal path planning is expected to produce a minimum number of paths with a minimum number of overlapping links. Eulerian trails have been used to solve the general problem. However, in production networks, the vantage points where one can deploy probes to start and terminate INT paths are constrained. In this work, we propose an optimal path planning algorithm, INT-probe, which achieves network-wide telemetry coverage under the constraint of stationary probes. INT-probe formulates the constrained path planning as an extended multi-depot k-Chinese postman problem (MDCPP-set) and then reduces it to a solvable minimum-weight perfect matching problem. We analyze the algorithm's theoretical bound and complexity. We conduct extensive evaluations on both wide-area networks and data center networks of different scales and topologies, and show that INT-probe is efficient, high-performance, and practical for real-world deployment. For a large-scale data center network with 1125 switches, INT-probe generates 112 monitoring paths (a 50.4% reduction) while allowing only a 1.79% increase in total path length, and promptly resolves link failures within 744.71 ms.
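The reduction details are not given here, but the classical ingredient this line of work builds on, pairing odd-degree vertices through a minimum-weight matching over shortest-path distances so the graph admits a small set of covering trails, can be sketched with networkx. The topology below is a placeholder, and this is the textbook Chinese postman step rather than the paper's MDCPP-set algorithm.

```python
# Textbook Chinese-postman step: pair odd-degree vertices by a minimum-weight
# matching over shortest-path distances (placeholder topology; not INT-probe itself).
import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from([("a", "b", 1), ("b", "c", 1), ("c", "d", 1),
                           ("d", "a", 1), ("a", "c", 2)])

odd = [v for v in G if G.degree(v) % 2 == 1]
dist = dict(nx.all_pairs_dijkstra_path_length(G, weight="weight"))

# Complete graph on odd-degree vertices, weighted by shortest-path distance.
K = nx.Graph()
for i, u in enumerate(odd):
    for v in odd[i + 1:]:
        K.add_edge(u, v, weight=dist[u][v])

matching = nx.min_weight_matching(K)      # which odd-degree vertices to pair up
print("odd-degree vertices:", odd)
print("min-weight pairing:", matching)
```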
{"title":"INT-probe: Lightweight In-band Network-Wide Telemetry with Stationary Probes","authors":"Tian Pan, Xingchen Lin, Haoyu Song, Enge Song, Zizheng Bian, Hao Li, Jiao Zhang, Fuliang Li, Tao Huang, Chenhao Jia, Bin Liu","doi":"10.1109/ICDCS51616.2021.00090","DOIUrl":"https://doi.org/10.1109/ICDCS51616.2021.00090","url":null,"abstract":"Visibility is essential for operating and troubleshooting intricate networks. In-band Network Telemetry (INT) has been embedded in the latest merchant silicons to offer high-precision device and traffic state visibility. INT is actually an underlying technique and each INT instance covers only one monitoring path. The network-wide measurement coverage therefore requires a high-level orchestration to provision multiple INT paths. An optimal path planning is expected to produce a minimum number of paths with a minimum number of overlapping links. Eulerian trail has been used to solve the general problem. However, in production networks, the vantage points where one can deploy probes to start and terminate INT paths are constrained. In this work, we propose an optimal path planning algorithm, INT-probe, which achieves the network-wide telemetry coverage under the constraint of stationary probes. INT-probe formulates the constrained path planning into an extended multi-depot k-Chinese postman problem (MDCPP-set) and then reduces it to a solvable minimum weight perfect matching problem. We analyze algorithm's theoretical bound and the complexity. Extensive evaluation on both wide area networks and data center networks with different scales and topologies are conducted. We show INT-probe is efficient, high-performance, and practical for real-world deployment. For a large-scale data center networks with 1125 switches, INT-probe can generate 112 monitoring paths (reduced by 50.4 %) by allowing only 1.79% increase of the total path length, promptly resolving link failures within 744.71ms.","PeriodicalId":222376,"journal":{"name":"2021 IEEE 41st International Conference on Distributed Computing Systems (ICDCS)","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126518909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GRACE: A Compressed Communication Framework for Distributed Machine Learning
Pub Date: 2021-07-01 | DOI: 10.1109/ICDCS51616.2021.00060
Hang Xu, Chen-Yu Ho, A. Abdelmoniem, Aritra Dutta, E. Bergou, Konstantinos Karatsenidis, M. Canini, Panos Kalnis
Powerful computer clusters are used nowadays to train complex deep neural networks (DNNs) on large datasets. Distributed training is increasingly communication-bound. For this reason, many lossy compression techniques have been proposed to reduce the volume of transferred data. Unfortunately, it is difficult to reason about the behavior of compression methods, because existing work relies on inconsistent evaluation testbeds and largely ignores the performance impact of practical system configurations. In this paper, we present a comprehensive survey of the most influential compressed communication methods for DNN training, together with an intuitive classification (quantization, sparsification, hybrid, and low-rank). Next, we propose GRACE, a unified framework and API that allows for consistent and easy implementation of compressed communication on popular machine learning toolkits. We instantiate GRACE on TensorFlow and PyTorch and implement 16 such methods. Finally, we present a thorough quantitative evaluation with a variety of DNNs (convolutional and recurrent), datasets, and system configurations. We show that the DNN architecture affects the relative performance of the methods. Interestingly, depending on the underlying communication library and the computational cost of compression/decompression, we demonstrate that some methods may be impractical. GRACE and the entire benchmarking suite are available as open source.
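GRACE's real API lives in its open-source repository; purely to make the compress/decompress contract concrete, here is a minimal top-k sparsification compressor of the kind such a framework unifies. The class and method names are illustrative assumptions, not GRACE's actual interface.

```python
# Minimal top-k sparsification sketch (illustrative interface; see the GRACE
# repository for the framework's actual API).
import torch

class TopKCompressor:
    def __init__(self, ratio=0.01):
        self.ratio = ratio                      # fraction of gradient entries kept

    def compress(self, grad: torch.Tensor):
        flat = grad.flatten()
        k = max(1, int(flat.numel() * self.ratio))
        _, indices = torch.topk(flat.abs(), k)  # keep the k largest-magnitude entries
        return flat[indices], indices, grad.shape   # only values + indices are sent

    def decompress(self, values, indices, shape):
        flat = torch.zeros(shape, dtype=values.dtype).flatten()
        flat[indices] = values                  # scatter back into a dense tensor
        return flat.reshape(shape)

comp = TopKCompressor(ratio=0.1)
g = torch.randn(4, 4)
v, idx, shape = comp.compress(g)
print(comp.decompress(v, idx, shape))
```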
{"title":"GRACE: A Compressed Communication Framework for Distributed Machine Learning","authors":"Hang Xu, Chen-Yu Ho, A. Abdelmoniem, Aritra Dutta, E. Bergou, Konstantinos Karatsenidis, M. Canini, Panos Kalnis","doi":"10.1109/ICDCS51616.2021.00060","DOIUrl":"https://doi.org/10.1109/ICDCS51616.2021.00060","url":null,"abstract":"Powerful computer clusters are used nowadays to train complex deep neural networks (DNN) on large datasets. Distributed training increasingly becomes communication bound. For this reason, many lossy compression techniques have been proposed to reduce the volume of transferred data. Unfortunately, it is difficult to argue about the behavior of compression methods, because existing work relies on inconsistent evaluation testbeds and largely ignores the performance impact of practical system configurations. In this paper, we present a comprehensive survey of the most influential compressed communication methods for DNN training, together with an intuitive classification (i.e., quantization, sparsification, hybrid and low-rank). Next, we propose GRACE, a unified framework and API that allows for consistent and easy implementation of compressed communication on popular machine learning toolkits. We instantiate GRACE on TensorFlow and PyTorch, and implement 16 such methods. Finally, we present a thorough quantitative evaluation with a variety of DNNs (convolutional and recurrent), datasets and system configurations. We show that the DNN architecture affects the relative performance among methods. Interestingly, depending on the underlying communication library and computational cost of compression / decompression, we demonstrate that some methods may be impractical. GRACE and the entire benchmarking suite are available as open-source.","PeriodicalId":222376,"journal":{"name":"2021 IEEE 41st International Conference on Distributed Computing Systems (ICDCS)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123196796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Federated Model Search via Reinforcement Learning
Pub Date: 2021-07-01 | DOI: 10.1109/ICDCS51616.2021.00084
Dixi Yao, Lingdong Wang, Jiayu Xu, Liyao Xiang, Shuo Shao, Yingqi Chen, Yanjun Tong
The Federated Learning (FL) framework enables training over distributed datasets while keeping the data local. However, it is difficult to customize a model that fits all unknown local data. A pre-determined model is likely to lead to slow convergence or low accuracy, especially when the distributed data is non-i.i.d. To resolve this issue, we propose a model search method for the federated learning scenario that automatically searches for a model structure fitting the unseen local data. We design a novel reinforcement learning-based framework that samples and distributes sub-models to the participants and updates its model selection policy by maximizing the reward. In practice, the model search algorithm takes a long time to converge, so we adaptively assign sub-models to participants according to their transmission conditions. We further propose delay-compensated synchronization to mitigate the loss caused by late updates and facilitate convergence. Extensive experiments show that our federated model search algorithm produces highly accurate models efficiently, particularly on non-i.i.d. data.
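The abstract leaves the controller unspecified beyond sampling sub-models and updating a selection policy from reward feedback; a bandit-style caricature of that loop, sampling a candidate structure from a softmax policy, observing a reward that stands in for participant-reported validation accuracy, and nudging the policy toward better structures, looks like the following. The candidate names, reward values, and REINFORCE-style update are placeholders, not the paper's controller.

```python
# Bandit-style caricature of RL-driven sub-model search (placeholder candidates
# and rewards; not the paper's controller).
import math
import random

candidates = ["small-cnn", "medium-cnn", "large-cnn"]   # assumed sub-model structures
prefs = [0.0] * len(candidates)                         # softmax preferences
lr, rng = 0.5, random.Random(0)

def sample_policy():
    z = [math.exp(p) for p in prefs]
    probs = [x / sum(z) for x in z]
    r, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i, probs
    return len(probs) - 1, probs

for _ in range(200):
    i, probs = sample_policy()
    # Reward stands in for validation accuracy reported by the participants.
    reward = {"small-cnn": 0.6, "medium-cnn": 0.8, "large-cnn": 0.7}[candidates[i]]
    reward += rng.gauss(0, 0.05)
    # REINFORCE-style update on the sampled structure's preference.
    for j in range(len(prefs)):
        grad = (1.0 if j == i else 0.0) - probs[j]
        prefs[j] += lr * reward * grad

print("learned preferences:", dict(zip(candidates, (round(p, 2) for p in prefs))))
```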
{"title":"Federated Model Search via Reinforcement Learning","authors":"Dixi Yao, Lingdong Wang, Jiayu Xu, Liyao Xiang, Shuo Shao, Yingqi Chen, Yanjun Tong","doi":"10.1109/ICDCS51616.2021.00084","DOIUrl":"https://doi.org/10.1109/ICDCS51616.2021.00084","url":null,"abstract":"Federated Learning (FL) framework enables training over distributed datasets while keeping the data local. However, it is difficult to customize a model fitting for all unknown local data. A pre-determined model is most likely to lead to slow convergence or low accuracy, especially when the distributed data is non-i.i.d.. To resolve the issue, we propose a model searching method in the federated learning scenario, and the method automatically searches a model structure fitting for the unseen local data. We novelly design a reinforcement learning-based framework that samples and distributes sub-models to the participants and updates its model selection policy by maximizing the reward. In practice, the model search algorithm takes a long time to converge, and hence we adaptively assign sub-models to participants according to the transmission condition. We further propose delay-compensated synchronization to mitigate loss over late updates to facilitate convergence. Extensive experiments show that our federated model search algorithm produces highly accurate models efficiently, particularly on non-i.i.d. data.","PeriodicalId":222376,"journal":{"name":"2021 IEEE 41st International Conference on Distributed Computing Systems (ICDCS)","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132421100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}