Beyond consensus and synchrony in decentralized online optimization using saddle point method

A. S. Bedi, Alec Koppel, K. Rajawat
DOI: 10.1109/ACSSC.2017.8335186
Published in: 2017 51st Asilomar Conference on Signals, Systems, and Computers, October 2017
Citations: 7

Abstract

We consider online learning problems in multiagent systems composed of distinct subsets of agents operating without a common time-scale. Each individual in the network is charged with minimizing the global regret, which is a sum of the instantaneous sub-optimality of each agent's actions with respect to a fixed global clairvoyant actor with access to all costs across the network for all time up to a time-horizon T. Since agents are not assumed to be of the same type, the hypothesis that all agents seek a common action is violated, and thus we instead introduce a notion of network discrepancy as a measure of how well agents coordinate their behavior while retaining distinct local behavior. Moreover, agents are not assumed to receive the sequentially arriving costs on a common time index, and thus seek to learn in an asynchronous manner. A variant of the Arrow-Hurwicz saddle point algorithm is proposed to control the growth of global regret and network discrepancy. This algorithm uses Lagrange multipliers to penalize the discrepancies between agents and leads to an implementation that relies on local operations and exchange of variables between neighbors. Decisions made with this method lead to regret of order O(√T) and network discrepancy of order O(T^{3/4}). Empirical evaluation is conducted on an asynchronously operating sensor network estimating a spatially correlated random field.
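The primal-dual structure described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's exact algorithm: the network topology, step sizes, quadratic local costs, and synchronous update schedule are all simplifying assumptions made for demonstration. It shows the core saddle-point mechanics: each agent takes a gradient step on its own time-varying cost plus a Lagrangian coupling term, while Lagrange multipliers on pairwise agreement constraints are updated by dual ascent, using only local operations and neighbor exchanges.

```python
import numpy as np

# Hypothetical sketch of an Arrow-Hurwicz-style saddle point update for
# decentralized online optimization. Topology, costs, and step sizes are
# assumptions for illustration, not the paper's specification.

rng = np.random.default_rng(0)

# Ring network of 4 agents, each holding a scalar decision x_i.
n_agents = 4
neighbors = {i: [(i - 1) % n_agents, (i + 1) % n_agents]
             for i in range(n_agents)}

x = np.zeros(n_agents)  # primal variables, one per agent
# One Lagrange multiplier per directed edge, for constraint x_i = x_j.
lam = {(i, j): 0.0 for i in range(n_agents) for j in neighbors[i]}

eta, mu = 0.05, 0.05            # primal / dual step sizes (assumed)
targets = rng.normal(size=n_agents)  # each agent's local preference

T = 2000
for t in range(T):
    # Time-varying local cost f_{i,t}(x) = 0.5 * (x - a_{i,t})^2,
    # drawn around each agent's fixed target.
    a = targets + 0.1 * rng.normal(size=n_agents)
    grad = x - a                # gradient of each agent's local cost

    # Primal descent: local gradient plus Lagrangian coupling, using
    # only multipliers on edges incident to agent i.
    x_new = x.copy()
    for i in range(n_agents):
        coupling = sum(lam[(i, j)] - lam[(j, i)] for j in neighbors[i])
        x_new[i] = x[i] - eta * (grad[i] + coupling)

    # Dual ascent on the pairwise discrepancy x_i - x_j.
    for i in range(n_agents):
        for j in neighbors[i]:
            lam[(i, j)] += mu * (x_new[i] - x_new[j])
    x = x_new

# Network discrepancy: total pairwise disagreement across edges.
discrepancy = sum(abs(x[i] - x[j])
                  for i in range(n_agents) for j in neighbors[i])
print(round(discrepancy, 3))
```

As the multipliers grow on edges where agents disagree, the coupling term pulls neighboring decisions together, trading off each agent's local cost against network discrepancy; the paper's asynchronous variant would additionally let each agent perform these updates on its own clock.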