Beyond consensus and synchrony in decentralized online optimization using saddle point method

A. S. Bedi, Alec Koppel, K. Rajawat
DOI: 10.1109/ACSSC.2017.8335186
Published in: 2017 51st Asilomar Conference on Signals, Systems, and Computers, October 2017
Citations: 7

Abstract

We consider online learning problems in multiagent systems composed of distinct subsets of agents operating without a common time-scale. Each individual in the network is charged with minimizing the global regret, which is a sum of the instantaneous sub-optimality of each agent's actions with respect to a fixed global clairvoyant actor with access to all costs across the network for all time up to a time-horizon T. Since agents are not assumed to be of the same type, the hypothesis that all agents seek a common action is violated, and thus we instead introduce a notion of network discrepancy as a measure of how well agents coordinate their behavior while retaining distinct local behavior. Moreover, agents are not assumed to receive the sequentially arriving costs on a common time index, and thus seek to learn in an asynchronous manner. A variant of the Arrow-Hurwicz saddle point algorithm is proposed to control the growth of global regret and network discrepancy. This algorithm uses Lagrange multipliers to penalize the discrepancies between agents and leads to an implementation that relies on local operations and exchange of variables between neighbors. Decisions made with this method lead to regret of order O(√T) and network discrepancy of order O(T^{3/4}). Empirical evaluation is conducted on an asynchronously operating sensor network estimating a spatially correlated random field.
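The primal-dual structure described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's exact algorithm: the network topology, step sizes, quadratic local costs, and synchronous update schedule are all simplifying assumptions made for demonstration. It shows the core saddle-point mechanics: each agent takes a gradient step on its own time-varying cost plus a Lagrangian coupling term, while Lagrange multipliers on pairwise agreement constraints are updated by dual ascent, using only local operations and neighbor exchanges.

```python
import numpy as np

# Hypothetical sketch of an Arrow-Hurwicz-style saddle point update for
# decentralized online optimization. Topology, costs, and step sizes are
# assumptions for illustration, not the paper's specification.

rng = np.random.default_rng(0)

# Ring network of 4 agents, each holding a scalar decision x_i.
n_agents = 4
neighbors = {i: [(i - 1) % n_agents, (i + 1) % n_agents]
             for i in range(n_agents)}

x = np.zeros(n_agents)  # primal variables, one per agent
# One Lagrange multiplier per directed edge, for constraint x_i = x_j.
lam = {(i, j): 0.0 for i in range(n_agents) for j in neighbors[i]}

eta, mu = 0.05, 0.05            # primal / dual step sizes (assumed)
targets = rng.normal(size=n_agents)  # each agent's local preference

T = 2000
for t in range(T):
    # Time-varying local cost f_{i,t}(x) = 0.5 * (x - a_{i,t})^2,
    # drawn around each agent's fixed target.
    a = targets + 0.1 * rng.normal(size=n_agents)
    grad = x - a                # gradient of each agent's local cost

    # Primal descent: local gradient plus Lagrangian coupling, using
    # only multipliers on edges incident to agent i.
    x_new = x.copy()
    for i in range(n_agents):
        coupling = sum(lam[(i, j)] - lam[(j, i)] for j in neighbors[i])
        x_new[i] = x[i] - eta * (grad[i] + coupling)

    # Dual ascent on the pairwise discrepancy x_i - x_j.
    for i in range(n_agents):
        for j in neighbors[i]:
            lam[(i, j)] += mu * (x_new[i] - x_new[j])
    x = x_new

# Network discrepancy: total pairwise disagreement across edges.
discrepancy = sum(abs(x[i] - x[j])
                  for i in range(n_agents) for j in neighbors[i])
print(round(discrepancy, 3))
```

As the multipliers grow on edges where agents disagree, the coupling term pulls neighboring decisions together, trading off each agent's local cost against network discrepancy; the paper's asynchronous variant would additionally let each agent perform these updates on its own clock.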