Bringing Fairness to Actor-Critic Reinforcement Learning for Network Utility Optimization

Jingdi Chen, Yimeng Wang, T. Lan
DOI: 10.1109/INFOCOM42981.2021.9488823
Published in: IEEE INFOCOM 2021 - IEEE Conference on Computer Communications, May 10, 2021
Citations: 16

Abstract

Fairness is a crucial design objective in virtually all network optimization problems, where limited system resources are shared by multiple agents. Recently, reinforcement learning has been successfully applied to autonomous online decision making in many network design and optimization problems. However, most such applications try to maximize the long-term (discounted) reward of all agents without taking fairness into account. In this paper, we propose a family of algorithms that bring fairness to actor-critic reinforcement learning for optimizing general fairness utility functions. In particular, we present a novel method for adjusting the rewards in standard reinforcement learning by a multiplicative weight that depends on both the shape of the fairness utility and statistics of past rewards. We show that, for a proper choice of the adjusted rewards, a policy gradient update converges to at least a stationary point of general α-fairness utility optimization. This result inspires the design of fairness optimization algorithms in actor-critic reinforcement learning. Evaluations show that the proposed algorithm can be easily deployed in real-world network optimization problems, such as wireless scheduling and video QoE optimization, and significantly improves the fairness utility value over previous heuristics and learning algorithms.
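The α-fairness utility and the multiplicative reward adjustment described in the abstract can be illustrated with a minimal sketch. This is not the paper's exact algorithm: it assumes the past-reward statistic is each agent's running-average reward x̄ and that the multiplicative weight is the marginal utility x̄^(-α), a common choice in α-fair optimization; the paper's specific weighting scheme may differ.

```python
import math

def alpha_fair_utility(x, alpha):
    """alpha-fair utility: log(x) when alpha = 1 (proportional fairness),
    x^(1-alpha)/(1-alpha) otherwise (alpha = 0 recovers sum-utility)."""
    if alpha == 1.0:
        return math.log(x)
    return x ** (1.0 - alpha) / (1.0 - alpha)

def adjusted_reward(reward, avg_past_reward, alpha):
    """Scale an agent's instantaneous reward by the marginal utility
    evaluated at its running-average past reward, i.e. x_bar^(-alpha).
    Agents whose accumulated reward lags behind are up-weighted, so a
    standard policy gradient on the adjusted rewards is pushed toward
    the alpha-fair objective rather than the plain reward sum."""
    return reward * avg_past_reward ** (-alpha)

# Example: two agents with equal instantaneous reward but unequal history.
avg = [10.0, 2.0]          # agent 0 has received 5x more on average
r = [1.0, 1.0]
alpha = 1.0                 # proportional fairness
adj = [adjusted_reward(r[i], avg[i], alpha) for i in range(2)]
# the lagging agent's adjusted reward is 5x that of the leading agent
```

Larger α trades total throughput for more equal allocations (α → ∞ approaches max-min fairness), so the same reweighting scheme covers a whole family of fairness objectives by varying a single parameter.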