Bringing Fairness to Actor-Critic Reinforcement Learning for Network Utility Optimization

Jingdi Chen, Yimeng Wang, T. Lan
DOI: 10.1109/INFOCOM42981.2021.9488823
Published in: IEEE INFOCOM 2021 - IEEE Conference on Computer Communications, May 10, 2021
Citations: 16

Abstract

Fairness is a crucial design objective in virtually all network optimization problems, where limited system resources are shared by multiple agents. Recently, reinforcement learning has been successfully applied to autonomous online decision making in many network design and optimization problems. However, most such applications try to maximize the long-term (discounted) reward of all agents without taking fairness into account. In this paper, we propose a family of algorithms that bring fairness to actor-critic reinforcement learning for optimizing general fairness utility functions. In particular, we present a novel method for adjusting the rewards in standard reinforcement learning by a multiplicative weight that depends on both the shape of the fairness utility and statistics of past rewards. We show that, for a proper choice of the adjusted rewards, a policy gradient update converges to at least a stationary point of general α-fairness utility optimization. This result inspires the design of fairness optimization algorithms in actor-critic reinforcement learning. Evaluations show that the proposed algorithm can be easily deployed in real-world network optimization problems, such as wireless scheduling and video QoE optimization, and significantly improves the fairness utility value over previous heuristics and learning algorithms.
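The α-fairness utility and the multiplicative reward adjustment described in the abstract can be illustrated with a minimal sketch. This is not the paper's exact algorithm: it assumes the past-reward statistic is each agent's running-average reward x̄ and that the multiplicative weight is the marginal utility x̄^(-α), a common choice in α-fair optimization; the paper's specific weighting scheme may differ.

```python
import math

def alpha_fair_utility(x, alpha):
    """alpha-fair utility: log(x) when alpha = 1 (proportional fairness),
    x^(1-alpha)/(1-alpha) otherwise (alpha = 0 recovers sum-utility)."""
    if alpha == 1.0:
        return math.log(x)
    return x ** (1.0 - alpha) / (1.0 - alpha)

def adjusted_reward(reward, avg_past_reward, alpha):
    """Scale an agent's instantaneous reward by the marginal utility
    evaluated at its running-average past reward, i.e. x_bar^(-alpha).
    Agents whose accumulated reward lags behind are up-weighted, so a
    standard policy gradient on the adjusted rewards is pushed toward
    the alpha-fair objective rather than the plain reward sum."""
    return reward * avg_past_reward ** (-alpha)

# Example: two agents with equal instantaneous reward but unequal history.
avg = [10.0, 2.0]          # agent 0 has received 5x more on average
r = [1.0, 1.0]
alpha = 1.0                 # proportional fairness
adj = [adjusted_reward(r[i], avg[i], alpha) for i in range(2)]
# the lagging agent's adjusted reward is 5x that of the leading agent
```

Larger α trades total throughput for more equal allocations (α → ∞ approaches max-min fairness), so the same reweighting scheme covers a whole family of fairness objectives by varying a single parameter.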