Joint Optimization of Concave Scalarized Multi-Objective Reinforcement Learning with Policy Gradient Based Algorithm

Journal of Artificial Intelligence Research | Pub Date: 2022-08-09 | DOI: 10.1613/jair.1.13981 | Impact factor 4.5; Zone 3 (Computer Science); Q2 (Computer Science, Artificial Intelligence)

Qinbo Bai, Mridul Agarwal, V. Aggarwal

Citations: 1

Abstract

Many engineering problems have multiple objectives, and the overall aim is to optimize a non-linear function of these objectives. In this paper, we formulate the problem of maximizing a non-linear concave function of multiple long-term objectives. A policy-gradient-based, model-free algorithm is proposed for the problem. To compute an estimate of the gradient, an asymptotically biased estimator is proposed. The proposed algorithm is shown to converge to within $\epsilon$ of the global optimum after sampling $O\left(\frac{M^4 \sigma^2}{(1-\gamma)^8 \epsilon^4}\right)$ trajectories, where $\gamma$ is the discount factor and $M$ is the number of agents, thus achieving the same dependence on $\epsilon$ as the policy gradient algorithm for standard reinforcement learning.
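To make the setup in the abstract concrete, the following is a minimal sketch, not the authors' algorithm. It assumes a stateless toy problem, a softmax policy, a finite rollout horizon, and a plain REINFORCE-style score-function estimator; every name in it (`softmax`, `scalarized_gradient`, `reward_table`, `f_grad`) is hypothetical. It estimates each long-term objective $J_m$ from sampled trajectories, then applies the chain rule $\nabla_\theta f(J(\theta)) = \sum_m \frac{\partial f}{\partial J_m} \nabla_\theta J_m$ with the partial derivatives evaluated at the sample estimates. Plugging noisy estimates into the non-linear $\partial f / \partial J_m$ is one way such an estimator can become biased, which is presumably the kind of bias the abstract refers to.

```python
import math
import random


def softmax(theta):
    """Numerically stable softmax over a list of logits."""
    z = max(theta)
    exps = [math.exp(v - z) for v in theta]
    total = sum(exps)
    return [e / total for e in exps]


def scalarized_gradient(theta, reward_table, f_grad, gamma, horizon, n_traj, rng):
    """Plug-in estimate of grad_theta f(J_1, ..., J_M) for a stateless softmax policy.

    reward_table[a] is the reward vector (one entry per objective) of action a.
    f_grad maps the estimated objective vector to the partial derivatives of f.
    All of this is an illustrative assumption, not the paper's estimator.
    """
    num_actions = len(theta)
    num_obj = len(reward_table[0])
    probs = softmax(theta)
    j_hat = [0.0] * num_obj
    grad_j_hat = [[0.0] * num_actions for _ in range(num_obj)]

    for _ in range(n_traj):
        actions = rng.choices(range(num_actions), weights=probs, k=horizon)
        # Score function: sum over time steps of grad_theta log pi(a_t).
        score = [0.0] * num_actions
        for a in actions:
            for k in range(num_actions):
                score[k] += (1.0 if k == a else 0.0) - probs[k]
        for m in range(num_obj):
            # Discounted return of objective m along this trajectory.
            g = sum((gamma ** t) * reward_table[a][m] for t, a in enumerate(actions))
            j_hat[m] += g / n_traj
            for k in range(num_actions):
                grad_j_hat[m][k] += score[k] * g / n_traj

    # Chain rule, with df/dJ_m evaluated at the sample estimate j_hat.
    weights = f_grad(j_hat)
    grad = [
        sum(weights[m] * grad_j_hat[m][k] for m in range(num_obj))
        for k in range(num_actions)
    ]
    return grad, j_hat


def log_sum_grad(j):
    """Partial derivatives of the concave scalarization f(J) = sum_m log(J_m + 1e-3)."""
    return [1.0 / (v + 1e-3) for v in j]


# Usage: a few gradient-ascent steps on a two-action, two-objective toy problem.
rng = random.Random(0)
reward_table = [(1.0, 0.0), (0.0, 1.0)]  # each action serves one objective
theta = [0.0, 0.0]
for _ in range(50):
    grad, j_hat = scalarized_gradient(theta, reward_table, log_sum_grad, 0.9, 20, 32, rng)
    theta = [t + 0.05 * g for t, g in zip(theta, grad)]
print(theta, j_hat)
```

The logarithmic scalarization is only one example of a concave function of the objectives; any other differentiable concave `f_grad` could be passed in its place.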

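Source journal: Journal of Artificial Intelligence Research (Engineering & Technology; Computer Science: Artificial Intelligence)

CiteScore: 9.60

Self-citation rate: 4.00%

Publication count:

Review time: 4 months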

About the journal: JAIR (ISSN 1076-9757) covers all areas of artificial intelligence (AI), publishing refereed research articles, survey articles, and technical notes. Established in 1993 as one of the first electronic scientific journals, JAIR is indexed by INSPEC, Science Citation Index, and MathSciNet. JAIR reviews papers within approximately three months of submission and publishes accepted articles on the internet immediately upon receiving the final versions. JAIR articles are published for free distribution on the internet by the AI Access Foundation, and for purchase in bound volumes by AAAI Press.
