On the lack of gradient domination for linear quadratic Gaussian problems with incomplete state information
Hesameddin Mohammadi, M. Soltanolkotabi, M. Jovanović
2021 60th IEEE Conference on Decision and Control (CDC), December 14, 2021
DOI: 10.1109/CDC45484.2021.9683369 (https://doi.org/10.1109/CDC45484.2021.9683369)
Citations: 7
Abstract
Policy gradient algorithms in model-free reinforcement learning have been shown to achieve global exponential convergence for the Linear Quadratic Regulator (LQR) problem despite the lack of convexity. However, extending such guarantees beyond the scope of standard LQR and full-state feedback has remained an open problem. A key enabler of the existing results for LQR is the so-called gradient dominance property of the underlying optimization problem, which serves as a surrogate for strong convexity. In this paper, we take a step further by studying the convergence of gradient descent for the Linear Quadratic Gaussian (LQG) problem and demonstrate through examples that LQG does not satisfy the gradient dominance property. Our study reveals that the equilibrium points are non-unique, thereby disproving global convergence of policy gradient methods for LQG.
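The gradient dominance (Polyak-Lojasiewicz) property referenced in the abstract states that the squared gradient norm bounds the optimality gap from below, mu * (J(K) - J*) <= ||grad J(K)||^2 for some mu > 0, which is what lets gradient descent converge at a linear rate even without convexity. The following is a minimal numerical sketch, not taken from the paper, that illustrates this property on an assumed scalar LQR instance; the system and cost parameters (a, b, q, r), the grid resolution, and the finite-difference settings are all illustrative choices.

```python
# Minimal numerical sketch (not from the paper) of the gradient dominance
# (Polyak-Lojasiewicz) property for a scalar LQR instance:
#     x_{t+1} = a*x_t + b*u_t,   u_t = -k*x_t,   x_0 = 1,
#     J(k) = sum_t (q*x_t^2 + r*u_t^2) = (q + r*k^2) / (1 - (a - b*k)^2)
# for a stabilizing gain k, i.e. |a - b*k| < 1. All numbers are assumptions.
import numpy as np

a, b, q, r = 1.2, 1.0, 1.0, 1.0  # illustrative open-loop-unstable system

def lqr_cost(k):
    cl = a - b * k                     # closed-loop pole
    if abs(cl) >= 1.0:
        return np.inf                  # non-stabilizing gain
    return (q + r * k**2) / (1.0 - cl**2)

def grad(k, eps=1e-6):
    # central finite-difference approximation of dJ/dk
    return (lqr_cost(k + eps) - lqr_cost(k - eps)) / (2.0 * eps)

# Grid over the interior of the stabilizing interval ((a-1)/b, (a+1)/b)
ks = np.linspace((a - 1.0) / b + 1e-3, (a + 1.0) / b - 1e-3, 4001)
costs = np.array([lqr_cost(k) for k in ks])
j_star = costs.min()

# Empirical PL constant: largest mu with mu*(J(k) - J*) <= J'(k)^2 on grid
gaps = costs - j_star
mask = gaps > 1e-6                     # skip the grid minimizer itself
mu = min(grad(k) ** 2 / g for k, g in zip(ks[mask], gaps[mask]))
print(f"empirical PL constant: {mu:.3f}")  # strictly positive => dominance

# Plain gradient descent converges from any stabilizing initialization
k, step = 2.0, 0.01
for _ in range(500):
    k -= step * grad(k)
print(f"gradient descent: k = {k:.4f}, J(k) = {lqr_cost(k):.4f}, "
      f"J* ~ {j_star:.4f}")
```

On this toy instance the ratio grad^2 / gap stays bounded away from zero over the stabilizing gains, which is exactly what underpins the global linear-rate guarantees for LQR. The paper's counterexamples show that once the controller is parameterized as a dynamic output-feedback policy for LQG, stationary points are no longer unique, so no such positive mu can hold globally.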