Fitted Q-Iteration via Max-Plus-Linear Approximation

IEEE Control Systems Letters (Automation & Control Systems, IF 2, Q2)
Pub Date: 2024-12-18
DOI: 10.1109/LCSYS.2024.3520060

Yichen Liu; Mohamad Amin Sharifi Kolarijani

Citations: 0

Abstract

In this letter, we consider the application of max-plus-linear approximators for the Q-function in offline reinforcement learning of discounted Markov decision processes. In particular, we use these approximators to propose novel fitted Q-iteration (FQI) algorithms with provable convergence. Exploiting the compatibility of the Bellman operator with max-plus operations, we show that the max-plus-linear regression within each iteration of the proposed FQI algorithm reduces to simple max-plus matrix-vector multiplications. We also consider a variational implementation of the proposed algorithm, which leads to a per-iteration complexity that is independent of the number of samples.
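
The abstract does not spell out the algorithm, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes a Q-approximator of the form Q_w(s, a) = max_k (w[k] + phi_k(s, a)) and assumes that the per-iteration "max-plus-linear regression" is the standard max-plus residuation, i.e. the largest weights whose approximation stays below the Bellman targets at the samples. Under these assumptions, the max over next actions commutes with the max over basis functions, so each iteration is one max-plus product (targets) followed by one min-plus product (fit). The map w -> residuate(Phi, r + gamma * (Psi ⊗ w)) is monotone and additively homogeneous up to the gamma factor, which makes it a gamma-contraction in the sup norm; this is one standard route to convergence. The function names, the indicator basis, the constant BIG, and the toy chain MDP are all hypothetical illustrations, and the sample-independent variational implementation is not shown.

```python
import numpy as np


def maxplus_matvec(A, x):
    """Max-plus product (A ⊗ x)_i = max_j (A[i, j] + x[j])."""
    return np.max(A + x[None, :], axis=1)


def maxplus_residuate(A, y):
    """Greatest w with A ⊗ w <= y componentwise: w_j = min_i (y[i] - A[i, j])."""
    return np.min(y[:, None] - A, axis=0)


def maxplus_fqi(Phi, Psi, rewards, gamma, n_iter=1000, tol=1e-10):
    """Hypothetical FQI sketch with Q_w(s, a) = max_k (w[k] + phi_k(s, a)).

    Phi[i, k] = phi_k(s_i, a_i)            basis values at the sampled pairs
    Psi[i, k] = max_a' phi_k(s'_i, a')     basis values at next states, maxed over actions
    """
    w = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        # max_a' Q_w(s'_i, a') = max_k (w[k] + Psi[i, k]) because the two maxima commute.
        targets = rewards + gamma * maxplus_matvec(Psi, w)
        # Max-plus fit (assumed): largest weights whose approximation stays below the targets.
        w_new = maxplus_residuate(Phi, targets)
        if np.max(np.abs(w_new - w)) < tol:
            return w_new
        w = w_new
    return w


# Hypothetical toy chain: states 0..3, action 0 = left, action 1 = right,
# reward 1 whenever the next state is the right end.
n_states, n_actions, gamma, BIG = 4, 2, 0.9, 1e3
pairs = [(s, a) for s in range(n_states) for a in range(n_actions)]


def step(s, a):
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s_next, float(s_next == n_states - 1)


# Indicator-style basis: phi_k(s, a) = 0 if (s, a) is the k-th pair, else -BIG.
def phi(s, a):
    return np.array([0.0 if (s, a) == p else -BIG for p in pairs])


data = [(s, a, *step(s, a)) for (s, a) in pairs]  # tuples (s, a, s', r)
Phi = np.array([phi(s, a) for s, a, _, _ in data])
Psi = np.array([np.max([phi(s2, b) for b in range(n_actions)], axis=0)
                for _, _, s2, _ in data])
rewards = np.array([r for _, _, _, r in data])

w = maxplus_fqi(Phi, Psi, rewards, gamma)
Q_hat = maxplus_matvec(Phi, w).reshape(n_states, n_actions)
```

With the indicator basis, each weight is tied to a single state-action pair, so, as long as BIG dominates the range of Q-values, the iteration reduces to exact Q-iteration and Q_hat recovers the optimal Q-values of the chain (for example 8.1 for moving right from state 0 with gamma = 0.9). Swapping in smoother basis functions turns the fit into a genuine approximation that, by construction of the residuation step, never exceeds the Bellman targets at the sampled pairs.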
