Unified finite-time error analysis of soft Q-learning

Neurocomputing (IF 5.5; Zone 2, Computer Science; Q1, Computer Science, Artificial Intelligence). Pub Date: 2025-02-04. DOI: 10.1016/j.neucom.2025.129582

Narim Jeong, Donghwan Lee

Abstract

Soft Q-learning is one of the most commonly used reinforcement learning algorithms, serving purposes such as solving entropy-regularized Markov decision problems, reducing overestimation bias, and improving exploration. Its practical effectiveness has led to widespread use; however, soft Q-learning has received little theoretical study. This paper provides a unified finite-time analysis of soft Q-learning from a control-theoretic perspective. We examine three soft Q-learning algorithms that use the log-sum-exp operator, the Boltzmann operator, and the mellowmax operator, respectively. Using dynamical switching system models, we derive finite-time error bounds for the three soft Q-learning variants. We believe that our analysis can contribute to a better understanding of soft Q-learning through its links with switching system models.
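For readers unfamiliar with the three operators named in the abstract, the following is a minimal sketch of their commonly used textbook forms applied to a vector of action values. The function names, the temperature parameter `beta`, and the exact scaling are assumptions made for illustration; the abstract does not state the paper's own parameterization, which may differ.

```python
import numpy as np

def log_sum_exp(q, beta):
    """(1/beta) * log(sum_a exp(beta * q[a])), computed stably. Assumes beta > 0."""
    q = np.asarray(q, dtype=float)
    m = q.max()
    # Subtract the max before exponentiating to avoid overflow.
    return float(m + np.log(np.sum(np.exp(beta * (q - m)))) / beta)

def boltzmann(q, beta):
    """Softmax-weighted average of q: sum_a q[a] * exp(beta*q[a]) / sum_b exp(beta*q[b])."""
    q = np.asarray(q, dtype=float)
    w = np.exp(beta * (q - q.max()))  # shifting by the max leaves the weights' ratios unchanged
    return float(np.sum(w * q) / np.sum(w))

def mellowmax(q, beta):
    """(1/beta) * log(mean_a exp(beta * q[a])), i.e. log-sum-exp minus log(n)/beta."""
    q = np.asarray(q, dtype=float)
    return log_sum_exp(q, beta) - float(np.log(q.size)) / beta

# Illustrative action values (hypothetical, not taken from the paper).
q_values = [1.0, 2.0, 3.0]
for op in (log_sum_exp, boltzmann, mellowmax):
    print(op.__name__, op(q_values, beta=5.0))
```

Under these definitions, all three operators approach the hard max as `beta` grows; log-sum-exp approaches it from above, while the Boltzmann and mellowmax operators stay at or below it.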
