The impact of data distribution on Q-learning with function approximation

IF: 4.3 | CAS Tier 3 (Computer Science) | JCR Q2 (Computer Science, Artificial Intelligence) | Machine Learning | Pub Date: 2024-06-07 | DOI: 10.1007/s10994-024-06564-5
Pedro P. Santos, Diogo S. Carvalho, Alberto Sardinha, Francisco S. Melo
{"title":"The impact of data distribution on Q-learning with function approximation","authors":"Pedro P. Santos, Diogo S. Carvalho, Alberto Sardinha, Francisco S. Melo","doi":"10.1007/s10994-024-06564-5","DOIUrl":null,"url":null,"abstract":"<p>We study the interplay between the data distribution and <i>Q</i>-learning-based algorithms with function approximation. We provide a unified theoretical and empirical analysis as to how different properties of the data distribution influence the performance of <i>Q</i>-learning-based algorithms. We connect different lines of research, as well as validate and extend previous results, being primarily focused on offline settings. First, we analyze the impact of the data distribution by using optimization as a tool to better understand which data distributions yield low concentrability coefficients. We motivate high-entropy distributions from a game-theoretical point of view and propose an algorithm to find the optimal data distribution from the point of view of concentrability. Second, from an empirical perspective, we introduce a novel four-state MDP specifically tailored to highlight the impact of the data distribution in the performance of <i>Q</i>-learning-based algorithms with function approximation. Finally, we experimentally assess the impact of the data distribution properties on the performance of two offline <i>Q</i>-learning-based algorithms under different environments. Our results attest to the importance of different properties of the data distribution such as entropy, coverage, and data quality (closeness to optimal policy).</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"19 1","pages":""},"PeriodicalIF":4.3000,"publicationDate":"2024-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Machine Learning","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s10994-024-06564-5","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

We study the interplay between the data distribution and Q-learning-based algorithms with function approximation. We provide a unified theoretical and empirical analysis of how different properties of the data distribution influence the performance of Q-learning-based algorithms. We connect different lines of research, as well as validate and extend previous results, focusing primarily on offline settings. First, we analyze the impact of the data distribution by using optimization as a tool to better understand which data distributions yield low concentrability coefficients. We motivate high-entropy distributions from a game-theoretical point of view and propose an algorithm to find the optimal data distribution from the point of view of concentrability. Second, from an empirical perspective, we introduce a novel four-state MDP specifically tailored to highlight the impact of the data distribution on the performance of Q-learning-based algorithms with function approximation. Finally, we experimentally assess the impact of the data distribution properties on the performance of two offline Q-learning-based algorithms under different environments. Our results attest to the importance of different properties of the data distribution, such as entropy, coverage, and data quality (closeness to the optimal policy).
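
To make the interplay between data coverage and offline Q-learning concrete, below is a minimal sketch of fitted Q-iteration with a linear function approximator on a small chain MDP, trained on datasets drawn from a high-entropy (uniform) state distribution versus a skewed one. The chain MDP, the features, and the two distributions are hypothetical illustrations chosen for this sketch; they are not the four-state MDP or the algorithms studied in the paper.

```python
# Illustrative sketch only: a generic offline fitted Q-iteration loop with a
# linear function approximator on a hypothetical 4-state chain MDP. It shows
# how the entropy/coverage of the sampling distribution mu can change the
# greedy policy recovered from the learned Q; it is not the paper's setup.
import numpy as np

n_states, n_actions, gamma = 4, 2, 0.9
rng = np.random.default_rng(0)

# Hypothetical deterministic chain: action 1 moves right, action 0 moves left;
# entering the last state yields reward 1.
def step(s, a):
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, r

# Simple hand-crafted features to mimic function approximation.
def phi(s, a):
    return np.array([1.0, s / (n_states - 1), float(a), (s / (n_states - 1)) * a])

def collect(mu, n=5000):
    """Sample (s, a, r, s') transitions with states drawn from distribution mu."""
    data = []
    for _ in range(n):
        s = rng.choice(n_states, p=mu)
        a = rng.integers(n_actions)
        s_next, r = step(s, a)
        data.append((s, a, r, s_next))
    return data

def fitted_q_iteration(data, n_iters=50):
    """Offline FQI: repeatedly regress Bellman targets onto the linear features."""
    w = np.zeros(4)
    X = np.array([phi(s, a) for s, a, _, _ in data])
    for _ in range(n_iters):
        targets = np.array([
            r + gamma * max(phi(s_next, b) @ w for b in range(n_actions))
            for _, _, r, s_next in data
        ])
        w, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return w

# High-entropy (uniform) coverage versus a distribution concentrated on state 0.
for name, mu in [("uniform", np.full(n_states, 0.25)),
                 ("skewed", np.array([0.85, 0.05, 0.05, 0.05]))]:
    entropy = -(mu * np.log(mu)).sum()
    w = fitted_q_iteration(collect(mu))
    greedy = [int(np.argmax([phi(s, a) @ w for a in range(n_actions)]))
              for s in range(n_states)]
    print(f"{name:8s} data (entropy {entropy:.2f}) -> greedy actions: {greedy}")
```

Under the skewed distribution most transitions come from a single state, so the regression has to extrapolate to under-covered states; this is the kind of coverage effect that concentrability coefficients are meant to quantify.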


Source journal
Machine Learning (Engineering & Technology – Computer Science: Artificial Intelligence)
CiteScore: 11.00
Self-citation rate: 2.70%
Annual publications: 162
Review time: 3 months
Journal description: Machine Learning serves as a global platform dedicated to computational approaches in learning. The journal reports substantial findings on diverse learning methods applied to various problems, offering support through empirical studies, theoretical analysis, or connections to psychological phenomena. It demonstrates the application of learning methods to solve significant problems and aims to enhance the conduct of machine learning research with a focus on verifiable and replicable evidence in published papers.
Latest articles in this journal
On metafeatures’ ability of implicit concept identification
Persistent Laplacian-enhanced algorithm for scarcely labeled data classification
Towards a foundation large events model for soccer
Conformal prediction for regression models with asymmetrically distributed errors: application to aircraft navigation during landing maneuver
In-game soccer outcome prediction with offline reinforcement learning