Title: Multigoal Reinforcement Learning via Exploring Entropy-Regularized Successor Matching
Authors: Xiaoyun Feng; Yun Zhou
DOI: 10.1109/TG.2023.3304315
Journal: IEEE Transactions on Games, vol. 15, no. 4, pp. 538-548
Published: 2023-08-11 (Journal Article)
URL: https://ieeexplore.ieee.org/document/10214633/
JCR: Q3, Computer Science, Artificial Intelligence; Impact Factor: 1.7
Citations: 0
Abstract
Multigoal reinforcement learning (RL) algorithms aim to achieve, and generalize over, diverse goals. However, unlike single-goal agents, multigoal agents struggle to break through the exploration bottleneck with a comparable number of interactions, because goal-oriented experiences are rarely reusable under sparse goal-reaching rewards. Well-arranged behavior goals during training are therefore essential for multigoal agents, especially in long-horizon tasks. To this end, we propose efficient multigoal exploration based on maximizing the entropy of successor features, via Exploring entropy-regularized Successor Matching (E$^{2}$SM). E$^{2}$SM adopts the idea of successor features and extends it to an entropy-regularized goal-reaching successor mapping that serves as a more stable state feature under sparse rewards. The key contribution of our work is intrinsic goal setting with behavior goals that are both more likely to be achieved, in terms of future state occupancies, and promising for expanding the exploration frontier. Experiments on challenging long-horizon manipulation tasks show that E$^{2}$SM copes well with sparse rewards and, in pursuit of maximal state coverage, efficiently identifies valuable behavior goals for specific goal-reaching by matching the successor mapping.
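To make the successor-feature idea behind the abstract concrete, the following is a minimal tabular sketch (not the paper's E$^{2}$SM implementation, which uses entropy-regularized successor mappings with function approximation): it learns a successor representation M, where M[s][g] estimates the discounted future occupancy of state g starting from s, and then ranks candidate behavior goals by that estimated reachability. All names here (td_update_sr, score_goals, alpha, gamma) and the toy chain environment are illustrative assumptions, not artifacts of the paper.

```python
import random

def td_update_sr(M, s, s_next, alpha=0.1, gamma=0.95):
    """One TD(0) update of the successor representation row for state s:
    M[s][g] <- M[s][g] + alpha * (1{g == s} + gamma * M[s_next][g] - M[s][g])."""
    n = len(M)
    for g in range(n):
        target = (1.0 if g == s else 0.0) + gamma * M[s_next][g]
        M[s][g] += alpha * (target - M[s][g])

def score_goals(M, start, candidates):
    """Rank candidate goals by estimated discounted future occupancy from `start`."""
    return sorted(candidates, key=lambda g: M[start][g], reverse=True)

# Toy 5-state deterministic chain: 0 -> 1 -> 2 -> 3 -> 4.
n_states = 5
M = [[0.0] * n_states for _ in range(n_states)]
random.seed(0)
for _ in range(2000):
    s = random.randrange(n_states - 1)
    td_update_sr(M, s, s + 1)  # forward transitions only

# Goals closer to the start state score as more reachable than distant ones.
print(score_goals(M, 0, [1, 2, 3, 4]))
```

In this sketch the ranking simply prefers the most reachable goal; the paper's method additionally regularizes with the entropy of successor features so that goal selection also pushes toward states that expand the exploration frontier, rather than collapsing onto already well-visited regions.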