{"title":"Adaptive action selection using utility-based reinforcement learning","authors":"Kunrong Chen, Fen Lin, Qing Tan, Zhongzhi Shi","doi":"10.1109/GRC.2009.5255163","DOIUrl":null,"url":null,"abstract":"A basic problem of intelligent systems is choosing adaptive action to perform in a non-stationary environment. Due to the combinatorial complexity of actions, agent cannot possibly consider every option available to it at every instant in time. It needs to find good policies that dictate optimum actions to perform in each situation. This paper proposes an algorithm, called UQ-learning, to better solve action selection problem by using reinforcement learning and utility function. Reinforcement learning can provide the information of environment and utility function is used to balance Exploration-Exploitation dilemma. We implement our method with maze navigation tasks in a non-stationary environment. The results of simulated experiments show that utility-based reinforcement learning approach is more effective and efficient compared with Q-learning and Recency-Based Exploration.","PeriodicalId":388774,"journal":{"name":"2009 IEEE International Conference on Granular Computing","volume":"36 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2009 IEEE International Conference on Granular Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/GRC.2009.5255163","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 6
Abstract
A basic problem for intelligent systems is choosing adaptive actions to perform in a non-stationary environment. Due to the combinatorial complexity of actions, an agent cannot possibly consider every option available to it at every instant in time. It needs to find good policies that dictate the optimal action to perform in each situation. This paper proposes an algorithm, called UQ-learning, to better solve the action selection problem by combining reinforcement learning with a utility function. Reinforcement learning provides information about the environment, and the utility function is used to balance the exploration-exploitation dilemma. We implement our method on maze navigation tasks in a non-stationary environment. The results of simulated experiments show that the utility-based reinforcement learning approach is more effective and efficient than Q-learning and recency-based exploration.
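The abstract does not specify the exact utility function used by UQ-learning, so the following is only a minimal sketch of the general idea it describes: tabular Q-learning in which action selection is driven by a utility that combines the learned Q-value (exploitation) with an exploration term, rather than by raw Q-values with a fixed epsilon. The count-based bonus, the `beta` weight, and the class name `UtilityQAgent` are illustrative assumptions, not the authors' formulation.

```python
import numpy as np

class UtilityQAgent:
    """Sketch of utility-based action selection layered on tabular Q-learning."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.95, beta=0.5):
        self.Q = np.zeros((n_states, n_actions))       # action-value estimates
        self.visits = np.zeros((n_states, n_actions))  # state-action visit counts
        self.alpha, self.gamma, self.beta = alpha, gamma, beta

    def utility(self, s):
        # Hypothetical utility: exploitation term (Q) plus an exploration bonus
        # that decays with visit count, so rarely tried actions regain appeal
        # when the non-stationary environment may have changed.
        bonus = self.beta / np.sqrt(self.visits[s] + 1.0)
        return self.Q[s] + bonus

    def select_action(self, s):
        # Greedy with respect to utility rather than raw Q, balancing
        # exploration and exploitation without a fixed epsilon.
        return int(np.argmax(self.utility(s)))

    def update(self, s, a, r, s_next):
        # Standard Q-learning backup.
        target = r + self.gamma * np.max(self.Q[s_next])
        self.Q[s, a] += self.alpha * (target - self.Q[s, a])
        self.visits[s, a] += 1
```

In a maze-navigation setting, such an agent would be run episode by episode, calling `select_action` in each state and `update` after each transition; how the paper's actual utility function weighs the two terms would determine how quickly the agent re-explores after the environment changes.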