Authors: Linjian Hou, Zhengming Wang, Han Long
Published in: 2021 IEEE 24th International Conference on Computational Science and Engineering (CSE), pp. 94-100
Publication date: 2021-10-01
DOI: 10.1109/CSE53436.2021.00023 (https://doi.org/10.1109/CSE53436.2021.00023)
An Improvement for Value-Based Reinforcement Learning Method Through Increasing Discount Factor Substitution
In conventional reinforcement learning (RL) methods, the discount factor is typically treated as a constant, and future rewards are attenuated exponentially, which guarantees the theoretical convergence of the Bellman equation. However, this exponential inhibition greatly underestimates future rewards, which is unreasonable: future rewards, especially those closer to the completion of the task, should be given greater importance. In this paper, we review the rationale for the discount factor and propose an increasing discount factor to reduce the underestimation of future rewards caused by exponential inhibition. We test two value-based RL methods in three scenarios to verify our method. The experimental results show that, under certain circumstances, value-based RL with an increasing discount factor is more efficient than with a fixed discount factor.
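
To make the underestimation effect concrete, the following minimal sketch compares the discounted return of a sparse-reward episode under a fixed discount factor and under an increasing one. The abstract does not specify the form of the increasing discount factor or whether it is indexed by time step or by training iteration, so the per-step linear schedule (`linear_schedule`), the helper names, and the values 0.9 and 1.0 are all assumptions chosen for illustration, not the authors' method.

```python
from typing import List, Sequence


def discounted_return(rewards: Sequence[float], gammas: Sequence[float]) -> float:
    """Return sum_t (gammas[0] * ... * gammas[t-1]) * rewards[t].

    With a constant gamma this reduces to the usual sum_t gamma**t * r_t.
    """
    total, weight = 0.0, 1.0
    for reward, gamma in zip(rewards, gammas):
        total += weight * reward
        weight *= gamma  # cumulative discount applied to the next reward
    return total


def linear_schedule(start: float, end: float, steps: int) -> List[float]:
    """Hypothetical schedule (an assumption): gamma rises linearly from start to end."""
    if steps == 1:
        return [end]
    return [start + (end - start) * t / (steps - 1) for t in range(steps)]


# Sparse-reward episode: the only reward arrives at the final step.
horizon = 50
rewards = [0.0] * (horizon - 1) + [1.0]

fixed = discounted_return(rewards, [0.9] * horizon)
increasing = discounted_return(rewards, linear_schedule(0.9, 1.0, horizon))

print(f"fixed gamma = 0.9:          {fixed:.6f}")
print(f"increasing gamma 0.9 -> 1:  {increasing:.6f}")
```

Under the fixed factor, the final reward is weighted by 0.9^49, which is close to zero. Under the assumed increasing schedule, every per-step factor is at least 0.9, so the same reward keeps a strictly larger weight. This illustrates only the weighting argument; it says nothing about convergence or learning efficiency, which the paper evaluates experimentally.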