{"title":"Swap Softmax Twin Delayed Deep Deterministic Policy Gradient","authors":"Chaohu Liu, Yunbo Zhao","doi":"10.1109/ISAS59543.2023.10164333","DOIUrl":null,"url":null,"abstract":"Reinforcement learning algorithms have attained noteworthy accomplishments in the field of continuous control. One of the classic algorithms in continuous control, the DDPG algorithm, is widely used and has been shown to be susceptible to overestimation. Following this, the TD3 algorithm was introduced, which integrated the notion of double DQN. TD3 takes into account the minimum value between a pair of critics to restrict overestimation. Nevertheless, TD3 may lead to an underestimation bias. To mitigate the impact of errors, we present a novel approach by integrating Swap Softmax with TD3, which can counterbalance the extreme values. We assess the efficacy of our proposed technique on continuous control tasks that are simulated by MuJoCo and provided by OpenAI Gym. Our experimental findings demonstrate a significant enhancement in the performance and robustness.","PeriodicalId":199115,"journal":{"name":"2023 6th International Symposium on Autonomous Systems (ISAS)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 6th International Symposium on Autonomous Systems (ISAS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISAS59543.2023.10164333","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Reinforcement learning algorithms have attained noteworthy accomplishments in the field of continuous control. The DDPG algorithm, one of the classic algorithms in continuous control, is widely used but has been shown to be susceptible to overestimation. To address this, the TD3 algorithm was introduced, incorporating the idea of double DQN: TD3 takes the minimum of a pair of critics to restrict overestimation. Nevertheless, TD3 may introduce an underestimation bias. To mitigate the impact of these errors, we present a novel approach that integrates Swap Softmax with TD3, counterbalancing extreme value estimates. We assess the efficacy of the proposed technique on continuous control tasks simulated in MuJoCo and provided by OpenAI Gym. Our experimental findings demonstrate a significant improvement in performance and robustness.
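For context, below is a minimal PyTorch sketch of the clipped double-Q target that TD3 uses, alongside an illustrative softmax-weighted aggregation in the spirit of softening the hard minimum. The abstract does not specify the exact Swap Softmax operator, so the `softmax_weighted_target` function, its `beta` temperature, and all names here are assumptions for illustration only, not the paper's method.

```python
# Sketch: TD3's clipped double-Q target vs. a softmax-weighted alternative.
# The softmax variant is a hypothetical illustration of softening the hard min;
# the paper's actual Swap Softmax operator is defined in the paper itself.
import torch


def td3_target(q1_next, q2_next, reward, done, gamma=0.99):
    """Clipped double-Q target: the element-wise min over two critics
    restricts overestimation (but can underestimate)."""
    q_min = torch.min(q1_next, q2_next)
    return reward + gamma * (1.0 - done) * q_min


def softmax_weighted_target(q1_next, q2_next, reward, done,
                            gamma=0.99, beta=5.0):
    """Illustrative softmax aggregation over the two critic estimates.

    Weights each critic by softmax(-beta * Q), which interpolates between
    the mean of the two estimates (beta -> 0) and the hard min (beta -> inf),
    trading off over- and underestimation. `beta` is an assumed parameter.
    """
    q = torch.stack([q1_next, q2_next], dim=0)   # shape (2, batch)
    w = torch.softmax(-beta * q, dim=0)          # larger weight on the smaller estimate
    q_soft = (w * q).sum(dim=0)
    return reward + gamma * (1.0 - done) * q_soft


# Usage example with toy next-state Q-values, rewards, and done flags.
q1 = torch.tensor([1.0, 2.0])
q2 = torch.tensor([1.5, 1.0])
r = torch.tensor([0.1, 0.1])
d = torch.tensor([0.0, 1.0])
print(td3_target(q1, q2, r, d))               # target built from min(q1, q2)
print(softmax_weighted_target(q1, q2, r, d))  # smoother aggregation of the pair
```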