{"title":"基于经验分类的深度强化学习记忆减少与优先经验回放","authors":"Kai-Huan Shen, P. Tsai","doi":"10.1109/SiPS47522.2019.9020610","DOIUrl":null,"url":null,"abstract":"Prioritized experience replay has been widely used in many online reinforcement learning algorithms, providing high efficiency in exploiting past experiences. However, a large replay buffer consumes system storage significantly. Thus, in this paper, a segmentation and classification scheme is proposed. The distribution of temporal-difference errors (TD errors) is first segmented. The experience for network training is classified according to its updated TD error. Then, a swap mechanism for similar experiences is implemented to change the lifetimes of experiences in the replay buffer. The proposed scheme is incorporated in the Deep Deterministic Policy Gradient (DDPG) algorithm, and the Inverted Pendulum and Inverted Double Pendulum tasks are used for verification. From the experiments, our proposed mechanism can effectively remove the buffer redundancy and further reduce the correlation of experiences in the replay buffer. Thus, better learning performance with reduced memory size is achieved at the cost of additional computations of updated TD errors.","PeriodicalId":256971,"journal":{"name":"2019 IEEE International Workshop on Signal Processing Systems (SiPS)","volume":"23 6","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Memory Reduction through Experience Classification f or Deep Reinforcement Learning with Prioritized Experience Replay\",\"authors\":\"Kai-Huan Shen, P. Tsai\",\"doi\":\"10.1109/SiPS47522.2019.9020610\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Prioritized experience replay has been widely used in many online reinforcement learning algorithms, providing high efficiency in exploiting past experiences. However, a large replay buffer consumes system storage significantly. Thus, in this paper, a segmentation and classification scheme is proposed. The distribution of temporal-difference errors (TD errors) is first segmented. The experience for network training is classified according to its updated TD error. Then, a swap mechanism for similar experiences is implemented to change the lifetimes of experiences in the replay buffer. The proposed scheme is incorporated in the Deep Deterministic Policy Gradient (DDPG) algorithm, and the Inverted Pendulum and Inverted Double Pendulum tasks are used for verification. From the experiments, our proposed mechanism can effectively remove the buffer redundancy and further reduce the correlation of experiences in the replay buffer. 
Thus, better learning performance with reduced memory size is achieved at the cost of additional computations of updated TD errors.\",\"PeriodicalId\":256971,\"journal\":{\"name\":\"2019 IEEE International Workshop on Signal Processing Systems (SiPS)\",\"volume\":\"23 6\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE International Workshop on Signal Processing Systems (SiPS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SiPS47522.2019.9020610\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE International Workshop on Signal Processing Systems (SiPS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SiPS47522.2019.9020610","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Prioritized experience replay has been widely used in many online reinforcement learning algorithms, providing high efficiency in exploiting past experiences. However, a large replay buffer consumes a significant amount of system storage. Thus, in this paper, a segmentation and classification scheme is proposed. The distribution of temporal-difference errors (TD errors) is first segmented. Each experience used for network training is classified according to its updated TD error. Then, a swap mechanism for similar experiences is implemented to change the lifetimes of experiences in the replay buffer. The proposed scheme is incorporated into the Deep Deterministic Policy Gradient (DDPG) algorithm, and the Inverted Pendulum and Inverted Double Pendulum tasks are used for verification. Experiments show that the proposed mechanism effectively removes buffer redundancy and further reduces the correlation of experiences in the replay buffer. Thus, better learning performance with reduced memory size is achieved at the cost of additional computations of updated TD errors.
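
The paper itself provides no code; the following is a minimal, illustrative Python sketch of the general idea described in the abstract: the |TD error| axis is segmented into classes, each experience is filed into the class of its updated TD error, and a new experience that lands in an already-full class swaps out a similar (same-class) one instead of growing the buffer, while sampling remains priority-weighted as in standard prioritized experience replay. The class boundaries, per-class capacities, similarity test, and all names here are assumptions for illustration, not the authors' implementation.

```python
import random
import numpy as np

class ClassifiedReplayBuffer:
    """Illustrative buffer: experiences are grouped by |TD error| segment;
    a new experience entering a full segment overwrites an old one there
    (the "swap"), shortening the lifetime of redundant, similar experiences.
    Boundaries and capacity split are hypothetical choices."""

    def __init__(self, capacity=10_000, boundaries=(0.1, 0.5, 1.0)):
        self.capacity = capacity
        self.boundaries = boundaries                    # segments the |TD error| axis
        self.classes = [[] for _ in range(len(boundaries) + 1)]

    def _class_of(self, td_error):
        # Index of the segment that |TD error| falls into.
        return int(np.searchsorted(self.boundaries, abs(td_error)))

    def add(self, experience, td_error):
        c = self._class_of(td_error)
        per_class_cap = self.capacity // len(self.classes)
        bucket = self.classes[c]
        if len(bucket) < per_class_cap:
            bucket.append((experience, abs(td_error)))
        else:
            # Swap mechanism: replace a same-class experience instead of
            # enlarging the buffer, removing redundancy among similar samples.
            bucket[random.randrange(per_class_cap)] = (experience, abs(td_error))

    def sample(self, batch_size):
        # Priority proportional to |TD error| + eps, as in standard
        # prioritized experience replay.
        items = [it for bucket in self.classes for it in bucket]
        prios = np.array([p for _, p in items]) + 1e-6
        probs = prios / prios.sum()
        idx = np.random.choice(len(items), size=batch_size, p=probs)
        return [items[i][0] for i in idx]
```

In the setting the abstract describes, the "updated" TD errors would be recomputed for replayed experiences after each DDPG training step and the affected entries reclassified; that recomputation is the additional cost the abstract refers to.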