Historical Best Q-Networks for Deep Reinforcement Learning
Wenwu Yu, Rui Wang, Ruiying Li, Jing Gao, Xiaohui Hu
2018 IEEE 30th International Conference on Tools with Artificial Intelligence (ICTAI), November 2018. DOI: 10.1109/ICTAI.2018.00012
The popular DQN algorithm is known to suffer from instability and high variance, which can degrade its performance. In prior work there is only a single target network, which is periodically updated with the latest learned Q-value estimates. In this paper, we present an extension to Deep Q-Networks (DQN) that uses multiple target networks. From the previously learned Q-value estimation networks, we select the several networks that have performed best so far as auxiliary networks. To decide which networks are better, we use the score of each episode as a measure of network quality. The key idea behind our method is that each auxiliary network is good at handling certain states and can guide the agent toward the right choices. We apply our method to Atari 2600 games from the OpenAI Gym and find that DQN with auxiliary networks significantly improves both the performance and the stability of the agent.
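The abstract describes the method only at a high level, so the following is a minimal, hypothetical Python/PyTorch sketch of how a pool of "historical best" Q-network snapshots could be maintained and consulted. The episode-score ranking comes from the abstract; the pool size, the `HistoricalBestPool` and `bootstrap_target` names, and the element-wise-max aggregation of Q-estimates in the bootstrap target are illustrative assumptions, not details confirmed by the paper.

```python
# Hedged sketch of the "historical best Q-networks" idea from the abstract.
# Assumptions (not taken from the paper): the pool keeps the K snapshots with the
# highest episode scores, and the bootstrap target takes the element-wise maximum
# over the Q-values of the standard target network and the auxiliary networks.
import copy
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """A small MLP Q-network; the paper's DQN uses convolutions on Atari frames."""

    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


class HistoricalBestPool:
    """Keep the K network snapshots that achieved the highest episode scores."""

    def __init__(self, capacity: int = 3):
        self.capacity = capacity
        self.pool: list[tuple[float, QNetwork]] = []  # (episode_score, frozen snapshot)

    def maybe_add(self, episode_score: float, online_net: QNetwork) -> None:
        snapshot = copy.deepcopy(online_net).eval()
        for p in snapshot.parameters():
            p.requires_grad_(False)
        self.pool.append((episode_score, snapshot))
        # Keep only the best `capacity` snapshots, ranked by episode score.
        self.pool.sort(key=lambda item: item[0], reverse=True)
        self.pool = self.pool[: self.capacity]

    def networks(self) -> list[QNetwork]:
        return [net for _, net in self.pool]


def bootstrap_target(
    reward: torch.Tensor,
    next_obs: torch.Tensor,
    done: torch.Tensor,
    target_net: QNetwork,
    pool: HistoricalBestPool,
    gamma: float = 0.99,
) -> torch.Tensor:
    """One plausible way to let auxiliary networks guide the update (assumption)."""
    with torch.no_grad():
        candidates = [target_net(next_obs)] + [net(next_obs) for net in pool.networks()]
        # Element-wise max over all candidate Q-estimates, then max over actions.
        best_q = torch.stack(candidates, dim=0).max(dim=0).values.max(dim=1).values
        return reward + gamma * (1.0 - done) * best_q
```

Ranking snapshots by episode score means the pool is refreshed whenever the online network completes an unusually good episode, which matches the quality criterion the abstract describes; how the auxiliary estimates are combined into the learning target is the part left open here.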