Training reinforcement learning models via an adversarial evolutionary algorithm

M. Coletti, Chathika Gunaratne, Catherine D. Schuman, Robert M. Patton

Workshop Proceedings of the 51st International Conference on Parallel Processing, 2022-08-29. DOI: 10.1145/3547276.3548635 (https://doi.org/10.1145/3547276.3548635)
When training for control problems, using more episodes in training usually leads to better generalizability, but more episodes also require significantly more training time. There are a variety of approaches for selecting training episodes, including fixed episodes, uniform sampling, and stochastic sampling, but all of them can leave gaps in the training landscape. In this work, we describe an approach that leverages an adversarial evolutionary algorithm to identify the worst-performing states for a given model. We then use information about these states in the next cycle of training, repeating the process until the desired level of model performance is met. We demonstrate this approach on the OpenAI Gym cart-pole problem. We show that, compared with stochastic sampling, the adversarial evolutionary algorithm did not reduce the number of training episodes needed to attain model generalizability, and in fact performed slightly worse.
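The abstract outlines an adversarial loop: evolve cart-pole initial states that the current policy handles worst, then fold those states into the next training cycle. The paper's own implementation is not reproduced here; the following is only a minimal sketch of that idea, assuming the classic gym CartPole-v1 API (pre-0.26, four-value step return), a placeholder `policy(obs)` callable, and a simple mutation-only evolutionary search. None of these names, bounds, or hyperparameters come from the paper.

```python
import numpy as np
import gym


def evaluate(policy, init_state, max_steps=500):
    """Run one cart-pole episode from a forced initial state; return total reward."""
    env = gym.make("CartPole-v1")
    env.reset()
    # Assumption: classic-control CartPole exposes its state on the unwrapped env.
    env.unwrapped.state = np.array(init_state, dtype=np.float64)
    obs = np.array(init_state, dtype=np.float64)
    total = 0.0
    for _ in range(max_steps):
        # Old gym API: (obs, reward, done, info); newer gym returns five values.
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
        if done:
            break
    env.close()
    return total


def adversarial_search(policy, pop_size=32, generations=20, bounds=0.1):
    """Evolve cart-pole initial states on which the given policy performs worst."""
    # Each individual is a 4-dim initial state: [x, x_dot, theta, theta_dot].
    pop = np.random.uniform(-bounds, bounds, size=(pop_size, 4))
    for _ in range(generations):
        fitness = np.array([evaluate(policy, ind) for ind in pop])
        # Lower episode reward = better adversarial fitness; keep the worst-handled half.
        parents = pop[np.argsort(fitness)[: pop_size // 2]]
        # Gaussian mutation produces the other half of the next generation.
        children = np.clip(parents + np.random.normal(0.0, 0.02, size=parents.shape),
                           -bounds, bounds)
        pop = np.vstack([parents, children])
    fitness = np.array([evaluate(policy, ind) for ind in pop])
    # Return the hardest states to seed the next training cycle.
    return pop[np.argsort(fitness)[: pop_size // 4]]
```

In this sketch, the returned hardest states would seed the episode set for the next round of policy training, and the search-and-retrain cycle would repeat until the policy reaches the target performance, mirroring the loop described in the abstract.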