Enhancing supervisory training signals with environmental reinforcement learning using adaptive dynamic programming and artificial neural networks
N. Melton, D. Wunsch
2016 IEEE 15th International Conference on Cognitive Informatics & Cognitive Computing (ICCI*CC), 2016-08-01
DOI: 10.1109/ICCI-CC.2016.7862056 (https://doi.org/10.1109/ICCI-CC.2016.7862056)
Citations: 0
Abstract
A method for hybridizing supervised learning with adaptive dynamic programming was developed to increase the speed, quality, and robustness of on-line neural network learning from an imperfect teacher. Reinforcement learning is used to modify and enhance the original supervisory signal before learning occurs. This paper describes the method of hybridization and presents a model problem in which a human supervisor teaches a simulated car to drive around a race track. Simulation results show successful learning and improvements in convergence time, error rate, and stability over either component method alone.
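The abstract does not give the update rule, so the following is only a minimal sketch of the general idea of adjusting a teacher's signal with a reinforcement-style cost before supervised learning uses it. Everything in it is an assumption made for illustration and is not taken from the paper: the one-dimensional "lateral offset" state, the toy dynamics `x_next = x + a`, the fixed quadratic `critic_cost` (standing in for a critic that adaptive dynamic programming would learn), the step size `eta`, and the linear policy trained by LMS instead of a neural network.

```python
import random


def critic_cost(x_next):
    """Stand-in critic (assumption): squared distance from the track centerline."""
    return x_next ** 2


def enhance_target(x, a_teacher, eta=0.25):
    """Shift the teacher's action one gradient step down the critic's cost.

    Assumes toy dynamics x_next = x + a, so d(cost)/da = 2 * (x + a).
    """
    grad = 2.0 * (x + a_teacher)
    return a_teacher - eta * grad


def train(use_enhancement, steps=2000, lr=0.05, seed=0):
    """Fit a linear policy a = w * x by LMS on raw or enhanced teacher targets."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0)
        # Hypothetical imperfect teacher: under-steers (gain -0.3, ideal -1) and is noisy.
        a_teacher = -0.3 * x + rng.gauss(0.0, 0.1)
        target = enhance_target(x, a_teacher) if use_enhancement else a_teacher
        # Standard LMS step toward the chosen target.
        w += lr * (target - w * x) * x
    return w


if __name__ == "__main__":
    print("cost after raw teacher action:", critic_cost(1.0 + 0.0))
    print("cost after enhanced action:", critic_cost(1.0 + enhance_target(1.0, 0.0)))
    print("plain supervised gain:", train(False))
    print("enhanced-target gain:", train(True))
```

In this toy setting the enhanced targets pull the learned gain closer to the ideal value of -1 than the raw teacher signal does. This illustrates the motivation stated in the abstract, not the paper's reported results.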