{"title":"Curious model-building control systems","authors":"J. Schmidhuber","doi":"10.1109/IJCNN.1991.170605","DOIUrl":null,"url":null,"abstract":"A novel curious model-building control system is described which actively tries to provoke situations for which it learned to expect to learn something about the environment. Such a system has been implemented as a four-network system based on Watkins' Q-learning algorithm which can be used to maximize the expectation of the temporal derivative of the adaptive assumed reliability of future predictions. An experiment with an artificial nondeterministic environment demonstrates that the system can be superior to previous model-building control systems, which do not address the problem of modeling the reliability of the world model's predictions in uncertain environments and use ad-hoc methods (like random search) to train the world model.<<ETX>>","PeriodicalId":211135,"journal":{"name":"[Proceedings] 1991 IEEE International Joint Conference on Neural Networks","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"1991-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"662","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"[Proceedings] 1991 IEEE International Joint Conference on Neural Networks","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IJCNN.1991.170605","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 662
Abstract
A novel curious model-building control system is described which actively tries to provoke situations for which it learned to expect to learn something about the environment. Such a system has been implemented as a four-network system based on Watkins' Q-learning algorithm which can be used to maximize the expectation of the temporal derivative of the adaptive assumed reliability of future predictions. An experiment with an artificial nondeterministic environment demonstrates that the system can be superior to previous model-building control systems, which do not address the problem of modeling the reliability of the world model's predictions in uncertain environments and use ad-hoc methods (like random search) to train the world model.<>