Noah Topper, George K. Atia, Ashutosh Trivedi, Alvaro Velasquez
{"title":"Active Grammatical Inference for Non-Markovian Planning","authors":"Noah Topper, George K. Atia, Ashutosh Trivedi, Alvaro Velasquez","doi":"10.1609/icaps.v32i1.19853","DOIUrl":null,"url":null,"abstract":"Planning in finite stochastic environments is canonically posed as a Markov decision process where the transition and reward structures are explicitly known. Reinforcement learning (RL) lifts the explicitness assumption by working with sampling models instead. Further, with the advent of reward machines, we can relax the Markovian assumption on the reward. Angluin's active grammatical inference algorithm L* has found novel application in explicating reward machines for non-Markovian RL. We propose maintaining the assumption of explicit transition dynamics, but with an implicit non-Markovian reward signal, which must be inferred from experiments. We call this setting non-Markovian planning, as opposed to non-Markovian RL. The proposed approach leverages L* to explicate an automaton structure for the underlying planning objective. We exploit the environment model to learn an automaton faster and integrate it with value iteration to accelerate the planning. We compare against recent non-Markovian RL solutions which leverage grammatical inference, and establish complexity results that illustrate the difference in runtime between grammatical inference in planning and RL settings.","PeriodicalId":239898,"journal":{"name":"International Conference on Automated Planning and Scheduling","volume":"59 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Conference on Automated Planning and Scheduling","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1609/icaps.v32i1.19853","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
Planning in finite stochastic environments is canonically posed as a Markov decision process where the transition and reward structures are explicitly known. Reinforcement learning (RL) lifts the explicitness assumption by working with sampling models instead. Further, with the advent of reward machines, we can relax the Markovian assumption on the reward. Angluin's active grammatical inference algorithm L* has found novel application in explicating reward machines for non-Markovian RL. We propose maintaining the assumption of explicit transition dynamics, but with an implicit non-Markovian reward signal, which must be inferred from experiments. We call this setting non-Markovian planning, as opposed to non-Markovian RL. The proposed approach leverages L* to explicate an automaton structure for the underlying planning objective. We exploit the environment model to learn an automaton faster and integrate it with value iteration to accelerate the planning. We compare against recent non-Markovian RL solutions which leverage grammatical inference, and establish complexity results that illustrate the difference in runtime between grammatical inference in planning and RL settings.