Occupation Measure Heuristics to Solve Stochastic Shortest Path with Dead Ends
Milton Condori Fernández, Leliane Nunes de Barros, Karina Valdivia Delgado
2018 7th Brazilian Conference on Intelligent Systems (BRACIS), October 2018
DOI: 10.1109/bracis.2018.00096
Abstract
The most efficient approach to solving probabilistic planning problems is based on the stochastic shortest path (SSP) model and uses heuristic search to find a policy that minimizes the expected accumulated cost of reaching the goal (the MINCOST criterion). However, this approach can solve problems with dead ends (states from which the goal cannot be reached) only if an efficient dead-end detection heuristic is used. An alternative is to plan in two phases: first maximize the probability of reaching the goal (MAXPROB), then minimize the expected cost (MINCOST). While several heuristics exist for MINCOST, no efficient heuristics are known for MAXPROB. A recent work proposes the first heuristic that takes probabilities into account, called h_pom, which solves a relaxed version of an SSP as a linear program in the dual space. However, to solve large problems with dead ends, h_pom must be combined with a dead-end detection heuristic (e.g., h_max, yielding the combination h_pom + h_max). In this work, we propose two new heuristics based on h_pom. The first, h^pe_pom(s), estimates the minimum expected cost of reaching the goal from state s; it adds new variables and constraints for dead ends and charges an expected penalty for reaching them. The second, h^p_pom(s), estimates the maximum probability of reaching the goal from state s and is used to solve MAXPROB problems by ignoring action costs; we claim that h^p_pom(s) is the first heuristic for MAXPROB. Empirical results show that h^pe_pom can solve larger planning instances than h_pom + h_max.
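For readers unfamiliar with the "dual space" mentioned above, the following sketch gives the standard occupation-measure (dual) linear program of an SSP, as we understand it from the literature that h_pom builds on. It is the exact LP, not the relaxation that h_pom actually solves, and the notation is ours, not the abstract's: x_{s,a} is the expected number of times action a is applied in state s, C(s,a) the action cost, P(s' | s,a) the transition probability, G the goal set and s_0 the initial state.

```latex
\begin{aligned}
\min_{x}\quad & \sum_{s \in S,\, a \in A(s)} x_{s,a}\, C(s,a) \\
\text{s.t.}\quad & x_{s,a} \ge 0 && \forall s \in S,\ a \in A(s) \\
& \mathrm{in}(s) = \sum_{s' \in S,\, a \in A(s')} x_{s',a}\, P(s \mid s', a) && \forall s \in S \\
& \mathrm{out}(s) = \sum_{a \in A(s)} x_{s,a} && \forall s \in S \setminus G \\
& \mathrm{out}(s) - \mathrm{in}(s) = [\, s = s_0 \,] && \forall s \in S \setminus G \\
& \sum_{g \in G} \mathrm{in}(g) = 1
\end{aligned}
```

Each feasible x describes the expected flow of a (possibly stochastic) policy that reaches G with probability 1, and the objective is that policy's expected cost. If every policy reaches a dead end with positive probability, no such flow exists and the LP is infeasible, which is one way to see why dead ends need special treatment.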
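To make the MAXPROB criterion concrete, here is a minimal Python sketch that runs plain value iteration on a hypothetical four-state toy problem. It is not the h^p_pom heuristic (whose LP is not given in this abstract); the model, the names TRANSITIONS and maxprob, and the convergence threshold are assumptions for illustration. Starting from 0 in every non-goal state, the iteration converges from below to the maximum probability of reaching the goal.

```python
# Hypothetical toy SSP (illustration only): initial state s0, goal g, dead end d.
# TRANSITIONS[state][action] = list of (probability, successor); costs are ignored.
TRANSITIONS = {
    "s0": {"safe": [(1.0, "s1")], "risky": [(0.5, "g"), (0.5, "d")]},
    "s1": {"go": [(0.9, "g"), (0.1, "d")]},
    "d": {"stay": [(1.0, "d")]},  # no action ever leaves d, so d is a dead end
}
GOALS = {"g"}


def maxprob(transitions, goals, eps=1e-10):
    """Value iteration for MAXPROB: P(s) = max_a sum_s' T(s, a, s') * P(s')."""
    states = set(transitions) | goals
    # Start from below (1 at goals, 0 elsewhere) so we converge to the least fixed point.
    prob = {s: (1.0 if s in goals else 0.0) for s in states}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            best = max(sum(p * prob[t] for p, t in outs) for outs in actions.values())
            delta = max(delta, abs(best - prob[s]))
            prob[s] = best  # in-place (Gauss-Seidel style) update
        if delta < eps:
            return prob


if __name__ == "__main__":
    print(maxprob(TRANSITIONS, GOALS))
```

In this toy model s0 and s1 both reach the goal with probability at most 0.9 and d with probability 0. Every policy therefore hits a dead end with positive probability, so the plain MINCOST expected cost is infinite for all of them, while MAXPROB still prefers "safe" over "risky" in s0.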
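The idea of charging a penalty for reaching dead ends can be illustrated with a finite-penalty MINCOST criterion, in which reaching a dead end ends execution with a fixed cost. The minimal Python sketch below solves that criterion by value iteration on a hypothetical toy problem. It is not the h^pe_pom linear program. The model, the unit action cost, the assumption that dead ends are already known, and the names penalized_mincost and DEAD_ENDS are illustrative choices.

```python
# Hypothetical toy SSP (illustration only): every action costs 1, g is the goal.
TRANSITIONS = {
    "s0": {"safe": [(1.0, "s1")], "risky": [(0.5, "g"), (0.5, "d")]},
    "s1": {"go": [(0.9, "g"), (0.1, "d")]},
}
GOALS = {"g"}
DEAD_ENDS = {"d"}  # Assumption: dead ends were detected beforehand.


def penalized_mincost(transitions, goals, dead_ends, penalty, cost=1.0, eps=1e-10):
    """Value iteration where reaching a dead end terminates with a fixed penalty.

    V(g) = 0 for goals, V(d) = penalty for dead ends, and for other states
    V(s) = min_a [cost + sum_s' T(s, a, s') * V(s')].
    """
    value = {s: 0.0 for s in set(transitions) | goals}
    value.update({d: penalty for d in dead_ends})  # dead ends are terminal
    policy = {}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            # Q-value of each action under the current value estimates.
            q = {a: cost + sum(p * value[t] for p, t in outs) for a, outs in actions.items()}
            best_action = min(q, key=q.get)
            delta = max(delta, abs(q[best_action] - value[s]))
            value[s], policy[s] = q[best_action], best_action
        if delta < eps:
            return value, policy


if __name__ == "__main__":
    for d_cost in (10.0, 1.0):
        print(d_cost, penalized_mincost(TRANSITIONS, GOALS, DEAD_ENDS, d_cost))
```

With penalty 10 the best action in s0 is "safe" (expected cost 3). With penalty 1 the cheaper "risky" action becomes optimal (expected cost 1.5). This shows how the penalty value trades expected cost against the risk of reaching a dead end.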