Occupation Measure Heuristics to Solve Stochastic Shortest Path with Dead Ends

2018 7th Brazilian Conference on Intelligent Systems (BRACIS). Pub Date: 2018-10-01. DOI: 10.1109/bracis.2018.00096

Milton Condori Fernández, Leliane Nunes de Barros, Karina Valdivia Delgado

Abstract

The most efficient approach to solving probabilistic planning problems models them as stochastic shortest path (SSP) problems and uses heuristic search to find a policy that minimizes the expected accumulated cost to the goal (the MINCOST criterion). However, this approach can solve problems with dead ends (states from which the goal cannot be reached) only if an efficient dead-end detection heuristic is used. An alternative is to plan in two phases: first maximize the probability of reaching the goal (MAXPROB), then minimize the expected cost (MINCOST). While several heuristics exist for MINCOST, there are no known efficient heuristics for MAXPROB. Recent work proposes the first heuristic that takes probabilities into account, called h_pom, which solves a relaxed version of an SSP as a linear program in the dual space. However, to solve large problems with dead ends, h_pom must be augmented with a dead-end detection heuristic (e.g., h_pom combined with h_max). In this work, we propose two new heuristics based on h_pom. The first, h^pe_pom(s), estimates the minimal cost of reaching the goal from state s; it includes new variables and constraints for the dead ends and adds an expected penalty for reaching them. The second, h^p_pom(s), estimates the maximum probability of reaching the goal from state s and is used to solve MAXPROB problems by ignoring action costs; we claim that h^p_pom(s) is the first heuristic for MAXPROB. Empirical results show that h^pe_pom can solve larger planning instances than h_pom combined with h_max.
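
To make the two criteria concrete, the sketch below computes them exactly on a tiny hand-made SSP with one dead end. It is not an implementation of the heuristics proposed in the paper, which are based on a linear program over occupation measures; it uses plain value iteration instead, and only shows the quantities those heuristics are meant to estimate. The toy problem, the state and action names, the penalty values, and the function names are all assumptions made for illustration.

```python
# Hypothetical toy SSP: state -> action -> (cost, {successor: probability}).
GOAL = "g"
SSP = {
    "s0": {
        "risky": (1.0, {"g": 0.8, "d": 0.2}),  # cheap, may hit the dead end
        "safe": (5.0, {"g": 1.0}),             # expensive, always reaches the goal
    },
    "d": {"stay": (1.0, {"d": 1.0})},          # dead end: the goal is unreachable
    "g": {},
}


def max_prob(ssp, goal, sweeps=1000):
    """MAXPROB: maximum probability of reaching the goal (costs are ignored)."""
    p = {s: (1.0 if s == goal else 0.0) for s in ssp}
    for _ in range(sweeps):
        for s, actions in ssp.items():
            if s == goal:
                continue
            p[s] = max(
                sum(pr * p[t] for t, pr in succ.items())
                for _cost, succ in actions.values()
            )
    return p


def min_cost_with_penalty(ssp, goal, penalty, sweeps=1000):
    """MINCOST where every state value is capped at a finite dead-end penalty."""
    v = {s: 0.0 for s in ssp}
    for _ in range(sweeps):
        for s, actions in ssp.items():
            if s == goal:
                continue
            best = min(
                cost + sum(pr * v[t] for t, pr in succ.items())
                for cost, succ in actions.values()
            )
            v[s] = min(penalty, best)  # giving up costs exactly `penalty`
    return v


if __name__ == "__main__":
    print("MAXPROB:", max_prob(SSP, GOAL))
    for penalty in (10.0, 100.0):
        print("penalty", penalty, min_cost_with_penalty(SSP, GOAL, penalty))
```

With a small penalty the cost-only criterion accepts the risky action from `s0`, while a large penalty (or a first MAXPROB phase, which reports that the goal is reachable with probability 1) steers the policy to the safe action. This is the trade-off that a penalty-based formulation and a two-phase formulation handle differently.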
