Approximating the Value of Collaborative Team Actions for Efficient Multiagent Navigation in Uncertain Graphs
Pub Date: 2023-07-01, DOI: 10.1609/icaps.v33i1.27250
M. Stadler, Jacopo Banfi, Nicholas A. Roy
For a team of collaborative agents navigating through an unknown environment, collaborative actions such as sensing the traversability of a route can have a large impact on aggregate team performance. However, planning over the full space of joint team actions is generally computationally intractable. Furthermore, typically only a small number of collaborative actions are useful for a given team task, but it is not obvious how to assess the usefulness of a given action. In this work, we model collaborative team policies on stochastic graphs using macro-actions, where each macro-action for a given agent can consist of a sequence of movements, sensing actions, and actions of waiting to receive information from other agents. To reduce the number of macro-actions considered during planning, we generate optimistic approximations of candidate future team states, then restrict the planning domain to a small policy class consisting only of macro-actions that are likely to lead to high-reward future team states. We optimize team plans over this small policy class and demonstrate, in toy graph and island road network domains, that the approach enables a team to find policies that actively balance reducing task-relevant environmental uncertainty against efficiently navigating to goals, yielding better plans than policies that do not act to reduce environmental uncertainty.
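A minimal sketch of the pruning idea described in this abstract: score each candidate macro-action by an optimistic estimate of the future team state it could lead to, and keep only the most promising ones for planning. The state representation, the optimistic_value callable, and the keep_k cutoff are illustrative assumptions, not the authors' implementation.

```python
def restrict_policy_class(candidate_macro_actions, team_state,
                          optimistic_value, keep_k=5):
    """Keep only the macro-actions whose optimistically approximated
    future team state has the highest value (illustrative sketch)."""
    scored = [(optimistic_value(team_state, ma), ma)
              for ma in candidate_macro_actions]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # best estimates first
    return [ma for _, ma in scored[:keep_k]]
```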
{"title":"Approximating the Value of Collaborative Team Actions for Efficient Multiagent Navigation in Uncertain Graphs","authors":"M. Stadler, Jacopo Banfi, Nicholas A. Roy","doi":"10.1609/icaps.v33i1.27250","DOIUrl":"https://doi.org/10.1609/icaps.v33i1.27250","url":null,"abstract":"For a team of collaborative agents navigating through an unknown environment, collaborative actions such as sensing the traversability of a route can have a large impact on aggregate team performance. However, planning over the full space of joint team actions is generally computationally intractable. Furthermore, typically only a small number of collaborative actions is useful for a given team task, but it is not obvious how to assess the usefulness of a given action. In this work, we model collaborative team policies on stochastic graphs using macro-actions, where each macro-action for a given agent can consist of a sequence of movements, sensing actions, and actions of waiting to receive information from other agents. To reduce the number of macro-actions considered during planning, we generate optimistic approximations of candidate future team states, then restrict the planning domain to a small policy class which consists of only macro-actions which are likely to lead to high-reward future team states. We optimize team plans over the small policy class, and demonstrate that the approach enables a team to find policies which actively balance between reducing task-relevant environmental uncertainty and efficiently navigating to goals in toy graph and island road network domains, finding better plans than policies that do not act to reduce environmental uncertainty.","PeriodicalId":239898,"journal":{"name":"International Conference on Automated Planning and Scheduling","volume":"382 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133455968","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Goal Recognition with Timing Information
Pub Date: 2023-07-01, DOI: 10.1609/icaps.v33i1.27224
Chenyuan Zhang, Charles Kemp, N. Lipovetzky
Goal recognition has been extensively studied by AI researchers, but most algorithms take only observed actions as input. Here we argue that the time taken to carry out these actions provides an additional signal that supports goal recognition. We present a behavioral experiment confirming that people use timing information in this way, and we develop and evaluate a goal recognition algorithm that is sensitive to both actions and timing information. Our results on both synthetic and human data suggest that existing goal recognition algorithms can be improved by incorporating a model of planning time, and that these improvements can be substantial in scenarios in which relatively few actions have been observed.
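As a hedged illustration of how timing can be folded into goal recognition alongside actions, the sketch below combines per-goal likelihoods of the observed actions and of their durations in a simple Bayesian posterior; the likelihood models and prior are hypothetical stand-ins, not the paper's model of planning time.

```python
import math

def posterior_over_goals(goals, observed_actions, observed_times,
                         action_likelihood, timing_likelihood, prior):
    """P(goal | actions, times) proportional to
    P(actions | goal) * P(times | actions, goal) * P(goal)."""
    log_scores = {}
    for g in goals:
        log_p = math.log(prior(g))
        for a, t in zip(observed_actions, observed_times):
            log_p += math.log(action_likelihood(a, g))   # action evidence
            log_p += math.log(timing_likelihood(t, a, g))  # timing evidence
        log_scores[g] = log_p
    # Normalise in a numerically stable way.
    m = max(log_scores.values())
    unnorm = {g: math.exp(lp - m) for g, lp in log_scores.items()}
    z = sum(unnorm.values())
    return {g: p / z for g, p in unnorm.items()}
```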
{"title":"Goal Recognition with Timing Information","authors":"Chenyuan Zhang, Charles Kemp, N. Lipovetzky","doi":"10.1609/icaps.v33i1.27224","DOIUrl":"https://doi.org/10.1609/icaps.v33i1.27224","url":null,"abstract":"Goal recognition has been extensively studied by AI researchers, but most algorithms take only observed actions as input. Here we argue that the time taken to carry out these actions provides an additional signal that supports goal recognition. We present a behavioral experiment confirming that people use timing information in this way, and develop and evaluate a goal recognition algorithm that is sensitive to both actions and timing information. Our results suggest that existing goal recognition algorithms can be improved by incorporating a model of planning time on both synthetic data and human data, and that these improvements can be substantial in scenarios in which relatively few actions have been observed.","PeriodicalId":239898,"journal":{"name":"International Conference on Automated Planning and Scheduling","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123156162","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Parallel Batch Processing for the Coating Problem
Pub Date: 2023-07-01, DOI: 10.1609/icaps.v33i1.27192
M. Horn, Emir Demirovic, N. Yorke-Smith
We solve a challenging scheduling problem with parallel batch processing and two-dimensional shelf strip packing constraints that arises in the tool coating field. Tools are assembled on so-called planetaries (batches) before they are loaded into coating machines to be coated. The assembly is not trivial and must fulfil specific constraints, which we refer to as shelf strip packing constraints. Further, each tool is associated with a starting time window such that tools can only be put on the same planetary if their time windows overlap. The objective is to minimise the makespan and the number of required planetaries. Since the problem naturally decomposes into scheduling and packing parts, we tackle it with a two-phase logic-based Benders decomposition approach. The master problem assigns items to batches. The first phase solves the packing subproblem, checking whether the assignment is feasible, while the second phase solves the scheduling subproblem. The approach is compared with a monolithic mixed integer linear programming approach as well as a monolithic constraint programming approach. Experimental evaluation shows that our proposed approach outperforms the state-of-the-art benchmarks by solving more instances to optimality in a shorter time.
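The decomposition loop can be pictured roughly as follows; the callables for the master problem, the packing check, the cut generation, and the scheduling subproblem stand in for the MILP/CP components and are assumptions about the interfaces, not the authors' code.

```python
def two_phase_benders(instance, solve_master, packing_feasible, packing_cut,
                      solve_scheduling, max_iters=1000):
    """Schematic logic-based Benders loop: assign items to batches, reject
    assignments whose batches cannot be packed, then schedule the batches."""
    cuts = []
    for _ in range(max_iters):
        assignment = solve_master(instance, cuts)   # items -> batches
        # Phase 1: every batch must admit a feasible shelf strip packing.
        bad = [batch for batch in assignment if not packing_feasible(batch)]
        if bad:
            cuts.extend(packing_cut(batch) for batch in bad)
            continue
        # Phase 2: schedule the packable batches on the coating machines.
        schedule, scheduling_cuts = solve_scheduling(instance, assignment)
        if not scheduling_cuts:
            return assignment, schedule             # no cuts: proven optimal
        cuts.extend(scheduling_cuts)
    return None
```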
{"title":"Parallel Batch Processing for the Coating Problem","authors":"M. Horn, Emir Demirovic, N. Yorke-Smith","doi":"10.1609/icaps.v33i1.27192","DOIUrl":"https://doi.org/10.1609/icaps.v33i1.27192","url":null,"abstract":"We solve a challenging scheduling problem with parallel batch processing and two-dimensional shelf strip packing constraints that arises in the tool coating field. Tools are assembled on so-called planetaries (batches) before they are loaded into coating machines to get coated. The assembling is not trivial and must fulfil specific constraints, which we refer to as shelf strip packing constraints. Further, each tool is associated with a starting time window s.t. tools can only be put on the same planetary if their time window overlap. The objective is to minimise the makespan and the number of required planetaries. Since the problem naturally decomposes into scheduling and packing parts, we tackle the problem with a two-phase logic-based Benders decomposition approach. The master problem assigns items to batches. The first phase solves as subproblem the packing problem by checking if the assignment is feasible, whereas the second phase solves the scheduling subproblem. The approach is compared with a monolithic mixed integer linear programming approach as well as a monolithic constraint programming approach. Experimental evaluation shows that our proposed approach outperforms the state-of-the-art benchmarks by solving more instances to optimality in a shorter time.","PeriodicalId":239898,"journal":{"name":"International Conference on Automated Planning and Scheduling","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133307709","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Solving the Multi-Choice Two Dimensional Shelf Strip Packing Problem with Time Windows
Pub Date: 2023-07-01, DOI: 10.1609/icaps.v33i1.27229
M. Horn, Emir Demirovic, N. Yorke-Smith
In the tool coating field, scheduling of production lines requires solving an optimisation problem which we call the multi-choice two-dimensional shelf strip packing problem with time windows. A set of rectangular items needs to be packed in two stages: items are placed on shelves, which in turn are placed on one of several available strips. Crucially, an item's width depends on the chosen strip, and each item is associated with a time window such that items can only be placed on the same shelf if their time windows overlap. In collaboration with an industrial partner, we tackle this real-world optimisation problem with both exact and heuristic methods. The exact method is an arc-flow-based integer linear programming formulation, solved with the commercial solver CPLEX. Experimental evaluation shows that this approach can solve instances with up to 20 different item sizes to proven optimality. Larger, more realistic instances are solved heuristically by an adaptive large neighbourhood search, using first-fit and best-fit decreasing approaches as repair heuristics. In this way, we obtain high-quality solutions with a remaining optimality gap below 3.3% for instances with up to 2000 different item sizes. The work reported is due to be incorporated into an end-to-end decision support system with the industrial partner.
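As a small, self-contained illustration of one of the repair heuristics mentioned above, the sketch below packs items onto shelves with first-fit decreasing by height; strip choice, strip-dependent widths, and time windows from the actual problem are deliberately omitted, so this is a toy version rather than the authors' repair operator.

```python
def first_fit_decreasing_shelves(items, strip_width):
    """items: list of (width, height). Sort by decreasing height and place
    each item on the first shelf with enough remaining width; open a new
    shelf otherwise. Shelf height is set by its first (tallest) item."""
    shelves = []
    for w, h in sorted(items, key=lambda it: it[1], reverse=True):
        for shelf in shelves:
            if shelf["used"] + w <= strip_width:
                shelf["used"] += w
                shelf["items"].append((w, h))
                break
        else:  # no existing shelf fits: open a new one
            shelves.append({"height": h, "used": w, "items": [(w, h)]})
    return shelves

# Example: pack five items on shelves of width 10 (two shelves result).
print(first_fit_decreasing_shelves([(4, 3), (6, 5), (3, 2), (5, 5), (2, 1)], 10))
```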
{"title":"Solving the Multi-Choice Two Dimensional Shelf Strip Packing Problem with Time Windows","authors":"M. Horn, Emir Demirovic, N. Yorke-Smith","doi":"10.1609/icaps.v33i1.27229","DOIUrl":"https://doi.org/10.1609/icaps.v33i1.27229","url":null,"abstract":"In the tool coating field, scheduling of production lines requires solving an optimisation problem which we call the multi-choice two-dimensional shelf strip packing problem with time windows. A set of rectangular items needs to be packed in two stages: items are placed on shelves, which in turn are placed on one of several available strips. Crucially, the item's width depends on the chosen strip and each item is associated with a time window such that items can only be placed on the same shelf if their time windows overlap. In collaboration with an industrial partner, this real-world optimisation problem is tackled in this paper by both exact and heuristic methods. The exact method is an arc-flow-based integer linear programming formulation, solved with the commercial solver CPLEX. Experimental evaluation shows that this approach can solve instances to proven optimality with up to 20 different item sizes. Larger, more realistic instances are solved heuristically by an adaptive large neighbourhood search, using first fit and best fit decreasing approaches as repair heuristics. In this way, we obtain high-quality solutions with a remaining optimality gap below 3.3% for instances with up to 2000 different item sizes. The work reported is due to be incorporated into an end-to-end decision support system with the industrial partner.","PeriodicalId":239898,"journal":{"name":"International Conference on Automated Planning and Scheduling","volume":"12 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114093339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Finding Matrix Multiplication Algorithms with Classical Planning
Pub Date: 2023-07-01, DOI: 10.1609/icaps.v33i1.27220
David Speck, Paul Höft, Daniel Gnad, Jendrik Seipp
Matrix multiplication is a fundamental operation of linear algebra, with applications ranging from quantum physics to artificial intelligence. Given its importance, enormous resources have been invested in the search for faster matrix multiplication algorithms. Recently, this search has been cast as a single-player game. By learning how to play this game efficiently, the newly introduced AlphaTensor reinforcement learning agent is able to discover many new, faster algorithms. In this paper, we show that finding matrix multiplication algorithms can also be cast as a classical planning problem. Based on this observation, we introduce a challenging benchmark suite for classical planning and evaluate state-of-the-art planning techniques on it. We analyze the strengths and limitations of different planning approaches in this domain and show that we can use classical planning to find lower bounds and concrete algorithms for matrix multiplication.
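For context, the kind of object such a search ultimately targets is a scheme like Strassen's, which multiplies two 2x2 matrices with seven multiplications instead of eight. The snippet below merely verifies Strassen's classical scheme; it is not the planning encoding used in the paper.

```python
import numpy as np

def strassen_2x2(A, B):
    """Strassen's seven-multiplication scheme for a 2x2 matrix product."""
    a, b, c, d = A[0, 0], A[0, 1], A[1, 0], A[1, 1]
    e, f, g, h = B[0, 0], B[0, 1], B[1, 0], B[1, 1]
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    return np.array([[m1 + m4 - m5 + m7, m3 + m5],
                     [m2 + m4, m1 - m2 + m3 + m6]])

A, B = np.random.rand(2, 2), np.random.rand(2, 2)
assert np.allclose(strassen_2x2(A, B), A @ B)  # 7 multiplications, same result
```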
{"title":"Finding Matrix Multiplication Algorithms with Classical Planning","authors":"David Speck, Paul Höft, Daniel Gnad, Jendrik Seipp","doi":"10.1609/icaps.v33i1.27220","DOIUrl":"https://doi.org/10.1609/icaps.v33i1.27220","url":null,"abstract":"Matrix multiplication is a fundamental operation of linear algebra, with applications ranging from quantum physics to artificial intelligence. Given its importance, enormous resources have been invested in the search for faster matrix multiplication algorithms. Recently, this search has been cast as a single-player game. By learning how to play this game efficiently, the newly-introduced AlphaTensor reinforcement learning agent is able to discover many new faster algorithms. In this paper, we show that finding matrix multiplication algorithms can also be cast as a classical planning problem. Based on this observation, we introduce a challenging benchmark suite for classical planning and evaluate state-of-the-art planning techniques on it. We analyze the strengths and limitations of different planning approaches in this domain and show that we can use classical planning to find lower bounds and concrete algorithms for matrix multiplication.","PeriodicalId":239898,"journal":{"name":"International Conference on Automated Planning and Scheduling","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126296536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automaton-Guided Curriculum Generation for Reinforcement Learning Agents
Pub Date: 2023-04-11, DOI: 10.48550/arXiv.2304.05271
Yash Shukla, A. Kulkarni, R. Wright, Alvaro Velasquez, J. Sinapov
Despite advances in Reinforcement Learning, many sequential decision-making tasks remain prohibitively expensive and impractical to learn. Recently, approaches that automatically generate reward functions from logical task specifications have been proposed to mitigate this issue; however, they scale poorly on long-horizon tasks (i.e., tasks where the agent needs to perform a series of correct actions to reach the goal state, considering future transitions while choosing an action). Employing a curriculum (a sequence of increasingly complex tasks) further improves the learning speed of the agent by sequencing intermediate tasks suited to the learning capacity of the agent. However, generating curricula from the logical specification remains an open problem. To this end, we propose AGCL, Automaton-guided Curriculum Learning, a novel method for automatically generating curricula for the target task in the form of Directed Acyclic Graphs (DAGs). AGCL encodes the specification in the form of a deterministic finite automaton (DFA), and then uses the DFA along with the Object-Oriented MDP (OOMDP) representation to generate a curriculum as a DAG, where vertices correspond to tasks and edges correspond to the direction of knowledge transfer. Experiments in gridworld and physics-based simulated robotics domains show that the curricula produced by AGCL achieve improved time-to-threshold performance on a complex sequential decision-making problem relative to state-of-the-art curriculum learning (e.g., teacher-student, self-play) and automaton-guided reinforcement learning baselines (e.g., Q-Learning for Reward Machines). Further, we demonstrate that AGCL performs well even in the presence of noise in the task's OOMDP description, and also when distractor objects are present that are not modeled in the logical specification of the tasks' objectives.
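A minimal sketch of the DFA ingredient: a deterministic finite automaton for a toy specification "reach a, then b" and a check of whether a trace of event labels satisfies it. The dictionary encoding and the self-loop on irrelevant events are illustrative choices; the curriculum-DAG construction built on top of such a DFA is not shown here.

```python
# Toy DFA for the specification "observe event a, then event b".
DFA = {
    "start": "q0",
    "accepting": {"q2"},
    "delta": {
        ("q0", "a"): "q1",
        ("q1", "b"): "q2",
    },
}

def accepts(dfa, trace):
    """Run the DFA over a trace of event labels; unlisted events self-loop."""
    state = dfa["start"]
    for symbol in trace:
        state = dfa["delta"].get((state, symbol), state)
    return state in dfa["accepting"]

print(accepts(DFA, ["c", "a", "b"]))  # True: a happened, then b
print(accepts(DFA, ["b", "a"]))       # False: b came before a
```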
{"title":"Automaton-Guided Curriculum Generation for Reinforcement Learning Agents","authors":"Yash Shukla, A. Kulkarni, R. Wright, Alvaro Velasquez, J. Sinapov","doi":"10.48550/arXiv.2304.05271","DOIUrl":"https://doi.org/10.48550/arXiv.2304.05271","url":null,"abstract":"Despite advances in Reinforcement Learning, many sequential decision making tasks remain prohibitively expensive and impractical to learn. Recently, approaches that automatically generate reward functions from logical task specifications have been proposed to mitigate this issue; however, they scale poorly on long-horizon tasks (i.e., tasks where the agent needs to perform a series of correct actions to reach the goal state, considering future transitions while choosing an action). Employing a curriculum (a sequence of increasingly complex tasks) further improves the learning speed of the agent by sequencing intermediate tasks suited to the learning capacity of the agent. However, generating curricula from the logical specification still remains an unsolved problem. To this end, we propose AGCL, Automaton-guided Curriculum Learning, a novel method for automatically generating curricula for the target task in the form of Directed Acyclic Graphs (DAGs). AGCL encodes the specification in the form of a deterministic finite automaton (DFA), and then uses the DFA along with the Object-Oriented MDP (OOMDP) representation to generate a curriculum as a DAG, where the vertices correspond to tasks, and edges correspond to the direction of knowledge transfer. Experiments in gridworld and physics-based simulated robotics domains show that the curricula produced by AGCL achieve improved time-to-threshold performance on a complex sequential decision-making problem relative to state-of-the-art curriculum learning (e.g, teacher-student, self-play) and automaton-guided reinforcement learning baselines (e.g, Q-Learning for Reward Machines). Further, we demonstrate that AGCL performs well even in the presence of noise in the task's OOMDP description, and also when distractor objects are present that are not modeled in the logical specification of the tasks' objectives.","PeriodicalId":239898,"journal":{"name":"International Conference on Automated Planning and Scheduling","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122566494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Planning for Manipulation among Movable Objects: Deciding Which Objects Go Where, in What Order, and How
Pub Date: 2023-03-23, DOI: 10.48550/arXiv.2303.13385
D. Saxena, M. Likhachev
We are interested in pick-and-place style robot manipulation tasks in cluttered and confined 3D workspaces among movable objects that may be rearranged by the robot and may slide, tilt, lean or topple. A recently proposed algorithm, M4M, determines which objects need to be moved and where by solving a Multi-Agent Pathfinding (MAPF) abstraction of this problem. It then utilises a nonprehensile push planner to compute actions for how the robot might realise these rearrangements and a rigid body physics simulator to check whether the actions satisfy physics constraints encoded in the problem. However, M4M greedily commits to valid pushes found during planning and does not reason about orderings over pushes if multiple objects need to be rearranged. Furthermore, M4M does not reason about other possible MAPF solutions that lead to different rearrangements and pushes. In this paper, we extend M4M and present Enhanced-M4M (E-M4M), a systematic graph-search-based solver that searches over orderings of pushes for movable objects that need to be rearranged as well as over different possible rearrangements of the scene. We introduce several algorithmic optimisations to circumvent the increased computational complexity, discuss the space of problems solvable by E-M4M, and show experimentally, both on a real robot and in simulation, that it significantly outperforms the original M4M algorithm as well as other state-of-the-art alternatives when dealing with complex scenes.
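To make "orderings over pushes" concrete, the toy sketch below enumerates orderings of the objects that must be rearranged and accepts the first one whose pushes are all feasible. The feasible_push callable stands in for the push planner plus physics check, and E-M4M itself performs a systematic graph search rather than this brute-force enumeration.

```python
from itertools import permutations

def find_push_ordering(objects_to_move, feasible_push):
    """Return the first ordering in which every push is feasible given the
    objects already moved, or None if no ordering works."""
    for ordering in permutations(objects_to_move):
        moved = []
        ok = True
        for obj in ordering:
            if not feasible_push(obj, moved):  # push planner + physics check
                ok = False
                break
            moved.append(obj)
        if ok:
            return list(ordering)
    return None
```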
{"title":"Planning for Manipulation among Movable Objects: Deciding Which Objects Go Where, in What Order, and How","authors":"D. Saxena, M. Likhachev","doi":"10.48550/arXiv.2303.13385","DOIUrl":"https://doi.org/10.48550/arXiv.2303.13385","url":null,"abstract":"We are interested in pick-and-place style robot manipulation tasks in cluttered and confined 3D workspaces among movable objects that may be rearranged by the robot and may slide, tilt, lean or topple. A recently proposed algorithm, M4M, determines which objects need to be moved and where by solving a Multi-Agent Pathfinding (MAPF) abstraction of this problem. It then utilises a nonprehensile push planner to compute actions for how the robot might realise these rearrangements and a rigid body physics simulator to check whether the actions satisfy physics constraints encoded in the problem. However, M4M greedily commits to valid pushes found during planning, and does not reason about orderings over pushes if multiple objects need to be rearranged. Furthermore, M4M does not reason about other possible MAPF solutions that lead to different rearrangements and pushes. In this paper, we extend M4M and present Enhanced-M4M (E-M4M) -- a systematic graph search-based solver that searches over orderings of pushes for movable objects that need to be rearranged and different possible rearrangements of the scene. We introduce several algorithmic optimisations to circumvent the increased computational complexity, discuss the space of problems solvable by E-M4M and show that experimentally, both on the real robot and in simulation, it significantly outperforms the original M4M algorithm, as well as other state-of-the-art alternatives when dealing with complex scenes.","PeriodicalId":239898,"journal":{"name":"International Conference on Automated Planning and Scheduling","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116200852","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reinforcement Learning for Omega-Regular Specifications on Continuous-Time MDP
Pub Date: 2023-03-16, DOI: 10.48550/arXiv.2303.09528
A. Falah, Shibashis Guha, Ashutosh Trivedi
Continuous-time Markov decision processes (CTMDPs) are canonical models for expressing sequential decision-making in dense-time, stochastic environments. When the stochastic evolution of the environment is only available via sampling, model-free reinforcement learning (RL) is the algorithm of choice for computing optimal decision sequences. RL, however, requires the learning objective to be encoded as scalar reward signals. Since doing such translations manually is both tedious and error-prone, a number of techniques have been proposed to translate high-level objectives (expressed in logic or automata formalisms) into scalar rewards for discrete-time Markov decision processes. Unfortunately, no automatic translation exists for CTMDPs. We consider CTMDP environments with learning objectives expressed as omega-regular languages. Omega-regular languages generalize regular languages to infinite-horizon specifications and can express properties given in the popular linear-time logic LTL. To accommodate the dense-time nature of CTMDPs, we consider two different semantics of omega-regular objectives: 1) satisfaction semantics, where the goal of the learner is to maximize the probability of spending positive time in the good states, and 2) expectation semantics, where the goal of the learner is to optimize the long-run expected average time spent in the "good states" of the automaton. We present an approach enabling correct translation to scalar reward signals that can be readily used by off-the-shelf RL algorithms for CTMDPs. We demonstrate the effectiveness of the proposed algorithms by evaluating them on some popular CTMDP benchmarks with omega-regular objectives.
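One common way to write a long-run average objective of the kind described by the expectation semantics (a hedged reading, not necessarily the paper's exact definition) is:

```latex
\max_{\pi}\; \liminf_{T \to \infty} \frac{1}{T}\,
\mathbb{E}^{\pi}\!\left[\int_{0}^{T} \mathbf{1}\{s_t \in \mathit{Good}\}\,\mathrm{d}t\right]
```

where the indicator is 1 whenever the state of the product of the CTMDP and the automaton lies in the good states at time t.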
{"title":"Reinforcement Learning for Omega-Regular Specifications on Continuous-Time MDP","authors":"A. Falah, Shibashis Guha, Ashutosh Trivedi","doi":"10.48550/arXiv.2303.09528","DOIUrl":"https://doi.org/10.48550/arXiv.2303.09528","url":null,"abstract":"Continuous-time Markov decision processes (CTMDPs) are canonical models to express sequential decision-making under dense-time and stochastic environments. When the stochastic evolution of the environment is only available via sampling, model-free reinforcement learning (RL) is the algorithm-of-choice to compute optimal decision sequence. RL, on the other hand, requires the learning objective to be encoded as scalar reward signals. Since doing such translations manually is both tedious and error-prone, a number of techniques have been proposed to translate high-level objectives (expressed in logic or automata formalism) to scalar rewards for discrete-time Markov decision processes. Unfortunately, no automatic translation exists for CTMDPs.\u0000 \u0000 We consider CTMDP environments against the learning objectives expressed as omega-regular languages. Omega-regular languages generalize regular languages to infinite-horizon specifications and can express properties given in popular linear-time logic LTL. To accommodate the dense-time nature of CTMDPs, we consider two different semantics of omega-regular objectives: 1) satisfaction semantics where the goal of the learner is to maximize the probability of spending positive time in the good states, and 2) expectation semantics where the goal of the learner is to optimize the long-run expected average time spent in the ''good states'' of the automaton. We present an approach enabling correct translation to scalar reward signals that can be readily used by off-the-shelf RL algorithms for CTMDPs. We demonstrate the effectiveness of the proposed algorithms by evaluating it on some popular CTMDP benchmarks with omega-regular objectives.","PeriodicalId":239898,"journal":{"name":"International Conference on Automated Planning and Scheduling","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122981566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Act-Then-Measure: Reinforcement Learning for Partially Observable Environments with Active Measuring
Pub Date: 2023-03-14, DOI: 10.48550/arXiv.2303.08271
Merlijn Krale, T. D. Simão, N. Jansen
We study Markov decision processes (MDPs) in which agents control when and how they gather information, as formalized by action-contingent noiselessly observable MDPs (ACNO-MDPs). In these models, actions have two components: a control action that influences how the environment changes and a measurement action that affects the agent's observation. To solve ACNO-MDPs, we introduce the act-then-measure (ATM) heuristic, which assumes that we can ignore future state uncertainty when choosing control actions. To decide whether or not to measure, we introduce the concept of measuring value. We show how following this heuristic may lead to shorter policy computation times and prove a bound on the performance loss it incurs. We develop a reinforcement learning algorithm based on the ATM heuristic, using a Dyna-Q variant adapted for partially observable domains, and showcase its superior performance compared to prior methods on a number of partially observable environments.
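A schematic sketch of an act-then-measure style decision rule: pick the control action that looks best under the current belief while ignoring future state uncertainty, then measure only if a value-of-information estimate exceeds the measurement cost. The names and the particular estimate below are illustrative assumptions, not the paper's exact definitions of the heuristic or of measuring value.

```python
def act_then_measure(belief, actions, q_values, measure_cost):
    """belief: dict state -> probability; actions: list of control actions;
    q_values(s, a): estimated return of taking a in state s."""
    # Control action: best expected Q-value under the current belief.
    def expected_q(a):
        return sum(p * q_values(s, a) for s, p in belief.items())
    control = max(actions, key=expected_q)

    # Rough value of measuring: how much better we could do per state if we
    # knew it exactly, compared to committing to the belief-optimal action.
    value_of_measuring = sum(
        p * (max(q_values(s, a) for a in actions) - q_values(s, control))
        for s, p in belief.items()
    )
    measure = value_of_measuring > measure_cost
    return control, measure
```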
{"title":"Act-Then-Measure: Reinforcement Learning for Partially Observable Environments with Active Measuring","authors":"Merlijn Krale, T. D. Simão, N. Jansen","doi":"10.48550/arXiv.2303.08271","DOIUrl":"https://doi.org/10.48550/arXiv.2303.08271","url":null,"abstract":"We study Markov decision processes (MDPs), where agents control when and how they gather information, as formalized by action-contingent noiselessly observable MDPs (ACNO-MPDs). In these models, actions have two components: a control action that influences how the environment changes and a measurement action that affects the agent's observation. To solve ACNO-MDPs, we introduce the act-then-measure (ATM) heuristic, which assumes that we can ignore future state uncertainty when choosing control actions. To decide whether or not to measure, we introduce the concept of measuring value. We show how following this heuristic may lead to shorter policy computation times and prove a bound on the performance loss it incurs. We develop a reinforcement learning algorithm based on the ATM heuristic, using a Dyna-Q variant adapted for partially observable domains, and showcase its superior performance compared to prior methods on a number of partially-observable environments.","PeriodicalId":239898,"journal":{"name":"International Conference on Automated Planning and Scheduling","volume":"160 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124642646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Explainable Goal Recognition: A Framework Based on Weight of Evidence
Pub Date: 2023-03-09, DOI: 10.48550/arXiv.2303.05622
Abeer Alshehri, Tim Miller, Mor Vered
We introduce and evaluate an eXplainable goal recognition (XGR) model that uses the Weight of Evidence (WoE) framework to explain goal recognition problems. Our model provides human-centered explanations that answer 'why?' and 'why not?' questions. We computationally evaluate the performance of our system over eight different goal recognition domains, showing that it does not significantly increase the underlying recognition run time. Using a behavioral study to obtain ground truth from human annotators, we further show that the XGR model can successfully generate human-like explanations. We then report on a study with 40 participants who observe agents playing a Sokoban game and then receive explanations of the goal recognition output. We investigated participants' understanding of the explanations through task prediction, explanation satisfaction, and trust.
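For reference, the weight of evidence that an observation e lends to a goal g is commonly defined as log[P(e | g) / P(e | not g)]; positive values speak for g and negative values against it. The numbers below are toy likelihoods used only to illustrate the computation, not data from the paper.

```python
import math

def weight_of_evidence(p_e_given_g, p_e_given_not_g):
    """WoE(g : e) = log [ P(e | g) / P(e | not g) ]."""
    return math.log(p_e_given_g / p_e_given_not_g)

# Observing the agent step toward the left door:
print(weight_of_evidence(0.8, 0.2))  # about +1.39: evidence for "goal = left door"
print(weight_of_evidence(0.1, 0.5))  # about -1.61: evidence against it
```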
{"title":"Explainable Goal Recognition: A Framework Based on Weight of Evidence","authors":"Abeer Alshehri, Tim Miller, Mor Vered","doi":"10.48550/arXiv.2303.05622","DOIUrl":"https://doi.org/10.48550/arXiv.2303.05622","url":null,"abstract":"We introduce and evaluate an eXplainable goal recognition (XGR) model that uses the Weight of Evidence (WoE) framework to explain goal recognition problems. Our model provides human-centered explanations that answer `why?' and `why not?' questions. We computationally evaluate the performance of our system over eight different goal recognition domains showing it does not significantly increase the underlying recognition run time. Using a human behavioral study to obtain the ground truth from human annotators, we further show that the XGR model can successfully generate human-like explanations. We then report on a study with 40 participants who observe agents playing a Sokoban game and then receive explanations of the goal recognition output. We investigated participants’ understanding obtained by explanations through task prediction, explanation satisfaction, and trust.","PeriodicalId":239898,"journal":{"name":"International Conference on Automated Planning and Scheduling","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124791246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}