Improving Time-Dependent Contraction Hierarchies
Pub Date: 2022-06-13 | DOI: 10.1609/icaps.v32i1.19818
Bojie Shen, M. A. Cheema, Daniel D. Harabor, P. J. Stuckey
Computing time-optimal shortest paths in road networks is one of the most popular applications of Artificial Intelligence. This problem is tricky to solve because road congestion affects travel times. The state of the art in this area is an algorithm called Time-dependent Contraction Hierarchies (TCH). Although fast and optimal, TCH still suffers from two main drawbacks: (1) the usual query process uses bi-directional Dijkstra search to find the shortest path, which can be time-consuming; and (2) the TCH is constructed w.r.t. the entire time domain T, which complicates the search process for queries q that start and finish in a smaller time period Tq ⊂ T. In this work, we improve TCH by making use of time-independent heuristics, which speed up optimal search, and by computing TCHs for different subsets of the time domain, which further reduces the size of the search space. We give a full description of these methods and discuss their optimality-preserving characteristics. We report significant query time improvements over a baseline implementation of TCH.
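To illustrate why time-independent heuristics help in this setting, the sketch below runs an A*-style search on a tiny time-dependent graph, using free-flow (minimum-over-the-day) travel times as a lower-bounding heuristic. This is only a conceptual illustration of how such heuristics can guide time-dependent search without losing optimality, not the paper's TCH query algorithm; the function name, toy graph, and heuristic values are made up for the example.

```python
import heapq

def td_astar(graph, source, target, t0, h):
    """A*-style search on a time-dependent graph.

    graph[u] -> list of (v, travel_time_fn), where travel_time_fn(t) returns
    the (FIFO) travel time of edge (u, v) when departing at time t.
    h[v] is a time-independent lower bound on the remaining travel time from
    v to target (e.g. free-flow travel time), so optimality is preserved.
    """
    open_list = [(t0 + h[source], t0, source)]
    best_arrival = {source: t0}
    while open_list:
        f, t, u = heapq.heappop(open_list)
        if u == target:
            return t - t0                      # total travel time
        if t > best_arrival.get(u, float("inf")):
            continue                           # stale queue entry
        for v, travel_time in graph[u]:
            t_arr = t + travel_time(t)
            if t_arr < best_arrival.get(v, float("inf")):
                best_arrival[v] = t_arr
                heapq.heappush(open_list, (t_arr + h[v], t_arr, v))
    return float("inf")

# Tiny example: edge (0, 1) becomes congested from t = 10 onward.
graph = {
    0: [(1, lambda t: 5 if t < 10 else 10), (2, lambda t: 4)],
    1: [(3, lambda t: 2)],
    2: [(3, lambda t: 6)],
    3: [],
}
h = {0: 7, 1: 2, 2: 6, 3: 0}                   # free-flow lower bounds to node 3
print(td_astar(graph, 0, 3, t0=12))            # -> 10 (via node 2)
```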
{"title":"Improving Time-Dependent Contraction Hierarchies","authors":"Bojie Shen, M. A. Cheema, Daniel D. Harabor, P. J. Stuckey","doi":"10.1609/icaps.v32i1.19818","DOIUrl":"https://doi.org/10.1609/icaps.v32i1.19818","url":null,"abstract":"Computing time-optimal shortest paths, in road networks, is one of the most popular applications of Artificial Intelligence. This problem is tricky to solve because road congestion affects travel times. The state-of-the-art in this area is an algorithm called Time-dependent Contraction Hierarchies (TCH). Although fast and optimal, TCH still suffers from two main drawbacks: (1) the usual query process uses bi-directional Dijkstra search to find the shortest path, which can be time-consuming; and (2) the TCH is constructed w.r.t. the entire time domain T, which complicates the search process for queries q that start and finish in a smaller time period Tq ⊂ T. In this work, we improve TCH by making use of time-independent heuristics, which speed up optimal search, and by computing TCHs for different subsets of the time domain, which further reduces the size of the search space. We give a full description of these methods and discuss their optimality-preserving characteristics. We report significant query time improvements against a baseline implementation of TCH.","PeriodicalId":239898,"journal":{"name":"International Conference on Automated Planning and Scheduling","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133002538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reinforcement Learning Approach to Solve Dynamic Bi-objective Police Patrol Dispatching and Rescheduling Problem
Pub Date: 2022-06-13 | DOI: 10.1609/icaps.v32i1.19831
Waldy Joe, H. Lau, Jonathan Pan
Police patrol aims to fulfill two main objectives: to project presence and to respond to incidents in a timely manner. Incidents happen dynamically and can disrupt the initially planned patrol schedules. The key decisions are which patrol agent to dispatch in response to an incident and, subsequently, how to adapt the patrol schedules to such dynamically occurring incidents while still fulfilling both objectives, which can sometimes conflict. In this paper, we define this real-world problem as a Dynamic Bi-Objective Police Patrol Dispatching and Rescheduling Problem and propose a solution approach that combines Deep Reinforcement Learning (specifically, neural-network-based Temporal-Difference learning with experience replay) to approximate the value function with a rescheduling heuristic based on ejection chains, learning both dispatching and rescheduling policies jointly. To address the dual objectives, we propose a reward function that implicitly tries to maximize the rate of successfully responding to an incident within a response time target while minimizing the reduction in patrol presence, without the need to explicitly set predetermined weights for each objective. The proposed approach is able to compute both dispatching and rescheduling decisions almost instantaneously. Ours is the first work in the literature that takes into account these dual patrol objectives and the real-world operational consideration that incident response may disrupt existing patrol schedules.
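The abstract does not spell out how a single dispatch decision is made at run time, so the following sketch only illustrates the general shape of such a step: score each candidate agent with a learned value function over post-decision states and adopt the repaired schedule of the best-scoring agent. Both value_fn and reschedule are hypothetical placeholders standing in for the paper's TD-trained value network and ejection-chain heuristic.

```python
import random

def dispatch_and_reschedule(incident, agents, schedules, value_fn, reschedule):
    """Choose the agent whose dispatch leads to the post-decision state with
    the highest estimated value, then adopt that agent's repaired schedule.
    value_fn and reschedule are placeholders for learned/heuristic components."""
    best_agent, best_value, best_schedule = None, float("-inf"), None
    for agent in agents:
        repaired = reschedule(schedules[agent], incident)
        v = value_fn((agent, incident, repaired))   # simplified post-decision state
        if v > best_value:
            best_agent, best_value, best_schedule = agent, v, repaired
    schedules[best_agent] = best_schedule
    return best_agent

# Toy usage with stub components.
agents = ["A1", "A2"]
schedules = {"A1": ["sector1", "sector2"], "A2": ["sector3"]}
value_fn = lambda state: -len(state[2]) + 0.1 * random.random()   # stub value estimate
reschedule = lambda sched, inc: [inc] + sched                     # stub schedule repair
print(dispatch_and_reschedule("incident7", agents, schedules, value_fn, reschedule))
```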
{"title":"Reinforcement Learning Approach to Solve Dynamic Bi-objective Police Patrol Dispatching and Rescheduling Problem","authors":"Waldy Joe, H. Lau, Jonathan Pan","doi":"10.1609/icaps.v32i1.19831","DOIUrl":"https://doi.org/10.1609/icaps.v32i1.19831","url":null,"abstract":"Police patrol aims to fulfill two main objectives namely to project presence and to respond to incidents in a timely manner. Incidents happen dynamically and can disrupt the initially-planned patrol schedules. The key decisions to be made will be which patrol agent to be dispatched to respond to an incident and subsequently how to adapt the patrol schedules in response to such dynamically-occurring incidents whilst still fulfilling both objectives; which sometimes can be conflicting. In this paper, we define this real-world problem as a Dynamic Bi-Objective Police Patrol Dispatching and Rescheduling Problem and propose a solution approach that combines Deep Reinforcement Learning (specifically neural networks-based Temporal-Difference learning with experience replay) to approximate the value function and a rescheduling heuristic based on ejection chains to learn both dispatching and rescheduling policies jointly. To address the dual objectives, we propose a reward function that implicitly tries to maximize the rate of successfully responding to an incident within a response time target while minimizing the reduction in patrol presence without the need to explicitly set predetermined weights for each objective. The proposed approach is able to compute both dispatching and rescheduling decisions almost instantaneously. Our work serves as the first work in the literature that takes into account these dual patrol objectives and real-world operational consideration where incident response may disrupt existing patrol schedules.","PeriodicalId":239898,"journal":{"name":"International Conference on Automated Planning and Scheduling","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134123566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Active Grammatical Inference for Non-Markovian Planning
Pub Date: 2022-06-13 | DOI: 10.1609/icaps.v32i1.19853
Noah Topper, George K. Atia, Ashutosh Trivedi, Alvaro Velasquez
Planning in finite stochastic environments is canonically posed as a Markov decision process where the transition and reward structures are explicitly known. Reinforcement learning (RL) lifts the explicitness assumption by working with sampling models instead. Further, with the advent of reward machines, we can relax the Markovian assumption on the reward. Angluin's active grammatical inference algorithm L* has found novel application in explicating reward machines for non-Markovian RL. We propose maintaining the assumption of explicit transition dynamics, but with an implicit non-Markovian reward signal, which must be inferred from experiments. We call this setting non-Markovian planning, as opposed to non-Markovian RL. The proposed approach leverages L* to explicate an automaton structure for the underlying planning objective. We exploit the environment model to learn an automaton faster and integrate it with value iteration to accelerate the planning. We compare against recent non-Markovian RL solutions which leverage grammatical inference, and establish complexity results that illustrate the difference in runtime between grammatical inference in planning and RL settings.
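Once an automaton for the non-Markovian objective has been explicated (e.g., via L*), planning can proceed by value iteration on the product of the explicit MDP and the automaton. The sketch below shows that product construction in a minimal form; the container layouts (P, labels, delta, reward) are assumptions made for this example, not the paper's implementation.

```python
def product_value_iteration(P, labels, delta, reward, gamma=0.95, eps=1e-6):
    """Value iteration on the product of an explicit MDP and a reward automaton.

    P[s][a]        -> list of (s2, prob) pairs (explicit transition model)
    labels[s2]     -> observation emitted when entering s2
    delta[(q, o)]  -> successor automaton state (assumed total)
    reward[(q, o)] -> scalar reward for firing that automaton edge
    """
    qs = {q for (q, _) in delta} | set(delta.values())
    V = {(s, q): 0.0 for s in P for q in qs}
    while True:
        diff = 0.0
        for (s, q) in V:
            if not P[s]:                       # no applicable actions: keep value
                continue
            best = float("-inf")
            for a in P[s]:
                val = 0.0
                for s2, prob in P[s][a]:
                    o = labels[s2]
                    val += prob * (reward[(q, o)] + gamma * V[(s2, delta[(q, o)])])
                best = max(best, val)
            diff = max(diff, abs(best - V[(s, q)]))
            V[(s, q)] = best
        if diff < eps:
            return V
```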
{"title":"Active Grammatical Inference for Non-Markovian Planning","authors":"Noah Topper, George K. Atia, Ashutosh Trivedi, Alvaro Velasquez","doi":"10.1609/icaps.v32i1.19853","DOIUrl":"https://doi.org/10.1609/icaps.v32i1.19853","url":null,"abstract":"Planning in finite stochastic environments is canonically posed as a Markov decision process where the transition and reward structures are explicitly known. Reinforcement learning (RL) lifts the explicitness assumption by working with sampling models instead. Further, with the advent of reward machines, we can relax the Markovian assumption on the reward. Angluin's active grammatical inference algorithm L* has found novel application in explicating reward machines for non-Markovian RL. We propose maintaining the assumption of explicit transition dynamics, but with an implicit non-Markovian reward signal, which must be inferred from experiments. We call this setting non-Markovian planning, as opposed to non-Markovian RL. The proposed approach leverages L* to explicate an automaton structure for the underlying planning objective. We exploit the environment model to learn an automaton faster and integrate it with value iteration to accelerate the planning. We compare against recent non-Markovian RL solutions which leverage grammatical inference, and establish complexity results that illustrate the difference in runtime between grammatical inference in planning and RL settings.","PeriodicalId":239898,"journal":{"name":"International Conference on Automated Planning and Scheduling","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133342300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Hybrid Genetic Algorithm for the Vehicle Routing Problem with Roaming Delivery Locations
Pub Date: 2022-06-13 | DOI: 10.1609/icaps.v32i1.19813
Quang Anh Pham, Minh Hoàng Hà, Duy Manh Vu, Huy-Hoang Nguyen
The Vehicle Routing Problem with Roaming Delivery Locations (VRPRDL) is a variant of the Vehicle Routing Problem (VRP) in which a customer can be present at multiple locations during a working day, and a time window is associated with each location. The objective is to find a set of routes such that (i) the total traveling cost is minimized, (ii) only one location of each customer is visited within its time window, and (iii) all capacity constraints are satisfied. To solve the problem, we introduce a hybrid genetic algorithm that relies on a problem-tailored solution representation, mutation and local search operators, as well as a set covering component that explores routes found during the search to find better solutions. We also propose a new split procedure based on dynamic programming to evaluate the fitness of chromosomes. Experiments conducted on the benchmark instances clearly show that our proposed algorithm outperforms existing approaches in terms of stability and solution quality. We also improve 49 best-known solutions from the literature.
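For context, a split procedure of the kind mentioned above cuts a chromosome (a giant tour over all customers) into vehicle routes by dynamic programming over an auxiliary shortest-path problem. The sketch below shows the classical capacity-only version of this idea; the paper's procedure additionally has to handle roaming delivery locations and time windows, which are omitted here, and all names are illustrative.

```python
def split(chromosome, demand, dist, depot, capacity):
    """Prins-style split: optimally cut a giant tour into capacity-feasible
    routes via a shortest path in an auxiliary DAG.  dist[(a, b)] is the
    travel cost between nodes a and b."""
    n = len(chromosome)
    best = [float("inf")] * (n + 1)        # best[i]: cost of serving first i customers
    pred = [-1] * (n + 1)
    best[0] = 0.0
    for i in range(n):                     # route starts with customer i
        load, cost = 0.0, 0.0
        for j in range(i, n):              # route ends with customer j
            load += demand[chromosome[j]]
            if load > capacity:
                break
            if j == i:
                cost = dist[(depot, chromosome[i])]
            else:
                cost += dist[(chromosome[j - 1], chromosome[j])]
            total = best[i] + cost + dist[(chromosome[j], depot)]
            if total < best[j + 1]:
                best[j + 1], pred[j + 1] = total, i
    routes, j = [], n                      # recover routes from predecessor labels
    while j > 0:
        i = pred[j]
        routes.append(chromosome[i:j])
        j = i
    return best[n], list(reversed(routes))

# Tiny example: 3 unit-demand customers on a line, vehicle capacity 2.
chrom = ["c1", "c2", "c3"]
demand = {"c1": 1, "c2": 1, "c3": 1}
pts = {"d": 0, "c1": 1, "c2": 2, "c3": 3}
dist = {(a, b): abs(pts[a] - pts[b]) for a in pts for b in pts}
print(split(chrom, demand, dist, "d", capacity=2))  # -> (8.0, [['c1'], ['c2', 'c3']])
```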
{"title":"A Hybrid Genetic Algorithm for the Vehicle Routing Problem with Roaming Delivery Locations","authors":"Quang Anh Pham, Minh Hoàng Hà, Duy Manh Vu, Huy-Hoang Nguyen","doi":"10.1609/icaps.v32i1.19813","DOIUrl":"https://doi.org/10.1609/icaps.v32i1.19813","url":null,"abstract":"The Vehicle Routing Problem with Roaming Delivery Locations (VRPRDL) is a variant of the Vehicle Routing Problem (VRP) in which a customer can be present at many locations during a working day and a time window is associated with each location. The objective is to find a set of routes such that (i) the total traveling cost is minimized, (ii) only one location of each customer is visited within its time window, and (iii) all capacity constraints are satisfied. To solve the problem, we introduce a hybrid genetic algorithm which relies on problem-tailored solution representation, mutation, local search operators, as well as a set covering component exploring routes found during the search to find better solutions. We also propose a new split procedure which based on dynamic programming to evaluate the fitness of chromosomes. Experiments conducted on the benchmark instances clearly show that our proposed algorithm outperforms existing approaches in terms of stability and solution quality. We also improve 49 best known solutions of the literature.","PeriodicalId":239898,"journal":{"name":"International Conference on Automated Planning and Scheduling","volume":"144 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116604066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep Reinforcement Learning for a Multi-Objective Online Order Batching Problem
Pub Date: 2022-06-13 | DOI: 10.1609/icaps.v32i1.19829
M. Beeks, Reza Refaei Afshar, Yingqian Zhang, R. Dijkman, Claudy van Dorst, S. D. Looijer
On-time delivery and low service costs are two important performance metrics in warehousing operations. This paper proposes a Deep Reinforcement Learning (DRL) based approach to solve the online Order Batching and Sequence Problem (OBSP) and optimize these two objectives. To learn how to balance the trade-off between the two objectives, we introduce a Bayesian optimization framework to shape the reward function of the DRL agent, so that the influence of each objective on learning can be adjusted to different environments. We compare our approach with several heuristics using problem instances of real-world size, where thousands of orders arrive dynamically per hour. We show that the Proximal Policy Optimization (PPO) algorithm with Bayesian optimization outperforms the heuristics in all tested scenarios on both objectives. In addition, it finds different weights for the components of the reward function in different scenarios, indicating its capability to learn how to set the importance of the two objectives under different environments. We also provide a policy analysis of the learned DRL agent, in which a decision tree is used to infer decision rules and enable the interpretability of the DRL approach.
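The outer loop of such an approach can be pictured as follows: candidate reward-component weights are proposed, a DRL agent is trained and evaluated with the shaped reward, and the best-performing weights are kept. The sketch uses plain random search and a stub in place of PPO training purely to stay dependency-free; the paper uses a Bayesian optimization framework for the proposal step, and all names here are illustrative.

```python
import random

def train_and_evaluate(weights):
    """Placeholder for the inner loop: train a PPO agent with a reward shaped
    as weights[0] * on_time_term + weights[1] * cost_term and return a scalar
    validation score (higher is better).  Stubbed so the sketch runs."""
    return -(weights[0] - 0.7) ** 2 - (weights[1] - 0.3) ** 2

def tune_reward_weights(n_trials=20, seed=0):
    """Outer search over reward-component weights; random search stands in
    for the Bayesian optimization used in the paper."""
    rng = random.Random(seed)
    best_w, best_score = None, float("-inf")
    for _ in range(n_trials):
        w = (rng.random(), rng.random())
        score = train_and_evaluate(w)
        if score > best_score:
            best_w, best_score = w, score
    return best_w, best_score

print(tune_reward_weights())
```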
{"title":"Deep Reinforcement Learning for a Multi-Objective Online Order Batching Problem","authors":"M. Beeks, Reza Refaei Afshar, Yingqian Zhang, R. Dijkman, Claudy van Dorst, S. D. Looijer","doi":"10.1609/icaps.v32i1.19829","DOIUrl":"https://doi.org/10.1609/icaps.v32i1.19829","url":null,"abstract":"On-time delivery and low service costs are two important performance metrics in warehousing operations. This paper proposes a Deep Reinforcement Learning (DRL) based approach to solve the online Order Batching and Sequence Problem (OBSP) to optimize these two objectives. \u0000To learn how to balance the trade-off between two objectives, we introduce a Bayesian optimization framework to shape the reward function of the DRL agent, such that the influences of learning to these objectives are adjusted to different environments. We compare our approach with several heuristics using problem instances of real-world size where thousands of orders arrive dynamically per hour. \u0000We show the Proximal Policy Optimization (PPO) algorithm with Bayesian optimization outperforms the heuristics in all tested scenarios on both objectives. In addition, it finds different weights for the components in the reward function in different scenarios, indicating its capability of learning how to set the importance of two objectives under different environments. We also provide policy analysis on the learned DRL agent, where a decision tree is used to infer decision rules to enable the interpretability of the DRL approach.","PeriodicalId":239898,"journal":{"name":"International Conference on Automated Planning and Scheduling","volume":"118 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123662464","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Actor-Focused Interactive Visualization for AI Planning
Pub Date: 2022-06-13 | DOI: 10.1609/icaps.v32i1.19857
G. Cantareira, Gerard Canal, R. Borgo
As we grow more reliant on AI systems for an increasing variety of applications in our lives, the need to understand and interpret such systems also becomes more pronounced, be it for improvement, trust, or legal liability. AI Planning is one type of task that poses explanation challenges, particularly due to the increasing complexity of generated plans and the convoluted causal chains that connect actions and determine overall plan structure. While there are many recent techniques to support plan explanation, visual aids for navigating this data are quite limited. Furthermore, there is often a barrier between techniques focused on abstract planning concepts and domain-related explanations. In this paper, we present a visual analytics tool to support plan summarization and interaction, focusing on robotics domains and using an actor-based structure. We show how users can quickly grasp vital information about the actions involved in a plan and how they relate to each other. Finally, we present the framework used to design our tool, highlighting how general PDDL elements can be converted into visual representations, further connecting planning concepts to the domain.
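As a rough illustration of the actor-based structure, the snippet below groups a grounded plan into per-actor timelines, assuming (purely for the example) that the first argument of each action names the acting robot; a visual front end could then render one lane per actor. This is not the paper's data model, only a sketch of the idea.

```python
from collections import defaultdict

def actor_lanes(plan):
    """Group a grounded plan into per-actor timelines, assuming the first
    argument of each action names the acting robot."""
    lanes = defaultdict(list)
    for step, (action, args) in enumerate(plan):
        actor = args[0] if args else "global"
        lanes[actor].append((step, action, args[1:]))
    return dict(lanes)

plan = [
    ("navigate", ("robot1", "kitchen")),
    ("pick", ("robot1", "cup")),
    ("navigate", ("robot2", "table")),
    ("place", ("robot1", "cup", "table")),
]
print(actor_lanes(plan))
```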
{"title":"Actor-Focused Interactive Visualization for AI Planning","authors":"G. Cantareira, Gerard Canal, R. Borgo","doi":"10.1609/icaps.v32i1.19857","DOIUrl":"https://doi.org/10.1609/icaps.v32i1.19857","url":null,"abstract":"As we grow more reliant on AI systems for an increasing variety of applications in our lives, the need to understand and interpret such systems also becomes more pronounced, be it for improvement, trust, or legal liability. AI Planning is one type of task that provides explanation challenges, particularly due to the increasing complexity in generated plans and convoluted causal chains that connect actions and determine overall plan structure. While there are many recent techniques to support plan explanation, visual aids for navigating this data are quite limited. Furthermore, there is often a barrier between techniques focused on abstract planning concepts and domain-related explanations. In this paper, we present a visual analytics tool to support plan summarization and interaction, focusing in robotics domains using an actor-based structure. We show how users can quickly grasp vital information about actions involved in a plan and how they relate to each other. Finally, we present a framework used to design our tool, highlighting how general PDDL elements can be converted into visual representations and further connecting concept to domain.","PeriodicalId":239898,"journal":{"name":"International Conference on Automated Planning and Scheduling","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129629248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neural Network Action Policy Verification via Predicate Abstraction
Pub Date: 2022-06-13 | DOI: 10.1609/icaps.v32i1.19822
Marcel Vinzent, Marcel Steinmetz, Jörg Hoffmann
Neural networks (NN) are an increasingly important representation of action policies. Verifying that such policies are safe is potentially very hard, as it compounds the state space explosion with the difficulty of analyzing even single NN decision episodes. Here we address that challenge through abstract reachability analysis. We show how to compute predicate abstractions of the policy state space subgraph induced by fixing an NN action policy. A key sub-problem here is the computation of abstract state transitions that may be taken by the policy, which, as we show, can be tackled by connecting to off-the-shelf SMT solvers. We devise a range of algorithmic enhancements, leveraging relaxed tests to avoid costly calls to SMT. We empirically evaluate the resulting machinery on a collection of benchmarks. The results show that our enhancements are required for practicality, and that our approach can outperform two competing approaches based on explicit enumeration and bounded-length verification.
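The abstract-transition sub-problem can be illustrated with an off-the-shelf SMT solver such as Z3: assert the concretization of the source abstract state, a constraint describing which action the policy selects, the action's effect, and the target predicate, then check satisfiability. The tiny example below uses a linear stand-in for the NN's decision condition (the actual approach encodes the network itself); the variables, predicates, and action are invented for illustration.

```python
from z3 import Real, Solver, And, sat

# Two state variables and two abstraction predicates p1: x >= 5, p2: y >= 3.
x, y = Real("x"), Real("y")
x2, y2 = Real("x_next"), Real("y_next")

# Stand-in for the policy: it picks action "inc_y" whenever x - y >= 1
# (a real instance would encode the NN's piecewise-linear behaviour instead).
policy_picks_inc_y = x - y >= 1

# Action "inc_y": y' = y + 2, x' = x.
effect = And(x2 == x, y2 == y + 2)

# Question: from abstract state {p1, not p2}, can the policy-induced action
# reach an abstract state where p2 holds?
source = And(x >= 5, y < 3)          # concretization of the abstract source
target = y2 >= 3                     # p2 in the successor

s = Solver()
s.add(source, policy_picks_inc_y, effect, target)
if s.check() == sat:
    print("abstract transition possible, witness:", s.model())
else:
    print("transition refuted")
```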
{"title":"Neural Network Action Policy Verification via Predicate Abstraction","authors":"Marcel Vinzent, Marcel Steinmetz, Jörg Hoffmann","doi":"10.1609/icaps.v32i1.19822","DOIUrl":"https://doi.org/10.1609/icaps.v32i1.19822","url":null,"abstract":"Neural networks (NN) are an increasingly important representation of action policies. Verifying that such policies are safe is potentially very hard as it compounds the state space explosion with the difficulty of analyzing even single NN decision episodes. Here we address that challenge through abstract reachability analysis. We show how to compute predicate abstractions of the policy state space subgraph induced by fixing an NN action policy. A key sub-problem here is the computation of abstract state transitions that may be taken by the policy, which as we show can be tackled by connecting to off-the-shelf SMT solvers. We devise a range of algorithmic enhancements, leveraging relaxed tests to avoid costly calls to SMT. We empirically evaluate the resulting machinery on a collection of benchmarks. The results show that our enhancements are required for practicality, and that our approach can outperform two competing approaches based on explicit enumeration and bounded-length verification.","PeriodicalId":239898,"journal":{"name":"International Conference on Automated Planning and Scheduling","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115244297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Distributed Fleet Management in Noisy Environments via Model-Predictive Control
Pub Date: 2022-06-13 | DOI: 10.1609/icaps.v32i1.19843
Simon Boegh, P. G. Jensen, Martin Kristjansen, K. Larsen, Ulrik Nyman
We consider dynamic route planning for a fleet of Autonomous Mobile Robots (AMRs) performing fetch-and-carry tasks on a shared factory floor. In this paper, we propose Stochastic Work Graphs (SWG) as a formalism for capturing the semantics of such distributed and uncertain planning problems. We encode SWGs in the form of a Euclidean Markov Decision Process (EMDP) in the tool Uppaal Stratego, which employs Q-learning to synthesize near-optimal plans. Furthermore, we deploy the tool in an online and distributed fashion to facilitate scalable, rapid replanning. While executing its current plan, each AMR generates a new plan incorporating updated information about the other AMRs' positions and plans. We propose a two-layer Model Predictive Control structure (waypoint and station planning), with each layer solved individually by the Q-learning-based solver. We demonstrate our approach using the ARGoS3 large-scale robot simulator, where we simulate the AMR movement and observe up to a 27.5% improvement in makespan over a greedy approach to planning. To do so, we have implemented the full software stack, translating observations into SWGs and solving them with our proposed method. In addition, we construct a benchmark platform for comparing planning techniques on a reasonably realistic physical simulation and provide it under the MIT open-source license.
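The online, distributed use of the planner follows a standard receding-horizon pattern: each AMR replans over a short horizon with the latest shared information and executes only the first step of its plan. The sketch below shows that loop for a single AMR with placeholder components; in the paper the inner solver is the Q-learning machinery of Uppaal Stratego, which is not reproduced here, and all names are illustrative.

```python
def mpc_loop(amr_state, horizon, plan_waypoints, execute_first_step,
             get_observations, steps=50):
    """Receding-horizon control for one AMR: plan over a short horizon with
    the latest shared information (other AMRs' positions and plans), execute
    only the first step, then replan.  plan_waypoints is a placeholder for
    the Q-learning-based solver."""
    for _ in range(steps):
        obs = get_observations()                 # other AMRs' positions/plans
        plan = plan_waypoints(amr_state, obs, horizon)
        if not plan:
            break                                # nothing left to do
        amr_state = execute_first_step(amr_state, plan[0])
    return amr_state

# Toy usage: an AMR at position 0 moving towards position 5 on a line.
goal = 5
plan_waypoints = lambda s, obs, h: list(range(s + 1, min(goal, s + h) + 1))
execute_first_step = lambda s, wp: wp
get_observations = lambda: {}                    # no other AMRs in this toy
print(mpc_loop(0, horizon=2, plan_waypoints=plan_waypoints,
               execute_first_step=execute_first_step,
               get_observations=get_observations))  # -> 5
```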
{"title":"Distributed Fleet Management in Noisy Environments via Model-Predictive Control","authors":"Simon Boegh, P. G. Jensen, Martin Kristjansen, K. Larsen, Ulrik Nyman","doi":"10.1609/icaps.v32i1.19843","DOIUrl":"https://doi.org/10.1609/icaps.v32i1.19843","url":null,"abstract":"We consider dynamic route planning for a fleet of Autonomous Mobile Robots (AMRs) doing fetch and carry tasks on a shared factory floor. In this paper, we propose Stochastic Work Graphs (SWG) as a formalism for capturing the semantics of such distributed and uncertain planning problems. We encode SWGs in the form of a Euclidean Markov Decision Process (EMDP) in the tool Uppaal Stratego, which employs Q-Learning to synthesize near-optimal plans. Furthermore, we deploy the tool in an online and distributed fashion to facilitate scalable, rapid replanning. While executing their current plan, each AMR generates a new plan incorporating updated information about the other AMRs positions and plans. We propose a two-layer Model Predictive Controller-structure (waypoint and station planning), each individually solved by the Q-learning-based solver. We demonstrate our approach using ARGoS3 large-scale robot simulation, where we simulate the AMR movement and observe an up to 27.5% improvement in makespan over a greedy approach to planning. To do so, we have implemented the full software stack, translating observations into SWGs and solving those with our proposed method. In addition, we construct a benchmark platform for comparing planning techniques on a reasonably realistic physical simulation and provide this under the MIT open-source license.","PeriodicalId":239898,"journal":{"name":"International Conference on Automated Planning and Scheduling","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115691364","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Debugging a Policy: Automatic Action-Policy Testing in AI Planning
Pub Date: 2022-06-13 | DOI: 10.1609/icaps.v32i1.19820
Marcel Steinmetz, Daniel Fiser, Hasan Ferit Eniser, Patrick Ferber, Timo P. Gros, Philippe Heim, D. Höller, Xandra Schuler, Valentin Wüstholz, M. Christakis, Jörg Hoffmann
Testing is a promising way to gain trust in neural action policies π. Previous work on policy testing in sequential decision making targeted environment behavior leading to failure conditions. But if the failure is unavoidable given that behavior, then π is not actually to blame. For a situation to qualify as a "bug" in π, there must be an alternative policy π' that does better. We introduce a generic policy testing framework based on that intuition. This raises the bug confirmation problem: deciding whether or not a state is a bug. We analyze the use of optimistic and pessimistic bounds for the design of test oracles approximating that problem. We contribute an implementation of our framework in classical planning, experimenting with several test oracles and with random-walk methods that generate test states biased towards poor policy performance and/or state novelty. We evaluate these techniques on policies π learned with ASNets. We find that they are able to effectively identify bugs in these π, and that our random-walk biases improve over uninformed baselines.
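A test oracle built from such bounds can be pictured as follows: if an upper bound on the best achievable cost from a state is below a lower bound on the policy's own cost, the state is provably a bug; if a lower bound on the best achievable cost is at least the policy's upper bound, it provably is not. This reading is our own simplification of the optimistic/pessimistic-bound idea, and the argument names below are illustrative.

```python
def bug_oracle(opt_cost_ub, opt_cost_lb, policy_cost_lb, policy_cost_ub):
    """Classify a state using cost bounds only.

    opt_cost_ub / opt_cost_lb bound the best achievable cost from the state
    (e.g. the cost of any plan found / an admissible heuristic);
    policy_cost_lb / policy_cost_ub bound the cost the policy pi incurs from
    the state.  A state is a bug iff some alternative policy beats pi."""
    if opt_cost_ub < policy_cost_lb:
        return "bug confirmed"        # something provably beats the policy
    if opt_cost_lb >= policy_cost_ub:
        return "no bug"               # the policy is provably unbeatable here
    return "unknown"

print(bug_oracle(opt_cost_ub=12, opt_cost_lb=8,
                 policy_cost_lb=15, policy_cost_ub=20))  # -> bug confirmed
```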
{"title":"Debugging a Policy: Automatic Action-Policy Testing in AI Planning","authors":"Marcel Steinmetz, Daniel Fiser, Hasan Ferit Eniser, Patrick Ferber, Timo P. Gros, Philippe, Heim, D. Höller, Xandra Schuler, Valentin Wüstholz, M. Christakis, Jörg Hoffmann","doi":"10.1609/icaps.v32i1.19820","DOIUrl":"https://doi.org/10.1609/icaps.v32i1.19820","url":null,"abstract":"Testing is a promising way to gain trust in neural action policies π. Previous work on policy testing in sequential decision making targeted environment behavior leading to failure conditions. But if the failure is unavoidable given that behavior, then π is not actually to blame. For a situation to qualify as a \"bug\" in π, there must be an alternative policy π' that does better. We introduce a generic policy testing framework based on that intuition. This raises the bug confirmation problem, deciding whether or not a state is a bug. We analyze the use of optimistic and pessimistic bounds for the design of test oracles approximating that problem. We contribute an implementation of our framework in classical planning, experimenting with several test oracles and with random-walk methods generating test states biased to poor policy performance and/or state novelty. We evaluate these techniques on policies π learned with ASNets. We find that they are able to effectively identify bugs in these π, and that our random-walk biases improve over uninformed baselines.","PeriodicalId":239898,"journal":{"name":"International Conference on Automated Planning and Scheduling","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131380831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An End-to-End Automatic Cache Replacement Policy Using Deep Reinforcement Learning
Pub Date: 2022-06-13 | DOI: 10.1609/icaps.v32i1.19840
Yangqing Zhou, Fang Wang, Zhan Shi, D. Feng
In the past few decades, much research has been conducted on the design of cache replacement policies. Prior work frequently relies on manually engineered heuristics to capture the most common cache access patterns, or predicts the reuse distance and tries to identify blocks that are either cache-friendly or cache-averse. Researchers are now applying recent advances in machine learning to guide cache replacement policies, augmenting or replacing traditional heuristics and data structures. However, most existing approaches depend on a particular environment, which restricts their application; e.g., most of them only consider on-chip caches and rely on program counters (PCs). Moreover, the approaches with attractive hit rates are usually unable to deal with modern irregular workloads, due to the limited features used. In contrast, we propose a pervasive cache replacement framework that automatically learns the relationship between the probability distribution over different replacement policies and the workload distribution by using deep reinforcement learning. We train an end-to-end cache replacement policy only on past requested addresses, through two simple and stable cache replacement policies. Furthermore, the overall framework can easily be plugged into any scenario that requires a cache. Our simulation results on 8 production storage traces run against 3 different cache configurations confirm that the proposed cache replacement policy is effective and outperforms several state-of-the-art approaches.
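The phrase "through two simple and stable cache replacement policies" suggests an agent that arbitrates between two base policies using only the stream of past requested addresses. The sketch below shows such a cache with LRU and LFU as the base policies (an assumption; the abstract does not name them) and a hand-written heuristic standing in for the learned DRL agent.

```python
from collections import OrderedDict, Counter

class HybridCache:
    """On each eviction, an agent chooses between two simple base policies
    (LRU and LFU here, purely for illustration) using features built from
    recently requested addresses.  A stub heuristic stands in for the DRL
    policy described in the paper."""

    def __init__(self, capacity, history_len=64):
        self.capacity = capacity
        self.store = OrderedDict()            # address -> None, in recency order
        self.freq = Counter()
        self.history = []
        self.history_len = history_len

    def _agent_prefers_lru(self):
        # Placeholder for the learned policy: prefer LRU when the recent
        # request stream shows little reuse, LFU otherwise.
        reuse = len(self.history) - len(set(self.history))
        return reuse < self.history_len // 4

    def access(self, addr):
        self.history = (self.history + [addr])[-self.history_len:]
        self.freq[addr] += 1
        if addr in self.store:
            self.store.move_to_end(addr)
            return True                                        # hit
        if len(self.store) >= self.capacity:
            if self._agent_prefers_lru():
                self.store.popitem(last=False)                 # evict LRU block
            else:
                victim = min(self.store, key=self.freq.__getitem__)
                del self.store[victim]                         # evict LFU block
        self.store[addr] = None
        return False                                           # miss

cache = HybridCache(capacity=2)
hits = sum(cache.access(a) for a in [1, 2, 1, 3, 1, 2])
print(hits)   # number of hits on this toy trace
```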
{"title":"An End-to-End Automatic Cache Replacement Policy Using Deep Reinforcement Learning","authors":"Yangqing Zhou, Fang Wang, Zhan Shi, D. Feng","doi":"10.1609/icaps.v32i1.19840","DOIUrl":"https://doi.org/10.1609/icaps.v32i1.19840","url":null,"abstract":"In the past few decades, much research has been conducted on the design of cache replacement policies. Prior work frequently relies on manually-engineered heuristics to capture the most common cache access patterns, or predict the reuse distance and try to identify the blocks that are either cache-friendly or cache-averse. Researchers are now applying recent advances in machine learning to guide cache replacement policy, augmenting or replacing traditional heuristics and data structures. However, most existing approaches depend on the certain environment which restricted their application, e.g, most of the approaches only consider the on-chip cache consisting of program counters (PCs). Moreover, those approaches with attractive hit rates are usually unable to deal with modern irregular workloads, due to the limited feature used. In contrast, we propose a pervasive cache replacement framework to automatically learn the relationship between the probability distribution of different replacement policies and workload distribution by using deep reinforcement learning. We train an end-to-end cache replacement policy only on the past requested address through two simple and stable cache replacement policies. Furthermore, the overall framework can be easily plugged into any scenario that requires cache. Our simulation results on 8 production storage traces run against 3 different cache configurations confirm that the proposed cache replacement policy is effective and outperforms several state-of-the-art approaches.","PeriodicalId":239898,"journal":{"name":"International Conference on Automated Planning and Scheduling","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125537086","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}