Maximum margin planning

Proceedings of the 23rd International Conference on Machine Learning. Pub Date: 2006-06-25. DOI: 10.1145/1143844.1143936

Ashesh Jain, Michael Hu, Nathan D. Ratliff, Drew Bagnell, Martin A Zinkevich

Citations: 681

Abstract

Imitation learning of sequential, goal-directed behavior by standard supervised techniques is often difficult. We frame learning such behaviors as a maximum margin structured prediction problem over a space of policies. In this approach, we learn mappings from features to costs so that an optimal policy in an MDP with these costs mimics the expert's behavior. Further, we demonstrate a simple, provably efficient approach to structured maximum margin learning, based on the subgradient method, that leverages existing fast algorithms for inference. Although the technique is general, it is particularly relevant where A* and dynamic programming approaches make learning policies tractable in problems beyond the limitations of a QP formulation. We demonstrate our approach applied to route planning for outdoor mobile robots, where the behavior a designer wishes a planner to execute is often clear, while specifying cost functions that engender this behavior is a much more difficult task.
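The subgradient idea in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the grid world, the right/down-only moves, the one-hot terrain features, the per-cell loss, the fixed step size, and all function names (`min_cost_path`, `train_mmp`) are assumptions chosen to keep the example short. Each step runs a planner on loss-augmented costs, then moves the weights so the expert's path becomes cheaper relative to the planner's path.

```python
import numpy as np

def min_cost_path(cost):
    """Cheapest top-left to bottom-right path using only right/down moves.

    Returns a 0/1 mask of visited cells. Dynamic programming on this
    acyclic grid stays valid even when some costs are negative.
    """
    h, w = cost.shape
    value = np.full((h, w), np.inf)
    value[0, 0] = cost[0, 0]
    for r in range(h):
        for c in range(w):
            if r > 0:
                value[r, c] = min(value[r, c], value[r - 1, c] + cost[r, c])
            if c > 0:
                value[r, c] = min(value[r, c], value[r, c - 1] + cost[r, c])
    # Backtrack from the goal, always stepping to the cheaper predecessor.
    mask = np.zeros((h, w))
    r, c = h - 1, w - 1
    mask[r, c] = 1.0
    while (r, c) != (0, 0):
        if r > 0 and (c == 0 or value[r - 1, c] <= value[r, c - 1]):
            r -= 1
        else:
            c -= 1
        mask[r, c] = 1.0
    return mask

def train_mmp(features, expert_mask, steps=50, lr=0.1, reg=0.01):
    """Subgradient descent on a max-margin planning objective (sketch).

    features:    (h, w, d) array, one feature vector per cell
    expert_mask: (h, w) 0/1 mask of the demonstrated path
    Cell cost is linear in the features: cost = features @ weights.
    """
    weights = np.ones(features.shape[-1])
    loss = 1.0 - expert_mask  # assumed loss: 1 per cell off the expert path
    f_expert = np.tensordot(expert_mask, features, axes=([0, 1], [0, 1]))
    for _ in range(steps):
        cost = features @ weights
        # Loss-augmented inference: make off-expert cells look cheaper.
        planned = min_cost_path(cost - loss)
        f_planned = np.tensordot(planned, features, axes=([0, 1], [0, 1]))
        grad = f_expert - f_planned + reg * weights
        weights = weights - lr * grad
        weights = np.maximum(weights, 1e-3)  # assumed: keep costs positive
    return weights

# Toy problem: feature 0 marks "road", feature 1 marks "grass".
road = np.array([[1, 1, 1],
                 [0, 0, 1],
                 [0, 0, 1]], dtype=float)
features = np.stack([road, 1.0 - road], axis=-1)
expert = road.copy()  # the expert follows the road

w = train_mmp(features, expert)
learned_path = min_cost_path(features @ w)
print("weights:", w)
print(learned_path)
```

In this toy setup the learned grass weight ends up above the road weight, so planning with the learned costs (without loss augmentation) follows the demonstrated road. Any fast planner, such as A*, could stand in for `min_cost_path` as long as it handles the loss-augmented costs correctly.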
