Multi-Task Imitation Learning for Linear Dynamical Systems

Thomas Zhang, Katie Kang, Bruce Lee, C. Tomlin, S. Levine, Stephen Tu, N. Matni
{"title":"Multi-Task Imitation Learning for Linear Dynamical Systems","authors":"Thomas Zhang, Katie Kang, Bruce Lee, C. Tomlin, S. Levine, Stephen Tu, N. Matni","doi":"10.48550/arXiv.2212.00186","DOIUrl":null,"url":null,"abstract":"We study representation learning for efficient imitation learning over linear systems. In particular, we consider a setting where learning is split into two phases: (a) a pre-training step where a shared $k$-dimensional representation is learned from $H$ source policies, and (b) a target policy fine-tuning step where the learned representation is used to parameterize the policy class. We find that the imitation gap over trajectories generated by the learned target policy is bounded by $\\tilde{O}\\left( \\frac{k n_x}{HN_{\\mathrm{shared}}} + \\frac{k n_u}{N_{\\mathrm{target}}}\\right)$, where $n_x>k$ is the state dimension, $n_u$ is the input dimension, $N_{\\mathrm{shared}}$ denotes the total amount of data collected for each policy during representation learning, and $N_{\\mathrm{target}}$ is the amount of target task data. This result formalizes the intuition that aggregating data across related tasks to learn a representation can significantly improve the sample efficiency of learning a target task. The trends suggested by this bound are corroborated in simulation.","PeriodicalId":268449,"journal":{"name":"Conference on Learning for Dynamics & Control","volume":"31 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Conference on Learning for Dynamics & Control","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2212.00186","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 8

Abstract

We study representation learning for efficient imitation learning over linear systems. In particular, we consider a setting where learning is split into two phases: (a) a pre-training step where a shared $k$-dimensional representation is learned from $H$ source policies, and (b) a target policy fine-tuning step where the learned representation is used to parameterize the policy class. We find that the imitation gap over trajectories generated by the learned target policy is bounded by $\tilde{O}\left( \frac{k n_x}{HN_{\mathrm{shared}}} + \frac{k n_u}{N_{\mathrm{target}}}\right)$, where $n_x>k$ is the state dimension, $n_u$ is the input dimension, $N_{\mathrm{shared}}$ denotes the total amount of data collected for each policy during representation learning, and $N_{\mathrm{target}}$ is the amount of target task data. This result formalizes the intuition that aggregating data across related tasks to learn a representation can significantly improve the sample efficiency of learning a target task. The trends suggested by this bound are corroborated in simulation.
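To make the two-phase setup concrete, below is a minimal NumPy sketch of multi-task representation learning for linear policies. It assumes each expert policy factors as $K_h = F_h \Phi$ with a shared $k \times n_x$ representation $\Phi$, draws states i.i.d. Gaussian rather than along closed-loop trajectories, and recovers $\Phi$ from an SVD of stacked least-squares estimates; this is an illustrative stand-in, not the paper's exact algorithm, and all names and dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative problem sizes (not the paper's experimental settings).
n_x, n_u, k = 20, 3, 5            # state dim, input dim, shared representation dim (k < n_x)
H, N_shared, N_target = 10, 200, 30

# Ground-truth shared representation Phi_star (k x n_x, orthonormal rows) and
# task-specific maps F_h, so each expert policy is K_h = F_h @ Phi_star.
Phi_star = np.linalg.qr(rng.standard_normal((n_x, k)))[0].T
F_source = [rng.standard_normal((n_u, k)) for _ in range(H)]
K_target = rng.standard_normal((n_u, k)) @ Phi_star

def expert_data(K, N, noise=0.01):
    """Sample (state, expert action) pairs; states drawn i.i.d. Gaussian for simplicity."""
    X = rng.standard_normal((N, n_x))
    U = X @ K.T + noise * rng.standard_normal((N, n_u))
    return X, U

# --- Phase (a): pre-train a shared representation from H source policies. ---
# Fit each source policy by least squares, stack the estimates, and take the
# top-k right singular vectors as the estimated shared representation Phi_hat.
K_hats = []
for F_h in F_source:
    X, U = expert_data(F_h @ Phi_star, N_shared)
    B, *_ = np.linalg.lstsq(X, U, rcond=None)
    K_hats.append(B.T)                          # shape (n_u, n_x)
stacked = np.vstack(K_hats)                     # shape (H * n_u, n_x)
_, _, Vt = np.linalg.svd(stacked, full_matrices=False)
Phi_hat = Vt[:k]                                # estimated k x n_x representation

# --- Phase (b): fine-tune the target policy on top of the frozen representation. ---
X_t, U_t = expert_data(K_target, N_target)
Z_t = X_t @ Phi_hat.T                           # project target states through Phi_hat
F_hat, *_ = np.linalg.lstsq(Z_t, U_t, rcond=None)
K_multitask = F_hat.T @ Phi_hat

# Baseline: fit the full n_u x n_x policy from the target data alone.
K_single, *_ = np.linalg.lstsq(X_t, U_t, rcond=None)
K_single = K_single.T

print("multi-task error :", np.linalg.norm(K_multitask - K_target))
print("single-task error:", np.linalg.norm(K_single - K_target))
```

With few target samples relative to $n_x$, fitting only the $k \times n_u$ task-specific head on top of the pre-trained representation typically yields a smaller policy error than fitting the full $n_u \times n_x$ gain from target data alone, mirroring the $k n_u / N_{\mathrm{target}}$ versus $n_u n_x / N_{\mathrm{target}}$ comparison suggested by the bound.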