Multi-Task Imitation Learning for Linear Dynamical Systems

Thomas Zhang, Katie Kang, Bruce Lee, C. Tomlin, S. Levine, Stephen Tu, N. Matni
{"title":"Multi-Task Imitation Learning for Linear Dynamical Systems","authors":"Thomas Zhang, Katie Kang, Bruce Lee, C. Tomlin, S. Levine, Stephen Tu, N. Matni","doi":"10.48550/arXiv.2212.00186","DOIUrl":null,"url":null,"abstract":"We study representation learning for efficient imitation learning over linear systems. In particular, we consider a setting where learning is split into two phases: (a) a pre-training step where a shared $k$-dimensional representation is learned from $H$ source policies, and (b) a target policy fine-tuning step where the learned representation is used to parameterize the policy class. We find that the imitation gap over trajectories generated by the learned target policy is bounded by $\\tilde{O}\\left( \\frac{k n_x}{HN_{\\mathrm{shared}}} + \\frac{k n_u}{N_{\\mathrm{target}}}\\right)$, where $n_x>k$ is the state dimension, $n_u$ is the input dimension, $N_{\\mathrm{shared}}$ denotes the total amount of data collected for each policy during representation learning, and $N_{\\mathrm{target}}$ is the amount of target task data. This result formalizes the intuition that aggregating data across related tasks to learn a representation can significantly improve the sample efficiency of learning a target task. The trends suggested by this bound are corroborated in simulation.","PeriodicalId":268449,"journal":{"name":"Conference on Learning for Dynamics & Control","volume":"31 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Conference on Learning for Dynamics & Control","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2212.00186","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 8

Abstract

We study representation learning for efficient imitation learning over linear systems. In particular, we consider a setting where learning is split into two phases: (a) a pre-training step where a shared $k$-dimensional representation is learned from $H$ source policies, and (b) a target policy fine-tuning step where the learned representation is used to parameterize the policy class. We find that the imitation gap over trajectories generated by the learned target policy is bounded by $\tilde{O}\left( \frac{k n_x}{HN_{\mathrm{shared}}} + \frac{k n_u}{N_{\mathrm{target}}}\right)$, where $n_x>k$ is the state dimension, $n_u$ is the input dimension, $N_{\mathrm{shared}}$ denotes the total amount of data collected for each policy during representation learning, and $N_{\mathrm{target}}$ is the amount of target task data. This result formalizes the intuition that aggregating data across related tasks to learn a representation can significantly improve the sample efficiency of learning a target task. The trends suggested by this bound are corroborated in simulation.
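To make the two-phase setup concrete, below is a minimal NumPy sketch of multi-task representation learning for linear policies. It assumes each expert policy factors as $K_h = F_h \Phi$ with a shared $k \times n_x$ representation $\Phi$, draws states i.i.d. Gaussian rather than along closed-loop trajectories, and recovers $\Phi$ from an SVD of stacked least-squares estimates; this is an illustrative stand-in, not the paper's exact algorithm, and all names and dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative problem sizes (not the paper's experimental settings).
n_x, n_u, k = 20, 3, 5            # state dim, input dim, shared representation dim (k < n_x)
H, N_shared, N_target = 10, 200, 30

# Ground-truth shared representation Phi_star (k x n_x, orthonormal rows) and
# task-specific maps F_h, so each expert policy is K_h = F_h @ Phi_star.
Phi_star = np.linalg.qr(rng.standard_normal((n_x, k)))[0].T
F_source = [rng.standard_normal((n_u, k)) for _ in range(H)]
K_target = rng.standard_normal((n_u, k)) @ Phi_star

def expert_data(K, N, noise=0.01):
    """Sample (state, expert action) pairs; states drawn i.i.d. Gaussian for simplicity."""
    X = rng.standard_normal((N, n_x))
    U = X @ K.T + noise * rng.standard_normal((N, n_u))
    return X, U

# --- Phase (a): pre-train a shared representation from H source policies. ---
# Fit each source policy by least squares, stack the estimates, and take the
# top-k right singular vectors as the estimated shared representation Phi_hat.
K_hats = []
for F_h in F_source:
    X, U = expert_data(F_h @ Phi_star, N_shared)
    B, *_ = np.linalg.lstsq(X, U, rcond=None)
    K_hats.append(B.T)                          # shape (n_u, n_x)
stacked = np.vstack(K_hats)                     # shape (H * n_u, n_x)
_, _, Vt = np.linalg.svd(stacked, full_matrices=False)
Phi_hat = Vt[:k]                                # estimated k x n_x representation

# --- Phase (b): fine-tune the target policy on top of the frozen representation. ---
X_t, U_t = expert_data(K_target, N_target)
Z_t = X_t @ Phi_hat.T                           # project target states through Phi_hat
F_hat, *_ = np.linalg.lstsq(Z_t, U_t, rcond=None)
K_multitask = F_hat.T @ Phi_hat

# Baseline: fit the full n_u x n_x policy from the target data alone.
K_single, *_ = np.linalg.lstsq(X_t, U_t, rcond=None)
K_single = K_single.T

print("multi-task error :", np.linalg.norm(K_multitask - K_target))
print("single-task error:", np.linalg.norm(K_single - K_target))
```

With few target samples relative to $n_x$, fitting only the $k \times n_u$ task-specific head on top of the pre-trained representation typically yields a smaller policy error than fitting the full $n_u \times n_x$ gain from target data alone, mirroring the $k n_u / N_{\mathrm{target}}$ versus $n_u n_x / N_{\mathrm{target}}$ comparison suggested by the bound.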