Thomas Zhang, Katie Kang, Bruce Lee, C. Tomlin, S. Levine, Stephen Tu, N. Matni
Conference on Learning for Dynamics & Control. Published 2022-12-01. DOI: 10.48550/arXiv.2212.00186 (https://doi.org/10.48550/arXiv.2212.00186)
Multi-Task Imitation Learning for Linear Dynamical Systems
We study representation learning for efficient imitation learning over linear systems. In particular, we consider a setting where learning is split into two phases: (a) a pre-training step where a shared $k$-dimensional representation is learned from $H$ source policies, and (b) a target policy fine-tuning step where the learned representation is used to parameterize the policy class. We find that the imitation gap over trajectories generated by the learned target policy is bounded by $\tilde{O}\left( \frac{k n_x}{HN_{\mathrm{shared}}} + \frac{k n_u}{N_{\mathrm{target}}}\right)$, where $n_x>k$ is the state dimension, $n_u$ is the input dimension, $N_{\mathrm{shared}}$ denotes the total amount of data collected for each policy during representation learning, and $N_{\mathrm{target}}$ is the amount of target task data. This result formalizes the intuition that aggregating data across related tasks to learn a representation can significantly improve the sample efficiency of learning a target task. The trends suggested by this bound are corroborated in simulation.
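As a rough illustration of how the two terms in this bound trade off, the sketch below evaluates $\frac{k n_x}{H N_{\mathrm{shared}}}$ and $\frac{k n_u}{N_{\mathrm{target}}}$ with all constants and logarithmic factors dropped. The function name `imitation_gap_rate` and the problem sizes are hypothetical choices made for illustration only; they are not taken from the paper or its simulations.

```python
def imitation_gap_rate(k, n_x, n_u, H, N_shared, N_target):
    """Return the two terms of the stated bound, ignoring constants and log factors.

    k        : representation dimension
    n_x, n_u : state and input dimensions (the abstract assumes n_x > k)
    H        : number of source policies
    N_shared : amount of data per source policy during representation learning
    N_target : amount of target task data
    """
    if not n_x > k:
        raise ValueError("the bound is stated for n_x > k")
    representation_term = k * n_x / (H * N_shared)  # cost of learning the shared representation
    finetune_term = k * n_u / N_target              # cost of fitting the target policy on top of it
    return representation_term, finetune_term


# Hypothetical problem sizes, chosen only to show the trend in H.
k, n_x, n_u = 2, 50, 5
N_shared, N_target = 100, 100

for H in (1, 10, 100):
    rep, fine = imitation_gap_rate(k, n_x, n_u, H, N_shared, N_target)
    print(f"H={H:>3}  representation={rep:.4f}  fine-tuning={fine:.4f}  sum={rep + fine:.4f}")
```

Under these assumptions the representation term shrinks in proportion to $1/H$ as more source policies are added, while the fine-tuning term depends only on $k$, $n_u$, and $N_{\mathrm{target}}$, so it sets a floor that additional source data cannot lower.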