Generalizable Multi-Modal Adversarial Imitation Learning for Non-Stationary Dynamics

Yi-Chen Li, Ningjing Chao, Zongzhang Zhang, Fuxiang Zhang, Lei Yuan, Yang Yu
DOI: 10.1109/TPAMI.2025.3552228
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 7, pp. 5600-5612
Published: 2025-03-17
URL: https://ieeexplore.ieee.org/document/10930709/

Abstract

Imitation Learning (IL) learns policies from expert demonstrations; most existing studies assume the imitator will be deployed in a stationary environment. However, real-world scenarios commonly involve perturbations, necessitating imitators that are robust in non-stationary settings. To this end, we leverage a multi-modal expert dataset that encompasses diverse dynamics while still adhering to the goal shared between the experts and the imitator. Unlike conventional multi-modal IL, which aims to reproduce the different demonstrated behaviors, we aim to learn a policy that rapidly adapts to sudden dynamics changes, even when encountering dynamics unseen during training. We propose Generalizable Multi-modal Adversarial Imitation Learning (GMAIL) for non-stationary dynamics, which adversarially trains a discriminator and a generator. Owing to the dynamics mismatch between the experts and the imitator, the optimal next state for the imitator may take the experts several steps to reach; this inspires us to use state-next-state pairs within multiple steps of the demonstrated trajectories to facilitate imitation under dynamics mismatch. For quick identification of the changed dynamics, GMAIL learns a dynamics-sensitive generator by introducing a history-based context encoder. Empirical results on a wide range of navigation, locomotion, and autonomous driving tasks demonstrate the effectiveness of GMAIL.
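The two ingredients highlighted in the abstract — treating state-next-state pairs within multiple steps of an expert trajectory as positives for the discriminator, and encoding recent transition history to identify the current dynamics — can be sketched as follows. This is an illustrative sketch only, not the paper's implementation: the function names, the choice of `max_k`, and the fixed-projection stand-in for a learned context encoder are all assumptions.

```python
import numpy as np

def multi_step_pairs(traj, max_k=3):
    """Collect (s_t, s_{t+k}) pairs for k = 1..max_k from one expert trajectory.

    Under dynamics mismatch, the state the imitator reaches in one step may
    take the expert several steps to reach, so pairs within multiple steps
    (not only consecutive pairs) serve as positive examples for the
    discriminator.
    """
    pairs = []
    T = len(traj)
    for t in range(T):
        for k in range(1, max_k + 1):
            if t + k < T:
                pairs.append((traj[t], traj[t + k]))
    return pairs

def encode_context(history, dim=4):
    """Hypothetical history-based context encoder.

    Here it is just a fixed averaging projection of the recent
    (state, next_state) transitions, standing in for a learned encoder
    whose output lets the generator quickly identify the changed dynamics.
    """
    flat = np.concatenate([np.concatenate(h) for h in history])
    W = np.ones((dim, flat.size)) / flat.size  # placeholder weights
    return W @ flat

# Usage: a toy 1-D trajectory of 5 states.
traj = [np.array([float(t)]) for t in range(5)]
pairs = multi_step_pairs(traj, max_k=2)   # 4 one-step + 3 two-step pairs
context = encode_context(pairs[:2])       # context vector from recent pairs
```

In an actual GAIL-style loop, the discriminator would be trained to separate these expert pairs from the imitator's own (s, s') transitions, and the context vector would condition the generator (policy).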