Sheida Nozari, L. Marcenaro, David Martín, C. Regazzoni
{"title":"Observational Learning: Imitation Through an Adaptive Probabilistic Approach","authors":"Sheida Nozari, L. Marcenaro, David Martín, C. Regazzoni","doi":"10.1109/ICAS49788.2021.9551152","DOIUrl":null,"url":null,"abstract":"This paper proposes an adaptive method to enable imitation learning from expert demonstrations in a multi-agent context. The proposed system employs the inverse reinforcement learning method to a coupled Dynamic Bayesian Network to facilitate dynamic learning in an interactive system. This method studies the interaction at both discrete and continuous levels by identifying inter-relationships between the objects to facilitate the prediction of an expert agent. We evaluate the learning procedure in the scene of learner agent based on probabilistic reward function. Our goal is to estimate policies that predict matched trajectories with the observed one by minimizing the Kullback-Leiber divergence. The reward policies provide a probabilistic dynamic structure to minimise the abnormalities.","PeriodicalId":287105,"journal":{"name":"2021 IEEE International Conference on Autonomous Systems (ICAS)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE International Conference on Autonomous Systems (ICAS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICAS49788.2021.9551152","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
This paper proposes an adaptive method that enables imitation learning from expert demonstrations in a multi-agent context. The proposed system applies inverse reinforcement learning to a coupled Dynamic Bayesian Network to facilitate dynamic learning in an interactive system. The method studies the interaction at both discrete and continuous levels, identifying inter-relationships between the objects to facilitate prediction of the expert agent. We evaluate the learning procedure in the learner agent's scene based on a probabilistic reward function. Our goal is to estimate policies whose predicted trajectories match the observed ones by minimizing the Kullback-Leibler divergence. The reward policies provide a probabilistic dynamic structure that minimizes abnormalities.
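A minimal sketch of the abstract's core objective, selecting the policy whose predicted trajectory distribution is closest to the observed expert distribution in Kullback-Leibler divergence. The discretization, function names, and toy distributions below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) for discrete distributions over the same support.
    A small epsilon guards against log(0) on zero-probability bins."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def best_matching_policy(expert_dist, candidate_dists):
    """Return the index of the candidate trajectory distribution
    closest to the expert's in KL divergence."""
    scores = [kl_divergence(expert_dist, q) for q in candidate_dists]
    return int(np.argmin(scores))

# Toy trajectory distributions over 4 discretized states (assumed data).
expert = [0.7, 0.2, 0.05, 0.05]
candidates = [
    [0.25, 0.25, 0.25, 0.25],  # uniform policy, far from the expert
    [0.65, 0.25, 0.05, 0.05],  # policy close to the expert
]
print(best_matching_policy(expert, candidates))  # → 1
```

In the paper's setting the distributions would come from the coupled Dynamic Bayesian Network rather than fixed arrays; this sketch only shows the KL-matching criterion itself.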