{"title":"基于专家样本行为克隆的最大熵逆强化学习","authors":"Dazi Li, Jianghai Du","doi":"10.1109/DDCLS52934.2021.9455476","DOIUrl":null,"url":null,"abstract":"This study proposes a preprocessing framework for expert examples based on behavior cloning (BC) to solve the problem that inverse reinforcement learning (IRL) is inaccurate due to the noises of expert examples. In order to remove the noises in the expert examples, we first use supervised learning to learn the approximate expert policy, and then use this approximate expert policy to clone new expert examples from the old expert examples, the idea of this preprocessing framework is BC, IRL can obtain higher quality expert examples after preprocessing. The IRL framework adopts the form of maximum entropy, and specific experiments demonstrate the effectiveness of the proposed approach, in the case of expert examples with noises, the reward functions that after BC preprocessing is better than that without preprocessing, especially with the increase of noise level, the effect is particularly obvious.","PeriodicalId":325897,"journal":{"name":"2021 IEEE 10th Data Driven Control and Learning Systems Conference (DDCLS)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Maximum Entropy Inverse Reinforcement Learning Based on Behavior Cloning of Expert Examples\",\"authors\":\"Dazi Li, Jianghai Du\",\"doi\":\"10.1109/DDCLS52934.2021.9455476\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This study proposes a preprocessing framework for expert examples based on behavior cloning (BC) to solve the problem that inverse reinforcement learning (IRL) is inaccurate due to the noises of expert examples. In order to remove the noises in the expert examples, we first use supervised learning to learn the approximate expert policy, and then use this approximate expert policy to clone new expert examples from the old expert examples, the idea of this preprocessing framework is BC, IRL can obtain higher quality expert examples after preprocessing. 
The IRL framework adopts the form of maximum entropy, and specific experiments demonstrate the effectiveness of the proposed approach, in the case of expert examples with noises, the reward functions that after BC preprocessing is better than that without preprocessing, especially with the increase of noise level, the effect is particularly obvious.\",\"PeriodicalId\":325897,\"journal\":{\"name\":\"2021 IEEE 10th Data Driven Control and Learning Systems Conference (DDCLS)\",\"volume\":\"14 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-05-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE 10th Data Driven Control and Learning Systems Conference (DDCLS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/DDCLS52934.2021.9455476\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE 10th Data Driven Control and Learning Systems Conference (DDCLS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DDCLS52934.2021.9455476","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
This study proposes a preprocessing framework for expert examples based on behavior cloning (BC) to address the inaccuracy of inverse reinforcement learning (IRL) caused by noise in the expert examples. To remove this noise, we first use supervised learning to learn an approximate expert policy, and then use that policy to clone new expert examples from the old ones; the idea behind this preprocessing framework is BC, and after preprocessing, IRL obtains higher-quality expert examples. The IRL framework adopts the maximum entropy form. Experiments demonstrate the effectiveness of the proposed approach: with noisy expert examples, the reward functions recovered after BC preprocessing are better than those recovered without preprocessing, and the advantage becomes more pronounced as the noise level increases.
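
Below is a minimal Python sketch of one plausible reading of the BC preprocessing step: fit a classifier on the noisy state-action pairs to serve as the approximate expert policy, then relabel each demonstrated state with that policy's predicted action to produce the cleaned examples. The MLPClassifier, its hyperparameters, and the discrete-action, vector-state setting are illustrative assumptions; the paper does not specify the supervised model.

```python
# Sketch of the BC preprocessing described in the abstract, assuming
# discrete actions and vector-valued states. The classifier choice is
# an assumption, not a detail from the paper.
import numpy as np
from sklearn.neural_network import MLPClassifier

def bc_denoise(states, actions):
    """Fit an approximate expert policy on noisy demonstrations, then
    "clone" new expert examples by relabeling each state's action.

    states  : (N, d) array of visited states
    actions : (N,)  array of demonstrated (possibly noisy) actions
    returns : (N,)  array of cleaned actions for the same states
    """
    # Step 1: supervised learning of the approximate expert policy.
    policy = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500)
    policy.fit(states, actions)
    # Step 2: replace each noisy action with the approximate policy's
    # prediction, yielding the new (cloned) expert examples.
    return policy.predict(states)
```

The cleaned (state, action) pairs would then replace the raw demonstrations as input to the IRL stage.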
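
For the IRL stage, the abstract states only that the maximum entropy form is adopted. For context, here is a compact tabular MaxEnt IRL loop in the style of Ziebart et al. (2008); the linear reward parameterization, transition matrix, feature matrix, horizon, and learning rate are all assumptions rather than details from the paper.

```python
# Minimal tabular MaxEnt IRL sketch (after Ziebart et al., 2008).
# All problem-setup choices here are illustrative assumptions.
import numpy as np

def maxent_irl(P, feat, trajs, horizon, lr=0.1, iters=100):
    """P      : (S, A, S) transition probabilities
       feat   : (S, F) state features; reward r = feat @ theta
       trajs  : list of state-index sequences (expert demonstrations)
       returns: learned reward vector over states"""
    S, A, _ = P.shape
    theta = np.zeros(feat.shape[1])
    # Expert feature expectations from the (preprocessed) demonstrations.
    mu_expert = np.mean([feat[t].sum(axis=0) for t in trajs], axis=0)
    p0 = np.bincount([t[0] for t in trajs], minlength=S) / len(trajs)
    for _ in range(iters):
        r = feat @ theta
        # Soft value iteration -> stochastic MaxEnt policy.
        V = np.zeros(S)
        for _ in range(horizon):
            Q = r[:, None] + P @ V              # (S, A)
            V = np.logaddexp.reduce(Q, axis=1)  # soft max over actions
        pi = np.exp(Q - V[:, None])             # pi(a|s), rows sum to 1
        # Expected state visitation frequencies under the current policy.
        d = p0.copy()
        D = d.copy()
        for _ in range(horizon - 1):
            d = np.einsum('s,sa,sat->t', d, pi, P)
            D += d
        # Gradient ascent on the MaxEnt log-likelihood:
        # expert feature expectations minus model feature expectations.
        theta += lr * (mu_expert - feat.T @ D)
    return feat @ theta
```

The paper's comparison would then run a loop like this once on the raw noisy demonstrations and once on the BC-preprocessed ones, and compare the recovered reward functions.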