Neural-network-based reinforcement learning is a promising approach for teaching robots new behaviours, but one of its main limitations is the need for reward signals carefully hand-coded by an expert. Automating the reward-learning process is therefore crucial if robots are to be taught new skills directly by their users. This article proposes an approach that enables robots to learn reward signals for sequential tasks from visual observations, eliminating the need for expert-designed rewards. The sequential task is divided into smaller sub-tasks, and a novel auto-labelling technique generates rewards for the demonstration data. A novel image classifier is proposed to accurately estimate the visual reward for each sub-task. Comprehensive evaluations on three challenging sequential tasks (block stacking, door opening, and nut assembly) demonstrate that the proposed approach generates informative reward signals. By using the learnt reward signals to train reinforcement-learning agents from demonstrations, we induce policies that outperform those trained with sparse oracle rewards. Since our approach consistently outperformed several baselines, including DDPG, TD3, SAC, DAPG, GAIL, and AWAC, it represents an advancement in the application of model-free reinforcement learning to sequential robotic tasks.
