Developing effective reward functions is crucial for robot learning, as they guide behavior and facilitate adaptation to human-like tasks. We present Human2Bot (H2B), which advances the learning of a generalized multi-task reward function that can be used zero-shot to execute unknown tasks in unseen environments. H2B is a newly designed task similarity estimation model trained on a large dataset of human videos. The model determines whether two videos from different environments represent the same task. At test time, the model serves as a reward function, evaluating how closely a robot’s execution matches the human demonstration. While previous approaches require robot-specific data to learn reward functions or policies, our method learns without any robot datasets. To achieve generalization in robotic environments, we incorporate a domain augmentation process that generates synthetic videos with varied visual appearances resembling simulation environments, alongside a multi-scale inter-frame attention mechanism that aligns human and robot task understanding. Finally, H2B is integrated with Visual Model Predictive Control (VMPC) to perform manipulation tasks in simulation and on the xARM6 robot in real-world settings. Trained solely on human data, our approach outperforms previous methods in simulated and real-world environments, eliminating the need for privileged robot datasets.
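
To make the idea of using a video-similarity score as a planning reward concrete, the following is a minimal sketch and not the H2B implementation. It rests on several assumptions: a "video" is a list of per-frame feature vectors, the hypothetical `similarity_reward` uses cosine similarity of mean-pooled features as a stand-in for the learned similarity model, `predict_rollout` is a toy additive dynamics model standing in for a visual predictive model, and `plan` uses simple random shooting, which may differ from the sampling scheme used with VMPC in the paper.

```python
import math
import random


def mean_pool(video):
    """Average per-frame feature vectors into one video-level vector."""
    n = len(video)
    return [sum(frame[i] for frame in video) / n for i in range(len(video[0]))]


def similarity_reward(robot_video, human_video):
    """Assumed stand-in for a learned task-similarity model:
    cosine similarity of the mean-pooled frame features."""
    a, b = mean_pool(robot_video), mean_pool(human_video)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def predict_rollout(state, actions):
    """Toy predictive model (assumption): each action is added to the state,
    and the resulting states play the role of predicted frames."""
    frames = []
    for action in actions:
        state = [s + a for s, a in zip(state, action)]
        frames.append(state)
    return frames


def plan(state, human_video, horizon=5, num_samples=200, seed=0):
    """Random-shooting planner: sample action sequences, score each predicted
    rollout against the human demonstration, and keep the best one."""
    rng = random.Random(seed)
    best_actions, best_reward = None, -float("inf")
    for _ in range(num_samples):
        actions = [[rng.uniform(-1.0, 1.0) for _ in state] for _ in range(horizon)]
        reward = similarity_reward(predict_rollout(state, actions), human_video)
        if reward > best_reward:
            best_actions, best_reward = actions, reward
    return best_actions, best_reward


if __name__ == "__main__":
    # Hypothetical "human demonstration": features drifting along (1.0, 0.5).
    demo = predict_rollout([0.0, 0.0], [[1.0, 0.5]] * 5)
    actions, reward = plan([0.0, 0.0], demo)
    print("best similarity reward:", round(reward, 3))
```

In this sketch the planner never sees robot-specific training data: the only supervision signal is the similarity between the predicted rollout and the demonstration, which mirrors the role the abstract assigns to the reward model at test time.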
