Exploiting Information Theory for Intuitive Robot Programming of Manual Activities

Elena Merlo, Marta Lagomarsino, Edoardo Lamon, Arash Ajoudani

IEEE Transactions on Robotics, vol. 41, pp. 1245-1262
DOI: 10.1109/TRO.2025.3530267
Published: 2025-01-15
URL: https://ieeexplore.ieee.org/document/10842468/
Citations: 0
Abstract
Observational learning is a promising approach to enable people without expertise in programming to transfer skills to robots in a user-friendly manner, since it mirrors how humans learn new behaviors by observing others. Many existing methods focus on instructing robots to mimic human trajectories, but motion-level strategies often pose challenges in skill generalization across diverse environments. This article proposes a novel framework that allows robots to achieve a higher-level understanding of human-demonstrated manual tasks recorded in RGB videos. By recognizing the task structure and goals, robots generalize what they observe to unseen scenarios. We base our task representation on Shannon's Information Theory (IT), which is applied for the first time to manual tasks. IT helps extract the active scene elements and quantify the information shared between hands and objects. We exploit scene graph properties to encode the extracted interaction features in a compact structure and segment the demonstration into blocks, streamlining the generation of behavior trees for robot replication. Experiments validated the effectiveness of IT in automatically generating robot execution plans from a single human demonstration. In addition, we provide HANDSOME, an open-source dataset of HAND Skills demOnstrated by Multi-subjEcts, to promote further research and evaluation in this field.
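To make the information-theoretic idea concrete, the sketch below shows one standard way to quantify "information shared between hands and objects": Shannon mutual information between two discretized per-frame label sequences. This is a generic illustration of the concept, not the paper's implementation; the signal names, label encodings, and bin counts are assumptions made for the example.

```python
# Illustrative sketch only (NOT the paper's method): Shannon mutual
# information I(H; O) in bits between two discrete label sequences,
# e.g., a per-frame hand-state label and an object-state label.
import numpy as np

def mutual_information(hand_states: np.ndarray, object_states: np.ndarray) -> float:
    """Mutual information (bits) between two integer-labeled sequences."""
    joint, _, _ = np.histogram2d(
        hand_states, object_states,
        bins=(int(hand_states.max()) + 1, int(object_states.max()) + 1),
    )
    p_xy = joint / joint.sum()             # joint distribution P(H, O)
    p_x = p_xy.sum(axis=1, keepdims=True)  # marginal P(H), shape (nH, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)  # marginal P(O), shape (1, nO)
    mask = p_xy > 0                        # avoid log(0) on empty cells
    return float(np.sum(p_xy[mask] * np.log2(p_xy[mask] / (p_x @ p_y)[mask])))

# A hand signal that tracks the object carries high mutual information;
# an independent hand signal carries almost none. (Hypothetical data.)
rng = np.random.default_rng(0)
obj = rng.integers(0, 3, size=500)                          # object state per frame
hand_tracking = (obj + rng.integers(0, 2, size=500)) % 3    # correlated with obj
hand_random = rng.integers(0, 3, size=500)                  # independent of obj
print(mutual_information(hand_tracking, obj))  # relatively high
print(mutual_information(hand_random, obj))    # near zero
```

Under this view, a scene element whose state shares high mutual information with the hand signal would be flagged as "active" in the interaction, which matches the abstract's description at a conceptual level.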
Journal description:
The IEEE Transactions on Robotics (T-RO) is dedicated to publishing fundamental papers covering all facets of robotics, drawing on interdisciplinary approaches from computer science, control systems, electrical engineering, mathematics, mechanical engineering, and beyond. From industrial applications to service and personal assistance, and from surgical operations to space, underwater, and remote exploration, robots and intelligent machines play pivotal roles across various domains, including entertainment, safety, search and rescue, military applications, agriculture, and intelligent vehicles.
Special emphasis is placed on intelligent machines and systems designed for unstructured environments, where a significant portion of the environment remains unknown and beyond direct sensing or control.