Semantic constraints to represent common sense required in household actions for multimodal learning-from-observation robot

The International Journal of Robotics Research, Vol. 44, No. 1. Pub Date: 2023-11-29. DOI: 10.1177/02783649231212929

Katsushi Ikeuchi, Naoki Wake, Kazuhiro Sasabuchi, Jun Takamatsu

Abstract

The learning-from-observation (LfO) paradigm allows a robot to learn how to perform actions by observing human actions. Previous research in top-down learning-from-observation has mainly focused on the industrial domain, which involves only real physical constraints between a manipulated tool and the robot's working environment. To extend this paradigm to the household domain, which also involves imaginary constraints derived from human common sense, we introduce the idea of semantic constraints, which are represented analogously to physical constraints by defining an imaginary contact with an imaginary environment. By studying the transitions between contact states under physical and semantic constraints, we derive a necessary and sufficient set of task representations that provides an upper bound on the possible task set. We then apply these task representations to analyze various actions in top-rated household YouTube videos and real home-cooking recordings, classify frequently occurring constraint patterns into physical, semantic, and multi-step task groups, and determine a subset that covers standard household actions. Finally, we design and implement task models corresponding to the task representations in this subset, each with the daemon functions needed to collect the parameters required to perform the corresponding household action. Our results provide promising directions for incorporating common sense into the robot teaching literature.
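To make the idea of an imaginary contact more concrete, the following sketch treats every constraint, whether physical or semantic, as a contact normal and classifies a contact state by the translation directions it leaves free. It is a hypothetical illustration under strong simplifying assumptions (translation only, planar face contacts, a numerical tolerance of 1e-9). The names `Constraint`, `free_translation_dof`, `is_admissible`, and `transition` are invented for this example and do not come from the paper, and the sketch does not reproduce the paper's actual set of task representations.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class Constraint:
    """Hypothetical model of one contact, reduced to its surface normal.

    Physical and semantic (imaginary) contacts share the same representation;
    the `semantic` flag is metadata only and is ignored by the geometry below.
    """
    name: str
    normal: tuple  # (x, y, z) normal pointing away from the (real or imaginary) environment
    semantic: bool = False


def free_translation_dof(constraints):
    """Count translation directions free in both senses.

    A direction d is unconstrained both ways when n . d == 0 for every contact
    normal n, so the count is 3 minus the rank of the stacked normals.
    """
    if not constraints:
        return 3
    normals = np.array([c.normal for c in constraints], dtype=float)
    return 3 - int(np.linalg.matrix_rank(normals))


def is_admissible(constraints, direction, tol=1e-9):
    """True if translating along `direction` never pushes into a contact (n . d >= 0)."""
    d = np.asarray(direction, dtype=float)
    return all(float(np.dot(c.normal, d)) >= -tol for c in constraints)


def transition(before, after):
    """Describe a contact-state transition by the constraints added and removed."""
    b, a = set(before), set(after)
    return {
        "added": sorted(c.name for c in a - b),
        "removed": sorted(c.name for c in b - a),
        "dof_change": free_translation_dof(after) - free_translation_dof(before),
    }
```

For example, a knife held in an imaginary cutting plane (two opposing semantic normals along x) keeps 2 free translation directions, and touching the cutting board (a physical normal along z) reduces that to 1. Calling `transition([], [plane_pos, plane_neg])` reports both plane constraints as added, with a DOF change of -1. Ignoring the `semantic` flag in the geometry is a deliberate choice that mirrors the abstract's point that semantic constraints are represented the same way as physical ones.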