Integrating User Gaze with Verbal Instruction to Reliably Estimate Robotic Task Parameters in a Human-Robot Collaborative Environment

Proceedings of the 2023 6th International Conference on Machine Vision and Applications, published 2023-03-10. DOI: 10.1145/3589572.3589580

S. K. Paul, M. Nicolescu, M. Nicolescu

Citations: 0

Abstract

As robots become more ubiquitous in daily life, it has become increasingly important to extract task and environmental information through more natural, meaningful, and easy-to-use interaction interfaces. Not only does this help users adapt to (and thus trust) a robot in a collaborative environment, but it can also supplement the robot's core sensory information, helping it make reliable decisions. This paper presents a framework that combines two natural interaction interfaces, speech and gaze, to reliably infer the object of interest and the robotic task parameters. The gaze estimation module matches a set of pre-defined 3D facial points to the user's 3D facial landmarks, which are estimated from 2D images, to infer the gaze direction. The verbal instructions are then passed through a deep learning model that extracts the information relevant to a robotic task. The task parameters extracted from the verbal instructions are combined with the estimated gaze directions to detect and/or disambiguate objects in the scene and generate the final task configurations. The proposed framework shows very promising results in integrating the relevant task parameters for the intended robotic tasks across different real-world interaction scenarios.
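The fusion step, in which verbal task parameters and gaze are combined to disambiguate the target object, can be illustrated with a minimal sketch. It assumes that the verbal-instruction model has already produced a dictionary of task parameters (the keys `action`, `object`, and `color` are hypothetical), that an object detector supplies labelled 3D centroids in the camera frame, and that the gaze module yields a gaze ray as an origin and a direction. The rule used here is an illustrative stand-in, not the authors' published method: spoken attributes filter the candidates, and ties are broken by the smallest angle to the gaze ray, subject to a hypothetical 15° tolerance. `DetectedObject`, `angle_deg`, and `resolve_target` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence

# A 3D point or vector in the camera frame (x, y, z), in metres.
Vec3 = tuple[float, float, float]


@dataclass(frozen=True)
class DetectedObject:
    """Hypothetical detector output: a class label, a colour, and a 3D centroid."""
    label: str
    color: str
    position: Vec3


def angle_deg(a: Vec3, b: Vec3) -> float:
    """Angle between two 3D vectors in degrees (180 if either is zero-length)."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 180.0
    cos = sum(x * y for x, y in zip(a, b)) / (na * nb)
    # Clamp to [-1, 1] to guard against floating-point overshoot before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def resolve_target(
    objects: Sequence[DetectedObject],
    task_params: dict,
    gaze_origin: Vec3,
    gaze_dir: Vec3,
    max_angle_deg: float = 15.0,  # hypothetical gaze tolerance
) -> Optional[DetectedObject]:
    """Pick the object referred to by speech, using gaze to break ties."""
    # Step 1: keep only objects consistent with the spoken attributes.
    # A missing key (None) means the user did not constrain that attribute.
    candidates = [
        o for o in objects
        if task_params.get("object") in (None, o.label)
        and task_params.get("color") in (None, o.color)
    ]
    if len(candidates) == 1:
        return candidates[0]  # speech alone identifies the object

    # Step 2: among the remaining candidates, choose the one closest to the
    # gaze ray, measured as the angle between the gaze direction and the
    # vector from the gaze origin to the object's centroid.
    best: Optional[DetectedObject] = None
    best_angle = max_angle_deg
    for o in candidates:
        to_obj = tuple(p - g for p, g in zip(o.position, gaze_origin))
        ang = angle_deg(gaze_dir, to_obj)
        if ang <= best_angle:
            best, best_angle = o, ang
    return best  # None if nothing lies within the gaze tolerance


if __name__ == "__main__":
    scene = [
        DetectedObject("cup", "red", (0.3, 0.0, 1.0)),
        DetectedObject("cup", "blue", (-0.3, 0.0, 1.0)),
        DetectedObject("box", "red", (0.0, -0.2, 0.8)),
    ]
    # "Pick up the cup" is ambiguous: two cups are visible.
    params = {"action": "pick", "object": "cup"}
    # The user is looking toward the left-hand (blue) cup.
    target = resolve_target(scene, params, (0.0, 0.0, 0.0), (-0.3, 0.0, 1.0))
    print(target)
```

In this sketch, speech acts as a hard filter because a spoken label or colour is usually an explicit constraint, while gaze, which is noisier, only ranks whatever speech leaves ambiguous. A weighted score that blends both cues would be an equally plausible design.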
