Integrating User Gaze with Verbal Instruction to Reliably Estimate Robotic Task Parameters in a Human-Robot Collaborative Environment

S. K. Paul, M. Nicolescu, M. Nicolescu
Published in: Proceedings of the 2023 6th International Conference on Machine Vision and Applications, 2023-03-10.
DOI: https://doi.org/10.1145/3589572.3589580

Abstract

As robots become more ubiquitous in daily life, it has become important to extract task and environmental information through more natural, meaningful, and easy-to-use interaction interfaces. Not only does this help the user adapt to (and thus trust) a robot in a collaborative environment, it can also supplement the robot's core sensory information, helping it make reliable decisions. This paper presents a framework that combines two natural interaction interfaces, speech and gaze, to reliably infer the object of interest and the robotic task parameters. The gaze estimation module matches pre-defined 3D facial points to a set of 3D facial landmarks of the user, estimated from 2D images, to infer the gaze direction. The verbal instructions are passed through a deep learning model to extract the information relevant to a robotic task. The task parameters extracted from the verbal instructions are then combined with the estimated gaze directions to detect and/or disambiguate objects in the scene and to generate the final task configurations. The proposed framework shows promising results in integrating the relevant task parameters for the intended robotic tasks in different real-world interaction scenarios.
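
The fusion step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the gaze ray (origin and direction) has already been estimated, that the speech model has already produced an object name and an optional color, and that scene objects come with 3D positions in the same frame as the gaze ray. The names `SceneObject`, `angle_to_gaze`, `select_object`, and the 20-degree threshold are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class SceneObject:
    """Hypothetical detected object; fields are assumptions, not from the paper."""
    name: str
    color: str
    position: Vec3  # object centre, same frame as the gaze ray


def angle_to_gaze(origin: Vec3, direction: Vec3, point: Vec3) -> float:
    """Angle in radians between the gaze ray and the vector origin -> point."""
    to_point = tuple(p - o for p, o in zip(point, origin))
    norm_d = math.sqrt(sum(c * c for c in direction))
    norm_p = math.sqrt(sum(c * c for c in to_point))
    if norm_d == 0.0 or norm_p == 0.0:
        raise ValueError("gaze direction and origin-to-point vector must be non-zero")
    cos_a = sum(d * t for d, t in zip(direction, to_point)) / (norm_d * norm_p)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_a)))


def select_object(
    objects: Sequence[SceneObject],
    gaze_origin: Vec3,
    gaze_direction: Vec3,
    name: Optional[str] = None,
    color: Optional[str] = None,
    max_angle_deg: float = 20.0,  # assumed tolerance, not a value from the paper
) -> Optional[SceneObject]:
    """Filter by the verbal parameters, then pick the candidate closest to the gaze ray."""
    candidates = [
        o for o in objects
        if (name is None or o.name == name) and (color is None or o.color == color)
    ]
    best: Optional[SceneObject] = None
    best_angle = math.radians(max_angle_deg)
    for obj in candidates:
        angle = angle_to_gaze(gaze_origin, gaze_direction, obj.position)
        if angle <= best_angle:
            best, best_angle = obj, angle
    return best


# "Pick up the red cup" is ambiguous here: two red cups are in the scene.
scene = [
    SceneObject("cup", "red", (0.3, 0.0, 1.0)),
    SceneObject("cup", "red", (-0.3, 0.0, 1.0)),
    SceneObject("bottle", "blue", (0.0, 0.2, 1.0)),
]
picked = select_object(scene, (0.0, 0.0, 0.0), (0.3, 0.0, 1.0), name="cup", color="red")
print(picked)
```

In this sketch the verbal parameters narrow the candidate set and the gaze ray resolves the remaining ambiguity; returning `None` when no candidate lies within the angular tolerance leaves room for the robot to ask for clarification instead of guessing.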