{"title":"Information Fusion in Visual-Task Inference","authors":"Amin Haji Abolhassani, James J. Clark","doi":"10.1109/CRV.2012.14","DOIUrl":null,"url":null,"abstract":"Eye movement is a rich modality that can provide us with a window into a person's mind. In a typical human-human interaction, we can get information about the behavioral state of the others by examining their eye movements. For instance, when a poker player looks into the eyes of his opponent, he looks for any indication of bluffing by verifying the dynamics of the eye movements. However, the information extracted from the eyes is not the only source of information we get in a human-human interaction and other modalities, such as speech or gesture, help us infer the behavioral state of the others. Most of the time this fusion of information refines our decisions and helps us better infer people's cognitive and behavioral activity based on their actions. In this paper, we develop a probabilistic framework to fuse different sources of information to infer the ongoing task in a visual search activity given the viewer's eye movement data. We propose to use a dynamic programming method called token passing in an eye-typing application to reveal what the subject is typing during a search process by observing his direction of gaze during the execution of the task. Token passing is a computationally simple technique that allows us to fuse higher order constraints in the inference process and build models dynamically so we can have unlimited number of hypotheses. In the experiments we examine the effect of higher order information, in the form of a lexicon dictionary, on the task recognition accuracy.","PeriodicalId":372951,"journal":{"name":"2012 Ninth Conference on Computer and Robot Vision","volume":"101 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 Ninth Conference on Computer and Robot Vision","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CRV.2012.14","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
Eye movement is a rich modality that can provide a window into a person's mind. In a typical human-human interaction, we gather information about the behavioral state of others by examining their eye movements. For instance, when a poker player looks into the eyes of an opponent, he looks for any indication of bluffing in the dynamics of the opponent's eye movements. The eyes, however, are not our only source of information in human-human interaction; other modalities, such as speech and gesture, also help us infer the behavioral state of others. Most of the time, fusing these sources of information refines our decisions and helps us better infer people's cognitive and behavioral activity from their actions. In this paper, we develop a probabilistic framework that fuses different sources of information to infer the ongoing task in a visual search activity from the viewer's eye movement data. We propose to use a dynamic programming method called token passing in an eye-typing application to reveal what the subject is typing during a search process by observing the direction of gaze during the execution of the task. Token passing is a computationally simple technique that lets us fuse higher-order constraints into the inference process and build models dynamically, so the number of hypotheses is unbounded. In the experiments, we examine the effect of higher-order information, in the form of a lexicon, on task recognition accuracy.
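The abstract does not spell out the decoding machinery, but the following is a minimal sketch of the general token-passing idea it names: lexicon-constrained decoding of an eye-typed word from noisy gaze fixations. It assumes a Gaussian observation model around key centers and a simplified QWERTY geometry; the key coordinates, the noise scale SIGMA, the transition log-probabilities, the decode_word helper, and the sample fixations are all illustrative assumptions, not values or APIs from the paper.

```python
import math

# Assumed keyboard geometry: (row, column) centers on a staggered QWERTY grid.
KEY_POS = {}
for r, row in enumerate(["qwertyuiop", "asdfghjkl", "zxcvbnm"]):
    for c, ch in enumerate(row):
        KEY_POS[ch] = (r, c + 0.5 * r)  # lower rows are horizontally offset

SIGMA = 0.7               # assumed std. dev. of gaze position around the intended key
LOG_SELF = math.log(0.5)  # assumed prob. of a token staying on the same letter
LOG_NEXT = math.log(0.5)  # assumed prob. of a token advancing to the next letter

def log_obs(fix, key):
    """Log-likelihood of a fixation (row, col) given the intended key (isotropic Gaussian)."""
    kx, ky = KEY_POS[key]
    d2 = (fix[0] - kx) ** 2 + (fix[1] - ky) ** 2
    return -d2 / (2 * SIGMA ** 2)

def decode_word(fixations, lexicon):
    """Token passing over left-to-right letter models, one model per lexicon word.
    Each letter state has a self-loop, so a letter may absorb several fixations."""
    best_word, best_score = None, -math.inf
    for word in lexicon:
        # tokens[i] = best log-score of a token currently sitting on letter i
        tokens = [-math.inf] * len(word)
        tokens[0] = log_obs(fixations[0], word[0])
        for fix in fixations[1:]:
            new = [-math.inf] * len(word)
            for i, ch in enumerate(word):
                stay = tokens[i] + LOG_SELF
                move = tokens[i - 1] + LOG_NEXT if i > 0 else -math.inf
                new[i] = max(stay, move) + log_obs(fix, ch)
            tokens = new
        if tokens[-1] > best_score:  # the winning token must reach the last letter
            best_word, best_score = word, tokens[-1]
    return best_word, best_score

# Illustrative fixations roughly tracing c-a-a-t on the assumed layout.
fixes = [(2.1, 3.4), (1.2, 0.4), (1.1, 0.6), (0.2, 4.3)]
print(decode_word(fixes, ["cat", "car", "cab", "bat"]))  # -> ('cat', ...)
```

In this sketch each lexicon word is a left-to-right chain of letter states with self-loops, and the best-scoring token that reaches a word's final letter selects the hypothesis; the lexicon thus acts as the higher-order constraint the abstract describes. The paper's framework goes further by chaining word models and growing the network dynamically, which is how token passing avoids enumerating the hypothesis space up front.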