{"title":"Information Fusion in Visual-Task Inference","authors":"Amin Haji Abolhassani, James J. Clark","doi":"10.1109/CRV.2012.14","DOIUrl":null,"url":null,"abstract":"Eye movement is a rich modality that can provide us with a window into a person's mind. In a typical human-human interaction, we can get information about the behavioral state of the others by examining their eye movements. For instance, when a poker player looks into the eyes of his opponent, he looks for any indication of bluffing by verifying the dynamics of the eye movements. However, the information extracted from the eyes is not the only source of information we get in a human-human interaction and other modalities, such as speech or gesture, help us infer the behavioral state of the others. Most of the time this fusion of information refines our decisions and helps us better infer people's cognitive and behavioral activity based on their actions. In this paper, we develop a probabilistic framework to fuse different sources of information to infer the ongoing task in a visual search activity given the viewer's eye movement data. We propose to use a dynamic programming method called token passing in an eye-typing application to reveal what the subject is typing during a search process by observing his direction of gaze during the execution of the task. Token passing is a computationally simple technique that allows us to fuse higher order constraints in the inference process and build models dynamically so we can have unlimited number of hypotheses. In the experiments we examine the effect of higher order information, in the form of a lexicon dictionary, on the task recognition accuracy.","PeriodicalId":372951,"journal":{"name":"2012 Ninth Conference on Computer and Robot Vision","volume":"101 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 Ninth Conference on Computer and Robot Vision","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CRV.2012.14","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
Eye movement is a rich modality that can provide a window into a person's mind. In a typical human-human interaction, we gather information about the behavioral state of others by examining their eye movements. For instance, when a poker player looks into the eyes of an opponent, he looks for any indication of bluffing in the dynamics of the opponent's eye movements. The eyes, however, are not our only source of information in human-human interaction; other modalities, such as speech and gesture, also help us infer the behavioral state of others. Most of the time, fusing these sources of information refines our decisions and helps us better infer people's cognitive and behavioral activity from their actions. In this paper, we develop a probabilistic framework that fuses different sources of information to infer the ongoing task in a visual search activity from the viewer's eye movement data. We propose to use a dynamic programming method called token passing in an eye-typing application to reveal what the subject is typing during a search process by observing the direction of gaze during the execution of the task. Token passing is a computationally simple technique that lets us fuse higher-order constraints into the inference process and build models dynamically, so the number of hypotheses is unbounded. In the experiments, we examine the effect of higher-order information, in the form of a lexicon, on task recognition accuracy.
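The abstract does not spell out the decoding machinery, but the following is a minimal sketch of the general token-passing idea it names: lexicon-constrained decoding of an eye-typed word from noisy gaze fixations. It assumes a Gaussian observation model around key centers and a simplified QWERTY geometry; the key coordinates, the noise scale SIGMA, the transition log-probabilities, the decode_word helper, and the sample fixations are all illustrative assumptions, not values or APIs from the paper.

```python
import math

# Assumed keyboard geometry: (row, column) centers on a staggered QWERTY grid.
KEY_POS = {}
for r, row in enumerate(["qwertyuiop", "asdfghjkl", "zxcvbnm"]):
    for c, ch in enumerate(row):
        KEY_POS[ch] = (r, c + 0.5 * r)  # lower rows are horizontally offset

SIGMA = 0.7               # assumed std. dev. of gaze position around the intended key
LOG_SELF = math.log(0.5)  # assumed prob. of a token staying on the same letter
LOG_NEXT = math.log(0.5)  # assumed prob. of a token advancing to the next letter

def log_obs(fix, key):
    """Log-likelihood of a fixation (row, col) given the intended key (isotropic Gaussian)."""
    kx, ky = KEY_POS[key]
    d2 = (fix[0] - kx) ** 2 + (fix[1] - ky) ** 2
    return -d2 / (2 * SIGMA ** 2)

def decode_word(fixations, lexicon):
    """Token passing over left-to-right letter models, one model per lexicon word.
    Each letter state has a self-loop, so a letter may absorb several fixations."""
    best_word, best_score = None, -math.inf
    for word in lexicon:
        # tokens[i] = best log-score of a token currently sitting on letter i
        tokens = [-math.inf] * len(word)
        tokens[0] = log_obs(fixations[0], word[0])
        for fix in fixations[1:]:
            new = [-math.inf] * len(word)
            for i, ch in enumerate(word):
                stay = tokens[i] + LOG_SELF
                move = tokens[i - 1] + LOG_NEXT if i > 0 else -math.inf
                new[i] = max(stay, move) + log_obs(fix, ch)
            tokens = new
        if tokens[-1] > best_score:  # the winning token must reach the last letter
            best_word, best_score = word, tokens[-1]
    return best_word, best_score

# Illustrative fixations roughly tracing c-a-a-t on the assumed layout.
fixes = [(2.1, 3.4), (1.2, 0.4), (1.1, 0.6), (0.2, 4.3)]
print(decode_word(fixes, ["cat", "car", "cab", "bat"]))  # -> ('cat', ...)
```

In this sketch each lexicon word is a left-to-right chain of letter states with self-loops, and the best-scoring token that reaches a word's final letter selects the hypothesis; the lexicon thus acts as the higher-order constraint the abstract describes. The paper's framework goes further by chaining word models and growing the network dynamically, which is how token passing avoids enumerating the hypothesis space up front.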