Semantically-Based Human Scanpath Estimation with HMMs

2013 IEEE International Conference on Computer Vision, pp. 3232-3239. Pub Date: 2013-12-01. DOI: 10.1109/ICCV.2013.401

Huiying Liu, Dong Xu, Qingming Huang, Wen Li, Min Xu, Stephen Lin


Citations: 39

Abstract

We present a method for estimating human scan paths, which are sequences of gaze shifts that follow visual attention over an image. In this work, scan paths are modeled based on three principal factors that influence human attention, namely low-level feature saliency, spatial position, and semantic content. Low-level feature saliency is formulated as transition probabilities between different image regions based on feature differences. The effect of spatial position on gaze shifts is modeled as a Levy flight with the shifts following a 2D Cauchy distribution. To account for semantic content, we propose to use a Hidden Markov Model (HMM) with a Bag-of-Visual-Words descriptor of image regions. An HMM is well-suited for this purpose in that 1) the hidden states, obtained by unsupervised learning, can represent latent semantic concepts, 2) the prior distribution of the hidden states describes visual attraction to the semantic concepts, and 3) the transition probabilities represent human gaze shift patterns. The proposed method is applied to task-driven viewing processes. Experiments and analysis performed on human eye gaze data verify the effectiveness of this method.
