从哪里和如何到我们所看到的

2013 IEEE International Conference on Computer Vision Pub Date : 2013-12-01 DOI:10.1109/ICCV.2013.83

S. Karthikeyan, V. Jagadeesh, Renuka Shenoy, M. Eckstein, B. S. Manjunath

{"title":"从哪里和如何到我们所看到的","authors":"S. Karthikeyan, V. Jagadeesh, Renuka Shenoy, M. Eckstein, B. S. Manjunath","doi":"10.1109/ICCV.2013.83","DOIUrl":null,"url":null,"abstract":"Eye movement studies have confirmed that overt attention is highly biased towards faces and text regions in images. In this paper we explore a novel problem of predicting face and text regions in images using eye tracking data from multiple subjects. The problem is challenging as we aim to predict the semantics (face/text/background) only from eye tracking data without utilizing any image information. The proposed algorithm spatially clusters eye tracking data obtained in an image into different coherent groups and subsequently models the likelihood of the clusters containing faces and text using a fully connected Markov Random Field (MRF). Given the eye tracking data from a test image, it predicts potential face/head (humans, dogs and cats) and text locations reliably. Furthermore, the approach can be used to select regions of interest for further analysis by object detectors for faces and text. The hybrid eye position/object detector approach achieves better detection performance and reduced computation time compared to using only the object detection algorithm. We also present a new eye tracking dataset on 300 images selected from ICDAR, Street-view, Flickr and Oxford-IIIT Pet Dataset from 15 subjects.","PeriodicalId":6351,"journal":{"name":"2013 IEEE International Conference on Computer Vision","volume":"126 1","pages":"625-632"},"PeriodicalIF":0.0000,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"32","resultStr":"{\"title\":\"From Where and How to What We See\",\"authors\":\"S. Karthikeyan, V. Jagadeesh, Renuka Shenoy, M. Eckstein, B. S. Manjunath\",\"doi\":\"10.1109/ICCV.2013.83\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Eye movement studies have confirmed that overt attention is highly biased towards faces and text regions in images. In this paper we explore a novel problem of predicting face and text regions in images using eye tracking data from multiple subjects. The problem is challenging as we aim to predict the semantics (face/text/background) only from eye tracking data without utilizing any image information. The proposed algorithm spatially clusters eye tracking data obtained in an image into different coherent groups and subsequently models the likelihood of the clusters containing faces and text using a fully connected Markov Random Field (MRF). Given the eye tracking data from a test image, it predicts potential face/head (humans, dogs and cats) and text locations reliably. Furthermore, the approach can be used to select regions of interest for further analysis by object detectors for faces and text. The hybrid eye position/object detector approach achieves better detection performance and reduced computation time compared to using only the object detection algorithm. We also present a new eye tracking dataset on 300 images selected from ICDAR, Street-view, Flickr and Oxford-IIIT Pet Dataset from 15 subjects.\",\"PeriodicalId\":6351,\"journal\":{\"name\":\"2013 IEEE International Conference on Computer Vision\",\"volume\":\"126 1\",\"pages\":\"625-632\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"32\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2013 IEEE International Conference on Computer Vision\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICCV.2013.83\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 IEEE International Conference on Computer Vision","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCV.2013.83","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 32

摘要

眼球运动研究已经证实，明显的注意力高度偏向于图像中的人脸和文字区域。在本文中，我们探索了一个新的问题，即使用来自多个受试者的眼动追踪数据来预测图像中的人脸和文本区域。这个问题是具有挑战性的，因为我们的目标是仅从眼动追踪数据中预测语义(人脸/文本/背景)，而不使用任何图像信息。该算法将图像中获得的眼动追踪数据在空间上聚类为不同的连贯组，然后使用完全连接的马尔可夫随机场(MRF)对包含人脸和文本的聚类进行可能性建模。根据测试图像的眼动追踪数据，它可以可靠地预测潜在的面部/头部(人类、狗和猫)和文本位置。此外，该方法可用于选择感兴趣的区域，以便对象检测器对人脸和文本进行进一步分析。与仅使用目标检测算法相比，混合眼位/目标检测方法具有更好的检测性能和更少的计算时间。我们还提出了一个新的眼动追踪数据集，该数据集从ICDAR、街景、Flickr和Oxford-IIIT Pet数据集中选择了300幅图像，来自15个主题。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

From Where and How to What We See

Eye movement studies have confirmed that overt attention is highly biased towards faces and text regions in images. In this paper we explore a novel problem of predicting face and text regions in images using eye tracking data from multiple subjects. The problem is challenging as we aim to predict the semantics (face/text/background) only from eye tracking data without utilizing any image information. The proposed algorithm spatially clusters eye tracking data obtained in an image into different coherent groups and subsequently models the likelihood of the clusters containing faces and text using a fully connected Markov Random Field (MRF). Given the eye tracking data from a test image, it predicts potential face/head (humans, dogs and cats) and text locations reliably. Furthermore, the approach can be used to select regions of interest for further analysis by object detectors for faces and text. The hybrid eye position/object detector approach achieves better detection performance and reduced computation time compared to using only the object detection algorithm. We also present a new eye tracking dataset on 300 images selected from ICDAR, Street-view, Flickr and Oxford-IIIT Pet Dataset from 15 subjects.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2013 IEEE International Conference on Computer Vision

自引率

0.00%

发文量

期刊最新文献

PixelTrack: A Fast Adaptive Algorithm for Tracking Non-rigid Objects A General Dense Image Matching Framework Combining Direct and Feature-Based Costs Latent Space Sparse Subspace Clustering Non-convex P-Norm Projection for Robust Sparsity Hierarchical Joint Max-Margin Learning of Mid and Top Level Representations for Visual Recognition