How is Visual Attention Influenced by Text Guidance? Database and Model

Yinan Sun;Xiongkuo Min;Huiyu Duan;Guangtao Zhai
{"title":"How is Visual Attention Influenced by Text Guidance? Database and Model","authors":"Yinan Sun;Xiongkuo Min;Huiyu Duan;Guangtao Zhai","doi":"10.1109/TIP.2024.3461956","DOIUrl":null,"url":null,"abstract":"The analysis and prediction of visual attention have long been crucial tasks in the fields of computer vision and image processing. In practical applications, images are generally accompanied by various text descriptions, however, few studies have explored the influence of text descriptions on visual attention, let alone developed visual saliency prediction models considering text guidance. In this paper, we conduct a comprehensive study on text-guided image saliency (TIS) from both subjective and objective perspectives. Specifically, we construct a TIS database named SJTU-TIS, which includes 1200 text-image pairs and the corresponding collected eye-tracking data. Based on the established SJTU-TIS database, we analyze the influence of various text descriptions on visual attention. Then, to facilitate the development of saliency prediction models considering text influence, we construct a benchmark for the established SJTU-TIS database using state-of-the-art saliency models. Finally, considering the effect of text descriptions on visual attention, while most existing saliency models ignore this impact, we further propose a text-guided saliency (TGSal) prediction model, which extracts and integrates both image features and text features to predict the image saliency under various text-description conditions. Our proposed model significantly outperforms the state-of-the-art saliency models on both the SJTU-TIS database and the pure image saliency databases in terms of various evaluation metrics. The SJTU-TIS database and the code of the proposed TGSal model will be released at: \n<uri>https://github.com/IntMeGroup/TGSal</uri>\n.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"5392-5407"},"PeriodicalIF":0.0000,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10689368/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

The analysis and prediction of visual attention have long been crucial tasks in the fields of computer vision and image processing. In practical applications, images are generally accompanied by various text descriptions, however, few studies have explored the influence of text descriptions on visual attention, let alone developed visual saliency prediction models considering text guidance. In this paper, we conduct a comprehensive study on text-guided image saliency (TIS) from both subjective and objective perspectives. Specifically, we construct a TIS database named SJTU-TIS, which includes 1200 text-image pairs and the corresponding collected eye-tracking data. Based on the established SJTU-TIS database, we analyze the influence of various text descriptions on visual attention. Then, to facilitate the development of saliency prediction models considering text influence, we construct a benchmark for the established SJTU-TIS database using state-of-the-art saliency models. Finally, considering the effect of text descriptions on visual attention, while most existing saliency models ignore this impact, we further propose a text-guided saliency (TGSal) prediction model, which extracts and integrates both image features and text features to predict the image saliency under various text-description conditions. Our proposed model significantly outperforms the state-of-the-art saliency models on both the SJTU-TIS database and the pure image saliency databases in terms of various evaluation metrics. The SJTU-TIS database and the code of the proposed TGSal model will be released at: https://github.com/IntMeGroup/TGSal .
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
文字引导如何影响视觉注意力?数据库和模型。
长期以来,视觉注意力的分析和预测一直是计算机视觉和图像处理领域的重要任务。在实际应用中,图像一般都会伴有各种文字描述,但很少有研究探讨文字描述对视觉注意力的影响,更不用说开发考虑文字引导的视觉突出预测模型了。在本文中,我们从主观和客观两个角度对文本引导的图像显著性(TIS)进行了全面研究。具体来说,我们构建了一个名为 SJTU-TIS 的 TIS 数据库,其中包括 1200 个文本-图像对和相应的眼动数据。基于已建立的 SJTU-TIS 数据库,我们分析了各种文字描述对视觉注意力的影响。然后,为了促进考虑文本影响的显著性预测模型的开发,我们使用最先进的显著性模型为已建立的 SJTU-TIS 数据库构建了一个基准。最后,考虑到文本描述对视觉注意力的影响,而大多数现有的显著性模型都忽略了这一影响,我们进一步提出了文本引导的显著性(TGSal)预测模型,该模型提取并整合了图像特征和文本特征,以预测各种文本描述条件下的图像显著性。在 SJTU-TIS 数据库和纯图像突出度数据库上,我们提出的模型在各种评价指标上都明显优于最先进的突出度模型。SJTU-TIS 数据库和 TGSal 模型的代码将在以下网站发布:https://github.com/IntMeGroup/TGSal。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
EvRepSL: Event-Stream Representation via Self-Supervised Learning for Event-Based Vision DeepDuoHDR: A Low Complexity Two Exposure Algorithm for HDR Deghosting on Mobile Devices AnySR: Realizing Image Super-Resolution as Any-Scale, Any-Resource Enhanced Multispectral Band-to-Band Registration Using Co-Occurrence Scale Space and Spatial Confined RANSAC Guided Segmented Affine Transformation Pro2Diff: Proposal Propagation for Multi-Object Tracking via the Diffusion Model
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1