How is Visual Attention Influenced by Text Guidance? Database and Model

IEEE Transactions on Image Processing (a publication of the IEEE Signal Processing Society). Published: 2024-09-23. DOI: 10.1109/TIP.2024.3461956

Yinan Sun, Xiongkuo Min, Huiyu Duan, Guangtao Zhai

Volume 33, pages 5392-5407. IEEE Xplore: https://ieeexplore.ieee.org/document/10689368/

Citations: 0

Abstract

The analysis and prediction of visual attention have long been crucial tasks in computer vision and image processing. In practical applications, images are generally accompanied by various text descriptions. However, few studies have explored how text descriptions influence visual attention, and fewer still have developed visual saliency prediction models that take text guidance into account. In this paper, we conduct a comprehensive study of text-guided image saliency (TIS) from both subjective and objective perspectives. Specifically, we construct a TIS database named SJTU-TIS, which includes 1200 text-image pairs and the corresponding eye-tracking data. Using the SJTU-TIS database, we analyze the influence of various text descriptions on visual attention. Then, to facilitate the development of saliency prediction models that account for text influence, we build a benchmark on SJTU-TIS using state-of-the-art saliency models. Finally, because text descriptions affect visual attention but most existing saliency models ignore this effect, we propose a text-guided saliency (TGSal) prediction model. TGSal extracts and integrates both image features and text features to predict image saliency under various text-description conditions. Our proposed model significantly outperforms state-of-the-art saliency models on both the SJTU-TIS database and pure image saliency databases across various evaluation metrics. The SJTU-TIS database and the code of the proposed TGSal model will be released at https://github.com/IntMeGroup/TGSal.
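The abstract reports that TGSal outperforms other models across "various evaluation metrics" but does not name them. As an illustration only, the sketch below implements two metrics that saliency benchmarks commonly use, written in plain Python. The first is Pearson's linear correlation coefficient (CC), which compares a predicted map against a ground-truth fixation density map. The second is Normalized Scanpath Saliency (NSS), the mean z-scored prediction at fixated pixels. This abstract does not state whether the paper uses exactly these metrics or how it preprocesses the maps, so treat both as assumptions. The helper names `cc`, `nss`, `_flatten` and `_mean_std` are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical representation: a 2-D saliency map stored as a list of rows.
Map = List[List[float]]


def _flatten(m: Map) -> List[float]:
    """Flatten a row-major 2-D map into a single list of pixel values."""
    return [v for row in m for v in row]


def _mean_std(values: List[float]) -> Tuple[float, float]:
    """Return the mean and *population* standard deviation of the values."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(var)


def cc(pred: Map, gt_density: Map) -> float:
    """Pearson correlation between a predicted map and a fixation density map."""
    p, g = _flatten(pred), _flatten(gt_density)
    if len(p) != len(g):
        raise ValueError("maps must have the same number of pixels")
    mp, sp = _mean_std(p)
    mg, sg = _mean_std(g)
    if sp == 0 or sg == 0:
        return 0.0  # a constant map carries no correlation information
    cov = sum((a - mp) * (b - mg) for a, b in zip(p, g)) / len(p)
    return cov / (sp * sg)


def nss(pred: Map, fixations: Map) -> float:
    """Normalized Scanpath Saliency.

    `fixations` is a binary map (nonzero = fixated pixel). The prediction is
    z-scored over all pixels, then averaged over the fixated pixels only.
    """
    p, f = _flatten(pred), _flatten(fixations)
    if len(p) != len(f):
        raise ValueError("maps must have the same number of pixels")
    mp, sp = _mean_std(p)
    if sp == 0:
        return 0.0  # a constant prediction cannot single out fixations
    fixated = [(v - mp) / sp for v, flag in zip(p, f) if flag > 0]
    if not fixated:
        raise ValueError("fixation map contains no fixations")
    return sum(fixated) / len(fixated)
```

For example, `cc(pred, pred)` returns 1.0 up to floating-point error, and a prediction that peaks exactly at the single fixated pixel yields a large positive NSS. This sketch uses the population standard deviation. Some reference implementations use the sample standard deviation, which leaves CC unchanged but shifts NSS slightly.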
