Visual sentiment analysis with semantic correlation enhancement

IF 5 | Zone 2 (Computer Science) | JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Complex & Intelligent Systems | Pub Date: 2023-12-19 | DOI: 10.1007/s40747-023-01296-w


Citations: 0

Abstract

Visual sentiment analysis is in great demand because it provides a computational way to recognize sentiment in the abundant visual content on social media sites. Most existing methods use CNNs to extract various visual attributes for image sentiment prediction, but they fail to comprehensively consider the correlations among visual components and are consequently limited by the receptive field of convolutional layers. In this work, we propose the visual semantic correlation network (VSCNet), a Transformer-based visual sentiment prediction model. Specifically, global visual features are captured by an extended attention network built by stacking a purpose-designed, Transformer-like extended attention mechanism. An off-the-shelf object query tool identifies local candidates for potential affective regions, which filters out redundant and noisy visual proposals. All candidates considered affective are embedded into a computable semantic space. Finally, a fusion strategy integrates the semantic representations and visual features for sentiment analysis. Extensive experiments show that our method outperforms previous studies on five annotated public image sentiment datasets without any training tricks. More specifically, it achieves 1.8% higher accuracy on the FI benchmark than other state-of-the-art methods.
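The abstract names three stages (global attention features, filtering of affective region candidates, and fusion) without giving implementation details. The toy sketch below only illustrates that general pipeline shape and is not the VSCNet implementation. It assumes plain scaled dot-product self-attention with identity projections in place of the paper's extended attention, a hypothetical confidence threshold `min_score` for filtering proposals, a hypothetical lookup table `LABEL_EMBEDDINGS` standing in for the semantic space, and simple concatenation as the fusion step.

```python
import math

# Hypothetical semantic space: label -> embedding (illustrative values only).
LABEL_EMBEDDINGS = {"smile": [1.0, 0.0], "storm": [0.0, 1.0]}


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    Every output token is a weighted mix of all input tokens, so each
    position sees the whole image rather than a local receptive field.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out


def mean_pool(vectors, dim):
    """Average a list of vectors; return zeros if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def filter_proposals(proposals, min_score=0.5):
    """Keep (label, score) proposals whose score reaches the assumed threshold."""
    return [p for p in proposals if p[1] >= min_score]


def extract_features(patches, proposals, min_score=0.5):
    """Concatenate pooled global features with pooled semantic embeddings."""
    global_feat = mean_pool(self_attention(patches), len(patches[0]))
    kept = filter_proposals(proposals, min_score)
    embedded = [LABEL_EMBEDDINGS[label] for label, _ in kept if label in LABEL_EMBEDDINGS]
    semantic_feat = mean_pool(embedded, 2)
    return global_feat + semantic_feat  # list concatenation as a simple fusion


if __name__ == "__main__":
    patches = [[1.0, 0.0], [0.0, 1.0]]            # two toy patch features
    proposals = [("smile", 0.9), ("storm", 0.2)]  # toy detector output
    print(extract_features(patches, proposals))
```

In a real model the fused vector would feed a trained classifier, and the attention, detector, and embedding components would be learned rather than hand-written as here.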




Source journal

Complex & Intelligent Systems (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

CiteScore: 9.60

Self-citation rate: 10.30%

Articles published: 297

Journal description: Complex & Intelligent Systems aims to provide a forum for presenting and discussing novel approaches, tools and techniques meant for attaining a cross-fertilization between the broad fields of complex systems, computational simulation, and intelligent analytics and visualization. The transdisciplinary research that the journal focuses on will expand the boundaries of our understanding by investigating the principles and processes that underlie many of the most profound problems facing society today.