The increasing proliferation of harmful memes has a serious negative impact on society, making the detection of such memes a pressing challenge. Prior research has predominantly concentrated on the modal and semantic attributes of memes while neglecting cross-modal interactions and fine-grained semantic information. Although some approaches incorporate large language models, these models often avoid engaging with harmful content due to ethical constraints. To address these issues, we propose a novel sentiment-aware cross-modal semantic interaction detector (SSID), which explores the deeper implications of memes along three principal dimensions: semantic extraction, modal interaction, and sentiment polarity assessment. In the semantic extraction module, Visual Question Answering is used to incorporate detailed knowledge and descriptions. In the modal interaction module, the positional relationships between meme objects and texts are investigated, and a distance-based attentional multimodal detector is established. In the sentiment polarity module, the sentiment polarity of the meme text is assessed. These components are integrated into a cohesive joint detection system. Extensive experiments on three benchmark datasets demonstrate that SSID significantly outperforms state-of-the-art baselines, improving detection accuracy while exhibiting strong robustness.
