TRICAN: Multi-Modal Hateful Memes Detection with Triplet-Relation Information Cross-Attention Network

Xiaolin Liang, Yajuan Huang, Wen Liu, He Zhu, Zhao Liang, Libo Chen
{"title":"基于三重关系信息交叉注意网络的多模态仇恨模因检测","authors":"Xiaolin Liang, Yajuan Huang, Wen Liu, He Zhu, Zhao Liang, Libo Chen","doi":"10.1109/IJCNN55064.2022.9892164","DOIUrl":null,"url":null,"abstract":"Memes are spreading on social networking. Most are created to be humorous, while some become hateful with the combination of images and words, conveying negative information to people. The hateful memes detection poses an interesting multimodal fusion problem, unlike traditional multi-modal tasks, the majority of memos have images and text that are only weakly consistent or even uncorrelated, so various modalities contained in the data play an important role in predicting its results. In this paper, we attempt to work on the Facebook Meme challenge, which solves the binary classification task of predicting a meme's hatefulness or not. We extract triplet-relation information from origin OCR text features, image content features and image caption features and proposed a novel cross-attention network to address this task. TRICAN leverages object detection and image caption models to explore visual modalities to obtain “actual captions” and then combines combine origin OCR text with the multi-modal representation to perform hateful memes detection. These meme-related features are then reconstructed and fused into one feature vector for prediction. We have performed extensively experimental on multi-modal memory datasets. Experimental results demonstrate the effectiveness of TRICAN and the usefulness of triplet-relation information.","PeriodicalId":106974,"journal":{"name":"2022 International Joint Conference on Neural Networks (IJCNN)","volume":"41 4","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"TRICAN: Multi-Modal Hateful Memes Detection with Triplet-Relation Information Cross-Attention Network\",\"authors\":\"Xiaolin Liang, Yajuan Huang, Wen Liu, He Zhu, Zhao Liang, Libo Chen\",\"doi\":\"10.1109/IJCNN55064.2022.9892164\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Memes are spreading on social networking. Most are created to be humorous, while some become hateful with the combination of images and words, conveying negative information to people. The hateful memes detection poses an interesting multimodal fusion problem, unlike traditional multi-modal tasks, the majority of memos have images and text that are only weakly consistent or even uncorrelated, so various modalities contained in the data play an important role in predicting its results. In this paper, we attempt to work on the Facebook Meme challenge, which solves the binary classification task of predicting a meme's hatefulness or not. We extract triplet-relation information from origin OCR text features, image content features and image caption features and proposed a novel cross-attention network to address this task. TRICAN leverages object detection and image caption models to explore visual modalities to obtain “actual captions” and then combines combine origin OCR text with the multi-modal representation to perform hateful memes detection. These meme-related features are then reconstructed and fused into one feature vector for prediction. We have performed extensively experimental on multi-modal memory datasets. 
Experimental results demonstrate the effectiveness of TRICAN and the usefulness of triplet-relation information.\",\"PeriodicalId\":106974,\"journal\":{\"name\":\"2022 International Joint Conference on Neural Networks (IJCNN)\",\"volume\":\"41 4\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-07-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 International Joint Conference on Neural Networks (IJCNN)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IJCNN55064.2022.9892164\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 International Joint Conference on Neural Networks (IJCNN)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IJCNN55064.2022.9892164","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Memes are spreading across social networks. Most are created to be humorous, but some become hateful through the combination of images and words, conveying negative information to people. Hateful memes detection poses an interesting multimodal fusion problem: unlike traditional multi-modal tasks, the majority of memes have images and text that are only weakly consistent or even uncorrelated, so the various modalities contained in the data play an important role in predicting the result. In this paper, we tackle the Facebook Hateful Memes challenge, a binary classification task of predicting whether a meme is hateful or not. We extract triplet-relation information from original OCR text features, image content features, and image caption features, and propose a novel cross-attention network, TRICAN, to address this task. TRICAN leverages object detection and image captioning models to explore the visual modality and obtain "actual captions", and then combines the original OCR text with the multi-modal representation to perform hateful memes detection. These meme-related features are then reconstructed and fused into one feature vector for prediction. We have performed extensive experiments on multi-modal meme datasets. Experimental results demonstrate the effectiveness of TRICAN and the usefulness of triplet-relation information.
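The fusion step the abstract describes can be made concrete with a short sketch. Below is a minimal, hypothetical PyTorch rendering of triplet-relation cross-attention over the three feature streams (OCR text, image content, generated caption); the feature dimension, the six pairwise attention modules, and the mean-pooling readout are illustrative assumptions, not the architecture published in the paper.

```python
# Hypothetical sketch of triplet-relation cross-attention fusion.
# Dimensions, attention layout, and pooling are illustrative assumptions,
# not the authors' published TRICAN implementation.
import torch
import torch.nn as nn


class TripletCrossAttentionFusion(nn.Module):
    """Fuse OCR-text, image-content, and image-caption features.

    Each modality attends over the other two (cross-attention), and the
    attended representations are concatenated into a single vector for
    binary hateful/non-hateful classification.
    """

    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        # One cross-attention module per (query, key/value) modality pair.
        self.attn = nn.ModuleDict({
            name: nn.MultiheadAttention(dim, heads, batch_first=True)
            for name in ["text2img", "text2cap", "img2text", "img2cap",
                         "cap2text", "cap2img"]
        })
        self.classifier = nn.Sequential(
            nn.Linear(6 * dim, dim),
            nn.ReLU(),
            nn.Linear(dim, 1),  # logit for "hateful"
        )

    def forward(self, text, image, caption):
        # text:    (B, T_t, D) OCR-text token features
        # image:   (B, T_i, D) region features from object detection
        # caption: (B, T_c, D) generated-caption token features
        pairs = {
            "text2img": (text, image), "text2cap": (text, caption),
            "img2text": (image, text), "img2cap": (image, caption),
            "cap2text": (caption, text), "cap2img": (caption, image),
        }
        pooled = []
        for name, (q, kv) in pairs.items():
            out, _ = self.attn[name](q, kv, kv)  # query attends over the other modality
            pooled.append(out.mean(dim=1))       # mean-pool tokens to one vector
        fused = torch.cat(pooled, dim=-1)        # (B, 6 * D) fused representation
        return self.classifier(fused).squeeze(-1)  # (B,) hatefulness logits


if __name__ == "__main__":
    model = TripletCrossAttentionFusion(dim=256)
    logits = model(torch.randn(2, 32, 256),   # OCR text tokens
                   torch.randn(2, 36, 256),   # image regions
                   torch.randn(2, 20, 256))   # caption tokens
    print(logits.shape)  # torch.Size([2])
```

Letting each modality attend over the other two is one natural reading of "triplet-relation information": every pairwise relation among the three streams contributes its own attended view before the final fusion.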