SCATE: Shared Cross Attention Transformer Encoders for Multimodal Fake News Detection

Tanmay Sachan, Nikhil Pinnaparaju, Manish Gupta, Vasudeva Varma
Published in: Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
Publication date: 2021-11-08
DOI: 10.1145/3487351.3490965 (https://doi.org/10.1145/3487351.3490965)
Citations: 6

Abstract

Social media platforms have democratized the publication process, resulting in easy and viral propagation of information. Misinformation is often accompanied by misleading or doctored images that circulate quickly across the internet and reach many unsuspecting users. Several manual and automated efforts have been undertaken to solve this critical problem. Manual efforts cannot keep up with the rate at which this content is churned out, and many automated approaches rely only on concatenating the image and text representations, thereby failing to build effective crossmodal embeddings. Such architectures fail in many cases because neither the text nor the image needs to be false on its own for the corresponding (text, image) pair to be misinformation. While some recent work uses attention techniques to compute a crossmodal representation from pretrained text and image embeddings, we show a more effective way of using such pretrained embeddings to build richer representations that can be classified better. This involves several challenges: how to handle text variations on Twitter and Weibo, how to encode the image information, and how to leverage the text and image encodings together effectively. Our architecture, SCATE (Shared Cross Attention Transformer Encoders), uses deep convolutional neural networks and transformer-based methods to encode image and text information, with crossmodal attention and shared layers for the two modalities. Our experiments on three popular benchmark datasets (Twitter, WeiboA and WeiboB) show that our proposed methods outperform the state-of-the-art methods by approximately three percentage points on all three datasets.
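The abstract names crossmodal attention and shared layers but gives no layer-level detail, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that text tokens attend over image regions and image regions attend over text tokens, and that both directions reuse a single set of projection weights. The feature sizes, the random stand-in embeddings, the mean pooling, and every function name here are hypothetical choices made for illustration.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context, w_q, w_k, w_v):
    """Scaled dot-product attention in which `queries` attend over `context`."""
    q = matmul(queries, w_q)
    k = matmul(context, w_k)
    v = matmul(context, w_v)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    scores = matmul(q, k_t)
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights


def random_matrix(rows, cols, rng):
    """Random stand-in for trained weights or pretrained features."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def mean_pool(rows):
    """Average a sequence of vectors into one vector."""
    return [sum(col) / len(rows) for col in zip(*rows)]


rng = random.Random(0)
d = 4                             # assumed feature size
text = random_matrix(3, d, rng)   # stand-in for 3 pretrained text token embeddings
image = random_matrix(5, d, rng)  # stand-in for 5 CNN image region features

# Assumption: one set of projections is shared by both attention directions.
w_q, w_k, w_v = (random_matrix(d, d, rng) for _ in range(3))

text_attended, text_weights = cross_attention(text, image, w_q, w_k, w_v)
image_attended, image_weights = cross_attention(image, text, w_q, w_k, w_v)

# Assumed fusion step: pool each attended sequence and concatenate the results.
fused = mean_pool(text_attended) + mean_pool(image_attended)
```

Plain concatenation would join the two encodings without either one seeing the other. Here each text token is instead rewritten as a weighted mix of image features and each image region as a weighted mix of text features before fusion, which is the kind of interaction that could flag a (text, image) pair whose parts are individually plausible. A trained model would replace the random matrices with learned weights and feed `fused` to a classifier.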