{"title":"SCATE: shared cross attention transformer encoders for multimodal fake news detection","authors":"Tanmay Sachan, Nikhil Pinnaparaju, Manish Gupta, Vasudeva Varma","doi":"10.1145/3487351.3490965","journal":{"name":"Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining","volume":"20 1","pages":"0"},"publicationDate":"2021-11-08","publicationTypes":"Journal Article","isOpenAccess":false,"citationCount":"6","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.1145/3487351.3490965"}
Social media platforms have democratized the publication process, resulting in easy and viral propagation of information. Oftentimes, misinformation is accompanied by misleading or doctored images that quickly circulate across the internet and reach many unsuspecting users. Several manual as well as automated efforts have been undertaken in the past to solve this critical problem. While manual efforts cannot keep up with the rate at which this content is churned out, many automated approaches only concatenate the image and text representations, thereby failing to build effective crossmodal embeddings. Such architectures fail in many cases because neither the text nor the image needs to be false on its own for the corresponding (text, image) pair to be misinformation. While some recent work attempts to use attention techniques to compute a crossmodal representation from pretrained text and image embeddings, we show a more effective way of using such pretrained embeddings to build richer representations that can be classified better. This involves several challenges, such as how to handle text variations on Twitter and Weibo, how to encode the image information, and how to leverage the text and image encodings together effectively. Our architecture, SCATE (Shared Cross Attention Transformer Encoders), leverages deep convolutional neural networks and transformer-based methods to encode image and text information, using crossmodal attention and shared layers for the two modalities. Our experiments with three popular benchmark datasets (Twitter, WeiboA and WeiboB) show that our proposed methods outperform the state-of-the-art methods by approximately three percentage points on all three datasets.
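The general idea of crossmodal attention mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the random vectors are hypothetical stand-ins for pretrained text and image embeddings, the dimensions are arbitrary, and reusing one set of projection weights for both attention directions is only one possible reading of "shared layers". All function and variable names are assumptions made for illustration.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, context, w_q, w_k, w_v):
    """Scaled dot-product attention: each query row attends over the context rows."""
    q = matmul(queries, w_q)
    k = matmul(context, w_k)
    v = matmul(context, w_v)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    scores = matmul(q, k_t)
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights

def random_matrix(rows, cols, rng):
    """Matrix of uniform random values in [-1, 1] (placeholder for learned values)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def mean_pool(rows):
    """Average the rows of a matrix into a single vector."""
    return [sum(col) / len(rows) for col in zip(*rows)]

rng = random.Random(0)
d = 4  # arbitrary embedding size for this sketch

# Hypothetical stand-ins for pretrained embeddings.
text_tokens = random_matrix(5, d, rng)    # 5 text tokens
image_regions = random_matrix(3, d, rng)  # 3 image regions

# Assumption: one set of projection weights shared by both directions.
w_q, w_k, w_v = (random_matrix(d, d, rng) for _ in range(3))

# Text attends over image regions, and image regions attend over text.
text_attended, text_weights = cross_attention(text_tokens, image_regions, w_q, w_k, w_v)
image_attended, image_weights = cross_attention(image_regions, text_tokens, w_q, w_k, w_v)

# Pool each attended sequence and join the two vectors for a downstream classifier (not shown).
fused = mean_pool(text_attended) + mean_pool(image_attended)
```

Unlike plain concatenation of the two unimodal embeddings, each pooled vector here already depends on both modalities, because every text token is re-expressed as a weighted mix of image regions and vice versa.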