Title: Multi-Modal Sarcasm Detection with Prompt-Tuning
Authors: Daijun Ding, Hutchin Huang, Bowen Zhang, Cheng Peng, Yangyang Li, Xianghua Fu, Liwen Jing
Venue: 2022 6th Asian Conference on Artificial Intelligence Technology (ACAIT)
Publication date: 2022-12-09
DOI: https://doi.org/10.1109/ACAIT56212.2022.10137937
Citations: 0
Abstract
Sarcasm is a meaningful and effective form of expression that people often use to convey sentiments contrary to their literal meaning, and such expressions are common on social media platforms. Compared with traditional text-only sarcasm detection, multi-modal sarcasm detection has proven more effective for social-network content that combines multiple forms of communication. In this work, a prompt-tuning method is proposed for multi-modal sarcasm detection (Pmt-MmSD). Specifically, to model incongruity within the text modality, we first build a prompt-PLM network. Second, to model text-image incongruity, an inter-modality attention network (ImAN) is designed based on the self-attention mechanism. In addition, we utilize the pre-trained Vision Transformer (ViT) to process the image modality. Extensive experiments demonstrate the effectiveness of the proposed Pmt-MmSD model for multi-modal sarcasm detection, which significantly outperforms state-of-the-art methods.
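The abstract does not give the ImAN equations, but the core idea of attention across modalities can be illustrated with a minimal sketch: text token features act as queries and attend over image patch features (such as ViT patch embeddings) via scaled dot-product attention. This is a simplified, single-head illustration under assumed shapes, not the authors' exact ImAN architecture; the function and variable names here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def inter_modality_attention(text_feats, image_feats):
    """Single-head scaled dot-product attention: text tokens (queries)
    attend over image patches (keys/values), yielding one image-aware
    feature per text token. Shapes: (T, d) and (P, d) -> (T, d)."""
    d = text_feats.shape[-1]
    scores = text_feats @ image_feats.T / np.sqrt(d)   # (T, P) similarity
    weights = softmax(scores, axis=-1)                  # rows sum to 1
    return weights @ image_feats                        # (T, d) fused features

# Toy example: 4 text tokens, 9 image patches, hidden size 8
rng = np.random.default_rng(0)
text = rng.standard_normal((4, 8))    # stand-in for prompt-PLM token features
image = rng.standard_normal((9, 8))   # stand-in for ViT patch embeddings
fused = inter_modality_attention(text, image)
print(fused.shape)  # (4, 8)
```

The fused text-image features would then feed a classifier that scores text-image incongruity; in practice the projections to queries, keys, and values are learned, and multiple heads are used.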