Attention-free based dual-encoder mechanism for Aspect-based Multimodal Sentiment Recognition
Pankaj Gupta, Ananya Pandey, Ajeet Kumar, D. Vishwakarma
2023 International Conference in Advances in Power, Signal, and Information Technology (APSIT)
Published: 2023-06-09
DOI: https://doi.org/10.1109/APSIT58554.2023.10201711
Citations: 0
Abstract
Multimodal aspect-based sentiment recognition (MABSR) is a recently developed sentiment recognition task that aims to assess the sentiment expressed in text-image pairs, generally by extracting polarity terms from the pairs. Both pipeline approaches and unified transformer-based techniques, which rely solely on cross-attention, have been widely used in recent work. However, these approaches do not explicitly and reliably model the alignment between text and image. Supervised fine-tuning of such transformers for MABSR also still requires a minimum number of aligned image-text pairs. Motivated by this observation and inspired by various attention-only mechanisms, we analyze MABSR and propose an attention-free, encoder-based transformer architecture. The architecture uses dual attention-free backbone encoders with cross-modal symmetry. To improve cross-modal performance, we introduce two new subtasks: aspect-only extraction and polarity feature representation alignment. These subtasks encourage both encoders to produce more precise representations of the multiple modalities.
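The abstract does not specify the encoders or the alignment objective, so the following is only an illustrative sketch of what aligning the feature representations of two encoders could look like. It assumes mean pooling as a stand-in for an attention-free token mixer and a cosine-distance loss as the alignment term. Both choices, and all function names, are hypothetical and not taken from the paper.

```python
import math


def mean_pool(vectors):
    """Average a list of equal-length vectors element-wise.

    Assumption: used here as a simple attention-free way to summarise
    token (text) or patch (image) features; the paper's mixer may differ.
    """
    count = len(vectors)
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / count for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def alignment_loss(text_features, image_features):
    """Hypothetical alignment term: 1 - cosine similarity of pooled features.

    The loss is 0 when the pooled text and image representations point in
    the same direction and grows as they diverge.
    """
    pooled_text = mean_pool(text_features)
    pooled_image = mean_pool(image_features)
    return 1.0 - cosine_similarity(pooled_text, pooled_image)


if __name__ == "__main__":
    # Toy features standing in for the outputs of a text and an image encoder.
    text_features = [[1.0, 0.0], [3.0, 0.0]]
    image_features = [[0.5, 0.0]]
    print(alignment_loss(text_features, image_features))
```

In a training setup, a term of this kind would typically be added to the main sentiment loss so that both encoders are pushed toward a shared representation space. How the paper weights or formulates its subtasks is not stated in the abstract.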