{"title":"多模态多转弯对话姿态检测:挑战数据集和有效模型","authors":"Fuqiang Niu, Zebang Cheng, Xianghua Fu, Xiaojiang Peng, Genan Dai, Yin Chen, Hu Huang, Bowen Zhang","doi":"arxiv-2409.00597","DOIUrl":null,"url":null,"abstract":"Stance detection, which aims to identify public opinion towards specific\ntargets using social media data, is an important yet challenging task. With the\nproliferation of diverse multimodal social media content including text, and\nimages multimodal stance detection (MSD) has become a crucial research area.\nHowever, existing MSD studies have focused on modeling stance within individual\ntext-image pairs, overlooking the multi-party conversational contexts that\nnaturally occur on social media. This limitation stems from a lack of datasets\nthat authentically capture such conversational scenarios, hindering progress in\nconversational MSD. To address this, we introduce a new multimodal multi-turn\nconversational stance detection dataset (called MmMtCSD). To derive stances\nfrom this challenging dataset, we propose a novel multimodal large language\nmodel stance detection framework (MLLM-SD), that learns joint stance\nrepresentations from textual and visual modalities. Experiments on MmMtCSD show\nstate-of-the-art performance of our proposed MLLM-SD approach for multimodal\nstance detection. We believe that MmMtCSD will contribute to advancing\nreal-world applications of stance detection research.","PeriodicalId":501480,"journal":{"name":"arXiv - CS - Multimedia","volume":"31 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Multimodal Multi-turn Conversation Stance Detection: A Challenge Dataset and Effective Model\",\"authors\":\"Fuqiang Niu, Zebang Cheng, Xianghua Fu, Xiaojiang Peng, Genan Dai, Yin Chen, Hu Huang, Bowen Zhang\",\"doi\":\"arxiv-2409.00597\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Stance detection, which aims to identify public opinion towards specific\\ntargets using social media data, is an important yet challenging task. With the\\nproliferation of diverse multimodal social media content including text, and\\nimages multimodal stance detection (MSD) has become a crucial research area.\\nHowever, existing MSD studies have focused on modeling stance within individual\\ntext-image pairs, overlooking the multi-party conversational contexts that\\nnaturally occur on social media. This limitation stems from a lack of datasets\\nthat authentically capture such conversational scenarios, hindering progress in\\nconversational MSD. To address this, we introduce a new multimodal multi-turn\\nconversational stance detection dataset (called MmMtCSD). To derive stances\\nfrom this challenging dataset, we propose a novel multimodal large language\\nmodel stance detection framework (MLLM-SD), that learns joint stance\\nrepresentations from textual and visual modalities. Experiments on MmMtCSD show\\nstate-of-the-art performance of our proposed MLLM-SD approach for multimodal\\nstance detection. 
We believe that MmMtCSD will contribute to advancing\\nreal-world applications of stance detection research.\",\"PeriodicalId\":501480,\"journal\":{\"name\":\"arXiv - CS - Multimedia\",\"volume\":\"31 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Multimedia\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.00597\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Multimedia","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.00597","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Multimodal Multi-turn Conversation Stance Detection: A Challenge Dataset and Effective Model
Stance detection, which aims to identify public opinion towards specific targets using social media data, is an important yet challenging task. With the proliferation of diverse multimodal social media content, including text and images, multimodal stance detection (MSD) has become a crucial research area. However, existing MSD studies have focused on modeling stance within individual text-image pairs, overlooking the multi-party conversational contexts that naturally occur on social media. This limitation stems from a lack of datasets that authentically capture such conversational scenarios, hindering progress in conversational MSD. To address this, we introduce a new multimodal multi-turn conversational stance detection dataset, MmMtCSD. To derive stances from this challenging dataset, we propose a novel multimodal large language model stance detection framework (MLLM-SD) that learns joint stance representations from textual and visual modalities. Experiments on MmMtCSD show that the proposed MLLM-SD approach achieves state-of-the-art performance for multimodal stance detection. We believe that MmMtCSD will contribute to advancing real-world applications of stance detection research.
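
To make the joint text-image stance representation idea concrete, below is a minimal PyTorch sketch of fusing pooled text and image features for stance classification. This is not the paper's MLLM-SD implementation: the encoder dimensions, the three-way label set, and all class and variable names are illustrative assumptions, with simple linear projections standing in for the LLM and vision encoders the paper describes.

```python
# Hypothetical sketch of learning a joint stance representation from
# textual and visual features. NOT the authors' MLLM-SD architecture;
# the encoders are stand-ins and the label set is assumed.
import torch
import torch.nn as nn

STANCES = ["favor", "against", "none"]  # assumed stance label set


class JointStanceClassifier(nn.Module):
    def __init__(self, text_dim=768, image_dim=512, hidden=256, num_labels=3):
        super().__init__()
        # In the paper's setting these projections would be replaced by a
        # multimodal LLM's text encoder and a vision encoder.
        self.text_proj = nn.Linear(text_dim, hidden)
        self.image_proj = nn.Linear(image_dim, hidden)
        self.classifier = nn.Sequential(
            nn.ReLU(),
            nn.Linear(2 * hidden, num_labels),
        )

    def forward(self, text_feats, image_feats):
        # Fuse the two modalities into one joint stance representation,
        # then classify it into stance labels.
        joint = torch.cat(
            [self.text_proj(text_feats), self.image_proj(image_feats)], dim=-1
        )
        return self.classifier(joint)  # logits over STANCES


model = JointStanceClassifier()
text_feats = torch.randn(4, 768)   # e.g., pooled text-encoder outputs
image_feats = torch.randn(4, 512)  # e.g., pooled vision-encoder outputs
logits = model(text_feats, image_feats)
print(logits.shape)  # torch.Size([4, 3])
```

In a conversational setting like MmMtCSD, the text features for each turn would plausibly encode the preceding dialogue context as well, so the classifier sees the multi-turn thread rather than an isolated text-image pair; that detail is an assumption about the setup, not a specification from the abstract.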