多模态事件检测的多幅图像集中注意力

Yangyang Li, Jun Li, Hao Jin, Liang Peng
{"title":"多模态事件检测的多幅图像集中注意力","authors":"Yangyang Li, Jun Li, Hao Jin, Liang Peng","doi":"10.1145/3469877.3495642","DOIUrl":null,"url":null,"abstract":"Multimodal social event detection has been attracting tremendous research attention in recent years, due to that it provides comprehensive and complementary understanding of social events and is important to public security and administration. Most existing works have been focusing on the fusion of multimodal information, especially for single image and text fusion. Such single image-text pair processing breaks the correlations between images of the same post and may affect the accuracy of event detection. In this work, we propose to focus attention across multiple images for multimodal event detection, which is also more reasonable for tweets with short text and multiple images. Towards this end, we elaborate a novel Multi-Image Focusing Network (MIFN) to connect text content with visual aspects in multiple images. Our MIFN consists of a feature extractor, a multi-focal network and an event classifier. The multi-focal network implements a focal attention across all the images, and fuses the most related regions with texts as multimodal representation. The event classifier finally predict the social event class based on the multimodal representations. To evaluate the effectiveness of our proposed approach, we conduct extensive experiments on a commonly-used disaster dataset. The experimental results demonstrate that, in both humanitarian event detection task and its variant of hurricane disaster, the proposed MIFN outperforms all the baselines. The ablation studies also exhibit the ability to filter the irrelevant regions across images which results in improving the accuracy of multimodal event detection.","PeriodicalId":210974,"journal":{"name":"ACM Multimedia Asia","volume":"11 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Focusing Attention across Multiple Images for Multimodal Event Detection\",\"authors\":\"Yangyang Li, Jun Li, Hao Jin, Liang Peng\",\"doi\":\"10.1145/3469877.3495642\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Multimodal social event detection has been attracting tremendous research attention in recent years, due to that it provides comprehensive and complementary understanding of social events and is important to public security and administration. Most existing works have been focusing on the fusion of multimodal information, especially for single image and text fusion. Such single image-text pair processing breaks the correlations between images of the same post and may affect the accuracy of event detection. In this work, we propose to focus attention across multiple images for multimodal event detection, which is also more reasonable for tweets with short text and multiple images. Towards this end, we elaborate a novel Multi-Image Focusing Network (MIFN) to connect text content with visual aspects in multiple images. Our MIFN consists of a feature extractor, a multi-focal network and an event classifier. The multi-focal network implements a focal attention across all the images, and fuses the most related regions with texts as multimodal representation. The event classifier finally predict the social event class based on the multimodal representations. To evaluate the effectiveness of our proposed approach, we conduct extensive experiments on a commonly-used disaster dataset. The experimental results demonstrate that, in both humanitarian event detection task and its variant of hurricane disaster, the proposed MIFN outperforms all the baselines. The ablation studies also exhibit the ability to filter the irrelevant regions across images which results in improving the accuracy of multimodal event detection.\",\"PeriodicalId\":210974,\"journal\":{\"name\":\"ACM Multimedia Asia\",\"volume\":\"11 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Multimedia Asia\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3469877.3495642\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Multimedia Asia","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3469877.3495642","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

近年来,多模式社会事件侦查因其提供了对社会事件的全面和互补的认识,对公共安全与行政管理具有重要意义而受到了极大的关注。现有的研究大多集中在多模态信息的融合,特别是单幅图像和文本的融合。这种单一的图像-文本对处理破坏了同一帖子图像之间的相关性,可能会影响事件检测的准确性。在这项工作中,我们提出将注意力集中在多幅图像上进行多模态事件检测,这对于文本短、图像多的推文来说也更合理。为此,我们设计了一种新的多图像聚焦网络(MIFN),将多图像中的文本内容与视觉方面联系起来。我们的MIFN由一个特征提取器、一个多焦点网络和一个事件分类器组成。多焦点网络实现了对所有图像的焦点关注,并将与文本最相关的区域融合为多模态表示。事件分类器最后根据多模态表示来预测社会事件的类别。为了评估我们提出的方法的有效性,我们在一个常用的灾难数据集上进行了大量的实验。实验结果表明,在人道主义事件检测任务及其飓风灾害变体中,所提出的MIFN都优于所有基线。消融研究还显示了过滤图像中不相关区域的能力,从而提高了多模态事件检测的准确性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Focusing Attention across Multiple Images for Multimodal Event Detection
Multimodal social event detection has been attracting tremendous research attention in recent years, due to that it provides comprehensive and complementary understanding of social events and is important to public security and administration. Most existing works have been focusing on the fusion of multimodal information, especially for single image and text fusion. Such single image-text pair processing breaks the correlations between images of the same post and may affect the accuracy of event detection. In this work, we propose to focus attention across multiple images for multimodal event detection, which is also more reasonable for tweets with short text and multiple images. Towards this end, we elaborate a novel Multi-Image Focusing Network (MIFN) to connect text content with visual aspects in multiple images. Our MIFN consists of a feature extractor, a multi-focal network and an event classifier. The multi-focal network implements a focal attention across all the images, and fuses the most related regions with texts as multimodal representation. The event classifier finally predict the social event class based on the multimodal representations. To evaluate the effectiveness of our proposed approach, we conduct extensive experiments on a commonly-used disaster dataset. The experimental results demonstrate that, in both humanitarian event detection task and its variant of hurricane disaster, the proposed MIFN outperforms all the baselines. The ablation studies also exhibit the ability to filter the irrelevant regions across images which results in improving the accuracy of multimodal event detection.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Multi-Scale Graph Convolutional Network and Dynamic Iterative Class Loss for Ship Segmentation in Remote Sensing Images Structural Knowledge Organization and Transfer for Class-Incremental Learning Hard-Boundary Attention Network for Nuclei Instance Segmentation Score Transformer: Generating Musical Score from Note-level Representation CMRD-Net: An Improved Method for Underwater Image Enhancement
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1