CRISP: A cross-modal integration framework based on the surprisingly popular algorithm for multimodal named entity recognition

IF 5.5; Zone 2 (Computer Science); Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE); Neurocomputing; Pub Date: 2024-10-28; DOI: 10.1016/j.neucom.2024.128792
Haitao Liu, Xianwei Xin, Jihua Song, Weiming Peng
Neurocomputing, volume 614, Article 128792 (Journal Article). Source: https://www.sciencedirect.com/science/article/pii/S0925231224015637
Citations: 0

Abstract

Multimodal named entity recognition on social media aims to recognize named entities using both textual and visual information, and it is of great significance for information processing. Nevertheless, many existing models still face two challenges. First, during cross-modal interaction, the attention mechanism sometimes focuses on trivial parts of the images that are irrelevant to entities, which not only neglects valuable information but also introduces visual noise. Second, the gate mechanism is widely used to filter visual information and reduce the influence of noise on text understanding, but it does not capture fine-grained semantic relevance between modalities, which can easily impair the filtering process. To address these issues, we propose a cross-modal integration framework based on the surprisingly popular algorithm, which aims to strengthen the integration of effective visual guidance and reduce the interference of irrelevant visual noise. Specifically, we design a dual-branch interaction module that combines the attention mechanism with the surprisingly popular algorithm, allowing the model to focus on valuable but overlooked parts of the images. Furthermore, we compute the matching degree between modalities at multiple granularities and use the Choquet integral to establish a more reasonable basis for filtering out visual noise. Extensive experiments on public datasets demonstrate the advantages of our model.
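For readers unfamiliar with the two building blocks the abstract names, the following Python sketch shows their generic textbook forms: the surprisingly popular rule as used in crowd voting, and the discrete Choquet integral over a fuzzy measure. This is not the CRISP model and not the authors' implementation. The function names, the toy vote, the granularity labels ("word", "phrase", "sentence"), and the measure values are all assumptions made for illustration.

```python
from collections import Counter


def surprisingly_popular(votes, predicted_shares):
    """Pick the option whose actual vote share most exceeds its predicted share.

    votes: list of chosen options, one per respondent.
    predicted_shares: list of dicts, one per respondent, mapping each option
        to the fraction of respondents they expect to choose it.
    """
    if not votes or not predicted_shares:
        raise ValueError("need at least one vote and one prediction")
    actual = {opt: n / len(votes) for opt, n in Counter(votes).items()}
    options = sorted(set(actual) | {o for p in predicted_shares for o in p})
    predicted = {
        o: sum(p.get(o, 0.0) for p in predicted_shares) / len(predicted_shares)
        for o in options
    }
    # "Surprise" = how much more popular an option is than the crowd expected.
    return max(options, key=lambda o: actual.get(o, 0.0) - predicted[o])


def choquet_integral(scores, measure):
    """Discrete Choquet integral of non-negative scores w.r.t. a fuzzy measure.

    scores: dict mapping criterion name -> score.
    measure: dict mapping frozenset of criteria -> weight in [0, 1]. It must
        define every coalition met while scanning scores in ascending order.
    """
    order = sorted(scores, key=scores.get)  # criteria from lowest to highest score
    total, previous = 0.0, 0.0
    for i, name in enumerate(order):
        coalition = frozenset(order[i:])  # criteria scoring at least scores[name]
        total += (scores[name] - previous) * measure[coalition]
        previous = scores[name]
    return total


# Toy vote (assumed data): "yes" wins the majority, but respondents expected
# it to win by even more, so "no" is the surprisingly popular answer.
votes = ["yes"] * 6 + ["no"] * 4
predictions = [{"yes": 0.9, "no": 0.1}] * 6 + [{"yes": 0.6, "no": 0.4}] * 4
print(surprisingly_popular(votes, predictions))  # no

# Hypothetical matching degrees at three granularities (assumed labels and values).
scores = {"word": 0.2, "phrase": 0.5, "sentence": 0.9}
measure = {
    frozenset({"word", "phrase", "sentence"}): 1.0,
    frozenset({"phrase", "sentence"}): 0.7,
    frozenset({"sentence"}): 0.4,
}
print(choquet_integral(scores, measure))  # approximately 0.57
```

The Choquet integral reduces to an ordinary weighted average when the measure is additive. A non-additive measure lets the aggregate reflect interactions between criteria, which is the usual reason to choose it over fixed weights.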
Source journal
Neurocomputing (Engineering & Technology, Computer Science: Artificial Intelligence)
CiteScore: 13.10
Self-citation rate: 10.00%
Articles published: 1382
Review time: 70 days
Journal description: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.