Haitao Liu , Xianwei Xin , Jihua Song , Weiming Peng
{"title":"CRISP: A cross-modal integration framework based on the surprisingly popular algorithm for multimodal named entity recognition","authors":"Haitao Liu , Xianwei Xin , Jihua Song , Weiming Peng","doi":"10.1016/j.neucom.2024.128792","DOIUrl":null,"url":null,"abstract":"<div><div>The multimodal named entity recognition task on social media involves recognizing named entities with textual and visual information, which is of great significance for information processing. Nevertheless, many existing models still face the following challenges. First, in the process of cross-modal interaction, the attention mechanism sometimes focuses on trivial parts in the images that are not relevant to entities, which not only neglects valuable information but also inevitably introduces visual noise. Second, the gate mechanism is widely used for filtering out visual information to reduce the influence of noise on text understanding. However, the gate mechanism neglects capturing fine-grained semantic relevance between modalities, which easily affects the filtration process. To address these issues, we propose a cross-modal integration framework based on the surprisingly popular algorithm, aiming at enhancing the integration of effective visual guidance and reducing the interference of irrelevant visual noise. Specifically, we design a dual-branch interaction module that includes the attention mechanism and the surprisingly popular algorithm, allowing the model to focus on valuable but overlooked parts in the images. Furthermore, we compute the matching degree between modalities at the multi-granularity level, using the Choquet integral to establish a more reasonable basis for filtering out visual noise. We have conducted extensive experiments on public datasets, and the experimental results demonstrate the advantages of our model.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"614 ","pages":"Article 128792"},"PeriodicalIF":5.5000,"publicationDate":"2024-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neurocomputing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0925231224015637","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
The multimodal named entity recognition task on social media involves recognizing named entities with textual and visual information, which is of great significance for information processing. Nevertheless, many existing models still face the following challenges. First, in the process of cross-modal interaction, the attention mechanism sometimes focuses on trivial parts in the images that are not relevant to entities, which not only neglects valuable information but also inevitably introduces visual noise. Second, the gate mechanism is widely used for filtering out visual information to reduce the influence of noise on text understanding. However, the gate mechanism neglects capturing fine-grained semantic relevance between modalities, which easily affects the filtration process. To address these issues, we propose a cross-modal integration framework based on the surprisingly popular algorithm, aiming at enhancing the integration of effective visual guidance and reducing the interference of irrelevant visual noise. Specifically, we design a dual-branch interaction module that includes the attention mechanism and the surprisingly popular algorithm, allowing the model to focus on valuable but overlooked parts in the images. Furthermore, we compute the matching degree between modalities at the multi-granularity level, using the Choquet integral to establish a more reasonable basis for filtering out visual noise. We have conducted extensive experiments on public datasets, and the experimental results demonstrate the advantages of our model.
期刊介绍:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.