Multimodal heterogeneous graph entity-level fusion for named entity recognition with multi-granularity visual guidance

Yunchao Gong, Xueqiang Lv, Zhu Yuan, ZhaoJun Wang, Feng Hu, Xindong You
The Journal of Supercomputing, published 2024-07-22. DOI: 10.1007/s11227-024-06347-8

Abstract

Multimodal named entity recognition (MNER) is an emerging foundational task in natural language processing. Existing methods have two main limitations: 1) prior work focuses on the visual representation of either the entire image or the target objects, and thus overlooks either the fine-grained semantic correspondence between entities and visual target objects or the visual cues from the overall scene and background details; 2) owing to the heterogeneity of text and images, existing methods have not effectively bridged the semantic gap between modalities. To address these issues, we propose a novel multimodal heterogeneous graph entity-level fusion method for MNER (HGMVG) that performs coarse-to-fine cross-modal feature interaction between text and images under the guidance of visual information at different granularities, improving the accuracy of named entity recognition. Specifically, for the first issue, we cascade cross-modal semantic interaction information between text and vision at different visual granularities to obtain a comprehensive and effective multimodal representation. For the second issue, we describe the precise semantic correspondences between entity-level words and visual target objects via multimodal heterogeneous graphs, and use heterogeneous graph attention networks to achieve fine-grained cross-modal semantic interaction. Extensive experiments on two publicly available Twitter datasets demonstrate that HGMVG outperforms current state-of-the-art models on the MNER task.
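The abstract's second contribution builds a bipartite heterogeneous graph linking entity-level word nodes to visual object nodes and runs attention over it. The paper's actual architecture is not given here, so the following is only a minimal NumPy sketch of the general idea: type-specific projections map both modalities into a shared space, and each word node aggregates from its linked object nodes with softmax attention weights. All names (`hetero_graph_attention`, the edge format) and the dot-product scoring are illustrative assumptions, not the authors' method.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def hetero_graph_attention(word_feats, obj_feats, edges, d=8, seed=0):
    """One attention hop over a bipartite word/object heterogeneous graph.

    word_feats: (n_words, d_w) text-side node features
    obj_feats:  (n_objs, d_o) visual-object node features
    edges:      dict mapping word index -> list of linked object indices
    Returns fused word representations of shape (n_words, d).
    """
    rng = np.random.default_rng(seed)
    # Type-specific projections: each node type gets its own weight matrix,
    # mapping both modalities into a shared d-dimensional space.
    W_w = rng.normal(scale=0.1, size=(word_feats.shape[1], d))
    W_o = rng.normal(scale=0.1, size=(obj_feats.shape[1], d))
    hw = word_feats @ W_w
    ho = obj_feats @ W_o
    fused = hw.copy()
    for i, nbrs in edges.items():
        if not nbrs:
            continue  # words with no visual link keep their text features
        # Dot-product attention scores against the linked object nodes.
        scores = np.array([hw[i] @ ho[j] for j in nbrs])
        alpha = softmax(scores)
        # Residual fusion: text feature plus attention-weighted visual context.
        fused[i] = hw[i] + sum(a * ho[j] for a, j in zip(alpha, nbrs))
    return fused
```

In a full model the projections would be learned and the graph would also carry word-word and object-object edges; this sketch only shows the cross-modal word-to-object hop that the abstract describes.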

