ZS-MNET: A zero-shot learning based approach to multimodal named entity typing

IF 6.3 | Zone 1 (Computer Science) | JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Neural Networks | Pub Date: 2025-02-17 | DOI: 10.1016/j.neunet.2025.107264

Baohang Zhou, Ying Zhang, Kehui Song, Xuhui Sui, Yu Zhao, Xiaojie Yuan

Neural Networks, Volume 186, Article 107264 (2025). https://www.sciencedirect.com/science/article/pii/S0893608025001431

Citations: 0

Abstract

Named entity typing (NET) on social platforms is an important task: it involves identifying the types of the named entities that appear in unstructured text. Existing NET methods use only the text modality to classify entity types and ignore the semantic correlation across multimodal data. Moreover, the growing volume of multimodal data implies an expanding type set, and newly emerging entity types should be recognized without additional training. To address these limitations, we introduce a zero-shot learning based multimodal NET (ZS-MNET) model that combines the textual and visual modalities to recognize previously unseen named entity types in a zero-shot manner. Unlike traditional zero-shot NET (ZS-NET) models, ZS-MNET uses both text and image information to bridge the semantic correlation between multimodal data and label information. To obtain fine-grained multimodal representations, we use the Transformer-based pre-trained language and vision models BERT and ViT. In addition, we propose different multimodal representations that focus on fine-grained features and model the semantic correlation between multimodal data and entity types through fusion. Experimental results underscore the utility of multimodal data for NET, and our approach outperforms previous ZS-NET models.
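To make the zero-shot idea concrete, here is a minimal sketch in Python. It is not the authors' ZS-MNET implementation. It assumes that an entity mention and the image patches have already been encoded into vectors of the same dimension (in the paper these would come from BERT and ViT; here they are small hand-written toy vectors), and that every candidate type, seen or unseen, is represented by an embedding of its label. The helper names (`attend`, `fuse`, `predict_type`) and the fixed mixing weight `alpha` are hypothetical. The entity vector attends over the image patches to pick up fine-grained visual evidence, the two views are fused, and the type whose embedding is most cosine-similar to the fused vector is predicted.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, patches):
    """Scaled dot-product attention of one query over image-patch vectors.

    The patches serve as both keys and values (a simplifying assumption).
    """
    d = len(query)
    weights = softmax([dot(query, p) / math.sqrt(d) for p in patches])
    return [sum(w * p[i] for w, p in zip(weights, patches)) for i in range(d)]

def fuse(entity_vec, patches, alpha=0.5):
    """Hypothetical fusion: convex mix of the text vector and its image-attended summary."""
    visual = attend(entity_vec, patches)
    return [alpha * t + (1 - alpha) * v for t, v in zip(entity_vec, visual)]

def predict_type(entity_vec, patches, type_embeddings):
    """Score every candidate type (seen or unseen) by cosine similarity; return the best."""
    fused = fuse(entity_vec, patches)
    scores = {name: cosine(fused, emb) for name, emb in type_embeddings.items()}
    return max(scores, key=scores.get), scores

# Toy 3-d vectors standing in for BERT/ViT outputs and label embeddings.
types = {
    "person": [1.0, 0.0, 0.0],
    "organization": [0.0, 1.0, 0.0],
    "location": [0.0, 0.0, 1.0],  # imagine this type was unseen during training
}
patches = [[0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]

label, scores = predict_type([1.0, 0.2, 0.0], patches, types)
print(label)  # person
```

Because prediction only compares the fused vector against label embeddings, a type such as "location" can still be chosen for a mention that points toward it, even if (hypothetically) it never appeared as a training label. A trained model would learn the fusion and the projection into the label space instead of relying on a fixed `alpha`.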

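Source journal

Neural Networks (Engineering & Technology: Computer Science, Artificial Intelligence)

CiteScore: 13.90

Self-citation rate: 7.70%

Articles published: 425

Review time: 67 days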

Journal description: Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. The journal invites submissions covering all aspects of neural network research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussion between biology and technology, it aims to encourage the development of biologically inspired artificial intelligence.