ZS-MNET: A zero-shot learning based approach to multimodal named entity typing
Baohang Zhou, Ying Zhang, Kehui Song, Xuhui Sui, Yu Zhao, Xiaojie Yuan
Neural Networks, Volume 186, Article 107264. Published 2025-02-17.
Impact factor: 6.0 (JCR Q1, Computer Science, Artificial Intelligence)
DOI: 10.1016/j.neunet.2025.107264
https://www.sciencedirect.com/science/article/pii/S0893608025001431
Abstract
Named entity typing (NET) on social platforms is an important task that involves identifying the types of named entities within unstructured text. Existing NET methods classify entity types from the text modality alone and ignore the semantic correlation within multimodal data. Moreover, the growing volume of multimodal data implies a growing type set, and newly emerging entity types should be recognized without additional training. To address these limitations, we introduce a zero-shot learning based multimodal NET (ZS-MNET) model that combines textual and visual modalities to recognize previously unseen named entity types in a zero-shot manner. Unlike traditional zero-shot NET (ZS-NET) models, the proposed ZS-MNET uses both text and image information to bridge the semantic correlation between multimodal data and label information. To obtain fine-grained multimodal representations, we use pre-trained language and vision models, specifically BERT and ViT, both of which are built on the transformer architecture. In addition, we propose different multimodal representations that focus on fine-grained features and fuse them to model the semantic correlation between multimodal data and entity types. The experimental results underscore the utility of multimodal data for NET, and our approach outperforms previous ZS-NET models.
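The general idea of zero-shot typing against label information can be illustrated with a minimal sketch. This is not the ZS-MNET architecture: the abstract does not specify the fusion or scoring functions, so the concatenation-based `fuse`, the cosine scoring, and all vectors below are assumptions chosen for illustration. In practice the text and image vectors would come from encoders such as BERT and ViT; here they are small hand-written toy vectors.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(text_vec, image_vec):
    """Hypothetical fusion: concatenate the normalized text and image vectors.

    Assumption for illustration only; the paper's fusion is not described
    in the abstract.
    """
    return l2_normalize(l2_normalize(text_vec) + l2_normalize(image_vec))


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def predict_type(text_vec, image_vec, label_vecs):
    """Score every candidate type and return the best one with all scores.

    New types can be added to label_vecs at inference time without any
    retraining, which is what makes the setup zero-shot.
    """
    fused = fuse(text_vec, image_vec)
    scores = {label: cosine(fused, vec) for label, vec in label_vecs.items()}
    return max(scores, key=scores.get), scores


# Toy stand-ins for encoder outputs (hypothetical values).
text_vec = [0.9, 0.1]
image_vec = [0.8, 0.2]

# Toy label representations in the fused space (hypothetical values).
label_vecs = {
    "person": [1.0, 0.0, 1.0, 0.0],
    "location": [0.0, 1.0, 0.0, 1.0],
    "organization": [1.0, 0.0, 0.0, 1.0],
}

best_label, scores = predict_type(text_vec, image_vec, label_vecs)
```

Because types are represented as vectors rather than as fixed classifier outputs, adding an entry to `label_vecs` is enough to make a new type predictable.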
About the journal:
Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically-inspired artificial intelligence.