FRCL-MNER: A Finer Grained Rank-Based Contrastive Learning Framework for Multimodal NER

IF 8.9 · JCR Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) · CAS Zone 1, Computer Science
IEEE transactions on neural networks and learning systems · Pub Date: 2025-02-10 · DOI: 10.1109/TNNLS.2025.3528567
Tianwei Yan;Shan Zhao;Wentao Ma;Shezheng Song;Chengyu Wang;Zhibo Rao;Shizhao Chen;Zhigang Luo;Xinwang Liu
Volume 36, Issue 6, Pages 10779-10793 · Journal Article · https://ieeexplore.ieee.org/document/10879144/
Citations: 0

Abstract

Multimodal named entity recognition (MNER) is an emerging field that aims to automatically detect named entities and classify their categories, utilizing input text and auxiliary resources such as images. While previous studies have leveraged object detectors to preprocess images and fuse textual semantics with corresponding image features, these methods often overlook the potential finer grained information within each modality and may exacerbate error propagation due to predetection. To address these issues, we propose a finer grained rank-based contrastive learning (FRCL) framework for MNER. This framework employs global-level contrastive learning to align multimodal semantic features and a Top-K rank-based mask strategy to construct positive-negative pairs, thereby learning a finer grained multimodal interaction representation. Experimental results from three well-known social media datasets reveal that our approach surpasses existing strong baselines and achieves up to a 1.54% improvement on the Twitter2015 dataset. Extensive discussions further confirm the effectiveness of our approach. We will release the source code on https://github.com/augusyan/FRCL.
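The abstract does not spell out the method's details, but the two ingredients it names — global-level contrastive alignment of text and image features, and a Top-K rank-based mask to select positive-negative pairs — can be illustrated with a minimal, hypothetical sketch. The function below (all names, shapes, and the cosine-similarity/InfoNCE formulation are assumptions for illustration, not the authors' implementation) keeps each text feature's matching image feature as the positive and masks all negatives except the Top-K most similar (hardest) ones:

```python
import numpy as np

def topk_rank_mask_contrastive(text_feats, image_feats, k=2, temperature=0.07):
    """Hypothetical sketch: contrastive alignment with a Top-K rank-based mask.
    Row i of text_feats is assumed to match row i of image_feats (the positive);
    of the remaining candidates, only the Top-K highest-ranked (most similar)
    are kept as hard negatives, and the rest are masked out of the loss.
    """
    # L2-normalize so dot products are cosine similarities
    t = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    v = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    sim = t @ v.T / temperature                  # (N, N) similarity logits
    n = sim.shape[0]

    mask = np.zeros_like(sim, dtype=bool)
    mask[np.arange(n), np.arange(n)] = True      # always keep the positive pair
    for i in range(n):
        ranked = np.argsort(-sim[i])             # candidates by descending rank
        hard_negs = ranked[ranked != i][:k]      # Top-K hardest negatives
        mask[i, hard_negs] = True

    # InfoNCE computed over the masked candidate set only
    logits = np.where(mask, sim, -np.inf)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(n), np.arange(n)].mean()
```

Restricting the denominator to the Top-K ranked negatives is one plausible reading of "rank-based mask": it focuses the contrast on the confusable candidates rather than averaging over many easy ones.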
Source journal
IEEE transactions on neural networks and learning systems
Categories: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
CiteScore: 23.80
Self-citation rate: 9.60%
Articles per year: 2102
Review time: 3-8 weeks
Journal description: The focus of IEEE Transactions on Neural Networks and Learning Systems is to present scholarly articles discussing the theory, design, and applications of neural networks as well as other learning systems. The journal primarily highlights technical and scientific research in this domain.
Latest articles from this journal:
When Optimal Transport Meets Photo-Realistic Image Dehazing With Unpaired Training.
Multistage PCA Whitening: A Robust Method to Dimensionality Reduction in Image Retrieval.
Neural Architecture Search With Spatial-Spectral Attention for Higher-Order Nonlinear Hyperspectral Unmixing.
Spatial Meta-Learning-Based Representation for Unseen Geographic Entities.
Actor-Critic-Based Prescribed Performance Optimal Control for Flexible-Joint Robots With Input Delay.