ME3A: A Multimodal Entity Entailment framework for multimodal Entity Alignment

Information Processing & Management (IF 7.4; Zone 1, Management; JCR Q1, Computer Science, Information Systems). Pub Date: 2024-11-05. DOI: 10.1016/j.ipm.2024.103951
Yu Zhao, Ying Zhang, Xuhui Sui, Xiangrui Cai
Volume 62, Issue 1, Article 103951. Full text: https://www.sciencedirect.com/science/article/pii/S0306457324003108

Abstract

Current methods for multimodal entity alignment (MEA) rely primarily on entity representation learning, which limits alignment performance because of insufficient cross-KG interaction and multimodal heterogeneity. In this paper, we propose ME3A, a Multimodal Entity Entailment framework for the multimodal Entity Alignment task, and recast MEA as an entailment problem over entities in the two knowledge graphs (KGs). In this way, modality information from the two KGs interacts directly in a unified textual space. Specifically, we express the multimodal information as textual sequences in that space: for the relational and attribute modalities, we combine the neighbors and attribute values of entities into sentences; for the visual modality, we map the entity image to trainable prefixes and insert them into the sequences. We then feed the concatenated sequences of two entities into a pre-trained language model (PLM), which serves as an entailment reasoner and captures the unified fine-grained correlation patterns among the multimodal tokens of the two entities. Two types of entity aligners are proposed to model the bi-directional entailment probability as the entity similarity. Extensive experiments on nine MEA datasets with various modality combination settings demonstrate that ME3A effectively incorporates multimodal information and outperforms state-of-the-art MEA methods by up to 16.5%.
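The pipeline described in the abstract (serialize each entity into a textual sequence, score entailment in both directions, and use the result as the entity similarity) can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the `Entity` class, the `serialize` format, and the averaging of the two directions are hypothetical, and `toy_entailment_prob` is a token-overlap placeholder standing in for the PLM reasoner, not the authors' model. The visual prefixes are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical container for one KG entity (not from the paper)."""
    name: str
    neighbors: list = field(default_factory=list)   # (relation, neighbor name) pairs
    attributes: list = field(default_factory=list)  # (attribute, value) pairs


def serialize(entity):
    """Flatten relational and attribute information into one textual sequence."""
    parts = [entity.name]
    parts += [f"{rel} {nbr}" for rel, nbr in entity.neighbors]
    parts += [f"{attr} {val}" for attr, val in entity.attributes]
    return " ; ".join(parts)


def toy_entailment_prob(premise, hypothesis):
    """Placeholder scorer: fraction of hypothesis tokens that occur in the premise.

    A real system would replace this with a PLM that reads both sequences.
    """
    premise_tokens = set(premise.lower().split())
    hypothesis_tokens = hypothesis.lower().split()
    if not hypothesis_tokens:
        return 0.0
    hits = sum(tok in premise_tokens for tok in hypothesis_tokens)
    return hits / len(hypothesis_tokens)


def bidirectional_similarity(a, b, scorer=toy_entailment_prob):
    """Average the two entailment directions (an assumed way to combine them)."""
    seq_a, seq_b = serialize(a), serialize(b)
    return 0.5 * (scorer(seq_a, seq_b) + scorer(seq_b, seq_a))


paris_kg1 = Entity("Paris", [("capital of", "France")], [("population", "2100000")])
paris_kg2 = Entity("Paris", [("capital of", "France")])
berlin_kg2 = Entity("Berlin", [("capital of", "Germany")])

print(bidirectional_similarity(paris_kg1, paris_kg2))
print(bidirectional_similarity(paris_kg1, berlin_kg2))
```

With the toy scorer, the matching pair scores higher than the non-matching pair, which is the ranking behavior an aligner needs; the absolute values carry no meaning beyond this example.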
Source journal: Information Processing & Management (Engineering and Technology; Computer Science: Information Systems)
CiteScore: 17.00
Self-citation rate: 11.60%
Articles published: 276
Review time: 39 days
About the journal: Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Its scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing. The journal aims to serve both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. It places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research.