Multi-hop neighbor fusion enhanced hierarchical transformer for multi-modal knowledge graph completion

Yunpeng Wang, Bo Ning, Xin Wang, Guanyu Li
World Wide Web (Journal Article), published 2024-07-19. DOI: 10.1007/s11280-024-01289-w

Abstract

Multi-modal knowledge graphs (MKGs) are structured semantic networks that accurately represent real-world information by incorporating multiple modalities. Existing research focuses primarily on two directions: leveraging multi-modal fusion to enhance the representation capability of entity nodes, and applying link prediction to address the incompleteness of MKGs. However, the inherent heterogeneity between the structural modality and the semantic modality makes multi-modal fusion challenging, because noise interference can compromise the effectiveness of the fused representation. In this study, we propose a novel hierarchical Transformer architecture, named MNFormer, which captures structural and semantic information while avoiding heterogeneity issues by fully integrating multi-hop neighbor paths with image-text embeddings. In the encoding stage of MNFormer, we design multiple layers of Multi-hop Neighbor Fusion (MNF) modules that use attention to merge image and text features. These MNF modules progressively fuse the information of neighboring entities, hop by hop, along the neighbor paths of the source entity. In the decoding stage, a Transformer integrates the outputs of all MNF modules, and its output is then used to match target entities and thereby complete the MKG. Moreover, we develop a semantic direction loss to improve the fitting performance of MNFormer. Experimental results on four datasets demonstrate that MNFormer is notably competitive with state-of-the-art models. Additionally, ablation studies show that MNFormer effectively combines structural and semantic information, and that the two complement each other to improve performance.
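The abstract describes the MNFormer pipeline only at a high level. The following is a minimal, illustrative Python sketch of that data flow under stated assumptions, not the authors' implementation: single-head scaled dot-product attention with no learned projections stands in for the paper's attention layers, the function names `fuse_modalities`, `mnf_encode`, `decode`, and `rank_candidates` are hypothetical, entity features are toy random vectors, and candidates are ranked by a plain dot product. The semantic direction loss is not specified in the abstract, so it is omitted here.

```python
# Illustrative sketch only: all function names are hypothetical and not taken
# from the MNFormer paper; attention has no learned weights.
import math
import random

Vec = list[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def add(a: Vec, b: Vec) -> Vec:
    return [x + y for x, y in zip(a, b)]


def softmax(scores: list[float]) -> list[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query: Vec, keys: list[Vec], values: list[Vec]) -> Vec:
    """Single-head scaled dot-product attention without learned projections."""
    d = len(query)
    weights = softmax([dot(query, k) / math.sqrt(d) for k in keys])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(d)]


def fuse_modalities(struct: Vec, image: Vec, text: Vec) -> Vec:
    """Hypothetical fusion: the structural embedding attends over image and text features."""
    feats = [image, text]
    return add(struct, attention(struct, feats, feats))  # residual connection


def mnf_encode(path: list[dict[str, Vec]]) -> list[Vec]:
    """Fuse a neighbor path hop by hop; path[0] is the source, path[k] its k-hop neighbor.

    Returns one state per hop, standing in for the outputs of stacked MNF layers.
    """
    state = fuse_modalities(**path[0])
    outputs = [state]
    for node in path[1:]:
        neighbor = fuse_modalities(**node)
        ctx = [state, neighbor]
        # The running state attends over itself and the newly fused neighbor.
        state = add(state, attention(state, ctx, ctx))
        outputs.append(state)
    return outputs


def decode(layer_outputs: list[Vec], relation: Vec) -> Vec:
    """Stand-in for the decoding Transformer: the relation attends over every MNF output."""
    return add(relation, attention(relation, layer_outputs, layer_outputs))


def rank_candidates(query: Vec, candidates: dict[str, Vec]) -> list[str]:
    """Rank candidate target entities by dot-product similarity to the decoded query."""
    return sorted(candidates, key=lambda name: dot(query, candidates[name]), reverse=True)


def random_vec(dim: int) -> Vec:
    return [random.gauss(0.0, 1.0) for _ in range(dim)]


if __name__ == "__main__":
    random.seed(0)
    dim = 8
    # A 2-hop path: source entity -> 1-hop neighbor -> 2-hop neighbor (toy features).
    path = [
        {"struct": random_vec(dim), "image": random_vec(dim), "text": random_vec(dim)}
        for _ in range(3)
    ]
    query = decode(mnf_encode(path), relation=random_vec(dim))
    candidates = {f"entity_{i}": random_vec(dim) for i in range(5)}
    print(rank_candidates(query, candidates)[:3])
```

The sketch keeps every hop's state so that the decoding step can attend over all of them, mirroring the abstract's statement that the decoder integrates the outputs of all MNF modules. A trained model would add learned projections, multiple heads, normalization, and a training objective.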
