Multi-hop neighbor fusion enhanced hierarchical transformer for multi-modal knowledge graph completion
Yunpeng Wang, Bo Ning, Xin Wang, Guanyu Li
World Wide Web, published 2024-07-19
DOI: 10.1007/s11280-024-01289-w
Citations: 0
Abstract
A multi-modal knowledge graph (MKG) is a structured semantic network that represents real-world information by incorporating multiple modalities. Existing research primarily focuses on multi-modal fusion to enhance the representations of entity nodes, and on link prediction to address the incompleteness of MKGs. However, the inherent heterogeneity between the structural and semantic modalities poses challenges for multi-modal fusion, as noise interference can compromise the effectiveness of the fused representation. In this study, we propose a novel hierarchical Transformer architecture, named MNFormer, which captures both structural and semantic information while avoiding heterogeneity issues by fully integrating multi-hop neighbor paths with image-text embeddings. In the encoding stage of MNFormer, we design multiple layers of Multi-hop Neighbor Fusion (MNF) modules that employ attention to merge image and text features. These MNF modules progressively fuse the information of neighboring entities hop by hop along the neighbor paths of the source entity. A Transformer in the decoding stage then integrates the outputs of all MNF modules, and its output is used to match target entities and accomplish MKG completion. Moreover, we develop a semantic direction loss to improve the fitting performance of MNFormer. Experimental results on four datasets demonstrate that MNFormer is highly competitive with state-of-the-art models. Additionally, ablation studies show that MNFormer effectively combines structural and semantic information, with the two sources complementing each other to improve performance.
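The hop-by-hop fusion described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`attention_fuse`, `multi_hop_fuse`) and the use of single-head scaled dot-product attention over pre-fused image-text vectors are simplifying assumptions made for illustration only; the paper's MNF modules and decoding Transformer are far richer.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_fuse(query, candidates):
    # Scaled dot-product attention: score each candidate vector against the
    # query, then return the attention-weighted sum of the candidates.
    d = len(query)
    weights = softmax([dot(query, c) / math.sqrt(d) for c in candidates])
    return [sum(w * c[i] for w, c in zip(weights, candidates)) for i in range(d)]

def multi_hop_fuse(source, hops):
    # Hop-by-hop fusion along a neighbor path: at each hop, the running state
    # attends over that hop's neighbor features (assumed here to be already
    # merged image+text embeddings). All per-hop states are collected so a
    # downstream decoder could integrate them, mirroring how the decoding
    # Transformer consumes the outputs of every MNF layer.
    state = source
    per_hop_states = []
    for neighbor_feats in hops:
        state = attention_fuse(state, neighbor_feats)
        per_hop_states.append(state)
    return per_hop_states

# Toy usage: a 2-d source entity embedding, two hops of neighbors.
states = multi_hop_fuse(
    source=[1.0, 0.0],
    hops=[[[1.0, 0.0], [0.0, 1.0]],   # hop 1: two neighbors
          [[0.5, 0.5]]],              # hop 2: one neighbor
)
```

With a single neighbor at a hop, the attention weights collapse to 1 and the state simply adopts that neighbor's vector; with several neighbors, the state becomes a similarity-weighted mixture, which is the intuition behind suppressing noisy neighbors during fusion.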