Beyond expression: Comprehensive visualization of knowledge triplet facts

Information Processing & Management. IF 8.1; Zone 1 (Management); JCR Q1 (Computer Science, Information Systems). Pub Date: 2025-05-01. Epub Date: 2025-01-18. DOI: 10.1016/j.ipm.2025.104062

Wei Liu, Yixue He, Chao Wang, Shaorong Xie, Weimin Li

Information Processing & Management, Volume 62, Issue 3, Article 104062. Journal Article. https://www.sciencedirect.com/science/article/pii/S0306457325000044

Citations: 0

Abstract

Multi-modal Knowledge Graphs (KGs) enhance traditional KGs by incorporating multi-modal data to bridge the information gap in natural language processing (NLP) tasks. One direct way to incorporate multi-modal data is to associate a structured KG with corresponding images, thereby visualizing entities and triplet facts. However, existing visualization methods often exclude triplet facts that contain abstract entities and non-visual relations, leaving those facts without associated images. This exclusion compromises the completeness and utility of multi-modal KGs. In this paper, we aim to construct a comprehensive multi-modal KG that includes abstract entities and non-visual relations, ensuring that every triplet fact is visualized. To this end, we propose an integrated image Retrieval-Generation-Editing (RGE) method that visualizes each triplet fact completely and accurately. First, we correct the triplet facts by combining a Large Language Model (LLM) with a retrieved knowledge database about triplet facts. Next, by providing appropriate contextual examples to the LLM, we generate visual elements of relations, enriching the semantics of the triplet facts. We then use image retrieval to obtain images that reflect the semantics of each triplet fact. For triplet facts whose images cannot be retrieved directly, we use image generation and editing to create and modify images that express their semantics. With the RGE method, we construct a multi-modal KG named DB15kFact, which includes 86,722 triplet facts, 274 relations, 12,842 entities, and 387,096 images. DB15kFact represents a fourfold increase in the number of relations compared to the previous multi-modal KG, ImgFact. In experiments, both automatic and manual evaluations confirm the quality of DB15kFact. The results demonstrate that DB15kFact significantly enhances model performance in link prediction and relation classification. Notably, in link prediction, the model optimized with DB15kFact achieves a 7.12% improvement in the H@10 metric compared to existing solutions.
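
For readers unfamiliar with the H@10 (Hits@10) metric cited in the abstract, the following is a minimal sketch of how it is commonly computed in link prediction: the fraction of test queries for which the correct entity is ranked among the top 10 candidates. The function name `hits_at_k` and the example ranks are illustrative assumptions, not taken from the paper, and the abstract does not specify the exact evaluation protocol (for example, filtered versus raw ranking).

```python
from typing import Sequence


def hits_at_k(ranks: Sequence[int], k: int = 10) -> float:
    """Return the fraction of queries whose correct entity is ranked in the top k.

    `ranks` holds the 1-based rank of the correct entity for each test query.
    """
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks of the correct tail entity for five test triplets.
example_ranks = [1, 3, 12, 7, 250]
print(hits_at_k(example_ranks, k=10))  # 3 of the 5 ranks are <= 10
```

A higher value means the model places the correct entity near the top of its candidate list more often; H@1 and H@3 follow from the same function with a different `k`.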

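Source journal

Information Processing & Management (Engineering & Technology; Computer Science: Information Systems)

CiteScore: 17.00
Self-citation rate: 11.60%
Articles published: 276
Review time: 39 days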

Journal description: Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing. We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.