SimE4KG: Distributed and Explainable Multi-Modal Semantic Similarity Estimation for Knowledge Graphs

International Journal of Semantic Computing (IF 0.6, Q4, Computer Science, Artificial Intelligence) | Pub Date: 2023-04-18 | DOI: 10.1142/s1793351x23600012

Carsten Felix Draschner, Hajira Jabeen, Jens Lehmann


Citations: 0

Abstract

In recent years, exciting sources of data have been modeled as knowledge graphs (KGs). This modeling represents both structural relationships and entity-specific multi-modal data. In data analytics pipelines and machine learning (ML), semantic similarity estimation plays a significant role: assigning similarity values to entity pairs is needed in recommendation systems, clustering, classification, entity matching/disambiguation and many other tasks. Efficient and scalable frameworks are needed to handle the quadratic complexity of all-pair semantic similarity on Big Data KGs. Moreover, heterogeneous KGs demand multi-modal semantic similarity estimation to cover their varied contents, such as categorical relations between classes or attribute literals like strings, timestamps or numeric data. In this paper, we propose the SimE4KG framework as a resource providing generic open-source modules that perform semantic similarity estimation in multi-modal KGs. To justify the computational costs of similarity estimation, SimE4KG generates reproducible, reusable and explainable results. The pipeline output is a native semantic RDF KG that includes the experiment results, the hyper-parameter setup and an explanation of the results, such as the most influential features. For fast and scalable in-memory execution, we implemented the distributed approach using Apache Spark. The entire development of this framework is integrated into the holistic distributed Semantic ANalytics StAck (SANSA).
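
To make the ideas in the abstract concrete, the following is a minimal single-machine sketch of all-pair, multi-modal similarity with a simple per-feature explanation. It is not the SimE4KG or SANSA API: the function names, the toy entities, the choice of Jaccard similarity for categorical features, the range-normalized distance for numeric features, and the weighted average are all assumptions made for illustration. The actual framework runs distributed on Apache Spark and emits its results as an RDF KG.

```python
from itertools import combinations

# Hypothetical toy entities: categorical features are sets, numeric ones are numbers.
entities = {
    "e1": {"genre": {"a", "b"}, "year": 2000},
    "e2": {"genre": {"b", "c"}, "year": 2010},
    "e3": {"genre": {"a", "b"}, "year": 2000},
}
weights = {"genre": 1.0, "year": 1.0}   # assumed feature weights (hyper-parameters)
ranges = {"year": 10}                   # assumed value range used to normalize numeric features


def jaccard(a, b):
    """Set overlap for categorical features; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def numeric_sim(x, y, value_range):
    """1 minus the range-normalized absolute difference."""
    return 1.0 - abs(x - y) / value_range if value_range else 1.0


def pair_similarity(e1, e2, weights, ranges):
    """Return (overall similarity, weighted per-feature contributions)."""
    contributions = {}
    for feature, w in weights.items():
        v1, v2 = e1[feature], e2[feature]
        if isinstance(v1, (set, frozenset)):
            s = jaccard(v1, v2)
        else:
            s = numeric_sim(v1, v2, ranges[feature])
        contributions[feature] = w * s
    total = sum(contributions.values()) / sum(weights.values())
    return total, contributions


def all_pairs(entities, weights, ranges):
    """Score every unordered pair: n * (n - 1) / 2 comparisons, hence quadratic cost."""
    return {
        (a, b): pair_similarity(entities[a], entities[b], weights, ranges)
        for a, b in combinations(sorted(entities), 2)
    }


results = all_pairs(entities, weights, ranges)
for pair, (score, contributions) in results.items():
    # The feature with the largest contribution serves as a crude explanation.
    top_feature = max(contributions, key=contributions.get)
    print(pair, round(score, 3), "most influential:", top_feature)
```

The number of pairs grows as n(n-1)/2, which is why a distributed engine becomes necessary on large KGs. Keeping the per-feature contributions next to each score is one simple way to report which features were most influential for a given pair.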




Source journal

International Journal of Semantic Computing (Computer Science, Artificial Intelligence)

CiteScore

1.70

Self-citation rate

12.50%

Articles published