SimE4KG: Distributed and Explainable Multi-Modal Semantic Similarity Estimation for Knowledge Graphs

International Journal of Semantic Computing | IF 0.3 | Q4, Computer Science, Artificial Intelligence | Pub Date: 2023-04-18 | DOI: 10.1142/s1793351x23600012
Carsten Felix Draschner, Hajira Jabeen, Jens Lehmann

Abstract

In recent years, exciting sources of data have been modeled as knowledge graphs (KGs). This modeling captures both the structural relationships between entities and entity-specific multi-modal data. Semantic similarity estimation plays a significant role in many data analytics and machine learning (ML) pipelines: assigning similarity values to entity pairs is needed in recommendation systems, clustering, classification, entity matching/disambiguation and many other tasks. Efficient and scalable frameworks are needed to handle the quadratic complexity of all-pair semantic similarity on Big Data KGs. Moreover, heterogeneous KGs demand multi-modal semantic similarity estimation to cover their varied content, such as categorical relations between classes and attribute literals like strings, timestamps or numeric data. In this paper, we propose the SimE4KG framework as a resource providing generic open-source modules that perform semantic similarity estimation in multi-modal KGs. To justify the computational cost of similarity estimation, SimE4KG generates reproducible, reusable and explainable results. The pipeline output is a native semantic RDF KG that includes the experiment results, the hyper-parameter setup and explanations of the results, such as the most influential features. For fast, scalable in-memory execution, we implemented the approach in a distributed manner using Apache Spark. The entire development of this framework is integrated into the holistic distributed Semantic ANalytics StAck (SANSA).
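
To make the idea of multi-modal, explainable all-pair similarity concrete, here is a minimal sketch in plain Python. It is not the SimE4KG or SANSA API and does not use Apache Spark. The toy entities, feature names, weights and the normalisation span are all hypothetical, and the per-modality measures (Jaccard for categorical sets and string tokens, a range-normalised difference for numbers) are assumptions chosen for illustration.

```python
from itertools import combinations

# Hypothetical toy entities: one categorical, one numeric and one string feature each.
ENTITIES = {
    "m1": {"genres": {"drama", "crime"}, "year": 1994, "title": "pulp fiction"},
    "m2": {"genres": {"crime", "thriller"}, "year": 1995, "title": "heat"},
    "m3": {"genres": {"comedy"}, "year": 2010, "title": "pulp comedy"},
}

# Hypothetical per-feature weights; these play the role of hyper-parameters.
WEIGHTS = {"genres": 0.5, "year": 0.3, "title": 0.2}
YEAR_RANGE = 20.0  # assumed normalisation span for the numeric feature


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def numeric_similarity(x, y, value_range):
    """Map an absolute difference to [0, 1], clipping at the assumed range."""
    return max(0.0, 1.0 - abs(x - y) / value_range)


def token_similarity(s, t):
    """Jaccard similarity over whitespace-separated tokens of two strings."""
    return jaccard(set(s.split()), set(t.split()))


def feature_similarities(a, b):
    """Compute one similarity value per modality."""
    return {
        "genres": jaccard(a["genres"], b["genres"]),
        "year": numeric_similarity(a["year"], b["year"], YEAR_RANGE),
        "title": token_similarity(a["title"], b["title"]),
    }


def similarity(a, b):
    """Return the weighted score and the feature that contributed most to it."""
    per_feature = feature_similarities(a, b)
    contributions = {f: WEIGHTS[f] * s for f, s in per_feature.items()}
    score = sum(contributions.values())
    most_influential = max(contributions, key=contributions.get)
    return score, most_influential


def all_pairs(entities):
    """Score every unordered pair; n entities yield n * (n - 1) / 2 pairs."""
    return {
        (u, v): similarity(entities[u], entities[v])
        for u, v in combinations(sorted(entities), 2)
    }


if __name__ == "__main__":
    for pair, (score, feature) in all_pairs(ENTITIES).items():
        print(pair, round(score, 3), "most influential:", feature)
```

The number of pairs grows quadratically with the number of entities, which is why the abstract calls for a distributed implementation. Returning the most influential feature alongside each score is one simple way to attach an explanation to a result.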
Source journal
International Journal of Semantic Computing (Computer Science, Artificial Intelligence)
CiteScore: 1.70
Self-citation rate: 12.50%
Articles published: 39