smart-KG：基于分区的关联数据片段，用于查询知识图谱

IF 3 3区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Semantic Web Pub Date : 2024-03-20 DOI:10.3233/sw-243571

Amr Azzam, A. Polleres, Javier D. Fernández, Maribel Acosta

{"title":"smart-KG：基于分区的关联数据片段，用于查询知识图谱","authors":"Amr Azzam, A. Polleres, Javier D. Fernández, Maribel Acosta","doi":"10.3233/sw-243571","DOIUrl":null,"url":null,"abstract":"RDF and SPARQL provide a uniform way to publish and query billions of triples in open knowledge graphs (KGs) on the Web. Yet, provisioning of a fast, reliable, and responsive live querying solution for open KGs is still hardly possible through SPARQL endpoints alone: while such endpoints provide a remarkable performance for single queries, they typically can not cope with highly concurrent query workloads by multiple clients. To mitigate this, the Linked Data Fragments (LDF) framework sparked the design of different alternative low-cost interfaces such as Triple Pattern Fragments (TPF), that partially offload the query processing workload to the client side. On the downside, such interfaces still come with the expense of unnecessarily high network load due to the necessary transfer of intermediate results to the client, leading to query performance degradation compared with endpoints. To address this problem, in the present work, we investigate alternative interfaces, refining and extending the original TPF idea, which also aims at reducing server-resource consumption, by shipping query-relevant partitions of KGs from the server to the client. To this end, first, we align formal definitions and notations of the original LDF framework to uniformly present existing LDF implements and such “partition-based” LDF approaches. These novel LDF interfaces retrieve, instead of the exact triples matching a particular query pattern, a subset of pre-materialized, compressed, partitions of the original graph, containing all answers to a query pattern, to be further evaluated on the client side. As a concrete representative of partition-based LDF, we present smart-KG+, extending and refining our prior work (In WWW ’20: The Web Conference 2020 (2020) 984–994 ACM / IW3C2) in several respects. Our proposed approach is a step forward towards a better-balanced share of the query processing load between clients and servers by shipping graph partitions driven by the structure of RDF graphs to group entities described with the same sets of properties and classes, resulting in significant data transfer reduction. Our experiments demonstrate that the smart-KG+ significantly outperforms existing Web SPARQL interfaces on both pre-existing benchmarks for highly concurrent query execution as well as an accustomed query workload inspired by query logs of existing SPARQL endpoints.","PeriodicalId":48694,"journal":{"name":"Semantic Web","volume":null,"pages":null},"PeriodicalIF":3.0000,"publicationDate":"2024-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"smart-KG: Partition-Based Linked Data Fragments for querying knowledge graphs\",\"authors\":\"Amr Azzam, A. Polleres, Javier D. Fernández, Maribel Acosta\",\"doi\":\"10.3233/sw-243571\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"RDF and SPARQL provide a uniform way to publish and query billions of triples in open knowledge graphs (KGs) on the Web. Yet, provisioning of a fast, reliable, and responsive live querying solution for open KGs is still hardly possible through SPARQL endpoints alone: while such endpoints provide a remarkable performance for single queries, they typically can not cope with highly concurrent query workloads by multiple clients. To mitigate this, the Linked Data Fragments (LDF) framework sparked the design of different alternative low-cost interfaces such as Triple Pattern Fragments (TPF), that partially offload the query processing workload to the client side. On the downside, such interfaces still come with the expense of unnecessarily high network load due to the necessary transfer of intermediate results to the client, leading to query performance degradation compared with endpoints. To address this problem, in the present work, we investigate alternative interfaces, refining and extending the original TPF idea, which also aims at reducing server-resource consumption, by shipping query-relevant partitions of KGs from the server to the client. To this end, first, we align formal definitions and notations of the original LDF framework to uniformly present existing LDF implements and such “partition-based” LDF approaches. These novel LDF interfaces retrieve, instead of the exact triples matching a particular query pattern, a subset of pre-materialized, compressed, partitions of the original graph, containing all answers to a query pattern, to be further evaluated on the client side. As a concrete representative of partition-based LDF, we present smart-KG+, extending and refining our prior work (In WWW ’20: The Web Conference 2020 (2020) 984–994 ACM / IW3C2) in several respects. Our proposed approach is a step forward towards a better-balanced share of the query processing load between clients and servers by shipping graph partitions driven by the structure of RDF graphs to group entities described with the same sets of properties and classes, resulting in significant data transfer reduction. Our experiments demonstrate that the smart-KG+ significantly outperforms existing Web SPARQL interfaces on both pre-existing benchmarks for highly concurrent query execution as well as an accustomed query workload inspired by query logs of existing SPARQL endpoints.\",\"PeriodicalId\":48694,\"journal\":{\"name\":\"Semantic Web\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":3.0000,\"publicationDate\":\"2024-03-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Semantic Web\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.3233/sw-243571\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Semantic Web","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.3233/sw-243571","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 1

摘要

RDF 和 SPARQL 为在网络上发布和查询开放知识图谱（KG）中的数十亿个三元组提供了统一的方法。然而，仅仅通过SPARQL端点还很难为开放知识图谱提供快速、可靠、响应迅速的实时查询解决方案：虽然这些端点为单次查询提供了出色的性能，但它们通常无法应对多个客户端高度并发的查询工作量。为了缓解这一问题，关联数据片段（LDF）框架激发了人们设计不同的低成本替代接口，比如三重模式片段（TPF），它们可以将查询处理的工作量部分卸载到客户端。但缺点是，由于需要将中间结果传输到客户端，这些接口仍会带来不必要的高网络负载，导致查询性能比端点低。为了解决这个问题，在本研究中，我们研究了替代接口，完善并扩展了最初的 TPF 概念，其目的也是通过将与查询相关的 KG 分区从服务器传送到客户端来减少服务器资源消耗。为此，我们首先调整了原始 LDF 框架的正式定义和符号，以统一呈现现有的 LDF 实现和这种 "基于分区 "的 LDF 方法。这些新颖的 LDF 接口检索的不是与特定查询模式匹配的精确三元组，而是原始图的预实体化压缩分区子集，其中包含查询模式的所有答案，以便在客户端进行进一步评估。作为基于分区的 LDF 的具体代表，我们提出了 smart-KG+，在多个方面扩展并完善了我们之前的工作（In WWW '20: The Web Conference 2020 (2020) 984-994 ACM / IW3C2）。我们提出的方法通过利用 RDF 图结构驱动的图分区，将具有相同属性和类集的实体进行分组，从而显著减少了数据传输量，在更好地平衡客户端和服务器之间的查询处理负载分担方面向前迈进了一步。我们的实验证明，无论是在高度并发查询执行的现有基准上，还是在受现有 SPARQL 端点查询日志启发的习惯查询工作量上，smart-KG+ 都明显优于现有的网络 SPARQL 接口。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

smart-KG: Partition-Based Linked Data Fragments for querying knowledge graphs

RDF and SPARQL provide a uniform way to publish and query billions of triples in open knowledge graphs (KGs) on the Web. Yet, provisioning of a fast, reliable, and responsive live querying solution for open KGs is still hardly possible through SPARQL endpoints alone: while such endpoints provide a remarkable performance for single queries, they typically can not cope with highly concurrent query workloads by multiple clients. To mitigate this, the Linked Data Fragments (LDF) framework sparked the design of different alternative low-cost interfaces such as Triple Pattern Fragments (TPF), that partially offload the query processing workload to the client side. On the downside, such interfaces still come with the expense of unnecessarily high network load due to the necessary transfer of intermediate results to the client, leading to query performance degradation compared with endpoints. To address this problem, in the present work, we investigate alternative interfaces, refining and extending the original TPF idea, which also aims at reducing server-resource consumption, by shipping query-relevant partitions of KGs from the server to the client. To this end, first, we align formal definitions and notations of the original LDF framework to uniformly present existing LDF implements and such “partition-based” LDF approaches. These novel LDF interfaces retrieve, instead of the exact triples matching a particular query pattern, a subset of pre-materialized, compressed, partitions of the original graph, containing all answers to a query pattern, to be further evaluated on the client side. As a concrete representative of partition-based LDF, we present smart-KG+, extending and refining our prior work (In WWW ’20: The Web Conference 2020 (2020) 984–994 ACM / IW3C2) in several respects. Our proposed approach is a step forward towards a better-balanced share of the query processing load between clients and servers by shipping graph partitions driven by the structure of RDF graphs to group entities described with the same sets of properties and classes, resulting in significant data transfer reduction. Our experiments demonstrate that the smart-KG+ significantly outperforms existing Web SPARQL interfaces on both pre-existing benchmarks for highly concurrent query execution as well as an accustomed query workload inspired by query logs of existing SPARQL endpoints.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Semantic Web COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCEC-COMPUTER SCIENCE, INFORMATION SYSTEMS

CiteScore

8.30

自引率

6.70%

发文量

期刊介绍： The journal Semantic Web – Interoperability, Usability, Applicability brings together researchers from various fields which share the vision and need for more effective and meaningful ways to share information across agents and services on the future internet and elsewhere. As such, Semantic Web technologies shall support the seamless integration of data, on-the-fly composition and interoperation of Web services, as well as more intuitive search engines. The semantics – or meaning – of information, however, cannot be defined without a context, which makes personalization, trust, and provenance core topics for Semantic Web research. New retrieval paradigms, user interfaces, and visualization techniques have to unleash the power of the Semantic Web and at the same time hide its complexity from the user. Based on this vision, the journal welcomes contributions ranging from theoretical and foundational research over methods and tools to descriptions of concrete ontologies and applications in all areas. We especially welcome papers which add a social, spatial, and temporal dimension to Semantic Web research, as well as application-oriented papers making use of formal semantics.