Knowledge Graph Construction with a Façade: A Unified Method to Access Heterogeneous Data Sources on the Web

IF 4.1 · Zone 3 (Computer Science) · JCR Q2 (Computer Science, Information Systems) · ACM Transactions on Internet Technology · Pub Date: 2023-02-23 · DOI: https://dl.acm.org/doi/10.1145/3555312

Luigi Asprino, Enrico Daga, Aldo Gangemi, Paul Mulholland



Abstract

Data integration is the dominant use case for RDF Knowledge Graphs. However, Web resources come in formats with weak semantics (for example, CSV and JSON), or formats specific to a given application (for example, BibTeX, HTML, and Markdown). To solve this problem, Knowledge Graph Construction (KGC) is gaining momentum due to its focus on supporting users in transforming data into RDF. However, using existing KGC frameworks results in complex data processing pipelines that mix structural and semantic mappings, and whose development and maintenance constitute a significant bottleneck for KG engineers. Such frameworks force users to rely on different tools, sometimes based on heterogeneous languages, for inspecting sources, designing mappings, and generating triples, thus making the process unnecessarily complicated. We argue that it is possible and desirable to equip KG engineers with the ability to interact with Web data formats by relying on their expertise in RDF and the well-established SPARQL query language [2].

In this article, we study a unified method for data access to heterogeneous data sources with Facade-X, a meta-model implemented in a new data integration system called SPARQL Anything. We demonstrate that our approach is theoretically sound, since it allows a single meta-model, based on RDF, to represent data from (a) any file format expressible in BNF syntax, as well as (b) any relational database. We compare our method to state-of-the-art approaches in terms of usability (cognitive complexity of the mappings) and general performance. Finally, we discuss the benefits and challenges of this novel approach by engaging with the reference user community.
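To make the idea of a single RDF-based meta-model more concrete, the following is a minimal sketch, not the authors' implementation, of how a JSON document could be flattened into container-and-slot triples in the spirit of Facade-X. It assumes that object keys become properties in a data namespace, that list items use RDF container membership properties (`rdf:_1`, `rdf:_2`, ...), and that the outermost container is typed as a root. The namespace IRIs, the blank-node labels, and the function name `to_facade_triples` are assumptions made for illustration only; keys are not IRI-escaped here.

```python
import json
from itertools import count

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FX = "http://sparql.xyz/facade-x/ns/"      # assumed Facade-X vocabulary namespace
XYZ = "http://sparql.xyz/facade-x/data/"   # assumed namespace for data keys


def to_facade_triples(data):
    """Flatten parsed JSON into (subject, predicate, object) tuples.

    Illustrative only: every dict or list becomes a container (a blank
    node), and each key or list position becomes a slot on that container.
    """
    triples = []
    ids = count(1)

    def new_container():
        return f"_:c{next(ids)}"

    def fill(node, value):
        if isinstance(value, dict):
            # Object keys become properties in the data namespace.
            slots = [(XYZ + key, v) for key, v in value.items()]
        else:
            # List positions become rdf:_1, rdf:_2, ...
            slots = [(f"{RDF}_{i}", v) for i, v in enumerate(value, start=1)]
        for predicate, v in slots:
            if isinstance(v, (dict, list)):
                child = new_container()
                triples.append((node, predicate, child))
                fill(child, v)
            else:
                triples.append((node, predicate, v))

    root = new_container()
    triples.append((root, RDF + "type", FX + "root"))
    if isinstance(data, (dict, list)):
        fill(root, data)
    else:
        # A bare scalar document is stored as the single slot of the root.
        triples.append((root, RDF + "_1", data))
    return triples


doc = json.loads('{"title": "KGC", "authors": ["Asprino", "Daga"]}')
triples = to_facade_triples(doc)
for triple in triples:
    print(triple)
```

Because the output is ordinary RDF-shaped data, the structural step (format to containers and slots) is kept apart from any semantic mapping, which under the approach described above would be expressed as SPARQL queries over these triples.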




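Source journal

ACM Transactions on Internet Technology (Engineering & Technology, Computer Science: Software Engineering)

CiteScore: 10.30

Self-citation rate: 1.90%

Articles published: 137

Review time: >12 weeks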

Journal description: ACM Transactions on Internet Technology (TOIT) brings together many computing disciplines, including computer software engineering, computer programming languages, middleware, database management, security, knowledge discovery and data mining, networking and distributed systems, communications, and performance and scalability. TOIT will cover the results and roles of the individual disciplines and the relationships among them.