Editorial: Special Issue on Data Linking

Journal of Web Semantics. Pub Date: 2013-01-01. DOI: 10.2139/ssrn.3199075. IF 3.1; Zone 3 (Computer Science); JCR Q3 (Computer Science, Artificial Intelligence)

A. Ferrara, A. Nikolov, F. Scharffe


Citations: 0

Abstract

In this special issue of the Journal of Web Semantics, we present two papers that both address one of the most important problems in web data management: data interlinking. The field has attracted significant interest in recent years, as the evolution of web technologies has enabled the emergence of a web of data. The exponentially increasing number of data sources published as linked data, or embedded in web pages through dedicated schemas, requires techniques that can efficiently identify common entities appearing across these sources. In recent years many systems have been developed, using a wide range of techniques that take into account various information about the data sets involved in order to find the most accurate links between them. Vocabularies, existing links, data ranges, ontology alignments, and user input are combined for the best results. The most efficient systems are semi-automated: they require the user to supply a linkage specification that indicates what to link with what, and thus guides the tool in the process. For web-scale data interlinking, however, the amount of user input needed for a link specification is still too high, so the most recent research focuses on minimizing that input. The two papers in this special issue present research results in this direction, each following its own path toward a similar goal. In the first paper, "Active Learning of Expressive Linkage Rules using Genetic Programming", the authors of the interlinking tool Silk present a technique to automate the construction of linkage specifications through active learning and genetic algorithms. The resulting system only requires the user to validate a few links until an acceptable specification is reached. In the second paper, "An Automatic Key Discovery Approach for Data Linking", Fatiha SAIS, Nathalie Pernelle, and Danai Symeonidou propose a technique to automate the selection of the predicates to be compared during the interlinking process. The method discovers sets of properties that uniquely identify the data resources in a given data set, similar to the notion of keys in relational databases. Both articles went through a very rigorous selection process and were both improved after their first submission. It was an editorial choice to retain only articles meeting a very high standard, which resulted in only two articles being published. We believe this will ensure a stronger field of research. Enjoy reading!
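To make the notion of a key concrete, here is a minimal sketch of what "a set of properties that uniquely identifies resources" means. It is a naive brute-force check, not the algorithm of the second paper, and it assumes single-valued properties and a tiny in-memory data set. The function name `discover_keys`, the `max_size` parameter, and the toy resources are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def discover_keys(resources, max_size=2):
    """Return minimal property sets whose value combinations are unique.

    `resources` maps a resource id to a dict of property -> value.
    Illustrative brute force only; assumes single-valued properties.
    """
    properties = sorted({p for desc in resources.values() for p in desc})
    keys = []
    for size in range(1, max_size + 1):
        for combo in combinations(properties, size):
            # A superset of an already found key is not minimal: skip it.
            if any(set(k) <= set(combo) for k in keys):
                continue
            seen = set()
            is_key = True
            for desc in resources.values():
                # In this sketch a missing property disqualifies the candidate.
                if any(p not in desc for p in combo):
                    is_key = False
                    break
                values = tuple(desc[p] for p in combo)
                if values in seen:
                    # Two resources share the same values: not a key.
                    is_key = False
                    break
                seen.add(values)
            if is_key:
                keys.append(combo)
    return keys


# Hypothetical toy data set with three resources.
toy = {
    "r1": {"name": "Alice", "city": "Paris", "isbn": "111"},
    "r2": {"name": "Alice", "city": "Lyon", "isbn": "222"},
    "r3": {"name": "Bob", "city": "Paris", "isbn": "333"},
}
KEYS = discover_keys(toy)
print(KEYS)
```

On this toy data, `isbn` alone is a key, and so is the pair (`city`, `name`), while neither `city` nor `name` is a key on its own. Properties found this way are natural candidates for comparison when linking two data sets, which is the selection step the second paper aims to automate.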


Source journal

Journal of Web Semantics (Engineering and Technology: Computer Science, Artificial Intelligence)

CiteScore: 6.20

Self-citation rate: 12.00%

Review time: 14.6 weeks

About the journal: The Journal of Web Semantics is an interdisciplinary journal based on research and applications from the various subject areas that contribute to the development of a knowledge-intensive and intelligent service Web. These areas include knowledge technologies, ontology, agents, databases, and the semantic grid; disciplines such as information retrieval, language technology, human-computer interaction, and knowledge discovery are of major relevance as well. All aspects of Semantic Web development are covered. The publication of large-scale experiments and their analysis is also encouraged, to clearly illustrate scenarios and methods that introduce semantics into existing Web interfaces, contents, and services. The journal emphasizes papers that combine theories, methods, and experiments from different subject areas in order to deliver innovative semantic methods and applications.