LINDA: distributed web-of-data-scale entity matching

Proceedings of the 21st ACM International Conference on Information and Knowledge Management
Pub Date: 2012-10-29
DOI: 10.1145/2396761.2398582

Christoph Böhm, Gerard de Melo, Felix Naumann, G. Weikum

Citations: 94

Abstract

Linked Data has emerged as a powerful way of interconnecting structured data on the Web. However, the cross-linkage between Linked Data sources is not as extensive as one would hope for. In this paper, we formalize the task of automatically creating "sameAs" links across data sources in a globally consistent manner. Our algorithm, presented in a multi-core as well as a distributed version, achieves this link generation by accounting for joint evidence of a match. Experiments confirm that our system scales beyond 100 million entities and delivers highly accurate results despite the vast heterogeneity and daunting scale.
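To illustrate one aspect of what global consistency can mean for "sameAs" links, the following minimal Python sketch groups entities into the equivalence classes implied by a set of pairwise links, relying only on the fact that sameAs is symmetric and transitive. It is not the LINDA algorithm, which the abstract does not spell out; the function name `same_as_clusters` and the `srcA:`/`srcB:`/`srcC:` identifiers are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def same_as_clusters(links):
    """Group entity identifiers into the equivalence classes implied by
    pairwise sameAs links (illustrative union-find, not the paper's method)."""
    parent = {}

    def find(x):
        # Register unseen identifiers as their own representative.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in links:
        union(a, b)

    clusters = defaultdict(set)
    for node in list(parent):
        clusters[find(node)].add(node)
    # Sort for a deterministic result.
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical identifiers from three made-up sources.
links = [
    ("srcA:berlin", "srcB:berlin"),
    ("srcB:berlin", "srcC:berlin"),
    ("srcA:paris", "srcB:paris"),
]
clusters = same_as_clusters(links)
```

Here `srcA:berlin` and `srcC:berlin` end up in the same cluster even though no direct link between them was given, which is the kind of implied consequence a globally consistent link set has to account for.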


