Profiling linked open data with ProLOD

2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010). Pub Date: 2010-03-01. DOI: 10.1109/ICDEW.2010.5452762

Christoph Böhm, Felix Naumann, Ziawasch Abedjan, D. Fenz, Toni Grütze, Daniel Hefenbrock, M. Pohl, David Sonnabend

Citations: 68

Abstract

Linked open data (LOD), as provided by a quickly growing number of sources, constitutes a wealth of easily accessible information. However, this data is not easy to understand. It is usually provided as a set of RDF triples, often in the form of enormous files covering many domains. What is more, the data usually has a loose structure when it is derived from end-user generated sources, such as Wikipedia. Finally, the quality of the actual data is also a concern, because it may be incomplete, poorly formatted, or inconsistent. Traditional data profiling methods do not suffice to understand and profile such linked open data. With ProLOD, we propose a suite of methods ranging from the domain level (clustering, labeling), via the schema level (matching, disambiguation), to the data level (data type detection, pattern detection, value distribution). Packaged into an interactive, web-based tool, they allow iterative exploration and discovery of new LOD sources. Thus, users can quickly gauge the relevance of a source for the problem at hand (e.g., some integration task), and then focus on and explore the relevant subset.
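
The abstract names the data-level methods (data type detection, pattern detection, value distribution) but does not say how ProLOD implements them. The following Python sketch only illustrates what such per-predicate profiling could look like on a handful of triples. It is not the authors' method. The function names, the coarse type categories, the character-class pattern notation, and the `ex:` sample data are all assumptions made for this example.

```python
import re
from collections import Counter, defaultdict


def detect_type(value: str) -> str:
    """Guess a coarse data type for a literal (assumed categories, not ProLOD's)."""
    if re.fullmatch(r"[+-]?\d+", value):
        return "integer"
    if re.fullmatch(r"[+-]?\d+\.\d+", value):
        return "decimal"
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return "date"
    return "string"


def to_pattern(value: str) -> str:
    """Abstract a literal into a pattern: digit -> 9, uppercase -> A, lowercase -> a."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isupper():
            out.append("A")
        elif ch.islower():
            out.append("a")
        else:
            out.append(ch)  # keep punctuation and whitespace as-is
    return "".join(out)


def profile(triples):
    """Collect type, pattern, and value counts for each predicate."""
    stats = defaultdict(
        lambda: {"types": Counter(), "patterns": Counter(), "values": Counter()}
    )
    for _subject, predicate, obj in triples:
        entry = stats[predicate]
        entry["types"][detect_type(obj)] += 1
        entry["patterns"][to_pattern(obj)] += 1
        entry["values"][obj] += 1
    return dict(stats)


# Hypothetical sample data: (subject, predicate, object) with literal objects.
triples = [
    ("ex:Berlin", "ex:population", "3644826"),
    ("ex:Hamburg", "ex:population", "1.8 million"),
    ("ex:Berlin", "ex:founded", "1237-01-01"),
    ("ex:Hamburg", "ex:founded", "unknown"),
]

result = profile(triples)
for predicate, entry in result.items():
    print(predicate, dict(entry["types"]), dict(entry["patterns"]))
```

In this toy input, each predicate mixes two detected types, which is the kind of inconsistency the abstract attributes to end-user generated sources. A predicate whose values mostly share one type or pattern, with a few outliers, would be a candidate for closer inspection.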
