Profiling linked open data with ProLOD

Christoph Böhm, Felix Naumann, Ziawasch Abedjan, D. Fenz, Toni Grütze, Daniel Hefenbrock, M. Pohl, David Sonnabend
{"title":"Profiling linked open data with ProLOD","authors":"Christoph Böhm, Felix Naumann, Ziawasch Abedjan, D. Fenz, Toni Grütze, Daniel Hefenbrock, M. Pohl, David Sonnabend","doi":"10.1109/ICDEW.2010.5452762","DOIUrl":null,"url":null,"abstract":"Linked open data (LOD), as provided by a quickly growing number of sources constitutes a wealth of easily accessible information. However, this data is not easy to understand. It is usually provided as a set of (RDF) triples, often enough in the form of enormous files covering many domains. What is more, the data usually has a loose structure when it is derived from end-user generated sources, such as Wikipedia. Finally, the quality of the actual data is also worrisome, because it may be incomplete, poorly formatted, inconsistent, etc. To understand and profile such linked open data, traditional data profiling methods do not suffice. With ProLOD, we propose a suite of methods ranging from the domain level (clustering, labeling), via the schema level (matching, disambiguation), to the data level (data type detection, pattern detection, value distribution). Packaged into an interactive, web-based tool, they allow iterative exploration and discovery of new LOD sources. Thus, users can quickly gauge the relevance of the source for the problem at hand (e.g., some integration task), focus on and explore the relevant subset.","PeriodicalId":442345,"journal":{"name":"2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"68","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDEW.2010.5452762","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 68

Abstract

Linked open data (LOD), as provided by a quickly growing number of sources constitutes a wealth of easily accessible information. However, this data is not easy to understand. It is usually provided as a set of (RDF) triples, often enough in the form of enormous files covering many domains. What is more, the data usually has a loose structure when it is derived from end-user generated sources, such as Wikipedia. Finally, the quality of the actual data is also worrisome, because it may be incomplete, poorly formatted, inconsistent, etc. To understand and profile such linked open data, traditional data profiling methods do not suffice. With ProLOD, we propose a suite of methods ranging from the domain level (clustering, labeling), via the schema level (matching, disambiguation), to the data level (data type detection, pattern detection, value distribution). Packaged into an interactive, web-based tool, they allow iterative exploration and discovery of new LOD sources. Thus, users can quickly gauge the relevance of the source for the problem at hand (e.g., some integration task), focus on and explore the relevant subset.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
分析与ProLOD链接的开放数据
链接开放数据(LOD)由数量迅速增长的来源提供,构成了大量易于获取的信息。然而,这些数据并不容易理解。它通常以一组(RDF)三元组的形式提供,通常以覆盖许多领域的巨大文件的形式提供。此外,当数据来自最终用户生成的源(如Wikipedia)时,数据通常具有松散的结构。最后,实际数据的质量也令人担忧,因为它可能不完整、格式不佳、不一致等。要理解和分析这种链接的开放数据,传统的数据分析方法是不够的。在ProLOD中,我们提出了一套方法,从领域级别(聚类、标记),通过模式级别(匹配、消歧义),到数据级别(数据类型检测、模式检测、值分布)。它们被打包成一个交互式的、基于web的工具,允许迭代地探索和发现新的LOD源。因此,用户可以快速测量手头问题(例如,某些集成任务)的源的相关性,关注并探索相关子集。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Fast algorithms for time series mining Ontology alignment argumentation with mutual dependency between arguments and mappings A first step towards integration independence Towards enterprise software as a service in the cloud U-DBSCAN : A density-based clustering algorithm for uncertain objects
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1