On data dependencies in dataspaces

Shaoxu Song, Lei Chen, Philip S. Yu
{"title":"On data dependencies in dataspaces","authors":"Shaoxu Song, Lei Chen, Philip S. Yu","doi":"10.1109/ICDE.2011.5767857","DOIUrl":null,"url":null,"abstract":"To study data dependencies over heterogeneous data in dataspaces, we define a general dependency form, namely comparable dependencies (CDs), which specifies constraints on comparable attributes. It covers the semantics of a broad class of dependencies in databases, including functional dependencies (FDs), metric functional dependencies (MFDs), and matching dependencies (MDs). As we illustrated, comparable dependencies are useful in real practice of dataspaces, e.g., semantic query optimization. Due to the heterogeneous data in dataspaces, the first question, known as the validation problem, is to determine whether a dependency (almost) holds in a data instance. Unfortunately, as we proved, the validation problem with certain error or confidence guarantee is generally hard. In fact, the confidence validation problem is also NP-hard to approximate to within any constant factor. Nevertheless, we develop several approaches for efficient approximation computation, including greedy and randomized approaches with an approximation bound on the maximum number of violations that an object may introduce. Finally, through an extensive experimental evaluation on real data, we verify the superiority of our methods.","PeriodicalId":332374,"journal":{"name":"2011 IEEE 27th International Conference on Data Engineering","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"23","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 IEEE 27th International Conference on Data Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDE.2011.5767857","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 23

Abstract

To study data dependencies over heterogeneous data in dataspaces, we define a general dependency form, namely comparable dependencies (CDs), which specifies constraints on comparable attributes. It covers the semantics of a broad class of dependencies in databases, including functional dependencies (FDs), metric functional dependencies (MFDs), and matching dependencies (MDs). As we illustrated, comparable dependencies are useful in real practice of dataspaces, e.g., semantic query optimization. Due to the heterogeneous data in dataspaces, the first question, known as the validation problem, is to determine whether a dependency (almost) holds in a data instance. Unfortunately, as we proved, the validation problem with certain error or confidence guarantee is generally hard. In fact, the confidence validation problem is also NP-hard to approximate to within any constant factor. Nevertheless, we develop several approaches for efficient approximation computation, including greedy and randomized approaches with an approximation bound on the maximum number of violations that an object may introduce. Finally, through an extensive experimental evaluation on real data, we verify the superiority of our methods.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
关于数据空间中的数据依赖关系
为了研究数据空间中异构数据的数据依赖关系,我们定义了一种通用的依赖关系形式,即可比依赖关系(cd),它指定了对可比属性的约束。它涵盖了数据库中大量依赖项的语义,包括功能依赖项(fd)、度量功能依赖项(mfd)和匹配依赖项(MDs)。正如我们所说明的,可比依赖关系在数据空间的实际实践中是有用的,例如,语义查询优化。由于数据空间中的数据是异构的,因此第一个问题(称为验证问题)是确定数据实例中是否存在依赖项(几乎)。不幸的是,正如我们所证明的,具有一定错误或置信度保证的验证问题通常是困难的。事实上,置信度验证问题也是np困难的,难以在任何常数因子内近似。然而,我们开发了几种有效的近似计算方法,包括贪心和随机方法,其近似界是一个对象可能引入的最大违例数。最后,通过对实际数据进行广泛的实验评估,验证了本文方法的优越性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Advanced search, visualization and tagging of sensor metadata Bidirectional mining of non-redundant recurrent rules from a sequence database Web-scale information extraction with vertex Characteristic sets: Accurate cardinality estimation for RDF queries with multiple joins Dynamic prioritization of database queries
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1