Uniqueness Constraints for Object Stores

IF 1.5 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS ACM Journal of Data and Information Quality Pub Date : 2023-01-19 DOI:10.1145/3581758
Philipp Skavantzos, Uwe Leck, Kaiqi Zhao, S. Link
{"title":"Uniqueness Constraints for Object Stores","authors":"Philipp Skavantzos, Uwe Leck, Kaiqi Zhao, S. Link","doi":"10.1145/3581758","DOIUrl":null,"url":null,"abstract":"Object stores offer an increasingly popular choice for data management and analytics. As with every data model, managing the integrity of objects is fundamental for data quality but also important for the efficiency of update and query operations. In response to shortcomings of unique and existence constraints in object stores, we propose a new principled class of constraints that separates uniqueness from existence dimensions of data quality, and fully supports multiple labels and composite properties. We illustrate benefits of the constraints on real-world examples of property graphs where node integrity is enforced for better update and query performance. The benefits are quantified experimentally in terms of perfectly scaling the access to data through indices that result from the constraints. We establish axiomatic and algorithmic characterizations for the underlying implication problem. In addition, we fully characterize which non-redundant families of constraints attain maximum cardinality for any given finite sets of labels and properties. We exemplify further use cases of the constraints: elicitation of business rules, identification of data quality problems, and design for data quality. Finally, we propose extensions to managing the integrity of objects in object stores such as graph databases.","PeriodicalId":44355,"journal":{"name":"ACM Journal of Data and Information Quality","volume":"15 1","pages":"1 - 29"},"PeriodicalIF":1.5000,"publicationDate":"2023-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Journal of Data and Information Quality","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3581758","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 1

Abstract

Object stores offer an increasingly popular choice for data management and analytics. As with every data model, managing the integrity of objects is fundamental for data quality but also important for the efficiency of update and query operations. In response to shortcomings of unique and existence constraints in object stores, we propose a new principled class of constraints that separates uniqueness from existence dimensions of data quality, and fully supports multiple labels and composite properties. We illustrate benefits of the constraints on real-world examples of property graphs where node integrity is enforced for better update and query performance. The benefits are quantified experimentally in terms of perfectly scaling the access to data through indices that result from the constraints. We establish axiomatic and algorithmic characterizations for the underlying implication problem. In addition, we fully characterize which non-redundant families of constraints attain maximum cardinality for any given finite sets of labels and properties. We exemplify further use cases of the constraints: elicitation of business rules, identification of data quality problems, and design for data quality. Finally, we propose extensions to managing the integrity of objects in object stores such as graph databases.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
对象存储的唯一性约束
对象存储为数据管理和分析提供了一个日益流行的选择。与每个数据模型一样,管理对象的完整性是数据质量的基础,但对于更新和查询操作的效率也很重要。针对对象存储中存在的惟一性约束和存在性约束的不足,我们提出了一种新的原则约束,它将数据质量的惟一性维度与存在性维度分离开来,并完全支持多标签和复合属性。我们将在属性图的实际示例中说明约束的好处,其中强制节点完整性以获得更好的更新和查询性能。这些好处是通过实验量化的,即通过约束产生的索引完美地扩展对数据的访问。我们建立了隐含问题的公理和算法表征。此外,我们充分刻画了对于任何给定的有限标签和属性集,哪些非冗余约束族达到了最大基数。我们举例说明了约束的进一步用例:业务规则的推导、数据质量问题的识别以及数据质量的设计。最后,我们提出了扩展来管理对象存储(如图数据库)中对象的完整性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
ACM Journal of Data and Information Quality
ACM Journal of Data and Information Quality COMPUTER SCIENCE, INFORMATION SYSTEMS-
CiteScore
4.10
自引率
4.80%
发文量
0
期刊最新文献
Text2EL+: Expert Guided Event Log Enrichment using Unstructured Text A Catalog of Consumer IoT Device Characteristics for Data Quality Estimation AI explainibility and acceptance; a case study for underwater mine hunting Data quality assessment through a preference model Editorial: Special Issue on Quality Aspects of Data Preparation
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1