释放可扩展数据交换的功能依赖

A. Bonifati, Ioana Ileana, Michele Linardi
{"title":"释放可扩展数据交换的功能依赖","authors":"A. Bonifati, Ioana Ileana, Michele Linardi","doi":"10.1145/2949689.2949698","DOIUrl":null,"url":null,"abstract":"We address the problem of efficiently evaluating target functional dependencies (fds) in the Data Exchange (DE) process. Target fds naturally occur in many DE scenarios, including the ones in Life Sciences in which multiple source relations need to be structured under a constrained target schema. However, despite their wide use, target fds' evaluation is still a bottleneck in the state-of-the-art DE engines. Systems relying on an all-SQL approach typically do not support target fds unless additional information is provided. Alternatively, DE engines that do include these dependencies typically pay the price of a significant drop in performance and scalability. In this paper, we present a novel chase-based algorithm that can efficiently handle arbitrary fds on the target. Our approach essentially relies on exploiting the interactions between source-to-target (s-t) tuple-generating dependencies (tgds) and target fds. This allows us to tame the size of the intermediate chase results, by playing on a careful ordering of chase steps interleaving fds and (chosen) tgds. As a direct consequence, we importantly diminish the fd application scope, often a central cause of the dramatic overhead induced by target fds. Moreover, reasoning on dependency interaction further leads us to interesting parallelization opportunities, yielding additional scalability gains. We provide a proof-of-concept implementation of our chase-based algorithm and an experimental study aimed at gauging its scalability and efficiency. Finally, we empirically compare with the latest DE engines, and show that our algorithm outperforms them.","PeriodicalId":254803,"journal":{"name":"Proceedings of the 28th International Conference on Scientific and Statistical Database Management","volume":"75 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"20","resultStr":"{\"title\":\"Functional Dependencies Unleashed for Scalable Data Exchange\",\"authors\":\"A. Bonifati, Ioana Ileana, Michele Linardi\",\"doi\":\"10.1145/2949689.2949698\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We address the problem of efficiently evaluating target functional dependencies (fds) in the Data Exchange (DE) process. Target fds naturally occur in many DE scenarios, including the ones in Life Sciences in which multiple source relations need to be structured under a constrained target schema. However, despite their wide use, target fds' evaluation is still a bottleneck in the state-of-the-art DE engines. Systems relying on an all-SQL approach typically do not support target fds unless additional information is provided. Alternatively, DE engines that do include these dependencies typically pay the price of a significant drop in performance and scalability. In this paper, we present a novel chase-based algorithm that can efficiently handle arbitrary fds on the target. Our approach essentially relies on exploiting the interactions between source-to-target (s-t) tuple-generating dependencies (tgds) and target fds. This allows us to tame the size of the intermediate chase results, by playing on a careful ordering of chase steps interleaving fds and (chosen) tgds. As a direct consequence, we importantly diminish the fd application scope, often a central cause of the dramatic overhead induced by target fds. Moreover, reasoning on dependency interaction further leads us to interesting parallelization opportunities, yielding additional scalability gains. We provide a proof-of-concept implementation of our chase-based algorithm and an experimental study aimed at gauging its scalability and efficiency. Finally, we empirically compare with the latest DE engines, and show that our algorithm outperforms them.\",\"PeriodicalId\":254803,\"journal\":{\"name\":\"Proceedings of the 28th International Conference on Scientific and Statistical Database Management\",\"volume\":\"75 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-02-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"20\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 28th International Conference on Scientific and Statistical Database Management\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2949689.2949698\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 28th International Conference on Scientific and Statistical Database Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2949689.2949698","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 20

摘要

我们解决了在数据交换(DE)过程中有效评估目标功能依赖关系(fds)的问题。目标fd自然出现在许多DE场景中,包括生命科学中的场景,其中需要在受限的目标模式下构建多个源关系。然而,尽管目标fd被广泛使用,但其评估仍然是最先进的DE引擎的瓶颈。依赖于全sql方法的系统通常不支持目标fd,除非提供了额外的信息。另外,包含这些依赖项的DE引擎通常会付出性能和可伸缩性显著下降的代价。在本文中,我们提出了一种新的基于追逐的算法,可以有效地处理目标上的任意fds。我们的方法本质上依赖于利用源到目标(s-t)元组生成依赖关系(tgds)和目标fd之间的交互。这允许我们通过小心地排列追逐步骤,使fds和(选择的)tgds交织在一起,从而控制中间追逐结果的大小。直接的结果是,我们大大缩小了fd的应用范围,这通常是目标fd引起的巨大开销的主要原因。此外,对依赖交互的推理进一步为我们带来了有趣的并行化机会,从而获得了额外的可伸缩性收益。我们提供了基于追逐的算法的概念验证实现和旨在衡量其可扩展性和效率的实验研究。最后,我们与最新的DE引擎进行了经验比较,表明我们的算法优于它们。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Functional Dependencies Unleashed for Scalable Data Exchange
We address the problem of efficiently evaluating target functional dependencies (fds) in the Data Exchange (DE) process. Target fds naturally occur in many DE scenarios, including the ones in Life Sciences in which multiple source relations need to be structured under a constrained target schema. However, despite their wide use, target fds' evaluation is still a bottleneck in the state-of-the-art DE engines. Systems relying on an all-SQL approach typically do not support target fds unless additional information is provided. Alternatively, DE engines that do include these dependencies typically pay the price of a significant drop in performance and scalability. In this paper, we present a novel chase-based algorithm that can efficiently handle arbitrary fds on the target. Our approach essentially relies on exploiting the interactions between source-to-target (s-t) tuple-generating dependencies (tgds) and target fds. This allows us to tame the size of the intermediate chase results, by playing on a careful ordering of chase steps interleaving fds and (chosen) tgds. As a direct consequence, we importantly diminish the fd application scope, often a central cause of the dramatic overhead induced by target fds. Moreover, reasoning on dependency interaction further leads us to interesting parallelization opportunities, yielding additional scalability gains. We provide a proof-of-concept implementation of our chase-based algorithm and an experimental study aimed at gauging its scalability and efficiency. Finally, we empirically compare with the latest DE engines, and show that our algorithm outperforms them.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
SMS: Stable Matching Algorithm using Skylines Graph-based modelling of query sets for differential privacy Efficient Feedback Collection for Pay-as-you-go Source Selection Multi-Assignment Single Joins for Parallel Cross-Match of Astronomic Catalogs on Heterogeneous Clusters Compact and queryable representation of raster datasets
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1