Proceedings. ACM-SIGMOD International Conference on Management of Data: Latest Publications

Protecting Data Markets from Strategic Buyers
Pub Date : 2022-01-01 DOI: 10.1145/3514221.3517855
R. Fernandez
Citations: 0
XLJoins
Pub Date : 2021-01-01 DOI: 10.1145/3448016.3450582
A. Shanghooshabad
Figure 1: An XLJoin example (QX from the TPC-H benchmark). The structure learning component receives a join query, metadata, tables, and existing models, and builds a Markov random field (MRF) graph based on the query. Then, while inferring the join attributes (JAs, nodes shown in black), a Bayesian network (BN) is built, and finally a uniform sample of JAs is generated using ancestral sampling, proceeding from the root to the leaves. Non-JAs (blue nodes) are added using the MRF once the JAs have been sampled from the BN, because they do not affect uniformity.
Citations: 0
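Ancestral sampling, the final step in the caption above, samples each variable only after its parents, so a single pass in topological (root-to-leaf) order yields one draw from a Bayesian network's joint distribution. The sketch below shows that generic technique on a tiny hand-written network whose attribute names loosely echo TPC-H; the structure and probabilities are hypothetical, and the MRF and structure-learning steps of XLJoins are not modeled.

```python
import random

# Hypothetical toy Bayesian network: node -> (parents, CPT).
# A CPT maps a tuple of parent values to {value: probability}.
BN = {
    "region": ((), {(): {"EU": 0.4, "US": 0.6}}),
    "nation": (("region",), {
        ("EU",): {"FR": 0.5, "DE": 0.5},
        ("US",): {"CA": 0.3, "USA": 0.7},
    }),
    "segment": (("nation",), {
        ("FR",): {"AUTO": 0.2, "BUILD": 0.8},
        ("DE",): {"AUTO": 0.6, "BUILD": 0.4},
        ("CA",): {"AUTO": 0.5, "BUILD": 0.5},
        ("USA",): {"AUTO": 0.1, "BUILD": 0.9},
    }),
}


def topological_order(bn):
    """Return the nodes so that every parent precedes its children (root first)."""
    order, seen = [], set()

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for parent in bn[node][0]:
            visit(parent)
        order.append(node)

    for node in bn:
        visit(node)
    return order


def ancestral_sample(bn, rng):
    """Draw one joint sample, sampling each node conditioned on its already-sampled parents."""
    sample = {}
    for node in topological_order(bn):
        parents, cpt = bn[node]
        dist = cpt[tuple(sample[p] for p in parents)]
        values = list(dist)
        sample[node] = rng.choices(values, weights=[dist[v] for v in values], k=1)[0]
    return sample


rng = random.Random(0)  # fixed seed so runs are repeatable
for _ in range(3):
    print(ancestral_sample(BN, rng))
```

Because every node is drawn from its conditional distribution given already-fixed parents, repeated calls produce independent samples from the full joint distribution without any rejection step.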
Convergence of Array DBMS and Cellular Automata: A Road Traffic Simulation Case
Pub Date : 2021-01-01 DOI: 10.1145/3448016.3458457
R. A. R. Zalipynis
Citations: 0
Finding Related Tables in Data Lakes for Interactive Data Science
Pub Date : 2020-06-01 DOI: 10.1145/3318464.3389726
Yi Zhang, Zachary G Ives

Many modern data science applications build on data lakes, schema-agnostic repositories of data files and data products that offer limited organization and management capabilities. There is a need to build data lake search capabilities into data science environments, so scientists and analysts can find tables, schemas, workflows, and datasets useful to their task at hand. We develop search and management solutions for the Jupyter Notebook data science platform, to enable scientists to augment training data, find potential features to extract, clean data, and find joinable or linkable tables. Our core methods also generalize to other settings where computational tasks involve execution of programs or scripts.

Citations: 75
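Finding joinable tables is often framed as scoring candidate columns by how many of a query column's values they contain. The sketch below ranks columns in a tiny in-memory "data lake" by set containment; containment scoring is a common heuristic from the table-discovery literature used here as a stand-in, not necessarily the method of this paper, and all table and column names are hypothetical.

```python
def containment(query_values, candidate_values):
    """Fraction of the query column's distinct values that also occur in the candidate column."""
    query_set = set(query_values)
    if not query_set:
        return 0.0
    return len(query_set & set(candidate_values)) / len(query_set)


def rank_joinable_columns(query_values, lake):
    """Score every (table, column) in a toy data lake and sort by descending containment."""
    scored = [
        (containment(query_values, values), table, column)
        for table, columns in lake.items()
        for column, values in columns.items()
    ]
    scored.sort(key=lambda item: (-item[0], item[1], item[2]))
    return scored


# Hypothetical data lake: table name -> {column name -> list of values}.
lake = {
    "census": {"state": ["NY", "CA", "TX", "FL"], "population": [19, 39, 29, 22]},
    "weather": {"station_state": ["WA", "CA", "CA"], "temp_f": [51, 68, 70]},
    "sales": {"region": ["EU", "APAC"]},
}
query_column = ["NY", "CA", "TX", "WA"]

for score, table, column in rank_joinable_columns(query_column, lake)[:2]:
    print(f"{table}.{column}: {score:.2f}")
```

Containment (rather than Jaccard similarity) is used so that a large candidate column is not penalized for holding extra values the query column lacks.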
Near-Optimal Distributed Band-Joins through Recursive Partitioning
Pub Date : 2020-06-01 DOI: 10.1145/3318464.3389750
Rundong Li, Wolfgang Gatterbauer, Mirek Riedewald

We consider running-time optimization for band-joins in a distributed system, e.g., the cloud. To balance load across worker machines, input has to be partitioned, which causes duplication. We explore how to resolve this tension between maximum load per worker and input duplication for band-joins between two relations. Previous work suffered from high optimization cost or considered partitionings that were too restricted (resulting in suboptimal join performance). Our main insight is that recursive partitioning of the join-attribute space with the appropriate split scoring measure can achieve both low optimization cost and low join cost. It is the first approach that is effective not only for one-dimensional band-joins but also for joins on multiple attributes. Experiments indicate that our method is able to find partitionings that are within 10% of the lower bound for both maximum load per worker and input duplication for a broad range of settings, significantly improving over previous work.

Citations: 0
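To make the load-versus-duplication tension concrete, the sketch below runs a one-dimensional band-join (pairs with |s - t| <= eps) over a single level of range partitions: each S value goes to exactly one partition, and each T value is copied into every partition that might hold a partner. It assumes numeric join values and caller-chosen split points; it does not implement the paper's recursive partitioning or its split scoring measure.

```python
import bisect


def naive_band_join(S, T, eps):
    """Reference answer: all pairs (s, t) with |s - t| <= eps."""
    return sorted((s, t) for s in S for t in T if abs(s - t) <= eps)


def partitioned_band_join(S, T, eps, splits):
    """One-level range partitioning of the join attribute.

    splits is a sorted list of boundaries; partition i holds values v with
    splits[i-1] <= v < splits[i]. Each s goes to exactly one partition; each t
    is copied to every partition that could contain a partner within eps.
    Returns (join result, max per-partition input load, number of extra t copies).
    """
    parts = len(splits) + 1
    s_parts = [[] for _ in range(parts)]
    t_parts = [[] for _ in range(parts)]
    for s in S:
        s_parts[bisect.bisect_right(splits, s)].append(s)
    for t in T:
        first = bisect.bisect_right(splits, t - eps)
        last = bisect.bisect_right(splits, t + eps)
        for i in range(first, last + 1):
            t_parts[i].append(t)
    result = []
    for s_local, t_local in zip(s_parts, t_parts):
        result.extend((s, t) for s in s_local for t in t_local if abs(s - t) <= eps)
    max_load = max(len(a) + len(b) for a, b in zip(s_parts, t_parts))
    duplication = sum(len(b) for b in t_parts) - len(T)
    return sorted(result), max_load, duplication


S = [1, 2, 5, 8, 9]
T = [1.5, 4.5, 6, 8.5]
result, max_load, duplication = partitioned_band_join(S, T, eps=1, splits=[5])
print(result == naive_band_join(S, T, 1), max_load, duplication)
```

Every result pair is produced exactly once, because each s lives in a single partition. Adding split points can lower the maximum load but copies more T values across boundaries; choosing the partitioning well is the optimization problem the paper addresses.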
Optimal Join Algorithms Meet Top-k
Pub Date : 2020-06-01 DOI: 10.1145/3318464.3383132
Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald

Top-k queries have been studied intensively in the database community, and they are an important means to reduce query cost when only the "best" or "most interesting" results are needed instead of the full output. While some optimality results exist, e.g., the famous Threshold Algorithm, they hold only in a fairly limited model of computation that does not account for the cost incurred by large intermediate results and hence is not aligned with typical database-optimizer cost models. On the other hand, the idea of avoiding large intermediate results is arguably the main goal of recent work on optimal join algorithms, which uses the standard RAM model of computation to determine algorithm complexity. This research has created a lot of excitement due to its promise of reducing the time complexity of join queries with cycles, but it has mostly focused on full-output computation. We argue that the two areas can and should be studied from a unified point of view in order to achieve optimality in the common model of computation for a very general class of top-k-style join queries. This tutorial has two main objectives. First, we will explore and contrast the main assumptions, concepts, and algorithmic achievements of the two research areas. Second, we will cover recent, as well as some older, approaches that emerged at the intersection to support efficient ranked enumeration of join-query results. These are related to classic work on k-shortest path algorithms and more general optimization problems, some of which date back to the 1950s. We demonstrate that this line of research warrants renewed attention in the challenging context of ranked enumeration for general join queries.

Citations: 0
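Ranked enumeration returns join results in order of a ranking function without first materializing the full output. The sketch below applies a textbook priority-queue (frontier-expansion) technique to a single binary equi-join with additive tuple weights, assuming a hypothetical tuple layout of (key, payload, weight); it illustrates the idea only and does not cover the general or cyclic join queries the tutorial targets.

```python
import heapq
from collections import defaultdict
from itertools import islice


def ranked_join(R, S):
    """Lazily yield (total_weight, r, s) for the equi-join of R and S on the key,
    in non-decreasing total weight. Tuples are (key, payload, weight); keys must
    be mutually comparable so heap ties can be broken."""
    groups = []
    for relation in (R, S):
        by_key = defaultdict(list)
        for row in relation:
            by_key[row[0]].append(row)
        for rows in by_key.values():
            rows.sort(key=lambda row: row[2])  # cheapest partner first
        groups.append(by_key)
    r_groups, s_groups = groups

    heap, seen = [], set()
    for key in r_groups.keys() & s_groups.keys():
        heapq.heappush(heap, (r_groups[key][0][2] + s_groups[key][0][2], key, 0, 0))
        seen.add((key, 0, 0))

    while heap:
        weight, key, i, j = heapq.heappop(heap)
        rs, ss = r_groups[key], s_groups[key]
        yield weight, rs[i], ss[j]
        # Expand the frontier: the next-cheapest candidates in this key's grid.
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(rs) and nj < len(ss) and (key, ni, nj) not in seen:
                seen.add((key, ni, nj))
                heapq.heappush(heap, (rs[ni][2] + ss[nj][2], key, ni, nj))


R = [("a", "r1", 3), ("a", "r2", 1), ("b", "r3", 2)]
S = [("a", "s1", 4), ("a", "s2", 0), ("b", "s3", 5), ("c", "s4", 0)]
for weight, r, s in islice(ranked_join(R, S), 3):  # top-3 without materializing the join
    print(weight, r[1], s[1])
```

Because the generator does work only as results are requested, asking for the top k results touches roughly k frontier entries per key rather than the whole join output.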
Going Beyond Provenance: Explaining Query Answers with Pattern-based Counterbalances
Pub Date : 2019-06-01 DOI: 10.1145/3299869.3300066
Zhengjie Miao, Qitian Zeng, Boris Glavic, Sudeepa Roy

Provenance and intervention-based techniques have been used to explain surprisingly high or low outcomes of aggregation queries. However, such techniques may miss interesting explanations emerging from data that is not in the provenance. For instance, an unusually low number of publications of a prolific researcher in a certain venue and year can be explained by an increased number of publications in another venue in the same year. We present a novel approach for explaining outliers in aggregation queries through counter-balancing. That is, explanations are outliers in the opposite direction of the outlier of interest. Outliers are defined w.r.t. patterns that hold over the data in aggregate. We present efficient methods for mining such aggregate regression patterns (ARPs), discuss how to use ARPs to generate and rank explanations, and experimentally demonstrate the efficiency and effectiveness of our approach.

Citations: 0
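The counterbalancing idea can be illustrated on hypothetical publication counts: a low count for one venue in a given year is explained by an unusually high count for the same author and year in another venue. In the sketch below, a per-(author, venue) mean over years stands in for the paper's aggregate regression patterns (ARPs), which are more general; the author, venues, and counts are invented for illustration.

```python
from collections import defaultdict
from statistics import mean


def fit_constant_patterns(counts):
    """Stand-in for an aggregate regression pattern: the expected count for each
    (author, venue) is simply its mean over all years."""
    by_group = defaultdict(list)
    for (author, venue, _year), count in counts.items():
        by_group[(author, venue)].append(count)
    return {group: mean(values) for group, values in by_group.items()}


def counterbalances(counts, outlier):
    """Rank same-author, same-year cells in other venues whose deviation from
    their pattern points in the opposite direction of the outlier's deviation."""
    author, venue, year = outlier
    expected = fit_constant_patterns(counts)
    direction = counts[outlier] - expected[(author, venue)]
    found = []
    for (a, v, y), count in counts.items():
        if a == author and y == year and v != venue:
            deviation = count - expected[(a, v)]
            if deviation * direction < 0:  # opposite sign: a counterbalance
                found.append((abs(deviation), v))
    found.sort(reverse=True)
    return [(v, size) for size, v in found]


# Hypothetical publication counts: (author, venue, year) -> number of papers.
counts = {}
for year, sigmod, vldb, icde in [(2005, 4, 2, 1), (2006, 4, 2, 1), (2007, 1, 6, 1), (2008, 3, 2, 1)]:
    counts[("X", "SIGMOD", year)] = sigmod
    counts[("X", "VLDB", year)] = vldb
    counts[("X", "ICDE", year)] = icde

print(counterbalances(counts, ("X", "SIGMOD", 2007)))
```

Here the 2007 SIGMOD count sits below its pattern while the 2007 VLDB count sits above its own, so VLDB is reported as the counterbalancing explanation; ICDE matches its pattern and is not.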
iQCAR: inter-Query Contention Analyzer for Data Analytics Frameworks
Pub Date : 2019-06-01 DOI: 10.1145/3299869.3319904
Prajakta Kalmegh, Shivnath Babu, Sudeepa Roy

Resource interference caused by concurrent queries is one of the key reasons for unpredictable performance and missed workload SLAs in cluster computing systems. Analyzing these inter-query resource interactions is critical for answering time-sensitive questions like 'who is creating resource conflicts for my query?'. More importantly, diagnosing whether the resource-blocked time of a 'victim' query is caused by other queries or by some external factor can help the database administrator narrow down the many possible causes of query performance degradation. We introduce iQCAR, an inter-Query Contention Analyzer that attributes blame for the slowdown of a query to concurrent queries. iQCAR models resource conflicts using a multi-level directed acyclic graph that helps administrators compare the impact of concurrent queries and identify the most contentious queries, resources, and hosts in an online execution over a selected time window. Our experiments using TPC-DS queries on Apache Spark show that our approach is substantially more accurate than methods based on the overlap time between concurrent queries.

Citations: 7
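One simple way to attribute a victim query's blocked time to concurrent queries is to split each blocked interval in proportion to how much of the contended resource each other query used at that moment, charging any remainder to an external factor. The sketch below does this over a discretized timeline; it is a simplified, hypothetical two-level blame model (resources, then queries), not iQCAR's multi-level DAG or its actual blame computation, and the trace data is invented.

```python
from collections import defaultdict


def attribute_blame(victim_blocked, usage):
    """Split a victim query's blocked time among concurrent queries.

    victim_blocked: {resource: {tick: seconds the victim waited on it}}
    usage: {query: {resource: {tick: units of that resource the query used}}}
    Each blocked second at a tick is shared in proportion to usage at that tick;
    blocked time with no concurrent user is charged to "<external>".
    """
    blame = defaultdict(float)
    for resource, ticks in victim_blocked.items():
        for tick, blocked in ticks.items():
            users = {
                query: per_resource[resource][tick]
                for query, per_resource in usage.items()
                if per_resource.get(resource, {}).get(tick, 0) > 0
            }
            total = sum(users.values())
            if total == 0:
                blame["<external>"] += blocked
                continue
            for query, amount in users.items():
                blame[query] += blocked * amount / total
    return dict(blame)


# Hypothetical trace with three concurrent queries and two resources.
victim_blocked = {"cpu": {0: 2.0, 1: 1.0}, "disk": {1: 3.0, 2: 1.0}}
usage = {
    "Q1": {"cpu": {0: 3, 1: 1}},
    "Q2": {"cpu": {0: 1}, "disk": {1: 2}},
    "Q3": {"disk": {1: 1}},
}
blame = attribute_blame(victim_blocked, usage)
for query, seconds in sorted(blame.items(), key=lambda kv: (-kv[1], kv[0])):
    print(query, seconds)
```

Unlike a pure overlap-time heuristic, a query that runs concurrently but never touches the contended resource at a given tick receives no blame for that tick.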
RATest: Explaining Wrong Relational Queries Using Small Examples
Pub Date : 2019-06-01 DOI: 10.1145/3299869.3320236
Zhengjie Miao, Sudeepa Roy, Jun Yang

We present a system called RATEST, designed to help debug relational queries against reference queries and test database instances. In many applications, e.g., classroom learning and regression testing, we test the correctness of a user query Q by evaluating it over a test database instance D and comparing its result with that of evaluating a reference (correct) query Q₀ over D. If Q(D) differs from Q₀(D), the user knows Q is incorrect. However, D can be large (often by design), which makes debugging Q difficult. The key idea behind RATEST is to show the user a much smaller database instance D' ⊆ D, which we call a counterexample, such that Q(D') ≠ Q₀(D'). RATEST builds on data provenance and constraint solving, and employs a suite of techniques to support, at interactive speed, complex queries involving differences and group-by aggregation. We demonstrate an application of RATEST in learning: it has been used successfully in a large undergraduate database course at a university to help students with a relational algebra assignment.

Citations: 5
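The goal of a small counterexample D' ⊆ D with Q(D') ≠ Q₀(D') can be approximated by greedy deletion: drop tuples one at a time as long as the two queries still disagree. The sketch below uses that swapped-in technique rather than RATEST's provenance and constraint-solving approach, so it finds a counterexample that is only 1-minimal, not guaranteed smallest; the schema, data, and queries are hypothetical.

```python
def shrink_counterexample(db, q, q0):
    """Greedily delete tuples while Q and Q0 still disagree.

    db maps relation names to lists of tuples; q and q0 map such a dict to a set.
    The result is 1-minimal: deleting any single remaining tuple would make the
    two queries agree. It is not guaranteed to be the smallest counterexample.
    """
    if q(db) == q0(db):
        raise ValueError("Q and Q0 agree on the full instance; no counterexample")
    current = {rel: list(rows) for rel, rows in db.items()}
    changed = True
    while changed:
        changed = False
        for rel in list(current):
            i = 0
            while i < len(current[rel]):
                candidate = dict(current)
                candidate[rel] = current[rel][:i] + current[rel][i + 1:]
                if q(candidate) != q0(candidate):
                    current = candidate  # tuple i is not needed; drop it
                    changed = True
                else:
                    i += 1  # tuple i is needed to keep the disagreement
    return current


def q0(db):
    """Reference query: CS students enrolled in DB."""
    return {n for (n, dept) in db["Students"] for (m, course) in db["Enrolled"]
            if n == m and dept == "CS" and course == "DB"}


def q(db):
    """Buggy user query: forgets the course = 'DB' selection."""
    return {n for (n, dept) in db["Students"] for (m, _course) in db["Enrolled"]
            if n == m and dept == "CS"}


D = {
    "Students": [("ann", "CS"), ("bob", "EE"), ("cat", "CS")],
    "Enrolled": [("ann", "DB"), ("bob", "DB"), ("cat", "OS")],
}
small = shrink_counterexample(D, q, q0)
print(small, q(small), q0(small))
```

The remaining two tuples show a CS student enrolled only in a non-DB course, which is exactly the case the buggy query mishandles.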
iQCAR: A Demonstration of an Inter-Query Contention Analyzer for Cluster Computing Frameworks
Pub Date : 2018-06-01 DOI: 10.1145/3183713.3193567
Prajakta Kalmegh, Harrison Lundberg, Frederick Xu, Shivnath Babu, Sudeepa Roy

Unpredictability in query runtimes can arise in a shared cluster as a result of resource contention caused by inter-query interactions. iQCAR (inter-Query Contention AnalyzeR) is a system that formally models these interferences between concurrent queries and provides a framework for attributing blame for contention. iQCAR leverages a multi-level directed acyclic graph called iQC-Graph to diagnose the aberrations in query schedules that lead to such resource contention. The demonstration will enable users to perform a step-wise, in-depth exploration of the resource contention a query faces at various stages of its execution. The interface will allow users to identify the top-k victims and sources of contention, diagnose high-contention nodes and resources in the cluster, and rank their impact on query performance. Users will also be able to navigate through a set of rules recommended by iQCAR and compare how each rule, when applied by the cluster scheduler, resolves the contention in subsequent executions.

Citations: 6