
Database theory -- ICDT : International Conference ... proceedings. International Conference on Database Theory: latest publications

Front Matter, Table of Contents, Preface, Conference Organization, List of Authors
Michael Benedikt, G. Orsi
DOI: 10.4230/LIPIcs.ICDT.2017.0 · Published 2017-01-01
Citations: 0
k-Regret Minimizing Set: Efficient Algorithms and Hardness
Wei Cao, J. Li, Haitao Wang, Kangning Wang, Ruosong Wang, R. C. Wong, Wei Zhan
We study the k-regret minimizing query (k-RMS), which is a useful operator for supporting multi-criteria decision-making. Given two integers k and r, a k-RMS returns r tuples from the database which minimize the k-regret ratio, defined as one minus the worst ratio between the k-th maximum utility score among all tuples in the database and the maximum utility score of the r tuples returned. A solution set contains only r tuples, enjoying the benefits of both top-k queries and skyline queries. Proposed in 2012, the query has been studied extensively in recent years. In this paper, we advance the theory and the practice of k-RMS in the following aspects. First, we develop efficient algorithms for k-RMS (and its decision version) when the dimensionality is 2. The running times of our algorithms improve on those of previous ones. Second, we show that k-RMS is NP-hard even when the dimensionality is 3. This provides a complete characterization of the complexity of k-RMS, and answers an open question from previous studies. In addition, we present approximation algorithms for the problem when the dimensionality is 3 or larger.
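The k-regret ratio defined in the abstract can be checked directly by brute force. The sketch below assumes linear utility functions given as weight vectors and evaluates only a finite sample of them; it illustrates the definition, not the paper's algorithms:

```python
def k_regret_ratio(database, subset, k, weight_vectors):
    """Brute-force k-regret ratio of `subset` against `database`, evaluated
    over a finite sample of linear utility functions (illustration only;
    the paper's algorithms avoid this exhaustive computation)."""
    def score(w, t):
        return sum(wi * xi for wi, xi in zip(w, t))
    worst = 0.0
    for w in weight_vectors:
        db_scores = sorted((score(w, t) for t in database), reverse=True)
        kth_best = db_scores[k - 1]          # k-th maximum utility in the database
        best_in_subset = max(score(w, t) for t in subset)
        if kth_best > 0:
            worst = max(worst, 1.0 - best_in_subset / kth_best)
    return worst
```

For instance, keeping only the "balanced" tuple of a 2-dimensional database gives regret 0.5 against the weight vector that cares only about the first attribute.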
DOI: 10.4230/LIPIcs.ICDT.2017.11 · Published 2017-01-01
Citations: 32
m-tables: Representing Missing Data
Bruhathi Sundarmurthy, Paraschos Koutris, Willis Lang, J. Naughton, V. Tannen
Representation systems have been widely used to capture different forms of incomplete data in various settings. However, existing representation systems are not expressive enough to handle the more complex scenarios of missing data that can occur in practice: these range from missing attribute values to missing a known, or even unknown, number of tuples. In this work, we propose a new representation system called m-tables that can represent many different types of missing data. We show that m-tables form a closed, complete and strong representation system under both set and bag semantics and are strictly more expressive than conditional tables under both the closed and open world assumptions. We further study the complexity of computing certain and possible answers in m-tables. Finally, we discuss how to "interpret" m-tables through a novel labeling scheme that marks a type of generalized tuples as certain or possible.
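The certain/possible-answer semantics mentioned above can be illustrated on the simplest kind of incomplete table, one with unknown attribute values, by enumerating completions over a finite domain (a toy sketch for intuition only; it does not reproduce the m-tables construction):

```python
from itertools import product

def certain_and_possible(rows, domain, query):
    """Enumerate all completions of rows containing None (unknown values)
    over a finite domain; an answer is certain if the query returns it in
    every completion, and possible if it returns it in at least one."""
    nulls = [(i, j) for i, row in enumerate(rows)
             for j, v in enumerate(row) if v is None]
    certain, possible = None, set()
    for values in product(domain, repeat=len(nulls)):
        world = [list(row) for row in rows]
        for (i, j), v in zip(nulls, values):
            world[i][j] = v
        answers = query([tuple(row) for row in world])
        possible |= answers
        certain = answers if certain is None else certain & answers
    return certain or set(), possible
```

A projection on an unknown column yields no certain answers but several possible ones, while a projection on a known column yields the same certain and possible answers.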
DOI: 10.4230/LIPIcs.ICDT.2017.21 · Published 2017-01-01
Citations: 22
Compression of Unordered XML Trees
Markus Lohrey, S. Maneth, C. Reh
Many XML documents are data-centric and do not make use of the inherent document order. Can we provide stronger compression for such documents through giving up order? We first consider compression via minimal dags (directed acyclic graphs) and study the worst case ratio of the size of the ordered dag divided by the size of the unordered dag, where the worst case is taken for all trees of size n. We prove that this worst case ratio is n / log n for the edge size and n log log n / log n for the node size. In experiments we compare several known compressors on the original document tree versus on a canonical version obtained by length-lexicographical sorting of subtrees. For some documents this difference is surprisingly large: reverse binary dags can be smaller by a factor of 3.7 and other compressors can be smaller by factors of up to 190.
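The ordered-versus-unordered dag comparison can be made concrete by counting distinct subtrees with and without a canonical ordering of children. The sketch below assumes trees encoded as (label, children) pairs; sorting child lists before sharing corresponds to the canonical version obtained by sorting subtrees:

```python
def dag_nodes(tree, ordered=True):
    """Number of distinct subtrees of a (label, children) tree, i.e. the
    node size of its minimal dag; for the unordered dag, child lists are
    put into a canonical (sorted) order before subtree sharing."""
    seen = {}
    def canon(node):
        label, children = node
        kids = tuple(canon(c) for c in children)
        if not ordered:
            kids = tuple(sorted(kids))   # order no longer distinguishes subtrees
        key = (label, kids)
        return seen.setdefault(key, len(seen))
    canon(tree)
    return len(seen)
```

Two 'b' subtrees that differ only in child order are shared in the unordered dag but not in the ordered one, so the unordered dag is strictly smaller.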
DOI: 10.4230/LIPIcs.ICDT.2017.18 · Published 2017-01-01
Citations: 9
GYM: A Multiround Distributed Join Algorithm
F. Afrati, Manas R. Joglekar, C. Ré, S. Salihoglu, J. Ullman
Multiround algorithms are now commonly used in distributed data processing systems, yet the extent to which algorithms can benefit from running more rounds is not well understood. This paper answers this question for the problem of computing the equijoin of n relations. Given any query Q with width w, intersection width iw, input size IN, output size OUT, and a cluster of machines with M = Ω(IN^{1/ε}) memory available per machine, where ε > 1 and w ≥ 1 are constants, we show that: (1) Q can be computed in O(n) rounds with O(n(IN^w + OUT)^2/M) communication cost with high probability; (2) Q can be computed in O(log(n)) rounds with O(n(IN^{max(w, 3iw)} + OUT)^2/M) communication cost with high probability. Intersection width is a new notion we introduce for queries and generalized hypertree decompositions (GHDs) of queries that captures how connected the adjacent components of the GHDs are. We achieve our first result by introducing a distributed and generalized version of Yannakakis's algorithm, called GYM. GYM takes as input any GHD of Q with width w and depth d, and computes Q in O(d + log(n)) rounds with O(n(IN^w + OUT)^2/M) communication cost. We achieve our second result by showing how to construct GHDs of Q with width max(w, 3iw) and depth O(log(n)). We describe another technique to construct GHDs with larger widths and lower depths, demonstrating other tradeoffs one can make between communication and the number of rounds.
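GYM is a distributed generalization of Yannakakis's algorithm. The sequential version on a chain join can be sketched as follows (a minimal illustration under assumed relation shapes, not the distributed GYM itself): a bottom-up semijoin pass removes dangling tuples, after which the reduced relations are joined without blowup.

```python
def semijoin(R, S, i, j):
    """Keep the tuples of R whose attribute i appears as attribute j of S."""
    keys = {s[j] for s in S}
    return [r for r in R if r[i] in keys]

def chain_join_yannakakis(R, S, T):
    """Sequential Yannakakis on the chain R(a,b) ⋈ S(b,c) ⋈ T(c,d):
    semijoin-reduce bottom-up, then join the dangling-free relations."""
    S = semijoin(S, T, 1, 0)   # S ⋉ T on attribute c
    R = semijoin(R, S, 1, 0)   # R ⋉ S on attribute b
    out = []
    for a, b in R:
        for b2, c in S:
            if b == b2:
                for c2, d in T:
                    if c == c2:
                        out.append((a, b, c, d))
    return out
```

The semijoin pass discards tuples such as (3, 4) in R and (9, 9) in S that can never contribute to the output.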
DOI: 10.4230/LIPIcs.ICDT.2017.4 · Published 2017-01-01
Citations: 28
Combined Tractability of Query Evaluation via Tree Automata and Cycluits (Extended Version)
Antoine Amarilli, P. Bourhis, Mikaël Monet, P. Senellart
We investigate parameterizations of both database instances and queries that make query evaluation fixed-parameter tractable in combined complexity. We introduce a new Datalog fragment with stratified negation, intensional-clique-guarded Datalog (ICG-Datalog), with linear-time evaluation on structures of bounded treewidth for programs of bounded rule size. Such programs capture in particular conjunctive queries with simplicial decompositions of bounded width, guarded negation fragment queries of bounded CQ-rank, or two-way regular path queries. Our result proceeds via compilation to alternating two-way automata, whose semantics is defined via cyclic provenance circuits (cycluits) that can be tractably evaluated. Last, we prove that probabilistic query evaluation remains intractable in combined complexity under this parameterization.
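The least-fixpoint semantics of a cyclic circuit can be sketched for the monotone Boolean case as follows (a naive evaluation for intuition only; the paper's contribution is a tractable evaluation of provenance cycluits, not this loop):

```python
def eval_cycluit(gates, inputs):
    """Least-fixpoint evaluation of a monotone cyclic Boolean circuit
    ('cycluit'): every gate starts at False, and values are propagated
    until nothing changes. `gates` maps gate name -> (op, argument names);
    `inputs` maps input name -> Boolean."""
    values = dict(inputs)
    values.update({g: False for g in gates})
    changed = True
    while changed:
        changed = False
        for g, (op, args) in gates.items():
            new = (any(values[a] for a in args) if op == "or"
                   else all(values[a] for a in args))
            if new and not values[g]:    # monotone: values only go False -> True
                values[g] = True
                changed = True
    return values
```

On the cyclic pair g1 = or(x, g2), g2 = and(g1, y), the least fixpoint makes both gates true exactly when x is true (and y is true for g2), even though each gate refers to the other.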
DOI: 10.4230/LIPIcs.ICDT.2017.6 · Published 2016-12-13
Citations: 8
The complexity of reverse engineering problems for conjunctive queries
P. Barceló, M. Romero
Reverse engineering problems for conjunctive queries (CQs), such as query by example (QBE) or definability, take a set of user examples and convert them into an explanatory CQ. Despite their importance, the complexity of these problems is prohibitively high (coNEXPTIME-complete). We isolate their two main sources of complexity and propose relaxations of them that reduce the complexity while having meaningful theoretical interpretations. The first relaxation is based on the idea of using existential pebble games for approximating homomorphism tests. We show that this characterizes QBE/definability for CQs up to treewidth k, while reducing the complexity to EXPTIME. As a side result, we obtain that the complexity of the QBE/definability problems for CQs of treewidth k is EXPTIME-complete for each k ≥ 1. The second relaxation is based on the idea of "desynchronizing" direct products, which characterizes QBE/definability for unions of CQs and reduces the complexity to coNP. The combination of these two relaxations yields tractability for QBE and characterizes it in terms of unions of CQs of treewidth at most k. We also study the complexity of these problems for conjunctive regular path queries over graph databases, showing them to be no more difficult than for CQs.
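The homomorphism tests that the first relaxation approximates can be illustrated by their brute-force version on directed graphs (exponential, for intuition only; the paper replaces exactly this kind of test with existential pebble games):

```python
from itertools import product

def homomorphism_exists(edges_a, edges_b):
    """Brute-force test for a homomorphism from graph A to graph B: try
    every mapping of A's nodes into B's nodes and check that every edge
    of A is preserved."""
    nodes_a = sorted({v for e in edges_a for v in e})
    nodes_b = sorted({v for e in edges_b for v in e})
    edge_set_b = set(edges_b)
    for image in product(nodes_b, repeat=len(nodes_a)):
        h = dict(zip(nodes_a, image))
        if all((h[u], h[v]) in edge_set_b for u, v in edges_a):
            return True
    return False
```

A directed path of length 2 maps homomorphically into a directed triangle, but the triangle does not map into a directed 2-cycle, since any image would have to alternate between its two nodes around an odd cycle.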
DOI: 10.4230/LIPIcs.ICDT.2017.7 · Published 2016-06-03
Citations: 42
Filtering With the Crowd: CrowdScreen Revisited
B. Groz, Ezra Levin, I. Meilijson, T. Milo
Filtering a set of items, based on a set of properties that can be verified by humans, is a common application of CrowdSourcing. When the workers are error-prone, each item is presented to multiple users, to limit the probability of misclassification. Since the Crowd is a relatively expensive resource, minimizing the number of questions per item may naturally result in big savings. Several algorithms to address this minimization problem have been presented in the CrowdScreen framework by Parameswaran et al. However, those algorithms do not scale well and therefore cannot be used in scenarios where high accuracy is required in spite of high user error rates. The goal of this paper is thus to devise algorithms that can cope with such situations. To achieve this, we provide new theoretical insights to the problem, then use them to develop a new efficient algorithm. We also propose novel optimizations for the algorithms of CrowdScreen that improve their scalability. We complement our theoretical study by an experimental evaluation of the algorithms on a large set of synthetic parameters as well as real-life crowdsourcing scenarios, demonstrating the advantages of our solution.
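The ask-until-confident pattern behind such filtering strategies can be sketched with a simple sequential Bayesian update under a uniform worker error rate. The error model, prior, and threshold here are assumptions for illustration, not CrowdScreen's actual strategies:

```python
def classify_with_crowd(answers, error_rate=0.3, prior=0.5, threshold=0.95):
    """Sequentially update the probability that an item passes the filter
    after each yes/no worker answer, stopping as soon as the posterior is
    confident enough in either direction; returns the decision and the
    number of questions asked."""
    p = prior
    asked = 0
    for says_pass in answers:
        asked += 1
        like_pass = 1 - error_rate if says_pass else error_rate
        like_fail = error_rate if says_pass else 1 - error_rate
        p = like_pass * p / (like_pass * p + like_fail * (1 - p))
        if p >= threshold or p <= 1 - threshold:
            break
    return ("pass" if p >= 0.5 else "fail"), asked
```

With a 30% error rate, four unanimous "pass" answers already push the posterior above 0.95, so the remaining (expensive) questions are never asked.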
DOI: 10.4230/LIPIcs.ICDT.2016.12 · Published 2016-03-15
Citations: 4
A Formal Study of Collaborative Access Control in Distributed Datalog
S. Abiteboul, P. Bourhis, V. Vianu
We formalize and study a declaratively specified collaborative access control mechanism for data dissemination in a distributed environment. Data dissemination is specified using distributed datalog. Access control is also defined by datalog-style rules, at the relation level for extensional relations, and at the tuple level for intensional ones, based on the derivation of tuples. The model also includes a mechanism for "declassifying" data, that allows circumventing overly restrictive access control. We consider the complexity of determining whether a peer is allowed to access a given fact, and address the problem of achieving the goal of disseminating certain information under some access control policy. We also investigate the problem of information leakage, which occurs when a peer is able to infer facts to which the peer is not allowed access by the policy. Finally, we consider access control extended to facts equipped with provenance information, motivated by the many applications where such information is required. We provide semantics for access control with provenance, and establish the complexity of determining whether a peer may access a given fact together with its provenance. This work is motivated by the access control of the Webdamlog system, whose core features it formalizes.
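The idea of granting access to an intensional tuple based on its derivation can be sketched on a transitive-closure program: each derived fact records a set of base facts used to derive it, and a peer may access the fact only if it may access that whole set. This is a toy model under assumed semantics, much coarser than the paper's mechanism:

```python
def closure_with_provenance(edges):
    """Transitive closure where each derived fact records one set of base
    edges used to derive it (first derivation found wins)."""
    prov = {e: frozenset([e]) for e in edges}
    changed = True
    while changed:
        changed = False
        for (x, y) in list(prov):
            for (y2, z) in list(prov):
                if y == y2 and (x, z) not in prov:
                    prov[(x, z)] = prov[(x, y)] | prov[(y2, z)]
                    changed = True
    return prov

def may_access(accessible_base_edges, fact, prov):
    """Tuple-level rule: a peer may access a derived fact only if every
    base edge in its recorded derivation is accessible to that peer."""
    return fact in prov and prov[fact] <= accessible_base_edges
```

A peer that can see only the edge (1, 2) is denied the derived fact (1, 3), whose derivation also uses (2, 3).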
DOI: 10.4230/LIPIcs.ICDT.2016.10 · Published 2016-03-15
Citations: 7
Worst-Case Optimal Algorithms for Parallel Query Processing
P. Beame, Paraschos Koutris, Dan Suciu
In this paper, we study the communication complexity for the problem of computing a conjunctive query on a large database in a parallel setting with p servers. In contrast to previous work, where upper and lower bounds on the communication were specified for particular structures of data (either data without skew, or data with specific types of skew), in this work we focus on worst-case analysis of the communication cost. The goal is to find worst-case optimal parallel algorithms, similar to the work of [18] for sequential algorithms. We first show that for a single round we can obtain an optimal worst-case algorithm. The optimal load for a conjunctive query q when all relations have size equal to M is O(M/p^{1/ψ*}), where ψ* is a new query-related quantity called the edge quasi-packing number, which is different from both the edge packing number and the edge cover number of the query hypergraph. For multiple rounds, we present algorithms that are optimal for several classes of queries. Finally, we show a surprising connection to the external memory model, which allows us to translate parallel algorithms to external memory algorithms. This technique allows us to recover (within a polylogarithmic factor) several recent results on the I/O complexity for computing join queries, and also obtain optimal algorithms for other classes of queries.
DOI: 10.4230/LIPIcs.ICDT.2016.8. Published 2016-03-01 in Proc. ICDT. Cited by 57.
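The single-round algorithms discussed in this abstract build on the HyperCube (shares) tuple-routing strategy. Below is a minimal simulation for the triangle query Q(x, y, z) = R(x, y), S(y, z), T(z, x); the function name, the fixed 2x2x2 server grid, and the hash construction are illustrative assumptions, not taken from the paper:

```python
import itertools

def hypercube_triangle_join(R, S, T, shares=(2, 2, 2)):
    """One-round HyperCube simulation for the triangle query
    Q(x, y, z) = R(x, y), S(y, z), T(z, x) on p = p1*p2*p3 virtual servers."""
    p1, p2, p3 = shares
    h1 = lambda v: hash(("x", v)) % p1   # independent hash per join variable
    h2 = lambda v: hash(("y", v)) % p2
    h3 = lambda v: hash(("z", v)) % p3

    # Communication round: each tuple fixes two of the three server
    # coordinates, so it is replicated along the remaining axis.
    grid = itertools.product(range(p1), range(p2), range(p3))
    servers = {c: ([], [], []) for c in grid}
    for (x, y) in R:
        for k in range(p3):
            servers[(h1(x), h2(y), k)][0].append((x, y))
    for (y, z) in S:
        for i in range(p1):
            servers[(i, h2(y), h3(z))][1].append((y, z))
    for (z, x) in T:
        for j in range(p2):
            servers[(h1(x), j, h3(z))][2].append((z, x))

    # Local computation: each server joins its fragments; the union over
    # all servers is exactly the set of triangles.
    out = set()
    for (Rl, Sl, Tl) in servers.values():
        Tset = set(Tl)
        for (x, y) in Rl:
            for (y2, z) in Sl:
                if y2 == y and (z, x) in Tset:
                    out.add((x, y, z))
    return out


R = {(1, 2), (2, 3), (1, 4)}
S = {(2, 3), (3, 1), (4, 1)}
T = {(3, 1), (1, 2)}
print(sorted(hypercube_triangle_join(R, S, T)))  # [(1, 2, 3), (2, 3, 1)]
```

Each R-tuple is replicated p3 times (and symmetrically for S and T), which is where the load analysis enters: per the abstract's formula, the triangle query's worst-case per-server load is $O(M/p^{1/\psi^*})$ with $\psi^* = 2$, i.e. $O(M/\sqrt{p})$.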