
Proceedings of the 2016 International Conference on Management of Data: Latest Publications

DB-Risk: The Game of Global Database Placement
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2899405
Victor Zakhary, Faisal Nawab, D. Agrawal, A. E. Abbadi
Geo-replication is the process of maintaining copies of data at geographically dispersed datacenters for better availability and fault-tolerance. The distinguishing characteristic of geo-replication is the large wide-area latency between datacenters, which varies widely depending on their locations. Thus, choosing the datacenters on which to deploy a cloud application has a direct impact on the observable response time. We propose an optimization framework that automatically derives a geo-replication placement plan with the objective of minimizing latency. By running the optimization framework on real placement scenarios, we learn a set of placement optimizations for geo-replication. Some of these optimizations are surprising, while others are straightforward in retrospect. In this demonstration, we highlight the geo-replication placement optimizations through the DB-Risk game. DB-Risk invites players to create different placement scenarios while experimenting with the proposed optimizations. The placements created by the players are tested on real cloud deployments.
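As a toy illustration of the placement problem described above, the sketch below brute-forces the replica set that minimizes total majority-quorum commit latency for a set of clients. The region names, latency numbers, and the quorum cost model are assumptions made for illustration; they are not the paper's framework or data.

```python
from itertools import combinations

# Hypothetical round-trip latencies (ms) between regions; illustrative only.
LATENCY = {
    ("us-east", "us-west"): 70, ("us-east", "eu"): 90,
    ("us-east", "asia"): 180, ("us-west", "eu"): 140,
    ("us-west", "asia"): 120, ("eu", "asia"): 220,
}

def rtt(a, b):
    if a == b:
        return 1  # intra-datacenter round trip
    return LATENCY.get((a, b)) or LATENCY[(b, a)]

def commit_latency(client, replicas):
    # A majority-quorum write from `client` completes once the
    # majority-th closest replica has acknowledged.
    waits = sorted(rtt(client, r) for r in replicas)
    return waits[len(replicas) // 2]

def best_placement(sites, clients, k):
    # Exhaustive search over k-subsets, minimizing total client commit
    # latency. The paper also optimizes latency, but with a richer model
    # and a real solver rather than enumeration.
    return min(combinations(sites, k),
               key=lambda repl: sum(commit_latency(c, repl) for c in clients))

sites = ["us-east", "us-west", "eu", "asia"]
print(best_placement(sites, clients=["us-east", "eu"], k=3))
# -> ('us-east', 'us-west', 'eu')
```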
Citations: 12
Querying Geo-Textual Data: Spatial Keyword Queries and Beyond
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2912572
G. Cong, Christian S. Jensen
Over the past decade, we have moved from a predominantly desktop-based web to a predominantly mobile web, where users most often access the web from mobile devices such as smartphones. In addition, we are witnessing a proliferation of geo-located, textual web content. Motivated in part by these developments, the research community has been hard at work enabling the efficient computation of a variety of query functionality on geo-textual data, yielding a sizable body of literature on the querying of geo-textual data. With a focus on different types of keyword-based queries on geo-textual data, the tutorial also explores topics such as continuous queries on streaming geo-textual data, queries that retrieve attractive regions of geo-textual objects, and queries that extract properties, e.g., topics and top-k frequent words, of the objects in regions. The tutorial is designed to offer an overview of the problems addressed in this body of literature and of pertinent concepts and techniques. In addition, the tutorial suggests open problems and new research directions.
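To make the core query type concrete, here is a minimal top-k spatial keyword query: objects are ranked by a weighted sum of spatial proximity and keyword overlap. The toy objects, the linear scoring function, and the weight `alpha` are illustrative assumptions, not a scheme from the tutorial.

```python
import math

# Toy geo-textual objects: (id, (x, y), set of terms). Illustrative only.
objects = [
    ("o1", (1.0, 1.0), {"coffee", "wifi"}),
    ("o2", (4.0, 2.0), {"coffee", "cake"}),
    ("o3", (0.5, 3.0), {"pizza"}),
]

def score(obj, q_loc, q_terms, alpha=0.5, max_dist=10.0):
    # Weighted sum of normalized spatial proximity and text overlap,
    # the standard form of top-k spatial keyword ranking.
    _, loc, terms = obj
    dist = math.dist(loc, q_loc)
    spatial = 1 - min(dist, max_dist) / max_dist
    textual = len(terms & q_terms) / len(q_terms)
    return alpha * spatial + (1 - alpha) * textual

def topk(q_loc, q_terms, k=2):
    return sorted(objects, key=lambda o: -score(o, q_loc, q_terms))[:k]

print([o[0] for o in topk((0.0, 0.0), {"coffee"})])  # ['o1', 'o2']
```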
Citations: 45
K-means Split Revisited: Well-grounded Approach and Experimental Evaluation
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2914833
V. Grigorev, G. Chernishev
The R-tree is a data structure used for multidimensional indexing. Essentially, it is a balanced tree consisting of nested hyper-rectangles which are used to locate the data. One of the most performance-sensitive parts of this data structure is its split algorithm, which runs during node overflows. The split can be performed in multiple ways, according to many different criteria, and in general the problem of finding an optimal solution is NP-hard. There are many heuristic split algorithms. In this paper we study an existing k-means node split algorithm. We describe a number of serious issues in its theoretical foundation, which led us to re-design the k-means split. We propose several well-grounded solutions to the re-emerged problem of k-means split. Finally, we report comparison results using PostgreSQL and a contemporary benchmark for multidimensional structures.
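A minimal sketch of the idea behind a k-means node split: cluster the centers of the overflowing node's entry rectangles and let each cluster become one new node. This omits the balance constraints and theoretical fixes the paper contributes; the function and its inputs are illustrative.

```python
import random

def kmeans_split(rects, k=2, iters=20, seed=0):
    # rects are entry MBRs as (x1, y1, x2, y2); cluster their centers
    # with plain k-means and return index groups, one per new node.
    centers = [((x1 + x2) / 2, (y1 + y2) / 2) for (x1, y1, x2, y2) in rects]
    rng = random.Random(seed)
    means = rng.sample(centers, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for i, c in enumerate(centers):
            j = min(range(k), key=lambda m: (c[0] - means[m][0]) ** 2
                                            + (c[1] - means[m][1]) ** 2)
            groups[j].append(i)
        means = [
            (sum(centers[i][0] for i in g) / len(g),
             sum(centers[i][1] for i in g) / len(g)) if g else means[j]
            for j, g in enumerate(groups)
        ]
    return groups

rects = [(0, 0, 1, 1), (0.2, 0.1, 1.2, 0.9), (5, 5, 6, 6), (5.5, 4.8, 6.5, 6.1)]
print(kmeans_split(rects))  # e.g. [[0, 1], [2, 3]]
```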
Citations: 1
Minimizing Average Regret Ratio in Database
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2914831
Sepanta Zeighami, R. C. Wong
We propose the "average regret ratio" as a metric to measure users' satisfaction after a user sees k selected points of a database, instead of all of the points in the database. We introduce the average regret ratio as another means of multi-criteria decision making. Unlike the original k-regret operator, which uses the maximum regret ratio, the average regret ratio takes into account the satisfaction of a general user. While assuming the existence of some utility functions for the users, in contrast to the top-k query, it does not require a user to input his or her utility function but instead depends on the probability distribution of the utility functions. We prove that the average regret ratio is a supermodular function and provide a polynomial-time approximation algorithm to find a set that minimizes the average regret ratio for a database.
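The metric itself is easy to state: for a utility function f, the regret ratio of a selected set S is (max_{p in D} f(p) - max_{p in S} f(p)) / max_{p in D} f(p), and the average regret ratio is its expectation over the utility distribution. The sketch below estimates it with sampled linear utilities and runs a plain greedy heuristic; the paper's actual approximation algorithm and its guarantees are not reproduced here.

```python
import random

def avg_regret_ratio(db, S, utils):
    # arr(S) = E_f[(max_{p in db} f(p) - max_{p in S} f(p)) / max_{p in db} f(p)]
    total = 0.0
    for f in utils:
        best_db = max(f(p) for p in db)
        total += (best_db - max(f(p) for p in S)) / best_db
    return total / len(utils)

def greedy_min_arr(db, k, utils):
    # Plain greedy: repeatedly add the point that lowers arr the most.
    # Shown only to make the metric concrete, not the paper's algorithm.
    S = []
    for _ in range(k):
        S.append(min((p for p in db if p not in S),
                     key=lambda p: avg_regret_ratio(db, S + [p], utils)))
    return S

rng = random.Random(1)
db = [(rng.random(), rng.random()) for _ in range(50)]
# Sample linear utilities f(p) = w1*x1 + w2*x2 with random weights.
utils = [lambda p, w=(rng.random(), rng.random()): w[0] * p[0] + w[1] * p[1]
         for _ in range(100)]
S = greedy_min_arr(db, k=3, utils=utils)
print(round(avg_regret_ratio(db, S, utils), 4))  # small residual regret
```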
Citations: 16
ROLL: Fast In-Memory Generation of Gigantic Scale-free Networks
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2882964
A. Hadian, Sadegh Heyrani-Nobari, B. Minaei-Bidgoli, Qiang Qu
Real-world graphs are not always publicly available or sometimes do not meet specific research requirements. These challenges call for generating synthetic networks that follow the properties of real-world networks. Barabási-Albert (BA) is a well-known model for generating scale-free graphs, i.e., graphs with a power-law degree distribution. In the BA model, the network is generated through an iterative stochastic process called preferential attachment. Although BA is in high demand, due to the inherent complexity of preferential attachment this model cannot be scaled to generate billion-node graphs. In this paper, we propose ROLL-tree, a fast in-memory roulette wheel data structure that accelerates the BA network generation process by exploiting the statistical behaviors of the underlying growth model. Our proposed method has the following properties: (a) Fast: it performs over 1000 times faster than the state-of-the-art on a single-node PC; (b) Exact: it strictly follows the BA model, using an efficient data structure instead of approximation techniques; (c) Generalizable: it can be adapted for other "rich-get-richer" stochastic growth models. Our extensive experiments prove that ROLL-tree can effectively accelerate graph generation through the preferential attachment process. On a commodity single-processor machine, for example, ROLL-tree generates a scale-free graph of 1.1 billion nodes and 6.6 billion edges (the size of Yahoo's Webgraph) in 62 minutes, while the state-of-the-art takes about four years on the same machine.
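For context, a classic in-memory BA generator uses the repeated-endpoints array, where sampling uniformly from the array is automatically degree-proportional. This is the kind of baseline whose memory and cache behavior ROLL-tree improves on; it is not ROLL-tree itself.

```python
import random

def barabasi_albert(n, m, seed=0):
    # Classic BA generator: `targets` holds every edge endpoint, so each
    # node appears once per unit of degree, and a uniform choice from it
    # is preferential (degree-proportional) attachment.
    rng = random.Random(seed)
    edges = []
    targets = list(range(m))          # seed nodes
    for v in range(m, n):
        chosen = set()
        while len(chosen) < m:        # m distinct neighbors per new node
            chosen.add(rng.choice(targets))
        for u in chosen:
            edges.append((v, u))
            targets += (v, u)         # both endpoints gain one degree
    return edges

g = barabasi_albert(1000, 3)
print(len(g))  # (n - m) * m = 2991 edges
```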
Citations: 33
Exploring Privacy-Accuracy Tradeoffs using DPComp
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2899387
Michael Hay, Ashwin Machanavajjhala, G. Miklau, Yan Chen, Dan Zhang, G. Bissias
The emergence of differential privacy as a primary standard for privacy protection has led to the development, by the research community, of hundreds of algorithms for various data analysis tasks. Yet deployment of these techniques has been slowed by the complexity of the algorithms and an incomplete understanding of the cost to accuracy implied by the adoption of differential privacy. In this demonstration we present DPComp, a publicly accessible web-based system designed to support a broad community of users, including data analysts, privacy researchers, and data owners. Users can assess the accuracy of state-of-the-art privacy algorithms and interactively explore algorithm output in order to understand, both quantitatively and qualitatively, the error introduced by the algorithms. In addition, users can contribute new algorithms and new (non-sensitive) datasets. DPComp automatically incorporates user contributions into an evolving benchmark based on a rigorous evaluation methodology articulated by Hay et al. (SIGMOD 2016).
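As a flavor of the privacy-accuracy tradeoff such a benchmark measures, the sketch below runs the basic Laplace mechanism on a toy histogram and reports average per-bin L1 error at several privacy budgets. The dataset and error metric are illustrative stand-ins, not DPComp's algorithms or benchmarks.

```python
import math, random

def laplace(scale, rng):
    # Inverse-CDF sampling of Laplace(0, scale).
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def dp_histogram(counts, epsilon, rng):
    # Laplace mechanism: a histogram has L1 sensitivity 1, so adding
    # Laplace(1/epsilon) noise to each bin satisfies epsilon-DP.
    return [c + laplace(1.0 / epsilon, rng) for c in counts]

def avg_l1_error(counts, epsilon, trials=200, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        noisy = dp_histogram(counts, epsilon, rng)
        total += sum(abs(n - c) for n, c in zip(noisy, counts)) / len(counts)
    return total / trials

counts = [120, 40, 7, 0, 93]
for eps in (0.01, 0.1, 1.0):
    print(eps, round(avg_l1_error(counts, eps), 2))  # error shrinks roughly as 1/eps
```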
Citations: 15
TicToc: Time Traveling Optimistic Concurrency Control
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2882935
Xiangyao Yu, Andrew Pavlo, Daniel Sánchez, S. Devadas
Concurrency control for on-line transaction processing (OLTP) database management systems (DBMSs) is a nasty game. Achieving higher performance on emerging many-core systems is difficult. Previous research has shown that timestamp management is the key scalability bottleneck in concurrency control algorithms. This prevents the system from scaling to large numbers of cores. In this paper we present TicToc, a new optimistic concurrency control algorithm that avoids the scalability and concurrency bottlenecks of prior T/O schemes. TicToc relies on a novel and provably correct data-driven timestamp management protocol. Instead of assigning timestamps to transactions, this protocol assigns read and write timestamps to data items and uses them to lazily compute a valid commit timestamp for each transaction. TicToc removes the need for centralized timestamp allocation, and commits transactions that would be aborted by conventional T/O schemes. We implemented TicToc along with four other concurrency control algorithms in an in-memory, shared-everything OLTP DBMS and compared their performance on different workloads. Our results show that TicToc achieves up to 92% better throughput while reducing the abort rate by 3.3x over these previous algorithms.
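A simplified, single-threaded sketch of the data-driven timestamp idea: each item carries a write timestamp (wts) and a read timestamp (rts), and a transaction's commit timestamp is computed lazily from the items it touched rather than allocated centrally. The validation here is deliberately conservative and omits the write-set locking and rts-extension subtleties of the real protocol.

```python
class Item:
    def __init__(self, value=0):
        self.value, self.wts, self.rts = value, 0, 0  # write/read timestamps

def try_commit(read_set, write_set):
    # read_set: (item, wts observed at read time); write_set: items to update.
    commit_ts = 0
    for item, seen_wts in read_set:
        commit_ts = max(commit_ts, seen_wts)      # after the versions it read
    for item in write_set:
        commit_ts = max(commit_ts, item.rts + 1)  # after all reads of its writes
    for item, seen_wts in read_set:
        if item.wts != seen_wts:                  # version overwritten meanwhile
            return None                           # abort (conservative check)
        item.rts = max(item.rts, commit_ts)       # extend read validity interval
    for item in write_set:
        item.wts = item.rts = commit_ts
    return commit_ts

x, y = Item(10), Item(20)
ts = try_commit(read_set=[(x, x.wts)], write_set=[y])
print(ts, x.rts, y.wts)  # 1 1 1
```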
Citations: 142
TARDiS: A Branch-and-Merge Approach To Weak Consistency
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2882951
Natacha Crooks, Youer Pu, Nancy Estrada, Trinabh Gupta, L. Alvisi, Allen Clement
This paper presents the design, implementation, and evaluation of TARDiS (Transactional Asynchronously Replicated Divergent Store), a transactional key-value store explicitly designed for weakly-consistent systems. Reasoning about these systems is hard, as neither causal consistency nor per-object eventual convergence allows applications to deal satisfactorily with write-write conflicts. TARDiS instead exposes as its fundamental abstraction the set of conflicting branches that arise in weakly-consistent systems. To this end, TARDiS introduces a new concurrency control mechanism: branch-on-conflict. On the one hand, TARDiS guarantees that storage will appear sequential to any thread of execution that extends a branch, keeping application logic simple. On the other, TARDiS provides applications, when needed, with the tools and context necessary to merge branches atomically, when and how applications want. Since branch-on-conflict in TARDiS is fast, weakly-consistent applications can benefit from adopting this paradigm not only for operations issued by different sites, but also, when appropriate, for conflicting local operations. We find that TARDiS reduces coding complexity for these applications and that judicious branch-on-conflict can improve their local throughput at each site by two to eight times.
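A minimal branch-and-merge store conveys the abstraction: conflicting writers end up on separate branches, and the application merges them explicitly with its own resolver. The class and method names below are invented for illustration and are not TARDiS's interface.

```python
class BranchStore:
    # Sketch: each branch has its own view of the data; an explicit merge
    # combines two branches with an application-supplied conflict resolver.
    def __init__(self):
        self.branches = {"main": {}}

    def put(self, branch, key, value):
        self.branches[branch][key] = value

    def fork(self, parent, child):
        self.branches[child] = dict(self.branches[parent])

    def merge(self, a, b, into, resolve):
        va, vb = self.branches[a], self.branches[b]
        merged = dict(va)
        for k, v in vb.items():
            merged[k] = resolve(va[k], v) if k in va and va[k] != v else v
        self.branches[into] = merged

s = BranchStore()
s.put("main", "cart", ["book"])
s.fork("main", "site-A"); s.fork("main", "site-B")   # two sites diverge
s.put("site-A", "cart", ["book", "pen"])
s.put("site-B", "cart", ["book", "mug"])
s.merge("site-A", "site-B", "main",
        resolve=lambda x, y: sorted(set(x) | set(y)))
print(s.branches["main"]["cart"])  # ['book', 'mug', 'pen']
```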
Citations: 44
Low-Overhead Asynchronous Checkpointing in Main-Memory Database Systems
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2915966
Kun Ren, Thaddeus Diamond, D. Abadi, Alexander Thomson
As it becomes increasingly common for transaction processing systems to operate on datasets that fit within the main memory of a single machine or a cluster of commodity machines, traditional mechanisms for guaranteeing transaction durability---which typically involve synchronous log flushes---incur increasingly unappealing costs to otherwise lightweight transactions. Many applications have turned to periodically checkpointing full database state. However, existing checkpointing methods---even those which avoid freezing the storage layer---often come with significant costs to operation throughput, end-to-end latency, and total memory usage. This paper presents Checkpointing Asynchronously using Logical Consistency (CALC), a lightweight, asynchronous technique for capturing database snapshots that does not require a physical point of consistency to create a checkpoint, and avoids conspicuous latency spikes incurred by other database snapshotting schemes. Our experiments show that CALC can capture frequent checkpoints across a variety of transactional workloads with extremely small cost to transactional throughput and low additional memory usage compared to other state-of-the-art checkpointing systems.
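The copy-on-write core of the idea can be sketched in a few lines: once a checkpoint begins at a logical point of consistency, the first subsequent write to a key preserves that key's pre-checkpoint value, so the checkpointer can emit a consistent snapshot without freezing writers. This single-threaded sketch handles updates only and is not the paper's full CALC protocol.

```python
import json

class CalcStore:
    # Sketch of copy-on-write capture at a logical point of consistency.
    # Real CALC also handles inserts/deletes, concurrency, and phases.
    def __init__(self, data):
        self.live = dict(data)   # current values seen by transactions
        self.stable = {}         # pre-checkpoint values of keys updated mid-capture
        self.capturing = False

    def write(self, key, value):
        if self.capturing and key in self.live and key not in self.stable:
            self.stable[key] = self.live[key]  # preserve checkpoint-time value
        self.live[key] = value                 # writers never block

    def begin_checkpoint(self):
        # The logical point of consistency: writes before this call belong
        # to the checkpoint, writes after it do not.
        self.capturing, self.stable = True, {}

    def finish_checkpoint(self, path):
        snapshot = {k: self.stable.get(k, v) for k, v in self.live.items()}
        self.capturing = False
        with open(path, "w") as f:
            json.dump(snapshot, f)

db = CalcStore({"a": 1, "b": 2})
db.begin_checkpoint()
db.write("a", 100)                     # update arrives mid-checkpoint
db.finish_checkpoint("/tmp/ckpt.json")
print(open("/tmp/ckpt.json").read())   # {"a": 1, "b": 2}: consistent at begin
```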
Citations: 33
Set-based Similarity Search for Time Series
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2882963
Jinglin Peng, Hongzhi Wang, Jianzhong Li, Hong Gao
A fundamental problem for time series is k-nearest-neighbor (k-NN) query processing. However, existing methods are not fast enough for large datasets. In this paper, we propose a novel approach, STS3, which processes k-NN queries by transforming time series into sets and measuring similarity under the Jaccard metric. Our approach is more accurate than Dynamic Time Warping (DTW) in suitable scenarios, and it is faster than most existing methods thanks to efficient similarity search over sets. In addition, we developed an index, a pruning technique, and an approximation technique to improve the k-NN query procedure. As shown in the experimental results, all of them accelerate query processing effectively.
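A minimal version of the set-based idea: discretize each series onto a grid, treat the occupied cells as a set, and rank neighbors by Jaccard similarity. The grid transformation below is an illustrative stand-in for STS3's actual mapping, and no index or pruning is shown.

```python
import heapq

def series_to_set(series, step=0.5):
    # Quantize each (time, value) sample onto a grid; the series becomes
    # the set of occupied grid cells. Illustrative transformation only.
    return {(t, round(v / step)) for t, v in enumerate(series)}

def jaccard(a, b):
    return len(a & b) / len(a | b)

def knn(query, dataset, k=2):
    q = series_to_set(query)
    return heapq.nlargest(k, dataset, key=lambda s: jaccard(q, series_to_set(s)))

data = [[1.0, 1.2, 1.1, 0.9], [5.0, 5.1, 4.8, 5.2], [1.1, 1.3, 1.0, 1.0]]
print(knn([1.0, 1.1, 1.2, 1.0], data))  # the two series near value 1.0
```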
Citations: 23