Victor Zakhary, Faisal Nawab, D. Agrawal, A. E. Abbadi
Geo-replication is the process of maintaining copies of data at geographically dispersed datacenters for better availability and fault-tolerance. The distinguishing characteristic of geo-replication is the large wide-area latency between datacenters, which varies widely depending on their locations. Thus, choosing the datacenters in which to deploy a cloud application has a direct impact on the observed response time. We propose an optimization framework that automatically derives a geo-replication placement plan with the objective of minimizing latency. By running the optimization framework on real placement scenarios, we learn a set of placement optimizations for geo-replication. Some of these optimizations are surprising, while others are straightforward in retrospect. In this demonstration, we highlight these placement optimizations through the DB-Risk game. DB-Risk invites players to create different placement scenarios while experimenting with the proposed optimizations. The placements created by the players are tested on real cloud deployments.
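To make the placement problem concrete, here is a minimal sketch of a brute-force placement search, assuming majority-quorum commits and a hypothetical round-trip-time matrix; the region names and latencies are illustrative placeholders, and this is not the paper's optimization framework.

```python
from itertools import combinations

# Hypothetical symmetric round-trip times (ms) between candidate regions.
RTT_MS = {
    ("us-west", "us-east"): 70, ("us-west", "europe"): 150, ("us-west", "asia"): 120,
    ("us-east", "europe"): 90,  ("us-east", "asia"): 180,   ("europe", "asia"): 160,
}

def rtt(a, b):
    return 0 if a == b else RTT_MS.get((a, b), RTT_MS.get((b, a)))

def commit_latency(client, replicas):
    """Client contacts its nearest replica, which then waits for a majority quorum."""
    leader = min(replicas, key=lambda r: rtt(client, r))
    quorum_rtt = sorted(rtt(leader, r) for r in replicas)[len(replicas) // 2]
    return rtt(client, leader) + quorum_rtt

def best_placement(clients, candidates, n_replicas):
    """Exhaustively score every placement; practical only for small candidate sets."""
    return min(combinations(candidates, n_replicas),
               key=lambda reps: sum(commit_latency(c, reps) for c in clients))

regions = ["us-west", "us-east", "europe", "asia"]
print(best_placement(["us-west", "us-east", "europe"], regions, n_replicas=3))
```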
{"title":"DB-Risk: The Game of Global Database Placement","authors":"Victor Zakhary, Faisal Nawab, D. Agrawal, A. E. Abbadi","doi":"10.1145/2882903.2899405","DOIUrl":"https://doi.org/10.1145/2882903.2899405","url":null,"abstract":"Geo-replication is the process of maintaining copies of data at geographically dispersed datacenters for better availability and fault-tolerance. The distinguishing characteristic of geo-replication is the large wide-area latency between datacenters that varies widely depending on the location of the datacenters. Thus, choosing which datacenters to deploy a cloud application has a direct impact on the observable response time. We propose an optimization framework that automatically derives a geo-replication placement plan with the objective of minimizing latency. By running the optimization framework on real placement scenarios, we learn a set of placement optimizations for geo-replication. Some of these optimizations are surprising while others are in retrospect straight-forward. In this demonstration, we highlight the geo-replication placement optimizations through the DB-Risk game. DB-Risk invites players to create different placement scenarios while experimenting with the proposed optimizations. The placements created by the players are tested on real cloud deployments.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":"16 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76565282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Over the past decade, we have moved from a predominantly desktop-based web to a predominantly mobile web, where users most often access the web from mobile devices such as smartphones. In addition, we are witnessing a proliferation of geo-located, textual web content. Motivated in part by these developments, the research community has been hard at work enabling the efficient computation of a variety of query functionality on geo-textual data, yielding a sizable body of literature on the querying of geo-textual data. This tutorial focuses on different types of keyword-based queries on geo-textual data and also explores topics such as continuous queries on streaming geo-textual data, queries that retrieve attractive regions of geo-textual objects, and queries that extract properties, e.g., topics and top-k frequent words, of the objects in regions. The tutorial is designed to offer an overview of the problems addressed in this body of literature along with pertinent concepts and techniques. In addition, the tutorial suggests open problems and new research directions.
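As one concrete example of the query functionality surveyed here, the sketch below evaluates a Boolean range spatial-keyword query, returning the objects inside a rectangle whose text contains all query keywords; the linear scan and the sample points of interest are illustrative only, since real systems rely on hybrid spatio-textual indexes.

```python
def boolean_range_query(objects, region, keywords):
    """objects: list of (x, y, text); region: (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = region
    kws = {k.lower() for k in keywords}
    return [o for o in objects
            if xmin <= o[0] <= xmax and ymin <= o[1] <= ymax
            and kws <= set(o[2].lower().split())]

# Made-up points of interest for illustration.
pois = [(1.0, 2.0, "coffee shop with wifi"), (5.0, 5.0, "pizza restaurant")]
print(boolean_range_query(pois, (0, 0, 3, 3), ["coffee", "wifi"]))
```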
{"title":"Querying Geo-Textual Data: Spatial Keyword Queries and Beyond","authors":"G. Cong, Christian S. Jensen","doi":"10.1145/2882903.2912572","DOIUrl":"https://doi.org/10.1145/2882903.2912572","url":null,"abstract":"Over the past decade, we have moved from a predominantly desktop based web to a predominantly mobile web, where users most often access the web from mobile devices such as smartphones. In addition, we are witnessing a proliferation of geo-located, textual web content. Motivated in part by these developments, the research community has been hard at work enabling the efficient computation of a variety of query functionality on geo-textual data, yielding a sizable body of literature on the querying of geo-textual data. With a focus on different types of keyword-based queries on geo-textual data, the tutorial also explores topics such as continuous queries on streaming geo-textual data, queries that retrieve attractive regions of geo-textual objects, and queries that extract properties, e.g., topics and top-$k$ frequent words, of the objects in regions. The tutorial is designed to offer an overview of the problems addressed in this body of literature and offers an overview of pertinent concepts and techniques. In addition, the tutorial suggests open problems and new research direction.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":"41 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82575810","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The R-tree is a data structure used for multidimensional indexing. Essentially, it is a balanced tree of nested hyper-rectangles that are used to locate the data. One of the most performance-sensitive parts of this data structure is its split algorithm, which runs during node overflows. The split can be performed in multiple ways according to many different criteria, and in general the problem of finding an optimal solution is NP-hard, so many heuristic split algorithms exist. In this paper we study an existing k-means node split algorithm. We describe a number of serious issues in its theoretical foundation, which led us to re-design the k-means split. We propose several well-grounded solutions to the resulting k-means split problem. Finally, we report comparison results using PostgreSQL and a contemporary benchmark for multidimensional structures.
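For intuition, the following is a simplified, hypothetical sketch of a k-means (k = 2) node split: cluster the centers of the overflowing node's rectangles and assign each entry to the nearest centroid. It is not the authors' re-designed algorithm and ignores the issues (such as empty or badly unbalanced groups) that the paper addresses.

```python
import random

def center(rect):                      # rect = (xmin, ymin, xmax, ymax)
    return ((rect[0] + rect[2]) / 2.0, (rect[1] + rect[3]) / 2.0)

def dist(p, q):                        # squared Euclidean distance
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def mean(points):
    return (sum(p[0] for p in points) / len(points),
            sum(p[1] for p in points) / len(points))

def kmeans_split(rects, iters=10):
    """Split an overflowing node's rectangles into two groups via 2-means on centers."""
    pts = [center(r) for r in rects]
    c1, c2 = random.sample(pts, 2)     # random initial centroids
    for _ in range(iters):
        g1 = [p for p in pts if dist(p, c1) <= dist(p, c2)]
        g2 = [p for p in pts if dist(p, c1) > dist(p, c2)]
        if g1: c1 = mean(g1)
        if g2: c2 = mean(g2)
    left  = [r for r in rects if dist(center(r), c1) <= dist(center(r), c2)]
    right = [r for r in rects if dist(center(r), c1) >  dist(center(r), c2)]
    return left, right
```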
{"title":"K-means Split Revisited: Well-grounded Approach and Experimental Evaluation","authors":"V. Grigorev, G. Chernishev","doi":"10.1145/2882903.2914833","DOIUrl":"https://doi.org/10.1145/2882903.2914833","url":null,"abstract":"R-tree is a data structure used for multidimensional indexing. Essentially, it is a balanced tree consisting of nested hyper-rectangles which are used to locate the data. One of the most performance sensitive parts of this data structure is its split algorithm, which runs during node overflows. The split can be performed in multiple ways, according to many different criteria and in general the problem of finding an optimal solution is NP-hard. There are many heuristic split algorithms. In this paper we study an existing k-means node split algorithm. We describe a number of serious issues in its theoretical foundation, which made us to re-design k-means split. We propose several well-grounded solutions to the re-emerged problem of k-means split. Finally, we report the comparison results using PostgreSQL and contemporary benchmark for multidimensional structures.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":"16 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75172267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We propose "average regret ratio" as a metric to measure users' satisfaction after a user sees k selected points of a database, instead of all of the points in the database. We introduce the average regret ratio as another means of multi-criteria decision making. Unlike the original k-regret operator that uses the maximum regret ratio, the average regret ratio takes into account the satisfaction of a general user. While assuming the existence of some utility functions for the users, in contrast to the top-k query, it does not require a user to input his or her utility function but instead depends on the probability distribution of the utility functions. We prove that the average regret ratio is a supermodular function and provide a polynomial-time approximation algorithm to find the average regret ratio minimizing set for a database.
{"title":"Minimizing Average Regret Ratio in Database","authors":"Sepanta Zeighami, R. C. Wong","doi":"10.1145/2882903.2914831","DOIUrl":"https://doi.org/10.1145/2882903.2914831","url":null,"abstract":"We propose \"average regret ratio\" as a metric to measure users' satisfaction after a user sees k selected points of a database, instead of all of the points in the database. We introduce the average regret ratio as another means of multi-criteria decision making. Unlike the original k-regret operator that uses the maximum regret ratio, the average regret ratio takes into account the satisfaction of a general user. While assuming the existence of some utility functions for the users, in contrast to the top-k query, it does not require a user to input his or her utility function but instead depends on the probability distribution of the utility functions. We prove that the average regret ratio is a supermodular function and provide a polynomial-time approximation algorithm to find the average regret ratio minimizing set for a database.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":"43 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86836402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Hadian, Sadegh Heyrani-Nobari, B. Minaei-Bidgoli, Qiang Qu
Real-world graphs are not always publicly available or sometimes do not meet specific research requirements. These challenges call for generating synthetic networks that follow the properties of real-world networks. Barabási-Albert (BA) is a well-known model for generating scale-free graphs, i.e., graphs with a power-law degree distribution. In the BA model, the network is generated through an iterative stochastic process called preferential attachment. Although the BA model is in high demand, the inherent complexity of preferential attachment prevents it from scaling to billion-node graphs. In this paper, we propose ROLL-tree, a fast in-memory roulette wheel data structure that accelerates the BA network generation process by exploiting the statistical behavior of the underlying growth model. Our proposed method has the following properties: (a) Fast: it runs more than 1000 times faster than the state of the art on a single-node PC; (b) Exact: it strictly follows the BA model, using an efficient data structure instead of approximation techniques; (c) Generalizable: it can be adapted to other "rich-get-richer" stochastic growth models. Our extensive experiments show that ROLL-tree effectively accelerates graph generation through the preferential attachment process. On a commodity single-processor machine, for example, ROLL-tree generates a scale-free graph of 1.1 billion nodes and 6.6 billion edges (the size of Yahoo's Webgraph) in 62 minutes, while the state-of-the-art (SA) takes about four years on the same machine.
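For reference, the baseline process that ROLL-tree accelerates looks like the sketch below, which implements preferential attachment with the classic "repeated nodes" list (each node appears once per unit of degree), i.e., a naive roulette wheel; the seeding of the initial m nodes is a simplification, and the ROLL-tree data structure itself is not shown.

```python
import random

def barabasi_albert(n, m):
    """Grow a scale-free graph to n nodes; each new node attaches to m existing nodes."""
    edges = []
    repeated = list(range(m))          # seed nodes, one "ticket" each (simplification)
    for v in range(m, n):
        targets = set()
        while len(targets) < m:        # pick m distinct, degree-biased targets
            targets.add(random.choice(repeated))
        for u in targets:
            edges.append((v, u))
            repeated.extend([v, u])    # endpoints gain one ticket per new edge
    return edges

edges = barabasi_albert(n=10_000, m=3)
print(len(edges))                      # (n - m) * m edges
```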
{"title":"ROLL: Fast In-Memory Generation of Gigantic Scale-free Networks","authors":"A. Hadian, Sadegh Heyrani-Nobari, B. Minaei-Bidgoli, Qiang Qu","doi":"10.1145/2882903.2882964","DOIUrl":"https://doi.org/10.1145/2882903.2882964","url":null,"abstract":"Real-world graphs are not always publicly available or sometimes do not meet specific research requirements. These challenges call for generating synthetic networks that follow properties of the real-world networks. Barabási-Albert (BA) is a well-known model for generating scale-free graphs, i.e graphs with power-law degree distribution. In BA model, the network is generated through an iterative stochastic process called preferential attachment. Although BA is highly demanded, due to the inherent complexity of the preferential attachment, this model cannot be scaled to generate billion-node graphs. In this paper, we propose ROLL-tree, a fast in-memory roulette wheel data structure that accelerates the BA network generation process by exploiting the statistical behaviors of the underlying growth model. Our proposed method has the following properties: (a) Fast: It performs +1000 times faster than the state-of-the-art on a single node PC; (b) Exact: It strictly follows the BA model, using an efficient data structure instead of approximation techniques; (c) Generalizable: It can be adapted for other \"rich-get-richer\" stochastic growth models. Our extensive experiments prove that ROLL-tree can effectively accelerate graph-generation through the preferential attachment process. On a commodity single processor machine, for example, ROLL-tree generates a scale-free graph of 1.1 billion nodes and 6.6 billion edges (the size of Yahoo's Webgraph) in 62 minutes while the state-of-the-art (SA) takes about four years on the same machine.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":"28 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87627783","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Michael Hay, Ashwin Machanavajjhala, G. Miklau, Yan Chen, Dan Zhang, G. Bissias
The emergence of differential privacy as a primary standard for privacy protection has led the research community to develop hundreds of algorithms for various data analysis tasks. Yet deployment of these techniques has been slowed by the complexity of the algorithms and an incomplete understanding of the cost to accuracy implied by the adoption of differential privacy. In this demonstration we present DPComp, a publicly accessible web-based system designed to support a broad community of users, including data analysts, privacy researchers, and data owners. With DPComp, users can assess the accuracy of state-of-the-art privacy algorithms and interactively explore algorithm output in order to understand, both quantitatively and qualitatively, the error introduced by the algorithms. In addition, users can contribute new algorithms and new (non-sensitive) datasets. DPComp automatically incorporates user contributions into an evolving benchmark based on a rigorous evaluation methodology articulated by Hay et al. (SIGMOD 2016).
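The kind of privacy-accuracy tradeoff DPComp lets users explore can be illustrated standalone: answer a histogram query with the Laplace mechanism at several privacy budgets and measure the error against the true counts. The counts and epsilon values below are made up, and this is not DPComp's code or its benchmark methodology.

```python
import math
import random

def laplace(scale):
    """Draw Laplace noise via inverse-CDF sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_histogram(counts, epsilon):
    # A histogram query has sensitivity 1: one person changes one bucket by one.
    return [c + laplace(1.0 / epsilon) for c in counts]

def l1_error(true_counts, noisy_counts):
    return sum(abs(t - n) for t, n in zip(true_counts, noisy_counts))

true_counts = [120, 45, 230, 8, 97]          # made-up, non-sensitive example counts
for eps in (0.01, 0.1, 1.0):
    noisy = private_histogram(true_counts, eps)
    print(f"epsilon={eps}: L1 error ~ {l1_error(true_counts, noisy):.1f}")
```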
{"title":"Exploring Privacy-Accuracy Tradeoffs using DPComp","authors":"Michael Hay, Ashwin Machanavajjhala, G. Miklau, Yan Chen, Dan Zhang, G. Bissias","doi":"10.1145/2882903.2899387","DOIUrl":"https://doi.org/10.1145/2882903.2899387","url":null,"abstract":"The emergence of differential privacy as a primary standard for privacy protection has led to the development, by the research community, of hundreds of algorithms for various data analysis tasks. Yet deployment of these techniques has been slowed by the complexity of algorithms and an incomplete understanding of the cost to accuracy implied by the adoption of differential privacy. In this demonstration we present DPComp, a publicly-accessible web-based system, designed to support a broad community of users, including data analysts, privacy researchers, and data owners. Users can use DPComp to assess the accuracy of state-of-the-art privacy algorithms and interactively explore algorithm output in order to understand, both quantitatively and qualitatively, the error introduced by the algorithms. In addition, users can contribute new algorithms and new (non-sensitive) datasets. DPComp automatically incorporates user contributions into an evolving benchmark based on a rigorous evaluation methodology articulated by Hay et al. (SIGMOD 2016).","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":"114 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88081703","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xiangyao Yu, Andrew Pavlo, Daniel Sánchez, S. Devadas
Concurrency control for on-line transaction processing (OLTP) database management systems (DBMSs) is a nasty game. Achieving higher performance on emerging many-core systems is difficult. Previous research has shown that timestamp management is the key scalability bottleneck in concurrency control algorithms, preventing these systems from scaling to large numbers of cores. In this paper we present TicToc, a new optimistic concurrency control algorithm that avoids the scalability and concurrency bottlenecks of prior T/O schemes. TicToc relies on a novel and provably correct data-driven timestamp management protocol. Instead of assigning timestamps to transactions, this protocol assigns read and write timestamps to data items and uses them to lazily compute a valid commit timestamp for each transaction. TicToc removes the need for centralized timestamp allocation and commits transactions that would be aborted by conventional T/O schemes. We implemented TicToc along with four other concurrency control algorithms in an in-memory, shared-everything OLTP DBMS and compared their performance on different workloads. Our results show that TicToc achieves up to 92% better throughput while reducing the abort rate by 3.3x relative to these previous algorithms.
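A highly simplified, single-threaded sketch of the data-driven timestamp idea appears below: each item carries a write timestamp (wts) and a read timestamp (rts), and a transaction derives its commit timestamp from the items it accessed rather than from a central allocator. Locking, concurrent validation, and many protocol details are omitted, so treat it as an illustration of the idea rather than TicToc itself.

```python
class Item:
    """A data item carrying TicToc-style read and write timestamps."""
    def __init__(self, value):
        self.value, self.wts, self.rts = value, 0, 0

class Txn:
    def __init__(self):
        self.reads = []        # list of (item, wts observed at read time)
        self.writes = {}       # item -> new value

    def read(self, item):
        self.reads.append((item, item.wts))
        return self.writes.get(item, item.value)

    def write(self, item, value):
        self.writes[item] = value

    def commit(self):
        # Derive a commit timestamp from the accessed items: no earlier than any
        # version we read, and later than any recorded reader of items we overwrite.
        commit_ts = max([wts for _, wts in self.reads] +
                        [item.rts + 1 for item in self.writes] + [0])
        for item, observed_wts in self.reads:
            if item.wts != observed_wts:           # the version changed under us
                return False                        # abort
            item.rts = max(item.rts, commit_ts)     # extend the read's validity
        for item, value in self.writes.items():
            item.value, item.wts, item.rts = value, commit_ts, commit_ts
        return True

# Example: a read-modify-write transaction on a counter.
counter = Item(0)
t = Txn()
t.write(counter, t.read(counter) + 1)
print(t.commit(), counter.value, counter.wts)      # True 1 1
```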
{"title":"TicToc: Time Traveling Optimistic Concurrency Control","authors":"Xiangyao Yu, Andrew Pavlo, Daniel Sánchez, S. Devadas","doi":"10.1145/2882903.2882935","DOIUrl":"https://doi.org/10.1145/2882903.2882935","url":null,"abstract":"Concurrency control for on-line transaction processing (OLTP) database management systems (DBMSs) is a nasty game. Achieving higher performance on emerging many-core systems is difficult. Previous research has shown that timestamp management is the key scalability bottleneck in concurrency control algorithms. This prevents the system from scaling to large numbers of cores. In this paper we present TicToc, a new optimistic concurrency control algorithm that avoids the scalability and concurrency bottlenecks of prior T/O schemes. TicToc relies on a novel and provably correct data-driven timestamp management protocol. Instead of assigning timestamps to transactions, this protocol assigns read and write timestamps to data items and uses them to lazily compute a valid commit timestamp for each transaction. TicToc removes the need for centralized timestamp allocation, and commits transactions that would be aborted by conventional T/O schemes. We implemented TicToc along with four other concurrency control algorithms in an in-memory, shared-everything OLTP DBMS and compared their performance on different workloads. Our results show that TicToc achieves up to 92% better throughput while reducing the abort rate by 3.3x over these previous algorithms.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":"58 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90617416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Natacha Crooks, Youer Pu, Nancy Estrada, Trinabh Gupta, L. Alvisi, Allen Clement
This paper presents the design, implementation, and evaluation of TARDiS (Transactional Asynchronously Replicated Divergent Store), a transactional key-value store explicitly designed for weakly-consistent systems. Reasoning about these systems is hard, as neither causal consistency nor per-object eventual convergence allow applications to deal satisfactorily with write-write conflicts. TARDiS instead exposes as its fundamental abstraction the set of conflicting branches that arise in weakly-consistent systems. To this end, TARDiS introduces a new concurrency control mechanism: branch-on-conflict. On the one hand, TARDiS guarantees that storage will appear sequential to any thread of execution that extends a branch, keeping application logic simple. On the other, TARDiS provides applications, when needed, with the tools and context necessary to merge branches atomically, when and how applications want. Since branch-on-conflict in TARDiS is fast, weakly-consistent applications can benefit from adopting this paradigm not only for operations issued by different sites, but also, when appropriate, for conflicting local operations. We find that TARDiS reduces coding complexity for these applications and that judicious branch-on-conflict can improve their local throughput at each site by two to eight times.
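The flavor of the branch-on-conflict abstraction can be conveyed with a toy per-key store: writes that do not causally extend an existing version are kept as sibling branches, reads return every branch, and the application merges explicitly. TARDiS branches whole database states and transactions, so this sketch (with invented names) captures only the surface of the idea.

```python
class BranchingKV:
    """Toy per-key branch-on-conflict store (invented API, illustration only)."""
    def __init__(self):
        self.branches = {}                           # key -> list of (version, value)

    def put(self, key, value, parent_version=None):
        siblings = self.branches.setdefault(key, [])
        # A write causally extends its parent; other branches remain as conflicts.
        self.branches[key] = [b for b in siblings if b[0] != parent_version]
        version = object()                           # fresh opaque version token
        self.branches[key].append((version, value))
        return version

    def get(self, key):
        """Return every conflicting branch; a single entry means no conflict."""
        return list(self.branches.get(key, []))

    def merge(self, key, merge_fn):
        """Collapse all branches with an application-supplied merge function."""
        values = [v for _, v in self.branches.get(key, [])]
        if len(values) > 1:
            self.branches[key] = []
            return self.put(key, merge_fn(values))

store = BranchingKV()
v1 = store.put("cart", {"milk"})
store.put("cart", {"milk", "eggs"}, parent_version=v1)   # causal update, replaces v1
store.put("cart", {"milk", "tea"})                       # concurrent write: sibling branch
print(len(store.get("cart")))                            # 2 conflicting branches
store.merge("cart", lambda vals: set().union(*vals))     # application-level union merge
print(store.get("cart")[0][1])                           # {'milk', 'eggs', 'tea'}
```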
{"title":"TARDiS: A Branch-and-Merge Approach To Weak Consistency","authors":"Natacha Crooks, Youer Pu, Nancy Estrada, Trinabh Gupta, L. Alvisi, Allen Clement","doi":"10.1145/2882903.2882951","DOIUrl":"https://doi.org/10.1145/2882903.2882951","url":null,"abstract":"This paper presents the design, implementation, and evaluation of TARDiS (Transactional Asynchronously Replicated Divergent Store), a transactional key-value store explicitly designed for weakly-consistent systems. Reasoning about these systems is hard, as neither causal consistency nor per-object eventual convergence allow applications to deal satisfactorily with write-write conflicts. TARDiS instead exposes as its fundamental abstraction the set of conflicting branches that arise in weakly-consistent systems. To this end, TARDiS introduces a new concurrency control mechanism: branch-on-conflict. On the one hand, TARDiS guarantees that storage will appear sequential to any thread of execution that extends a branch, keeping application logic simple. On the other, TARDiS provides applications, when needed, with the tools and context necessary to merge branches atomically, when and how applications want. Since branch-on-conflict in TARDiS is fast, weakly-consistent applications can benefit from adopting this paradigm not only for operations issued by different sites, but also, when appropriate, for conflicting local operations. We find that TARDiS reduces coding complexity for these applications and that judicious branch-on-conflict can improve their local throughput at each site by two to eight times.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":"130 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79591440","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kun Ren, Thaddeus Diamond, D. Abadi, Alexander Thomson
As it becomes increasingly common for transaction processing systems to operate on datasets that fit within the main memory of a single machine or a cluster of commodity machines, traditional mechanisms for guaranteeing transaction durability---which typically involve synchronous log flushes---incur increasingly unappealing costs to otherwise lightweight transactions. Many applications have turned to periodically checkpointing full database state. However, existing checkpointing methods---even those which avoid freezing the storage layer---often come with significant costs to operation throughput, end-to-end latency, and total memory usage. This paper presents Checkpointing Asynchronously using Logical Consistency (CALC), a lightweight, asynchronous technique for capturing database snapshots that does not require a physical point of consistency to create a checkpoint, and avoids conspicuous latency spikes incurred by other database snapshotting schemes. Our experiments show that CALC can capture frequent checkpoints across a variety of transactional workloads with extremely small cost to transactional throughput and low additional memory usage compared to other state-of-the-art checkpointing systems.
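As a rough illustration of the asynchronous flavor (not CALC's actual algorithm, which uses virtual points of consistency and multiple phases), the sketch below switches a store to copy-on-update while a checkpoint is being drained, so each record's pre-checkpoint value is preserved until a background pass has written it out and foreground writes never block on the snapshot.

```python
class CheckpointingStore:
    """Copy-on-update snapshotting so foreground writes never wait on a checkpoint."""
    def __init__(self, data):
        self.live = dict(data)       # current record values
        self.stable = {}             # pre-checkpoint values, saved lazily on update
        self.checkpointing = False

    def write(self, key, value):
        if self.checkpointing and key not in self.stable:
            self.stable[key] = self.live.get(key)    # preserve the snapshot version
        self.live[key] = value

    def begin_checkpoint(self):
        self.stable = {}
        self.checkpointing = True

    def drain_checkpoint(self, emit):
        """Background pass: emit each record's snapshot value, then clean up."""
        for key in list(self.live):
            emit(key, self.stable.get(key, self.live[key]))
        self.stable = {}
        self.checkpointing = False

store = CheckpointingStore({"a": 1, "b": 2})
store.begin_checkpoint()
store.write("a", 10)                                  # updated during the checkpoint
store.drain_checkpoint(lambda k, v: print("checkpoint", k, v))  # a -> 1, b -> 2
print(store.live)                                     # {'a': 10, 'b': 2}
```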
{"title":"Low-Overhead Asynchronous Checkpointing in Main-Memory Database Systems","authors":"Kun Ren, Thaddeus Diamond, D. Abadi, Alexander Thomson","doi":"10.1145/2882903.2915966","DOIUrl":"https://doi.org/10.1145/2882903.2915966","url":null,"abstract":"As it becomes increasingly common for transaction processing systems to operate on datasets that fit within the main memory of a single machine or a cluster of commodity machines, traditional mechanisms for guaranteeing transaction durability---which typically involve synchronous log flushes---incur increasingly unappealing costs to otherwise lightweight transactions. Many applications have turned to periodically checkpointing full database state. However, existing checkpointing methods---even those which avoid freezing the storage layer---often come with significant costs to operation throughput, end-to-end latency, and total memory usage. This paper presents Checkpointing Asynchronously using Logical Consistency (CALC), a lightweight, asynchronous technique for capturing database snapshots that does not require a physical point of consistency to create a checkpoint, and avoids conspicuous latency spikes incurred by other database snapshotting schemes. Our experiments show that CALC can capture frequent checkpoints across a variety of transactional workloads with extremely small cost to transactional throughput and low additional memory usage compared to other state-of-the-art checkpointing systems.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":"21 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84765807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jinglin Peng, Hongzhi Wang, Jianzhong Li, Hong Gao
A fundamental problem for time series is k-nearest-neighbor (k-NN) query processing. However, existing methods are not fast enough for large datasets. In this paper, we propose a novel approach, STS3, that processes k-NN queries by transforming time series into sets and measuring similarity under the Jaccard metric. Our approach is more accurate than Dynamic Time Warping (DTW) in suitable scenarios, and it is faster than most existing methods thanks to efficient similarity search for sets. In addition, we develop an index, a pruning technique, and an approximation technique to improve k-NN query processing. As the experimental results show, all of these accelerate query processing effectively.
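The set-based idea can be sketched as follows: discretize a time series into tokens, treat the tokens as a set, and rank neighbors by Jaccard similarity. The particular tokenization below (position plus quantized value, with a made-up bin width) is an assumption for illustration and may differ from the transformation STS3 actually uses.

```python
def to_token_set(series, bin_width=0.5):
    """Tokenize a series as (position, quantized value) pairs; bin_width is made up."""
    return {(i, round(x / bin_width)) for i, x in enumerate(series)}

def jaccard(a, b):
    return len(a & b) / len(a | b) if (a or b) else 1.0

def knn(query, dataset, k):
    """dataset: list of (label, series); return the labels of the k nearest series."""
    q = to_token_set(query)
    scored = sorted(((jaccard(q, to_token_set(s)), label) for label, s in dataset),
                    reverse=True)
    return [label for _, label in scored[:k]]

data = [("flat", [0.0, 0.1, 0.0, 0.1]), ("ramp", [0.0, 0.5, 1.0, 1.5])]
print(knn([0.0, 0.1, 0.1, 0.0], data, k=1))           # ['flat']
```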
{"title":"Set-based Similarity Search for Time Series","authors":"Jinglin Peng, Hongzhi Wang, Jianzhong Li, Hong Gao","doi":"10.1145/2882903.2882963","DOIUrl":"https://doi.org/10.1145/2882903.2882963","url":null,"abstract":"A fundamental problem of time series is k nearest neighbor (k-NN) query processing. However, existing methods are not fast enough for large dataset. In this paper, we propose a novel approach, STS3, to process k-NN queries by transforming time series to sets and measure the similarity under Jaccard metric. Our approach is more accurate than Dynamic Time Warping(DTW) in our suitable scenarios and it is faster than most of the existing methods, due to the efficient similarity search for sets. Besides, we also developed an index, a pruning and an approximation technique to improve the k-NN query procedure. As shown in the experimental results, all of them could accelerate the query processing effectively.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":"27 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88196065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}