
Proceedings of the 2016 International Conference on Management of Data: Latest Publications

Transaction Healing: Scaling Optimistic Concurrency Control on Multicores
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2915202
Yingjun Wu, C. Chan, K. Tan
Today's main-memory databases can support very high transaction rates for OLTP applications. However, when a large number of concurrent transactions contend on the same data records, system performance can deteriorate significantly. This is especially the case when scaling transaction processing with optimistic concurrency control (OCC) on multicore machines. In this paper, we propose a new concurrency-control mechanism, called transaction healing, that exploits program semantics to scale conventional OCC to dozens of cores even under highly contended workloads. Transaction healing captures the dependencies across operations within a transaction prior to its execution. Instead of blindly rejecting a transaction once its validation fails, the proposed mechanism judiciously restores any non-serializable operation and heals inconsistent transaction states as well as query results according to the extracted dependencies. Transaction healing can partially update the membership of read/write sets when processing dependent transactions. Such overhead, however, is largely reduced by carefully avoiding false aborts and rearranging validation orders. We implemented the idea of transaction healing in TheDB, a main-memory database prototype that provides full ACID guarantees with a scalable commit protocol. By evaluating TheDB on a 48-core machine with two widely used benchmarks, we confirm that transaction healing scales near-linearly, yielding significantly higher transaction rates than state-of-the-art OCC implementations.
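The abstract's core idea, re-executing only the stale operations at validation time instead of aborting the whole transaction, can be illustrated with a toy single-threaded sketch. All names here (`Store`, `run_occ_with_healing`) are hypothetical illustrations, not TheDB's actual API.

```python
# Toy sketch of OCC validation with "healing" instead of abort.
class Store:
    def __init__(self):
        self.data = {}      # key -> value
        self.version = {}   # key -> version counter

    def read(self, key):
        return self.data.get(key, 0), self.version.get(key, 0)

    def write(self, key, value):
        self.data[key] = value
        self.version[key] = self.version.get(key, 0) + 1

def run_occ_with_healing(store, ops):
    """ops: list of (key_in, key_out, fn); each op writes fn(value of key_in)
    to key_out. Execute optimistically, then re-validate read versions;
    instead of aborting on a stale read, re-execute (heal) only the
    operations that depended on the changed record."""
    read_versions, buffered = {}, {}

    def execute(op):
        key_in, key_out, fn = op
        val, ver = store.read(key_in)
        read_versions[key_in] = ver
        buffered[key_out] = fn(val)

    for op in ops:                      # optimistic execution phase
        execute(op)
    healed = 0
    for op in ops:                      # validation phase
        key_in, _, _ = op
        if store.version.get(key_in, 0) != read_versions[key_in]:
            execute(op)                 # heal: redo just this operation
            healed += 1
    for key, val in buffered.items():   # install writes (single-threaded toy)
        store.write(key, val)
    return healed
```

A real implementation would track fine-grained dependencies between operations and interleave healing with a concurrent commit protocol; this sketch only shows the validate-then-repair control flow.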
Citations: 43
What Makes a Good Physical Plan?: Experiencing Hardware-Conscious Query Optimization with Candomblé
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2899410
H. Pirk, Oscar Moll, S. Madden
Query optimization is hard, and the current proliferation of "modern" hardware does nothing to make it any easier. In addition, the tools commonly used by performance engineers, such as compiler intrinsics, static analyzers, and hardware performance counters, are neither integrated with data management systems nor easy to learn. This makes it (unnecessarily) hard to educate engineers, to prototype, and to optimize database query plans for modern hardware. To address this problem, we developed a system called Candomblé that lets database performance engineers interactively examine, optimize, and evaluate query plans using a touch-based interface. Candomblé puts attendees in the place of a physical query optimizer that has to rewrite a physical query plan into a better equivalent plan. Attendees experience the challenges of ad-hoc optimization of a physical plan for processing devices such as GPUs and CPUs, and capture the knowledge they gain as rules to be used by a rule-based optimizer.
Citations: 5
PerNav: A Route Summarization Framework for Personalized Navigation
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2899384
Yaguang Li, Han Su, Ugur Demiryurek, Bolong Zheng, Kai Zeng, C. Shahabi
In this paper, we study a route summarization framework for personalized navigation, dubbed PerNav, whose goal is to generate more intuitive and customized turn-by-turn directions based on user-generated content. The turn-by-turn directions provided by existing navigation applications are derived exclusively from underlying road network topology information, i.e., the connectivity of nodes to each other. Therefore, the turn-by-turn directions are simplified as metric translations of the physical world (e.g., distance or time to turn) into spoken language. Such translation, which ignores human cognition of geographic space, is often verbose and redundant for drivers who are familiar with the geographical area. PerNav utilizes a wealth of user-generated historical trajectory data to extract "landmarks" (e.g., points of interest or intersections) and frequently visited routes between them from the road network. This extracted information is then used to produce cognitive turn-by-turn directions customized for each user.
Citations: 5
Similarity Join over Array Data
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2915247
Weijie Zhao, Florin Rusu, Bin Dong, Kesheng Wu
Scientific applications are generating an ever-increasing volume of multi-dimensional data that are largely processed inside distributed array databases and frameworks. Similarity join is a fundamental operation across scientific workloads that requires complex processing over an unbounded number of pairs of multi-dimensional points. In this paper, we introduce a novel distributed similarity join operator for multi-dimensional arrays. Unlike immediate extensions of array join and relational similarity join, the proposed operator minimizes the overall data transfer and network congestion while providing load balancing, without completely repartitioning and replicating the input arrays. We formally define array similarity join and present the design, optimization strategies, and evaluation of the first array similarity join operator.
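To make the operation concrete: a similarity (epsilon) join pairs up all points within distance eps of each other. A standard single-node baseline buckets points into a grid of cells of side eps and compares only neighboring cells, which is a minimal sketch of the pruning idea, not the paper's distributed operator.

```python
import itertools
import math

def sim_join(points, eps):
    """Return the set of index pairs (i, j), i < j, whose points lie
    within Euclidean distance eps of each other."""
    grid = {}
    for i, p in enumerate(points):
        cell = tuple(int(math.floor(c / eps)) for c in p)
        grid.setdefault(cell, []).append(i)

    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    pairs = set()
    for cell, idxs in grid.items():
        # examine this cell and all neighbors so boundary pairs are not missed
        for off in itertools.product((-1, 0, 1), repeat=len(cell)):
            ncell = tuple(c + o for c, o in zip(cell, off))
            for i in idxs:
                for j in grid.get(ncell, []):
                    if i < j and dist(points[i], points[j]) <= eps:
                        pairs.add((i, j))
    return pairs
```

The distributed version described in the abstract additionally decides which cells to ship where so as to minimize transfer while balancing load; that logic is beyond this sketch.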
Citations: 27
Local Similarity Search for Unstructured Text
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2915211
Pei Wang, Chuan Xiao, Jianbin Qin, Wei Wang, Xiaoyan Zhang, Y. Ishikawa
With the growing popularity of electronic documents, replication can occur for many reasons. People may copy text segments from various sources and make modifications. In this paper, we study the problem of local similarity search to find partially replicated text. Unlike existing studies on similarity search, which find entirely duplicated documents, our target is to identify documents that approximately share a pair of sliding windows which differ by no more than τ tokens. Our problem is technically challenging because for sliding windows the tokens to be indexed are less selective than entire documents, rendering set similarity join-based algorithms less efficient. Our proposed method is based on enumerating token combinations to obtain signatures with high selectivity. In order to strike a balance between signature and candidate generation, we partition the token universe, and for different partitions we generate combinations composed of different numbers of tokens. A cost-aware algorithm is devised to find a good partitioning of the token universe. We also propose to leverage the overlap between adjacent windows to share computation and thus speed up query processing. In addition, we develop techniques to support large thresholds. Experiments on real datasets demonstrate the efficiency of our method against alternative solutions.
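The query the paper accelerates can be stated as a brute-force baseline: report pairs of size-w sliding windows, one per document, whose token multisets differ by at most τ tokens. The window size `w` and the multiset-difference measure below are illustrative assumptions; the paper's contribution is the signature scheme that prunes this quadratic search.

```python
from collections import Counter

def local_similar_windows(doc_a, doc_b, w, tau):
    """Return (i, j) pairs such that the w-token windows starting at
    token i of doc_a and token j of doc_b differ by at most tau tokens
    (counting the symmetric multiset difference)."""
    a, b = doc_a.split(), doc_b.split()
    matches = []
    for i in range(len(a) - w + 1):
        ca = Counter(a[i:i + w])
        for j in range(len(b) - w + 1):
            cb = Counter(b[j:j + w])
            # symmetric multiset difference between the two windows
            diff = sum((ca - cb).values()) + sum((cb - ca).values())
            if diff <= tau:
                matches.append((i, j))
    return matches
```

This baseline compares every window pair; the signature-based method in the paper avoids enumerating most pairs by requiring candidate windows to share a selective token combination.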
Citations: 25
Top-k Relevant Semantic Place Retrieval on Spatial RDF Data
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2882941
Jieming Shi, Dingming Wu, N. Mamoulis
RDF data are traditionally accessed using structured query languages, such as SPARQL. However, this requires users to understand the language as well as the RDF schema. Keyword search on RDF data aims to relieve the user of these requirements; the user only inputs a set of keywords, and the goal is to find small RDF subgraphs which contain all keywords. At the same time, popular RDF knowledge bases also include spatial semantics, which opens the road to location-based search operations. In this work, we propose and study a novel location-based keyword search query on RDF data. The objective of top-k relevant semantic place (kSP) retrieval is to find RDF subgraphs which contain the query keywords and are rooted at spatial entities close to the query location. The novelty of kSP queries is that they are location-aware and that they do not rely on the use of structured query languages. We design a basic method for the processing of kSP queries. To further accelerate kSP retrieval, two pruning approaches and a data preprocessing technique are proposed. Extensive empirical studies on two real datasets demonstrate the superior and robust performance of our proposals compared to the basic method.
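A much-simplified flavor of kSP-style retrieval: among candidate places that cover all query keywords, return the k spatially closest to the query location. The flat `(name, location, keyword-set)` representation and the ranking purely by distance are illustrative assumptions; the actual paper ranks rooted RDF subgraphs, not flat records.

```python
import heapq
import math

def top_k_places(places, query_loc, keywords, k):
    """places: list of (name, (x, y), set_of_keywords).
    Return names of the k keyword-covering places nearest to query_loc."""
    qx, qy = query_loc
    candidates = []
    for name, (x, y), kws in places:
        if not set(keywords) <= kws:
            continue  # prune: place must contain all query keywords
        d = math.hypot(x - qx, y - qy)
        candidates.append((d, name))
    return [name for _, name in heapq.nsmallest(k, candidates)]
```

The pruning step mirrors the paper's intuition that keyword containment and spatial proximity can each eliminate candidates early, before any expensive subgraph exploration.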
Citations: 27
Adaptive Data Skipping in Main-Memory Systems
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2914836
Wilson Qin, Stratos Idreos
As modern main-memory optimized data systems increasingly rely on fast scans, lightweight indexes that allow for data skipping play a crucial role in data filtering to reduce system I/O. Scans benefit from data skipping when the data order is sorted, semi-sorted, or comprised of clustered values. However, data skipping loses effectiveness over arbitrary data distributions. Applying data skipping techniques over non-sorted data can significantly decrease query performance, since the extra cost of metadata reads results in no corresponding scan performance gains. We introduce adaptive data skipping as a framework for structures and techniques that respond to a vast array of data distributions and query workloads. We present an adaptive zonemap design and implementation on a main-memory column-store prototype to demonstrate that adaptive data skipping has the potential for a 1.4X speedup.
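The underlying zonemap mechanism is simple to sketch: each fixed-size zone of a column keeps its min and max, and an equality scan skips any zone whose range cannot contain the predicate value. This is a minimal, non-adaptive sketch; the paper's contribution is adapting when to maintain and consult such metadata.

```python
def build_zonemap(column, zone_size):
    """Return per-zone (min, max) pairs for fixed-size zones of a column."""
    zones = []
    for start in range(0, len(column), zone_size):
        zone = column[start:start + zone_size]
        zones.append((min(zone), max(zone)))
    return zones

def scan_eq(column, zones, zone_size, value):
    """Equality scan using the zonemap. Returns matching row ids and the
    number of zones actually read (skipped zones cost only a metadata check)."""
    hits, zones_read = [], 0
    for z, (lo, hi) in enumerate(zones):
        if value < lo or value > hi:
            continue  # skip: zone metadata rules out any match
        zones_read += 1
        start = z * zone_size
        for i, v in enumerate(column[start:start + zone_size], start):
            if v == value:
                hits.append(i)
    return hits, zones_read
```

On sorted or clustered data most zones are skipped; on random data every zone's range tends to cover the value, so the metadata checks are pure overhead, which is exactly the effect the abstract describes.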
Citations: 17
Robust Query Processing in Co-Processor-accelerated Databases
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2882936
S. Breß, Henning Funke, J. Teubner
Technology limitations are making the use of heterogeneous computing devices much more than an academic curiosity. In fact, the use of such devices is widely acknowledged to be the only promising way to achieve the application speedups that users urgently need and expect. However, building a robust and efficient query engine for heterogeneous co-processor environments is still a significant challenge. In this paper, we identify two effects that limit performance when co-processor resources become scarce. Cache thrashing occurs when the working set of queries does not fit into the co-processor's data cache, resulting in performance degradations of up to a factor of 24. Heap contention occurs when multiple operators run in parallel on a co-processor and their accumulated memory footprint exceeds the main memory capacity of the co-processor, slowing down query execution by up to a factor of six. We propose solutions for both effects. Data-driven operator placement avoids data movements when they might be harmful; query chopping limits co-processor memory usage and thus avoids contention. The combined approach, data-driven query chopping, achieves robust and scalable performance on co-processors. We validate our proposal with our open-source GPU-accelerated database engine CoGaDB and the popular star schema and TPC-H benchmarks.
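The heap-contention remedy can be sketched as an admission policy: only run operators concurrently while their combined memory footprint fits the co-processor budget, and queue the rest into later batches. The batching-by-arrival-order policy below is an illustrative simplification of the paper's query chopping, not its actual scheduler.

```python
def chop_schedule(operators, budget):
    """operators: list of (name, memory_footprint). Return batches of
    operator names; each batch's total footprint fits within budget,
    preserving arrival order (an operator larger than the budget still
    gets a batch of its own)."""
    batches, current, used = [], [], 0
    for name, mem in operators:
        if used + mem > budget and current:
            batches.append(current)   # close the full batch, start a new one
            current, used = [], 0
        current.append(name)
        used += mem
    if current:
        batches.append(current)
    return batches
```

With a budget matching the co-processor's memory, no batch can oversubscribe the heap, trading some parallelism for predictable (robust) execution times.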
Citations: 58
Towards a Non-2PC Transaction Management in Distributed Database Systems
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2882923
Qian Lin, Pengfei Chang, Gang Chen, B. Ooi, K. Tan, Zhengkui Wang
Shared-nothing architecture has been widely used in distributed databases to achieve good scalability. While it offers superior performance for local transactions, the overhead of processing distributed transactions can degrade system performance significantly. The key contributor to the degradation is the expensive two-phase commit (2PC) protocol used to ensure atomic commitment of distributed transactions. In this paper, we propose a transaction management scheme called LEAP to avoid the 2PC protocol in distributed transaction processing. Instead of processing a distributed transaction across multiple nodes, LEAP converts the distributed transaction into a local transaction. This benefits processing locality and facilitates adaptive data repartitioning when there is a change in the data access pattern. Based on LEAP, we develop an online transaction processing (OLTP) system, L-Store, and compare it with the state-of-the-art distributed in-memory OLTP system H-Store, which relies on the 2PC protocol for distributed transaction processing, and with H^L-Store, an H-Store modified to make use of LEAP. Results of an extensive experimental evaluation show that our LEAP-based engines outperform H-Store by a wide margin, especially for workloads that exhibit locality-based data accesses.
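To see what LEAP is avoiding, here is a minimal sketch of the classic 2PC coordinator: a prepare round collecting votes, then a unanimous-commit or abort round. Each round costs a message exchange per participant, which is the coordination overhead a transaction converted to a single-node local transaction never pays. The `Node` class is a hypothetical stand-in for a participant.

```python
def two_phase_commit(participants):
    """participants: objects with prepare() -> bool, commit(), abort().
    Phase 1 collects votes; phase 2 commits only on a unanimous yes."""
    votes = [p.prepare() for p in participants]   # phase 1: prepare round
    if all(votes):
        for p in participants:                    # phase 2: commit round
            p.commit()
        return "commit"
    for p in participants:                        # any "no" vote aborts all
        p.abort()
    return "abort"

class Node:
    """Toy participant that votes a fixed way and records its state."""
    def __init__(self, vote):
        self.vote, self.state = vote, "init"
    def prepare(self):
        self.state = "prepared"
        return self.vote
    def commit(self):
        self.state = "committed"
    def abort(self):
        self.state = "aborted"
```

LEAP's alternative is to migrate the needed records to one node first, so the transaction touches a single participant and commits locally, with no prepare/commit rounds at all.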
Citations: 53
High-Performance Geospatial Analytics in HyPerSpace
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2899412
Varun Pandey, Andreas Kipf, Dimitri Vorona, Tobias Mühlbauer, Thomas Neumann, A. Kemper
In the past few years, massive amounts of location-based data have been captured, and numerous datasets containing user location information are readily available to the public. Analyzing such datasets can lead to fascinating insights into the mobility patterns and behaviors of users. Moreover, a number of geospatial data-driven companies such as Uber, Lyft, and Foursquare have recently emerged. Real-time analysis of geospatial data is essential and enables an emerging class of applications. Database support for geospatial operations is turning into a necessity rather than a distinct feature provided by only a few databases. Even though many database systems provide geospatial support nowadays, queries often do not consider the most current database state. Geospatial queries are inherently slow, given that some of them require a number of geometric computations. Disk-based database systems that do support geospatial datatypes and queries provide rich features and functions, but they fall behind on performance, particularly when real-time analysis of the latest transactional state is required. In this demonstration, we present HyPerSpace, an extension to the high-performance main-memory database system HyPer developed at the Technical University of Munich, capable of processing geospatial queries with sub-second latencies.
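To make concrete why the abstract calls geospatial predicates computationally heavy, here is a sketch of an exact point-in-polygon test (ray casting) with a cheap bounding-box prefilter, the kind of per-tuple geometric work such queries incur. This is a generic illustration under my own assumptions, not HyPerSpace's actual implementation.

```python
# Sketch: exact point-in-polygon via ray casting, with a bounding-box
# prefilter. Illustrates the geometric computations geospatial predicates
# need; not taken from HyPer/HyPerSpace.

def in_bbox(pt, poly):
    """Cheap filter: is pt inside the polygon's axis-aligned bounding box?"""
    xs = [x for x, _ in poly]
    ys = [y for _, y in poly]
    return min(xs) <= pt[0] <= max(xs) and min(ys) <= pt[1] <= max(ys)

def point_in_polygon(pt, poly):
    """Ray casting: count how many edges a rightward ray from pt crosses."""
    if not in_bbox(pt, poly):          # rejects most points without math
        return False
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses the ray's y level
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_in_polygon((2, 2), square))   # True
print(point_in_polygon((5, 2), square))   # False
```

Running the exact test per row is what makes disk-based engines fall behind; a main-memory system can afford it at interactive latencies, which is the point of the demonstration.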
Citations: 25
Journal: Proceedings of the 2016 International Conference on Management of Data