
Latest publications from the 2011 IEEE 27th International Conference on Data Engineering

Transactional In-Page Logging for multiversion read consistency and recovery
Pub Date: 2011-04-11 | DOI: 10.1109/ICDE.2011.5767889
Sang-Won Lee, Bongki Moon
Recently, a new buffer and storage management strategy called In-Page Logging (IPL) has been proposed for database systems based on flash memory. Its main objective is to overcome the limitations of flash memory, such as erase-before-write and asymmetric read/write speeds, by storing changes made to a data page in the form of log records without overwriting the data page itself. Since it maintains a series of changes made to a data page separately from the original data page until they are merged, the IPL scheme provides unique opportunities to design lightweight transactional support for database systems. In this paper, we propose the transactional IPL (TIPL) scheme, which takes advantage of the IPL log records to support multiversion read consistency and lightweight database recovery. Due to the dual use of IPL log records, namely for snapshot isolation and fast recovery as well as flash-aware write optimization, TIPL achieves transactional support for flash memory database systems that minimizes the space and time overhead during normal database processing and shortens the database recovery time.
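To make the mechanism concrete, here is a minimal Python sketch of the In-Page Logging idea the abstract builds on: updates are appended as log records kept alongside the page instead of overwriting it, and a reader replays only the records committed at or before its snapshot timestamp. This illustrates the dual use TIPL exploits (flash-friendly writes plus snapshot reads); all names and the record format are illustrative assumptions, not the paper's actual design.

```python
# Sketch only: a page with an in-page log area, supporting snapshot reads.
from dataclasses import dataclass, field

@dataclass
class LogRecord:
    commit_ts: int          # commit timestamp of the writing transaction
    offset: int             # byte offset of the change within the page
    data: bytes             # new bytes for that range

@dataclass
class IPLPage:
    base: bytearray                                  # last merged page image
    log: list[LogRecord] = field(default_factory=list)

    def write(self, commit_ts: int, offset: int, data: bytes) -> None:
        # Flash-friendly: append a log record, never overwrite in place.
        self.log.append(LogRecord(commit_ts, offset, data))

    def read_version(self, snapshot_ts: int) -> bytes:
        # Snapshot read: replay only records committed at or before snapshot_ts.
        img = bytearray(self.base)
        for rec in self.log:
            if rec.commit_ts <= snapshot_ts:
                img[rec.offset:rec.offset + len(rec.data)] = rec.data
        return bytes(img)

    def merge(self) -> None:
        # When the log area fills up, fold all records into a new base image.
        self.base = bytearray(self.read_version(snapshot_ts=2**63))
        self.log.clear()
```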
Citations: 13
NORMS: An automatic tool to perform schema label normalization
Pub Date: 2011-04-11 | DOI: 10.1109/ICDE.2011.5767952
S. Sorrentino, S. Bergamaschi, M. Gawinecki
Schema matching is the problem of finding relationships among concepts across heterogeneous data sources (heterogeneous in format and structure). Schema matching systems usually exploit lexical and semantic information provided by lexical databases/thesauri to discover intra/inter semantic relationships among schema elements. However, most of them perform poorly in real-world scenarios due to the significant presence of “non-dictionary words”. Non-dictionary words include compound nouns, abbreviations, and acronyms. In this paper, we present NORMS (NORMalizer of Schemata), a tool that performs schema label normalization to increase the number of comparable labels extracted from schemata.
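As a rough sketch of what schema label normalization involves, the snippet below splits compound labels written in camelCase or with underscores and expands abbreviations from a small lookup table. The table and rules are invented for illustration; NORMS itself relies on lexical databases/thesauri and richer techniques.

```python
# Illustrative label normalizer; the expansion table is a made-up stand-in.
import re

ABBREVIATIONS = {"qty": "quantity", "emp": "employee", "no": "number", "addr": "address"}

def tokenize(label: str) -> list[str]:
    # Split on underscores/hyphens/spaces, then on camelCase boundaries:
    # "empAddr_no" -> ["emp", "addr", "no"]
    parts = re.split(r"[_\-\s]+", label)
    tokens = []
    for part in parts:
        tokens += re.findall(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+", part)
    return [t.lower() for t in tokens if t]

def normalize(label: str) -> str:
    # Expand each non-dictionary token; a real system would also consult a
    # lexical database to decide which tokens are dictionary words.
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokenize(label))

assert normalize("empAddr_no") == "employee address number"
```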
Citations: 11
AMC - A framework for modelling and comparing matching systems as matching processes
Pub Date: 2011-04-11 | DOI: 10.1109/ICDE.2011.5767940
E. Peukert, Julian Eberius, E. Rahm
We present the Auto Mapping Core (AMC), a new framework that supports fast construction and tuning of schema matching approaches for specific domains such as ontology alignment, model matching or database-schema matching. Distinctive features of our framework are new visualisation techniques for modelling matching processes, stepwise tuning of parameters, intermediate result analysis and performance-oriented rewrites. Furthermore, existing matchers can be plugged into the framework to comparatively evaluate them in a common environment. This allows deeper analysis of behaviour and shortcomings in existing complex matching systems.
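To illustrate the process view the abstract describes, the sketch below models a matching system as a small pipeline of pluggable operators: matchers produce similarity matrices over two schemas, a combiner aggregates them, and a selector extracts correspondences. The interfaces are hypothetical simplifications of AMC's actual operator model.

```python
# Sketch of a matching process as a pipeline of pluggable operators.
from typing import Callable

SimMatrix = dict[tuple[str, str], float]   # (source element, target element) -> score

def name_matcher(src: list[str], tgt: list[str]) -> SimMatrix:
    # Trivial string-equality matcher standing in for a real lexical matcher.
    return {(s, t): 1.0 if s.lower() == t.lower() else 0.0 for s in src for t in tgt}

def average_combiner(matrices: list[SimMatrix]) -> SimMatrix:
    # Assumes all matrices share the same key set (same schema pair).
    return {k: sum(m[k] for m in matrices) / len(matrices) for k in matrices[0]}

def threshold_selector(m: SimMatrix, tau: float) -> list[tuple[str, str]]:
    return [pair for pair, score in m.items() if score >= tau]

def run_process(src, tgt, matchers: list[Callable], tau=0.5):
    # A matching process = matchers -> combiner -> selector; a framework like
    # AMC lets you visualise, tune, and rewrite such a pipeline step by step.
    return threshold_selector(average_combiner([m(src, tgt) for m in matchers]), tau)

print(run_process(["id", "name"], ["ID", "title"], [name_matcher]))  # [('id', 'ID')]
```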
Citations: 66
RCFile: A fast and space-efficient data placement structure in MapReduce-based warehouse systems
Pub Date: 2011-04-11 | DOI: 10.1109/ICDE.2011.5767933
Yongqiang He, Rubao Lee, Yin Huai, Zheng Shao, Namit Jain, Xiaodong Zhang, Zhiwei Xu
MapReduce-based data warehouse systems play an important role in supporting big data analytics at typical Web service providers and social network sites (e.g., Facebook), where they are used to quickly understand the dynamics of user behavior trends and needs. In such a system, the data placement structure is a critical factor that can affect the warehouse performance in a fundamental way. Based on our observations and analysis of Facebook production systems, we have characterized four requirements for the data placement structure: (1) fast data loading, (2) fast query processing, (3) highly efficient storage space utilization, and (4) strong adaptivity to highly dynamic workload patterns. We have examined three commonly accepted data placement structures in conventional databases, namely row-stores, column-stores, and hybrid-stores, in the context of large data analysis using MapReduce. We show that they are not very suitable for big data processing in distributed systems. In this paper, we present a big data placement structure called RCFile (Record Columnar File) and its implementation in the Hadoop system. With intensive experiments, we show the effectiveness of RCFile in satisfying the four requirements. RCFile has been chosen in the Facebook data warehouse system as the default option. It has also been adopted by Hive and Pig, the two most widely used data analysis systems developed at Facebook and Yahoo!
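The core placement idea is easy to sketch: a table is first partitioned horizontally into row groups, and within each row group values are laid out column by column, so a query touching few columns can skip the rest. The toy Python below shows only this layout transformation; compression and the Hadoop I/O path are omitted.

```python
# Sketch of RCFile's row-group + columnar layout transformation.
from itertools import islice

def to_row_groups(rows, group_size):
    it = iter(rows)
    while group := list(islice(it, group_size)):
        # Columnar layout inside the group: one list per column, so a reader
        # interested in one column touches only that column's chunk.
        yield [list(col) for col in zip(*group)]

rows = [(1, "alice", 3.5), (2, "bob", 4.0), (3, "carol", 2.5)]
for rg in to_row_groups(rows, group_size=2):
    print(rg)   # first group: [[1, 2], ['alice', 'bob'], [3.5, 4.0]]
```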
Citations: 292
Decomposing DAGs into spanning trees: A new way to compress transitive closures
Pub Date: 2011-04-11 | DOI: 10.1109/ICDE.2011.5767832
Yangjun Chen, Yibin Chen
Let G(V, E) be a digraph (directed graph) with n nodes and e edges. Digraph G* = (V, E*) is the reflexive, transitive closure of G, where (v, u) ∈ E* iff there is a path from v to u in G. Efficient storage of G* is important for supporting reachability queries, which are not only common on graph databases but also serve as fundamental operations used in many graph algorithms. Many strategies based on graph labeling have been suggested, in which each node is assigned certain labels such that the reachability of any two nodes through a path can be determined from their labels. Among them are interval labeling, chain decomposition, and 2-hop labeling. However, due to the very large size of many real-world graphs, the computational cost and size of labels using existing methods would prove too expensive to be practical. In this paper, we propose a new approach that decomposes a graph into a series of spanning trees, which may share common edges, to transform a reachability query over a graph into a set of queries over trees. We demonstrate both analytically and empirically the efficiency and effectiveness of our method.
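As background for the tree-based approach, the sketch below shows the classic interval labelling that makes reachability within a single tree an O(1) label comparison: a DFS assigns each node an interval, and u reaches v iff u's interval contains v's. How the paper decomposes a DAG into several spanning trees and combines their answers is not shown here.

```python
# Interval labelling for tree reachability (background technique only).
def interval_label(tree: dict[str, list[str]], root: str) -> dict[str, tuple[int, int]]:
    label, counter = {}, 0
    def dfs(u: str) -> None:
        nonlocal counter
        start = counter
        counter += 1
        for child in tree.get(u, []):
            dfs(child)
        label[u] = (start, counter)   # u's interval covers all descendants
        counter += 1
    dfs(root)
    return label

def reaches(label, u, v) -> bool:
    # u reaches v in the tree iff u's interval contains v's interval.
    return label[u][0] <= label[v][0] and label[v][1] <= label[u][1]

t = {"a": ["b", "c"], "b": ["d"]}
lab = interval_label(t, "a")
assert reaches(lab, "a", "d") and not reaches(lab, "c", "d")
```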
Citations: 38
The extensibility framework in Microsoft StreamInsight
Pub Date: 2011-04-11 | DOI: 10.1109/ICDE.2011.5767878
Mohamed H. Ali, B. Chandramouli, J. Goldstein, R. Schindlauer
Microsoft StreamInsight (StreamInsight, for brevity) is a platform for developing and deploying streaming applications, which need to run continuous queries over high-data-rate streams of input events. StreamInsight leverages a well-defined temporal stream model and operator algebra as the underlying basis for processing long-running continuous queries over event streams. This allows StreamInsight to handle imperfections in event delivery and to provide correctness guarantees on the generated output. StreamInsight natively supports a diverse range of off-the-shelf streaming operators. In order to cater to a much broader range of customer scenarios and applications, StreamInsight has recently introduced a new extensibility infrastructure. With this infrastructure, StreamInsight enables developers to integrate their domain expertise within the query pipeline in the form of user-defined modules (functions, operators, and aggregates). This paper describes the extensibility framework in StreamInsight, an ongoing effort at Microsoft SQL Server to support the integration of user-defined modules in a stream processing system. More specifically, the paper addresses the extensibility problem from three perspectives: the query writer's perspective, the user-defined module writer's perspective, and the system's internal perspective. The paper introduces and addresses a range of new and subtle challenges that arise when we try to add extensibility to a streaming system in a manner that is easy to use, powerful, and practical. We summarize our experience and provide future directions for supporting stream-oriented workloads in different business domains.
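StreamInsight's actual extensibility API is C#/LINQ-based; the generic Python sketch below only illustrates the division of labour the abstract describes: the engine owns windowing and event delivery, while the module writer supplies domain logic such as a user-defined aggregate.

```python
# Generic illustration of a user-defined aggregate in a windowed pipeline;
# not StreamInsight's API.
from typing import Callable, Iterable, TypeVar

E = TypeVar("E")
R = TypeVar("R")

def apply_per_window(windows: Iterable[list[E]], uda: Callable[[list[E]], R]) -> list[R]:
    # The engine owns windowing, time, and delivery guarantees; the module
    # writer supplies only the domain logic (the `uda` callback).
    return [uda(w) for w in windows]

# A domain-specific aggregate a query writer could not express with built-in
# operators, e.g. a volume-weighted average price over (price, qty) trades.
def vwap(trades: list[tuple[float, int]]) -> float:
    total_qty = sum(q for _, q in trades)
    return sum(p * q for p, q in trades) / total_qty

print(apply_per_window([[(10.0, 100), (10.5, 50)]], vwap))  # [10.1666...]
```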
Citations: 59
SQPR: Stream query planning with reuse
Pub Date: 2011-04-11 | DOI: 10.1109/ICDE.2011.5767851
Evangelia Kalyvianaki, W. Wiesemann, Q. Vu, D. Kuhn, P. Pietzuch
When users submit new queries to a distributed stream processing system (DSPS), a query planner must allocate physical resources, such as CPU cores, memory and network bandwidth, from a set of hosts to queries. Allocation decisions must provide the correct mix of resources required by queries, while achieving an efficient overall allocation that scales in the number of admitted queries. By exploiting overlap between queries and reusing partial results, a query planner can conserve resources but has to carry out more complex planning decisions. In this paper, we describe SQPR, a query planner that targets DSPSs in data centre environments with heterogeneous resources. SQPR models query admission, allocation and reuse as a single constrained optimisation problem and solves an approximate version to achieve scalability. It prevents individual resources from becoming bottlenecks by re-planning past allocation decisions and supports different allocation objectives. As our experimental evaluation in comparison with a state-of-the-art planner shows, SQPR makes efficient resource allocation decisions, even with a high utilisation of resources, with acceptable overheads.
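SQPR solves admission, allocation, and reuse as one constrained optimisation problem; the greedy toy below is not that formulation, but it illustrates the reuse-aware placement decision at its heart: an already-running identical operator costs nothing extra, otherwise the operator consumes capacity on some host. All names are hypothetical.

```python
# Greedy, reuse-aware operator placement (illustration only).
def place_operator(op_id: str, cpu_cost: float,
                   running: dict[str, str], hosts: dict[str, float]):
    if op_id in running:                      # reuse partial results: free
        return running[op_id], 0.0
    # Otherwise pick the host with the most spare CPU that can fit it.
    host = max(hosts, key=hosts.get)
    if hosts[host] < cpu_cost:
        raise RuntimeError("query rejected: no host can fit operator " + op_id)
    hosts[host] -= cpu_cost
    running[op_id] = host
    return host, cpu_cost

hosts, running = {"h1": 2.0, "h2": 1.0}, {}
print(place_operator("join#42", 1.5, running, hosts))  # ('h1', 1.5)
print(place_operator("join#42", 1.5, running, hosts))  # ('h1', 0.0), reused
```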
Citations: 54
High performance database logging using storage class memory
Pub Date: 2011-04-11 | DOI: 10.1109/ICDE.2011.5767918
Ru Fang, Hui-I Hsiao, Bin He, C. Mohan, Yun Wang
Storage class memory (SCM), a new generation of memory technology, offers non-volatility, high speed, and byte-addressability, combining the best properties of current hard disk drives (HDD) and main memory. With these extraordinary features, current systems and software stacks need to be redesigned to achieve significantly improved performance, by eliminating disk input/output (I/O) barriers, and simpler system designs, by avoiding complicated data format transformations. In current DBMSs, logging and recovery are the most important components for enforcing the atomicity and durability of a database. Traditionally, database systems rely on disks for logging transaction actions, and log records are forced to disk when a transaction commits. Because of the slow disk I/O speed, logging becomes one of the major bottlenecks for a DBMS. Exploiting SCM as a persistent memory for transaction logging can significantly reduce logging overhead. In this paper, we present the detailed design of an SCM-based approach to DBMS logging, which achieves high performance through a simplified system design and better concurrency support. We also discuss solutions to tackle several major issues arising during system recovery, including hole detection, partial write detection, and any-point failure recovery. This new logging approach is used to replace the traditional disk-based logging approach in DBMSs. To analyze the performance characteristics of our SCM-based logging approach, we implement the prototype on IBM SolidDB. Our experimental results show that, under common circumstances, the new SCM-based logging approach provides as much as 7 times the throughput of disk-based logging in the Telecommunication Application Transaction Processing (TATP) benchmark.
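A minimal sketch of the commit path such a design implies, assuming a checksummed record format (invented here) so that recovery can detect partial (torn) writes: the Python bytearray stands in for byte-addressable SCM, and the comment marks where a real implementation would issue cache-line flushes and fences.

```python
# Illustrative SCM log format: [txn_id:8][len:4][payload][crc32:4].
import struct, zlib

def append_commit_record(log: bytearray, txn_id: int, payload: bytes) -> None:
    body = struct.pack("<QI", txn_id, len(payload)) + payload
    log += body + struct.pack("<I", zlib.crc32(body))
    # On real SCM: flush the written cache lines and fence before
    # acknowledging the commit to the transaction.

def scan_valid_records(log: bytes):
    # Recovery replays records until the first checksum mismatch, which
    # marks a hole or partially written record: the durable end of the log.
    off = 0
    while off + 16 <= len(log):                 # smallest possible record
        txn_id, size = struct.unpack_from("<QI", log, off)
        end = off + 12 + size
        if end + 4 > len(log):
            break
        body = log[off:end]
        (crc,) = struct.unpack_from("<I", log, end)
        if zlib.crc32(body) != crc:
            break
        yield txn_id, log[off + 12:end]
        off = end + 4

log = bytearray()
append_commit_record(log, txn_id=7, payload=b"page 42: set balance=100")
print(list(scan_valid_records(bytes(log))))
```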
Citations: 110
Efficient continuously moving top-k spatial keyword query processing
Pub Date: 2011-04-11 | DOI: 10.1109/ICDE.2011.5767861
Dingming Wu, Man Lung Yiu, Christian S. Jensen, G. Cong
Web users and content are increasingly being geo-positioned. This development gives prominence to spatial keyword queries, which involve both the locations and textual descriptions of content.
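The abstract is truncated in this listing. As a generic illustration of the query class it names, and not necessarily this paper's ranking function, a top-k spatial keyword query is commonly answered by scoring objects on a weighted combination of spatial proximity and text relevance:

```python
# Generic top-k spatial keyword scoring (assumed linear combination).
import heapq, math

def topk(query_loc, query_terms, objects, k=3, alpha=0.5):
    qx, qy = query_loc
    def score(obj):
        (x, y), text = obj
        proximity = 1 / (1 + math.hypot(x - qx, y - qy))        # near -> 1
        relevance = len(query_terms & set(text.split())) / len(query_terms)
        return alpha * proximity + (1 - alpha) * relevance
    return heapq.nlargest(k, objects, key=score)

objs = [((1, 1), "coffee wifi"), ((5, 5), "coffee"), ((0, 0), "tea")]
print(topk((0, 0), {"coffee"}, objs, k=2))
```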
Citations: 159
Adapting Microsoft SQL Server for cloud computing
Pub Date: 2011-04-11 | DOI: 10.1109/ICDE.2011.5767935
P. Bernstein, Istvan Cseri, Nishant Dani, Nigel Ellis, Ajay Kalhan, Gopal Kakivaya, D. Lomet, Ramesh Manne, Lev Novik, Tomas Talius
Cloud SQL Server is a relational database system designed to scale-out to cloud computing workloads. It uses Microsoft SQL Server as its core. To scale out, it uses a partitioned database on a shared-nothing system architecture. Transactions are constrained to execute on one partition, to avoid the need for two-phase commit. The database is replicated for high availability using a custom primary-copy replication scheme. It currently serves as the storage engine for Microsoft's Exchange Hosted Archive and SQL Azure.
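A minimal sketch of the single-partition transaction rule described above: each transaction is routed by key to exactly one partition, and any operation that would cross partitions is rejected, so commit stays local and two-phase commit is never needed. The hash-based partition map and all names are illustrative assumptions.

```python
# Illustration of single-partition transactions over a partitioned store.
class PartitionedDB:
    def __init__(self, partitions: int):
        self.partitions = [dict() for _ in range(partitions)]

    def route(self, key) -> int:
        # Toy partition map; hash() is per-process, fine for a sketch.
        return hash(key) % len(self.partitions)

    def run_txn(self, anchor_key, ops):
        # All reads/writes of one transaction must touch one partition only;
        # a cross-partition transaction is rejected outright.
        pid = self.route(anchor_key)
        part = self.partitions[pid]
        for k, v in ops:
            if self.route(k) != pid:
                raise ValueError("cross-partition transaction not allowed")
            part[k] = v            # local commit; no two-phase commit needed

db = PartitionedDB(4)
db.run_txn("cust:17", [("cust:17", {"balance": 100})])
```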
Citations: 128