
Latest publications from the 2011 IEEE 27th International Conference on Data Engineering

Bidirectional mining of non-redundant recurrent rules from a sequence database
Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767848
D. Lo, Bolin Ding, Lucia, Jiawei Han
We are interested in scalable mining of a non-redundant set of significant recurrent rules from a sequence database. Recurrent rules have the form "whenever a series of precedent events occurs, eventually a series of consequent events occurs". They are intuitive and characterize behaviors in many domains. An example is the domain of software specification, in which the rules capture a family of properties beneficial to program verification and bug detection. We enhance past work on mining recurrent rules by Lo, Khoo, and Liu to make mining more scalable. We propose a new set of pruning properties embedded in a new mining algorithm. Performance and case studies on benchmark synthetic and real datasets show that our approach is much more efficient and outperforms the state-of-the-art approach in mining recurrent rules by up to two orders of magnitude.
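The rule form quoted above can be made concrete with a small sketch. This is a hypothetical checker for a single recurrent rule over a sequence database of event lists, not the authors' mining algorithm; the function names and the naive subsequence semantics are assumptions for illustration only.

```python
# Toy semantics for "whenever `pre` occurs, eventually `post` occurs".
# Not the paper's mining algorithm; a naive checker for one rule.

def occurs_at(seq, pattern, start):
    """Return the index just past a subsequence match of `pattern`
    in seq[start:], or None if no match exists. Patterns must be
    non-empty."""
    i = start
    for ev in pattern:
        while i < len(seq) and seq[i] != ev:
            i += 1
        if i == len(seq):
            return None
        i += 1
    return i


def rule_holds(seq, pre, post):
    """Every occurrence of `pre` must eventually be followed by `post`."""
    start = 0
    while True:
        end = occurs_at(seq, pre, start)
        if end is None:
            return True           # no further precedent occurrence
        if occurs_at(seq, post, end) is None:
            return False          # precedent without eventual consequent
        start += 1                # slide past this match's start


def rule_holds_in_db(db, pre, post):
    """Check the rule against every sequence in the database."""
    return all(rule_holds(s, pre, post) for s in db)
```

A mining algorithm would enumerate and prune candidate `pre`/`post` pairs rather than check them one by one; the sketch only fixes the semantics being mined.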
Citations: 8
The extensibility framework in Microsoft StreamInsight
Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767878
Mohamed H. Ali, B. Chandramouli, J. Goldstein, R. Schindlauer
Microsoft StreamInsight (StreamInsight, for brevity) is a platform for developing and deploying streaming applications, which need to run continuous queries over high-data-rate streams of input events. StreamInsight leverages a well-defined temporal stream model and operator algebra, as the underlying basis for processing long-running continuous queries over event streams. This allows StreamInsight to handle imperfections in event delivery and to provide correctness guarantees on the generated output. StreamInsight natively supports a diverse range of off-the-shelf streaming operators. In order to cater to a much broader range of customer scenarios and applications, StreamInsight has recently introduced a new extensibility infrastructure. With this infrastructure, StreamInsight enables developers to integrate their domain expertise within the query pipeline in the form of user defined modules (functions, operators, and aggregates). This paper describes the extensibility framework in StreamInsight, an ongoing effort at Microsoft SQL Server to support the integration of user-defined modules in a stream processing system. More specifically, the paper addresses the extensibility problem from three perspectives: the query writer's perspective, the user defined module writer's perspective, and the system's internal perspective. The paper introduces and addresses a range of new and subtle challenges that arise when we try to add extensibility to a streaming system, in a manner that is easy to use, powerful, and practical. We summarize our experience and provide future directions for supporting stream-oriented workloads in different business domains.
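The idea of plugging a user-defined aggregate into a windowed query pipeline can be illustrated with a minimal sketch. StreamInsight's actual API is a .NET framework; the Python below is an assumed simplification showing only the shape of the extension point, with hypothetical function names.

```python
# Illustrative only: a windowed pipeline that accepts any user-defined
# aggregate (UDA) as a plain callable over a window of events.

def sliding_windows(events, size):
    """Yield each size-length sliding window over an event list."""
    for i in range(len(events) - size + 1):
        yield events[i:i + size]


def apply_udagg(events, size, udagg):
    """Run a user-defined aggregate over each sliding window."""
    return [udagg(w) for w in sliding_windows(events, size)]
```

A domain expert supplies `udagg` (here, any callable such as `sum` or a custom function) without touching the pipeline itself, which is the essence of the extensibility contract described above.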
Citations: 59
Transactional In-Page Logging for multiversion read consistency and recovery
Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767889
Sang-Won Lee, Bongki Moon
Recently, a new buffer and storage management strategy called In-Page Logging (IPL) has been proposed for database systems based on flash memory. Its main objective is to overcome the limitations of flash memory such as erase-before-write and asymmetric read/write speeds by storing changes made to a data page in a form of log records without overwriting the data page itself. Since it maintains a series of changes made to a data page separately from the original data page until they are merged, the IPL scheme provides unique opportunities to design light-weight transactional support for database systems. In this paper, we propose the transactional IPL (TIPL) scheme that takes advantage of the IPL log records to support multiversion read consistency and light-weight database recovery. Due to the dual use of IPL log records, namely, for snapshot isolation and fast recovery as well as flash-aware write optimization, TIPL achieves transactional support for flash memory database systems that minimizes the space and time overhead during normal database processing and shortens the database recovery time.
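The core IPL idea, storing changes as log records alongside a page instead of overwriting it, and the multiversion reads this enables, can be sketched as follows. This is an assumed toy model with hypothetical names, not the authors' flash-level implementation.

```python
# Toy In-Page Logging sketch: a page keeps a base image plus appended
# log records; reads replay the log up to a version, merge folds the
# log into the base image (as flash-friendly IPL does on demand).

class IPLPage:
    def __init__(self, data):
        self.base = dict(data)   # last merged page image
        self.log = []            # (version, key, value) log records

    def write(self, version, key, value):
        """Record a change without overwriting the page image."""
        self.log.append((version, key, value))

    def read(self, key, as_of):
        """Multiversion read: replay log records up to `as_of`."""
        value = self.base.get(key)
        for ver, k, v in self.log:
            if k == key and ver <= as_of:
                value = v
        return value

    def merge(self):
        """Fold all log records into the base image; clear the log."""
        for _, k, v in self.log:
            self.base[k] = v
        self.log.clear()
```

Because old versions remain reconstructable until `merge`, snapshot-style reads and redo-light recovery come almost for free, which is the opportunity the paper exploits.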
Citations: 13
CompRec-Trip: A composite recommendation system for travel planning
Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767954
M. Xie, L. Lakshmanan, P. Wood
Classical recommender systems provide users with a list of recommendations where each recommendation consists of a single item, e.g., a book or a DVD. However, applications such as travel planning can benefit from a system capable of recommending packages of items, under a user-specified budget and in the form of sets or sequences. In this context, there is a need for a system that can recommend top-k packages for the user to choose from. In this paper, we propose a novel system, CompRec-Trip, which can automatically generate composite recommendations for travel planning. The system leverages rating information from underlying recommender systems, allows flexible package configuration and incorporates users' cost budgets on both time and money. Furthermore, the proposed CompRec-Trip system has a rich graphical user interface which allows users to customize the returned composite recommendations and take into account external local information.
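The notion of recommending top-k packages under a cost budget, rather than single items, can be illustrated with a brute-force stand-in. This hypothetical sketch is not the CompRec-Trip system; item tuples and the size cap are assumptions for illustration.

```python
# Toy composite recommendation: rank item packages by total rating,
# keeping only packages whose total cost fits the user's budget.

from itertools import combinations

def top_k_packages(items, budget, k, max_size=3):
    """items: list of (name, cost, rating) tuples.
    Return the k highest-rated packages within the budget."""
    candidates = []
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            cost = sum(c for _, c, _ in combo)
            if cost <= budget:
                score = sum(r for _, _, r in combo)
                candidates.append((score, [n for n, _, _ in combo]))
    candidates.sort(key=lambda pkg: -pkg[0])
    return candidates[:k]
```

A real system would prune this exponential search and draw ratings from underlying recommenders; the sketch only fixes the package-under-budget objective.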
Citations: 48
Adapting Microsoft SQL Server for cloud computing
Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767935
P. Bernstein, Istvan Cseri, Nishant Dani, Nigel Ellis, Ajay Kalhan, Gopal Kakivaya, D. Lomet, Ramesh Manne, Lev Novik, Tomas Talius
Cloud SQL Server is a relational database system designed to scale-out to cloud computing workloads. It uses Microsoft SQL Server as its core. To scale out, it uses a partitioned database on a shared-nothing system architecture. Transactions are constrained to execute on one partition, to avoid the need for two-phase commit. The database is replicated for high availability using a custom primary-copy replication scheme. It currently serves as the storage engine for Microsoft's Exchange Hosted Archive and SQL Azure.
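The single-partition transaction constraint mentioned above, which lets the system avoid two-phase commit, can be sketched with a simple admission check. This is an assumed illustration using range partitioning, not Microsoft's implementation; all names are hypothetical.

```python
# Toy model: a transaction is admitted only if every key it touches
# maps to the same partition, so commit never needs coordination
# across nodes (no two-phase commit).

def partition_of(key, boundaries):
    """Range partitioning: `boundaries` is a sorted list of split keys;
    partition i holds keys below boundaries[i]."""
    for i, b in enumerate(boundaries):
        if key < b:
            return i
    return len(boundaries)


def admit(txn_keys, boundaries):
    """Admit the transaction iff all its keys land in one partition."""
    parts = {partition_of(k, boundaries) for k in txn_keys}
    return len(parts) == 1
```

Rejected multi-partition transactions must be restructured by the application, which is the trade-off the design accepts in exchange for simpler, cheaper commits.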
Citations: 128
RCFile: A fast and space-efficient data placement structure in MapReduce-based warehouse systems
Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767933
Yongqiang He, Rubao Lee, Yin Huai, Zheng Shao, Namit Jain, Xiaodong Zhang, Zhiwei Xu
MapReduce-based data warehouse systems play an important role in supporting big data analytics, helping typical Web service providers and social network sites (e.g., Facebook) quickly understand the dynamics of user behavior trends and their needs. In such a system, the data placement structure is a critical factor that can affect warehouse performance in a fundamental way. Based on our observations and analysis of Facebook production systems, we have characterized four requirements for the data placement structure: (1) fast data loading, (2) fast query processing, (3) highly efficient storage space utilization, and (4) strong adaptivity to highly dynamic workload patterns. We have examined three commonly accepted data placement structures in conventional databases, namely row-stores, column-stores, and hybrid-stores, in the context of large data analysis using MapReduce. We show that they are not very suitable for big data processing in distributed systems. In this paper, we present a big data placement structure called RCFile (Record Columnar File) and its implementation in the Hadoop system. With intensive experiments, we show the effectiveness of RCFile in satisfying the four requirements. RCFile has been chosen as the default option in Facebook's data warehouse system. It has also been adopted by Hive and Pig, the two most widely used data analysis systems developed at Facebook and Yahoo!
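The hybrid layout behind RCFile, horizontal row groups stored column-wise internally, can be sketched in a few lines. This is an assumed in-memory simplification (no compression, sync markers, or HDFS blocks), not the Hadoop implementation.

```python
# Toy record-columnar layout: split rows into groups, then store each
# group column-wise, so a scan of one column touches only that
# column's array in each group.

def to_rcfile_layout(rows, group_size):
    """rows: list of equal-length tuples.
    Returns a list of row groups, each a list of column arrays."""
    groups = []
    for i in range(0, len(rows), group_size):
        group = rows[i:i + group_size]
        groups.append([list(col) for col in zip(*group)])
    return groups


def scan_column(groups, col_idx):
    """Read one column across all row groups, skipping the others."""
    out = []
    for group in groups:
        out.extend(group[col_idx])
    return out
```

Keeping whole rows inside one group preserves fast loading and row reconstruction, while the columnar storage inside a group gives column-scan and compression benefits, which is how the layout targets all four requirements at once.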
Citations: 292
Efficient continuously moving top-k spatial keyword query processing
Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767861
Dingming Wu, Man Lung Yiu, Christian S. Jensen, G. Cong
Web users and content are increasingly being geo-positioned. This development gives prominence to spatial keyword queries, which involve both the locations and textual descriptions of content.
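A spatial keyword query as described above combines a location with query terms. The sketch below shows one hypothetical scoring scheme (a weighted mix of proximity and keyword overlap); the paper's actual ranking function and index are not given in this excerpt, so every formula and name here is an assumption.

```python
# Toy top-k spatial keyword query: rank objects by a blend of spatial
# proximity to the query location and keyword overlap with the query.

import math

def score(obj_loc, obj_terms, q_loc, q_terms, alpha=0.5):
    """Hypothetical score in [0, 1]: alpha weights proximity vs text."""
    proximity = 1.0 / (1.0 + math.dist(obj_loc, q_loc))
    overlap = len(set(obj_terms) & set(q_terms)) / max(len(q_terms), 1)
    return alpha * proximity + (1 - alpha) * overlap


def top_k(objects, q_loc, q_terms, k):
    """objects: list of (name, (x, y), terms). Return top-k names."""
    ranked = sorted(objects,
                    key=lambda o: -score(o[1], o[2], q_loc, q_terms))
    return [name for name, _, _ in ranked[:k]]
```

For a continuously moving query, the challenge the title points at, the system must maintain this ranking as `q_loc` changes, rather than recompute it from scratch at every position.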
Citations: 159
SQPR: Stream query planning with reuse
Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767851
Evangelia Kalyvianaki, W. Wiesemann, Q. Vu, D. Kuhn, P. Pietzuch
When users submit new queries to a distributed stream processing system (DSPS), a query planner must allocate physical resources, such as CPU cores, memory and network bandwidth, from a set of hosts to queries. Allocation decisions must provide the correct mix of resources required by queries, while achieving an efficient overall allocation to scale in the number of admitted queries. By exploiting overlap between queries and reusing partial results, a query planner can conserve resources but has to carry out more complex planning decisions. In this paper, we describe SQPR, a query planner that targets DSPSs in data centre environments with heterogeneous resources. SQPR models query admission, allocation and reuse as a single constrained optimisation problem and solves an approximate version to achieve scalability. It prevents individual resources from becoming bottlenecks by re-planning past allocation decisions and supports different allocation objectives. As our experimental evaluation in comparison with a state-of-the-art planner shows SQPR makes efficient resource allocation decisions, even with a high utilisation of resources, with acceptable overheads.
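The reuse idea, sharing already-deployed operators instead of deploying identical sub-plans twice, can be illustrated with a deliberately simple sketch. SQPR actually solves a constrained optimisation over heterogeneous resources; the greedy model below is an assumed toy, with hypothetical names.

```python
# Toy operator reuse: queries are tuples of operator names; an
# operator already deployed is shared, so its cost is paid once.

def plan_with_reuse(queries, deployed=None):
    """Returns (plan, cost): plan maps each query to its operators,
    cost counts only newly deployed operators."""
    deployed = set(deployed or [])
    cost = 0
    plan = {}
    for q in queries:
        ops = []
        for op in q:
            if op not in deployed:
                deployed.add(op)
                cost += 1     # new deployment consumes resources
            ops.append(op)    # shared or new, the query uses it
        plan[q] = ops
    return plan, cost
```

The sketch ignores what SQPR must also handle, per-host CPU, memory, and bandwidth constraints and re-planning of past decisions, but it shows why reuse shrinks the resource footprint as admitted queries overlap.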
Citations: 54
High performance database logging using storage class memory
Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767918
Ru Fang, Hui-I Hsiao, Bin He, C. Mohan, Yun Wang
Storage class memory (SCM), a new generation of memory technology, offers non-volatility, high-speed, and byte-addressability, which combines the best properties of current hard disk drives (HDD) and main memory. With these extraordinary features, current systems and software stacks need to be redesigned to get significantly improved performance by eliminating disk input/output (I/O) barriers; and simpler system designs by avoiding complicated data format transformations. In current DBMSs, logging and recovery are the most important components to enforce the atomicity and durability of a database. Traditionally, database systems rely on disks for logging transaction actions and log records are forced to disks when a transaction commits. Because of the slow disk I/O speed, logging becomes one of the major bottlenecks for a DBMS. Exploiting SCM as a persistent memory for transaction logging can significantly reduce logging overhead. In this paper, we present the detailed design of an SCM-based approach for DBMSs logging, which achieves high performance by simplified system design and better concurrency support. We also discuss solutions to tackle several major issues arising during system recovery, including hole detection, partial write detection, and any-point failure recovery. This new logging approach is used to replace the traditional disk based logging approach in DBMSs. To analyze the performance characteristics of our SCM-based logging approach, we implement the prototype on IBM SolidDB. In common circumstances, our experimental results show that the new SCM-based logging approach provides as much as 7 times throughput improvement over disk-based logging in the Telecommunication Application Transaction Processing (TATP) benchmark.
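The write-ahead-logging-with-redo pattern that SCM accelerates can be sketched abstractly. This is an assumed toy model, not IBM SolidDB's design: the in-memory list stands in for the SCM-resident log, and recovery replays only transactions that reached their commit record.

```python
# Toy redo logging: commits append records to a "persistent" log
# (no disk force needed with SCM); recovery replays committed writes.

class Log:
    def __init__(self):
        self.records = []            # stand-in for the SCM log region

    def append(self, rec):
        self.records.append(rec)


def commit(log, txn_id, writes):
    """Log each write, then the commit record that makes them durable."""
    for key, value in writes.items():
        log.append(('write', txn_id, key, value))
    log.append(('commit', txn_id))


def recover(log):
    """Redo pass: apply only writes of transactions that committed."""
    committed = {r[1] for r in log.records if r[0] == 'commit'}
    db = {}
    for r in log.records:
        if r[0] == 'write' and r[1] in committed:
            db[r[2]] = r[3]
    return db
```

On disk, the `commit` step is where the forced I/O bottleneck lives; with byte-addressable SCM that force becomes a memory write, which is the source of the throughput gains the paper reports (the paper additionally handles partial writes and hole detection, omitted here).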
Citations: 110
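The entry above describes replacing per-commit disk I/O with stores into byte-addressable persistent memory. The following is a minimal Python sketch of the core idea, not the paper's SolidDB implementation: an append-only, length-prefixed log written into a memory-mapped region that stands in for SCM. The names `ScmLog` and `LOG_SIZE` are illustrative; on real SCM hardware the `flush()` call would be a cache-line write-back plus fence rather than a syscall.

```python
import mmap
import os
import struct
import tempfile

LOG_SIZE = 1 << 20  # 1 MiB region standing in for an SCM device


class ScmLog:
    """Append-only transaction log in a byte-addressable region.

    Each record is length-prefixed; committing is just a store plus a
    persist barrier (msync here; clwb/sfence on real SCM hardware).
    """

    def __init__(self, path):
        # Back the mapping with a file so the sketch runs anywhere.
        self.fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
        os.ftruncate(self.fd, LOG_SIZE)
        self.buf = mmap.mmap(self.fd, LOG_SIZE)
        self.tail = 0

    def append(self, payload: bytes) -> None:
        # 4-byte little-endian length prefix, then the payload.
        rec = struct.pack("<I", len(payload)) + payload
        self.buf[self.tail:self.tail + len(rec)] = rec
        self.buf.flush()  # persist barrier; no disk seek on the commit path
        self.tail += len(rec)

    def records(self):
        # Walk the length-prefixed records from the start of the region.
        off = 0
        while off < self.tail:
            (n,) = struct.unpack_from("<I", self.buf, off)
            yield bytes(self.buf[off + 4:off + 4 + n])
            off += 4 + n
```

The paper's recovery concerns (hole detection, partial writes) would add per-record checksums and a recovery scan on top of this layout; they are omitted here for brevity.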
Representative skylines using threshold-based preference distributions
Pub Date: 2011-04-11 DOI: 10.1109/ICDE.2011.5767873
Atish Das Sarma, Ashwin Lall, Danupon Nanongkai, R. Lipton, Jun Xu
The study of skylines and their variants has received considerable attention in recent years. Skylines are essentially sets of most interesting (undominated) tuples in a database. However, since the skyline is often very large, much research effort has been devoted to identifying a smaller subset of (say k) “representative skyline” points. Several different definitions of representative skylines have been considered. Most of these formulations are intuitive in that they try to achieve some kind of clustering “spread” over the entire skyline, with k points. In this work, we take a more principled approach in defining the representative skyline objective. One of our main contributions is to formulate the problem of displaying k representative skyline points such that the probability that a random user would click on one of them is maximized.
{"title":"Representative skylines using threshold-based preference distributions","authors":"Atish Das Sarma, Ashwin Lall, Danupon Nanongkai, R. Lipton, Jun Xu","doi":"10.1109/ICDE.2011.5767873","DOIUrl":"https://doi.org/10.1109/ICDE.2011.5767873","url":null,"abstract":"The study of skylines and their variants has received considerable attention in recent years. Skylines are essentially sets of most interesting (undominated) tuples in a database. However, since the skyline is often very large, much research effort has been devoted to identifying a smaller subset of (say k) “representative skyline” points. Several different definitions of representative skylines have been considered. Most of these formulations are intuitive in that they try to achieve some kind of clustering “spread” over the entire skyline, with k points. In this work, we take a more principled approach in defining the representative skyline objective. One of our main contributions is to formulate the problem of displaying k representative skyline points such that the probability that a random user would click on one of them is maximized.","PeriodicalId":332374,"journal":{"name":"2011 IEEE 27th International Conference on Data Engineering","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127238096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 56
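The abstract above frames representative-skyline selection as maximizing the probability that a random user clicks one of the k displayed points. A minimal Python sketch of that objective, under simplifying assumptions not taken from the paper: users are modeled as sampled threshold vectors, a user "clicks" a displayed point if the point meets their threshold in every dimension (larger is better), and greedy max-coverage picks the k points, giving the usual (1 - 1/e) guarantee for coverage objectives. The function name `greedy_representatives` and the threshold-coverage model are illustrative only.

```python
def greedy_representatives(points, users, k):
    """Pick k points maximizing the number of sampled users covered.

    points: list of numeric tuples (skyline points, larger is better).
    users:  list of threshold tuples; a user is covered by a point that
            meets the threshold in every dimension.
    """
    def covers(p, t):
        return all(pi >= ti for pi, ti in zip(p, t))

    covered = [False] * len(users)
    chosen = []
    for _ in range(min(k, len(points))):
        best, best_gain = None, -1
        for p in points:
            if p in chosen:
                continue
            # Marginal gain: newly covered users if we display p.
            gain = sum(1 for i, t in enumerate(users)
                       if not covered[i] and covers(p, t))
            if gain > best_gain:
                best, best_gain = p, gain
        chosen.append(best)
        for i, t in enumerate(users):
            covered[i] = covered[i] or covers(best, t)
    return chosen
```

With sampled user thresholds, the fraction of covered users estimates the click probability being maximized; the paper's exact preference distribution and algorithm differ from this Monte-Carlo-style illustration.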
Journal
2011 IEEE 27th International Conference on Data Engineering