Pub Date : 2011-04-11DOI: 10.1109/ICDE.2011.5767848
D. Lo, Bolin Ding, Lucia, Jiawei Han
We are interested in scalable mining of a non-redundant set of significant recurrent rules from a sequence database. Recurrent rules have the form “whenever a series of precedent events occurs, eventually a series of consequent events occurs”. They are intuitive and characterize behaviors in many domains. An example is the domain of software specification, in which the rules capture a family of properties beneficial to program verification and bug detection. We enhance a past work on mining recurrent rules by Lo, Khoo, and Liu to perform mining more scalably. We propose a new set of pruning properties embedded in a new mining algorithm. Performance and case studies on benchmark synthetic and real datasets show that our approach is much more efficient and outperforms the state-of-the-art approach in mining recurrent rules by up to two orders of magnitude.
{"title":"Bidirectional mining of non-redundant recurrent rules from a sequence database","authors":"D. Lo, Bolin Ding, Lucia, Jiawei Han","doi":"10.1109/ICDE.2011.5767848","DOIUrl":"https://doi.org/10.1109/ICDE.2011.5767848","url":null,"abstract":"We are interested in scalable mining of a non-redundant set of significant recurrent rules from a sequence database. Recurrent rules have the form “whenever a series of precedent events occurs, eventually a series of consequent events occurs”. They are intuitive and characterize behaviors in many domains. An example is the domain of software specification, in which the rules capture a family of properties beneficial to program verification and bug detection. We enhance a past work on mining recurrent rules by Lo, Khoo, and Liu to perform mining more scalably. We propose a new set of pruning properties embedded in a new mining algorithm. Performance and case studies on benchmark synthetic and real datasets show that our approach is much more efficient and outperforms the state-of-the-art approach in mining recurrent rules by up to two orders of magnitude.","PeriodicalId":332374,"journal":{"name":"2011 IEEE 27th International Conference on Data Engineering","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114258274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2011-04-11DOI: 10.1109/ICDE.2011.5767878
Mohamed H. Ali, B. Chandramouli, J. Goldstein, R. Schindlauer
Microsoft StreamInsight (StreamInsight, for brevity) is a platform for developing and deploying streaming applications, which need to run continuous queries over high-data-rate streams of input events. StreamInsight leverages a well-defined temporal stream model and operator algebra, as the underlying basis for processing long-running continuous queries over event streams. This allows StreamInsight to handle imperfections in event delivery and to provide correctness guarantees on the generated output. StreamInsight natively supports a diverse range of off-the-shelf streaming operators. In order to cater to a much broader range of customer scenarios and applications, StreamInsight has recently introduced a new extensibility infrastructure. With this infrastructure, StreamInsight enables developers to integrate their domain expertise within the query pipeline in the form of user defined modules (functions, operators, and aggregates). This paper describes the extensibility framework in StreamInsight; an ongoing effort at Microsoft SQL Server to support the integration of user-defined modules in a stream processing system. More specifically, the paper addresses the extensibility problem from three perspectives: the query writer's perspective, the user defined module writer's perspective, and the system's internal perspective. The paper introduces and addresses a range of new and subtle challenges that arise when we try to add extensibility to a streaming system, in a manner that is easy to use, powerful, and practical. We summarize our experience and provide future directions for supporting stream-oriented workloads in different business domains.
Microsoft StreamInsight(简称为StreamInsight)是一个开发和部署流应用程序的平台,它需要在输入事件的高数据速率流上运行连续查询。StreamInsight利用定义良好的时态流模型和操作符代数,作为处理事件流上长时间连续查询的底层基础。这允许StreamInsight处理事件交付中的缺陷,并为生成的输出提供正确性保证。StreamInsight原生支持各种现成的流媒体运营商。为了迎合更广泛的客户场景和应用,StreamInsight最近引入了一个新的可扩展性基础设施。有了这个基础设施,StreamInsight使开发人员能够以用户定义模块(函数、操作符和聚合)的形式将他们的领域专业知识集成到查询管道中。本文描述了StreamInsight的可扩展性框架;Microsoft SQL Server正在努力支持在流处理系统中集成用户定义模块。更具体地说,本文从三个角度解决了可扩展性问题:查询编写器的角度、用户定义模块编写器的角度和系统内部的角度。本文介绍并解决了一系列新的和微妙的挑战,当我们试图以一种易于使用、功能强大和实用的方式向流系统添加可扩展性时,会出现这些挑战。我们总结了我们的经验,并提供了在不同业务领域中支持面向流工作负载的未来方向。
{"title":"The extensibility framework in Microsoft StreamInsight","authors":"Mohamed H. Ali, B. Chandramouli, J. Goldstein, R. Schindlauer","doi":"10.1109/ICDE.2011.5767878","DOIUrl":"https://doi.org/10.1109/ICDE.2011.5767878","url":null,"abstract":"Microsoft StreamInsight (StreamInsight, for brevity) is a platform for developing and deploying streaming applications, which need to run continuous queries over high-data-rate streams of input events. StreamInsight leverages a well-defined temporal stream model and operator algebra, as the underlying basis for processing long-running continuous queries over event streams. This allows StreamInsight to handle imperfections in event delivery and to provide correctness guarantees on the generated output. StreamInsight natively supports a diverse range of off-the-shelf streaming operators. In order to cater to a much broader range of customer scenarios and applications, StreamInsight has recently introduced a new extensibility infrastructure. With this infrastructure, StreamInsight enables developers to integrate their domain expertise within the query pipeline in the form of user defined modules (functions, operators, and aggregates). This paper describes the extensibility framework in StreamInsight; an ongoing effort at Microsoft SQL Server to support the integration of user-defined modules in a stream processing system. More specifically, the paper addresses the extensibility problem from three perspectives: the query writer's perspective, the user defined module writer's perspective, and the system's internal perspective. The paper introduces and addresses a range of new and subtle challenges that arise when we try to add extensibility to a streaming system, in a manner that is easy to use, powerful, and practical. We summarize our experience and provide future directions for supporting stream-oriented workloads in different business domains.","PeriodicalId":332374,"journal":{"name":"2011 IEEE 27th International Conference on Data Engineering","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117046612","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2011-04-11DOI: 10.1109/ICDE.2011.5767889
Sang-Won Lee, Bongki Moon
Recently, a new buffer and storage management strategy called In-Page Logging (IPL) has been proposed for database systems based on flash memory. Its main objective is to overcome the limitations of flash memory such as erase-before-write and asymmetric read/write speeds by storing changes made to a data page in a form of log records without overwriting the data page itself. Since it maintains a series of changes made to a data page separately from the original data page until they are merged, the IPL scheme provides unique opportunities to design light-weight transactional support for database systems. In this paper, we propose the transactional IPL (TIPL) scheme that takes advantage of the IPL log records to support multiversion read consistency and light-weight database recovery. Due to the dual use of IPL log records, namely, for snapshot isolation and fast recovery as well as flash-aware write optimization, TIPL achieves transactional support for flash memory database systems that minimizes the space and time overhead during normal database processing and shortens the database recovery time.
{"title":"Transactional In-Page Logging for multiversion read consistency and recovery","authors":"Sang-Won Lee, Bongki Moon","doi":"10.1109/ICDE.2011.5767889","DOIUrl":"https://doi.org/10.1109/ICDE.2011.5767889","url":null,"abstract":"Recently, a new buffer and storage management strategy called In-Page Logging (IPL) has been proposed for database systems based on flash memory. Its main objective is to overcome the limitations of flash memory such as erase-before-write and asymmetric read/write speeds by storing changes made to a data page in a form of log records without overwriting the data page itself. Since it maintains a series of changes made to a data page separately from the original data page until they are merged, the IPL scheme provides unique opportunities to design light-weight transactional support for database systems. In this paper, we propose the transactional IPL (TIPL) scheme that takes advantage of the IPL log records to support multiversion read consistency and light-weight database recovery. Due to the dual use of IPL log records, namely, for snapshot isolation and fast recovery as well as flash-aware write optimization, TIPL achieves transactional support for flash memory database systems that minimizes the space and time overhead during normal database processing and shortens the database recovery time.","PeriodicalId":332374,"journal":{"name":"2011 IEEE 27th International Conference on Data Engineering","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122967008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2011-04-11DOI: 10.1109/ICDE.2011.5767954
M. Xie, L. Lakshmanan, P. Wood
Classical recommender systems provide users with a list of recommendations where each recommendation consists of a single item, e.g., a book or a DVD. However, applications such as travel planning can benefit from a system capable of recommending packages of items, under a user-specified budget and in the form of sets or sequences. In this context, there is a need for a system that can recommend top-k packages for the user to choose from. In this paper, we propose a novel system, CompRec-Trip, which can automatically generate composite recommendations for travel planning. The system leverages rating information from underlying recommender systems, allows flexible package configuration and incorporates users' cost budgets on both time and money. Furthermore, the proposed CompRec-Trip system has a rich graphical user interface which allows users to customize the returned composite recommendations and take into account external local information.
{"title":"CompRec-Trip: A composite recommendation system for travel planning","authors":"M. Xie, L. Lakshmanan, P. Wood","doi":"10.1109/ICDE.2011.5767954","DOIUrl":"https://doi.org/10.1109/ICDE.2011.5767954","url":null,"abstract":"Classical recommender systems provide users with a list of recommendations where each recommendation consists of a single item, e.g., a book or a DVD. However, applications such as travel planning can benefit from a system capable of recommending packages of items, under a user-specified budget and in the form of sets or sequences. In this context, there is a need for a system that can recommend top-k packages for the user to choose from. In this paper, we propose a novel system, CompRec-Trip, which can automatically generate composite recommendations for travel planning. The system leverages rating information from underlying recommender systems, allows flexible package configuration and incorporates users' cost budgets on both time and money. Furthermore, the proposed CompRec-Trip system has a rich graphical user interface which allows users to customize the returned composite recommendations and take into account external local information.","PeriodicalId":332374,"journal":{"name":"2011 IEEE 27th International Conference on Data Engineering","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129999339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2011-04-11DOI: 10.1109/ICDE.2011.5767935
P. Bernstein, Istvan Cseri, Nishant Dani, Nigel Ellis, Ajay Kalhan, Gopal Kakivaya, D. Lomet, Ramesh Manne, Lev Novik, Tomas Talius
Cloud SQL Server is a relational database system designed to scale-out to cloud computing workloads. It uses Microsoft SQL Server as its core. To scale out, it uses a partitioned database on a shared-nothing system architecture. Transactions are constrained to execute on one partition, to avoid the need for two-phase commit. The database is replicated for high availability using a custom primary-copy replication scheme. It currently serves as the storage engine for Microsoft's Exchange Hosted Archive and SQL Azure.
{"title":"Adapting microsoft SQL server for cloud computing","authors":"P. Bernstein, Istvan Cseri, Nishant Dani, Nigel Ellis, Ajay Kalhan, Gopal Kakivaya, D. Lomet, Ramesh Manne, Lev Novik, Tomas Talius","doi":"10.1109/ICDE.2011.5767935","DOIUrl":"https://doi.org/10.1109/ICDE.2011.5767935","url":null,"abstract":"Cloud SQL Server is a relational database system designed to scale-out to cloud computing workloads. It uses Microsoft SQL Server as its core. To scale out, it uses a partitioned database on a shared-nothing system architecture. Transactions are constrained to execute on one partition, to avoid the need for two-phase commit. The database is replicated for high availability using a custom primary-copy replication scheme. It currently serves as the storage engine for Microsoft's Exchange Hosted Archive and SQL Azure.","PeriodicalId":332374,"journal":{"name":"2011 IEEE 27th International Conference on Data Engineering","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130780390","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MapReduce-based data warehouse systems are playing important roles of supporting big data analytics to understand quickly the dynamics of user behavior trends and their needs in typical Web service providers and social network sites (e.g., Facebook). In such a system, the data placement structure is a critical factor that can affect the warehouse performance in a fundamental way. Based on our observations and analysis of Facebook production systems, we have characterized four requirements for the data placement structure: (1) fast data loading, (2) fast query processing, (3) highly efficient storage space utilization, and (4) strong adaptivity to highly dynamic workload patterns. We have examined three commonly accepted data placement structures in conventional databases, namely row-stores, column-stores, and hybrid-stores in the context of large data analysis using MapReduce. We show that they are not very suitable for big data processing in distributed systems. In this paper, we present a big data placement structure called RCFile (Record Columnar File) and its implementation in the Hadoop system. With intensive experiments, we show the effectiveness of RCFile in satisfying the four requirements. RCFile has been chosen in Facebook data warehouse system as the default option. It has also been adopted by Hive and Pig, the two most widely used data analysis systems developed in Facebook and Yahoo!
{"title":"RCFile: A fast and space-efficient data placement structure in MapReduce-based warehouse systems","authors":"Yongqiang He, Rubao Lee, Yin Huai, Zheng Shao, Namit Jain, Xiaodong Zhang, Zhiwei Xu","doi":"10.1109/ICDE.2011.5767933","DOIUrl":"https://doi.org/10.1109/ICDE.2011.5767933","url":null,"abstract":"MapReduce-based data warehouse systems are playing important roles of supporting big data analytics to understand quickly the dynamics of user behavior trends and their needs in typical Web service providers and social network sites (e.g., Facebook). In such a system, the data placement structure is a critical factor that can affect the warehouse performance in a fundamental way. Based on our observations and analysis of Facebook production systems, we have characterized four requirements for the data placement structure: (1) fast data loading, (2) fast query processing, (3) highly efficient storage space utilization, and (4) strong adaptivity to highly dynamic workload patterns. We have examined three commonly accepted data placement structures in conventional databases, namely row-stores, column-stores, and hybrid-stores in the context of large data analysis using MapReduce. We show that they are not very suitable for big data processing in distributed systems. In this paper, we present a big data placement structure called RCFile (Record Columnar File) and its implementation in the Hadoop system. With intensive experiments, we show the effectiveness of RCFile in satisfying the four requirements. RCFile has been chosen in Facebook data warehouse system as the default option. It has also been adopted by Hive and Pig, the two most widely used data analysis systems developed in Facebook and Yahoo!","PeriodicalId":332374,"journal":{"name":"2011 IEEE 27th International Conference on Data Engineering","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131274920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2011-04-11DOI: 10.1109/ICDE.2011.5767861
Dingming Wu, Man Lung Yiu, Christian S. Jensen, G. Cong
Web users and content are increasingly being geo-positioned. This development gives prominence to spatial keyword queries, which involve both the locations and textual descriptions of content.
网络用户和内容正日益被地理定位。这种发展突出了空间关键字查询,它涉及内容的位置和文本描述。
{"title":"Efficient continuously moving top-k spatial keyword query processing","authors":"Dingming Wu, Man Lung Yiu, Christian S. Jensen, G. Cong","doi":"10.1109/ICDE.2011.5767861","DOIUrl":"https://doi.org/10.1109/ICDE.2011.5767861","url":null,"abstract":"Web users and content are increasingly being geo-positioned. This development gives prominence to spatial keyword queries, which involve both the locations and textual descriptions of content.","PeriodicalId":332374,"journal":{"name":"2011 IEEE 27th International Conference on Data Engineering","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121816176","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2011-04-11DOI: 10.1109/ICDE.2011.5767851
Evangelia Kalyvianaki, W. Wiesemann, Q. Vu, D. Kuhn, P. Pietzuch
When users submit new queries to a distributed stream processing system (DSPS), a query planner must allocate physical resources, such as CPU cores, memory and network bandwidth, from a set of hosts to queries. Allocation decisions must provide the correct mix of resources required by queries, while achieving an efficient overall allocation to scale in the number of admitted queries. By exploiting overlap between queries and reusing partial results, a query planner can conserve resources but has to carry out more complex planning decisions. In this paper, we describe SQPR, a query planner that targets DSPSs in data centre environments with heterogeneous resources. SQPR models query admission, allocation and reuse as a single constrained optimisation problem and solves an approximate version to achieve scalability. It prevents individual resources from becoming bottlenecks by re-planning past allocation decisions and supports different allocation objectives. As our experimental evaluation in comparison with a state-of-the-art planner shows SQPR makes efficient resource allocation decisions, even with a high utilisation of resources, with acceptable overheads.
{"title":"SQPR: Stream query planning with reuse","authors":"Evangelia Kalyvianaki, W. Wiesemann, Q. Vu, D. Kuhn, P. Pietzuch","doi":"10.1109/ICDE.2011.5767851","DOIUrl":"https://doi.org/10.1109/ICDE.2011.5767851","url":null,"abstract":"When users submit new queries to a distributed stream processing system (DSPS), a query planner must allocate physical resources, such as CPU cores, memory and network bandwidth, from a set of hosts to queries. Allocation decisions must provide the correct mix of resources required by queries, while achieving an efficient overall allocation to scale in the number of admitted queries. By exploiting overlap between queries and reusing partial results, a query planner can conserve resources but has to carry out more complex planning decisions. In this paper, we describe SQPR, a query planner that targets DSPSs in data centre environments with heterogeneous resources. SQPR models query admission, allocation and reuse as a single constrained optimisation problem and solves an approximate version to achieve scalability. It prevents individual resources from becoming bottlenecks by re-planning past allocation decisions and supports different allocation objectives. As our experimental evaluation in comparison with a state-of-the-art planner shows SQPR makes efficient resource allocation decisions, even with a high utilisation of resources, with acceptable overheads.","PeriodicalId":332374,"journal":{"name":"2011 IEEE 27th International Conference on Data Engineering","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116368209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2011-04-11DOI: 10.1109/ICDE.2011.5767918
Ru Fang, Hui-I Hsiao, Bin He, C. Mohan, Yun Wang
Storage class memory (SCM), a new generation of memory technology, offers non-volatility, high-speed, and byte-addressability, which combines the best properties of current hard disk drives (HDD) and main memory. With these extraordinary features, current systems and software stacks need to be redesigned to get significantly improved performance by eliminating disk input/output (I/O) barriers; and simpler system designs by avoiding complicated data format transformations. In current DBMSs, logging and recovery are the most important components to enforce the atomicity and durability of a database. Traditionally, database systems rely on disks for logging transaction actions and log records are forced to disks when a transaction commits. Because of the slow disk I/O speed, logging becomes one of the major bottlenecks for a DBMS. Exploiting SCM as a persistent memory for transaction logging can significantly reduce logging overhead. In this paper, we present the detailed design of an SCM-based approach for DBMSs logging, which achieves high performance by simplified system design and better concurrency support. We also discuss solutions to tackle several major issues arising during system recovery, including hole detection, partial write detection, and any-point failure recovery. This new logging approach is used to replace the traditional disk based logging approach in DBMSs. To analyze the performance characteristics of our SCM-based logging approach, we implement the prototype on IBM SolidDB. In common circumstances, our experimental results show that the new SCM-based logging approach provides as much as 7 times throughput improvement over disk-based logging in the Telecommunication Application Transaction Processing (TATP) benchmark.
{"title":"High performance database logging using storage class memory","authors":"Ru Fang, Hui-I Hsiao, Bin He, C. Mohan, Yun Wang","doi":"10.1109/ICDE.2011.5767918","DOIUrl":"https://doi.org/10.1109/ICDE.2011.5767918","url":null,"abstract":"Storage class memory (SCM), a new generation of memory technology, offers non-volatility, high-speed, and byte-addressability, which combines the best properties of current hard disk drives (HDD) and main memory. With these extraordinary features, current systems and software stacks need to be redesigned to get significantly improved performance by eliminating disk input/output (I/O) barriers; and simpler system designs by avoiding complicated data format transformations. In current DBMSs, logging and recovery are the most important components to enforce the atomicity and durability of a database. Traditionally, database systems rely on disks for logging transaction actions and log records are forced to disks when a transaction commits. Because of the slow disk I/O speed, logging becomes one of the major bottlenecks for a DBMS. Exploiting SCM as a persistent memory for transaction logging can significantly reduce logging overhead. In this paper, we present the detailed design of an SCM-based approach for DBMSs logging, which achieves high performance by simplified system design and better concurrency support. We also discuss solutions to tackle several major issues arising during system recovery, including hole detection, partial write detection, and any-point failure recovery. This new logging approach is used to replace the traditional disk based logging approach in DBMSs. To analyze the performance characteristics of our SCM-based logging approach, we implement the prototype on IBM SolidDB. In common circumstances, our experimental results show that the new SCM-based logging approach provides as much as 7 times throughput improvement over disk-based logging in the Telecommunication Application Transaction Processing (TATP) benchmark.","PeriodicalId":332374,"journal":{"name":"2011 IEEE 27th International Conference on Data Engineering","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116430584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2011-04-11DOI: 10.1109/ICDE.2011.5767873
Atish Das Sarma, Ashwin Lall, Danupon Nanongkai, R. Lipton, Jun Xu
The study of skylines and their variants has received considerable attention in recent years. Skylines are essentially sets of most interesting (undominated) tuples in a database. However, since the skyline is often very large, much research effort has been devoted to identifying a smaller subset of (say k) “representative skyline” points. Several different definitions of representative skylines have been considered. Most of these formulations are intuitive in that they try to achieve some kind of clustering “spread” over the entire skyline, with k points. In this work, we take a more principled approach in defining the representative skyline objective. One of our main contributions is to formulate the problem of displaying k representative skyline points such that the probability that a random user would click on one of them is maximized.
{"title":"Representative skylines using threshold-based preference distributions","authors":"Atish Das Sarma, Ashwin Lall, Danupon Nanongkai, R. Lipton, Jun Xu","doi":"10.1109/ICDE.2011.5767873","DOIUrl":"https://doi.org/10.1109/ICDE.2011.5767873","url":null,"abstract":"The study of skylines and their variants has received considerable attention in recent years. Skylines are essentially sets of most interesting (undominated) tuples in a database. However, since the skyline is often very large, much research effort has been devoted to identifying a smaller subset of (say k) “representative skyline” points. Several different definitions of representative skylines have been considered. Most of these formulations are intuitive in that they try to achieve some kind of clustering “spread” over the entire skyline, with k points. In this work, we take a more principled approach in defining the representative skyline objective. One of our main contributions is to formulate the problem of displaying k representative skyline points such that the probability that a random user would click on one of them is maximized.","PeriodicalId":332374,"journal":{"name":"2011 IEEE 27th International Conference on Data Engineering","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127238096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}