Future database application systems will be designed as Service-Oriented Architectures (SOAs) like SAP's NetWeaver instead of monolithic software systems such as SAP's R/3. The decomposition into finer-grained services allows the use of hardware clusters and a flexible service-to-server allocation, but it also increases the complexity of administration. Thus, new administration techniques are necessary, such as the self-organizing infrastructure we developed in cooperation with the SAP Adaptive Computing Infrastructure (ACI) group. For our purposes, the available hardware is virtualized, pooled, and monitored. A fuzzy-logic-based controller module supervises all services running on the hardware platform and remedies exceptional situations automatically. With this self-organizing infrastructure we reduce the necessary hardware and administration overhead and thus lower the total cost of ownership (TCO). We used our prototype implementation, called AutoGlobe, for SAP-internal tests and performed comprehensive simulation studies to demonstrate the effectiveness of the proposed concept.
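To illustrate the fuzzy-logic idea behind such a controller, the following Python sketch maps a service's observed CPU load to a degree of membership in an "overloaded" fuzzy set and triggers a remedy only when that degree is high enough. The membership breakpoints, threshold, and service names are illustrative assumptions, not AutoGlobe's actual rules.

def overloaded(load, low=0.6, high=0.9):
    """Piecewise-linear membership in the fuzzy set 'overloaded': 0 below low, 1 above high."""
    if load <= low:
        return 0.0
    if load >= high:
        return 1.0
    return (load - low) / (high - low)

def control_step(service_loads, threshold=0.75):
    """Return the services for which a remedy (e.g. starting a further instance
    on a pooled server) would be triggered in this supervision step."""
    return [s for s, load in service_loads.items() if overloaded(load) >= threshold]

print(control_step({"crm": 0.95, "hr": 0.50, "billing": 0.85}))  # ['crm', 'billing']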
S. Seltzsam, D. Gmach, Stefan Krompass, A. Kemper. "AutoGlobe: An Automatic Administration Concept for Service-Oriented Database Applications." 22nd International Conference on Data Engineering (ICDE'06), 2006, p. 90. doi:10.1109/ICDE.2006.26
Esther Ryvkina, Anurag Maskey, Mitch Cherniack, S. Zdonik
Data stream processing systems have become ubiquitous in academic [1, 2, 5, 6] and commercial [11] sectors, with application areas that include financial services, network traffic analysis, battlefield monitoring, and traffic control [3]. The append-only model of streams implies that input data is immutable and therefore always correct. In practice, however, streaming data sources often contend with noise (e.g., embedded sensors) or data entry errors (e.g., financial data feeds), resulting in erroneous inputs and, therefore, erroneous query results. Many data stream sources (e.g., commercial ticker feeds) issue "revision tuples" (revisions) that amend previously issued tuples (e.g., erroneous share prices). Ideally, any stream processing engine should process revision inputs by generating revision outputs that correct previous query results. We know of no stream processing system that presently has this capability.
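As a toy illustration of the behavior the abstract calls for, the sketch below shows a running-sum operator that accepts revisions of previously issued tuples and emits corrected (revision) outputs. The tuple identifiers and operator interface are assumptions made for illustration, not the engine's actual design.

class RunningSum:
    def __init__(self):
        self.values = {}     # tuple id -> last seen value
        self.total = 0.0

    def insert(self, tid, value):
        self.values[tid] = value
        self.total += value
        return ("result", self.total)

    def revise(self, tid, new_value):
        # A revision amends a previously issued tuple and yields a revision output
        # that corrects the earlier query result.
        old_value = self.values[tid]
        self.values[tid] = new_value
        self.total += new_value - old_value
        return ("revised result", self.total)

op = RunningSum()
print(op.insert("trade-1", 101.0))   # ('result', 101.0)
print(op.insert("trade-2", 99.0))    # ('result', 200.0)
print(op.revise("trade-1", 100.0))   # ('revised result', 199.0)  amends the erroneous price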
Esther Ryvkina, Anurag Maskey, Mitch Cherniack, S. Zdonik. "Revision Processing in a Stream Processing Engine: A High-Level Design." 22nd International Conference on Data Engineering (ICDE'06), 2006, p. 141. doi:10.1109/ICDE.2006.130
Ashwin Machanavajjhala, J. Gehrke, Daniel Kifer, Muthuramakrishnan Venkitasubramaniam
Publishing data about individuals without revealing sensitive information about them is an important problem. In recent years, a new definition of privacy called k-anonymity has gained popularity. In a k-anonymized dataset, each record is indistinguishable from at least k-1 other records with respect to certain "identifying" attributes. In this paper we show with two simple attacks that a k-anonymized dataset has some subtle, but severe privacy problems. First, we show that an attacker can discover the values of sensitive attributes when there is little diversity in those sensitive attributes. Second, attackers often have background knowledge, and we show that k-anonymity does not guarantee privacy against attackers using background knowledge. We give a detailed analysis of these two attacks and we propose a novel and powerful privacy definition called ℓ-diversity. In addition to building a formal foundation for ℓ-diversity, we show in an experimental evaluation that ℓ-diversity is practical and can be implemented efficiently.
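A minimal sketch of the distinct-ℓ-diversity condition described above, assuming records are given as Python dicts and equivalence classes are formed by exact equality on the quasi-identifier columns; the column names and values in the example are hypothetical.

from collections import defaultdict

def is_distinct_l_diverse(records, quasi_ids, sensitive, ell):
    """records: list of dicts; quasi_ids: columns treated as identifying;
    sensitive: name of the sensitive column; ell: required diversity."""
    classes = defaultdict(set)
    for row in records:
        key = tuple(row[q] for q in quasi_ids)   # equivalence class on quasi-identifiers
        classes[key].add(row[sensitive])         # collect distinct sensitive values
    return all(len(values) >= ell for values in classes.values())

# The single class ('130**', '<30') holds only one sensitive value, so the
# table is not 2-diverse even though it may be 2-anonymous.
rows = [
    {"zip": "130**", "age": "<30", "disease": "flu"},
    {"zip": "130**", "age": "<30", "disease": "flu"},
]
print(is_distinct_l_diverse(rows, ["zip", "age"], "disease", ell=2))  # False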
Ashwin Machanavajjhala, J. Gehrke, Daniel Kifer, Muthuramakrishnan Venkitasubramaniam. "L-diversity: privacy beyond k-anonymity." 22nd International Conference on Data Engineering (ICDE'06), 2006, p. 24. doi:10.1145/1217299.1217302
Existing techniques to mine periodic patterns in time series data focus on discovering full-cycle periodic patterns from an entire time series. However, many useful partial periodic patterns are hidden in long and complex time series data. In this paper, we aim to discover the partial periodicity in local segments of the time series data. We introduce the notion of character density to partition the time series into variable-length fragments and to determine the lower bound of each character's period. We propose a novel algorithm, called DPMiner, to find dense periodic patterns in time series data. Experimental results on both synthetic and real-life datasets demonstrate that the proposed algorithm is effective and efficient in revealing interesting dense periodic patterns.
Chang Sheng, W. Hsu, M. Lee. "Mining Dense Periodic Patterns in Time Series Data." 22nd International Conference on Data Engineering (ICDE'06), 2006, p. 115. doi:10.1109/ICDE.2006.97
XML is an important practical paradigm in information technology and has a broad range of applications. How to access and retrieve XML data is crucial to these applications. There are two standard ways of accessing and manipulating XML data: the Simple API for XML (SAX) and the Document Object Model (DOM). However, when an application needs to traverse XML data, it is not easy to retrieve the required data with either of these standard interfaces. SAX does not allow the data to be traversed back and forth, and the graph-oriented DOM notation is not easy to work with. Given these limitations, the W3C supervises the development of three important languages for exploring and querying XML: XPath [3], XQuery, and XSLT. Among these three languages, XPath is the cornerstone on which the other two are built. XPath defines expressions for traversing an XML document and specifies the set of nodes (XPath 1.0) or the sequence of nodes (XPath 2.0) in the XML document.
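As a small, self-contained illustration of XPath node selection (not part of the XPlainer framework itself), the following Python snippet evaluates a simple path expression using the limited XPath subset supported by the standard library; the document content is invented for the example.

import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<catalog>"
    "  <book id='1'><title>Data on the Web</title></book>"
    "  <book id='2'><title>XQuery</title></book>"
    "</catalog>"
)

# Select the title elements of all book children; the result is the set
# (XPath 1.0) or sequence (XPath 2.0) of matching nodes.
for title in doc.findall("./book/title"):
    print(title.text)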
M. Consens, John W. S. Liu, Bill O'Farrell. "XPlainer: An XPath Debugging Framework." 22nd International Conference on Data Engineering (ICDE'06), 2006, p. 170. doi:10.1109/ICDE.2006.177
Until recently, most data integration techniques involved central components, e.g., global schemas, to enable transparent access to heterogeneous databases. Today, however, with the democratization of tools facilitating knowledge elicitation in machine-processable formats, one cannot rely on global, centralized schemas anymore, as knowledge creation and consumption are becoming more and more dynamic and decentralized. Peer Data Management Systems (PDMS) provide an answer to this problem by eliminating the central semantic component and instead considering compositions of local, pair-wise mappings to propagate queries from one database to the others. PDMS approaches proposed so far make the implicit assumption that all mappings used in this way are correct. This obviously cannot be taken for granted in typical PDMS settings, where mappings can be created (semi-)automatically by independent parties. In this work, we propose a totally decentralized, efficient message passing scheme to automatically detect erroneous mappings in PDMS. Our scheme is based on a probabilistic model in which we take advantage of transitive closures of mapping operations to confront local beliefs about the correctness of a mapping with evidence gathered around the network. We show that our scheme can be efficiently embedded in any PDMS and provide a preliminary evaluation of our techniques on sets of both automatically generated and real-world schemas.
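A much-simplified sketch of the cycle evidence the approach exploits: composing pair-wise attribute mappings around a cycle of peers should return each attribute to itself, and a broken cycle hints that some mapping on the path is erroneous. The dict-based mapping representation and the hard yes/no test are illustrative assumptions; the actual technique aggregates such evidence probabilistically via message passing.

def compose(*mappings):
    """Compose attribute mappings left to right; returns a function on attribute names."""
    def apply(attr):
        for m in mappings:
            attr = m.get(attr)
            if attr is None:
                return None
        return attr
    return apply

# Three peers with pair-wise mappings forming a cycle A -> B -> C -> A;
# the 'plz' -> 'city' entry is deliberately wrong.
m_ab = {"name": "fullname", "zip": "postcode"}
m_bc = {"fullname": "contact", "postcode": "plz"}
m_ca = {"contact": "name", "plz": "city"}

cycle = compose(m_ab, m_bc, m_ca)
for attr in ["name", "zip"]:
    if cycle(attr) == attr:
        print(attr, "-> cycle closes: no evidence against the mappings on this path")
    else:
        print(attr, "-> cycle broken: some mapping on this path is likely erroneous")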
P. Cudré-Mauroux, K. Aberer, Andras Feher. "Probabilistic Message Passing in Peer Data Management Systems." 22nd International Conference on Data Engineering (ICDE'06), 2006, p. 41. doi:10.1109/ICDE.2006.118
We present a new technique for using samples to estimate join cardinalities. This technique, which we term "end-biased samples," is inspired by recent work in network traffic measurement. It improves on random samples by using coordinated pseudo-random samples and retaining the sampled values in proportion to their frequency. We show that end-biased samples always provide more accurate estimates than random samples with the same sample size. The comparison with histograms is more interesting: while end-biased histograms are somewhat better than end-biased samples for uncorrelated data sets, end-biased samples dominate by a large margin when the data is correlated. Finally, we compare end-biased samples to the recently proposed "skimmed sketches" and show that neither dominates the other; each has different and compelling strengths and weaknesses. These results suggest that end-biased samples may be a useful addition to the repertoire of techniques used for data summarization.
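A hedged sketch of the sampling idea as stated in the abstract: each join value is hashed to a coordinated pseudo-random number, and a value is always retained when its frequency reaches a threshold, otherwise with probability proportional to its frequency. The threshold parameter T and the SHA-1-based hash are assumptions made for illustration.

import hashlib
from collections import Counter

def coordinated_hash(value):
    # Hash the value itself (no per-table seed), so samples taken on two
    # different tables agree on which values are retained.
    digest = hashlib.sha1(str(value).encode()).hexdigest()
    return int(digest, 16) / 16 ** 40          # pseudo-random number in [0, 1)

def end_biased_sample(column, T):
    # Retain (value, frequency) pairs: always when frequency >= T, otherwise
    # with probability frequency / T, biasing the sample toward frequent values.
    freqs = Counter(column)
    return {v: f for v, f in freqs.items()
            if f >= T or coordinated_hash(v) < f / T}

r = ["a"] * 50 + ["b"] * 3 + ["c"] * 1
s = ["a"] * 10 + ["b"] * 7 + ["d"] * 2
print(end_biased_sample(r, T=5))
print(end_biased_sample(s, T=5))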
Cristian Estan, J. Naughton. "End-biased Samples for Join Cardinality Estimation." 22nd International Conference on Data Engineering (ICDE'06), 2006, p. 20. doi:10.1109/ICDE.2006.61
A system that enables real-time query processing on large spatial networks is demonstrated. The system provides functionality for processing a wide range of spatial queries, such as nearest neighbor searches and spatial joins, on spatial networks of sufficiently large sizes.
Jagan Sankaranarayanan, H. Alborzi, H. Samet. "Enabling Query Processing on Spatial Networks." 22nd International Conference on Data Engineering (ICDE'06), 2006, p. 163. doi:10.1109/ICDE.2006.60
J. Beckmann, A. Halverson, R. Krishnamurthy, J. Naughton
"Sparse" data, in which relations have many attributes that are null for most tuples, presents a challenge for relational database management systems. If one uses the normal "horizontal" schema to store such data sets in any of the three leading commercial RDBMS, the result is tables that occupy vast amounts of storage, most of which is devoted to nulls. If one attempts to avoid this storage blowup by using a "vertical" schema, the storage utilization is indeed better, but query performance is orders of magnitude slower for certain classes of queries. In this paper, we argue that the proper way to handle sparse data is not to use a vertical schema, but rather to extend the RDBMS tuple storage format to allow the representation of sparse attributes as interpreted fields. The addition of interpreted storage allows for efficient and transparent querying of sparse data, uniform access to all attributes, and schema scalability. We show, through an implementation in PostgreSQL, that the interpreted storage approach dominates in query efficiency and ease-of-use over the current horizontal storage and vertical schema approaches over a wide range of queries and sparse data sets.
J. Beckmann, A. Halverson, R. Krishnamurthy, J. Naughton. "Extending RDBMSs To Support Sparse Datasets Using An Interpreted Attribute Storage Format." 22nd International Conference on Data Engineering (ICDE'06), 2006, p. 58. doi:10.1109/ICDE.2006.67
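To make the trade-off discussed in the abstract above concrete, the following sketch contrasts the three representations for one wide, mostly-null tuple in plain Python terms; it is an illustration of the ideas, not the paper's PostgreSQL implementation, and the attribute names are hypothetical.

# A wide, mostly-null relation: 1000 attributes, only two of which are non-null
# for this tuple.
ATTRS = [f"a{i}" for i in range(1000)]
values = {"a3": 42, "a17": "red"}

horizontal = {a: values.get(a) for a in ATTRS}            # ~1000 slots, mostly None
vertical = [("row1", a, v) for a, v in values.items()]    # one (id, attribute, value) row per non-null value
interpreted = ("row1", tuple(sorted(values.items())))     # sparse attrs kept as an interpreted field inside the tuple

print(len(horizontal), len(vertical), interpreted)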
A. Ayad, J. Naughton, Stephen J. Wright, U. Srivastava
Data streaming systems face the possibility of having to shed load in the case of CPU or memory resource limitations. We study the CPU-limited scenario in detail. First, we propose a new model for the CPU cost. Then we formally state the problem of shedding load with the goal of obtaining the maximum possible subset of the complete answer, and propose an online strategy for semantic load shedding. Moving on to random load shedding, we discuss random load shedding strategies that decouple the window maintenance and tuple production operations of the symmetric hash join, and prove that one of them, Probe-No-Insert, always dominates the previously proposed coin-flipping strategy.
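A minimal sketch, under assumed interfaces, of the Probe-No-Insert idea: during overload an arriving tuple still probes the opposite window (so it can contribute output) but is not inserted into its own window, decoupling tuple production from window maintenance. Window expiration and the CPU cost model are omitted; the class and parameter names are illustrative.

from collections import defaultdict
import random

class SymmetricHashJoin:
    def __init__(self, shed_probability=0.0):
        # One hash table (keyed window) per input stream.
        self.windows = {"R": defaultdict(list), "S": defaultdict(list)}
        self.shed_probability = shed_probability

    def process(self, side, key, tuple_):
        other = "S" if side == "R" else "R"
        # Probe the opposite window unconditionally so output is still produced.
        matches = [(tuple_, m) for m in self.windows[other][key]]
        # Under shedding, skip only the insertion (Probe-No-Insert).
        if random.random() >= self.shed_probability:
            self.windows[side][key].append(tuple_)
        return matches

join = SymmetricHashJoin(shed_probability=0.5)
print(join.process("R", key=1, tuple_=("r1",)))   # no matches yet
print(join.process("S", key=1, tuple_=("s1",)))   # matches ("r1",) only if it was inserted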
A. Ayad, J. Naughton, Stephen J. Wright, U. Srivastava. "Approximating Streaming Window Joins Under CPU Limitations." 22nd International Conference on Data Engineering (ICDE'06), 2006, p. 142. doi:10.1109/ICDE.2006.24