Future database application systems will be designed as Service-Oriented Architectures (SOAs) like SAP's NetWeaver instead of monolithic software systems such as SAP's R/3. The decomposition into finer-grained services allows the use of hardware clusters and a flexible service-to-server allocation, but it also increases the complexity of administration. Thus, new administration techniques are necessary, such as the self-organizing infrastructure we developed in cooperation with the SAP Adaptive Computing Infrastructure (ACI) group. For our purposes, the available hardware is virtualized, pooled, and monitored. A fuzzy-logic-based controller module supervises all services running on the hardware platform and remedies exceptional situations automatically. With this self-organizing infrastructure we reduce the necessary hardware and administration overhead and thus lower the total cost of ownership (TCO). We used our prototype implementation, called AutoGlobe, for SAP-internal tests and performed comprehensive simulation studies to demonstrate the effectiveness of the proposed concept.
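To illustrate the fuzzy-logic idea behind such a controller, the following Python sketch maps a service's observed CPU load to a degree of membership in an "overloaded" fuzzy set and triggers a remedy only when that degree is high enough. The membership breakpoints, threshold, and service names are illustrative assumptions, not AutoGlobe's actual rules.

def overloaded(load, low=0.6, high=0.9):
    """Piecewise-linear membership in the fuzzy set 'overloaded': 0 below low, 1 above high."""
    if load <= low:
        return 0.0
    if load >= high:
        return 1.0
    return (load - low) / (high - low)

def control_step(service_loads, threshold=0.75):
    """Return the services for which a remedy (e.g. starting a further instance
    on a pooled server) would be triggered in this supervision step."""
    return [s for s, load in service_loads.items() if overloaded(load) >= threshold]

print(control_step({"crm": 0.95, "hr": 0.50, "billing": 0.85}))  # ['crm', 'billing']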
S. Seltzsam, D. Gmach, Stefan Krompass, A. Kemper. "AutoGlobe: An Automatic Administration Concept for Service-Oriented Database Applications." 22nd International Conference on Data Engineering (ICDE'06), 2006, p. 90. doi:10.1109/ICDE.2006.26
Esther Ryvkina, Anurag Maskey, Mitch Cherniack, S. Zdonik
Data stream processing systems have become ubiquitous in academic [1, 2, 5, 6] and commercial [11] sectors, with application areas that include financial services, network traffic analysis, battlefield monitoring, and traffic control [3]. The append-only model of streams implies that input data is immutable and therefore always correct. In practice, however, streaming data sources often contend with noise (e.g., embedded sensors) or data entry errors (e.g., financial data feeds), resulting in erroneous inputs and, therefore, erroneous query results. Many data stream sources (e.g., commercial ticker feeds) issue "revision tuples" (revisions) that amend previously issued tuples (e.g., erroneous share prices). Ideally, any stream processing engine should process revision inputs by generating revision outputs that correct previous query results. We know of no stream processing system that presently has this capability.
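As a toy illustration of the behavior the abstract calls for, the sketch below shows a running-sum operator that accepts revisions of previously issued tuples and emits corrected (revision) outputs. The tuple identifiers and operator interface are assumptions made for illustration, not the engine's actual design.

class RunningSum:
    def __init__(self):
        self.values = {}     # tuple id -> last seen value
        self.total = 0.0

    def insert(self, tid, value):
        self.values[tid] = value
        self.total += value
        return ("result", self.total)

    def revise(self, tid, new_value):
        # A revision amends a previously issued tuple and yields a revision output
        # that corrects the earlier query result.
        old_value = self.values[tid]
        self.values[tid] = new_value
        self.total += new_value - old_value
        return ("revised result", self.total)

op = RunningSum()
print(op.insert("trade-1", 101.0))   # ('result', 101.0)
print(op.insert("trade-2", 99.0))    # ('result', 200.0)
print(op.revise("trade-1", 100.0))   # ('revised result', 199.0)  amends the erroneous price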
Esther Ryvkina, Anurag Maskey, Mitch Cherniack, S. Zdonik. "Revision Processing in a Stream Processing Engine: A High-Level Design." 22nd International Conference on Data Engineering (ICDE'06), 2006, p. 141. doi:10.1109/ICDE.2006.130
Ashwin Machanavajjhala, J. Gehrke, Daniel Kifer, Muthuramakrishnan Venkitasubramaniam
Publishing data about individuals without revealing sensitive information about them is an important problem. In recent years, a new definition of privacy called k-anonymity has gained popularity. In a k-anonymized dataset, each record is indistinguishable from at least k-1 other records with respect to certain "identifying" attributes. In this paper we show with two simple attacks that a k-anonymized dataset has some subtle, but severe privacy problems. First, we show that an attacker can discover the values of sensitive attributes when there is little diversity in those sensitive attributes. Second, attackers often have background knowledge, and we show that k-anonymity does not guarantee privacy against attackers using background knowledge. We give a detailed analysis of these two attacks and we propose a novel and powerful privacy definition called ℓ-diversity. In addition to building a formal foundation for ℓ-diversity, we show in an experimental evaluation that ℓ-diversity is practical and can be implemented efficiently.
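A minimal sketch of the distinct-ℓ-diversity condition described above, assuming records are given as Python dicts and equivalence classes are formed by exact equality on the quasi-identifier columns; the column names and values in the example are hypothetical.

from collections import defaultdict

def is_distinct_l_diverse(records, quasi_ids, sensitive, ell):
    """records: list of dicts; quasi_ids: columns treated as identifying;
    sensitive: name of the sensitive column; ell: required diversity."""
    classes = defaultdict(set)
    for row in records:
        key = tuple(row[q] for q in quasi_ids)   # equivalence class on quasi-identifiers
        classes[key].add(row[sensitive])         # collect distinct sensitive values
    return all(len(values) >= ell for values in classes.values())

# The single class ('130**', '<30') holds only one sensitive value, so the
# table is not 2-diverse even though it may be 2-anonymous.
rows = [
    {"zip": "130**", "age": "<30", "disease": "flu"},
    {"zip": "130**", "age": "<30", "disease": "flu"},
]
print(is_distinct_l_diverse(rows, ["zip", "age"], "disease", ell=2))  # False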
Ashwin Machanavajjhala, J. Gehrke, Daniel Kifer, Muthuramakrishnan Venkitasubramaniam. "L-diversity: privacy beyond k-anonymity." 22nd International Conference on Data Engineering (ICDE'06), 2006, p. 24. doi:10.1145/1217299.1217302
Existing techniques to mine periodic patterns in time series data focus on discovering full-cycle periodic patterns from an entire time series. However, many useful partial periodic patterns are hidden in long and complex time series data. In this paper, we aim to discover the partial periodicity in local segments of the time series data. We introduce the notion of character density to partition the time series into variable-length fragments and to determine the lower bound of each character's period. We propose a novel algorithm, called DPMiner, to find dense periodic patterns in time series data. Experimental results on both synthetic and real-life datasets demonstrate that the proposed algorithm is effective and efficient in revealing interesting dense periodic patterns.
Chang Sheng, W. Hsu, M. Lee. "Mining Dense Periodic Patterns in Time Series Data." 22nd International Conference on Data Engineering (ICDE'06), 2006, p. 115. doi:10.1109/ICDE.2006.97
XML is an important practical paradigm in information technology and has a broad range of applications. How to access and retrieve XML data is crucial to these applications. There are two standard ways of accessing and manipulating XML data: the Simple API for XML (SAX) and the Document Object Model (DOM). However, when an application needs to traverse XML data, it is not easy to retrieve the required data with either of these standard interfaces. SAX does not allow the data to be traversed back and forth, and the graph-oriented DOM notation is not easy to work with. Given these limitations, the W3C supervises the development of three important languages for exploring and querying XML: XPath [3], XQuery, and XSLT. Among these three languages, XPath is the cornerstone on which the other two are built. XPath defines expressions for traversing an XML document and specifies the set of nodes (XPath 1.0) or the sequence of nodes (XPath 2.0) in the XML document.
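As a small, self-contained illustration of XPath node selection (not part of the XPlainer framework itself), the following Python snippet evaluates a simple path expression using the limited XPath subset supported by the standard library; the document content is invented for the example.

import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<catalog>"
    "  <book id='1'><title>Data on the Web</title></book>"
    "  <book id='2'><title>XQuery</title></book>"
    "</catalog>"
)

# Select the title elements of all book children; the result is the set
# (XPath 1.0) or sequence (XPath 2.0) of matching nodes.
for title in doc.findall("./book/title"):
    print(title.text)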
M. Consens, John W. S. Liu, Bill O'Farrell. "XPlainer: An XPath Debugging Framework." 22nd International Conference on Data Engineering (ICDE'06), 2006, p. 170. doi:10.1109/ICDE.2006.177
Until recently, most data integration techniques involved central components, e.g., global schemas, to enable transparent access to heterogeneous databases. Today, however, with the democratization of tools facilitating knowledge elicitation in machine-processable formats, one cannot rely on global, centralized schemas anymore, as knowledge creation and consumption are becoming more and more dynamic and decentralized. Peer Data Management Systems (PDMS) provide an answer to this problem by eliminating the central semantic component and instead considering compositions of local, pair-wise mappings to propagate queries from one database to the others. PDMS approaches proposed so far make the implicit assumption that all mappings used in this way are correct. This obviously cannot be taken for granted in typical PDMS settings, where mappings can be created (semi-)automatically by independent parties. In this work, we propose a totally decentralized, efficient message passing scheme to automatically detect erroneous mappings in PDMS. Our scheme is based on a probabilistic model in which we take advantage of transitive closures of mapping operations to confront local beliefs about the correctness of a mapping with evidence gathered around the network. We show that our scheme can be efficiently embedded in any PDMS and provide a preliminary evaluation of our techniques on sets of both automatically generated and real-world schemas.
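A much-simplified sketch of the cycle evidence the approach exploits: composing pair-wise attribute mappings around a cycle of peers should return each attribute to itself, and a broken cycle hints that some mapping on the path is erroneous. The dict-based mapping representation and the hard yes/no test are illustrative assumptions; the actual technique aggregates such evidence probabilistically via message passing.

def compose(*mappings):
    """Compose attribute mappings left to right; returns a function on attribute names."""
    def apply(attr):
        for m in mappings:
            attr = m.get(attr)
            if attr is None:
                return None
        return attr
    return apply

# Three peers with pair-wise mappings forming a cycle A -> B -> C -> A;
# the 'plz' -> 'city' entry is deliberately wrong.
m_ab = {"name": "fullname", "zip": "postcode"}
m_bc = {"fullname": "contact", "postcode": "plz"}
m_ca = {"contact": "name", "plz": "city"}

cycle = compose(m_ab, m_bc, m_ca)
for attr in ["name", "zip"]:
    if cycle(attr) == attr:
        print(attr, "-> cycle closes: no evidence against the mappings on this path")
    else:
        print(attr, "-> cycle broken: some mapping on this path is likely erroneous")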
P. Cudré-Mauroux, K. Aberer, Andras Feher. "Probabilistic Message Passing in Peer Data Management Systems." 22nd International Conference on Data Engineering (ICDE'06), 2006, p. 41. doi:10.1109/ICDE.2006.118
We present a new technique for using samples to estimate join cardinalities. This technique, which we term "end-biased samples," is inspired by recent work in network traffic measurement. It improves on random samples by using coordinated pseudo-random samples and retaining the sampled values in proportion to their frequency. We show that end-biased samples always provide more accurate estimates than random samples with the same sample size. The comparison with histograms is more interesting: while end-biased histograms are somewhat better than end-biased samples for uncorrelated data sets, end-biased samples dominate by a large margin when the data is correlated. Finally, we compare end-biased samples to the recently proposed "skimmed sketches" and show that neither dominates the other; each has different and compelling strengths and weaknesses. These results suggest that end-biased samples may be a useful addition to the repertoire of techniques used for data summarization.
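A hedged sketch of the sampling idea as stated in the abstract: each join value is hashed to a coordinated pseudo-random number, and a value is always retained when its frequency reaches a threshold, otherwise with probability proportional to its frequency. The threshold parameter T and the SHA-1-based hash are assumptions made for illustration.

import hashlib
from collections import Counter

def coordinated_hash(value):
    # Hash the value itself (no per-table seed), so samples taken on two
    # different tables agree on which values are retained.
    digest = hashlib.sha1(str(value).encode()).hexdigest()
    return int(digest, 16) / 16 ** 40          # pseudo-random number in [0, 1)

def end_biased_sample(column, T):
    # Retain (value, frequency) pairs: always when frequency >= T, otherwise
    # with probability frequency / T, biasing the sample toward frequent values.
    freqs = Counter(column)
    return {v: f for v, f in freqs.items()
            if f >= T or coordinated_hash(v) < f / T}

r = ["a"] * 50 + ["b"] * 3 + ["c"] * 1
s = ["a"] * 10 + ["b"] * 7 + ["d"] * 2
print(end_biased_sample(r, T=5))
print(end_biased_sample(s, T=5))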
Cristian Estan, J. Naughton. "End-biased Samples for Join Cardinality Estimation." 22nd International Conference on Data Engineering (ICDE'06), 2006, p. 20. doi:10.1109/ICDE.2006.61
A system that enables real-time query processing on large spatial networks is demonstrated. The system provides functionality for processing a wide range of spatial queries, such as nearest neighbor searches and spatial joins, on spatial networks of sufficiently large sizes.
Jagan Sankaranarayanan, H. Alborzi, H. Samet. "Enabling Query Processing on Spatial Networks." 22nd International Conference on Data Engineering (ICDE'06), 2006, p. 163. doi:10.1109/ICDE.2006.60
J. Beckmann, A. Halverson, R. Krishnamurthy, J. Naughton
"Sparse" data, in which relations have many attributes that are null for most tuples, presents a challenge for relational database management systems. If one uses the normal "horizontal" schema to store such data sets in any of the three leading commercial RDBMS, the result is tables that occupy vast amounts of storage, most of which is devoted to nulls. If one attempts to avoid this storage blowup by using a "vertical" schema, the storage utilization is indeed better, but query performance is orders of magnitude slower for certain classes of queries. In this paper, we argue that the proper way to handle sparse data is not to use a vertical schema, but rather to extend the RDBMS tuple storage format to allow the representation of sparse attributes as interpreted fields. The addition of interpreted storage allows for efficient and transparent querying of sparse data, uniform access to all attributes, and schema scalability. We show, through an implementation in PostgreSQL, that the interpreted storage approach dominates in query efficiency and ease-of-use over the current horizontal storage and vertical schema approaches over a wide range of queries and sparse data sets.
J. Beckmann, A. Halverson, R. Krishnamurthy, J. Naughton. "Extending RDBMSs To Support Sparse Datasets Using An Interpreted Attribute Storage Format." 22nd International Conference on Data Engineering (ICDE'06), 2006, p. 58. doi:10.1109/ICDE.2006.67
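To make the trade-off discussed in the abstract above concrete, the following sketch contrasts the three representations for one wide, mostly-null tuple in plain Python terms; it is an illustration of the ideas, not the paper's PostgreSQL implementation, and the attribute names are hypothetical.

# A wide, mostly-null relation: 1000 attributes, only two of which are non-null
# for this tuple.
ATTRS = [f"a{i}" for i in range(1000)]
values = {"a3": 42, "a17": "red"}

horizontal = {a: values.get(a) for a in ATTRS}            # ~1000 slots, mostly None
vertical = [("row1", a, v) for a, v in values.items()]    # one (id, attribute, value) row per non-null value
interpreted = ("row1", tuple(sorted(values.items())))     # sparse attrs kept as an interpreted field inside the tuple

print(len(horizontal), len(vertical), interpreted)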
A. Ayad, J. Naughton, Stephen J. Wright, U. Srivastava
Data streaming systems face the possibility of having to shed load in the case of CPU or memory resource limitations. We study the CPU-limited scenario in detail. First, we propose a new model for the CPU cost. Then we formally state the problem of shedding load with the goal of obtaining the maximum possible subset of the complete answer, and propose an online strategy for semantic load shedding. Moving on to random load shedding, we discuss random load shedding strategies that decouple the window maintenance and tuple production operations of the symmetric hash join, and prove that one of them, Probe-No-Insert, always dominates the previously proposed coin-flipping strategy.
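A minimal sketch, under assumed interfaces, of the Probe-No-Insert idea: during overload an arriving tuple still probes the opposite window (so it can contribute output) but is not inserted into its own window, decoupling tuple production from window maintenance. Window expiration and the CPU cost model are omitted; the class and parameter names are illustrative.

from collections import defaultdict
import random

class SymmetricHashJoin:
    def __init__(self, shed_probability=0.0):
        # One hash table (keyed window) per input stream.
        self.windows = {"R": defaultdict(list), "S": defaultdict(list)}
        self.shed_probability = shed_probability

    def process(self, side, key, tuple_):
        other = "S" if side == "R" else "R"
        # Probe the opposite window unconditionally so output is still produced.
        matches = [(tuple_, m) for m in self.windows[other][key]]
        # Under shedding, skip only the insertion (Probe-No-Insert).
        if random.random() >= self.shed_probability:
            self.windows[side][key].append(tuple_)
        return matches

join = SymmetricHashJoin(shed_probability=0.5)
print(join.process("R", key=1, tuple_=("r1",)))   # no matches yet
print(join.process("S", key=1, tuple_=("s1",)))   # matches ("r1",) only if it was inserted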
A. Ayad, J. Naughton, Stephen J. Wright, U. Srivastava. "Approximating Streaming Window Joins Under CPU Limitations." 22nd International Conference on Data Engineering (ICDE'06), 2006, p. 142. doi:10.1109/ICDE.2006.24