
21st International Conference on Data Engineering (ICDE'05): Latest Publications

Odysseus: a high-performance ORDBMS tightly-coupled with IR features
Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.95
K. Whang, Min-Jae Lee, Jae-Gil Lee, Min-Soo Kim, Wook-Shin Han
We propose the notion of tight-coupling [K. Whang et al., (1999)] to add new data types into the DBMS engine. In this paper, we introduce the Odysseus ORDBMS and present its tightly-coupled IR features (US patented). We demonstrate a Web search engine capable of managing 20 million Web pages in a non-parallel configuration using Odysseus.
Citations: 30
Reverse nearest neighbors in large graphs
Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.124
Man Lung Yiu, D. Papadias, N. Mamoulis, Yufei Tao
A reverse nearest neighbor query returns the data objects that have a query point as their nearest neighbor. Although such queries have been studied quite extensively in Euclidean spaces, there is no previous work in the context of large graphs. In this paper, we propose algorithms and optimization techniques for RNN queries by utilizing some characteristics of networks.
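As a concrete illustration of the query semantics only, here is a brute-force sketch over Euclidean points; the point set is made up, and the paper's actual setting (large graphs with network distance) and its algorithms are not reproduced here.

```python
# Brute-force reverse nearest neighbor (RNN): return the data points that have
# the query point q as their nearest neighbor, i.e. no other data point is
# closer to them than q. Euclidean distance on made-up 2D points; the paper
# targets large graphs with network distance and far more efficient algorithms.
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def rnn(points, q):
    result = []
    for p in points:
        d_pq = dist(p, q)
        # p is a reverse nearest neighbor of q iff no other data point
        # lies closer to p than q does.
        if all(dist(p, o) >= d_pq for o in points if o is not p):
            result.append(p)
    return result

if __name__ == "__main__":
    pts = [(1.0, 1.0), (2.0, 2.0), (8.0, 8.0), (9.0, 9.0)]
    print(rnn(pts, (0.5, 0.5)))  # [(1.0, 1.0)] is the only point whose nearest neighbor is q
```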
Citations: 43
Data privacy through optimal k-anonymization
Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.42
R. Bayardo, R. Agrawal
Data de-identification reconciles the demand for release of data for research purposes and the demand for privacy from individuals. This paper proposes and evaluates an optimization algorithm for the powerful de-identification procedure known as k-anonymization. A k-anonymized dataset has the property that each record is indistinguishable from at least k - 1 others. Even simple restrictions of optimized k-anonymity are NP-hard, leading to significant computational challenges. We present a new approach to exploring the space of possible anonymizations that tames the combinatorics of the problem, and develop data-management strategies to reduce reliance on expensive operations such as sorting. Through experiments on real census data, we show the resulting algorithm can find optimal k-anonymizations under two representative cost measures and a wide range of k. We also show that the algorithm can produce good anonymizations in circumstances where the input data or input parameters preclude finding an optimal solution in reasonable time. Finally, we use the algorithm to explore the effects of different coding approaches and problem variations on anonymization quality and performance. To our knowledge, this is the first result demonstrating optimal k-anonymization of a non-trivial dataset under a general model of the problem.
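To make the k-anonymity property itself concrete, a toy check is sketched below; the quasi-identifier columns and the age-binning generalization are illustrative assumptions, not the paper's optimized search over generalization and suppression choices.

```python
# Minimal sketch of k-anonymity: every combination of quasi-identifier values
# must occur at least k times. Coarsening ages into ranges is a toy stand-in
# for the generalization step; the column names and data are hypothetical.
from collections import Counter

def is_k_anonymous(records, quasi_ids, k):
    counts = Counter(tuple(r[a] for a in quasi_ids) for r in records)
    return all(c >= k for c in counts.values())

def generalize_age(records, width=10):
    out = []
    for r in records:
        lo = (r["age"] // width) * width
        out.append({**r, "age": f"{lo}-{lo + width - 1}"})
    return out

if __name__ == "__main__":
    data = [
        {"age": 23, "zip": "47906", "disease": "flu"},
        {"age": 27, "zip": "47906", "disease": "cold"},
        {"age": 21, "zip": "47906", "disease": "flu"},
        {"age": 52, "zip": "47902", "disease": "cold"},
        {"age": 55, "zip": "47902", "disease": "flu"},
    ]
    qi = ["age", "zip"]
    print(is_k_anonymous(data, qi, k=2))                    # False on the raw ages
    print(is_k_anonymous(generalize_age(data), qi, k=2))    # True after binning ages
```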
Citations: 1327
Filter based directory replication and caching
Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.67
Apurva Kumar
This paper describes a novel filter-based replication model for lightweight directory access protocol (LDAP) directories. Instead of replicating entire subtrees from the directory information tree (DIT), only entries matching a filter specification are replicated. Advantages of the filter-based replication framework over existing subtree-based mechanisms have been demonstrated for a real enterprise directory using real workloads.
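As a rough sketch of the replication criterion only (not the paper's protocol or caching scheme), deciding what to ship to a replica reduces to evaluating a filter against each entry; the entries below and the simplified dict-based filter representation are hypothetical, standing in for standard LDAP filter strings such as (&(objectClass=person)(ou=sales)).

```python
# Minimal sketch of filter-based replication: instead of copying a whole DIT
# subtree, ship only the entries that satisfy a filter. A filter is modeled
# here as a dict of required attribute values; real deployments would evaluate
# LDAP filter syntax instead.
def matches(entry, flt):
    return all(entry.get(attr) == val for attr, val in flt.items())

def entries_to_replicate(entries, flt):
    return [e for e in entries if matches(e, flt)]

if __name__ == "__main__":
    directory = [
        {"dn": "cn=alice,ou=sales,o=acme", "objectClass": "person", "ou": "sales"},
        {"dn": "cn=bob,ou=hr,o=acme", "objectClass": "person", "ou": "hr"},
        {"dn": "ou=sales,o=acme", "objectClass": "organizationalUnit", "ou": "sales"},
    ]
    flt = {"objectClass": "person", "ou": "sales"}
    for e in entries_to_replicate(directory, flt):
        print(e["dn"])  # only cn=alice,ou=sales,o=acme matches and is replicated
```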
Citations: 3
On the signature trees and balanced signature trees
Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.99
Yangjun Chen
Advanced database application areas, such as computer aided design, office automation, digital libraries, data-mining as well as hypertext and multimedia systems need to handle complex data structures with set-valued attributes, which can be represented as bit strings, called signatures. A set of signatures can be stored in a file, called a signature file. In this paper, we propose a new method to organize a signature file into a tree structure, called a signature tree, to speed up the signature file scanning and query evaluation.
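A minimal sketch of the superimposed-coding signatures such a file stores follows, assuming a 32-bit signature width and a hash-based element code; the signature-tree organization the paper proposes is not shown.

```python
# Superimposed-coding signatures: a set-valued attribute is hashed into an
# F-bit string by OR-ing per-element codes. A stored signature that does not
# cover the query signature rules the record out; false positives are possible,
# false negatives are not. Width F and bits-per-element M are arbitrary choices.
import hashlib

F = 32  # signature width in bits
M = 3   # bits set per element

def element_code(value):
    code = 0
    digest = hashlib.sha1(value.encode()).digest()
    for i in range(M):
        code |= 1 << (digest[i] % F)
    return code

def signature(values):
    sig = 0
    for v in values:
        sig |= element_code(v)
    return sig

def may_contain(record_sig, query_sig):
    # every bit set in the query signature must also be set in the record signature
    return record_sig & query_sig == query_sig

if __name__ == "__main__":
    rec = signature({"multimedia", "hypertext", "cad"})
    print(may_contain(rec, signature({"cad"})))         # True: a possible match
    print(may_contain(rec, signature({"datamining"})))  # very likely False: filtered out
```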
Citations: 11
Privacy and ownership preserving of outsourced medical data
Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.111
E. Bertino, B. Ooi, Yanjiang Yang, R. Deng
The demand for the secondary use of medical data is increasing steadily to allow for the provision of better quality health care. Two important issues pertaining to this sharing of data have to be addressed: one is the privacy protection for individuals referred to in the data; the other is copyright protection over the data. In this paper, we present a unified framework that seamlessly combines techniques of binning and digital watermarking to attain the dual goals of privacy and copyright protection. Our binning method is built upon an earlier approach of generalization and suppression by allowing a broader concept of generalization. To ensure data usefulness, we propose constraining binning by usage metrics that define maximal allowable information loss, and the metrics can be enforced off-line. Our watermarking algorithm watermarks the binned data in a hierarchical manner by leveraging the very nature of the data. The method is resilient to the generalization attack that is specific to the binned data, as well as other attacks intended to destroy the inserted mark. We prove that watermarking does not adversely interfere with binning, and we implemented the framework. Experiments were conducted, and the results show the robustness of the proposed framework.
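To illustrate binning constrained by a usage metric (the watermarking half of the framework is omitted), here is a toy sketch under an assumed generalization hierarchy and an assumed information-loss measure; neither is taken from the paper.

```python
# Toy sketch of binning under a usage metric: values are generalized up a
# (hypothetical) hierarchy, and a simple loss measure -- the fraction of
# distinct values collapsed -- must stay below a caller-supplied bound.
HIERARCHY = {  # child -> parent, a made-up diagnosis hierarchy
    "acute flu": "respiratory", "bronchitis": "respiratory",
    "respiratory": "illness", "fracture": "injury", "injury": "illness",
}

def generalize(value, steps):
    for _ in range(steps):
        value = HIERARCHY.get(value, value)
    return value

def bin_column(values, steps, max_loss):
    binned = [generalize(v, steps) for v in values]
    loss = 1 - len(set(binned)) / len(set(values))  # toy information-loss metric
    if loss > max_loss:
        raise ValueError("binning would exceed the allowed information loss")
    return binned

if __name__ == "__main__":
    col = ["acute flu", "bronchitis", "fracture"]
    print(bin_column(col, steps=1, max_loss=0.5))  # ['respiratory', 'respiratory', 'injury']
```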
Citations: 159
On the optimal ordering of maps and selections under factorization
Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.97
Thomas Neumann, S. Helmer, G. Moerkotte
The query optimizer of a database system is confronted with two aspects when handling user-defined functions (UDFs) in query predicates: the vast differences in evaluation costs between UDFs (and other functions), and multiple calls of the same (expensive) UDF. The former is dealt with by ordering the evaluation of the predicates optimally, the latter by identifying common subexpressions and thereby avoiding costly recomputation. Current approaches order n predicates optimally (neglecting factorization) in O(n log n). Their result may deviate significantly from the optimal solution under factorization. We formalize the problem of finding optimal orderings under factorization and prove that it is NP-hard. Furthermore, we show how to improve on the run time of the brute-force algorithm (which computes all possible orderings) by presenting different enhanced algorithms. Although in the worst case these algorithms obviously still behave exponentially, our experiments demonstrate that for real-life examples their performance is much better.
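The O(n log n) baseline referred to above is the classic rank ordering of independent selections; a minimal sketch, assuming selectivities and per-tuple costs are known (the numbers are made up):

```python
# Rank ordering of independent selections, the O(n log n) baseline that ignores
# factorization: evaluating predicates in ascending order of
# rank = (selectivity - 1) / cost minimizes expected cost per input tuple.
def order_by_rank(predicates):
    # predicates: list of (name, selectivity, cost_per_tuple)
    return sorted(predicates, key=lambda p: (p[1] - 1.0) / p[2])

def expected_cost(ordered):
    cost, surviving = 0.0, 1.0
    for _, sel, c in ordered:
        cost += surviving * c   # each predicate runs only on tuples that survived so far
        surviving *= sel
    return cost

if __name__ == "__main__":
    preds = [("cheap_udf", 0.9, 1.0), ("pricey_udf", 0.1, 50.0), ("mid_udf", 0.5, 5.0)]
    best = order_by_rank(preds)
    print([name for name, _, _ in best], expected_cost(best))
```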
Citations: 15
Efficient processing of skyline queries with partially-ordered domains
Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.60
C. Chan, P. Eng, K. Tan
Many decision support applications are characterized by several features: (1) the query is typically based on multiple criteria; (2) there is no single optimal answer (or answer set); (3) because of (2), users typically look for satisfying answers; (4) for the same query, different users, dictated by their personal preferences, may find different answers meeting their needs. As such, it is important for the DBMS to present all interesting answers that may fulfill a user's need. In this article, we focus on the set of interesting answers called the skyline. Given a set of points, the skyline comprises the points that are not dominated by other points. A point dominates another point if it is as good or better in all dimensions and better in at least one dimension. We address the novel and important problem of evaluating skyline queries involving partially-ordered attribute domains.
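For totally ordered numeric dimensions ("smaller is better"), dominance and the skyline can be sketched as below; the paper's contribution, handling partially-ordered attribute domains, would replace these comparisons with a partial-order test. The data is made up.

```python
# Brute-force skyline for "smaller is better" numeric dimensions: a point is in
# the skyline iff no other point is <= in every dimension and < in at least one.
def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(points):
    return [p for p in points if not any(dominates(q, p) for q in points)]

if __name__ == "__main__":
    hotels = [(50, 6), (65, 3), (40, 9), (80, 2), (60, 7)]  # (price, distance)
    print(skyline(hotels))  # (60, 7) is dominated by (50, 6); the rest form the skyline
```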
Citations: 30
Effective computation of biased quantiles over data streams
Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.55
Graham Cormode, Flip Korn, S. Muthukrishnan, D. Srivastava
Skew is prevalent in many data sources such as IP traffic streams. To continually summarize the distribution of such data, a high-biased set of quantiles (e.g., 50th, 90th and 99th percentiles) with finer error guarantees at higher ranks (e.g., errors of 5, 1 and 0.1 percent, respectively) is more useful than uniformly distributed quantiles (e.g., 25th, 50th and 75th percentiles) with uniform error guarantees. In this paper, we address the following two problems. First, can we compute quantiles with finer error guarantees for the higher ranks of the data distribution effectively using less space and computation time than computing all quantiles uniformly at the finest error? Second, if specific quantiles and their error bounds are requested a priori, can the necessary space usage and computation time be reduced? We answer both questions in the affirmative by formalizing them as the "high-biased" and the "targeted" quantiles problems, respectively, and presenting algorithms with provable guarantees, that perform significantly better than previously known solutions for these problems. We implemented our algorithms in the Gigascope data stream management system, and evaluated alternate approaches for maintaining the relevant summary structures. Our experimental results on real and synthetic IP data streams complement our theoretical analyses, and highlight the importance of lightweight, non-blocking implementations when maintaining summary structures over highspeed data streams.
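To pin down the guarantee behind the abstract's example numbers, the sketch below checks the high-biased error bound with an exact computation standing in for the paper's streaming summaries; reading the figures as rank errors, eps = 0.1 with an allowed slack of eps * (n - r) reproduces the quoted 5, 1 and 0.1 percent.

```python
# High-biased quantile guarantee: the allowed rank error shrinks as the queried
# percentile grows (here eps * (n - r) for target rank r), so the 99th percentile
# is held to a much tighter bound than the median. An exact sorted array stands
# in for a streaming summary; the Pareto data is made up.
import bisect
import random

def within_high_biased_error(sorted_data, answer, phi, eps):
    n = len(sorted_data)
    target_rank = phi * n
    observed_rank = bisect.bisect_right(sorted_data, answer)
    return abs(observed_rank - target_rank) <= eps * (n - target_rank)

if __name__ == "__main__":
    random.seed(7)
    data = sorted(random.paretovariate(1.5) for _ in range(10_000))  # skewed "stream"
    for phi in (0.50, 0.90, 0.99):
        answer = data[int(phi * len(data)) - 1]  # exact phi-quantile
        slack = 0.1 * (1 - phi) * len(data)
        print(f"phi={phi}: allowed rank slack={slack:.0f}, "
              f"ok={within_high_biased_error(data, answer, phi, eps=0.1)}")
```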
Citations: 50
Adaptive process management with ADEPT2
Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.17
M. Reichert, S. Rinderle-Ma, U. Kreher, P. Dadam
In the ADEPT project we have been working on the design and implementation of next generation process management software. Based on a conceptual framework for dynamic process changes, on novel process support functions, and on advanced implementation concepts, the developed system enables the realization of adaptive, process-aware information systems (PAIS). Basically, process changes can take place at the type as well as the instance level: changes of single process instances may have to be carried out in an ad-hoc manner and must not affect system robustness and consistency. Process type changes, in turn, must be quickly accomplished in order to adapt the PAIS to business process changes. ADEPT2 offers powerful concepts for modeling, analyzing, and verifying process schemes. Particularly, it ensures schema correctness, like the absence of deadlock-causing cycles or erroneous data flows. This, in turn, constitutes an important prerequisite for dynamic process changes as well. ADEPT2 supports both ad-hoc changes of single process instances and the propagation of process type changes to running instances.
Citations: 187