
22nd International Conference on Data Engineering (ICDE'06): Latest Publications

R-trees with Update Memos
Pub Date : 2006-04-03 DOI: 10.1109/ICDE.2006.125
Xiaopeng Xiong, Walid G. Aref
The problem of frequently updating multi-dimensional indexes arises in many location-dependent applications. While the R-tree and its variants are among the dominant choices for indexing multi-dimensional objects, the R-tree exhibits inferior performance in the presence of frequent updates. In this paper, we present an R-tree variant, termed the RUM-tree (short for R-tree with Update Memo), that minimizes the cost of object updates. The RUM-tree processes updates in a memo-based approach that avoids disk accesses for purging old entries during an update process. Therefore, the cost of an update operation in the RUM-tree reduces to the cost of only an insert operation. The removal of old object entries is carried out by a garbage cleaner inside the RUM-tree. In this paper, we present the details of the RUM-tree and study its properties. Theoretical analysis and experimental evaluation demonstrate that the RUM-tree outperforms other R-tree variants by up to a factor of eight in scenarios with frequent updates.
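To make the memo-based update idea concrete, the following is a minimal sketch of the mechanism the abstract describes, not the authors' RUM-tree: a plain Python list stands in for the R-tree, and the class and method names (MemoIndex, garbage_clean, and so on) are hypothetical. An update is recorded as a stamped insert, the memo remembers each object's latest stamp, and stale entries are filtered at query time or purged lazily by a cleaner.

```python
class MemoIndex:
    """Toy stand-in for an R-tree with an update memo (names are hypothetical)."""

    def __init__(self):
        self.entries = []      # (obj_id, stamp, box); obsolete entries may linger
        self.memo = {}         # obj_id -> latest stamp (the "update memo")
        self.next_stamp = 0

    def update(self, obj_id, box):
        """An update is just a stamped insert; the old entry is not searched for."""
        self.next_stamp += 1
        self.entries.append((obj_id, self.next_stamp, box))
        self.memo[obj_id] = self.next_stamp

    def query(self, predicate):
        """Return only the latest version of each matching object."""
        return [(oid, box) for oid, stamp, box in self.entries
                if predicate(box) and self.memo.get(oid) == stamp]

    def garbage_clean(self):
        """Lazily purge obsolete entries, as the RUM-tree's garbage cleaner would."""
        self.entries = [(oid, stamp, box) for oid, stamp, box in self.entries
                        if self.memo.get(oid) == stamp]

idx = MemoIndex()
idx.update("car7", (10, 10))
idx.update("car7", (12, 11))          # the second update never touches the old entry
print(idx.query(lambda box: True))    # [('car7', (12, 11))]
idx.garbage_clean()                   # the stale (10, 10) entry is removed here
```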
Citations: 79
Segmentation of Publication Records of Authors from the Web
Pub Date : 2006-04-03 DOI: 10.1109/ICDE.2006.137
Wei Zhang, Clement T. Yu, N. Smalheiser, Vetle I. Torvik
Publication records are often found in authors’ personal home pages. If such a record is partitioned into a list of semantic fields such as authors, title, and date, the unstructured text can be converted into structured data that can be used in other applications. In this paper, we present PEPURS, a publication record segmentation system. It adopts a novel "Split and Merge" strategy: a publication record is split into segments; multiple statistical classifiers compute each segment's likelihood of belonging to the different fields; finally, adjacent segments are merged if they belong to the same field. PEPURS introduces punctuation marks and their neighboring text as a new feature to distinguish the different roles of the marks. PEPURS yields high accuracy scores in experiments.
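The "Split and Merge" strategy can be pictured with the toy sketch below. The rule-based classify function is a hypothetical stand-in for PEPURS's statistical classifiers, and the splitting rule is deliberately crude; only the split, label, and merge-adjacent-equal-labels flow mirrors the abstract.

```python
import re

def classify(segment):
    """Hypothetical rule-based stand-in for PEPURS's per-field classifiers."""
    if re.fullmatch(r"(19|20)\d{2}", segment):
        return "date"
    if re.fullmatch(r"\d+\s*-\s*\d+", segment):
        return "pages"
    if segment.isupper():
        return "venue"
    if len(segment.split()) <= 3 and not any(ch.isdigit() for ch in segment):
        return "authors"
    return "title"

def segment_record(record):
    # Split on punctuation marks that typically delimit fields.
    segments = [s.strip() for s in re.split(r"[.;]", record) if s.strip()]
    labeled = [(classify(s), s) for s in segments]
    # Merge adjacent segments that received the same field label.
    merged = []
    for label, text in labeled:
        if merged and merged[-1][0] == label:
            merged[-1] = (label, merged[-1][1] + ", " + text)
        else:
            merged.append((label, text))
    return merged

record = "Wei Zhang; Clement Yu. Segmentation of Publication Records from the Web. ICDE. 2006. 120-120"
print(segment_record(record))
# [('authors', 'Wei Zhang, Clement Yu'), ('title', 'Segmentation of Publication Records from the Web'),
#  ('venue', 'ICDE'), ('date', '2006'), ('pages', '120-120')]
```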
Citations: 4
Composition and Disclosure of Unlinkable Distributed Databases
Pub Date : 2006-04-03 DOI: 10.1109/ICDE.2006.41
B. Malin, L. Sweeney
An individual’s location-visit pattern, or trail, can be leveraged to link sensitive data back to identity. We propose a secure multiparty computation protocol that enables locations to provably prevent such linkages. The protocol incorporates a controllable parameter specifying the minimum number of identities a sensitive piece of data must be linkable to via its trail.
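As a rough, non-cryptographic illustration of the property the protocol is meant to enforce, the sketch below checks whether a sensitive item's trail remains linkable to at least k identities; the trail representation and all names are assumptions for illustration only, and none of the secure multiparty machinery is shown.

```python
def linkable_identities(data_trail, identity_trails):
    """Identities whose visit pattern covers every location in the data trail."""
    return [name for name, trail in identity_trails.items()
            if data_trail <= trail]          # subset test on location sets

def satisfies_k_unlinkability(data_trail, identity_trails, k):
    """True if the sensitive trail cannot be pinned to fewer than k identities."""
    return len(linkable_identities(data_trail, identity_trails)) >= k

identity_trails = {
    "alice": {"loc1", "loc2", "loc3"},
    "bob":   {"loc1", "loc3"},
    "carol": {"loc2", "loc3"},
}
sensitive_trail = {"loc1", "loc3"}            # where the sensitive datum was observed
print(satisfies_k_unlinkability(sensitive_trail, identity_trails, k=2))  # True
```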
Citations: 5
Efficient Discovery of Emerging Frequent Patterns in Arbitrary Windows on Data Streams
Pub Date : 2006-04-03 DOI: 10.1109/ICDE.2006.57
Xiaoming Jin, Xinqiang Zuo, K. Lam, Jianmin Wang, Jiaguang Sun
This paper proposes an effective data mining technique for finding useful patterns in streaming sequences. Typical approaches to this problem search for patterns in a fixed-size window sliding over the stream of data being collected. The practical value of such approaches is limited because, in typical application scenarios, the patterns are emerging and it is difficult, if not impossible, to determine a priori a suitable window size within which useful patterns may exist. It is therefore desirable to devise techniques that can identify useful patterns with arbitrary window sizes. This is challenging, however, because it requires highly efficient search in a substantially larger solution space. This paper presents a new method that combines a pruning strategy, which reduces the search space, with a mining strategy that adopts a dynamic index structure to allow efficient discovery of emerging patterns in a streaming sequence. Experimental results on real and synthetic data show that the proposed method outperforms existing schemes in both computational efficiency and effectiveness at finding useful patterns.
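To pin down the problem setting, the naive baseline below answers arbitrary-window pattern counts over a stream by keeping a per-pattern index of start positions; it contains none of the paper's pruning strategy or dynamic index structure and is only meant to show what those techniques must speed up.

```python
import bisect
from collections import defaultdict

class ArbitraryWindowCounter:
    """Naive baseline: count fixed-length patterns in any window of the stream."""

    def __init__(self, pattern_length=2):
        self.k = pattern_length
        self.positions = defaultdict(list)   # pattern -> sorted start positions
        self.stream = []

    def append(self, symbol):
        self.stream.append(symbol)
        if len(self.stream) >= self.k:
            start = len(self.stream) - self.k
            pattern = tuple(self.stream[start:])
            self.positions[pattern].append(start)

    def count(self, pattern, lo, hi):
        """Occurrences of `pattern` whose start position lies in [lo, hi]."""
        pos = self.positions[tuple(pattern)]
        return bisect.bisect_right(pos, hi) - bisect.bisect_left(pos, lo)

c = ArbitraryWindowCounter(pattern_length=2)
for symbol in "abababcab":
    c.append(symbol)
print(c.count("ab", 0, 8))   # 4 occurrences of "ab" start within positions 0..8
```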
Citations: 2
What’s Different: Distributed, Continuous Monitoring of Duplicate-Resilient Aggregates on Data Streams
Pub Date : 2006-04-03 DOI: 10.1109/ICDE.2006.173
Graham Cormode, S. Muthukrishnan, W. Zhuang
Emerging applications in sensor systems and network-wide IP traffic analysis present many technical challenges. They need distributed monitoring and continuous tracking of events. They have severe resource constraints not only at each site, in terms of per-update processing time and archival space for high-speed streams of observations, but also, crucially, communication constraints for collaborating on the monitoring task. These elements have been addressed in a series of recent works. A fundamental issue that arises is that one cannot make the "uniqueness" assumption on observed events that is present in previous works, since wide-scale monitoring invariably encounters the same events at different points. For example, within the network of an Internet Service Provider, packets of the same flow will be observed at different routers; similarly, the same individual will be observed by multiple mobile sensors when monitoring wild animals. Aggregates of interest in such distributed environments must be resilient to duplicate observations. We study such duplicate-resilient aggregates that measure the extent of the duplication―how many unique observations are there, how many observations are unique―as well as standard holistic aggregates such as quantiles and heavy hitters over the unique items. We present accuracy-guaranteed, highly communication-efficient algorithms for these aggregates that work within the time and space constraints of high-speed streams. We also present results of a detailed experimental study on both real-life and synthetic data.
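One standard duplicate-insensitive building block for the "how many unique observations" aggregate, not necessarily the construction used in the paper, is a Flajolet-Martin style distinct-count sketch: re-adding an item changes nothing, and two sites' sketches merge with a bitwise OR, so duplicate observations never inflate the estimate and only small summaries need to be communicated.

```python
import hashlib

NUM_HASHES = 32                      # independent hash functions, averaged for accuracy

def _hash(item, seed):
    digest = hashlib.sha1(f"{seed}:{item}".encode()).hexdigest()
    return int(digest, 16)

def _lowest_set_bit(x):
    # Index of the least-significant 1 bit (hash outputs here are never zero).
    return (x & -x).bit_length() - 1

def make_sketch():
    return [0] * NUM_HASHES          # one bitmap of observed zero-run lengths per hash

def add(sketch, item):
    for seed in range(NUM_HASHES):
        sketch[seed] |= 1 << _lowest_set_bit(_hash(item, seed))

def merge(a, b):
    return [x | y for x, y in zip(a, b)]   # order- and duplicate-insensitive union

def estimate(sketch):
    # R = average index of the lowest unset bit; 0.77351 is FM's correction factor.
    r = sum(_lowest_set_bit(~bitmap) for bitmap in sketch) / NUM_HASHES
    return 2 ** r / 0.77351

site1, site2 = make_sketch(), make_sketch()
for flow in ["f1", "f2", "f3"]:
    add(site1, flow)
for flow in ["f2", "f3", "f4"]:            # f2 and f3 observed again at another router
    add(site2, flow)
print(estimate(merge(site1, site2)))       # rough estimate of the 4 distinct flows
```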
Citations: 74
SIPPER: Selecting Informative Peers in Structured P2P Environment for Content-Based Retrieval
Pub Date : 2006-04-03 DOI: 10.1109/ICDE.2006.139
Shuigeng Zhou, Zhengjie Zhang, Weining Qian, Aoying Zhou
In this demonstration, we present a prototype system called SIPPER, an abbreviation for Selecting Informative Peers in Structured P2P Environment for Content-based Retrieval. SIPPER distinguishes itself from existing P2P-IR systems by two features. First, to improve retrieval efficiency, SIPPER employs a novel peer selection method to direct a query to a small fraction of relevant peers in the network for searching globally relevant documents. Second, to reduce the bandwidth cost of metadata publishing, SIPPER uses a new publishing mechanism, the term-node publishing mechanism, which differs from the traditional term-document model [2].
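The peer-selection step can be pictured with the toy ranking below, in which each peer has published per-term statistics and the query is routed only to the top-scoring fraction of peers. The scoring function and data layout are placeholders, not SIPPER's actual selection method or publishing format.

```python
def select_peers(query_terms, peer_term_stats, fraction=0.1):
    """peer_term_stats: peer_id -> {term: document frequency published by that peer}."""
    scores = {
        peer: sum(stats.get(term, 0) for term in query_terms)
        for peer, stats in peer_term_stats.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    k = max(1, int(len(ranked) * fraction))   # route to a small fraction of peers
    return ranked[:k]

peers = {
    "p1": {"xml": 40, "index": 3},
    "p2": {"stream": 25},
    "p3": {"xml": 5, "stream": 2},
}
print(select_peers(["xml", "index"], peers, fraction=0.34))   # ['p1']
```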
Citations: 11
ACXESS - Access Control for XML with Enhanced Security Specifications
Pub Date : 2006-04-03 DOI: 10.1109/ICDE.2006.12
Sriram Mohan, Jonathan Klinginsmith, Arijit Sengupta, Yuqing Wu
We present ACXESS (Access Control for XML with Enhanced Security Specifications), a system for specifying and enforcing enhanced security constraints on XML via virtual "security views" and query rewrites. ACXESS is the first system capable of specifying and enforcing complex security policies on both subtrees and structural relationships.
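The "security view" notion can be illustrated by materializing a pruned copy of an XML document under a simple tag-level deny policy, as sketched below. ACXESS itself keeps views virtual and rewrites queries against far richer policies, so this conveys only the intuition; the policy format is hypothetical.

```python
import xml.etree.ElementTree as ET

DENIED_TAGS = {"salary", "ssn"}          # hypothetical deny-list policy

def security_view(elem):
    """Return a copy of `elem` with denied subtrees removed."""
    if elem.tag in DENIED_TAGS:
        return None
    copy = ET.Element(elem.tag, elem.attrib)
    copy.text = elem.text
    for child in elem:
        pruned = security_view(child)
        if pruned is not None:
            copy.append(pruned)
    return copy

doc = ET.fromstring(
    "<employee><name>Ann</name><salary>90000</salary><dept>R&amp;D</dept></employee>"
)
print(ET.tostring(security_view(doc), encoding="unicode"))
# <employee><name>Ann</name><dept>R&amp;D</dept></employee>
```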
Citations: 16
C-Cubing: Efficient Computation of Closed Cubes by Aggregation-Based Checking
Pub Date : 2006-04-03 DOI: 10.1109/ICDE.2006.31
Dong Xin, Zheng Shao, Jiawei Han, Hongyan Liu
It is well recognized that data cubing often produces huge outputs. Two popular efforts devoted to this problem are (1) the iceberg cube, where only significant cells are kept, and (2) the closed cube, where a group of cells that preserve roll-up/drill-down semantics is losslessly compressed to one cell. Due to its usability and importance, efficient computation of closed cubes still warrants a thorough study. In this paper, we propose a new measure, called closedness, for efficient closed data cubing. We show that closedness is an algebraic measure and can be computed efficiently and incrementally. Based on the closedness measure, we develop an aggregation-based approach, called C-Cubing (i.e., Closed-Cubing), and integrate it into two successful iceberg cubing algorithms: MM-Cubing and Star-Cubing. Our performance study shows that C-Cubing runs almost one order of magnitude faster than previous approaches. We further study how the performance of the alternative C-Cubing algorithms varies with the properties of the data sets.
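For readers unfamiliar with closed cubes, the brute-force sketch below computes them directly from the definition: a cell is closed if no strictly more specific cell covers exactly the same set of tuples. It shows what C-Cubing computes, not how the closedness measure makes that computation efficient.

```python
from itertools import combinations
from collections import defaultdict

def cube_cells(rows, num_dims):
    """Map every cube cell (with '*' for unbound dimensions) to the tuple ids it covers."""
    cover = defaultdict(set)
    for tid, row in enumerate(rows):
        for r in range(num_dims + 1):
            for dims in combinations(range(num_dims), r):
                cell = tuple(row[d] if d in dims else "*" for d in range(num_dims))
                cover[cell].add(tid)
    return cover

def closed_cells(rows, num_dims):
    cover = cube_cells(rows, num_dims)
    closed = []
    for cell, tids in cover.items():
        # Closed iff no distinct specialization of `cell` covers the same tuples.
        has_equal_specialization = any(
            other != cell and other_tids == tids and
            all(other[d] == cell[d] or cell[d] == "*" for d in range(num_dims))
            for other, other_tids in cover.items()
        )
        if not has_equal_specialization:
            closed.append((cell, len(tids)))   # keep the cell and its COUNT measure
    return closed

rows = [("a1", "b1", "c1"), ("a1", "b1", "c2"), ("a2", "b1", "c1")]
for cell, count in sorted(closed_cells(rows, 3)):
    print(cell, count)
```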
Citations: 54
Mondrian Multidimensional K-Anonymity
Pub Date : 2006-04-03 DOI: 10.1109/ICDE.2006.101
K. LeFevre, D. DeWitt, R. Ramakrishnan
K-Anonymity has been proposed as a mechanism for protecting privacy in microdata publishing, and numerous recoding "models" have been considered for achieving k-anonymity. This paper proposes a new multidimensional model, which provides an additional degree of flexibility not seen in previous (single-dimensional) approaches. Often this flexibility leads to higher-quality anonymizations, as measured both by general-purpose metrics and more specific notions of query answerability. Optimal multidimensional anonymization is NP-hard (like previous optimal k-anonymity problems). However, we introduce a simple greedy approximation algorithm, and experimental results show that this greedy algorithm frequently leads to more desirable anonymizations than exhaustive optimal algorithms for two single-dimensional models.
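A compact sketch in the spirit of the greedy median-split partitioning commonly associated with Mondrian is given below: recursively cut a partition at the median of a quasi-identifier as long as both halves keep at least k records, then generalize each final partition to its per-attribute value ranges. The dimension-selection heuristic, tie handling, and numeric-only attributes are simplifying assumptions, not the paper's exact algorithm.

```python
def mondrian(records, k):
    """records: list of tuples of numeric quasi-identifier values."""
    dims = range(len(records[0]))

    def spread(part, d):
        vals = [r[d] for r in part]
        return max(vals) - min(vals)

    def split(part):
        # Try dimensions from widest spread to narrowest.
        for d in sorted(dims, key=lambda d: spread(part, d), reverse=True):
            vals = sorted(r[d] for r in part)
            median = vals[len(vals) // 2]
            left = [r for r in part if r[d] < median]
            right = [r for r in part if r[d] >= median]
            if len(left) >= k and len(right) >= k:      # allowable cut
                return split(left) + split(right)
        # No allowable cut: generalize each record to its partition's ranges.
        return [tuple((min(r[d] for r in part), max(r[d] for r in part)) for d in dims)
                for _ in part]

    return split(records)

data = [(25, 50000), (27, 52000), (31, 60000), (35, 61000), (38, 70000), (40, 72000)]
for anonymized_row in mondrian(data, k=3):
    print(anonymized_row)
```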
Citations: 1209
Continuous Reverse Nearest Neighbor Monitoring
Pub Date : 2006-04-03 DOI: 10.1109/ICDE.2006.43
Tian Xia, Donghui Zhang
Continuous spatio-temporal queries have recently received increasing attention due to the abundance of location-aware applications. This paper addresses the Continuous Reverse Nearest Neighbor (CRNN) Query. Given a set of objects O and a query set Q, the CRNN query monitors the exact reverse nearest neighbors of each query point, under the model that both the objects and the query points may move unpredictably. Existing methods for the reverse nearest neighbor (RNN) query either are static or assume a priori knowledge of trajectory information, and thus do not apply. Related recent work on continuous range queries and continuous nearest neighbor queries relies on the fact that a simple monitoring region exists. Due to the unique features of the RNN problem, it is non-trivial even to define a monitoring region for the CRNN query. This paper defines the monitoring region for the CRNN query, discusses how to perform the initial computation, and then focuses on incremental CRNN monitoring upon updates. The monitoring region of a single query point consists of two types of regions, and we argue that the two types should be handled separately. For continuous monitoring, two optimization techniques are proposed. Experimental results prove that our proposed approach is both efficient and scalable.
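As a baseline for what is being monitored, the snippet below computes bichromatic reverse nearest neighbors by brute force: an object o belongs to RNN(q) exactly when q is o's closest query point. The paper's contribution, maintaining this result incrementally via monitoring regions as objects and queries move, is not reproduced here.

```python
from math import dist

def reverse_nearest_neighbors(objects, queries):
    """objects, queries: dicts of id -> (x, y). Returns query id -> [object ids]."""
    rnn = {qid: [] for qid in queries}
    for oid, opos in objects.items():
        # o contributes to the RNN set of its nearest query point.
        nearest_q = min(queries, key=lambda qid: dist(opos, queries[qid]))
        rnn[nearest_q].append(oid)
    return rnn

objects = {"o1": (1, 1), "o2": (4, 4), "o3": (3, 0)}
queries = {"q1": (0, 0), "q2": (5, 5)}
print(reverse_nearest_neighbors(objects, queries))
# {'q1': ['o1', 'o3'], 'q2': ['o2']}
```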
Citations: 97