22nd International Conference on Data Engineering (ICDE'06): Latest Papers

Learning from Aggregate Views
Pub Date : 2006-04-03 DOI: 10.1109/ICDE.2006.86
Bee-Chung Chen, Lei Chen, R. Ramakrishnan, D. Musicant
In this paper, we introduce a new class of data mining problems called learning from aggregate views. In contrast to the traditional problem of learning from a single table of training examples, the new goal is to learn from multiple aggregate views of the underlying data, without access to the un-aggregated data. We motivate this new problem, present a general problem framework, develop learning methods for RFA (Restriction-Free Aggregate) views defined using COUNT, SUM, AVG and STDEV, and offer theoretical and experimental results that characterize the proposed methods.
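The abstract does not spell out the learning methods, so the following is only a minimal sketch of why COUNT, SUM and STDEV views carry usable information: per-group aggregates can be pooled into the overall mean and standard deviation without touching the un-aggregated rows. The function name `pool` and the use of the population (not sample) standard deviation are assumptions made for illustration, not the paper's method.

```python
import math

def pool(groups):
    """Combine per-group aggregates into overall statistics.

    groups: list of (count, sum, stdev) tuples, where stdev is the
    population standard deviation of the group (an assumption here).
    Returns (count, mean, stdev) for the union of all groups.
    """
    n = sum(c for c, _, _ in groups)
    total = sum(s for _, s, _ in groups)
    mean = total / n
    # For each group, sum of squares = count * (variance + mean^2).
    sum_sq = sum(c * (sd * sd + (s / c) ** 2) for c, s, sd in groups)
    variance = sum_sq / n - mean * mean
    # Guard against tiny negative values caused by rounding.
    return n, mean, math.sqrt(max(variance, 0.0))
```

For example, pooling the aggregates of the groups {1, 2, 3} and {4, 6} gives the same mean and standard deviation as computing them over {1, 2, 3, 4, 6} directly.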
Citations: 34
Continuous Reverse Nearest Neighbor Monitoring
Pub Date : 2006-04-03 DOI: 10.1109/ICDE.2006.43
Tian Xia, Donghui Zhang
Continuous spatio-temporal queries have recently received increasing attention due to the abundance of location-aware applications. This paper addresses the Continuous Reverse Nearest Neighbor (CRNN) Query. Given a set of objects O and a query set Q, the CRNN query monitors the exact reverse nearest neighbors of each query point, under the model that both the objects and the query points may move unpredictably. Existing methods for the reverse nearest neighbor (RNN) query either are static or assume a priori knowledge of the trajectory information, and thus do not apply. Related recent work on continuous range query and continuous nearest neighbor query relies on the fact that a simple monitoring region exists. Due to the unique features of the RNN problem, it is non-trivial to even define a monitoring region for the CRNN query. This paper defines the monitoring region for the CRNN query, discusses how to perform initial computation, and then focuses on incremental CRNN monitoring upon updates. The monitoring region according to one query point consists of two types of regions. We argue that the two types should be handled separately. In continuous monitoring, two optimization techniques are proposed. Experimental results prove that our proposed approach is both efficient and scalable.
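To make the query itself concrete, here is a brute-force snapshot computation of reverse nearest neighbors. It assumes the monochromatic definition (an object o is a reverse nearest neighbor of q when no other object is closer to o than q is); the abstract does not state which definition the paper uses, and the function name is hypothetical. This is the naive baseline that a monitoring region is meant to avoid recomputing on every update, not the paper's incremental method.

```python
import math

def reverse_nearest_neighbors(q, objects):
    """Return the objects that have q as their nearest neighbor.

    q: query point as a tuple of coordinates.
    objects: list of points (tuples). Runs in O(n^2) time.
    """
    result = []
    for i, o in enumerate(objects):
        d_q = math.dist(o, q)
        # o qualifies if every other object is at least as far away as q.
        if all(math.dist(o, p) >= d_q
               for j, p in enumerate(objects) if j != i):
            result.append(o)
    return result
```

With q = (0, 0) and objects at (1, 0), (5, 0) and (5.5, 0), only (1, 0) is returned, because the other two objects are closer to each other than to q.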
Citations: 97
SaveRF: Towards Efficient Relevance Feedback Search
Pub Date : 2006-04-03 DOI: 10.1109/ICDE.2006.132
Heng Tao Shen, B. Ooi, K. Tan
In multimedia retrieval, a query is typically interactively refined towards the ‘optimal’ answers by exploiting user feedback. However, in existing work, in each iteration, the refined query is re-evaluated. This is not only inefficient but fails to exploit the answers that may be common between iterations. In this paper, we introduce a new approach called SaveRF (Save random accesses in Relevance Feedback) for iterative relevance feedback search. SaveRF predicts the potential candidates for the next iteration and maintains this small set for efficient sequential scan. By doing so, repeated candidate accesses can be saved, hence reducing the number of random accesses. In addition, efficient scan on the overlap before the search starts also tightens the search space with smaller pruning radius. We implemented SaveRF, and our experimental study on real-life data sets shows that it can reduce the I/O cost significantly.
Citations: 1
Declarative Network Monitoring with an Underprovisioned Query Processor
Pub Date : 2006-04-03 DOI: 10.1109/ICDE.2006.46
Frederick Reiss, J. Hellerstein
Many of the data sources used in stream query processing are known to exhibit bursty behavior. We focus here on passive network monitoring, an application in which the data rates typically exhibit a large peak-to-average ratio. Provisioning a stream query processor to handle peak rates in such a setting can be prohibitively expensive. In this paper, we propose to solve this problem by provisioning the query processor for typical data rates instead of much higher peak data rates. To enable this strategy, we present mechanisms and policies for managing the tradeoffs between the latency and accuracy of query results when bursts exceed the steady-state capacity of the query processor. We describe the current status of our implementation and present experimental results on a testbed network monitoring application to demonstrate the utility of our approach.
Citations: 31
ACXESS - Access Control for XML with Enhanced Security Specifications
Pub Date : 2006-04-03 DOI: 10.1109/ICDE.2006.12
Sriram Mohan, Jonathan Klinginsmith, Arijit Sengupta, Yuqing Wu
We present ACXESS (Access Control for XML with Enhanced Security Specifications), a system for specifying and enforcing enhanced security constraints on XML via virtual "security views" and query rewrites. ACXESS is the first system that bears the capability to specify and enforce complicated security policies on both subtrees and structural relationships.
Citations: 16
Mondrian Multidimensional K-Anonymity
Pub Date : 2006-04-03 DOI: 10.1109/ICDE.2006.101
K. LeFevre, D. DeWitt, R. Ramakrishnan
K-Anonymity has been proposed as a mechanism for protecting privacy in microdata publishing, and numerous recoding "models" have been considered for achieving k-anonymity. This paper proposes a new multidimensional model, which provides an additional degree of flexibility not seen in previous (single-dimensional) approaches. Often this flexibility leads to higher-quality anonymizations, as measured both by general-purpose metrics and more specific notions of query answerability. Optimal multidimensional anonymization is NP-hard (like previous optimal k-anonymity problems). However, we introduce a simple greedy approximation algorithm, and experimental results show that this greedy algorithm frequently leads to more desirable anonymizations than exhaustive optimal algorithms for two single-dimensional models.
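The abstract only says the greedy algorithm is simple; the sketch below shows one commonly described form of greedy multidimensional partitioning, in which a partition is split at the median of its widest attribute as long as both halves keep at least k records. The function names, the restriction to numeric attributes, and the use of raw (un-normalized) ranges to pick the split attribute are assumptions made for illustration.

```python
def mondrian(records, k):
    """Greedily partition numeric records into groups of size >= k.

    records: non-empty list of equal-length numeric tuples
    (the quasi-identifier attributes). Returns a list of groups.
    """
    def width(d):
        vals = [r[d] for r in records]
        return max(vals) - min(vals)

    # Try attributes from widest to narrowest value range.
    for d in sorted(range(len(records[0])), key=width, reverse=True):
        vals = sorted(r[d] for r in records)
        median = vals[len(vals) // 2]
        left = [r for r in records if r[d] < median]
        right = [r for r in records if r[d] >= median]
        # Only split when both sides still satisfy k-anonymity.
        if len(left) >= k and len(right) >= k:
            return mondrian(left, k) + mondrian(right, k)
    return [records]  # no allowable split remains

def generalize(group):
    """Replace a group by one (min, max) range per attribute."""
    return tuple((min(r[d] for r in group), max(r[d] for r in group))
                 for d in range(len(group[0])))
```

Each returned group can be published as the multidimensional box computed by `generalize`, so every record is indistinguishable from at least k - 1 others on the partitioned attributes.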
Citations: 1209
R-trees with Update Memos
Pub Date : 2006-04-03 DOI: 10.1109/ICDE.2006.125
Xiaopeng Xiong, Walid G. Aref
The problem of frequently updating multi-dimensional indexes arises in many location-dependent applications. While the R-tree and its variants are one of the dominant choices for indexing multi-dimensional objects, the R-tree exhibits inferior performance in the presence of frequent updates. In this paper, we present an R-tree variant, termed the RUM-tree (stands for R-tree with Update Memo) that minimizes the cost of object updates. The RUM-tree processes updates in a memo-based approach that avoids disk accesses for purging old entries during an update process. Therefore, the cost of an update operation in the RUM-tree reduces to the cost of only an insert operation. The removal of old object entries is carried out by a garbage cleaner inside the RUM-tree. In this paper, we present the details of the RUM-tree and study its properties. Theoretical analysis and experimental evaluation demonstrate that the RUM-tree outperforms other R-tree variants by up to a factor of eight in scenarios with frequent updates.
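The memo idea can be sketched without a real R-tree. In the toy below, a plain list stands in for the tree's leaf entries, every inserted entry carries a stamp, and the memo records the newest stamp per object; an update is then just an insert, queries skip stale entries, and a separate cleaner purges them later. The class and method names, and the exact memo layout, are assumptions for illustration rather than the paper's data structures.

```python
from itertools import count

class MemoIndex:
    """Toy memo-based index; a list replaces the R-tree for brevity."""

    def __init__(self):
        self.entries = []        # (oid, stamp, point) leaf entries
        self.latest = {}         # update memo: oid -> newest stamp
        self._stamps = count()

    def update(self, oid, point):
        # An update only inserts; the old entry is left in place.
        stamp = next(self._stamps)
        self.entries.append((oid, stamp, point))
        self.latest[oid] = stamp

    def range_query(self, lo, hi):
        # Ignore entries whose stamp is older than the memo's newest stamp.
        return {oid: p for oid, stamp, p in self.entries
                if self.latest[oid] == stamp
                and all(l <= c <= h for l, c, h in zip(lo, p, hi))}

    def clean(self):
        # Garbage cleaner: purge obsolete entries, return how many were removed.
        before = len(self.entries)
        self.entries = [e for e in self.entries if self.latest[e[0]] == e[1]]
        return before - len(self.entries)
```

The design choice this illustrates is deferring deletion: the cost of locating and removing the old entry is moved out of the update path and paid in batches by the cleaner.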
Citations: 79
Composition and Disclosure of Unlinkable Distributed Databases
Pub Date : 2006-04-03 DOI: 10.1109/ICDE.2006.41
B. Malin, L. Sweeney
An individual’s location-visit pattern, or trail, can be leveraged to link sensitive data back to identity. We propose a secure multiparty computation protocol that enables locations to provably prevent such linkages. The protocol incorporates a controllable parameter specifying the minimum number of identities a sensitive piece of data must be linkable to via its trail.
Citations: 5
Approximation Techniques for Indexing the Earth Mover’s Distance in Multimedia Databases
Pub Date : 2006-04-03 DOI: 10.1109/ICDE.2006.25
I. Assent, Andrea Wenning, T. Seidl
Today's abundance of storage, coupled with digital technologies in virtually any scientific or commercial application such as medical and biological imaging or music archives, means dealing with tremendous quantities of images, videos or audio files stored in large multimedia databases. For content-based data mining and retrieval purposes, suitable similarity models are crucial. The Earth Mover’s Distance was introduced in Computer Vision to better approach human perceptual similarities. Its computation, however, is too complex for usage in interactive multimedia database scenarios. In order to enable efficient query processing in large databases, we propose an index-supported multistep algorithm. We therefore develop new lower bounding approximation techniques for the Earth Mover’s Distance which satisfy high quality criteria including completeness (no false drops), index-suitability and fast computation. We demonstrate the efficiency of our approach in extensive experiments on large image databases.
Citations: 82
C-Cubing: Efficient Computation of Closed Cubes by Aggregation-Based Checking
Pub Date : 2006-04-03 DOI: 10.1109/ICDE.2006.31
Dong Xin, Zheng Shao, Jiawei Han, Hongyan Liu
It is well recognized that data cubing often produces huge outputs. Two popular efforts devoted to this problem are (1) iceberg cube, where only significant cells are kept, and (2) closed cube, where a group of cells which preserve roll-up/drill-down semantics are losslessly compressed to one cell. Due to its usability and importance, efficient computation of closed cubes still warrants a thorough study. In this paper, we propose a new measure, called closedness, for efficient closed data cubing. We show that closedness is an algebraic measure and can be computed efficiently and incrementally. Based on the closedness measure, we develop an aggregation-based approach, called C-Cubing (i.e., Closed-Cubing), and integrate it into two successful iceberg cubing algorithms: MM-Cubing and Star-Cubing. Our performance study shows that C-Cubing runs almost one order of magnitude faster than the previous approaches. We further study how the performance of the alternative algorithms of C-Cubing varies w.r.t. the properties of the data sets.
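The two cube variants named in the abstract can be illustrated with a brute-force sketch that uses COUNT as the measure. It assumes the usual definitions: an iceberg cube keeps cells whose count meets a minimum support, and a cell is closed when no more specific cell has the same count. The function names are hypothetical, `"*"` is assumed not to occur as a real attribute value, and this exhaustive check is not the aggregation-based closedness measure the paper proposes.

```python
from collections import Counter
from itertools import product

def cube_counts(rows):
    """Count every cell of the full cube; '*' marks an aggregated dimension."""
    counts = Counter()
    for row in rows:
        for mask in product((False, True), repeat=len(row)):
            cell = tuple(v if keep else "*" for v, keep in zip(row, mask))
            counts[cell] += 1
    return counts

def iceberg(counts, min_sup):
    """Keep only cells whose count reaches the minimum support."""
    return {c: n for c, n in counts.items() if n >= min_sup}

def closed(counts):
    """Keep cells that no strictly more specific cell matches in count."""
    def specializes(d, c):
        return d != c and all(cv == "*" or cv == dv for cv, dv in zip(c, d))
    return {c: n for c, n in counts.items()
            if not any(m == n and specializes(d, c)
                       for d, m in counts.items())}
```

On the rows ("a", "x"), ("a", "x"), ("b", "x"), the cell ("a", "*") is dropped from the closed cube because ("a", "x") covers exactly the same two rows.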
Citations: 54