Query Decomposition: A Multiple Neighborhood Approach to Relevance Feedback Processing in Content-based Image Retrieval
K. Hua, Ning Yu, Danzhou Liu
22nd International Conference on Data Engineering (ICDE'06), p. 84. DOI: 10.1109/ICDE.2006.123

Today's Content-Based Image Retrieval (CBIR) techniques are based on the k-nearest-neighbors (k-NN) model: they retrieve images from a single neighborhood using low-level visual features, under the assumption that semantically similar images cluster together in the high-dimensional feature space. Unfortunately, no visual feature vector is sufficient to facilitate perfect semantic clustering, and semantically similar images with different appearances are always clustered into distinct neighborhoods in the feature space. Confining the search results to a single neighborhood is therefore an inherent limitation of k-NN techniques. In this paper we consider a new image retrieval paradigm, the Query Decomposition model, which facilitates retrieval of semantically similar images from multiple neighborhoods in the feature space. The retrieval results are the k most similar images drawn from different relevant clusters. We introduce a prototype and present experimental results that illustrate the effectiveness and efficiency of this new approach to content-based image retrieval.
Surface k-NN Query Processing
K. Deng, Xiaofang Zhou, Heng Tao Shen, K. Xu, Xuemin Lin
22nd International Conference on Data Engineering (ICDE'06), p. 78. DOI: 10.1109/ICDE.2006.152

A k-NN query finds the k nearest neighbors of a given point from a point database. When it is sufficient to measure object distance using the Euclidean distance, the key to efficient k-NN query processing is to fetch and check the distances of a minimum number of points from the database. For many applications, such as vehicle movement along road networks or rover and animal movement along terrain surfaces, the distance is only meaningful when it is measured along a valid movement path. For this type of k-NN query, efficient processing focuses on minimizing the cost of computing distances using the environment data (such as the road network data and the terrain data), which can be several orders of magnitude larger than the point data. Efficient processing of k-NN queries based on the Euclidean distance or the road network distance has been investigated extensively in the past. In this paper, we investigate the problem of surface k-NN query processing, where the distance is calculated from the shortest path along a terrain surface. This problem is very challenging, as the terrain data can be very large and the computational cost of finding shortest paths is very high. We propose an efficient solution based on multiresolution terrain models. Our approach eliminates the costly process of finding shortest paths by ranking objects using estimated lower and upper bounds of distance on multiresolution terrain models.
An Estimation System for XPath Expressions
Hanyu Li, M. Lee, W. Hsu, G. Cong
22nd International Conference on Data Engineering (ICDE'06), p. 54. DOI: 10.1109/ICDE.2006.19

Estimating the result sizes of XML queries is important in query optimization and useful for providing quick feedback about queries. Existing work has focused on the selectivity estimation of XML queries without order-based axes. In this work, we develop a framework to estimate the result sizes of XPath expressions with order-based axes. We describe how the path and order information of XML elements can be captured and summarized in compact data structures. We also describe methods to estimate the selectivity of XPath queries. The results of extensive experiments on both synthetic and real-world datasets demonstrate the effectiveness and accuracy of the proposed approach.
UNIT: User-centric Transaction Management in Web-Database Systems
Huiming Qu, Alexandros Labrinidis, D. Mossé
22nd International Conference on Data Engineering (ICDE'06), p. 33. DOI: 10.1109/ICDE.2006.166

Web-database systems are nowadays an integral part of everybody's life, with applications ranging from monitoring/trading stock portfolios, to personalized blog aggregation and news services, to personalized weather tracking services. For most of these services to be successful (and their users to be kept satisfied), two criteria must be met: user requests must be answered in a timely fashion and using fresh data. This paper presents a framework to balance both requirements from the users' perspective. Toward this end, we propose a user satisfaction metric to measure the overall effectiveness of the Web-database system. We also provide a set of algorithms to dynamically optimize this metric through query admission control and update-frequency modulation. Finally, we present extensive experimental results which compare our proposed algorithms to the current state of the art and show that we outperform competitors under various workloads (generated from real traces) and user requirements.
Incremental Maintenance of Materialized XQuery Views
M. El-Sayed, Elke A. Rundensteiner, Murali Mani
22nd International Conference on Data Engineering (ICDE'06), p. 129. DOI: 10.1109/ICDE.2006.80

Materializing the contents of views has important applications, including providing fast access to derived database repositories, optimizing query processing based on cached results, and increasing availability. Maintaining consistency between materialized views and their base data in the presence of source updates is important to ensure that the materialized views are up-to-date. The straightforward solution to this problem is to recompute the view from scratch over the updated sources.
Don't be a Pessimist: Use Snapshot based Concurrency Control for XML
Zeeshan Sardar, Bettina Kemme
22nd International Conference on Data Engineering (ICDE'06), p. 130. DOI: 10.1109/ICDE.2006.51

As native XML database systems (e.g., [3, 7, 8]) become increasingly popular, fine-granularity concurrency control becomes imperative in order to allow different clients to concurrently access the same documents. Existing concurrency control approaches for XML are mainly based on locking [2, 3, 4, 5, 6]. However, the experiments of [5] have shown that the locking overhead, especially for read operations, can be tremendous. In this paper, we present two snapshot-based concurrency control mechanisms that avoid locking. Instead, transactions access a committed snapshot of the data.
Sliding Window based Multi-Join Algorithms over Distributed Data Streams
DongDong Zhang, Jianzhong Li, Kimutai Kimeli, Weiping Wang
22nd International Conference on Data Engineering (ICDE'06), p. 139. DOI: 10.1109/ICDE.2006.143

This paper focuses on multi-way sliding window join (SWJoin) processing over distributed data streams. A novel join algorithm is proposed based on two distributed data stream transfer models. To reduce the communication cost and lighten the workload on the central processor node, the algorithm filters out tuples that cannot contribute to multi-way SWJoin results by transforming the join conditions of the SWJoin into filtering conditions applied during data stream transfer. Furthermore, the algorithm guarantees that all data necessary for generating exact multi-way SWJoin results is transmitted to the central processor node.
A Primitive Operator for Similarity Joins in Data Cleaning
S. Chaudhuri, Venkatesh Ganti, R. Kaushik
22nd International Conference on Data Engineering (ICDE'06), p. 5. DOI: 10.1109/ICDE.2006.9

Data cleaning based on similarities involves identification of "close" tuples, where closeness is evaluated using a variety of similarity functions chosen to suit the domain and application. Current approaches for efficiently implementing such similarity joins are tightly tied to the chosen similarity function. In this paper, we propose a new primitive operator which can be used as a foundation to implement similarity joins according to a variety of popular string similarity functions, and notions of similarity which go beyond textual similarity. We then propose efficient implementations for this operator. In an experimental evaluation using real datasets, we show that the implementation of similarity joins using our operator is comparable to, and often substantially better than, previous customized implementations for particular similarity functions.
WebIQ: Learning from the Web to Match Deep-Web Query Interfaces
Wensheng Wu, A. Doan, Clement T. Yu
22nd International Conference on Data Engineering (ICDE'06), p. 44. DOI: 10.1109/ICDE.2006.172

Integrating Deep Web sources requires highly accurate semantic matches between the attributes of the source query interfaces. These matches are usually established by comparing the similarities of the attributes' labels and instances. However, attributes on query interfaces often have no or very few data instances. This pervasive lack of instances seriously reduces the accuracy of current matching techniques. To address this problem, we describe WebIQ, a solution that learns from both the Surface Web and the Deep Web to automatically discover instances for interface attributes. WebIQ extends question answering techniques commonly used in the AI community for this purpose. We describe how to incorporate WebIQ into current interface matching systems. Extensive experiments over five real-world domains show the utility of WebIQ. In particular, the results show that acquired instances help improve matching accuracy from 89.5% to 97.5% F-1, at only a modest runtime overhead.
Inferring a Serialization Order for Distributed Transactions
Khuzaima S. Daudjee, K. Salem
22nd International Conference on Data Engineering (ICDE'06), p. 154. DOI: 10.1109/ICDE.2006.82

Data partitioning is often used to scale up a database system. In a centralized database system, the serialization order of committed update transactions can be inferred from the database log. To achieve this in a shared-nothing distributed database, the serialization order of update transactions must be inferred from multiple database logs. We describe a technique to generate a single stream of updates from the logs of multiple database systems. This single stream represents a valid serialization order of update transactions at the sites over which the database is partitioned.