
22nd International Conference on Data Engineering (ICDE'06): Latest Papers

Query Decomposition: A Multiple Neighborhood Approach to Relevance Feedback Processing in Content-based Image Retrieval
Pub Date : 2006-04-03 DOI: 10.1109/ICDE.2006.123
K. Hua, Ning Yu, Danzhou Liu
Today’s Content-Based Image Retrieval (CBIR) techniques are based on the "k-nearest neighbors" (k-NN) model. They retrieve images from a single neighborhood using low-level visual features. In this model, semantically similar images are assumed to be clustered in the high-dimensional feature space. Unfortunately, no visual-based feature vector is sufficient to facilitate perfect semantic clustering; semantically similar images with different appearances are always clustered into distinct neighborhoods in the feature space. Confining the search results to a single neighborhood is an inherent limitation of k-NN techniques. In this paper, we consider a new image retrieval paradigm, the Query Decomposition model, which facilitates retrieval of semantically similar images from multiple neighborhoods in the feature space. The retrieval results are the k most similar images from different relevant clusters. We introduce a prototype and present experimental results to illustrate the effectiveness and efficiency of this new approach to content-based image retrieval.
Citations: 36
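The abstract contrasts single-neighborhood k-NN retrieval with retrieving from several neighborhoods at once. The following is a minimal Python sketch of that idea. It assumes that relevance feedback arrives as a list of feature vectors, that feedback examples are grouped greedily by a distance `radius`, and that per-group result lists are merged round-robin. `group_feedback`, `multi_neighborhood_knn`, and the merge policy are hypothetical illustrations, not the authors' Query Decomposition algorithm.

```python
import math
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def group_feedback(relevant: List[Vector], radius: float) -> List[List[Vector]]:
    """Greedily group relevance-feedback examples: an example joins the first
    group whose seed lies within `radius`, otherwise it starts a new group."""
    groups: List[List[Vector]] = []
    for v in relevant:
        for g in groups:
            if euclidean(g[0], v) <= radius:
                g.append(v)
                break
        else:
            groups.append([v])
    return groups


def centroid(points: List[Vector]) -> Vector:
    return tuple(sum(c) / len(points) for c in zip(*points))


def multi_neighborhood_knn(db: Dict[str, Vector], relevant: List[Vector],
                           k: int, radius: float) -> List[str]:
    """Decompose the query into one sub-query per feedback group, then merge the
    per-neighborhood ranked lists round-robin so every relevant cluster contributes."""
    subqueries = [centroid(g) for g in group_feedback(relevant, radius)]
    if not subqueries:
        return []
    # One full ranking of the database per neighborhood (local sub-query).
    ranked = [sorted(db, key=lambda img: euclidean(db[img], q)) for q in subqueries]
    results: List[str] = []
    depth = 0
    while len(results) < min(k, len(db)):
        for lst in ranked:
            if len(results) < k and lst[depth] not in results:
                results.append(lst[depth])
        depth += 1
    return results
```

With feedback examples at (0, 0) and (10, 10), `multi_neighborhood_knn(db, relevant, 2, 1.0)` returns one image from each cluster, whereas a single k-NN query around their combined centroid (5, 5) would rank an unrelated image located at (5, 5) first.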
Surface k-NN Query Processing
Pub Date : 2006-04-03 DOI: 10.1109/ICDE.2006.152
K. Deng, Xiaofang Zhou, Heng Tao Shen, K. Xu, Xuemin Lin
A k-NN query finds the k nearest neighbors of a given point from a point database. When it is sufficient to measure object distance using the Euclidean distance, the key to efficient k-NN query processing is to fetch and check the distances of a minimum number of points from the database. For many applications, such as vehicle movement along road networks or rover and animal movement along terrain surfaces, the distance is only meaningful when it is along a valid movement path. For this type of k-NN query, the focus of efficient query processing is to minimize the cost of computing distances using the environment data (such as road network data and terrain data), which can be several orders of magnitude larger than the point data. Efficient processing of k-NN queries based on the Euclidean distance or the road network distance has been investigated extensively in the past. In this paper, we investigate the problem of surface k-NN query processing, where the distance is calculated from the shortest path along a terrain surface. This problem is very challenging, as the terrain data can be very large and the computational cost of finding shortest paths is very high. We propose an efficient solution based on multiresolution terrain models. By ranking objects using estimated lower and upper bounds of distance on multiresolution terrain models, our approach eliminates the need for the costly process of finding shortest paths.
Citations: 35
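The bound-based ranking described in this abstract can be sketched as a filter-and-refine loop. This is a hedged Python illustration. It assumes a hypothetical callback `bounds_at(obj, level)` that returns lower and upper surface-distance bounds computed on the terrain model at a given resolution level. The bounds are assumed to tighten with each level and to be exact at the finest one. How the paper actually derives those bounds is not reproduced here.

```python
from typing import Callable, Dict, List, Tuple

Bounds = Tuple[float, float]  # (lower, upper) bound on the surface distance


def surface_knn(objects: List[str], bounds_at: Callable[[str, int], Bounds],
                k: int, max_level: int) -> List[str]:
    """Rank objects by distance bounds, refining resolution only for survivors.

    `bounds_at(obj, level)` is a hypothetical callback; bounds are assumed to
    tighten as `level` grows and to be exact at `max_level`.
    """
    if k <= 0 or not objects:
        return []
    candidates = list(objects)
    level = 0
    while True:
        b: Dict[str, Bounds] = {o: bounds_at(o, level) for o in candidates}
        # At least k objects lie within the k-th smallest upper bound...
        kth_upper = sorted(ub for _, ub in b.values())[min(k, len(candidates)) - 1]
        # ...so an object whose lower bound exceeds it cannot be a k-NN answer.
        candidates = [o for o in candidates if b[o][0] <= kth_upper]
        ranked = sorted(candidates, key=lambda o: b[o][1])
        top, rest = ranked[:k], ranked[k:]
        # The answer set is settled once no remaining object could beat the top k.
        settled = not rest or max(b[o][1] for o in top) <= min(b[o][0] for o in rest)
        if settled or level == max_level:
            return top
        level += 1
```

Objects whose lower bound already exceeds the k-th smallest upper bound are discarded before any finer, more expensive level is consulted. The returned objects are ordered by their upper bounds, which match the true distances only once the finest level is reached.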
An Estimation System for XPath Expressions
Pub Date : 2006-04-03 DOI: 10.1109/ICDE.2006.19
Hanyu Li, M. Lee, W. Hsu, G. Cong
Estimating the result sizes of XML queries is important in query optimization and is useful for providing quick feedback about the queries. Existing work has focused on the selectivity estimation of XML queries without order-based axes. In this work, we develop a framework to estimate the result sizes of XPath expressions with order-based axes. We describe how the path and order information of XML elements can be captured and summarized in compact data structures. We also describe methods to estimate the selectivity of XPath queries. The results of extensive experiments on both synthetic and real-world datasets demonstrate the effectiveness and accuracy of the proposed approach.
Citations: 22
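To make the idea of summarizing path and order information concrete, here is a small Python sketch using the standard-library `xml.etree.ElementTree`. The summary has three parts: path counts, tag counts, and counts of elements preceded by a sibling with a given tag. That layout, and the independence assumption in `estimate_following_sibling`, are illustrative assumptions, not the paper's data structures or estimation methods.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Tuple

Summary = Tuple[Counter, Counter, Counter]


def build_summary(xml_text: str) -> Summary:
    root = ET.fromstring(xml_text)
    path_counts: Counter = Counter()   # "/r/x/b" -> elements reached by that tag path
    tag_counts: Counter = Counter()    # tag -> total number of elements with that tag
    order_counts: Counter = Counter()  # (A, B) -> B elements with an earlier sibling A

    def visit(elem: ET.Element, parent_path: str) -> None:
        path = f"{parent_path}/{elem.tag}"
        path_counts[path] += 1
        tag_counts[elem.tag] += 1
        earlier_tags = set()
        for child in elem:  # children are visited in document order
            for a in earlier_tags:
                order_counts[(a, child.tag)] += 1
            earlier_tags.add(child.tag)
            visit(child, path)

    visit(root, "")
    return path_counts, tag_counts, order_counts


def estimate_following_sibling(summary: Summary, parent_path: str, a: str, b: str) -> float:
    """Estimate |parent_path/a/following-sibling::b|, assuming the fraction of b
    elements preceded by an a sibling is the same on every path (independence)."""
    path_counts, tag_counts, order_counts = summary
    if tag_counts[b] == 0:
        return 0.0
    return path_counts[f"{parent_path}/{b}"] * order_counts[(a, b)] / tag_counts[b]
```

For the document `<r><x><a/><b/><b/></x><x><b/><a/></x><y><b/></y></r>`, the estimate for `/r/x/a/following-sibling::b` is 1.5, while the exact result size is 2. The gap comes from the independence assumption.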
UNIT: User-centric Transaction Management in Web-Database Systems
Pub Date : 2006-04-03 DOI: 10.1109/ICDE.2006.166
Huiming Qu, Alexandros Labrinidis, D. Mossé
Web-database systems are nowadays an integral part of everybody’s life, with applications ranging from monitoring and trading stock portfolios, to personalized blog aggregation and news services, to personalized weather tracking services. For most of these services to be successful (and their users to be kept satisfied), two criteria need to be met: user requests must be answered in a timely fashion and with fresh data. This paper presents a framework to balance both requirements from the users’ perspective. Toward this end, we propose a user satisfaction metric to measure the overall effectiveness of the Web-database system. We also provide a set of algorithms that dynamically optimize this metric through query admission control and update frequency modulation. Finally, we present extensive experimental results that compare our proposed algorithms to the current state of the art and show that they outperform competitors under various workloads (generated from real traces) and user requirements.
Citations: 30
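The abstract names query admission control as one lever for optimizing user satisfaction. The sketch below covers only that half. It assumes a single FIFO server, known per-query costs, and a hypothetical satisfaction metric equal to the fraction of submitted queries answered before their deadlines. Data freshness and update frequency modulation are deliberately left out, and none of this is the paper's actual metric or algorithm.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Query:
    arrival: float   # submission time
    cost: float      # estimated execution time (assumed known in advance)
    deadline: float  # hypothetical user requirement, relative to arrival


def admit_and_score(queries: List[Query]) -> float:
    """Single FIFO server: admit a query only if it can still finish before its
    deadline. Returns the fraction of submitted queries answered on time."""
    if not queries:
        return 1.0
    clock = 0.0  # time at which the server becomes free
    on_time = 0
    for q in sorted(queries, key=lambda q: q.arrival):
        start = max(clock, q.arrival)
        if start + q.cost <= q.arrival + q.deadline:  # admission test
            clock = start + q.cost
            on_time += 1
        # A rejected query consumes no server time, leaving capacity for later ones.
    return on_time / len(queries)
```

For the workload `[Query(0, 2, 3), Query(0, 2, 3), Query(1, 1, 3.5)]`, admission control rejects the second query, which would miss its deadline anyway, so the third query finishes at t = 3. Serving every query in FIFO order would instead finish the third query at t = 5, after its deadline of 4.5, leaving only one of three on time.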
Incremental Maintenance of Materialized XQuery Views
Pub Date : 2006-04-03 DOI: 10.1109/ICDE.2006.80
M. El-Sayed, Elke A. Rundensteiner, Murali Mani
Materializing the contents of views has important applications, including providing fast access to derived database repositories, optimizing query processing based on cached results, and increasing availability. Maintaining consistency between materialized views and their base data in the presence of source updates is important to ensure that the materialized views are up to date. The straightforward solution to this problem is to recompute the view from scratch over the updated sources.
Citations: 20
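The contrast between recomputing a view from scratch and maintaining it incrementally can be illustrated with a simple aggregate view. This Python sketch makes two hypothetical simplifications. Book elements are flattened into `(title, author)` pairs, and the view is a books-per-author count. The XQuery-specific machinery that the paper develops is not modeled.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Book = Tuple[str, str]  # (title, author): a flattened stand-in for XML <book> elements


def recompute(books: Iterable[Book]) -> Counter:
    """Straightforward maintenance: rebuild the view (books per author) from scratch."""
    return Counter(author for _, author in books)


def apply_delta(view: Counter, inserted: List[Book], deleted: List[Book]) -> None:
    """Incremental maintenance: touch only the groups affected by the source update."""
    for _, author in inserted:
        view[author] += 1
    for _, author in deleted:
        view[author] -= 1
        if view[author] <= 0:
            del view[author]  # drop groups that no longer contain any book
```

Applying the deltas touches only the affected groups, and the result should always equal a full `recompute` over the updated sources.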
Don’t be a Pessimist: Use Snapshot based Concurrency Control for XML
Pub Date : 2006-04-03 DOI: 10.1109/ICDE.2006.51
Zeeshan Sardar, Bettina Kemme
As native XML database systems (e.g., [3, 7, 8]) become increasingly popular, fine-grained concurrency control becomes imperative in order to allow different clients to access the same documents concurrently. Existing concurrency control approaches for XML are mainly based on locking [2, 3, 4, 6, 5]. However, the experiments of [5] have shown that the locking overhead, especially for read operations, can be tremendous. In this paper, we present two snapshot-based concurrency control mechanisms that avoid locking. Instead, transactions access a committed snapshot of the data.
Citations: 15
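The snapshot idea, where transactions read a committed snapshot instead of taking locks, can be sketched with a tiny multi-version key-value store. The sketch rests on several assumptions. It is single-threaded, it uses integer commit timestamps, and it resolves write-write conflicts with a first-committer-wins check. Keys stand in for XML nodes, and the two mechanisms proposed in the paper are not reproduced.

```python
from typing import Dict, List, Optional, Tuple


class VersionStore:
    """Multi-version store: each key keeps (commit_timestamp, value) pairs in commit order."""

    def __init__(self) -> None:
        self.versions: Dict[str, List[Tuple[int, str]]] = {}
        self.clock = 0  # timestamp of the latest committed transaction

    def begin(self) -> "Transaction":
        return Transaction(self, self.clock)


class Transaction:
    def __init__(self, store: VersionStore, start_ts: int) -> None:
        self.store = store
        self.start_ts = start_ts           # snapshot = all commits up to start_ts
        self.writes: Dict[str, str] = {}   # buffered until commit

    def read(self, key: str) -> Optional[str]:
        if key in self.writes:             # read your own writes
            return self.writes[key]
        for ts, value in reversed(self.store.versions.get(key, [])):
            if ts <= self.start_ts:        # newest version visible in the snapshot
                return value
        return None                        # no lock is taken on the read path

    def write(self, key: str, value: str) -> None:
        self.writes[key] = value

    def commit(self) -> bool:
        # First-committer-wins: abort if another transaction committed a key we
        # wrote after our snapshot was taken.
        for key in self.writes:
            history = self.store.versions.get(key, [])
            if history and history[-1][0] > self.start_ts:
                return False
        self.store.clock += 1
        for key, value in self.writes.items():
            self.store.versions.setdefault(key, []).append((self.store.clock, value))
        return True
```

Reads never block. They scan back to the newest version committed at or before the transaction's start timestamp. A real system would also need latching around `commit` and garbage collection of old versions.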
Sliding Window Based Multi-Join Algorithms over Distributed Data Streams
Pub Date : 2006-04-03 DOI: 10.1109/ICDE.2006.143
DongDong Zhang, Jianzhong Li, Kimutai Kimeli, Weiping Wang
This paper focuses on multi-way sliding window join (SWJoin) processing over distributed data streams. A novel join algorithm is proposed based on two distributed data stream transfer models. To reduce the communication cost and lighten the workload on the central processor node, the algorithm filters out tuples that cannot contribute to multi-way SWJoin results by transforming the join conditions of SWJoin into filtering conditions during data stream transfer. Furthermore, the algorithm guarantees that all data necessary for generating exact multi-way SWJoin results can be transmitted to the central processor node.
Citations: 11
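Turning a join condition into a transfer-time filter can be sketched for the simplest case of two streams and an equality join. In this hedged Python illustration, each site withholds a tuple until a same-key tuple of the other stream lies within `window` of it, and then ships both. The event format, `tuples_to_ship`, and the single-process simulation are assumptions. The paper's multi-way algorithm and its two transfer models are not reproduced.

```python
from typing import Dict, List, Optional, Set, Tuple

Event = Tuple[float, str, str]  # (timestamp, stream "A" or "B", join key); ascending timestamps


def tuples_to_ship(events: List[Event], window: float) -> Set[int]:
    """Indices of tuples the remote sites must send to the central node.
    A tuple is withheld until a same-key tuple of the other stream falls within
    `window` of it; tuples that never find such a partner are never sent."""
    buffers: Dict[str, List[int]] = {"A": [], "B": []}  # local sliding windows
    shipped: Set[int] = set()
    for i, (ts, stream, key) in enumerate(events):
        for s in buffers:  # expire tuples too old to join with anything from now on
            buffers[s] = [j for j in buffers[s] if events[j][0] >= ts - window]
        other = "B" if stream == "A" else "A"
        partners = [j for j in buffers[other] if events[j][2] == key]
        if partners:  # the join condition, evaluated locally as a filter
            shipped.add(i)
            shipped.update(partners)  # release previously withheld partners too
        buffers[stream].append(i)
    return shipped


def window_join(events: List[Event], window: float,
                subset: Optional[Set[int]] = None) -> Set[Tuple[int, int]]:
    """Exact two-way sliding-window equi-join over all events or a subset of them."""
    ids = range(len(events)) if subset is None else sorted(subset)
    a = [i for i in ids if events[i][1] == "A"]
    b = [j for j in ids if events[j][1] == "B"]
    return {(i, j) for i in a for j in b
            if events[i][2] == events[j][2] and abs(events[i][0] - events[j][0]) <= window}
```

For any joining pair, the later-arriving tuple always finds its partner still in the local window. Every tuple that contributes to a result is therefore shipped, so joining only the shipped tuples gives the exact window-join result while unmatched tuples stay local.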
A Primitive Operator for Similarity Joins in Data Cleaning
Pub Date : 2006-04-03 DOI: 10.1109/ICDE.2006.9
S. Chaudhuri, Venkatesh Ganti, R. Kaushik
Data cleaning based on similarities involves identification of "close" tuples, where closeness is evaluated using a variety of similarity functions chosen to suit the domain and application. Current approaches for efficiently implementing such similarity joins are tightly tied to the chosen similarity function. In this paper, we propose a new primitive operator which can be used as a foundation to implement similarity joins according to a variety of popular string similarity functions, and notions of similarity which go beyond textual similarity. We then propose efficient implementations for this operator. In an experimental evaluation using real datasets, we show that the implementation of similarity joins using our operator is comparable to, and often substantially better than, previous customized implementations for particular similarity functions.
Citations: 618
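A set-overlap join shows one way a single primitive can support several similarity functions. Strings become token sets, and a pair qualifies when the two sets share at least a threshold number of tokens. The sketch below assumes padded 2-gram tokenization and an inverted index on the right input. Treating overlap as the primitive is an illustrative assumption, not a description of the paper's operator or its implementations.

```python
from collections import Counter, defaultdict
from typing import Dict, Set, Tuple


def qgrams(s: str, q: int = 2) -> Set[str]:
    """Token set of a string: its q-grams after padding with '#' at both ends."""
    padded = f"#{s.lower()}#"
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def overlap_join(left: Dict[str, Set[str]], right: Dict[str, Set[str]],
                 min_overlap: int) -> Set[Tuple[str, str]]:
    """Return (left_id, right_id) pairs whose token sets share >= min_overlap tokens."""
    index: Dict[str, Set[str]] = defaultdict(set)  # token -> ids of right sets containing it
    for rid, tokens in right.items():
        for t in tokens:
            index[t].add(rid)
    pairs: Set[Tuple[str, str]] = set()
    for lid, tokens in left.items():
        overlap: Counter = Counter()  # right id -> number of shared tokens
        for t in tokens:
            for rid in index.get(t, ()):
                overlap[rid] += 1
        pairs.update((lid, rid) for rid, n in overlap.items() if n >= min_overlap)
    return pairs
```

For example, "Microsoft" and the misspelling "Mcrosoft" share 8 of their padded 2-grams, so the pair survives a threshold of 5, while "Oracle" shares none.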
WebIQ: Learning from the Web to Match Deep-Web Query Interfaces
Pub Date : 2006-04-03 DOI: 10.1109/ICDE.2006.172
Wensheng Wu, A. Doan, Clement T. Yu
Integrating Deep Web sources requires highly accurate semantic matches between the attributes of the source query interfaces. These matches are usually established by comparing the similarities of the attributes’ labels and instances. However, attributes on query interfaces often have few or no data instances. This pervasive lack of instances seriously reduces the accuracy of current matching techniques. To address this problem, we describe WebIQ, a solution that learns from both the Surface Web and the Deep Web to automatically discover instances for interface attributes. For this purpose, WebIQ extends question answering techniques commonly used in the AI community. We describe how to incorporate WebIQ into current interface matching systems. Extensive experiments over five real-world domains show the utility of WebIQ. In particular, the results show that the acquired instances help improve matching accuracy from 89.5% to 97.5% F-1, at only a modest runtime overhead.
Citations: 69
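WebIQ discovers attribute instances by extending question-answering techniques. Extraction patterns over web text are a common building block in that line of work. The sketch below assumes one hypothetical pattern, "<label>s such as X, Y and Z", applied to snippets that have already been retrieved. It only handles single-word instances, and it is not WebIQ's actual procedure.

```python
import re
from typing import List

# Separators inside an enumeration: "A, B", "A and B", "A, and B", "A or B"
SEP = r",?\s+and\s+|,?\s+or\s+|,\s*"


def extract_instances(label: str, snippets: List[str]) -> List[str]:
    """Collect candidate instances for an interface attribute from text snippets
    using the hypothetical pattern '<label>(s) such as X, Y and Z'."""
    pattern = re.compile(
        rf"\b{re.escape(label)}s?\s+such\s+as\s+((?:\w+(?:{SEP}))*\w+)", re.IGNORECASE
    )
    found: List[str] = []
    for text in snippets:
        for m in pattern.finditer(text):
            # Split the captured enumeration into individual instances.
            for item in re.split(SEP, m.group(1), flags=re.IGNORECASE):
                if item and item not in found:
                    found.append(item)
    return found
```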
Inferring a Serialization Order for Distributed Transactions
Pub Date : 2006-04-03 DOI: 10.1109/ICDE.2006.82
Khuzaima S. Daudjee, K. Salem
Data partitioning is often used to scale up a database system. In a centralized database system, the serialization order of committed update transactions can be inferred from the database log. To achieve this in a shared-nothing distributed database, the serialization order of update transactions must be inferred from multiple database logs. We describe a technique to generate a single stream of updates from the logs of multiple database systems. This single stream represents a valid serialization order of update transactions at the sites over which the database is partitioned.
Citations: 10
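Producing a single update stream from several logs amounts to finding one total order that agrees with every site's local commit order. A hedged sketch using Python's standard `graphlib.TopologicalSorter` (Python 3.9+) follows. It assumes that each log is a list of committed transaction IDs in local commit order. It also assumes that local commit order reflects conflict order at each site, as it does under strict two-phase locking. It is not the paper's technique.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List


def merge_logs(logs: Dict[str, List[str]]) -> List[str]:
    """Merge per-site commit logs into one order consistent with every site's local order."""
    sorter: TopologicalSorter = TopologicalSorter()
    for entries in logs.values():
        for txn in entries:
            sorter.add(txn)  # register every transaction, even single-site ones
        for earlier, later in zip(entries, entries[1:]):
            sorter.add(later, earlier)  # local commit order: earlier precedes later
    # Raises CycleError if the sites disagree, i.e. no single consistent order exists.
    return list(sorter.static_order())
```

If two logs order the same pair of transactions differently, `static_order()` raises `CycleError`, signalling that no single consistent stream exists.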