
Proceedings. 20th International Conference on Data Engineering: Latest Articles

Routing XML queries
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320074
Nick Koudas, M. Rabinovich, D. Srivastava, Tingbao Yu
In file-sharing P2P networks, a fundamental problem is identifying the databases that are relevant to user queries; in the P2P literature this is referred to as the location problem. We propose a scalable solution to the location problem in a data-sharing P2P network consisting of XML database nodes and XML router nodes, and make the following contributions. We develop the internal organization and routing protocols for the XML router nodes that enable scalable XPath query and update processing under both the open and the agreement cooperation models between nodes. Since router nodes tend to be memory-constrained, we support a space/performance tradeoff by permitting aggregated routing states, and we develop algorithms for generating and using such aggregated information. Using a detailed simulation model and varying key design parameters, we experimentally demonstrate the scalability of our approach and the performance of our query and update protocols.
Citations: 40
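The Koudas et al. abstract describes XML routers that trade memory for routing precision by aggregating their routing state. The Python sketch below illustrates that tradeoff under assumptions that are not from the paper: documents are modeled as nested dicts, queries use only the child axis and the `*` wildcard, and "aggregation" simply truncates advertised label paths to a fixed depth. The names `XmlRouter`, `advertise`, and `route` are hypothetical.

```python
def label_paths(tree, prefix=()):
    """Yield the root-to-node label path of every element in a nested-dict document."""
    for label, children in tree.items():
        path = prefix + (label,)
        yield path
        yield from label_paths(children, path)


def matches(steps, path):
    """Child-axis-only XPath match: same length, each step equal or '*'."""
    return len(steps) == len(path) and all(s in ("*", p) for s, p in zip(steps, path))


class XmlRouter:
    """Hypothetical router state: neighbor id -> set of advertised label paths."""

    def __init__(self, max_depth=None):
        self.max_depth = max_depth  # None keeps exact state; an int aggregates it
        self.state = {}

    def advertise(self, node_id, document):
        paths = set(label_paths(document))
        if self.max_depth is not None:
            # Aggregation: forget everything below max_depth to save router memory.
            paths = {p for p in paths if len(p) <= self.max_depth}
        self.state.setdefault(node_id, set()).update(paths)

    def route(self, xpath):
        """Return the neighbors a query such as '/catalog/*/title' is forwarded to."""
        steps = tuple(s for s in xpath.split("/") if s)
        if self.max_depth is not None and len(steps) > self.max_depth:
            # Only the prefix can be checked: may add false positives, never drops a match.
            steps = steps[: self.max_depth]
        return sorted(n for n, paths in self.state.items()
                      if any(matches(steps, p) for p in paths))


exact, aggregated = XmlRouter(), XmlRouter(max_depth=2)
for router in (exact, aggregated):
    router.advertise("db1", {"catalog": {"book": {"title": {}, "price": {}}}})
    router.advertise("db2", {"catalog": {"cd": {"title": {}}}})
print(exact.route("/catalog/*/title"))         # ['db1', 'db2']
print(exact.route("/catalog/book/isbn"))       # []
print(aggregated.route("/catalog/book/isbn"))  # ['db1'] (a false positive)
```

Truncation keeps routing conservative: a query is never withheld from a node that could answer it, but shallow state forwards some queries, such as `/catalog/book/isbn` here, to nodes that cannot.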
Nile: a query processing engine for data streams
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320080
M. Hammad, M. Mokbel, Mohamed H. Ali, Walid G. Aref, A. Catlin, A. Elmagarmid, M. Eltabakh, Mohamed G. Elfeky, T. Ghanem, Robert Gwadera, I. Ilyas, M. Marzouk, Xiaopeng Xiong
We present a demonstration of the design of "STEAM", the Purdue Boilermakers' stream database system, which supports the processing of continuous and snapshot queries over data streams. Specifically, the demonstration focuses on the query processing engine, "Nile". Nile extends the query processor of PREDATOR, an object-relational database management system, to process continuous queries over data streams. Nile supports extended SQL operators that use sliding-window execution to restrict the size of the state stored by operators such as join.
Citations: 134
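The sliding-window idea in the Nile abstract can be illustrated with a generic symmetric window join. This is a minimal Python sketch, not Nile's or PREDATOR's actual operator: it assumes time-based windows, a single equality key, and events arriving in timestamp order; `window_join` is a hypothetical name.

```python
from collections import deque


def window_join(events, window, key):
    """Symmetric sliding-window equi-join over two time-ordered streams.

    events: iterable of (timestamp, side, tuple) with side "L" or "R",
            in non-decreasing timestamp order (assumed, not checked).
    window: tuples join only if their timestamps differ by at most `window`.
    """
    state = {"L": deque(), "R": deque()}
    for ts, side, tup in events:
        # Expire tuples that can no longer join anything: this bounds stored state.
        for buf in state.values():
            while buf and buf[0][0] < ts - window:
                buf.popleft()
        other = state["R" if side == "L" else "L"]
        for _, candidate in other:
            if key(candidate) == key(tup):
                yield (tup, candidate) if side == "L" else (candidate, tup)
        state[side].append((ts, tup))


events = [(1, "L", ("a", 1)), (2, "R", ("a", 10)), (5, "R", ("a", 20)), (9, "L", ("a", 2))]
print(list(window_join(events, window=3, key=lambda t: t[0])))
# [(('a', 1), ('a', 10))]
```

Because tuples older than `ts - window` are expired on every arrival, the buffered state is bounded by the number of tuples that fit in one window rather than by the length of the streams.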
Publish/subscribe in NonStop SQL: transactional streams in a relational context
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320056
Mike Hanlon, J. Klein, B. V. D. Linden, Hansjörg Zeller
Relational queries on continuous streams of data are the subject of many recent database research projects. In 1998 a small group of people started a similar project with the goal of transforming our product, NonStop SQL/MX, into an active RDBMS. This project aimed to integrate the functionality of transactional queuing systems with relational tables and SQL, using simple extensions to the SQL syntax and guaranteeing clearly defined query and transactional semantics. The result is the first commercially available RDBMS that incorporates streams. All data flowing through the system is contained in relational tables and is protected by ACID transactions. Insert and update operations on any NonStop SQL table can be regarded as publishing data and can therefore remain transparent to the (legacy) applications that perform them. Unlike triggers, the publish operation does not increase the path length of the application, and it allows the subscriber to execute in a separate transaction. Subscribers, using an extended SQL syntax, see a continuous stream of data consisting of all rows originally in the table plus all rows inserted or updated thereafter. The system scales by using partitioned tables and, therefore, partitioned streams.
Citations: 3
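The subscriber semantics in the NonStop SQL abstract (the rows already in the table, followed by every later insert or update) can be mimicked in memory. The Python sketch below is only an illustration: it has no transactions, blocking reads, or partitioning, does not use NonStop SQL syntax, and the classes `PublishingTable` and `Subscription` are hypothetical.

```python
class PublishingTable:
    """In-memory stand-in for a table whose inserts and updates are published."""

    def __init__(self):
        self.rows = {}  # primary key -> current row
        self.log = []   # every insert/update, in arrival order

    def upsert(self, key, row):
        # Writers do nothing extra: recording the change is the publication.
        self.rows[key] = row
        self.log.append((key, row))

    def subscribe(self):
        return Subscription(self)


class Subscription:
    """Sees the rows present at subscribe time, then every later insert/update."""

    def __init__(self, table):
        self.table = table
        self.pending = list(table.rows.items())  # initial contents of the table
        self.offset = len(table.log)             # start of "thereafter"

    def fetch(self):
        """Non-blocking: return everything delivered since the previous fetch."""
        new = self.table.log[self.offset:]
        self.offset = len(self.table.log)
        batch, self.pending = self.pending + new, []
        return batch


table = PublishingTable()
table.upsert(1, "a")
sub = table.subscribe()
table.upsert(2, "b")
table.upsert(1, "a2")
print(sub.fetch())  # [(1, 'a'), (2, 'b'), (1, 'a2')]
print(sub.fetch())  # []
```

Writers only call `upsert`; publication is a side effect, which mirrors the abstract's point that publishing can stay transparent to (legacy) applications.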
On local pruning of association rules using directed hypergraphs
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320063
S. Chawla, Joseph G. Davis, G. Pandey
Here we propose an adaptive local pruning method for association rules. Our method exploits the exact mapping between a certain class of association rules, namely those whose consequents are singletons, and backward directed hypergraphs (B-graphs). The hypergraph that represents the association rules is called an association rules network (ARN). We present a simple example of an ARN, prove several properties of the ARN, and apply our approach to two popular data sets.
Citations: 29
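The ARN abstract relies on a mapping between association rules with a single-item consequent and backward directed hyperedges (B-arcs), each of which has a set of tail nodes and exactly one head node. The Python sketch below shows only that mapping and the standard B-reachability rule (an arc fires once its entire tail has been reached); it does not implement the paper's pruning method, and `BArc`, `to_b_graph`, and `b_reachable` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BArc:
    """Backward hyperedge: a set of tail nodes pointing to exactly one head node."""
    tail: frozenset
    head: str
    confidence: float


def to_b_graph(rules):
    """Map rules (antecedent items, single consequent item, confidence) to B-arcs."""
    return [BArc(frozenset(antecedent), consequent, conf)
            for antecedent, consequent, conf in rules]


def b_reachable(arcs, sources):
    """Nodes B-reachable from `sources`: an arc fires only when its whole tail is reached."""
    reached = set(sources)
    changed = True
    while changed:
        changed = False
        for arc in arcs:
            if arc.head not in reached and arc.tail <= reached:
                reached.add(arc.head)
                changed = True
    return reached


rules = [({"bread", "butter"}, "milk", 0.8), ({"milk"}, "cereal", 0.7), ({"eggs"}, "bacon", 0.6)]
graph = to_b_graph(rules)
print(sorted(b_reachable(graph, {"bread"})))            # ['bread']
print(sorted(b_reachable(graph, {"bread", "butter"})))  # ['bread', 'butter', 'cereal', 'milk']
```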
A probabilistic approach to metasearching with adaptive probing
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320026
Zhenyu Liu, C. Luo, Junghoo Cho, W. Chu
An ever-increasing amount of valuable information is stored in Web databases, "hidden" behind search interfaces. To save users the effort of manually exploring each database, metasearchers automatically select the databases most relevant to a user's query. In this paper, we focus on one of the technical challenges in metasearching, namely database selection. Past research uses a precollected summary of each database to estimate its "relevancy" to the query, and in many cases makes incorrect database selections. We propose two techniques: probabilistic relevancy modelling and adaptive probing. First, we model the relevancy of each database to a given query as a probability distribution, derived by sampling that database. Using this probabilistic model, the user can explicitly specify a desired level of certainty for database selection. The adaptive probing technique then decides which databases, and how many, to contact in order to satisfy the user's requirement. Our experiments on real hidden-Web databases indicate that our approach significantly improves the accuracy of database selection at the cost of a small number of database probes.
Citations: 22
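The metasearch abstract models each database's relevancy as a probability distribution and probes databases until the selection reaches a user-specified certainty. The sketch below is a simplified illustration under assumptions that are not from the paper: each distribution is a short list of equally likely relevancy values, only the single best database is selected, certainty is computed by exhaustive enumeration (feasible only for tiny inputs), probing reveals a database's exact relevancy, and the database with the widest spread is probed first. All function names and data are hypothetical.

```python
from itertools import product


def expected(values):
    return sum(values) / len(values)


def certainty(dists, choice):
    """P(`choice` is at least as relevant as every other DB), assuming independent
    distributions given as lists of equally likely relevancy values."""
    names = list(dists)
    hits = total = 0
    for combo in product(*(dists[n] for n in names)):
        values = dict(zip(names, combo))
        total += 1
        hits += all(values[choice] >= v for v in values.values())
    return hits / total


def select_with_probing(dists, probe, threshold):
    """Return (selected DB, probed DBs) once certainty reaches `threshold`."""
    dists = {name: list(vals) for name, vals in dists.items()}
    probed = []
    while True:
        choice = max(dists, key=lambda n: expected(dists[n]))
        remaining = [n for n in dists if n not in probed]
        if certainty(dists, choice) >= threshold or not remaining:
            return choice, probed
        # Probe the unprobed DB whose relevancy is most uncertain (widest spread).
        target = max(remaining, key=lambda n: max(dists[n]) - min(dists[n]))
        dists[target] = [probe(target)]  # probing reveals the exact relevancy
        probed.append(target)


estimates = {"A": [0.2, 0.9], "B": [0.5, 0.6], "C": [0.1, 0.2]}
true_relevancy = {"A": 0.3, "B": 0.55, "C": 0.15}
print(certainty(estimates, "A"))                                          # 0.5
print(select_with_probing(estimates, true_relevancy.get, threshold=0.9))  # ('B', ['A'])
```

Once every database has been probed, each distribution collapses to a single value and the certainty of the best one is 1.0, so the loop always terminates for thresholds up to 1.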
Spectral analysis of text collection for similarity-based clustering
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320064
Wenyuan Li, W. Ng, Ee-Peng Lim
Clustering text collections is generally difficult because of their high dimensionality, heterogeneity, and large size. These characteristics compound the problem of determining an appropriate similarity space for clustering algorithms. Here, we propose to use spectral analysis of the similarity space of a text collection to predict clustering behavior before actual clustering is performed. Spectral analysis is a technique that has been adopted across different domains to analyze the key information encoded in a system. Using spectral analysis for prediction helps to determine the quality of the similarity space up front and to discover any problems the selected feature set may present.
Citations: 7
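A common way to predict clustering behavior from the spectrum of a similarity matrix S is the eigengap heuristic: normalize the matrix as D^(-1/2) S D^(-1/2), where D is the diagonal matrix of row sums, and count the eigenvalues that precede the largest gap. The Python sketch below uses this standard heuristic as a stand-in; it is not necessarily the spectral measure used by Li, Ng, and Lim. It computes eigenvalues with a plain cyclic Jacobi iteration to stay dependency-free, and the similarity matrix in the example is hypothetical.

```python
import math


def jacobi_eigenvalues(a, tol=1e-12, max_sweeps=100):
    """Eigenvalues (descending) of a symmetric matrix via cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def normalized(sim):
    """D^(-1/2) S D^(-1/2), where D holds the row sums of S."""
    deg = [sum(row) for row in sim]
    n = len(sim)
    return [[sim[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def predicted_clusters(sim):
    """Eigengap heuristic: number of eigenvalues before the largest gap."""
    eig = jacobi_eigenvalues(normalized(sim))
    gaps = [eig[i] - eig[i + 1] for i in range(len(eig) - 1)]
    return gaps.index(max(gaps)) + 1


noisy = [
    [1, 1, .05, .05, .05],
    [1, 1, .05, .05, .05],
    [.05, .05, 1, 1, 1],
    [.05, .05, 1, 1, 1],
    [.05, .05, 1, 1, 1],
]
print(predicted_clusters(noisy))  # 2
```

For a perfectly block-diagonal similarity matrix with k blocks, the normalized matrix has eigenvalue 1 with multiplicity k, so a clear gap after the k-th eigenvalue suggests a similarity space that separates k groups well; a flat spectrum with no dominant gap hints that the chosen feature set may not.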
Implementation and research issues in query processing for wireless sensor networks
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320102
W. Hong, S. Madden
This three-hour seminar discusses the design and implementation of software systems, as well as open research problems, related to data processing and collection in wireless sensor networks. During the first hour and a half, we focus on the design of the TinyDB data collection system for networks of Berkeley motes running the TinyOS operating system. During the remainder of the seminar, we survey relevant literature from the database, networking, and OS communities and identify a number of unsolved and inadequately addressed research problems. The seminar is intended for anyone with a general background in computer science who is interested in wireless sensor networks: users of sensor networks looking for an easy way to collect data, developers interested in the design of TinyOS and TinyDB, and researchers in search of challenging new problems.
Citations: 6
ToMAS: a system for adapting mappings while schemas evolve
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320090
Yannis Velegrakis, Renée J. Miller, Lucian Popa, J. Mylopoulos
We demonstrate the Toronto Mapping Adaptation System (ToMAS), a tool for automatically detecting and adapting mappings that have become invalid or inconsistent due to changes in either data semantics or schemas. Due to its modular architecture and its stand-alone nature, ToMAS can easily be applied to numerous scenarios and can interoperate with many other tools. To the best of our knowledge, no other tool can correctly maintain the consistency of the mappings under schema changes at the level of complexity supported by ToMAS.
Citations: 22
Selectivity estimation for string predicates: overcoming the underestimation problem
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1319999
S. Chaudhuri, Venkatesh Ganti, L. Gravano
Queries with (equality or LIKE) selection predicates over string attributes are widely used in relational databases. However, state-of-the-art techniques for estimating selectivities of string predicates are often biased towards severely underestimating selectivities. We develop accurate selectivity estimators for string predicates that adapt to data and query characteristics, and which can exploit and build on a variety of existing estimators. A thorough experimental evaluation over real data sets demonstrates the resilience of our estimators to variations in both data and query characteristics.
Citations: 68
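The underestimation problem named in the Chaudhuri, Ganti, and Gravano abstract is easy to reproduce: if the selectivity of `LIKE '%smith%'` is estimated by multiplying the selectivities of its overlapping 2-grams as though they occurred independently, correlated grams drive the estimate far below the true value. The Python sketch below only demonstrates the problem; it does not reproduce the paper's estimators. Gram selectivities are computed here by scanning a hypothetical column, whereas a real estimator would read them from a compact summary.

```python
def like_selectivity(column, substring):
    """True selectivity of `col LIKE '%substring%'` over the column values."""
    return sum(substring in value for value in column) / len(column)


def qgrams(s, q):
    """Overlapping q-grams of s."""
    return [s[i:i + q] for i in range(len(s) - q + 1)]


def independence_estimate(column, substring, q=2):
    """Product of per-q-gram selectivities, i.e. assume the grams are independent."""
    estimate = 1.0
    for gram in qgrams(substring, q):
        estimate *= like_selectivity(column, gram)
    return estimate


column = ["smith", "smithson", "smithers", "jones", "johnson", "smyth"]  # hypothetical data
print(like_selectivity(column, "smith"))                 # 0.5
print(round(independence_estimate(column, "smith"), 3))  # 0.111
```

On this toy column the true selectivity is 3/6 = 0.5, while the independence estimate is (4/6)(3/6)(3/6)(4/6), about 0.11, an underestimate by a factor of about 4.5.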
Bulk operations for space-partitioning trees
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1319982
T. Ghanem, R. Shah, M. Mokbel, Walid G. Aref, J. Vitter
The emergence of extensible index structures, e.g., GiST (generalized search tree) [J. M. Hellerstein et al. (1995)] and SP-GiST (space-partitioning generalized search tree) [W. G. Aref et al. (2001)], calls for a set of extensible algorithms to support different operations (e.g., insertion, deletion, and search). Extensible bulk operations (e.g., bulk loading and bulk insertion) are equally important and need to be supported in these index engines. In this paper, we propose two extensible buffer-based algorithms for bulk operations in the class of space-partitioning trees, a class of hierarchical data structures that recursively decompose the space into disjoint partitions. The main idea of these algorithms is to build an in-memory tree of the target space-partitioning index. Data items are then recursively partitioned into disk-based buffers using the in-memory tree. Although the second algorithm is designed for bulk insertion, it can be used for bulk loading as well. The proposed extensible algorithms are implemented inside SP-GiST, a framework for supporting the class of space-partitioning trees. Both algorithms have an I/O bound of O(NH/B), where N is the number of data items to be bulk loaded/inserted, B is the number of tree nodes that can fit in one disk page, and H is the tree height in pages after applying a clustering algorithm. Experimental results show the scalability and applicability of the proposed algorithms for the class of space-partitioning trees. A comparison of the two proposed algorithms shows that the first performs better for bulk loading, whereas the second is more general and can be used for efficient bulk insertion.
Citations: 30
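The buffer-based idea in the bulk-operations abstract (distribute all data items into per-partition buffers, then recurse into each buffer) can be sketched for one member of the space-partitioning class, the PR quadtree. This Python sketch is a simplification under stated assumptions: in-memory lists stand in for disk-based buffers, no in-memory tree of the target index or node-to-page clustering is built, and SP-GiST's extensibility interface is not modeled. `QuadNode`, `bulk_load`, and `leaves` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class QuadNode:
    """A PR-quadtree node covering the box from (x0, y0) to (x1, y1)."""
    x0: float
    y0: float
    x1: float
    y1: float
    points: list = field(default_factory=list)    # bucket contents if leaf
    children: list = field(default_factory=list)  # four quadrants if internal


def bulk_load(points, x0, y0, x1, y1, capacity=4, depth=0, max_depth=16):
    """Top-down, buffer-based bulk load of a PR quadtree over the given box."""
    node = QuadNode(x0, y0, x1, y1)
    if len(points) <= capacity or depth == max_depth:
        node.points = list(points)  # small enough: becomes a leaf bucket
        return node
    mx, my = (x0 + x1) / 2, (y0 + y1) / 2
    # One pass over the input distributes every item into a per-quadrant buffer.
    buffers = [[], [], [], []]
    for x, y in points:
        buffers[(x >= mx) + 2 * (y >= my)].append((x, y))
    boxes = [(x0, y0, mx, my), (mx, y0, x1, my), (x0, my, mx, y1), (mx, my, x1, y1)]
    node.children = [
        bulk_load(buf, *box, capacity=capacity, depth=depth + 1, max_depth=max_depth)
        for buf, box in zip(buffers, boxes)
    ]
    return node


def leaves(node):
    """All leaf nodes, in quadrant order."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


grid = [(i % 10 + 0.5, i // 10 + 0.5) for i in range(100)]
root = bulk_load(grid, 0, 0, 10, 10)
print(max(len(leaf.points) for leaf in leaves(root)) <= 4)  # True
```

Each level reads its input once and writes it into four buffers, which suggests why the cost grows with the number of items times the tree height; the paper's O(NH/B) bound additionally accounts for packing B nodes per disk page, which this sketch ignores.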