A Run-Time Based Technique to Optimize Queries in Distributed Internet Databases
L. Khan, A. Ponnusamy, D. McLeod, C. Shahabi
DOI: 10.4018/978-1-59140-063-9.CH007
This chapter appears in the book Advanced Topics in Database Research, Vol. 2, edited by Keng Siau. Copyright © 2003, Idea Group Inc.
Citations: 5
Abstract
An adaptive, probe-based optimization technique is developed and demonstrated in the context of an Internet-based distributed database environment. Database systems distributed across servers that communicate via the Internet are increasingly common, and in such systems a query issued at one site may require data from remote sites. Optimizing the response time of these queries is challenging because server performance and network traffic at the time of data shipment are unpredictable; as a result, a static query optimizer may select an expensive query plan. We constructed an experimental setup consisting of two servers, running the same DBMS, connected via the Internet. Concentrating on join queries, we demonstrate how a static query optimizer can mistakenly choose an expensive plan. The causes are the lack of a priori knowledge of the run-time environment, inaccurate statistical assumptions in size estimation, and neglect of the cost of remote method invocation. We address these shortcomings collectively with a probing mechanism. Furthermore, we extend this mechanism with an adaptive technique that detects the sub-optimality of a plan during query execution and attempts to switch to the cheapest plan while avoiding redundant work and imposing little overhead.
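The abstract does not spell out how probe results feed into plan selection. The Java sketch below illustrates one plausible reading under loudly stated assumptions: relation R lives at site A, relation S at site B, the query is issued at A, and a probe run just before execution reports per-tuple shipping times in each direction and per-tuple join costs at each site. The class and record names (`ProbeOptimizer`, `ProbeResult`, `Plan`) and the linear cost formulas are hypothetical illustrations, not the authors' cost model.

```java
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: probe-based choice between two plans for R (site A) JOIN S (site B). */
public class ProbeOptimizer {

    /** Rates reported by a probe just before execution (hypothetical, in ms per tuple). */
    record ProbeResult(double msPerTupleAtoB, double msPerTupleBtoA,
                       double msPerJoinTupleAtA, double msPerJoinTupleAtB) {}

    /** A candidate plan and its estimated response time in ms. */
    record Plan(String name, double estimatedMs) {}

    /**
     * Estimates both plans for a query issued at site A.
     * r and s are the cardinalities of R and S; resultTuples is the expected join size.
     */
    static List<Plan> estimate(ProbeResult p, long r, long s, long resultTuples) {
        // Plan 1: ship S from B to A, then join locally at A (the result is already at A).
        double shipSToA = s * p.msPerTupleBtoA()
                + (r + s) * p.msPerJoinTupleAtA();
        // Plan 2: ship R from A to B, join at B, then ship the join result back to A.
        double shipRToB = r * p.msPerTupleAtoB()
                + (r + s) * p.msPerJoinTupleAtB()
                + resultTuples * p.msPerTupleBtoA();
        return List.of(new Plan("ship S to A", shipSToA),
                       new Plan("ship R to B", shipRToB));
    }

    /** Returns the plan with the lowest estimated response time. */
    static Plan cheapest(List<Plan> plans) {
        return plans.stream()
                .min(Comparator.comparingDouble(Plan::estimatedMs))
                .orElseThrow();
    }

    public static void main(String[] args) {
        // Sample probe in which the B -> A link is currently congested.
        ProbeResult probe = new ProbeResult(0.05, 0.40, 0.01, 0.01);
        List<Plan> plans = estimate(probe, 10_000, 50_000, 5_000);
        for (Plan plan : plans) {
            System.out.printf("%s: %.1f ms%n", plan.name(), plan.estimatedMs());
        }
        System.out.println("chosen: " + cheapest(plans).name());
    }
}
```

With the sample probe in `main`, the sketch estimates about 20,600 ms for shipping S to A versus about 3,100 ms for shipping R to B, so it chooses the latter. One way to approximate the adaptive extension would be to re-run `estimate` during execution with the observed rates and only the tuples not yet processed, switching when the alternative's remaining cost is lower; how the chapter itself detects sub-optimality and avoids redundant work is not described in this excerpt.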
We also demonstrate that the probe technique can be extended to a client-server environment as a basis for choosing where to execute user-defined functions (UDFs). Our run-time query optimization technique was implemented in Java and incorporated into the experimental setup. The results demonstrate the superiority of probe-based optimization over static optimization.
Introduction
A distributed database is a collection of partially independent databases that share a common schema and coordinate the processing of non-local transactions. Processors communicate with one another through a communication network (Silberschatz, Korth, & Sudarshan, 1997; Yu & Meng, 1998). We focus on distributed database systems whose sites run homogeneous software (i.e., the same database management system, DBMS) on heterogeneous hardware (e.g., PCs and Unix workstations) connected via the Internet. Internet databases are well suited to organizations made up of a number of largely independent sub-organizations, such as a university with many departments or a bank with many branches. The idea is to partition data across multiple geographically or administratively distributed sites, each of which runs an almost autonomous database system.
In a distributed database system, some queries require the participation of multiple sites, each processing part of the query and transferring data back and forth among the others. Since there is usually more than one plan for executing such a query, it is crucial to obtain the cost of each plan, which depends heavily on how much each site participates and how much data is shipped between sites. Assuming a private/dedicated network and servers, this cost can …
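The UDF-placement extension mentioned in the abstract can be pictured as a two-way cost comparison. The following Java sketch assumes a hypothetical probe that reports the server's and the client's per-call UDF time and the current per-KB shipping time; running the UDF at the server ships only its results, while running it at the client ships its inputs. The names (`UdfPlacement`, `Probe`, `choose`) and the linear cost model are illustrative assumptions, not the chapter's actual mechanism.

```java
/** Hypothetical sketch: probe-driven choice of where to evaluate a UDF. */
public class UdfPlacement {

    enum Site { SERVER, CLIENT }

    /** Rates reported by a probe (hypothetical): ms per UDF call at each side, ms per KB shipped. */
    record Probe(double serverMsPerCall, double clientMsPerCall, double msPerKbShipped) {}

    /** Evaluate at the server, then ship only the n UDF results (outputKb each) to the client. */
    static double costAtServer(Probe p, long n, double outputKb) {
        return n * p.serverMsPerCall() + n * outputKb * p.msPerKbShipped();
    }

    /** Ship the n UDF inputs (inputKb each) to the client, then evaluate there. */
    static double costAtClient(Probe p, long n, double inputKb) {
        return n * inputKb * p.msPerKbShipped() + n * p.clientMsPerCall();
    }

    /** Picks the cheaper site; ties go to the server. */
    static Site choose(Probe p, long n, double inputKb, double outputKb) {
        return costAtServer(p, n, outputKb) <= costAtClient(p, n, inputKb)
                ? Site.SERVER : Site.CLIENT;
    }

    public static void main(String[] args) {
        // 1,000 large inputs (20 KB each) that the UDF reduces to 1 KB results.
        long n = 1_000;
        Probe busyServer = new Probe(4.0, 0.5, 0.1);
        Probe idleServer = new Probe(0.6, 0.5, 0.1);
        System.out.println("busy server -> " + choose(busyServer, n, 20.0, 1.0));
        System.out.println("idle server -> " + choose(idleServer, n, 20.0, 1.0));
    }
}
```

Running `main` prints `busy server -> CLIENT` and `idle server -> SERVER`: when the probe reports a heavily loaded server, shipping the large inputs to the client (2,500 ms in this model) beats server-side evaluation (4,100 ms), and the decision flips back once the server is lightly loaded.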