It is increasingly common to find XML views used to enforce access control in applications and commercial database systems. To avoid the overhead of view materialization and maintenance, XML views are necessarily virtual. With this comes the need to answer XML queries posed over virtual views by rewriting them into equivalent queries on the underlying documents. A major concern here is that query rewriting for recursive XML views is still an open problem, and proposed approaches deal only with non-recursive XML views. Moreover, only a small number of works have studied access rights for updates. In this paper, we present SVMAX (Secure and Valid MAnipulation of XML), the first system that supports specification and enforcement of both read and update access policies over arbitrary XML views (recursive or not). SVMAX defines general and expressive models for controlling access to XML data using a significant class of XPath queries and in the presence of the update primitives of the W3C XQuery Update Facility. Furthermore, SVMAX features an additional module enabling efficient validation of XML documents after primitive XQuery updates. The wide use of W3C standards makes SVMAX a useful system that can be easily integrated within commercial database systems, as we will show. We give extensive experimental results, based on real-life DTDs, that show the efficiency and scalability of our system.
{"title":"SVMAX: a system for secure and valid manipulation of XML data","authors":"Houari Mahfoud, Abdessamad Imine, M. Rusinowitch","doi":"10.1145/2513591.2513657","DOIUrl":"https://doi.org/10.1145/2513591.2513657","journal":{"name":"Proceedings. International Database Engineering and Applications Symposium","volume":"36 1","pages":"154-161"},"publicationDate":"2013-10-09"}
The current generation of stream processing systems is generally built separately from the query engine; it therefore lacks the expressive power of SQL and incurs significant overhead in data access and movement. This situation has motivated us to leverage the query engine for stream processing. Stream-join is a window operation where the key issue is how to punctuate and pair two or more correlated streams. In this work we tackle this issue in the specific context of query-engine-supported stream processing. We focus on the following problems: a SQL query is definable on bounded relational data but stream data are unbounded; joining multiple streams is a stateful (thus history-sensitive) operation but a SQL query only cares about the current state; further, a relational join typically requires re-scanning a relation in a nested loop, but by nature a stream cannot be re-captured, as reading a stream always returns newly incoming data. To leverage query processing for analyzing unbounded streams, we defined the Epoch-based Continuous Query (ECQ) model, which allows a SQL query to be executed epoch by epoch to process the stream data chunk by chunk. However, unlike multiple one-time queries, an ECQ is a single, continuous query instance across execution epochs, which keeps the continuity of the application state as required by history-sensitive operations such as sliding-window join. To join multiple streams, we further developed techniques to cache one or more consecutive data chunks falling in a sliding window across query execution epochs in the ECQ instance, allowing them to be re-delivered from the cache. In this way, joining multiple streams and self-joining a single stream in a data-chunk-based window or sliding window, with various pairing schemes, are made possible. We extended the PostgreSQL engine to support the proposed approach. Our experience has demonstrated its value.
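The chunk caching idea described above can be illustrated with a small Python sketch. It is not the ECQ implementation or the PostgreSQL extension. It is a minimal model that assumes each stream arrives as a list of chunks (one chunk per epoch), that the sliding window is measured in chunks, and that every epoch re-pairs all tuples currently in the window. The names `epoch_window_join`, `window`, and `key` are hypothetical.

```python
from collections import deque


def epoch_window_join(chunks_a, chunks_b, window, key):
    """Join two chunked streams epoch by epoch over a sliding window.

    One generator instance lives across all epochs, so the cached chunks
    (the join state) survive from one epoch to the next.
    """
    cache_a = deque(maxlen=window)  # last `window` chunks of stream A
    cache_b = deque(maxlen=window)  # last `window` chunks of stream B
    for chunk_a, chunk_b in zip(chunks_a, chunks_b):
        # A new epoch: the oldest chunk drops out of each cache automatically.
        cache_a.append(chunk_a)
        cache_b.append(chunk_b)
        # Build a hash index over the cached B tuples, then probe it with
        # the cached A tuples. Cached chunks are re-delivered from memory
        # because the stream itself cannot be re-read.
        index_b = {}
        for chunk in cache_b:
            for tup in chunk:
                index_b.setdefault(key(tup), []).append(tup)
        yield [
            (a, b)
            for chunk in cache_a
            for a in chunk
            for b in index_b.get(key(a), [])
        ]


if __name__ == "__main__":
    stream_a = [[(1, "a1")], [(2, "a2")]]
    stream_b = [[(2, "b1")], [(1, "b2")]]
    for epoch, pairs in enumerate(
        epoch_window_join(stream_a, stream_b, window=2, key=lambda t: t[0])
    ):
        print(epoch, pairs)
```

With a window of two chunks, tuples from the first epoch can still be paired with tuples that arrive in the second. With a window of one chunk, they cannot. Other pairing schemes (for example, emitting only pairs that involve the newest chunk) would change the comprehension but not the caching.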
{"title":"Stream-join revisited in the context of epoch-based SQL continuous query","authors":"Qiming Chen, M. Hsu","doi":"10.1145/2351476.2351491","DOIUrl":"https://doi.org/10.1145/2351476.2351491","journal":{"name":"Proceedings. International Database Engineering and Applications Symposium","volume":"89 1","pages":"130-138"},"publicationDate":"2012-08-08"}
Grid systems have become very popular during the last decade because of their rapidly increasing computational capabilities. At the same time, advances in different domains have caused an enormous increase in the scale of the manipulated data. This increases the importance of distributed query processing and leads researchers to port their underlying environments onto grid systems. However, the dynamicity, heterogeneity and large scale of grid systems pose new problems for the distributed query processing domain. Resource allocation for query processing in grid systems is one of these problems and has attracted the attention of many researchers. In this paper, we propose a new resource allocation algorithm for one relational join operator in a query, considering the characteristics of grid systems. We provide theoretical analyses of the proposed algorithm and consolidate the analyses with simulations.
{"title":"Resource allocation algorithm for a relational join operator in grid systems","authors":"D. Cokuslu, A. Hameurlain, K. Erciyes, F. Morvan","doi":"10.1145/2351476.2351492","DOIUrl":"https://doi.org/10.1145/2351476.2351492","journal":{"name":"Proceedings. International Database Engineering and Applications Symposium","volume":"74 1","pages":"139-145"},"publicationDate":"2012-08-08"}
L. Rodríguez-Mazahua, Xiaoou Li, Jair Cervantes, Farid García
In recent years, vertical partitioning techniques have been employed in multimedia databases to achieve efficient retrieval of multimedia objects. These techniques are static because the input to the partitioning process, which includes the queries accessing the database and their frequencies as well as the database schema, is obtained from an earlier analysis stage. This implies that when the system undergoes sufficient changes, a new analysis stage must be carried out to re-run the partitioning process. Multimedia databases are accessed by many users simultaneously, so queries and their frequencies tend to change quickly over time. In this context, dynamic vertical partitioning can significantly improve performance. In this paper we present an active system called DYMOND (DYnamic Multimedia ON line Distribution), which performs dynamic vertical partitioning in multimedia databases to improve query performance. Experimental results on benchmark multimedia databases confirm the validity of our system.
{"title":"DYMOND: an active system for dynamic vertical partitioning of multimedia databases","authors":"L. Rodríguez-Mazahua, Xiaoou Li, Jair Cervantes, Farid García","doi":"10.1145/2351476.2351485","DOIUrl":"https://doi.org/10.1145/2351476.2351485","journal":{"name":"Proceedings. International Database Engineering and Applications Symposium","volume":"1 1","pages":"71-80"},"publicationDate":"2012-08-08"}
Social networks drive today's opinions and content diffusion. Large-scale, distributed and unpredictable social data streams are produced, and such evolving data production provides the ground for data mining and analysis tasks. These social data streams embed human reactions and inter-relationships, and affective and emotional analysis has become rather important in today's applications. This work highlights the major data structures and methodologies used in evolving social data mining and proceeds to the relevant affective analysis techniques. A particular framework is outlined along with indicative applications that employ evolving social data analysis, with emphasis on the seminal criteria of topic, location and time. Such an overview of mining and analysis is beneficial for various scientific and entrepreneurial audiences and communities in the social networking area.
{"title":"Evolving social data mining and affective analysis methodologies, framework and applications","authors":"A. Vakali","doi":"10.1145/2351476.2351477","DOIUrl":"https://doi.org/10.1145/2351476.2351477","journal":{"name":"Proceedings. International Database Engineering and Applications Symposium","volume":"57 1","pages":"1-7"},"publicationDate":"2012-08-08"}
In facility management for plants and buildings, there is an increasing need for facility diagnosis that saves energy or facility management cost by analyzing time series data from the sensors of equipment in facilities. This paper proposes a relation-based stream query language, TPQL (Trend Pattern Query Language), for expressing constraints on time series data for anomaly detection in facilities. The features of TPQL are the following. (1) TPQL introduces a convolution operator into a stream query language in order to describe constraints over a sliding window. A convolution operator that takes a window function as an argument can express various domain-dependent functions that extract features over sliding windows, such as a duration constraint and a hunting constraint. (2) TPQL introduces a time-interval-based join into a stream query language in order to join time series data with different sampling rates.
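The convolution operator described in feature (1) can be approximated in plain Python as a higher-order function that applies a window function to each sliding window. The sketch below is not TPQL syntax, which the abstract does not show. The names `convolve`, `duration_over`, and `hunting` are hypothetical, and the two window functions are one plausible reading of a "duration constraint" (a value stays above a threshold for the whole window) and a "hunting constraint" (a signal reverses direction repeatedly within the window).

```python
def convolve(samples, width, window_fn):
    """Apply window_fn to every sliding window of `width` consecutive samples."""
    return [
        window_fn(samples[i - width + 1 : i + 1])
        for i in range(width - 1, len(samples))
    ]


def duration_over(threshold):
    """Window function: True if every sample in the window exceeds threshold."""
    return lambda window: all(x > threshold for x in window)


def hunting(min_reversals):
    """Window function: True if the signal changes direction often enough."""

    def fn(window):
        # Differences between neighbouring samples, ignoring flat steps.
        diffs = [b - a for a, b in zip(window, window[1:]) if b != a]
        # A reversal is a sign change between two consecutive differences.
        reversals = sum(1 for d1, d2 in zip(diffs, diffs[1:]) if d1 * d2 < 0)
        return reversals >= min_reversals

    return fn


if __name__ == "__main__":
    temperature = [1, 5, 6, 7, 2, 8]
    print(convolve(temperature, 3, duration_over(4)))
    valve = [1, 5, 2, 6, 3]
    print(convolve(valve, 4, hunting(2)))
```

Passing the window function as an argument keeps the sliding logic in one place, so a new domain-specific constraint only needs a new function of a single window.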
{"title":"A stream query language TPQL for anomaly detection in facility management","authors":"Makoto Imamura, S. Takayama, T. Munaka","doi":"10.1145/2351476.2351506","DOIUrl":"https://doi.org/10.1145/2351476.2351506","journal":{"name":"Proceedings. International Database Engineering and Applications Symposium","volume":"13 1","pages":"235-238"},"publicationDate":"2012-08-08"}
Parallel shared-nothing architectures are frequently used to handle large star-schema Data Warehouses (DW). The continuous increase in data volume and the star-schema storage organization severely limit scalability because of the well-known parallel join issues. These issues force the use of solutions such as on-the-fly repartitioning of data or intermediate results, or massive replication of large data sets that still need to be joined locally, which constrains the ability of such architectures to deliver fast results. Parallelism may improve query performance; however, some business decisions require query results to be available in a timely manner, which cannot be guaranteed even with additional parallelism and significant upgrade costs (both monetary and due to disturbance of normal operations). We propose a Timely-aware Execution Parallel Architecture (TEEPA), which balances data load and query processing among an elastic set of non-dedicated heterogeneous nodes in order to provide scale-out performance and timely query results. Data is allocated using adaptable storage models that minimize join costs (the major uncertainty factor) and best fit the nodes' capabilities, while preserving a consistent logical view of the star schema. We present an experimental evaluation of TEEPA and demonstrate its ability to provide timely results.
{"title":"TEEPA: a timely-aware elastic parallel architecture","authors":"J. Costa, P. Martins, J. Cecílio, P. Furtado","doi":"10.1145/2351476.2351480","DOIUrl":"https://doi.org/10.1145/2351476.2351480","journal":{"name":"Proceedings. International Database Engineering and Applications Symposium","volume":"37 1","pages":"24-31"},"publicationDate":"2012-08-08"}
Today, thanks to vehicular networks, drivers may receive useful information produced or relayed by neighboring sensors or vehicles (e.g., the location of an available parking space, of traffic congestion, etc.). In this paper, we address the problem of providing assistance to the driver when no recent information has been received by his/her vehicle. To this end, we present a cooperative scheme to aggregate, store and exchange these events in order to keep a history of past events. This scheme is based on a dedicated spatio-temporal aggregation structure that uses Flajolet-Martin sketches and is deployed on each vehicle. Contrary to existing approaches to data aggregation in vehicular networks, our main goal here is not to save network bandwidth but rather to extract useful knowledge from previous observations. We present our aggregation data structure, the associated exchange protocol and a set of experiments showing the effectiveness of our proposal.
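To make the role of Flajolet-Martin sketches concrete, here is a minimal single-bitmap sketch in Python. It shows only the generic technique, not the paper's spatio-temporal structure or exchange protocol. The class name `FMSketch`, the use of SHA-1 as the hash, and the event strings are assumptions for illustration. The property that matters for cooperative exchange is that merging two sketches is a bitwise OR, so an event reported by several vehicles is counted once.

```python
import hashlib


class FMSketch:
    """Single-bitmap Flajolet-Martin sketch for approximate distinct counting."""

    PHI = 0.77351  # correction constant from the Flajolet-Martin analysis

    def __init__(self, num_bits=32):
        self.num_bits = num_bits
        self.bitmap = 0

    def _rho(self, item):
        """Index of the least-significant 1 bit of the item's hash."""
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        if h == 0:
            return self.num_bits - 1
        return min((h & -h).bit_length() - 1, self.num_bits - 1)

    def add(self, item):
        """Record an event; adding the same event again changes nothing."""
        self.bitmap |= 1 << self._rho(item)

    def merge(self, other):
        """Combine with a sketch received from another vehicle (bitwise OR)."""
        self.bitmap |= other.bitmap

    def estimate(self):
        """Estimate the distinct count from the lowest unset bit position."""
        r = 0
        while (self.bitmap >> r) & 1:
            r += 1
        return (2 ** r) / self.PHI


if __name__ == "__main__":
    mine, theirs = FMSketch(), FMSketch()
    for event in ["parking:cell7:t1", "parking:cell7:t2", "jam:cell3:t2"]:
        mine.add(event)
    for event in ["jam:cell3:t2", "parking:cell9:t4"]:
        theirs.add(event)
    mine.merge(theirs)
    print(mine.estimate())
```

A single bitmap gives a coarse estimate. Practical deployments typically average several bitmaps built with different hash functions to reduce the variance.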
{"title":"A cooperative scheme to aggregate spatio-temporal events in VANETs","authors":"D. Zekri, Bruno Defude, T. Delot","doi":"10.1145/2351476.2351488","DOIUrl":"https://doi.org/10.1145/2351476.2351488","journal":{"name":"Proceedings. International Database Engineering and Applications Symposium","volume":"12 1","pages":"100-109"},"publicationDate":"2012-08-08"}
Run-length encoding is a popular compression scheme that is used extensively to compress attribute values in column stores. Out-of-order insertion of tuples potentially degrades the compression achieved using run-length encoding and, consequently, the performance of reads. In-place insertions, deletions and updates of tuples in a column store relation with n tuples take O(n) time. The linear cost is typically avoided by amortizing the cost of updates in batches. However, the relation is decompressed and subsequently re-compressed after applying a batch of updates, which adds to the time complexity. We propose a novel indexing scheme called count indexes that supports O(log n) in-place insertions, deletions, updates and lookups on a run-length encoded sequence with n runs. We also show that count indexes efficiently update a batch of tuples, requiring almost constant time per updated tuple. Additionally, we show that count indexes are optimal. We extend count indexes to support O(log n) updates on bitmapped sequences with n values and adapt them to block-based stores.
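The cost that count indexes are meant to avoid can be seen in a small Python sketch of run-length encoding with a naive in-place insert. This is the linear-scan baseline, not the count index itself, whose structure the abstract does not describe. The function names are hypothetical. The insert has to walk the runs to find the logical position, which is linear in the number of runs, and an out-of-order value can split one run into three.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive equal values into [value, run_length] pairs."""
    return [[v, len(list(group))] for v, group in groupby(values)]


def rle_decode(runs):
    """Expand [value, run_length] pairs back into the flat sequence."""
    return [v for v, n in runs for _ in range(n)]


def rle_insert(runs, pos, value):
    """Insert `value` at logical position `pos` by scanning the runs in order."""
    start = 0
    for i, (v, n) in enumerate(runs):
        end = start + n
        if v == value and start <= pos <= end:
            runs[i][1] += 1  # the value extends an existing run
            return
        if pos == start:
            runs.insert(i, [value, 1])  # new run between two other runs
            return
        if start < pos < end:
            # Out-of-order value: split one run into three.
            runs[i : i + 1] = [[v, pos - start], [value, 1], [v, end - pos]]
            return
        start = end
    if pos != start:
        raise IndexError("position out of range")
    runs.append([value, 1])


if __name__ == "__main__":
    runs = rle_encode(list("aaabbc"))
    rle_insert(runs, 1, "b")
    print(runs)
```

Keeping per-subtree tuple counts in a balanced tree over the runs is a common way to turn the positional scan into a logarithmic descent. Whether count indexes take exactly that form is not stated in the abstract.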
{"title":"Incrementally maintaining run-length encoded attributes in column stores","authors":"Abhijeet Mohapatra, M. Genesereth","doi":"10.1145/2351476.2351493","DOIUrl":"https://doi.org/10.1145/2351476.2351493","journal":{"name":"Proceedings. International Database Engineering and Applications Symposium","volume":"315 1","pages":"146-154"},"publicationDate":"2012-08-08"}
L. Martínez, C. Collet, Christophe Bobineau, Etienne Dublé
This paper concerns the integration of the Case-Based Reasoning (CBR) paradigm into query processing, providing a way to optimize queries when there is no prior knowledge of the queried data sources and no related metadata such as data statistics. Our Query Optimization by Learning (QOL) approach optimizes queries using cases generated from the evaluation of similar past queries. A query case comprises: (i) the query, (ii) the query plan and (iii) the measures (computational resources consumed) of the query plan. The work also concerns the way the CBR process interacts with the query plan generation process. This process uses classical heuristics and makes decisions randomly (e.g., when there are no statistics for join ordering or for the selection of algorithms and routing protocols); it also (re)uses cases (existing query plans) for similar query parts, improving the efficiency of query optimization and evaluation.
{"title":"The QOL approach for optimizing distributed queries without complete knowledge","authors":"L. Martínez, C. Collet, Christophe Bobineau, Etienne Dublé","doi":"10.1145/2351476.2351487","DOIUrl":"https://doi.org/10.1145/2351476.2351487","journal":{"name":"Proceedings. International Database Engineering and Applications Symposium","volume":"60 3","pages":"91-99"},"publicationDate":"2012-08-08"}