Algebraic signatures for scalable distributed data structures
W. Litwin, T. Schwarz
Pub Date: 2004-03-30 | DOI: 10.1109/ICDE.2004.1320015
Signatures detect changes to data objects. Numerous schemes are in use, most notably the cryptographically secure SHA-1 standard. We propose a novel signature scheme that we call algebraic signatures. The scheme uses Galois field calculations. Its major property is the certain detection of any change up to a parameterized size: for an n-symbol algebraic signature, we detect with certainty any change that does not exceed n symbols. This property is new among known signature schemes. For larger changes, the collision probability is typically negligible, as with other known schemes. We apply algebraic signatures to scalable distributed data structures (SDDS). At the SDDS client node, we filter out updates that do not actually change the records. We also manage concurrent updates to data stored in the SDDS RAM buckets at the server nodes, and we further use the scheme for fast disk backup of these buckets. We sign our objects with 4-byte signatures instead of the 20-byte standard SHA-1 signatures, and our algebraic calculus is then also about twice as fast.
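As a rough illustration of the kind of calculation the abstract describes, the following Python sketch computes an n-symbol signature over GF(2^8) by weighting a page's symbols with powers of a primitive element. The specific field, primitive polynomial, and weighting are assumptions for illustration, not necessarily the authors' exact construction; the paper should be consulted for the precise scheme and its detection guarantee.

```python
# Minimal sketch of an n-symbol "algebraic signature" over GF(2^8).
# Field arithmetic uses exp/log tables built from the primitive
# polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11d).  Component i weights the
# page's symbols by successive powers of alpha^i, alpha being a
# primitive element of the field.

EXP = [0] * 512
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & 0x100:
        x ^= 0x11D
for i in range(255, 512):
    EXP[i] = EXP[i - 255]

def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^8) elements via the log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

def algebraic_signature(page: bytes, n: int = 4) -> bytes:
    """n-symbol signature: component i = XOR-sum over j of page[j] * (alpha^i)^j."""
    sig = []
    for i in range(1, n + 1):
        acc, w = 0, 1                    # w runs through (alpha^i)^j, starting at j = 0
        for sym in page:
            acc ^= gf_mul(sym, w)        # addition in GF(2^8) is XOR
            w = gf_mul(w, EXP[i % 255])  # advance to the next power
        sig.append(acc)
    return bytes(sig)

# A change confined to at most n symbols alters the signature.
a = bytearray(b"record: balance=100")
b = bytearray(a)
b[16:19] = b"999"                        # a 3-symbol change
assert algebraic_signature(bytes(a)) != algebraic_signature(bytes(b))
```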
{"title":"Algebraic signatures for scalable distributed data structures","authors":"W. Litwin, T. Schwarz","doi":"10.1109/ICDE.2004.1320015","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1320015","url":null,"abstract":"Signatures detect changes to data objects. Numerous schemes are in use, especially the cryptographically secure standards SHA-1. We propose a novel signature scheme which we call algebraic signatures. The scheme uses the Galois field calculations. Its major property is the sure detection of any changes up to a parameterized size. More precisely, we detect for sure any changes that do not exceed n-symbols for an n-symbol algebraic signature. This property is new for any known signature scheme. For larger changes, the collision probability is typically negligible, as for the other known schemes. We apply the algebraic signatures to the scalable distributed data structures (SDDS). We filter at the SDDS client node the updates that do not actually change the records. We also manage the concurrent updates to data stored in the SDDS RAM buckets at the server nodes. We further use the scheme for the fast disk backup of these buckets. We sign our objects with 4-byte signatures, instead of 20-byte standard SHA-1 signatures. Our algebraic calculus is then also about twice as fast.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"197 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121149513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Continuously maintaining quantile summaries of the most recent N elements over a data stream
Xuemin Lin, Hongjun Lu, Jian Xu, J. Yu
Pub Date: 2004-03-30 | DOI: 10.1109/ICDE.2004.1320011
Statistics over the most recently observed data elements are often required in applications involving data streams, such as intrusion detection in network monitoring, stock price prediction in financial markets, Web log mining for access prediction, and user click stream mining for personalization. Among these statistics, computing a quantile summary is probably the most challenging because of its complexity. We study the problem of continuously maintaining a quantile summary of the most recently observed N elements of a stream so that quantile queries can be answered with a guaranteed precision of εN. We develop a space-efficient algorithm for a predefined N that requires only one scan of the input data stream and O(log(ε²N)/ε + 1/ε²) space in the worst case. We also develop an algorithm that maintains quantile summaries for the most recent N elements so that quantile queries on any of the most recent n elements (n ≤ N) can be answered with a guaranteed precision of εn. The worst-case space requirement of this algorithm is only O(log²(εN)/ε²). Our performance study indicates that not only is the actual quantile estimation error far below the guaranteed precision, but the space requirement is also much smaller than the theoretical bound.
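To make the guarantee concrete, the toy Python sketch below shows what an εN-precision summary promises: keeping roughly one element per ⌈εN⌉ ranks of the sorted window lets any rank query be answered within εN. This naive construction materialises and sorts the whole window; the paper's algorithms achieve the same guarantee continuously, within the much smaller space bounds quoted above.

```python
# Naive illustration of the epsilon-N precision guarantee (not the
# paper's algorithm): thin the sorted window to ~1/eps ranked samples
# and answer any rank query with the nearest kept sample.
import math
import random

def build_summary(window, eps):
    """Keep (rank, value) pairs spaced ceil(eps*N) ranks apart."""
    srt = sorted(window)
    step = max(1, math.ceil(eps * len(srt)))
    summary = [(r, srt[r]) for r in range(0, len(srt), step)]
    if summary[-1][0] != len(srt) - 1:
        summary.append((len(srt) - 1, srt[-1]))   # always keep the maximum
    return summary

def query(summary, rank):
    """Return the kept element whose rank is closest to the requested rank."""
    return min(summary, key=lambda rv: abs(rv[0] - rank))[1]

# Check the guarantee on random data.
N, eps = 10_000, 0.01
window = [random.random() for _ in range(N)]
summary = build_summary(window, eps)              # roughly 1/eps = 100 samples
srt = sorted(window)
for q in (0.01, 0.25, 0.5, 0.75, 0.99):
    r = int(q * (N - 1))
    est = query(summary, r)
    true_rank = srt.index(est)
    assert abs(true_rank - r) <= eps * N          # within the epsilon-N bound
```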
{"title":"Continuously maintaining quantile summaries of the most recent N elements over a data stream","authors":"Xuemin Lin, Hongjun Lu, Jian Xu, J. Yu","doi":"10.1109/ICDE.2004.1320011","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1320011","url":null,"abstract":"Statistics over the most recently observed data elements are often required in applications involving data streams, such as intrusion detection in network monitoring, stock price prediction in financial markets, Web log mining for access prediction, and user click stream mining for personalization. Among various statistics, computing quantile summary is probably most challenging because of its complexity. We study the problem of continuously maintaining quantile summary of the most recently observed N elements over a stream so that quantile queries can be answered with a guaranteed precision of /spl epsiv/N. We developed a space efficient algorithm for predefined N that requires only one scan of the input data stream and O(log(/spl epsiv//sup 2/N)//spl epsiv/+1//spl epsiv//sup 2/) space in the worst cases. We also developed an algorithm that maintains quantile summaries for most recent N elements so that quantile queries on any most recent n elements (n /spl les/ N) can be answered with a guaranteed precision of /spl epsiv/n. The worst case space requirement for this algorithm is only O(log/sup 2/(/spl epsiv/N)//spl epsiv//sup 2/). Our performance study indicated that not only the actual quantile estimation error is far below the guaranteed precision but the space requirement is also much less than the given theoretical bound.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"358 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122746675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Database kernel research: what, if anything, is left to do?
D. Lomet
Pub Date: 2004-03-30 | DOI: 10.1109/ICDE.2004.1320095
Data Engineering deals with the use of engineering techniques and methodologies in the design, development and assessment of information systems for different computing platforms and application environments. The 20th International Conference on Data Engineering will be held in Boston, Massachusetts, USA, an academic and technological center with a variety of historical and cultural attractions of international prominence within walking distance.
{"title":"Database kernel research: what, if anything, is left to do?","authors":"D. Lomet","doi":"10.1109/ICDE.2004.1320095","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1320095","url":null,"abstract":"Data Engineering deals with the use of engineering techniques and methodologies in the design, development and assessment of information systems for different computing platforms and application environments. The 20th International Conference on Data Engineering will be held in Boston, Massachusetts, USA-an academic and technological center with a variety of historical and cultural attractions of international prominence within walking distance.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133814423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BEA Liquid Data for WebLogic: XML-based enterprise information integration
M. Carey
Pub Date: 2004-03-30 | DOI: 10.1109/ICDE.2004.1320051
This presentation provides a technical overview of BEA Liquid Data for WebLogic, a relatively new product from BEA Systems that provides enterprise information integration capabilities to enterprise applications that are built and deployed using the BEA WebLogic Platform. Liquid Data takes an XML-centric approach to tackling the long-standing problem of integrating data from disparate data sources and making that information easily accessible to applications. In particular, Liquid Data uses the forthcoming XQuery language standard as the basis for defining integrated views of enterprise data and querying over those views. We provide a brief overview of the Liquid Data product architecture and then discuss some of the query processing technology that lies at the heart of the product.
Using stream semantics for continuous queries in media stream processors
Amarnath Gupta, B. Liu, Pilho Kim, R. Jain
Pub Date: 2004-03-30 | DOI: 10.1109/ICDE.2004.1320083
In the case of media and feature streams, explicit inter-stream constraints exist and can be exploited in the evaluation of continuous queries, in the spirit of semantic query optimization. We express these constraints using MSDL, a media stream declaration language. In the demonstration, we present IMMERSI-MEET, an application built around an immersive environment. The IMMERSI-MEET system distinguishes between continuous streams, where values of different types arrive at a specified data rate, and discrete streams, where sources push values intermittently. In MSDL, any dependence declaration must have at least one dependency-specifying predicate in its body. As stream declarations are registered, the stream constraints are interpreted to construct a set of evaluation directives.
{"title":"Using stream semantics for continuous queries in media stream processors","authors":"Amarnath Gupta, B. Liu, Pilho Kim, R. Jain","doi":"10.1109/ICDE.2004.1320083","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1320083","url":null,"abstract":"In the case of media and feature streams, explicit inter-stream constraints exist and can be exploited in the evaluation of continuous queries in the spirit of semantic query optimization. We express these properties using a media stream declaration language MSDL. In the demonstration, we present IMMERSI-MEET, an application built around an immersive environment. The IMMERSI-MEET system distinguishes between continuous streams, where values of different types come at a specified data rate, and discrete streams where sources push values intermittently. In MSDL, any dependence declaration must have at least one dependency specifying predicate in the body. As stream declarations are registered, the stream constraints are interpreted to construct a set of evaluation directives.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123361791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Meta data management
Philip A. Bernstein, Sergey Melnik
Pub Date: 2004-03-30 | DOI: 10.1109/ICDE.2004.1320101
By meta data management, we mean techniques for manipulating schemas and schema-like objects (such as interface definitions and web site maps) and mappings between them. Work on meta data problems goes back to at least the early 1970s, when data translation was the hot database research topic, even before relational databases caught on. Many popular research problems in the past five years are primarily meta data problems, such as data warehouse tools (e.g., ETL – to extract, transform and load), data integration, the semantic web, generation of XML or object-oriented wrappers for SQL databases, and generation of wrappers for web sites. Other classical meta data problems are information resource management, design tool support and integration, and schema evolution and data migration. Despite its longevity and continued importance, there is no widely-accepted conceptual framework for the meta data field, as there is for many other database topics, such as access methods, query processing, and transaction management. In this seminar, we propose such a conceptual framework. It consists of three layers: applications, design patterns, and basic operators. Applications are the end-user problems to be solved, like those listed in the previous paragraph. Design patterns are generic problems that need to be solved in support of many different applications, such as meta modeling (for all meta data problems), answering queries using views (for data integration and the semantic web), and change propagation (for data translation, schema evolution, and round-trip engineering). Basic operators are procedures that are needed to support multiple design patterns and applications, such as matching schemas to produce a mapping, merging schemas based on a mapping, and composing mappings. We will describe several meta data management problems, and for each, we will explain the design patterns and operators that are needed to solve it. We will summarize the main approaches to each design pattern and operator – the main choices of language, data structures, and algorithms – and will highlight the relevant papers that address it. This seminar is targeted at both practicing engineers and researchers. The former will learn about the latest solutions to important meta data problems and the many difficult unsolved problems that are best to avoid. Database researchers, especially professors, will benefit from considering the conceptual framework that we propose, since no database textbooks treat meta data management as a separate topic as far as we know.
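As a toy illustration of one of the basic operators named above, the Python sketch below composes two attribute-level mappings represented as plain dictionaries. Real model-management systems use far richer mapping languages; the attribute names here are invented for the example.

```python
# Toy sketch of the "compose mappings" operator: mappings are sets of
# attribute correspondences (source attribute -> target attribute).

def compose(map_ab: dict[str, str], map_bc: dict[str, str]) -> dict[str, str]:
    """Compose an A->B mapping with a B->C mapping to obtain an A->C mapping."""
    return {a: map_bc[b] for a, b in map_ab.items() if b in map_bc}

# Example: a legacy schema mapped to a staging schema, then to a warehouse.
legacy_to_staging = {"cust_nm": "customer_name", "cust_tel": "phone"}
staging_to_dw = {"customer_name": "dim_customer.name"}
print(compose(legacy_to_staging, staging_to_dw))
# {'cust_nm': 'dim_customer.name'}
```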
{"title":"Meta data management","authors":"Philip A. Bernstein, Sergey Melnik","doi":"10.1109/ICDE.2004.1320101","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1320101","url":null,"abstract":"By meta data management, we mean techniques for manipulating schemas and schema-like objects (such as interface definitions and web site maps) and mappings between them. Work on meta data problems goes back to at least the early 1970s, when data translation was the hot database research topic, even before relational databases caught on. Many popular research problems in the past five years are primarily meta data problems, such as data warehouse tools (e.g., ETL – to extract, transform and load), data integration, the semantic web, generation of XML or object-oriented wrappers for SQL databases, and generation of wrappers for web sites. Other classical meta data problems are information resource management, design tool support and integration, and schema evolution and data migration. Despite its longevity and continued importance, there is no widely-accepted conceptual framework for the meta data field, as there is for many other database topics, such as access methods, query processing, and transaction management. In this seminar, we propose such a conceptual framework. It consists of three layers: applications, design patterns, and basic operators. Applications are the end-user problems to be solved, like those listed in the previous paragraph. Design patterns are generic problems that need to be solved in support of many different applications, such as meta modeling (for all meta data problems), answering queries using views (for data integration and the semantic web), and change propagation (for data translation, schema evolution, and round-trip engineering). Basic operators are procedures that are needed to support multiple design patterns and applications, such as matching schemas to produce a mapping, merging schemas based on a mapping, and composing mappings. We will describe several meta data management problems, and for each, we will explain the design patterns and operators that are needed to solve it. We will summarize the main approaches to each design pattern and operator – the main choices of language, data structures, and algorithms – and will highlight the relevant papers that address it. This seminar is targeted at both practicing engineers and researchers. The former will learn about the latest solutions to important meta data problems and the many difficult unsolved problems that are best to avoid. Database researchers, especially professors, will benefit from considering the conceptual framework that we propose, since no database textbooks treat meta data management as a separate topic as far as we know.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122740915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scalable multimedia disk scheduling
M. Mokbel, Walid G. Aref, Khaled M. Elbassioni, I. Kamel
Pub Date: 2004-03-30 | DOI: 10.1109/ICDE.2004.1320022
A new multimedia disk-scheduling algorithm, termed Cascaded-SFC, is presented. The Cascaded-SFC multimedia disk scheduler is applicable in environments where multimedia data requests arrive with different quality of service (QoS) requirements such as real-time deadline and user priority. Previous work on disk scheduling has focused on optimizing the seek times and/or meeting the real-time deadlines. The Cascaded-SFC disk scheduler provides a unified framework for multimedia disk scheduling that scales with the number of scheduling parameters. The general idea is based on modeling the multimedia disk requests as points in multiple multidimensional subspaces, where each of the dimensions represents one of the parameters (e.g., one dimension represents the request deadline, another represents the disk cylinder number, and a third dimension represents the priority of the request, etc.). Each multidimensional subspace represents a subset of the QoS parameters that share some common scheduling characteristics. Then the multimedia disk scheduling problem reduces to the problem of finding a linear order to traverse the multidimensional points in each subspace. Multiple space-filling curves are selected to fit the scheduling needs of the QoS parameters in each subspace. The orders in each subspace are integrated in a cascaded way to provide a total order for the whole space. Comprehensive experiments demonstrate the efficiency and scalability of the Cascaded-SFC disk scheduling algorithm over other disk schedulers.
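The sketch below is a simplified illustration, not the Cascaded-SFC scheduler itself: it orders pending requests by a single Z-order (Morton) key obtained by bit-interleaving the QoS parameters, whereas the paper cascades several space-filling curves over parameter subspaces. The request fields and value ranges are invented for the example.

```python
# Order disk requests along one space-filling curve: interleave the bits
# of each (normalised) QoS parameter into a Z-order key and sort by it.

def morton_key(coords: list[int], bits: int = 10) -> int:
    """Interleave the low `bits` bits of each coordinate into one Z-order key."""
    key = 0
    for b in range(bits):
        for d, c in enumerate(coords):
            key |= ((c >> b) & 1) << (b * len(coords) + d)
    return key

# Each request: deadline (ms), cylinder number, priority -- all scaled to 0..1023.
requests = [
    {"id": "r1", "deadline": 120, "cylinder": 800, "priority": 3},
    {"id": "r2", "deadline": 40,  "cylinder": 790, "priority": 1},
    {"id": "r3", "deadline": 500, "cylinder": 10,  "priority": 2},
]
schedule = sorted(
    requests,
    key=lambda r: morton_key([r["deadline"], r["cylinder"], r["priority"]]),
)
print([r["id"] for r in schedule])
```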
{"title":"Scalable multimedia disk scheduling","authors":"M. Mokbel, Walid G. Aref, Khaled M. Elbassioni, I. Kamel","doi":"10.1109/ICDE.2004.1320022","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1320022","url":null,"abstract":"A new multimedia disk-scheduling algorithm, termed Cascaded-SFC, is presented. The Cascaded-SFC multimedia disk scheduler is applicable in environments where multimedia data requests arrive with different quality of service (QoS) requirements such as real-time deadline and user priority. Previous work on disk scheduling has focused on optimizing the seek times and/or meeting the real-time deadlines. The Cascaded-SFC disk scheduler provides a unified framework for multimedia disk scheduling that scales with the number of scheduling parameters. The general idea is based on modeling the multimedia disk requests as points in multiple multidimensional subspaces, where each of the dimensions represents one of the parameters (e.g., one dimension represents the request deadline, another represents the disk cylinder number, and a third dimension represents the priority of the request, etc.). Each multidimensional subspace represents a subset of the QoS parameters that share some common scheduling characteristics. Then the multimedia disk scheduling problem reduces to the problem of finding a linear order to traverse the multidimensional points in each subspace. Multiple space-filling curves are selected to fit the scheduling needs of the QoS parameters in each subspace. The orders in each subspace are integrated in a cascaded way to provide a total order for the whole space. Comprehensive experiments demonstrate the efficiency and scalability of the Cascaded-SFC disk scheduling algorithm over other disk schedulers.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129703912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NEXSORT: sorting XML in external memory
Adam Silberstein, Jun Yang
Pub Date: 2004-03-30 | DOI: 10.1109/ICDE.2004.1320038
XML plays an important role in delivering data over the Internet, and the need to store and manipulate XML in its native format has become increasingly relevant. This growing need necessitates work on developing native XML operators, especially for one as fundamental as sort. We present NEXSORT, an algorithm that leverages the hierarchical nature of XML to efficiently sort an XML document in external memory. In a fully sorted XML document, the children of every nonleaf element are ordered according to a given sorting criterion. One use of NEXSORT is in combination with structural merge as the XML analogue of sort-merge join, which allows us to merge large XML documents in a single pass once they are sorted. The hierarchical structure of an XML document limits the number of possible legal orderings among its elements, which means that sorting XML is fundamentally "easier" than sorting a flat file. We prove that the I/O lower bound for sorting XML in external memory is Θ(max{n, n log_m(k/B)}), where n is the number of blocks in the input XML document, m is the number of main memory blocks available for sorting, B is the number of elements that fit in one block, and k is the maximum fan-out of the input document tree. We show that NEXSORT performs within a constant factor of this theoretical lower bound. In practice, we demonstrate that, even with a naive implementation, NEXSORT significantly outperforms a regular external merge sort of all elements by their key paths, unless the XML document is nearly flat, in which case NEXSORT essentially degenerates to external merge sort.
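For intuition about the target ordering, the small in-memory Python sketch below reorders the children of every non-leaf element by a given criterion, which is the "fully sorted" form described above. It is not NEXSORT itself; the paper's contribution is producing this ordering in external memory within the stated I/O bound.

```python
# Produce a "fully sorted" XML document in memory: the children of every
# non-leaf element are reordered by the sorting criterion.
import xml.etree.ElementTree as ET

def sort_children(elem: ET.Element, key=lambda e: e.tag) -> None:
    """Recursively reorder each element's children according to `key`."""
    kids = list(elem)
    for k in kids:
        sort_children(k, key)
    elem[:] = sorted(kids, key=key)

doc = ET.fromstring("<lib><b id='2'/><a/><b id='1'/></lib>")
sort_children(doc, key=lambda e: (e.tag, e.get("id") or ""))
print(ET.tostring(doc, encoding="unicode"))
# <lib><a /><b id="1" /><b id="2" /></lib>
```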
{"title":"NEXSORT: sorting XML in external memory","authors":"Adam Silberstein, Jun Yang","doi":"10.1109/ICDE.2004.1320038","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1320038","url":null,"abstract":"XML plays an important role in delivering data over the Internet, and the need to store and manipulate XML in its native format has become increasingly relevant. This growing need necessitates work on developing native XML operators, especially for one as fundamental as sort. We present NEXSORT, an algorithm that leverages the hierarchical nature of XML to efficiently sort an XML document in external memory. In a fully sorted XML document, children of every nonleaf element are ordered according to a given sorting criterion. Among NEXSORT's uses is in combination with structural merge as the XML version of sort-merge join, which allows us to merge large XML documents using only a single pass once they are sorted. The hierarchical structure of an XML document limits the number of possible legal orderings among its elements, which means that sorting XML is fundamentally \"easier\" than sorting a flat file. We prove that the I/O lower bound for sorting XML in external memory is /spl Theta/(max{n,nlog/sub m/(k/B)}), where n is the number of blocks in the input XML document, m is the number of main memory blocks available for sorting, B is the number of elements that can fit in one block, and k is the maximum fan-out of the input document tree. We show that NEXSORT performs within a constant factor of this theoretical lower bound. In practice we demonstrate, even with a naive implementation, NEXSORT significantly outperforms a regular external merge sort of all elements by their key paths, unless the XML document is nearly flat, in which case NEXSORT degenerates essentially to external merge sort.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"107 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116938160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recursive XML schemas, recursive XML queries, and relational storage: XML-to-SQL query translation
R. Krishnamurthy, Venkatesan T. Chakaravarthy, R. Kaushik, J. Naughton
Pub Date: 2004-03-30 | DOI: 10.1109/ICDE.2004.1319983
We consider the problem of translating XML queries into SQL when XML documents have been stored in an RDBMS using a schema-based relational decomposition. Surprisingly, there is no published XML-to-SQL query translation algorithm for this scenario that handles recursive XML schemas. We present a generic algorithm to translate path expression queries into SQL in the presence of recursion in the schema and in the queries. The algorithm handles a general class of XML-to-relational mappings, which includes all techniques proposed in the literature. Some of its salient features are: (i) it translates a path expression query into a single SQL query, irrespective of how complex the XML schema is; (ii) it uses the "with" clause in SQL99 to handle recursive queries even over nonrecursive schemas; (iii) it reconstructs recursive XML subtrees with a single SQL query; and (iv) it shows that the support for linear recursion in SQL99 is sufficient for handling path expression queries over arbitrarily complex recursive XML schemas.
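The sketch below illustrates the SQL99 feature the algorithm relies on, linear recursion via the WITH clause, using SQLite from Python. The edge-style node table is an assumed example of a schema-based shredding, not the paper's mapping; it only shows how a recursive subtree can be reassembled by a single recursive query.

```python
# Reassemble a recursive XML subtree from a shredded node table with one
# recursive (SQL99-style WITH) query, using SQLite's WITH RECURSIVE.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node(id INTEGER PRIMARY KEY, parent INTEGER, tag TEXT);
    -- <part><part><part/></part></part> : an instance of a recursive schema
    INSERT INTO node VALUES (1, NULL, 'part'), (2, 1, 'part'), (3, 2, 'part');
""")
rows = conn.execute("""
    WITH RECURSIVE subtree(id, tag, depth) AS (
        SELECT id, tag, 0 FROM node WHERE id = 1          -- the queried root
        UNION ALL
        SELECT n.id, n.tag, s.depth + 1
        FROM node n JOIN subtree s ON n.parent = s.id     -- linear recursion
    )
    SELECT id, tag, depth FROM subtree ORDER BY depth
""").fetchall()
print(rows)   # [(1, 'part', 0), (2, 'part', 1), (3, 'part', 2)]
```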
{"title":"Recursive XML schemas, recursive XML queries, and relational storage: XML-to-SQL query translation","authors":"R. Krishnamurthy, Venkatesan T. Chakaravarthy, R. Kaushik, J. Naughton","doi":"10.1109/ICDE.2004.1319983","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1319983","url":null,"abstract":"We consider the problem of translating XML queries into SQL when XML documents have been stored in an RDBMS using a schema-based relational decomposition. Surprisingly, there is no published XML-to-SQL query translation algorithm for this scenario that handles recursive XML schemas. We present a generic algorithm to translate path expression queries into SQL in the presence of recursion in the schema and queries. This algorithm handles a general class of XML-to-relational mappings, which includes all techniques proposed in literature. Some of the salient features of this algorithm are: (i) It translates a path expression query into a single SQL query, irrespective of how complex the XML schema is, (ii) It uses the \"with\" clause in SQL99 to handle recursive queries even over nonrecursive schemas, (iii) It reconstructs recursive XML subtrees with a single SQL query and (iv) It shows that the support for linear recursion in SQL99 is sufficient for handling path expression queries over arbitrarily complex recursive XML schema.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134573817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An efficient algorithm for mining frequent sequences by a new strategy without support counting
D. Chiu, Yi-Hung Wu, Arbee L. P. Chen
Pub Date: 2004-03-30 | DOI: 10.1109/ICDE.2004.1320012
Mining sequential patterns in large databases is an important research topic. The main challenge is the high processing cost due to the large amount of data. We propose a new strategy called direct sequence comparison (abbreviated as DISC), which can find frequent sequences without computing the support counts of nonfrequent sequences. The main difference between the DISC strategy and previous work is the way nonfrequent sequences are pruned. Previous work relies on the antimonotone property, pruning nonfrequent sequences according to frequent sequences of shorter lengths. In contrast, the DISC strategy prunes nonfrequent sequences according to other sequences of the same length. Moreover, we summarize three strategies used in previous work and design an efficient algorithm called DISC-all that takes advantage of all four strategies. The experimental results show that the DISC-all algorithm outperforms the PrefixSpan algorithm for mining frequent sequences in large databases. In addition, we analyze these strategies to design a dynamic version of our algorithm, which achieves much better performance.
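For contrast, the Python sketch below shows the classical antimonotone-style pruning that the abstract says previous work relies on: only frequent length-k sequences are extended to length-(k+1) candidates. The DISC strategy's same-length comparison is not reproduced here; this is merely the baseline it improves on.

```python
# Classical candidate-generation mining with antimonotone pruning:
# a length-(k+1) candidate is counted only if its length-k prefix is frequent.
from itertools import product

def is_subsequence(pat, seq):
    """True if `pat` occurs in `seq` as an (ordered, not necessarily contiguous) subsequence."""
    it = iter(seq)
    return all(any(p == s for s in it) for p in pat)

def frequent_sequences(db, min_sup, max_len=3):
    items = sorted({x for seq in db for x in seq})
    frequent, level = {}, [(x,) for x in items]
    for _ in range(max_len):
        counts = {p: sum(is_subsequence(p, s) for s in db) for p in level}
        level_freq = {p: c for p, c in counts.items() if c >= min_sup}
        frequent.update(level_freq)
        # Antimonotone pruning: extend only the frequent length-k sequences.
        level = [p + (x,) for p, x in product(level_freq, items)]
    return frequent

db = [list("abcb"), list("abb"), list("acb"), list("bcb")]
print(frequent_sequences(db, min_sup=3))
```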
{"title":"An efficient algorithm for mining frequent sequences by a new strategy without support counting","authors":"D. Chiu, Yi-Hung Wu, Arbee L. P. Chen","doi":"10.1109/ICDE.2004.1320012","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1320012","url":null,"abstract":"Mining sequential patterns in large databases is an important research topic. The main challenge of mining sequential patterns is the high processing cost due to the large amount of data. We propose a new strategy called direct sequence comparison (abbreviated as DISC), which can find frequent sequences without having to compute the support counts of nonfrequent sequences. The main difference between the DISC strategy and the previous works is the way to prune nonfrequent sequences. The previous works are based on the antimonotone property, which prune the nonfrequent sequences according to the frequent sequences with shorter lengths. On the contrary, the DISC strategy prunes the nonfrequent sequences according to the other sequences with the same length. Moreover, we summarize three strategies used in the previous works and design an efficient algorithm called DISC-all to take advantages of all the four strategies. The experimental results show that the DISC-all algorithm outperforms the PrefixSpan algorithm on mining frequent sequences in large databases. In addition, we analyze these strategies to design the dynamic version of our algorithm, which achieves a much better performance.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133138604","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}