Pub Date: 2004-03-30 DOI: 10.1109/ICDE.2004.1320094
D. Gawlick
Database technology has done an excellent job of managing data. SQL92/99 and XML are generally considered to be powerful building blocks; these building blocks are complemented by support for Text, Images, Audio, Video, Spatial, Expressions, and other complex data structures. Database technology can also transparently manage access to data in other (remote) databases, in file systems, and in applications. Furthermore, database technology has achieved impressive operational characteristics with respect to, e.g., performance, scalability, reliability, component and site tolerance, and security.
Title: Querying the past, the present, and the future. Venue: Proceedings. 20th International Conference on Data Engineering.
Pub Date: 2004-03-30 DOI: 10.1109/ICDE.2004.1320093
D. Florescu
The database world today (or better, the information world) is totally different from the peaceful days when the database research field was created. Moreover, it is in constant movement. Let's list some of the changing factors. First, the Internet forever changed our lives. Then came XML as an innocent character-by-character UNICODE syntax, and that changed all the rules. Then Web Services arrived, invented by marketing departments in the middle of the boom, and only later taken seriously by vendor capitals and technologists. Now mobile computing and messaging are pervasive. And, finally, we see a shift in perspective due to the dramatic reduction of hardware costs.
Title: Database research for the current millennium. Venue: Proceedings. 20th International Conference on Data Engineering.
Pub Date: 2004-03-30 DOI: 10.1109/ICDE.2004.1319980
Nick Koudas, B. Ooi, Heng Tao Shen, A. Tung
Recent advances in research fields like multimedia and bioinformatics have brought about a new generation of hyper-dimensional databases which can contain hundreds or even thousands of dimensions. Such hyper-dimensional databases pose significant problems to existing high-dimensional indexing techniques, which have been developed for indexing databases with (commonly) fewer than a hundred dimensions. To support efficient querying and retrieval on hyper-dimensional databases, we propose a methodology called local digital coding (LDC), which can support k-nearest neighbors (KNN) queries on hyper-dimensional databases and yet co-exist with ubiquitous indices, such as B+-trees. LDC extracts a simple bitmap representation called a digital code (DC) for each point in the database. Pruning during KNN search is performed by dynamically selecting only a subset of the bits from the DC, on which subsequent comparisons are based. In doing so, expensive operations involved in computing L-norm distance functions between hyper-dimensional data can be avoided. Extensive experiments show that our methodology offers significant performance advantages over other existing indexing methods on both real-life and synthetic hyper-dimensional datasets.
Title: LDC: enabling search by partial distance in a hyper-dimensional space. Venue: Proceedings. 20th International Conference on Data Engineering.
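The LDC abstract above relies on the idea that a distance computed over only some dimensions can already rule a candidate out. The following is a minimal sketch of that general partial-distance pruning idea for KNN search, not of LDC itself: it does not build digital codes or select bits, and the function name `knn_partial_distance` and the use of squared Euclidean distance are assumptions made for illustration.

```python
import heapq


def knn_partial_distance(points, query, k):
    """Return the k nearest points as sorted (squared_distance, index) pairs.

    The running sum of squared differences over the first few dimensions is
    a lower bound on the full squared distance, so a candidate is dropped
    as soon as that partial sum exceeds the current k-th best distance.
    """
    best = []  # min-heap of (-squared_distance, index); top is the worst kept
    for idx, point in enumerate(points):
        bound = -best[0][0] if len(best) == k else float("inf")
        partial = 0.0
        for a, b in zip(point, query):
            partial += (a - b) ** 2
            if partial > bound:
                break  # cannot be among the k nearest; skip remaining dims
        else:
            # Loop finished without pruning: partial is the full distance.
            if len(best) == k:
                heapq.heapreplace(best, (-partial, idx))
            else:
                heapq.heappush(best, (-partial, idx))
    return sorted((-neg_dist, idx) for neg_dist, idx in best)


points = [(0, 0, 0), (1, 1, 1), (5, 5, 5), (0, 1, 0)]
nearest = knn_partial_distance(points, (0, 0, 0), k=2)
```

In this toy run the point `(5, 5, 5)` is abandoned after its first dimension, because that dimension alone already exceeds the current second-best distance.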
Pub Date: 2004-03-30 DOI: 10.1109/ICDE.2004.1320039
D. Shasha, J. Wang, Sen Zhang
Frequent structure mining (FSM) aims to discover and extract patterns frequently occurring in structural data, such as trees and graphs. FSM finds many applications in bioinformatics, XML processing, Web log analysis, and so on. We present a new FSM technique for finding patterns in rooted unordered labeled trees. The patterns of interest are cousin pairs in these trees. A cousin pair is a pair of nodes sharing the same parent, the same grandparent, or the same great-grandparent, etc. Given a tree T, our algorithm finds all interesting cousin pairs of T in O(|T|^2) time where |T| is the number of nodes in T. Experimental results on synthetic data and phylogenies show the scalability and effectiveness of the proposed technique. To demonstrate the usefulness of our approach, we discuss its applications to locating co-occurring patterns in multiple evolutionary trees, evaluating the consensus of equally parsimonious trees, and finding kernel trees of groups of phylogenies. We also describe extensions of our algorithms for undirected acyclic graphs (or free trees).
Title: Unordered tree mining with applications to phylogeny. Venue: Proceedings. 20th International Conference on Data Engineering.
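The cousin-pair notion in the tree mining abstract above can be made concrete with a small sketch. It assumes a simplified definition that the abstract does not spell out: two nodes at the same depth form a cousin pair of distance d when their lowest common ancestor sits d + 1 levels above them (0 for siblings, 1 for first cousins). The function name `cousin_pairs`, the `parent` mapping, and the use of unique node names instead of labels are illustrative assumptions, and the brute-force pairwise walk is not the authors' algorithm.

```python
from itertools import combinations


def cousin_pairs(parent, max_dist):
    """Map each same-depth node pair to its cousin distance (<= max_dist).

    `parent` maps every node to its parent; the root maps to None.
    """
    def depth(node):
        d = 0
        while parent[node] is not None:
            node = parent[node]
            d += 1
        return d

    depths = {node: depth(node) for node in parent}
    pairs = {}
    for u, v in combinations(sorted(parent), 2):
        if depths[u] != depths[v]:
            continue
        # Walk both nodes upward in lockstep until they meet.
        a, b, steps = u, v, 0
        while a != b:
            a, b, steps = parent[a], parent[b], steps + 1
        dist = steps - 1  # siblings meet after one step: distance 0
        if dist <= max_dist:
            pairs[(u, v)] = dist
    return pairs


# Hypothetical tree: r has children a, b; a has c, d; b has e.
tree = {"r": None, "a": "r", "b": "r", "c": "a", "d": "a", "e": "b"}
found = cousin_pairs(tree, max_dist=1)
```

Here `c` and `d` share a parent, while `c` and `e` share only the grandparent `r`, so they come out as first cousins.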
Pub Date: 2004-03-30 DOI: 10.1109/ICDE.2004.1319985
Xiaodong Wu, M. Lee, W. Hsu
Efficient evaluation of XML queries requires the determination of whether a relationship exists between two elements. A number of labeling schemes have been designed to label the element nodes such that the relationships between nodes can be easily determined by comparing their labels. With the increased popularity of XML on the Web, finding a labeling scheme that is able to support order-sensitive queries in the presence of dynamic updates becomes urgent. We propose a new labeling scheme that takes advantage of the unique property of prime numbers to meet this need. The global order of the nodes can be captured by generating simultaneous congruence values from the prime number node labels. Theoretical analysis of the label size requirements for the various labeling schemes is given. Experimental results indicate that the prime number labeling scheme is compact compared to existing dynamic labeling schemes, and provides efficient support for order-sensitive queries and updates.
Title: A prime number labeling scheme for dynamic ordered XML trees. Venue: Proceedings. 20th International Conference on Data Engineering.
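The two ideas named in the prime number labeling abstract above, labels built from primes and a global order recovered from a simultaneous congruence value, can be sketched as follows. This is a minimal illustration under assumptions the abstract does not state: each node receives a distinct prime as its self-label, a node's label is that prime times its parent's label, and a single value `sc` is chosen so that `sc mod self_label` equals the node's document-order position. All function names are hypothetical.

```python
from math import prod


def first_primes(n):
    """Return the first n primes by trial division (fine for a small demo)."""
    primes, candidate = [], 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def crt(residues, moduli):
    """Chinese remainder theorem for pairwise coprime moduli."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        m_i = big_m // m
        total += r * m_i * pow(m_i, -1, m)  # modular inverse (Python 3.8+)
    return total % big_m


def label_tree(nodes_in_doc_order):
    """nodes_in_doc_order: (node, parent) pairs; parents precede children."""
    self_label, label = {}, {}
    primes = first_primes(len(nodes_in_doc_order))
    for (node, par), p in zip(nodes_in_doc_order, primes):
        self_label[node] = p
        label[node] = p * (label[par] if par is not None else 1)
    return self_label, label


def is_ancestor(label, x, y):
    """x is an ancestor of y exactly when x's label divides y's label."""
    return x != y and label[y] % label[x] == 0


doc = [("a", None), ("b", "a"), ("c", "b"), ("d", "a")]
self_label, label = label_tree(doc)
order = [name for name, _ in doc]
# Positions 1..n stay below the n-th prime, so each residue is recoverable.
sc = crt(list(range(1, len(order) + 1)), [self_label[n] for n in order])
```

In this sketch, changing the document order only requires recomputing `sc`; the prime labels that encode ancestry stay as they are.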
Pub Date: 2004-03-30 DOI: 10.1109/ICDE.2004.1319997
Jimeng Sun, D. Papadias, Yufei Tao, B. Liu
Moving objects (e.g., vehicles in road networks) continuously generate large amounts of spatio-temporal information in the form of data streams. Efficient management of such streams is a challenging goal due to the highly dynamic nature of the data and the need for fast, online computations. We present a novel approach for approximate query processing about the present, the past, or the future in spatio-temporal databases. In particular, we first propose an incrementally updateable, multidimensional histogram for present-time queries. Second, we develop a general architecture for maintaining and querying historical data. Third, we implement a stochastic approach for predicting the results of queries that refer to the future. Finally, we experimentally demonstrate the effectiveness and efficiency of our techniques using a realistic simulation.
Title: Querying about the past, the present, and the future in spatio-temporal databases. Venue: Proceedings. 20th International Conference on Data Engineering.
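To illustrate what an incrementally updateable histogram for present-time queries might look like, here is a deliberately simple sketch using a fixed uniform grid. The abstract above does not describe the histogram's actual structure, so the grid layout, the class name `GridHistogram`, and the uniformity assumption used for partial cell overlap are all illustrative assumptions rather than the authors' design.

```python
from collections import Counter


class GridHistogram:
    """Counts moving objects per grid cell and updates counts in place."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.counts = Counter()   # cell -> number of objects currently in it
        self.position = {}        # object id -> cell it was last seen in

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def update(self, obj_id, x, y):
        """Process one location report from the stream."""
        new = self._cell(x, y)
        old = self.position.get(obj_id)
        if old == new:
            return  # object stayed in its cell; nothing to change
        if old is not None:
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]
        self.counts[new] += 1
        self.position[obj_id] = new

    def estimate(self, x_lo, y_lo, x_hi, y_hi):
        """Approximate object count in a rectangle, assuming objects are
        spread uniformly inside each cell."""
        s = self.cell_size
        total = 0.0
        for (i, j), count in self.counts.items():
            w = min(x_hi, (i + 1) * s) - max(x_lo, i * s)
            h = min(y_hi, (j + 1) * s) - max(y_lo, j * s)
            if w > 0 and h > 0:
                total += count * (w * h) / (s * s)
        return total


hist = GridHistogram(cell_size=10.0)
hist.update("car1", 1, 1)
hist.update("car2", 12, 3)
hist.update("car1", 15, 5)  # car1 moves into the cell that holds car2
```

Each report touches at most two cells, which is what makes the structure cheap to maintain on a stream; the price is that a query covering half a cell can only be answered approximately.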
Pub Date: 2004-03-30 DOI: 10.1109/ICDE.2004.1319981
D. Lomet
Why might B-tree concurrency control still be interesting? For two reasons: (i) currently exploited "real world" approaches are complicated; (ii) simpler proposals are not used because they are not sufficiently robust. In the "real world", systems need to deal robustly with node deletion, and this is an important reason why the currently exploited techniques are complicated. In our effort to simplify the world of robust and highly concurrent B-tree methods, we focus on exactly where B-tree concurrency control needs information about node deletes, and describe mechanisms that provide that information. To greatly simplify our algorithms, we exploit the B^link-tree property of being "well-formed" even when index term posting for a node split has not been completed. Our goal is to describe a very simple but nonetheless robust method.
Title: Simple, robust and highly concurrent B-trees with node deletion. Venue: Proceedings. 20th International Conference on Data Engineering.
Pub Date: 2004-03-30 DOI: 10.1109/ICDE.2004.1320050
J. Richardson, J. Kadin, J. Blake, C. Bult, J. Eppig, M. Ringwald
Biology is a vast domain. The Mouse Genome Informatics (MGI) system, which focuses on the biology of the laboratory mouse, covers only a small, carefully chosen slice. Nevertheless, we deal with data of immense variety, deep complexity, and exponentially growing volume. Our role as an integration nexus is to add value by combining data sets of diverse types and origins, eliminating redundancy and resolving conflicts. We briefly describe some of the issues we face and approaches we have adopted to the integration problem.
Title: From sipping on a straw to drinking from a fire hose: data integration in a public genome database. Venue: Proceedings. 20th International Conference on Data Engineering.
Pub Date: 2004-03-30 DOI: 10.1109/ICDE.2004.1320033
Zhenqiang Tan, A. Tung
We look at substructure clustering of sequential 3d objects. A sequential 3d object is a set of points located in a three-dimensional space that are linked up to form a sequence. Given a set of sequential 3d objects, our aim is to find significantly large substructures which are present in many of the sequential 3d objects. Unlike traditional subspace clustering methods, in which objects are compared based on values in the same dimension, the matching dimensions between two sequential 3d objects are affected by both the translation and rotation of the objects and are thus not well defined. Instead, similarity between the objects is judged by computing a structural distance measurement called rmsd (Root Mean Square Distance), which requires proper alignment (including translation and rotation) of the objects. As the computation of rmsd is expensive, we propose a new measure called ald (Angle Length Distance), which is shown experimentally to approximate rmsd. Based on ald, we define a new clustering model called sCluster and devise an algorithm for discovering all maximum sClusters in a sequential 3d dataset. Experiments are conducted to illustrate the efficiency and effectiveness of our algorithm.
Title: Substructure clustering on sequential 3d object datasets. Venue: Proceedings. 20th International Conference on Data Engineering.
Pub Date: 2004-03-30 DOI: 10.1109/ICDE.2004.1320004
Atsuyuki Morishima, H. Kitagawa, Akira Matsumoto
We present XLearner, a novel tool that helps the rapid development of XML mapping queries written in XQuery. XLearner is novel in that it learns XQuery queries consistent with given examples (fragments) of intended query results. XLearner combines known learning techniques, incorporates mechanisms to cope with issues specific to the XQuery learning context, and provides a systematic way for the semiautomatic development of queries. This paper describes the XLearner system, presents algorithms for learning various classes of XQuery, shows that a minor extension gives the system practical expressive power, and reports experimental results that demonstrate how XLearner outputs reasonably complicated queries with only a small number of interactions with the user.
Title: A machine learning approach to rapid development of XML mapping queries. Venue: Proceedings. 20th International Conference on Data Engineering.