Cheng Luo, Zhewei Jiang, W. Hou, Feng Yan, Chih-Fang Wang
XML structural joins, which evaluate the containment (ancestor-descendant) relationships between XML elements, are important operations of XML query processing. Estimating structural join size accurately and quickly is thus crucial to the success of XML query plan selection and the query optimization. XML structural joins are essentially complex unequal joins, which render well-known estimation techniques, such as cosine transform, wavelet transform, and sketch, not directly applicable. In this paper, we propose a relation model to capture the structural information of XML data such that the original complex unequal joins are converted to equal joins and those well-known estimation techniques become directly applicable to structural join size estimation. Theoretical analyses and extensive experiments have been performed on these estimation methods. It is shown that the cosine transform requires the least memory and yields the best estimates.
{"title":"Estimating XML Structural Join Size Quickly and Economically","authors":"Cheng Luo, Zhewei Jiang, W. Hou, Feng Yan, Chih-Fang Wang","doi":"10.1109/ICDE.2006.63","DOIUrl":"https://doi.org/10.1109/ICDE.2006.63","url":null,"abstract":"XML structural joins, which evaluate the containment (ancestor-descendant) relationships between XML elements, are important operations of XML query processing. Estimating structural join size accurately and quickly is thus crucial to the success of XML query plan selection and the query optimization. XML structural joins are essentially complex unequal joins, which render well-known estimation techniques, such as cosine transform, wavelet transform, and sketch, not directly applicable. In this paper, we propose a relation model to capture the structural information of XML data such that the original complex unequal joins are converted to equal joins and those well-known estimation techniques become directly applicable to structural join size estimation. Theoretical analyses and extensive experiments have been performed on these estimation methods. It is shown that the cosine transform requires the least memory and yields the best estimates.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"107 1","pages":"62-62"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76686824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fusheng Wang, Peiya Liu, John Pearson, F. Azar, G. Madlmayr
Scientific research in many fields is increasingly a collaborative effort across multiple institutions and disciplines. Scientific researchers need not only an effective system to manage their data, results, and the experiments that generate the results, but also a platform to integrate, share and search these across multiple institutions. Therefore, researchers are able to reuse experiments, pool expertise and validate approaches. In this paper, we present SciPort, a system of experiment management and integration for collaborative scientific research. SciPort’s architecture uses i) a general transformation-based data model to represent and link experiment processes; ii) hierarchical data classification across multiple institutions according to research programs’ goals and organization; iii) metadata-centric representation that concisely captures the context of experiments; and iv) virtual data integration through centralized metadata integration. The system is built for open source, and the metadata-based representation and integration provides a unified framework and tool set to manage and share experiments for scientific research communities.
{"title":"Experiment Management with Metadata-based Integration for Collaborative Scientific Research","authors":"Fusheng Wang, Peiya Liu, John Pearson, F. Azar, G. Madlmayr","doi":"10.1109/ICDE.2006.65","DOIUrl":"https://doi.org/10.1109/ICDE.2006.65","url":null,"abstract":"Scientific research in many fields is increasingly a collaborative effort across multiple institutions and disciplines. Scientific researchers need not only an effective system to manage their data, results, and the experiments that generate the results, but also a platform to integrate, share and search these across multiple institutions. Therefore, researchers are able to reuse experiments, pool expertise and validate approaches. In this paper, we present SciPort, a system of experiment management and integration for collaborative scientific research. SciPort’s architecture uses i) a general transformation-based data model to represent and link experiment processes; ii) hierarchical data classification across multiple institutions according to research programs’ goals and organization; iii) metadata-centric representation that concisely captures the context of experiments; and iv) virtual data integration through centralized metadata integration. The system is built for open source, and the metadata-based representation and integration provides a unified framework and tool set to manage and share experiments for scientific research communities.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"14 1","pages":"96-96"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88470297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
F. Bonchi, F. Giannotti, C. Lucchese, S. Orlando, R. Perego, R. Trasarti
ConQueSt is a constraint-based querying system devised with the aim of supporting the intrinsically exploratory nature of pattern discovery. It provides users with an expressive constraint-based query language which allows the discovery process to be effectively driven toward potentially interesting patterns. Constraints are also exploited to reduce the cost of pattern mining. The system is built around an efficient constraint-based mining engine which entails several data and search space reduction techniques, and allows new user-defined constraints to be easily added.
{"title":"ConQueSt: a Constraint-based Querying System for Exploratory Pattern Discovery","authors":"F. Bonchi, F. Giannotti, C. Lucchese, S. Orlando, R. Perego, R. Trasarti","doi":"10.1109/ICDE.2006.42","DOIUrl":"https://doi.org/10.1109/ICDE.2006.42","url":null,"abstract":"ConQueSt is a constraint-based querying system devised with the aim of supporting the intrinsically exploratory nature of pattern discovery. It provides users with an expressive constraint-based query language which allows the discovery process to be effectively driven toward potentially interesting patterns. Constraints are also exploited to reduce the cost of pattern mining. The system is built around an efficient constraint-based mining engine which entails several data and search space reduction techniques, and allows new user-defined constraints to be easily added.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"3 1","pages":"159-159"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88495792","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Streaming XPath evaluation algorithms must record a potentially exponential number of pattern matches when both predicates and descendant axes are present in queries, and the XML data is recursive. In this paper, we use a compact data structure to encode these pattern matches rather than storing them explicitly. We then propose a polynomial time streaming algorithm to evaluate XPath queries by probing the data structure in a lazy fashion. Extensive experiments show that our approach not only has a good theoretical complexity bound but is also efficient in practice.
{"title":"An Efficient XPath Query Processor for XML Streams","authors":"Yi Chen, S. Davidson, Yifeng Zheng","doi":"10.1109/ICDE.2006.18","DOIUrl":"https://doi.org/10.1109/ICDE.2006.18","url":null,"abstract":"Streaming XPath evaluation algorithms must record a potentially exponential number of pattern matches when both predicates and descendant axes are present in queries, and the XML data is recursive. In this paper, we use a compact data structure to encode these pattern matches rather than storing them explicitly. We then propose a polynomial time streaming algorithm to evaluate XPath queries by probing the data structure in a lazy fashion. Extensive experiments show that our approach not only has a good theoretical complexity bound but is also efficient in practice.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"49 1","pages":"79-79"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84664404","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Both XML-relational systems and native XML systems support creating XML wrapper views and querying against them. However, update operations against such virtual XML views in most cases are not supported yet.
{"title":"U-Filter: A Lightweight XML View Update Checker","authors":"Ling Wang, Elke A. Rundensteiner, Murali Mani","doi":"10.1109/ICDE.2006.163","DOIUrl":"https://doi.org/10.1109/ICDE.2006.163","url":null,"abstract":"Both XML-relational systems and native XML systems support creating XML wrapper views and querying against them. However, update operations against such virtual XML views in most cases are not supported yet.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"13 1","pages":"126-126"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87676517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wei-Shinn Ku, Roger Zimmermann, C. Wan, Haojun Wang
In this demonstration we present MAPLE, a scalable peer-to-peer nearest neighbor (NN) query system for mobile environments. MAPLE is designed for the efficient sharing of query results cached in the local storage of mobile peers. The MAPLE system is innovative in its ability to either fully or partially compute location-dependent nearest neighbor objects on each host. The demonstration illustrates how cooperative data sharing and distributed processing among mobile peers results in a considerable reduction of the load on remote spatial databases.
{"title":"MAPLE: A Mobile Scalable P2P Nearest Neighbor Query System for Location-based Services","authors":"Wei-Shinn Ku, Roger Zimmermann, C. Wan, Haojun Wang","doi":"10.1109/ICDE.2006.89","DOIUrl":"https://doi.org/10.1109/ICDE.2006.89","url":null,"abstract":"In this demonstration we present MAPLE, a scalable peer-to-peer nearest neighbor (NN) query system for mobile environments. MAPLE is designed for the efficient sharing of query results cached in the local storage of mobile peers. The MAPLE system is innovative in its ability to either fully or partially compute location-dependent nearest neighbor objects on each host. The demonstration illustrates how cooperative data sharing and distributed processing among mobile peers results in a considerable reduction of the load on remote spatial databases.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"78 1","pages":"160-160"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91138983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we propose a new model for coherent clustering of gene expression data called reg-cluster. The proposed model allows (1) the expression profiles of genes in a cluster to follow any shifting-and-scaling patterns in subspace, where the scaling can be either positive or negative, and (2) the expression value changes across any two conditions of the cluster to be significant. No previous work measures up to the task that we have set: the density-based subspace clustering algorithms require genes to have similar expression levels to each other in subspace; the pattern-based biclustering algorithms only allow pure shifting or pure scaling patterns; and the tendency-based biclustering algorithms have no coherence guarantees. We also develop a novel pattern-based biclustering algorithm for identifying shifting-and-scaling co-regulation patterns, satisfying both coherence constraint and regulation constraint. Our experimental results show that the reg-cluster algorithm is able to detect a significant amount of clusters missed by previous models, and these clusters are potentially of high biological significance.
{"title":"Mining Shifting-and-Scaling Co-Regulation Patterns on Gene Expression Profiles","authors":"Xin Xu, Ying Lu, A. Tung, Wei Wang","doi":"10.1109/ICDE.2006.98","DOIUrl":"https://doi.org/10.1109/ICDE.2006.98","url":null,"abstract":"In this paper, we propose a new model for coherent clustering of gene expression data called reg-cluster. The proposed model allows (1) the expression profiles of genes in a cluster to follow any shifting-and-scaling patterns in subspace, where the scaling can be either positive or negative, and (2) the expression value changes across any two conditions of the cluster to be significant. No previous work measures up to the task that we have set: the density-based subspace clustering algorithms require genes to have similar expression levels to each other in subspace; the pattern-based biclustering algorithms only allow pure shifting or pure scaling patterns; and the tendency-based biclustering algorithms have no coherence guarantees. We also develop a novel pattern-based biclustering algorithm for identifying shifting-and-scaling co-regulation patterns, satisfying both coherence constraint and regulation constraint. Our experimental results show that the reg-cluster algorithm is able to detect a significant amount of clusters missed by previous models, and these clusters are potentially of high biological significance.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"51 1","pages":"89-89"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91216994","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper we present a system for automatically integrating unstructured text into a multi-relational database using state-of-the-art statistical models for structure extraction and matching. We show how to extend current high-performing models, Conditional Random Fields and their semi-Markov counterparts, to effectively exploit a variety of recognition clues available in a database of entities, thereby significantly reducing the dependence on manually labeled training data. Our system is designed to load unstructured records into columns spread across multiple tables in the database while resolving the relationship of the extracted text with existing column values, and preserving the cardinality and link constraints of the database. We show how to combine the inference algorithms of statistical models with the database-imposed constraints for optimal data integration.
{"title":"Integrating Unstructured Data into Relational Databases","authors":"I. Mansuri, Sunita Sarawagi","doi":"10.1109/ICDE.2006.83","DOIUrl":"https://doi.org/10.1109/ICDE.2006.83","url":null,"abstract":"In this paper we present a system for automatically integrating unstructured text into a multi-relational database using state-of-the-art statistical models for structure extraction and matching. We show how to extend current high-performing models, Conditional Random Fields and their semi-Markov counterparts, to effectively exploit a variety of recognition clues available in a database of entities, thereby significantly reducing the dependence on manually labeled training data. Our system is designed to load unstructured records into columns spread across multiple tables in the database while resolving the relationship of the extracted text with existing column values, and preserving the cardinality and link constraints of the database. We show how to combine the inference algorithms of statistical models with the database-imposed constraints for optimal data integration.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"31 1","pages":"29-29"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91231258","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Gounaris, N. Paton, R. Sakellariou, A. Fernandes, Jim Smith, P. Watson
Grid computational resources, as well as being heterogeneous, may also exhibit unpredictable, volatile behaviour. Therefore, query processing on the Grid needs to be adaptive in order to cope with evolving resource characteristics, such as machine load and availability. To address this challenge in a Grid environment, the non-adaptive OGSA-DQP system described in [1] has been enhanced with adaptive capabilities.
{"title":"Practical Adaptation to Changing Resources in Grid Query Processing","authors":"A. Gounaris, N. Paton, R. Sakellariou, A. Fernandes, Jim Smith, P. Watson","doi":"10.1109/ICDE.2006.113","DOIUrl":"https://doi.org/10.1109/ICDE.2006.113","url":null,"abstract":"Grid computational resources, as well as being heterogeneous, may also exhibit unpredictable, volatile behaviour. Therefore, query processing on the Grid needs to be adaptive in order to cope with evolving resource characteristics, such as machine load and availability. To address this challenge in a Grid environment, the non-adaptive OGSA-DQP system described in [1] has been enhanced with adaptive capabilities.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"11 1","pages":"165-165"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84381892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The existing query processing techniques for sensor networks rely on a network infrastructure for query propagation and data collection. However, such an infrastructure is very susceptible to network topology transients that widely exist in sensor networks. In this paper, we propose an infrastructure-free window query processing technique for sensor networks, called itinerary-based window query execution (IWQE), in which query propagation and data collection are combined into one single stage and executed along a well-designed itinerary inside a query window. We study the parameters for setting up an itinerary (e.g., width and route) and incorporate into IWQE three data collection schemes based on different performance trade-offs. Finally we demonstrate, by extensive simulations, the superior energy-time efficiency, robustness, and accuracy of IWQE over the current state-of-the-art techniques in supporting window queries under various network conditions.
{"title":"Processing Window Queries in Wireless Sensor Networks","authors":"Yingqi Xu, Wang-Chien Lee, Jianliang Xu, Gail Mitchell","doi":"10.1109/ICDE.2006.119","DOIUrl":"https://doi.org/10.1109/ICDE.2006.119","url":null,"abstract":"The existing query processing techniques for sensor networks rely on a network infrastructure for query propagation and data collection. However, such an infrastructure is very susceptible to network topology transients that widely exist in sensor networks. In this paper, we propose an infrastructure-free window query processing technique for sensor networks, called itinerary-based window query execution (IWQE), in which query propagation and data collection are combined into one single stage and executed along a well-designed itinerary inside a query window. We study the parameters for setting up an itinerary (e.g., width and route) and incorporate into IWQE three data collection schemes based on different performance trade-offs. Finally we demonstrate, by extensive simulations, the superior energy-time efficiency, robustness, and accuracy of IWQE over the current state-of-the-art techniques in supporting window queries under various network conditions.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"80 1","pages":"70-70"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89006727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}