We present CLICKS, a novel algorithm that finds clusters in categorical datasets by searching for k-partite maximal cliques. Unlike previous methods, CLICKS mines subspace clusters. It uses a selective vertical method to guarantee complete search. CLICKS outperforms previous approaches by over an order of magnitude and scales better than any existing method on high-dimensional datasets. We demonstrate this improvement in an excerpt from our comprehensive performance studies.
CLICKS: Mining Subspace Clusters in Categorical Data via K-Partite Maximal Cliques. Mohammed J. Zaki, M. Peters. 21st International Conference on Data Engineering (ICDE'05), 2005-04-05. DOI: 10.1109/ICDE.2005.33 (https://doi.org/10.1109/ICDE.2005.33)
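To make the idea of k-partite cliques over categorical data concrete, here is a minimal brute-force sketch in Python. It is not the CLICKS algorithm: the edge rule (two values from different attributes are connected when they co-occur in at least `min_support` rows), the function name `kpartite_cliques`, and the exhaustive enumeration are all assumptions made for illustration. Each vertex is an (attribute index, value) pair, and a clique that covers only some attributes corresponds to a subspace cluster candidate.

```python
from itertools import combinations, product

def kpartite_cliques(rows, min_support):
    """Enumerate maximal k-partite cliques of attribute values (brute force)."""
    n_attrs = len(rows[0])
    # Count how often each pair of values from different attributes co-occurs.
    pair_count = {}
    for row in rows:
        for u, v in combinations(enumerate(row), 2):
            pair_count[(u, v)] = pair_count.get((u, v), 0) + 1
    # Assumed edge rule: keep a pair that co-occurs in at least min_support rows.
    edges = {pair for pair, count in pair_count.items() if count >= min_support}

    def connected(u, v):
        return (u, v) in edges or (v, u) in edges

    values = [sorted({row[i] for row in rows}) for i in range(n_attrs)]
    cliques = []
    # Visit attribute subsets from largest to smallest, so any clique that is
    # contained in a larger one is detected and skipped.
    for k in range(n_attrs, 1, -1):
        for attrs in combinations(range(n_attrs), k):
            for combo in product(*(values[i] for i in attrs)):
                verts = frozenset(zip(attrs, combo))
                fully_linked = all(
                    connected(u, v) for u, v in combinations(sorted(verts), 2)
                )
                if fully_linked and not any(verts < c for c in cliques):
                    cliques.append(verts)
    return cliques

rows = [
    ("red", "small", "round"),
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("blue", "large", "square"),
    ("blue", "large", "square"),
]
for clique in kpartite_cliques(rows, min_support=2):
    print(sorted(clique))
```

With a higher threshold, some attributes drop out of the clique entirely, which is the subspace behaviour the abstract refers to. The enumeration is exponential in the number of attributes, so it is only suitable for toy inputs.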
M. Krishnaprasad, Z. Liu, Anand Manikutty, James W. Warner, Vikas Arora
XML has become an attractive data processing model for applications. SQL/XML is a SQL standard that integrates XML with SQL. It introduces the XML datatype as a native SQL datatype and defines XML generation functions in the SQL/XML 2003 standard. The goal for the next version of SQL/XML is to integrate XQuery with SQL by supporting XQuery embedded inside SQL functions such as the XMLQuery and XMLTable functions. Starting with the 9i database release, Oracle has supported the XML datatype and various operations on XML instances. In this paper, we present the design and implementation strategies of the SQL/XML standard in Oracle XMLDB. We explore the critical infrastructure needed in the SQL database kernel to support an efficient native XML datatype implementation, and the design approaches for efficient generation, query, and update of XML instances. We also illustrate extensions to SQL/XML that make Oracle XMLDB a truly industrial-strength platform for XML processing.
Towards an industrial strength SQL/XML infrastructure. M. Krishnaprasad, Z. Liu, Anand Manikutty, James W. Warner, Vikas Arora. 21st International Conference on Data Engineering (ICDE'05), 2005-04-05. DOI: 10.1109/ICDE.2005.144 (https://doi.org/10.1109/ICDE.2005.144)
R. Agrawal, Paul Bird, Tyrone Grandison, J. Kiernan, Scott Logan, Walid Rjaibi
Databases are at the core of successful businesses. Due to the voluminous stores of personal data being held by companies today, preserving privacy has become a crucial requirement for operating a business. This paper proposes how current relational database management systems can be transformed into their privacy-preserving equivalents. Specifically, we present language constructs and implementation design for fine-grained access control to achieve this goal.
Extending relational database systems to automatically enforce privacy policies. R. Agrawal, Paul Bird, Tyrone Grandison, J. Kiernan, Scott Logan, Walid Rjaibi. 21st International Conference on Data Engineering (ICDE'05), 2005-04-05. DOI: 10.1109/ICDE.2005.64 (https://doi.org/10.1109/ICDE.2005.64)
An enormous amount of semantic data is still being encoded in HTML documents. Identifying and annotating the semantic concepts implicit in such documents makes them directly amenable to Semantic Web processing. In this paper we describe a highly automated technique for annotating HTML documents, especially template-based content-rich documents, containing many different semantic concepts per document. Starting with a (small) seed of hand-labeled instances of semantic concepts in a set of HTML documents, we bootstrap an annotation process that automatically identifies unlabeled concept instances present in other documents. The bootstrapping technique exploits the observation that semantically related items in content-rich documents exhibit consistency in presentation style and spatial locality to learn a statistical model for accurately identifying different semantic concepts in HTML documents drawn from a variety of Web sources. We also present experimental results on the effectiveness of the technique.
Bootstrapping semantic annotation for content-rich HTML documents. Saikat Mukherjee, I. Ramakrishnan, Amarjeet Singh. 21st International Conference on Data Engineering (ICDE'05), 2005-04-05. DOI: 10.1109/ICDE.2005.28 (https://doi.org/10.1109/ICDE.2005.28)
Efficient evaluation of path expressions has been studied extensively. However, evaluating more complex FLWOR expressions that contain multiple path expressions has not been well studied. In this paper, we propose a novel pattern matching approach, called BlossomTree, to evaluate a FLWOR expression that contains correlated path expressions. BlossomTree is a formalism to capture the semantics of the path expressions and their correlations. We propose a general algebraic framework (abstract data types and logical operators) to evaluate BlossomTree pattern matching that facilitates efficient evaluation and experimentation. We design efficient data structures and algorithms to implement the abstract data types and logical operators. Our experimental studies demonstrate that the BlossomTree approach can generate highly efficient query plans in different environments.
BlossomTree: evaluating XPaths in FLWOR expressions. Ning Zhang, Shishir Agrawal, M. Tamer Özsu. 21st International Conference on Data Engineering (ICDE'05), 2005-04-05. DOI: 10.1109/ICDE.2005.27 (https://doi.org/10.1109/ICDE.2005.27)
We present the Deep Store archival storage architecture, a large-scale storage system that stores immutable data efficiently and reliably for long periods of time. Archived data is stored across a cluster of nodes and recorded to hard disk. The design differentiates itself from traditional file systems by eliminating redundancy within and across files, distributing content for scalability, associating rich metadata with content, and using variable levels of replication based on the importance or degree of dependency of each piece of stored data. We evaluate the foundations of our design, including PRESIDIO, a virtual content-addressable storage framework with multiple methods for inter-file and intra-file compression that effectively addresses the data-dependent variability of data compression. We measure content and metadata storage efficiency, demonstrate the need for a variable-degree replication model, and provide preliminary results for storage performance.
Deep Store: an archival storage system architecture. L. You, Kristal T. Pollack, D. Long. 21st International Conference on Data Engineering (ICDE'05), 2005-04-05. DOI: 10.1109/ICDE.2005.47 (https://doi.org/10.1109/ICDE.2005.47)
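The Deep Store abstract mentions content-addressable storage that eliminates redundancy within and across files. The following Python sketch shows one simple way that idea can work; it is not PRESIDIO or any part of Deep Store. The class name `ContentStore`, the fixed chunk size, and the use of SHA-256 digests as addresses are assumptions chosen for illustration.

```python
import hashlib

class ContentStore:
    """Toy content-addressable store: identical chunks are kept only once."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size  # hypothetical fixed chunk size, in bytes
        self.chunks = {}              # SHA-256 hex digest -> chunk bytes
        self.files = {}               # file name -> ordered list of digests

    def put(self, name, data):
        digests = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # store a chunk only if new
            digests.append(digest)
        self.files[name] = digests

    def get(self, name):
        # Rebuild a file by concatenating its chunks in recorded order.
        return b"".join(self.chunks[d] for d in self.files[name])

    def stored_bytes(self):
        return sum(len(chunk) for chunk in self.chunks.values())

store = ContentStore(chunk_size=4)
store.put("a.bin", b"AAAABBBBCCCC")
store.put("b.bin", b"AAAABBBBDDDD")
print(store.stored_bytes(), "bytes stored for 24 bytes of input")
```

Because both files share their first two chunks, the store holds four distinct chunks rather than six. Real archival systems typically add variable-size chunking, delta compression, and replication, none of which is modelled here.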
We address the problem of content-based dissemination of highly distributed, high-volume data streams for stream-based monitoring applications and large-scale data delivery. Existing content-based dissemination approaches commonly rely on distributed filtering trees that require filtering at all brokers on the tree. We present a new semantic multicast approach that eliminates the need for content-based filtering at interior brokers and facilitates fine-grained control over the construction of efficient dissemination trees. The central idea is to split the incoming data streams (based on their contents, rates, and destinations) and then spread the pieces across multiple channels, each of which is implemented as an independent dissemination tree. We present the basic design and evaluation of SemCast, an overlay-network-based system that implements this semantic multicast approach. Through a detailed simulation study using realistic network topologies, we demonstrate that SemCast significantly improves the efficiency of dissemination compared to traditional approaches.
SemCast: semantic multicast for content-based data dissemination. Olga Papaemmanouil, U. Çetintemel. 21st International Conference on Data Engineering (ICDE'05), 2005-04-05. DOI: 10.1109/ICDE.2005.131 (https://doi.org/10.1109/ICDE.2005.131)
Haibo Hu, Jianliang Xu, W. S. Wong, Baihua Zheng, Lee, Wang-Chien Lee
Semantic caching enables mobile clients to answer spatial queries locally by storing the query descriptions together with the results. However, it supports only a limited number of query types, and sharing results among these types is difficult. To address these issues, we propose a proactive caching model which caches both the result objects and the index that supports these objects as results. The cached index enables the objects to be reused for all common types of queries. We also propose an adaptive scheme to cache such an index, which further optimizes the query response time for the best user experience. Simulation results show that proactive caching achieves a significant performance gain over page caching and semantic caching in mobile environments where wireless bandwidth and battery power are precious resources.
Proactive caching for spatial queries in mobile environments. Haibo Hu, Jianliang Xu, W. S. Wong, Baihua Zheng, Lee, Wang-Chien Lee. 21st International Conference on Data Engineering (ICDE'05), 2005-04-05. DOI: 10.1109/ICDE.2005.113 (https://doi.org/10.1109/ICDE.2005.113)
T. Edmunds, S. Muthukrishnan, Subarna Sadhukhan, S. Sueda
Enacting and capturing real motion for all potential scenarios is prohibitively expensive; hence, there is a great demand to synthetically generate realistic human motion. However, synthetically generating a long sequence of smooth human motion is a central challenge in character animation. We present a novel, database-centric solution to address this challenge. We demonstrate a method of generating long sequences of motion by performing various similarity-based "joins" on a database of captured motion sequences. This article illustrates our system (MoDB) and showcases the process of encoding captured motion into relational data and generating realistic motion by concatenating sub-sequences of the captured data according to feasibility metrics. The demo features an interactive character that moves towards user-specified targets; the character's motion is generated by relying on the real-time performance of the database for indexing and selection of feasible sub-sequences.
MoDB: database system for synthesizing human motion. T. Edmunds, S. Muthukrishnan, Subarna Sadhukhan, S. Sueda. 21st International Conference on Data Engineering (ICDE'05), 2005-04-05. DOI: 10.1109/ICDE.2005.89 (https://doi.org/10.1109/ICDE.2005.89)
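The MoDB abstract describes generating motion by similarity-based "joins" that concatenate captured sub-sequences according to feasibility metrics. The Python sketch below illustrates that general pattern under stated assumptions: each clip is reduced to a start pose and an end pose, and a join is considered feasible when the end pose of one clip lies within `max_gap` (Euclidean distance) of the start pose of the next. The clip names, the pose representation, the metric, and the greedy chaining are hypothetical and are not taken from the paper.

```python
import math

def feasible_joins(clips, max_gap):
    """Return (a, b) pairs where clip b can plausibly follow clip a.

    clips maps a clip name to (start_pose, end_pose); each pose is a tuple of
    coordinates. The distance threshold is an assumed feasibility metric.
    """
    pairs = []
    for a, (_, end_a) in clips.items():
        for b, (start_b, _) in clips.items():
            if a != b and math.dist(end_a, start_b) <= max_gap:
                pairs.append((a, b))
    return pairs

def chain(clips, first, length, max_gap):
    """Greedily concatenate up to `length` clips, starting from `first`."""
    follow = {}
    for a, b in feasible_joins(clips, max_gap):
        follow.setdefault(a, []).append(b)
    sequence = [first]
    # Stop early when the last clip has no feasible successor.
    while len(sequence) < length and follow.get(sequence[-1]):
        sequence.append(follow[sequence[-1]][0])
    return sequence

clips = {
    "walk": ((0.0, 0.0), (1.0, 0.0)),
    "turn": ((1.0, 0.1), (1.0, 1.0)),
    "jump": ((5.0, 5.0), (6.0, 6.0)),
}
print(chain(clips, "walk", length=3, max_gap=0.2))
```

In a relational setting the nested loop would be expressed as a join with a distance predicate, ideally backed by an index on pose values so that feasible successors can be found at interactive rates.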
Lightweight computing devices are becoming ubiquitous, and an increasing number of applications are being developed for these devices. Many of these applications deal with significant amounts of data and involve complex joins and aggregate operations, which necessitate a local database management system on the device. This is a challenge, as these devices are constrained by limited stable storage and main memory. Hence, new storage models that reduce storage costs are needed, and a storage scheme should be selected based on data characteristics, the nature of queries, and updates. Also, the query execution plan should be chosen depending on the amount of available memory and the underlying storage scheme; memory should be optimally allocated among the database operators involved in the query. To achieve these goals, we utilize a novel storage model, ID-based storage, which reduces storage costs considerably. We present an exact algorithm for allocating memory among the database operators. Because of its high complexity, we also propose a heuristic solution based on the benefit of an operator per unit memory allocation.
Efficient data management on lightweight computing devices. Rajkumar Sen, K. Ramamritham. 21st International Conference on Data Engineering (ICDE'05), 2005-04-05. DOI: 10.1109/ICDE.2005.58 (https://doi.org/10.1109/ICDE.2005.58)
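The abstract on lightweight devices mentions a heuristic that allocates memory according to the benefit of an operator per unit of memory. One common way to realise such a heuristic is a greedy loop that hands out memory one unit at a time to the operator whose estimated cost drops the most. The Python sketch below shows that generic greedy scheme; the function name `allocate_memory` and the cost functions are hypothetical stand-ins, and the paper's actual heuristic and cost model may differ.

```python
def allocate_memory(cost_fns, total_units):
    """Greedy allocation by marginal benefit per unit of memory.

    cost_fns maps an operator name to a function cost(units) -> estimated
    cost. These cost functions are hypothetical placeholders for a real
    optimizer cost model.
    """
    alloc = {op: 0 for op in cost_fns}

    def benefit(op):
        # Cost reduction obtained by giving this operator one more unit.
        cost = cost_fns[op]
        return cost(alloc[op]) - cost(alloc[op] + 1)

    for _ in range(total_units):
        best = max(alloc, key=benefit)
        if benefit(best) <= 0:
            break  # no operator gains from additional memory
        alloc[best] += 1
    return alloc

# Assumed toy cost curves with diminishing returns.
cost_fns = {
    "join": lambda m: 100 / (m + 1),
    "sort": lambda m: 40 / (m + 1),
}
print(allocate_memory(cost_fns, total_units=3))
```

The greedy choice is optimal when every cost curve has diminishing returns, but it can miss the best allocation when an operator only benefits after crossing a memory threshold, which is one reason an exact algorithm can still be worth having.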