Maintaining materialized views that have join conditions between arbitrary pairs of data sources, possibly with cycles, is critical for many applications. In this paper, we model view maintenance as the process of answering a set of inter-related distributed multi-join queries. We illustrate two strategies for maintaining as well as optimizing such general join views. We propose a cost-driven view maintenance framework that generates optimized maintenance plans tuned to given environmental settings. This framework can significantly improve view maintenance performance, especially in a distributed environment.
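To make the idea of treating view maintenance as join queries concrete, the following is a minimal sketch of the standard delta rule for an insert-only two-way join view: the change to the view is the join of the inserted tuples with the other source. The relation names, the `join` helper, and the in-memory lists are hypothetical illustrations; the sketch does not reproduce the paper's strategies, its cost model, or its distributed setting.

```python
def join(left, right, key):
    """Nested-loop equi-join of two lists of dict rows on a shared key."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


# Two hypothetical data sources and the materialized join view over them.
orders = [{"cust": 1, "order": "o1"}]
customers = [{"cust": 1, "name": "Ann"}, {"cust": 2, "name": "Bob"}]
view = join(orders, customers, "cust")

# An insertion arrives at the `orders` source.
delta_orders = [{"cust": 2, "order": "o2"}]

# Maintenance query: join only the delta with the other source,
# instead of recomputing the whole view.
view_delta = join(delta_orders, customers, "cust")
view.extend(view_delta)
orders.extend(delta_orders)

# The incrementally maintained view matches a full recomputation.
recomputed = join(orders, customers, "cust")
```

With more than two sources, the same delta has to be joined against every other source, which is where the order of the joins and the cost of shipping intermediate results between sites start to matter.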
{"title":"Cost-driven general join view maintenance over distributed data sources","authors":"B. Liu, Elke A. Rundensteiner","doi":"10.1109/ICDE.2005.40","DOIUrl":"https://doi.org/10.1109/ICDE.2005.40","url":null,"abstract":"Maintaining materialized views that have join conditions between arbitrary pairs of data sources possibly with cycles is critical for many applications. In this paper, we model view maintenance as the process of answering a set of inter-related distributed multi-join queries. We illustrate two strategies for maintaining as well as optimizing such general join views. We propose a cost-driven view maintenance framework which generates optimized maintenance plans tuned to a given environmental settings. This framework can significantly improve view maintenance performance especially in a distributed environment.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132954209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
XQuery processing is one of the prime research topics of the database community. At the same time, XQuery research is still in a "pre-paradigmatic" stage, where the conventional symptoms of the stage are observed: It is hard to piece together point efforts into a big picture. Similarities and interplay opportunities between parallel efforts are "lost in the translation" across the different paradigms. The goal of this tutorial is to federate the plethora of works, and categorize existing work and future topics along a few reference paradigms that fuse existing results around a reference architecture.
{"title":"XQuery midflight: emerging database-oriented paradigms and a classification of research advances","authors":"I. Manolescu, Y. Papakonstantinou","doi":"10.1109/ICDE.2005.158","DOIUrl":"https://doi.org/10.1109/ICDE.2005.158","url":null,"abstract":"XQuery processing is one of the prime research topics of the database community. At the same time, XQuery research is still in a \"pre-paradigmatic\" stage, where the conventional symptoms of the stage are observed: It is hard to piece together point efforts into a big picture. Similarities and interplay opportunities between parallel efforts are \"lost in the translation\" across the different paradigms. The goal of this tutorial is to federate the plethora of works, and categorize existing work and future topics along a few reference paradigms that fuse existing results around a reference architecture.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128860468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Many efforts have been devoted to coupling data mining activities with relational DBMSs, but a true integration into the relational DBMS kernel has rarely been achieved. This paper presents a novel indexing technique, which represents transactions in a succinct form, appropriate for tightly integrating frequent itemset mining in a relational DBMS. The data representation is complete, i.e., no support threshold is enforced, so that the index can be reused for mining itemsets with any support threshold. Furthermore, an appropriate structure of the stored information has been devised to allow selective access to the index blocks necessary for the current extraction phase. The index has been implemented in the PostgreSQL open source DBMS and exploits its physical level access methods. Experiments have been run on various datasets, characterized by different data distributions. The execution time of the frequent itemset extraction task exploiting the index is always comparable with, and sometimes faster than, a C++ implementation of the FP-growth algorithm accessing data stored in a flat file.
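The benefit of a complete representation, one built without any support threshold, can be illustrated with a small sketch. It uses a plain vertical layout (item to set of transaction ids) and brute-force intersection, which is a swapped-in simplification and not the index structure or the extraction algorithm of the paper; the function names and the sample transactions are hypothetical.

```python
from itertools import combinations


def build_index(transactions):
    """Map every item to the ids of the transactions containing it.

    No support threshold is applied, so the index is complete.
    """
    index = {}
    for tid, items in enumerate(transactions):
        for item in items:
            index.setdefault(item, set()).add(tid)
    return index


def frequent_itemsets(index, min_support, max_size=3):
    """Return {itemset: support} for itemsets with support >= min_support."""
    result = {}
    candidates = sorted(i for i, tids in index.items() if len(tids) >= min_support)
    for size in range(1, max_size + 1):
        for combo in combinations(candidates, size):
            # Support of an itemset = size of the intersection of its tid sets.
            tids = set.intersection(*(index[i] for i in combo))
            if len(tids) >= min_support:
                result[frozenset(combo)] = len(tids)
    return result


transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
index = build_index(transactions)      # built once, threshold-free

high = frequent_itemsets(index, 4)     # strict threshold
low = frequent_itemsets(index, 2)      # looser threshold, same index
```

Because the index keeps every item, lowering the threshold from 4 to 2 needs no rebuild; only the mining pass is repeated.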
{"title":"Index support for frequent itemset mining in a relational DBMS","authors":"Elena Baralis, T. Cerquitelli, S. Chiusano","doi":"10.1109/ICDE.2005.80","DOIUrl":"https://doi.org/10.1109/ICDE.2005.80","url":null,"abstract":"Many efforts have been devoted to couple data mining activities with relational DBMSs, but a true integration into the relational DBMS kernel has been rarely achieved. This paper presents a novel indexing technique, which represents transactions in a succinct form, appropriate for tightly integrating frequent itemset mining in a relational DBMS. The data representation is complete, i.e., no support threshold is enforced, in order to allow reusing the index for mining itemsets with any support threshold. Furthermore, an appropriate structure of the stored information has been devised, in order to allow a selective access of the index blocks necessary for the current extraction phase. The index has been implemented into the PostgreSQL open source DBMS and exploits its physical level access methods. Experiments have been run for various datasets, characterized by different data distributions. The execution time of the frequent itemset extraction task exploiting the index is always comparable with and sometime faster than a C++ implementation of the FP-growth algorithm accessing data stored on a flat file.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115874715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In data publishing, if the data is published carelessly, public users could use common knowledge to infer more information from the published data, causing leakage of sensitive information. To address the related research challenges, we develop a system called XGuard, which can help data owners publish a partial XML document without leaking sensitive information, even if public users can perform inference. Specifically, the system has the following functionalities. i) It allows the data owner to define sensitive information and specify common knowledge as XML constraints. ii) Given a partial document, the system can check whether the document can cause information leakage due to common knowledge and how much data can be leaked. iii) The system can help the data owner interactively analyze the data inference and produce a secure, valid partial document using the algorithms.
{"title":"XGuard: a system for publishing XML documents without information leakage in the presence of data inference","authors":"Xiaochun Yang, Chen Li, Ge Yu, L. Shi","doi":"10.1109/ICDE.2005.156","DOIUrl":"https://doi.org/10.1109/ICDE.2005.156","url":null,"abstract":"In data publishing, if the data is published carelessly, public users could use common knowledge to infer more information from the published data, causing leakage of sensitive information. To address related research challenges, we develop a system called XGuard, which can help data owners publish a partial XML document without leaking sensitive information, even if public users can do inference. Specifically, the system has the following functionalities. i) It allows the data owner to define sensitive information and specify common knowledge as XML constraints. ii) Given a partial document, the system can validate if the document can cause information leakage due to common knowledge and how much data can be leaked. iii) The system can help the data owner interactively analyze the data inference and produce a secure valid partial document using the algorithms.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115987148","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
As IT systems become more and more complex and as business operations become increasingly automated, there is a growing need for business managers to have better control over business operations and over how these are aligned with business goals. This paper describes iBOM, a platform for business operation management developed by HP that allows users to i) analyze operations from a business perspective and manage them based on business goals; ii) define business metrics, perform intelligent analysis on them to understand causes of undesired metric values, and predict future values; iii) optimize operations to improve business metrics. A key aspect is that all this functionality is readily available almost at the click of the mouse. The description of the work proceeds from some specific requirements to the solution developed to address them. We also show that the platform is indeed general, as demonstrated by subsequent deployments in domains other than finance.
{"title":"iBOM: a platform for intelligent business operation management","authors":"M. Castellanos, F. Casati, M. Shan, U. Dayal","doi":"10.1109/ICDE.2005.73","DOIUrl":"https://doi.org/10.1109/ICDE.2005.73","url":null,"abstract":"As IT systems become more and more complex and as business operations become increasingly automated, there is a growing need from business managers to have better control on business operations and on how these are aligned with business goals. This paper describes iBOM, a platform for business operation management developed by HP that allows users to i) analyze operations from a business perspective and manage them based on business goals; ii) define business metrics, perform intelligent analysis on them to understand causes of undesired metric values, and predict future values; iii) optimize operations to improve business metrics. A key aspect is that all this functionality is readily available almost at the click of the mouse. The description of the work proceeds from some specific requirements to the solution developed to address them. We also show that the platform is indeed general, as demonstrated by subsequent deployment domains other than finance.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"306 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116187715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We propose a distributed XML stream filtering system that uses a large number of subscribers' profiles, written in XPath expressions, to filter XML streams and then publish the filtered data in real-time. To realize the proposed system, we define XPath expression features on XML data and utilize them to forecast the servers' loads. Our method is realized by combining methods to share the total transfer loads of each filtering server and to equalize the sum of overlap size between filtering servers. Experiments show that the rate at which the publishing time increases with the number of XPath expressions is three times smaller in the proposed system than in the round-robin method. Furthermore, the overhead of the proposed method is quite low.
{"title":"Distributed XML stream filtering system with high scalability","authors":"Hiroyuki Uchiyama, Makoto Onizuka, Takashi Honishi","doi":"10.1109/ICDE.2005.50","DOIUrl":"https://doi.org/10.1109/ICDE.2005.50","url":null,"abstract":"We propose a distributed XML stream filtering system that uses a large number of subscribers' profiles, written in XPath expressions, to filter XML streams and then publish the filtered data in real-time. To realize the proposed system, we define XPath expression features on XML data and utilize them to forecast the servers' loads. Our method is realized by combining methods to share the total transfer loads of each filtering server and to equalize the sum of overlap size between filtering servers. Experiments show that the rate at which the publishing time increases with the number of XPath expressions is three times smaller in the proposed system than in the round-robin method. Furthermore, the overhead of the proposed method is quite low.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"461 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125810872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We introduce RelaxImage, a cross-media meta-search engine for searching images from the Web. Notable features of RelaxImage are as follows: (1) each user's keyword query is "relaxed"; that is, by gradually relaxing the search terms used for image search, we can solve the problems of conventional image search engines such as Google. (2) For searching images, RelaxImage sends a different keyword query to each search engine of a different media type. We show several examples of how the relaxation approach works as well as ways that it can be applied. RelaxImage shows a great improvement in recall ratio without a decrease in precision ratio.
{"title":"RelaxImage: a cross-media meta-search engine for searching images from Web based on query relaxation","authors":"Akihiro Kuwabara, Katsumi Tanaka","doi":"10.1109/ICDE.2005.122","DOIUrl":"https://doi.org/10.1109/ICDE.2005.122","url":null,"abstract":"We introduce a cross-media meta-search engine RelaxImage for searching images from Web. Notable features of the RelaxImage are as follows: (1) each user's keyword query is \"relaxed\", that is, by gradually relaxing the search terms used for image search, we can solve the problem of conventional image search engine such as Google. (2) For searching images, our RelaxImage sends a different keyword-query to each search engine of different media-type. We show several examples of how the relaxation approach works as well as ways that it can be applied. That is, our RelaxImage shows a great improvement for increasing recall ratio without decreasing of precision ratio.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128655665","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
K. Dasgupta, S. Ghosal, R. Jain, Upendra Sharma, Akshat Verma
Logical reorganization of data and requirements of differentiated QoS in information systems necessitate bulk data migration by the underlying storage layer. Such data migration needs to ensure that regular client I/Os are not impacted significantly while migration is in progress. We formalize the data migration problem in a unified admission control framework that captures both the performance requirements of client I/Os and the constraints associated with migration. We propose an adaptive rate-control-based data migration methodology, QoSMig, that achieves the optimal client performance in a differentiated QoS setting, while ensuring that the specified migration constraints are met. QoSMig uses both long-term averages and short-term forecasts of client traffic to compute a migration schedule. We present an architecture based on Service Level Enforcement Discipline for Storage (SLEDS) that supports QoSMig. Our trace-driven experimental study demonstrates that QoSMig provides significantly better I/O performance compared to existing migration methodologies.
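As a rough illustration of rate-controlled migration driven by both a long-term average and short-term forecasts, the sketch below gives the migration whatever capacity is left after the expected client load in each interval. The blending weight, the function names, and the numbers are all assumptions made for the example; this is a simple spare-capacity heuristic, not the QoSMig algorithm or its admission control framework.

```python
def migration_rate(capacity, long_term_avg, short_term_forecast, weight=0.5):
    """Spare capacity after the expected client load (hypothetical blend)."""
    expected_client = weight * long_term_avg + (1 - weight) * short_term_forecast
    return max(0.0, capacity - expected_client)


def schedule(capacity, long_term_avg, forecasts, total_units, weight=0.5):
    """Plan how many units to migrate per interval until the work is done.

    Returns (plan, remaining), where remaining > 0 means the forecast
    horizon was too short to finish the migration.
    """
    remaining = total_units
    plan = []
    for forecast in forecasts:
        rate = min(remaining, migration_rate(capacity, long_term_avg, forecast, weight))
        plan.append(rate)
        remaining -= rate
        if remaining <= 0:
            break
    return plan, remaining


# Capacity 100 units/interval, long-term client average 60,
# short-term forecasts for the next three intervals, 90 units to migrate.
plan, remaining = schedule(100, 60, [80, 40, 60], 90)
```

In this example the migration backs off in the first interval, where heavy client traffic is forecast, and catches up in the second, where it is light.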
{"title":"QoSMig: adaptive rate-controlled migration of bulk data in storage systems","authors":"K. Dasgupta, S. Ghosal, R. Jain, Upendra Sharma, Akshat Verma","doi":"10.1109/ICDE.2005.116","DOIUrl":"https://doi.org/10.1109/ICDE.2005.116","url":null,"abstract":"Logical reorganization of data and requirements of differentiated QoS in information systems necessitate bulk data migration by the underlying storage layer. Such data migration needs to ensure that regular client I/Os are not impacted significantly while migration is in progress. We formalize the data migration problem in a unified admission control framework that captures both the performance requirements of client I/Os and the constraints associated with migration. We propose an adaptive rate-control based data migration methodology, QoSMig, that achieves the optimal client performance in a differentiated QoS setting, while ensuring that the specified migration constraints are met QoSMig uses both long term averages and short term forecasts of client traffic to compute a migration schedule. We present an architecture based on Service Level Enforcement Discipline for Storage (SLEDS) that supports QoSMig. Our trace-driven experimental study demonstrates that QoSMig provides significantly better I/O performance as compared to existing migration methodologies.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"212 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115963590","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A few years ago, DBMSs stored data on disks and cached recently used data in main memory buffer pools, while designers worried about improving I/O performance and maximizing main memory utilization. Today, however, databases live in multi-level memory hierarchies that include disks, main memories, and several levels of processor caches. Recent research shows that all levels of the underlying computer hardware and devices directly influence database performance. This paper aims at (a) explaining why database performance depends on modern processor and memory microarchitectures, (b) surveying and contrasting research on the topic over the past decade, and (c) discussing future research challenges.
{"title":"Database architectures for new hardware","authors":"A. Ailamaki","doi":"10.1109/ICDE.2005.45","DOIUrl":"https://doi.org/10.1109/ICDE.2005.45","url":null,"abstract":"Few years ago, DBMS stored data on disks and cached recently used data in main memory buffer pools, while designers worried about improving I/O performance and maximizing main memory utilization. Today, however, databases live in multi-level memory hierarchies that include disks, main memories, and several levels of processor caches. Recent research shows that all levels of the underlying computer hardware and devices directly influence database performance. This paper aims at (a) explaining why database performance depends on modern processor and memory microarchitectures, (b) surveying and contrasting research on the topic over the past decade, and (c) discussing future research challenges.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116611135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper describes a data model for fuzzy spatial objects implemented as an algebra module in SECONDO. Furthermore, the graphical representation of such objects is discussed.
{"title":"Fuzzy spatial objects: an algebra implementation in SECONDO","authors":"T. Behr, R. H. Güting","doi":"10.1109/ICDE.2005.70","DOIUrl":"https://doi.org/10.1109/ICDE.2005.70","url":null,"abstract":"This paper describes a data model for fuzzy spatial objects implemented as an algebra module in SECONDO. Furthermore, the graphical representation of such objects is discussed.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124990884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}