This paper describes how to fit fractal models, online, on IP traffic data streams. Our approach relies on maintaining a sketch of the data stream and fitting straight lines: it yields algorithms that are fast, space-efficient, and accurate. We implemented our methods in AT&T’s Gigascope data stream management system, to demonstrate their practicality at streaming line speeds.
{"title":"Fractal Modeling of IP Network Traffic at Streaming Speeds","authors":"Flip Korn, S. Muthukrishnan, Yihua Wu","doi":"10.1109/ICDE.2006.73","DOIUrl":"https://doi.org/10.1109/ICDE.2006.73","url":null,"abstract":"This paper describes how to fit fractal models, online, on IP traffic data streams. Our approach relies on maintaining a sketch of the data stream and fitting straight lines: it yields algorithms that are fast, space-efficient, and accurate. We implemented our methods in AT&T’s Gigascope data stream management system, to demonstrate their practicality at streaming line speeds.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"19 1","pages":"155-155"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76446483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
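The abstract above reduces fractal modeling to fitting straight lines. The sketch below is an offline illustration only. It is not the paper's sketch-based streaming algorithm, which is not reproduced here. It uses the aggregated-variance method to estimate the Hurst parameter H of a traffic series. The variance of block means over blocks of size m scales roughly as m^(2H-2), so a least-squares line through log variance versus log m has slope 2H-2. All function names are hypothetical.

```python
import math
import random

def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def aggregated_variance(series, m):
    """Variance of the means of non-overlapping, full blocks of size m."""
    blocks = [sum(series[i:i + m]) / m for i in range(0, len(series) - m + 1, m)]
    mean = sum(blocks) / len(blocks)
    return sum((b - mean) ** 2 for b in blocks) / len(blocks)

def estimate_hurst(series, block_sizes):
    """Fit a straight line on the log-log plot; its slope equals 2H - 2."""
    xs = [math.log(m) for m in block_sizes]
    ys = [math.log(aggregated_variance(series, m)) for m in block_sizes]
    return 1 + least_squares_slope(xs, ys) / 2

random.seed(0)
# Independent Gaussian noise has no long-range dependence, so H should be near 0.5.
noise = [random.gauss(0, 1) for _ in range(2 ** 14)]
print(round(estimate_hurst(noise, [1, 2, 4, 8, 16, 32, 64]), 2))
```

For independent noise the estimate should land near 0.5. Values well above 0.5 indicate the long-range dependence typical of measured IP traffic.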
We propose Parallel Schema Matching (PSM), a new approach to complex schema matching across the query interfaces of different Web databases. A parallel schema is formed by comparing two individual schemas and deleting their common attributes. When many parallel schemas are available, attribute matchings can be discovered from attribute-occurrence patterns. A count-based greedy algorithm identifies which attributes are most likely to match. Experiments show that PSM identifies both simple and complex matchings accurately and efficiently.
{"title":"Holistic Query Interface Matching using Parallel Schema Matching","authors":"Weifeng Su, Jiying Wang, F. Lochovsky","doi":"10.1109/ICDE.2006.77","DOIUrl":"https://doi.org/10.1109/ICDE.2006.77","url":null,"abstract":"Using query interfaces of different Web databases, we propose a new complex schema matching approach, Parallel Schema Matching (PSM). A parallel schema is formed by comparing two individual schemas and deleting common attributes. The attribute matching can be discovered from the attribute-occurrence patterns if many parallel schemas are available. A count-based greedy algorithm identifies which attributes are more likely to be matched. Experiments show that PSM can identify both simple matching and complex matching accurately and efficiently.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"6 1","pages":"122-122"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74911164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
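The parallel-schema step and a count-based greedy pass can be sketched as follows. This is a simplified reading of the abstract, not the authors' algorithm. It finds only 1:1 (simple) matchings and scores pairs by raw counts. It also adds one assumption of its own: two attributes that appear together on the same form are taken not to be synonyms. The schemas and helper names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def parallel_schema(s1, s2):
    """Delete the attributes two schemas share and keep what differs on each side."""
    common = s1 & s2
    return s1 - common, s2 - common

def count_opposite_pairs(schemas):
    """Count how often two attributes land on opposite sides of a parallel schema."""
    counts = Counter()
    for s1, s2 in combinations(schemas, 2):
        left, right = parallel_schema(s1, s2)
        for a in left:
            for b in right:
                counts[frozenset((a, b))] += 1
    return counts

def greedy_match(schemas):
    """Accept the highest-count pairs first, never reusing a matched attribute."""
    counts = count_opposite_pairs(schemas)
    matched, result = set(), []
    for pair, _ in counts.most_common():
        # Assumption (not from the paper): synonyms rarely share a form.
        if any(pair <= s for s in schemas):
            continue
        if not pair & matched:
            result.append(tuple(sorted(pair)))
            matched |= pair
    return result

# Hypothetical query interfaces of three book-search forms.
forms = [
    {"title", "author", "isbn"},
    {"title", "writer", "isbn", "price"},
    {"title", "author", "price"},
]
print(greedy_match(forms))
```

Without the co-occurrence filter, the greedy pass would also accept the isbn/price pair, even though both attributes appear on the second form. In this toy example, the filter is what rejects that pair.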
We propose a framework, called MIC, which adopts an information-theoretic approach to the problem of quantitative association rule mining. In the MIC framework, we first discretize the quantitative attributes. We then compute the normalized mutual information between attributes to construct a graph whose edges indicate strong informative relationships between them. We use the cliques in this graph to prune unpromising attribute sets, and hence the joined intervals over those attributes. Our experimental results show that the MIC framework significantly improves mining speed. Importantly, it still obtains most of the high-confidence rules, and the rules it misses are shown to be less interesting.
{"title":"MIC Framework: An Information-Theoretic Approach to Quantitative Association Rule Mining","authors":"Yiping Ke, James Cheng, Wilfred Ng","doi":"10.1109/ICDE.2006.94","DOIUrl":"https://doi.org/10.1109/ICDE.2006.94","url":null,"abstract":"We propose a framework, called MIC, which adopts an information-theoretic approach to address the problem of quantitative association rule mining. In our MIC framework, we first discretize the quantitative attributes. Then, we compute the normalized mutual information between the attributes to construct a graph that indicates the strong informative-relationship between the attributes. We utilize the cliques in the graph to prune the unpromising attribute sets and hence the joined intervals between these attributes. Our experimental results show that the MIC framework significantly improves the mining speed. Importantly, we are able to obtain most of the high-confidence rules and the missing rules are shown to be less interesting.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"11 1","pages":"112-112"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75139581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
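The information-theoretic core of MIC can be sketched in three steps. First, discretize each quantitative attribute. Next, compute normalized mutual information for every pair of attributes. Finally, keep a graph edge wherever that value reaches a threshold. Several details here are assumptions, and the paper's exact choices may differ: equal-width binning, normalization by min(H(X), H(Y)), the threshold value, and the toy table. The clique-based pruning step is not shown.

```python
import math
from collections import Counter
from itertools import combinations

def entropy(values):
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def normalized_mi(xs, ys):
    """Assumed normalization: divide by the smaller marginal entropy."""
    denom = min(entropy(xs), entropy(ys))
    return mutual_information(xs, ys) / denom if denom > 0 else 0.0

def discretize(values, bins):
    """Assumed equal-width binning into bin indices 0..bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0
    return [min(int((v - lo) / width), bins - 1) for v in values]

def mi_graph(table, threshold):
    """Edges between attributes whose normalized MI reaches the threshold."""
    return {(a, b) for a, b in combinations(sorted(table), 2)
            if normalized_mi(table[a], table[b]) >= threshold}

# Hypothetical table: salary tracks age, id_parity is unrelated to both.
table = {
    "age": discretize([22, 25, 31, 38, 45, 52, 58, 63], bins=2),
    "salary": discretize([20, 24, 35, 41, 50, 58, 60, 66], bins=2),
    "id_parity": discretize([1, 0, 1, 0, 1, 0, 1, 0], bins=2),
}
print(mi_graph(table, threshold=0.5))
```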
We consider the problem of speeding up Entity Recognition systems that exploit existing large databases of structured entities to improve extraction accuracy. These systems must compute the maximum similarity scores between several overlapping segments of the input text and the entity database. We formulate a Batch-Top-K problem with the goal of sharing computation across overlapping segments. Our proposed algorithm is three times faster than independent Top-K queries and within a factor of two of an unachievable lower bound on total cost. We then propose a novel modification of the popular Viterbi algorithm for recognizing entities so that it works with easily computable bounds on match scores, reducing total inference time by a factor of eight compared to state-of-the-art methods.
{"title":"Efficient Batch Top-k Search for Dictionary-based Entity Recognition","authors":"Amit Chandel, P. Nagesh, Sunita Sarawagi","doi":"10.1109/ICDE.2006.55","DOIUrl":"https://doi.org/10.1109/ICDE.2006.55","url":null,"abstract":"We consider the problem of speeding up Entity Recognition systems that exploit existing large databases of structured entities to improve extraction accuracy. These systems require the computation of the maximum similarity scores of several overlapping segments of the input text with the entity database. We formulate a Batch-Top-K problem with the goal of sharing computations across overlapping segments. Our proposed algorithm performs a factor of three faster than independent Top-K queries and only a factor of two slower than an unachievable lower bound on total cost. We then propose a novel modification of the popular Viterbi algorithm for recognizing entities so as to work with easily computable bounds on match scores, thereby reducing the total inference time by a factor of eight compared to state-of-the-art methods.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"8 1","pages":"28-28"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76272404","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Suppose that Alice, the owner of a k-anonymous database, needs to determine whether her database, when adjoined with a tuple owned by Bob, is still k-anonymous. Suppose moreover that access to the database is strictly controlled because, for example, the data are used in experiments that must remain confidential. Clearly, allowing Alice to read the contents of the tuple directly breaks Bob's privacy; on the other hand, the confidentiality of Alice's database is violated once Bob has access to its contents. Thus the problem is to check whether the database adjoined with the tuple is still k-anonymous without letting Alice learn the contents of the tuple or Bob learn the contents of the database. In this paper, we propose two protocols that solve this problem.
{"title":"Private Updates to Anonymous Databases","authors":"Alberto Trombetta, E. Bertino","doi":"10.1109/ICDE.2006.117","DOIUrl":"https://doi.org/10.1109/ICDE.2006.117","url":null,"abstract":"Suppose that Alice, owner of a k-anonymous database, needs to determine whether her database, when adjoined with a tuple owned by Bob, is still k-anonymous. Suppose moreover that access to the database is strictly controlled, because for example data are used for experiments that need to be maintained confidential. Clearly, allowing Alice to directly read the contents of the tuple breaks the privacy of Bob; on the other hand, the confidentiality of the database managed by Alice is violated once Bob has access to the contents of the database. Thus the problem is to check whether the database adjoined with the tuple is still k-anonymous, without letting Alice and Bob know the contents of, respectively, the tuple and the database. In this paper, we propose two protocols solving this problem.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"18 1","pages":"116-116"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81395676","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
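For reference, the predicate that Alice and Bob want to evaluate can be written in the clear. This sketch offers no privacy at all: it reads both the table and the tuple directly, which is exactly what the paper's protocols avoid. Its only purpose is to pin down what "still k-anonymous" means. It assumes Bob's tuple is already generalized in the same way as Alice's table. The data and function names are hypothetical.

```python
from collections import Counter

def is_k_anonymous(rows, quasi_ids, k):
    """Every combination of quasi-identifier values must occur in at least k rows."""
    groups = Counter(tuple(row[a] for a in quasi_ids) for row in rows)
    return all(size >= k for size in groups.values())

def still_k_anonymous(rows, new_row, quasi_ids, k):
    """Non-private check: is the table adjoined with new_row still k-anonymous?"""
    return is_k_anonymous(rows + [new_row], quasi_ids, k)

# Hypothetical generalized table owned by Alice (k = 2 on age and zip).
QI = ["age", "zip"]
db = [
    {"age": "20-29", "zip": "130**", "disease": "flu"},
    {"age": "20-29", "zip": "130**", "disease": "cold"},
    {"age": "30-39", "zip": "148**", "disease": "flu"},
    {"age": "30-39", "zip": "148**", "disease": "asthma"},
]
bob_ok = {"age": "20-29", "zip": "130**", "disease": "flu"}
bob_new = {"age": "40-49", "zip": "130**", "disease": "flu"}
print(still_k_anonymous(db, bob_ok, QI, 2), still_k_anonymous(db, bob_new, QI, 2))
```

Because the original table is already k-anonymous, for k ≥ 2 the check reduces to one question: do the new tuple's quasi-identifier values coincide with an existing group? A match only enlarges that group. A new combination forms a group of size 1.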
Fatih Emekçi, D. Agrawal, A. E. Abbadi, Aziz Gulbeden
Data integration from multiple autonomous data sources has emerged as an important practical problem. The key requirement for such integration is that data owners must cooperate, in most cases within a competitive landscape. The research challenge in developing a query processing solution is to answer queries while preserving the privacy of the data sources. In general, allowing unrestricted read access to all of the data may create vulnerabilities and may also have legal implications. Therefore, there is a need for privacy-preserving database operations for querying data residing at different parties. In this paper, we propose a new query processing technique that uses third parties in a peer-to-peer system. We propose and evaluate two different protocols for various database operations. Our scheme answers queries without revealing any useful information to the data sources or to the third parties. An analytical comparison with other recent proposals for privacy-preserving data integration establishes the superiority of the proposed approach in terms of query response time.
{"title":"Privacy Preserving Query Processing Using Third Parties","authors":"Fatih Emekçi, D. Agrawal, A. E. Abbadi, Aziz Gulbeden","doi":"10.1109/ICDE.2006.116","DOIUrl":"https://doi.org/10.1109/ICDE.2006.116","url":null,"abstract":"Data integration from multiple autonomous data sources has emerged as an important practical problem. The key requirement for such data integration is that owners of such data need to cooperate in a competitive landscape in most of the cases. The research challenge in developing a query processing solution is that the answers to the queries need to be provided while preserving the privacy of the data sources. In general, allowing unrestricted read access to the whole data may give rise to potential vulnerabilities as well as may have legal implications. Therefore, there is a need for privacy preserving database operations for querying data residing at different parties. In this paper, we propose a new query processing technique using third parties in a peer-to-peer system. We propose and evaluate two different protocols for various database operations. Our scheme is able to answer queries without revealing any useful information to the data sources or to the third parties. Analytical comparison of the proposed approach with other recent proposals for privacy-preserving data integration establishes the superiority of the proposed approach in terms of query response time","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"20 1","pages":"27-27"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82361388","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The XPath language incorporates powerful primitives for formulating queries containing nested subexpressions that are existentially or universally quantified. However, even the best published approaches to evaluating XPath perform unsatisfactorily on nested queries. We examine optimization techniques that unnest complex XPath queries. To this end, we classify XPath expressions, particularly with regard to properties that are relevant for unnesting. We present algebraic equivalences that transform nested expressions into unnested ones. In our experiments, we compare evaluation times against existing XPath evaluators and against naive evaluation.
{"title":"Algebraic Optimization of Nested XPath Expressions","authors":"M. Brantner, C. Kanne, G. Moerkotte, S. Helmer","doi":"10.1109/ICDE.2006.15","DOIUrl":"https://doi.org/10.1109/ICDE.2006.15","url":null,"abstract":"The XPath language incorporates powerful primitives for formulating queries containing nested subexpressions which are existentially or universally quantified. However, even the best published approaches for evaluating XPath have unsatisfactory performance when applied to nested queries. We examine optimization techniques that unnest complex XPath queries. For this purpose, we classify XPath expressions particularly with regard to properties that are relevant for unnesting. We present algebraic equivalences that transform nested expressions into unnested expressions. In our experiments we compare the evaluation times with existing XPath evaluators and the naive evaluation.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"116 1","pages":"128-128"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79192478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
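Python's xml.etree.ElementTree can illustrate the general idea behind unnesting an existential predicate such as //a[b/c]. The naive plan re-evaluates the inner path b/c for every candidate a. The unnested plan evaluates the inner path once over the whole document and joins the matches back to their a ancestors, which amounts to a semijoin. This is a generic illustration, not the algebraic equivalences from the paper. The document and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: only a elements 1 and 3 have a b child with a c child.
DOC = """<root>
  <a id="1"><b><c/></b></a>
  <a id="2"><b/></a>
  <x><a id="3"><b><c/><c/></b></a></x>
  <a id="4"><c/></a>
</root>"""

def naive(root):
    """Nested plan: run the inner path b/c once per candidate a."""
    return [a.get("id") for a in root.iter("a") if a.find("b/c") is not None]

def unnested(root):
    """Set-at-a-time plan: evaluate b/c once, then semijoin with the a elements."""
    parent = {child: p for p in root.iter() for child in p}
    hits = set()
    for c in root.iter("c"):
        b = parent.get(c)
        if b is not None and b.tag == "b":
            a = parent.get(b)
            if a is not None and a.tag == "a":
                hits.add(a)
    # Filtering the outer sequence restores document order and drops duplicates.
    return [a.get("id") for a in root.iter("a") if a in hits]

root = ET.fromstring(DOC)
print(naive(root), unnested(root))
```

Both plans return the same ids. The unnested plan touches each c element once, regardless of how many a candidates there are.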
Widespread use of database systems in modern society has created the need to let inexperienced users search a database easily, without specific knowledge of a query language. Several recent research efforts have focused on supporting keyword-based search over relational databases. This paper presents an alternative proposal and introduces the idea of précis queries. These are free-form queries whose answer (a précis) is a synthesis of results, containing not only information directly related to the query selections but also information implicitly related to them in various ways. Our approach to précis queries includes two additional novelties: (a) queries generate entire multi-relation databases rather than individual relations; and (b) query results are personalized to user-specific and/or domain requirements. We develop a framework and system architecture for supporting such queries in a relational database system and describe algorithms that implement the required functionality. Finally, we present experimental results that evaluate the proposed algorithms and show the potential of this work.
{"title":"Précis: The Essence of a Query Answer","authors":"G. Koutrika, A. Simitsis, Y. Ioannidis","doi":"10.1109/ICDE.2006.114","DOIUrl":"https://doi.org/10.1109/ICDE.2006.114","url":null,"abstract":"Widespread use of database systems in modern society has brought the need to provide inexperienced users with the ability to easily search a database with no specific knowledge of a query language. Several recent research efforts have focused on supporting keyword-based searches over relational databases. This paper presents an alternative proposal and introduces the idea of précis queries. These are free-form queries whose answer (a précis) is a synthesis of results, containing not only information directly related to the query selections but also information implicitly related to them in various ways. Our approach to précis queries includes two additional novelties: (a) queries do not generate individual relations but entire multi-relation databases; and (b) query results are personalized to user-specific and/or domain requirements. We develop a framework and system architecture for supporting such queries in the context of a relational database system and describe algorithms that implement the required functionality. Finally, we present a set of experimental results that evaluate the proposed algorithms and show the potential of this work.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"29 1","pages":"69-69"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76815400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we present UNIDOOR, a deductive object-oriented database system (DOOD). We demonstrate the distinctive features of the UNIDOOR data model and its query language. We then show how essential object-oriented and database management features that were missing from other DOOD implementations are successfully supported in UNIDOOR. These features include a scalable persistent store with crash recovery, database integrity, and transaction control facilities in a multi-user environment.
{"title":"UNIDOOR: a Deductive Object-Oriented Database Management System","authors":"M. Jaber, A. Voronkov","doi":"10.1109/ICDE.2006.164","DOIUrl":"https://doi.org/10.1109/ICDE.2006.164","url":null,"abstract":"In this paper, we present UNIDOOR, a deductive object-oriented database system (DOOD). We demonstrate the distinctive features of UNIDOOR data model and its query language. We then show how essential object-oriented and database management features, that were missing from other DOOD implementations, are successfully supported in UNIDOOR. These features include a scalable persistent store with crash recovery, database integrity and transaction control facilities in a multi-user environment.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"112 1","pages":"157-157"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77029194","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Many real-world applications deal with transactional data, characterized by a huge number of transactions (tuples) with a small number of dimensions (attributes). However, other applications involve high-dimensional data with a small number of tuples; examples include bioinformatics, survey-based statistical analysis, and text processing. High-dimensional data pose great challenges to most existing data mining algorithms. Although there are numerous algorithms for transactional data sets, few are oriented to very high-dimensional data sets with a relatively small number of tuples.
{"title":"Top-Down Mining of Interesting Patterns from Very High Dimensional Data","authors":"Hongyan Liu, Jiawei Han, Dong Xin, Zheng Shao","doi":"10.1109/ICDE.2006.161","DOIUrl":"https://doi.org/10.1109/ICDE.2006.161","url":null,"abstract":"Many real world applications deal with transactional data, characterized by a huge number of transactions (tuples) with a small number of dimensions (attributes). However, there are some other applications that involve rather high dimensional data with a small number of tuples. Examples of such applications include bioinformatics, survey-based statistical analysis, text processing, and so on. High dimensional data pose great challenges to most existing data mining algorithms. Although there are numerous algorithms dealing with transactional data sets, there are few algorithms oriented to very high dimensional data sets with a relatively small number of tuples.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"411 1","pages":"114-114"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79923232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}