
Proceedings 18th International Conference on Data Engineering: Latest Publications

Mapping XML and relational schemas with Clio
Pub Date : 2002-02-26 DOI: 10.1109/ICDE.2002.994768
Mauricio A. Hernández, Lucian Popa, Yannis Velegrakis, Renée J. Miller, Felix Naumann, C. T. H. Ho
Merging and coalescing data from multiple and diverse sources into different data formats continues to be an important problem in modern information systems. Schema matching (the process of matching elements of a source schema with elements of a target schema) and schema mapping (the process of creating a query that maps between two disparate schemas) are at the heart of data integration systems. We demonstrate Clio, a semi-automatic schema mapping tool developed at the IBM Almaden Research Center. In this paper, we showcase Clio's mapping engine which allows mapping to and from relational and XML schemas, and takes advantage of data constraints in order to preserve data associations.
Citations: 56
GADT: a probability space ADT for representing and querying the physical world
Pub Date : 2002-02-26 DOI: 10.1109/ICDE.2002.994710
Anton Faradjian, J. Gehrke, Philippe Bonnet
Large sensor networks are being widely deployed for measurement, detection and monitoring applications. Many of these applications involve database systems to store and process data from the physical world. This data has inherent measurement uncertainties that are properly represented by continuous probability distribution functions (PDFs). We introduce a new object-relational abstract data type (ADT) - the Gaussian ADT (GADT) - that models physical data as Gaussian PDFs, and we show that existing index structures can be used as fast access methods for GADT data. We also present a measurement-theoretic model of probabilistic data and evaluate GADT in its light.
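The core idea, storing a reading as a Gaussian PDF rather than a point value and querying it probabilistically, can be sketched in a few lines. The class and method names below (`GaussianMeasurement`, `prob_in`) are illustrative assumptions, not the paper's actual ADT interface:

```python
import math

class GaussianMeasurement:
    """A sensor reading modeled as a Gaussian PDF (mean, stddev)
    rather than a single number; a sketch of the GADT idea."""

    def __init__(self, mean, std):
        self.mean, self.std = mean, std

    def cdf(self, x):
        # P(true value <= x) for a Gaussian, via the error function.
        return 0.5 * (1.0 + math.erf((x - self.mean) / (self.std * math.sqrt(2.0))))

    def prob_in(self, lo, hi):
        # Probability that the true value lies in [lo, hi].
        return self.cdf(hi) - self.cdf(lo)

# A probabilistic selection: keep readings whose true value exceeds
# 25.0 with at least 90% probability.
readings = [GaussianMeasurement(26.0, 0.5), GaussianMeasurement(25.2, 1.0)]
hot = [r for r in readings if 1.0 - r.cdf(25.0) >= 0.9]
```

Threshold selections of this kind are the sort of query that, per the abstract, existing index structures can be made to serve as fast access methods.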
Citations: 88
Discovering similar multidimensional trajectories
Pub Date : 2002-02-26 DOI: 10.1109/ICDE.2002.994784
M. Vlachos, D. Gunopulos, G. Kollios
We investigate techniques for analysis and retrieval of object trajectories in two- or three-dimensional space. Such data usually contain a large amount of noise, which has made previously used metrics fail. Therefore, we formalize non-metric similarity functions based on the longest common subsequence (LCSS), which are very robust to noise and furthermore provide an intuitive notion of similarity between trajectories by giving more weight to similar portions of the sequences. Stretching of sequences in time is allowed, as well as global translation of the sequences in space. Efficient approximate algorithms that compute these similarity measures are also provided. We compare these new methods to the widely used Euclidean and time-warping distance functions (for real and synthetic data) and show the superiority of our approach, especially in the strong presence of noise. We prove a weaker version of the triangle inequality and employ it in an indexing structure to answer nearest-neighbor queries. Finally, we present experimental results that validate the accuracy and efficiency of our approach.
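The LCSS measure admits a standard dynamic program. Below is a minimal sketch for 2-D trajectories, where `eps` bounds the per-coordinate distance of matched points and `delta` bounds how far apart in time they may be; the exact matching conditions are my reading of the general LCSS scheme, not the paper's precise definition:

```python
def lcss(a, b, eps, delta):
    """Length of the longest common subsequence of two 2-D trajectories,
    where points match when close in space (eps) and in time (delta)."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            (ax, ay), (bx, by) = a[i - 1], b[j - 1]
            if abs(ax - bx) < eps and abs(ay - by) < eps and abs(i - j) <= delta:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]

def similarity(a, b, eps, delta):
    # Normalize by the shorter trajectory to get a value in [0, 1].
    return lcss(a, b, eps, delta) / min(len(a), len(b))
```

Because unmatched (noisy) points simply contribute nothing, outliers lower the score gracefully instead of dominating it the way they do under Euclidean distance.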
Citations: 1526
Extensible and similarity-based grouping for data integration
Pub Date : 2002-02-26 DOI: 10.1109/ICDE.2002.994731
E. Schallehn, K. Sattler, G. Saake
The general concept of grouping and aggregation appears to be a fitting paradigm for various issues in data integration, but in its common form of equality-based grouping, a number of problems remain unsolved. We propose a generic approach to user-defined grouping as part of a SQL extension, allowing for more complex functions, for instance integration of data mining algorithms. Furthermore, we discuss high-level language primitives for common applications.
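One common way to realize similarity-based grouping is to group by the transitive closure of a pairwise similarity predicate. The sketch below uses a string-similarity predicate as a stand-in for a user-defined one; it is illustrative, not Schallehn et al.'s actual operator:

```python
from difflib import SequenceMatcher

def similar(a, b, threshold=0.8):
    # Stand-in predicate: string edit similarity above a threshold.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def similarity_groups(values, threshold=0.8):
    """Group values by the transitive closure of pairwise similarity:
    a value joins (and merges) every group containing something similar."""
    groups = []
    for v in values:
        merged = [g for g in groups if any(similar(v, w, threshold) for w in g)]
        for g in merged:
            groups.remove(g)
        groups.append(sum(merged, []) + [v])
    return groups
```

In the SQL extension the paper proposes, both the predicate and the grouping strategy would be supplied by the user, which is what makes integration of, e.g., data mining algorithms possible.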
Citations: 13
NeT and CoT: inferring XML schemas from relational world
Pub Date : 2002-02-26 DOI: 10.1109/ICDE.2002.994721
Dongwon Lee, Murali Mani, Frank Chiu, W. Chu
Two conversion algorithms, called NeT and CoT, to translate relational schemas to XML schemas using various semantic constraints are presented. We first present a language-independent formalism named XSchema so that our algorithms are able to generate output schema in various XML schema language proposals. The benefits of such a formalism are that it is both precise and concise. Based on the XSchema formalism, our proposed algorithms have the following characteristics: (1) NeT derives a nested structure from a flat relational model by repeatedly applying the nest operator so that the resulting XML schema becomes hierarchical, and (2) CoT considers not only the structure of relational schemas, but also inclusion dependencies during the translation so that relational schemas where multiple tables are interconnected through inclusion dependencies can also be handled.
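The nest operator at the heart of NeT can be illustrated on a flat relation of dicts. This is a hypothetical sketch, not the authors' implementation; applying it repeatedly over different columns yields the hierarchical structure the abstract describes:

```python
from collections import OrderedDict

def nest(rows, column):
    """Nest operator: group tuples that agree on every attribute except
    `column`, collecting that column's values into a list."""
    grouped = OrderedDict()
    for row in rows:
        key = tuple((k, v) for k, v in row.items() if k != column)
        grouped.setdefault(key, []).append(row[column])
    return [dict(key, **{column: values}) for key, values in grouped.items()]

flat = [
    {"dept": "CS", "student": "Ann"},
    {"dept": "CS", "student": "Bob"},
    {"dept": "EE", "student": "Eva"},
]
nested = nest(flat, "student")
# nested[0] == {"dept": "CS", "student": ["Ann", "Bob"]}
```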
Citations: 5
Efficient algorithm for projected clustering
Pub Date : 2002-02-26 DOI: 10.1109/ICDE.2002.994727
Eric Ka Ka Ng, A. Fu
With high-dimensional data, natural clusters are expected to exist in different subspaces. We propose the EPC (efficient projected clustering) algorithm to discover the sets of correlated dimensions and the location of the clusters. This algorithm is quite different from previous approaches and has the following advantages: (1) there is no requirement on the input regarding the number of natural clusters and the average cardinality of the subspaces; (2) it can handle clusters of irregular shapes; (3) it produces better clustering results compared to the best previous method; (4) it has high scalability. From experiments, it is several times faster than the previous method, while producing more accurate results.
Citations: 7
A distributed database server for continuous media
Pub Date : 2002-02-26 DOI: 10.1109/ICDE.2002.994764
Walid G. Aref, A. Catlin, A. Elmagarmid, Jianping Fan, J. Guo, M. Hammad, I. Ilyas, M. Marzouk, Sunil Prabhakar, A. Rezgui, S. Teoh, Evimaria Terzi, Yi-Cheng Tu, A. Vakali, Xingquan Zhu
In our project, we are adopting a new approach for handling video data. We view the video as a well-defined data type with its own description, parameters and applicable methods. The system is based on PREDATOR, an open-source object-relational DBMS. PREDATOR uses Shore as the underlying storage manager. Supporting video operations (storing, searching-by-content and streaming) and new query types (query-by-example and multi-feature similarity searching) requires major changes in many of the traditional system components. More specifically, the storage and buffer manager has to deal with huge volumes of data with real-time constraints. Query processing has to consider the video methods and operators in generating, optimizing and executing the query plans.
Citations: 14
Keyword searching and browsing in databases using BANKS
Pub Date : 2002-02-26 DOI: 10.1109/ICDE.2002.994756
Gaurav Bhalotia, Arvind Hulgeri, Charuta Nakhe, Soumen Chakrabarti, S. Sudarshan
With the growth of the Web, there has been a rapid increase in the number of users who need to access online databases without having a detailed knowledge of the schema or of query languages; even relatively simple query languages designed for non-experts are too complicated for them. We describe BANKS, a system which enables keyword-based search on relational databases, together with data and schema browsing. BANKS enables users to extract information in a simple manner without any knowledge of the schema or any need for writing complex queries. A user can get information by typing a few keywords, following hyperlinks, and interacting with controls on the displayed results. BANKS models tuples as nodes in a graph, connected by links induced by foreign key and other relationships. Answers to a query are modeled as rooted trees connecting tuples that match individual keywords in the query. Answers are ranked using a notion of proximity coupled with a notion of prestige of nodes based on inlinks, similar to techniques developed for Web search. We present an efficient heuristic algorithm for finding and ranking query results.
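The answer model, tuples as graph nodes and answers as rooted trees connecting keyword matches, can be approximated with plain BFS. This simplified sketch picks the root minimizing summed hop distance to one match per keyword; BANKS' actual backward expanding search additionally weights edges and factors in node prestige:

```python
from collections import deque

def bfs_dist(graph, src):
    # Hop distance from src to every reachable node (unweighted edges).
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def best_answer_root(graph, keyword_groups):
    """Return the root of the cheapest answer tree: the node minimizing
    the summed distance to one matching tuple per keyword group."""
    best, best_cost = None, float("inf")
    for node in graph:
        cost = 0
        for group in keyword_groups:
            cost += min(bfs_dist(graph, m).get(node, float("inf")) for m in group)
        if cost < best_cost:
            best, best_cost = node, cost
    return best, best_cost
```

With tuples linked through foreign keys, the root found this way is typically the tuple (say, a paper) that joins the matching tuples (say, two authors) into one meaningful answer.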
Citations: 1070
Efficient OLAP query processing in distributed data warehouses
Pub Date : 2002-02-26 DOI: 10.1109/ICDE.2002.994716
M. Akinde, Michael H. Böhlen, T. Johnson, L. Lakshmanan, D. Srivastava
The success of Internet applications has led to an explosive growth in the demand for bandwidth from ISPs. Managing an IP network includes complex data analysis that can often be expressed as OLAP queries. Current day OLAP tools assume the availability of the detailed data in a centralized warehouse. However, the inherently distributed nature of the data collection (e.g., flow-level traffic statistics are gathered at network routers) and the huge amount of data extracted at each collection point (of the order of several gigabytes per day for large IP networks) makes such an approach highly impractical. The natural solution to this problem is to maintain a distributed data warehouse, consisting of multiple local data warehouses (sites) adjacent to the collection points, together with a coordinator. In order for such a solution to make sense, we need a technology for distributed processing of complex OLAP queries. We have developed the Skalla system for this task. We conducted an experimental study of the Skalla evaluation scheme using TPC(R) data.
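The reason a coordinator plus local warehouses can work is that distributive aggregates combine cleanly from partial results, so detailed data never has to leave the collection sites. A toy sketch of that division of labor (not Skalla's actual protocol):

```python
def local_aggregate(rows):
    # Each site reduces its own detail data to small partial aggregates.
    return {"sum": sum(rows), "count": len(rows)}

def coordinator_merge(partials):
    # The coordinator combines partials into the global OLAP answer;
    # AVG is derived from SUM and COUNT rather than shipped directly.
    total = sum(p["sum"] for p in partials)
    count = sum(p["count"] for p in partials)
    return {"sum": total, "count": count, "avg": total / count}
```

Only the small partial-aggregate dictionaries cross the network, which is what makes the approach practical at gigabytes of traffic data per site per day.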
Citations: 82
Efficiently ordering query plans for data integration
Pub Date : 1999-07-31 DOI: 10.1109/ICDE.2002.994753
A. Doan, A. Halevy
The goal of a data integration system is to provide a uniform interface to a multitude of data sources. Given a user query formulated in this interface, the system translates it into a set of query plans. Each plan is a query formulated over the data sources, and specifies a way to access sources and combine data to answer the user query. In practice, when the number of sources is large, a data-integration system must generate and execute many query plans with significantly varying utilities. Hence, it is crucial that the system finds the best plans efficiently and executes them first, to guarantee acceptable time to, and quality of, the first answers. We describe efficient solutions to this problem. First, we formally define the problem of ordering query plans. Second, we identify several interesting structural properties of the problem and describe three ordering algorithms that exploit these properties. Finally, we describe experimental results that provide guidance on which algorithms perform best under which conditions.
Citations: 43