Algebraic signatures for scalable distributed data structures
W. Litwin, T. Schwarz
Pub Date: 2004-03-30 | DOI: 10.1109/ICDE.2004.1320015
Signatures detect changes to data objects. Numerous schemes are in use, most notably the cryptographically secure SHA-1 standard. We propose a novel signature scheme that we call algebraic signatures. The scheme uses Galois field calculations. Its major property is the certain detection of any change up to a parameterized size: for an n-symbol algebraic signature, we detect with certainty any change that does not exceed n symbols. This property is new among known signature schemes. For larger changes, the collision probability is typically negligible, as with other known schemes. We apply algebraic signatures to scalable distributed data structures (SDDS). At the SDDS client node, we filter out updates that do not actually change the records. We also manage concurrent updates to data stored in the SDDS RAM buckets at the server nodes, and we further use the scheme for fast disk backup of these buckets. We sign our objects with 4-byte signatures instead of the 20-byte standard SHA-1 signatures, and our algebraic calculus is then also about twice as fast.
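As a rough illustration of the kind of calculation the abstract describes, the following Python sketch computes an n-symbol signature over GF(2^8) by weighting a page's symbols with powers of a primitive element. The specific field, primitive polynomial, and weighting are assumptions for illustration, not necessarily the authors' exact construction; the paper should be consulted for the precise scheme and its detection guarantee.

```python
# Minimal sketch of an n-symbol "algebraic signature" over GF(2^8).
# Field arithmetic uses exp/log tables built from the primitive
# polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11d).  Component i weights the
# page's symbols by successive powers of alpha^i, alpha being a
# primitive element of the field.

EXP = [0] * 512
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & 0x100:
        x ^= 0x11D
for i in range(255, 512):
    EXP[i] = EXP[i - 255]

def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^8) elements via the log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

def algebraic_signature(page: bytes, n: int = 4) -> bytes:
    """n-symbol signature: component i = XOR-sum over j of page[j] * (alpha^i)^j."""
    sig = []
    for i in range(1, n + 1):
        acc, w = 0, 1                    # w runs through (alpha^i)^j, starting at j = 0
        for sym in page:
            acc ^= gf_mul(sym, w)        # addition in GF(2^8) is XOR
            w = gf_mul(w, EXP[i % 255])  # advance to the next power
        sig.append(acc)
    return bytes(sig)

# A change confined to at most n symbols alters the signature.
a = bytearray(b"record: balance=100")
b = bytearray(a)
b[16:19] = b"999"                        # a 3-symbol change
assert algebraic_signature(bytes(a)) != algebraic_signature(bytes(b))
```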
{"title":"Algebraic signatures for scalable distributed data structures","authors":"W. Litwin, T. Schwarz","doi":"10.1109/ICDE.2004.1320015","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1320015","url":null,"abstract":"Signatures detect changes to data objects. Numerous schemes are in use, especially the cryptographically secure standards SHA-1. We propose a novel signature scheme which we call algebraic signatures. The scheme uses the Galois field calculations. Its major property is the sure detection of any changes up to a parameterized size. More precisely, we detect for sure any changes that do not exceed n-symbols for an n-symbol algebraic signature. This property is new for any known signature scheme. For larger changes, the collision probability is typically negligible, as for the other known schemes. We apply the algebraic signatures to the scalable distributed data structures (SDDS). We filter at the SDDS client node the updates that do not actually change the records. We also manage the concurrent updates to data stored in the SDDS RAM buckets at the server nodes. We further use the scheme for the fast disk backup of these buckets. We sign our objects with 4-byte signatures, instead of 20-byte standard SHA-1 signatures. Our algebraic calculus is then also about twice as fast.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"197 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121149513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Continuously maintaining quantile summaries of the most recent N elements over a data stream
Xuemin Lin, Hongjun Lu, Jian Xu, J. Yu
Pub Date: 2004-03-30 | DOI: 10.1109/ICDE.2004.1320011
Statistics over the most recently observed data elements are often required in applications involving data streams, such as intrusion detection in network monitoring, stock price prediction in financial markets, Web log mining for access prediction, and user click stream mining for personalization. Among these statistics, computing a quantile summary is probably the most challenging because of its complexity. We study the problem of continuously maintaining a quantile summary of the most recently observed N elements of a stream so that quantile queries can be answered with a guaranteed precision of εN. We develop a space-efficient algorithm for a predefined N that requires only one scan of the input data stream and O(log(ε²N)/ε + 1/ε²) space in the worst case. We also develop an algorithm that maintains quantile summaries for the most recent N elements so that quantile queries on any of the most recent n elements (n ≤ N) can be answered with a guaranteed precision of εn. The worst-case space requirement of this algorithm is only O(log²(εN)/ε²). Our performance study indicates that not only is the actual quantile estimation error far below the guaranteed precision, but the space requirement is also much smaller than the theoretical bound.
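To make the guarantee concrete, the toy Python sketch below shows what an εN-precision summary promises: keeping roughly one element per ⌈εN⌉ ranks of the sorted window lets any rank query be answered within εN. This naive construction materialises and sorts the whole window; the paper's algorithms achieve the same guarantee continuously, within the much smaller space bounds quoted above.

```python
# Naive illustration of the epsilon-N precision guarantee (not the
# paper's algorithm): thin the sorted window to ~1/eps ranked samples
# and answer any rank query with the nearest kept sample.
import math
import random

def build_summary(window, eps):
    """Keep (rank, value) pairs spaced ceil(eps*N) ranks apart."""
    srt = sorted(window)
    step = max(1, math.ceil(eps * len(srt)))
    summary = [(r, srt[r]) for r in range(0, len(srt), step)]
    if summary[-1][0] != len(srt) - 1:
        summary.append((len(srt) - 1, srt[-1]))   # always keep the maximum
    return summary

def query(summary, rank):
    """Return the kept element whose rank is closest to the requested rank."""
    return min(summary, key=lambda rv: abs(rv[0] - rank))[1]

# Check the guarantee on random data.
N, eps = 10_000, 0.01
window = [random.random() for _ in range(N)]
summary = build_summary(window, eps)              # roughly 1/eps = 100 samples
srt = sorted(window)
for q in (0.01, 0.25, 0.5, 0.75, 0.99):
    r = int(q * (N - 1))
    est = query(summary, r)
    true_rank = srt.index(est)
    assert abs(true_rank - r) <= eps * N          # within the epsilon-N bound
```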
{"title":"Continuously maintaining quantile summaries of the most recent N elements over a data stream","authors":"Xuemin Lin, Hongjun Lu, Jian Xu, J. Yu","doi":"10.1109/ICDE.2004.1320011","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1320011","url":null,"abstract":"Statistics over the most recently observed data elements are often required in applications involving data streams, such as intrusion detection in network monitoring, stock price prediction in financial markets, Web log mining for access prediction, and user click stream mining for personalization. Among various statistics, computing quantile summary is probably most challenging because of its complexity. We study the problem of continuously maintaining quantile summary of the most recently observed N elements over a stream so that quantile queries can be answered with a guaranteed precision of /spl epsiv/N. We developed a space efficient algorithm for predefined N that requires only one scan of the input data stream and O(log(/spl epsiv//sup 2/N)//spl epsiv/+1//spl epsiv//sup 2/) space in the worst cases. We also developed an algorithm that maintains quantile summaries for most recent N elements so that quantile queries on any most recent n elements (n /spl les/ N) can be answered with a guaranteed precision of /spl epsiv/n. The worst case space requirement for this algorithm is only O(log/sup 2/(/spl epsiv/N)//spl epsiv//sup 2/). Our performance study indicated that not only the actual quantile estimation error is far below the guaranteed precision but the space requirement is also much less than the given theoretical bound.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"358 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122746675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Database kernel research: what, if anything, is left to do?
D. Lomet
Pub Date: 2004-03-30 | DOI: 10.1109/ICDE.2004.1320095
Data Engineering deals with the use of engineering techniques and methodologies in the design, development and assessment of information systems for different computing platforms and application environments. The 20th International Conference on Data Engineering will be held in Boston, Massachusetts, USA, an academic and technological center with a variety of historical and cultural attractions of international prominence within walking distance.
{"title":"Database kernel research: what, if anything, is left to do?","authors":"D. Lomet","doi":"10.1109/ICDE.2004.1320095","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1320095","url":null,"abstract":"Data Engineering deals with the use of engineering techniques and methodologies in the design, development and assessment of information systems for different computing platforms and application environments. The 20th International Conference on Data Engineering will be held in Boston, Massachusetts, USA-an academic and technological center with a variety of historical and cultural attractions of international prominence within walking distance.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133814423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BEA Liquid Data for WebLogic: XML-based enterprise information integration
M. Carey
Pub Date: 2004-03-30 | DOI: 10.1109/ICDE.2004.1320051
This presentation provides a technical overview of BEA Liquid Data for WebLogic, a relatively new product from BEA Systems that provides enterprise information integration capabilities to enterprise applications that are built and deployed using the BEA WebLogic Platform. Liquid Data takes an XML-centric approach to tackling the long-standing problem of integrating data from disparate data sources and making that information easily accessible to applications. In particular, Liquid Data uses the forthcoming XQuery language standard as the basis for defining integrated views of enterprise data and querying over those views. We provide a brief overview of the Liquid Data product architecture and then discuss some of the query processing technology that lies at the heart of the product.
Using stream semantics for continuous queries in media stream processors
Amarnath Gupta, B. Liu, Pilho Kim, R. Jain
Pub Date: 2004-03-30 | DOI: 10.1109/ICDE.2004.1320083
In the case of media and feature streams, explicit inter-stream constraints exist and can be exploited in the evaluation of continuous queries, in the spirit of semantic query optimization. We express these constraints using MSDL, a media stream declaration language. In the demonstration, we present IMMERSI-MEET, an application built around an immersive environment. The IMMERSI-MEET system distinguishes between continuous streams, where values of different types arrive at a specified data rate, and discrete streams, where sources push values intermittently. In MSDL, any dependence declaration must have at least one dependency-specifying predicate in its body. As stream declarations are registered, the stream constraints are interpreted to construct a set of evaluation directives.
{"title":"Using stream semantics for continuous queries in media stream processors","authors":"Amarnath Gupta, B. Liu, Pilho Kim, R. Jain","doi":"10.1109/ICDE.2004.1320083","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1320083","url":null,"abstract":"In the case of media and feature streams, explicit inter-stream constraints exist and can be exploited in the evaluation of continuous queries in the spirit of semantic query optimization. We express these properties using a media stream declaration language MSDL. In the demonstration, we present IMMERSI-MEET, an application built around an immersive environment. The IMMERSI-MEET system distinguishes between continuous streams, where values of different types come at a specified data rate, and discrete streams where sources push values intermittently. In MSDL, any dependence declaration must have at least one dependency specifying predicate in the body. As stream declarations are registered, the stream constraints are interpreted to construct a set of evaluation directives.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123361791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Meta data management
Philip A. Bernstein, Sergey Melnik
Pub Date: 2004-03-30 | DOI: 10.1109/ICDE.2004.1320101
By meta data management, we mean techniques for manipulating schemas and schema-like objects (such as interface definitions and web site maps) and mappings between them. Work on meta data problems goes back to at least the early 1970s, when data translation was the hot database research topic, even before relational databases caught on. Many popular research problems in the past five years are primarily meta data problems, such as data warehouse tools (e.g., ETL – to extract, transform and load), data integration, the semantic web, generation of XML or object-oriented wrappers for SQL databases, and generation of wrappers for web sites. Other classical meta data problems are information resource management, design tool support and integration, and schema evolution and data migration. Despite its longevity and continued importance, there is no widely-accepted conceptual framework for the meta data field, as there is for many other database topics, such as access methods, query processing, and transaction management. In this seminar, we propose such a conceptual framework. It consists of three layers: applications, design patterns, and basic operators. Applications are the end-user problems to be solved, like those listed in the previous paragraph. Design patterns are generic problems that need to be solved in support of many different applications, such as meta modeling (for all meta data problems), answering queries using views (for data integration and the semantic web), and change propagation (for data translation, schema evolution, and round-trip engineering). Basic operators are procedures that are needed to support multiple design patterns and applications, such as matching schemas to produce a mapping, merging schemas based on a mapping, and composing mappings. We will describe several meta data management problems, and for each, we will explain the design patterns and operators that are needed to solve it. We will summarize the main approaches to each design pattern and operator – the main choices of language, data structures, and algorithms – and will highlight the relevant papers that address it. This seminar is targeted at both practicing engineers and researchers. The former will learn about the latest solutions to important meta data problems and the many difficult unsolved problems that are best to avoid. Database researchers, especially professors, will benefit from considering the conceptual framework that we propose, since no database textbooks treat meta data management as a separate topic as far as we know.
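As a toy illustration of one of the basic operators named above, the Python sketch below composes two attribute-level mappings represented as plain dictionaries. Real model-management systems use far richer mapping languages; the attribute names here are invented for the example.

```python
# Toy sketch of the "compose mappings" operator: mappings are sets of
# attribute correspondences (source attribute -> target attribute).

def compose(map_ab: dict[str, str], map_bc: dict[str, str]) -> dict[str, str]:
    """Compose an A->B mapping with a B->C mapping to obtain an A->C mapping."""
    return {a: map_bc[b] for a, b in map_ab.items() if b in map_bc}

# Example: a legacy schema mapped to a staging schema, then to a warehouse.
legacy_to_staging = {"cust_nm": "customer_name", "cust_tel": "phone"}
staging_to_dw = {"customer_name": "dim_customer.name"}
print(compose(legacy_to_staging, staging_to_dw))
# {'cust_nm': 'dim_customer.name'}
```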
{"title":"Meta data management","authors":"Philip A. Bernstein, Sergey Melnik","doi":"10.1109/ICDE.2004.1320101","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1320101","url":null,"abstract":"By meta data management, we mean techniques for manipulating schemas and schema-like objects (such as interface definitions and web site maps) and mappings between them. Work on meta data problems goes back to at least the early 1970s, when data translation was the hot database research topic, even before relational databases caught on. Many popular research problems in the past five years are primarily meta data problems, such as data warehouse tools (e.g., ETL – to extract, transform and load), data integration, the semantic web, generation of XML or object-oriented wrappers for SQL databases, and generation of wrappers for web sites. Other classical meta data problems are information resource management, design tool support and integration, and schema evolution and data migration. Despite its longevity and continued importance, there is no widely-accepted conceptual framework for the meta data field, as there is for many other database topics, such as access methods, query processing, and transaction management. In this seminar, we propose such a conceptual framework. It consists of three layers: applications, design patterns, and basic operators. Applications are the end-user problems to be solved, like those listed in the previous paragraph. Design patterns are generic problems that need to be solved in support of many different applications, such as meta modeling (for all meta data problems), answering queries using views (for data integration and the semantic web), and change propagation (for data translation, schema evolution, and round-trip engineering). Basic operators are procedures that are needed to support multiple design patterns and applications, such as matching schemas to produce a mapping, merging schemas based on a mapping, and composing mappings. We will describe several meta data management problems, and for each, we will explain the design patterns and operators that are needed to solve it. We will summarize the main approaches to each design pattern and operator – the main choices of language, data structures, and algorithms – and will highlight the relevant papers that address it. This seminar is targeted at both practicing engineers and researchers. The former will learn about the latest solutions to important meta data problems and the many difficult unsolved problems that are best to avoid. Database researchers, especially professors, will benefit from considering the conceptual framework that we propose, since no database textbooks treat meta data management as a separate topic as far as we know.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122740915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scalable multimedia disk scheduling
M. Mokbel, Walid G. Aref, Khaled M. Elbassioni, I. Kamel
Pub Date: 2004-03-30 | DOI: 10.1109/ICDE.2004.1320022
A new multimedia disk-scheduling algorithm, termed Cascaded-SFC, is presented. The Cascaded-SFC multimedia disk scheduler is applicable in environments where multimedia data requests arrive with different quality of service (QoS) requirements such as real-time deadline and user priority. Previous work on disk scheduling has focused on optimizing the seek times and/or meeting the real-time deadlines. The Cascaded-SFC disk scheduler provides a unified framework for multimedia disk scheduling that scales with the number of scheduling parameters. The general idea is based on modeling the multimedia disk requests as points in multiple multidimensional subspaces, where each of the dimensions represents one of the parameters (e.g., one dimension represents the request deadline, another represents the disk cylinder number, and a third dimension represents the priority of the request, etc.). Each multidimensional subspace represents a subset of the QoS parameters that share some common scheduling characteristics. Then the multimedia disk scheduling problem reduces to the problem of finding a linear order to traverse the multidimensional points in each subspace. Multiple space-filling curves are selected to fit the scheduling needs of the QoS parameters in each subspace. The orders in each subspace are integrated in a cascaded way to provide a total order for the whole space. Comprehensive experiments demonstrate the efficiency and scalability of the Cascaded-SFC disk scheduling algorithm over other disk schedulers.
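The sketch below is a simplified illustration, not the Cascaded-SFC scheduler itself: it orders pending requests by a single Z-order (Morton) key obtained by bit-interleaving the QoS parameters, whereas the paper cascades several space-filling curves over parameter subspaces. The request fields and value ranges are invented for the example.

```python
# Order disk requests along one space-filling curve: interleave the bits
# of each (normalised) QoS parameter into a Z-order key and sort by it.

def morton_key(coords: list[int], bits: int = 10) -> int:
    """Interleave the low `bits` bits of each coordinate into one Z-order key."""
    key = 0
    for b in range(bits):
        for d, c in enumerate(coords):
            key |= ((c >> b) & 1) << (b * len(coords) + d)
    return key

# Each request: deadline (ms), cylinder number, priority -- all scaled to 0..1023.
requests = [
    {"id": "r1", "deadline": 120, "cylinder": 800, "priority": 3},
    {"id": "r2", "deadline": 40,  "cylinder": 790, "priority": 1},
    {"id": "r3", "deadline": 500, "cylinder": 10,  "priority": 2},
]
schedule = sorted(
    requests,
    key=lambda r: morton_key([r["deadline"], r["cylinder"], r["priority"]]),
)
print([r["id"] for r in schedule])
```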
{"title":"Scalable multimedia disk scheduling","authors":"M. Mokbel, Walid G. Aref, Khaled M. Elbassioni, I. Kamel","doi":"10.1109/ICDE.2004.1320022","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1320022","url":null,"abstract":"A new multimedia disk-scheduling algorithm, termed Cascaded-SFC, is presented. The Cascaded-SFC multimedia disk scheduler is applicable in environments where multimedia data requests arrive with different quality of service (QoS) requirements such as real-time deadline and user priority. Previous work on disk scheduling has focused on optimizing the seek times and/or meeting the real-time deadlines. The Cascaded-SFC disk scheduler provides a unified framework for multimedia disk scheduling that scales with the number of scheduling parameters. The general idea is based on modeling the multimedia disk requests as points in multiple multidimensional subspaces, where each of the dimensions represents one of the parameters (e.g., one dimension represents the request deadline, another represents the disk cylinder number, and a third dimension represents the priority of the request, etc.). Each multidimensional subspace represents a subset of the QoS parameters that share some common scheduling characteristics. Then the multimedia disk scheduling problem reduces to the problem of finding a linear order to traverse the multidimensional points in each subspace. Multiple space-filling curves are selected to fit the scheduling needs of the QoS parameters in each subspace. The orders in each subspace are integrated in a cascaded way to provide a total order for the whole space. Comprehensive experiments demonstrate the efficiency and scalability of the Cascaded-SFC disk scheduling algorithm over other disk schedulers.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129703912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NEXSORT: sorting XML in external memory
Adam Silberstein, Jun Yang
Pub Date: 2004-03-30 | DOI: 10.1109/ICDE.2004.1320038
XML plays an important role in delivering data over the Internet, and the need to store and manipulate XML in its native format has become increasingly relevant. This growing need necessitates work on developing native XML operators, especially for one as fundamental as sort. We present NEXSORT, an algorithm that leverages the hierarchical nature of XML to efficiently sort an XML document in external memory. In a fully sorted XML document, the children of every nonleaf element are ordered according to a given sorting criterion. One use of NEXSORT is in combination with structural merge as the XML analogue of sort-merge join, which allows us to merge large XML documents in a single pass once they are sorted. The hierarchical structure of an XML document limits the number of possible legal orderings among its elements, which means that sorting XML is fundamentally "easier" than sorting a flat file. We prove that the I/O lower bound for sorting XML in external memory is Θ(max{n, n log_m(k/B)}), where n is the number of blocks in the input XML document, m is the number of main memory blocks available for sorting, B is the number of elements that fit in one block, and k is the maximum fan-out of the input document tree. We show that NEXSORT performs within a constant factor of this theoretical lower bound. In practice, we demonstrate that, even with a naive implementation, NEXSORT significantly outperforms a regular external merge sort of all elements by their key paths, unless the XML document is nearly flat, in which case NEXSORT essentially degenerates to external merge sort.
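For intuition about the target ordering, the small in-memory Python sketch below reorders the children of every non-leaf element by a given criterion, which is the "fully sorted" form described above. It is not NEXSORT itself; the paper's contribution is producing this ordering in external memory within the stated I/O bound.

```python
# Produce a "fully sorted" XML document in memory: the children of every
# non-leaf element are reordered by the sorting criterion.
import xml.etree.ElementTree as ET

def sort_children(elem: ET.Element, key=lambda e: e.tag) -> None:
    """Recursively reorder each element's children according to `key`."""
    kids = list(elem)
    for k in kids:
        sort_children(k, key)
    elem[:] = sorted(kids, key=key)

doc = ET.fromstring("<lib><b id='2'/><a/><b id='1'/></lib>")
sort_children(doc, key=lambda e: (e.tag, e.get("id") or ""))
print(ET.tostring(doc, encoding="unicode"))
# <lib><a /><b id="1" /><b id="2" /></lib>
```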
{"title":"NEXSORT: sorting XML in external memory","authors":"Adam Silberstein, Jun Yang","doi":"10.1109/ICDE.2004.1320038","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1320038","url":null,"abstract":"XML plays an important role in delivering data over the Internet, and the need to store and manipulate XML in its native format has become increasingly relevant. This growing need necessitates work on developing native XML operators, especially for one as fundamental as sort. We present NEXSORT, an algorithm that leverages the hierarchical nature of XML to efficiently sort an XML document in external memory. In a fully sorted XML document, children of every nonleaf element are ordered according to a given sorting criterion. Among NEXSORT's uses is in combination with structural merge as the XML version of sort-merge join, which allows us to merge large XML documents using only a single pass once they are sorted. The hierarchical structure of an XML document limits the number of possible legal orderings among its elements, which means that sorting XML is fundamentally \"easier\" than sorting a flat file. We prove that the I/O lower bound for sorting XML in external memory is /spl Theta/(max{n,nlog/sub m/(k/B)}), where n is the number of blocks in the input XML document, m is the number of main memory blocks available for sorting, B is the number of elements that can fit in one block, and k is the maximum fan-out of the input document tree. We show that NEXSORT performs within a constant factor of this theoretical lower bound. In practice we demonstrate, even with a naive implementation, NEXSORT significantly outperforms a regular external merge sort of all elements by their key paths, unless the XML document is nearly flat, in which case NEXSORT degenerates essentially to external merge sort.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"107 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116938160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recursive XML schemas, recursive XML queries, and relational storage: XML-to-SQL query translation
R. Krishnamurthy, Venkatesan T. Chakaravarthy, R. Kaushik, J. Naughton
Pub Date: 2004-03-30 | DOI: 10.1109/ICDE.2004.1319983
We consider the problem of translating XML queries into SQL when XML documents have been stored in an RDBMS using a schema-based relational decomposition. Surprisingly, there is no published XML-to-SQL query translation algorithm for this scenario that handles recursive XML schemas. We present a generic algorithm to translate path expression queries into SQL in the presence of recursion in the schema and in the queries. The algorithm handles a general class of XML-to-relational mappings, which includes all techniques proposed in the literature. Some of its salient features are: (i) it translates a path expression query into a single SQL query, irrespective of how complex the XML schema is; (ii) it uses the "with" clause in SQL99 to handle recursive queries even over nonrecursive schemas; (iii) it reconstructs recursive XML subtrees with a single SQL query; and (iv) it shows that the support for linear recursion in SQL99 is sufficient for handling path expression queries over arbitrarily complex recursive XML schemas.
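The sketch below illustrates the SQL99 feature the algorithm relies on, linear recursion via the WITH clause, using SQLite from Python. The edge-style node table is an assumed example of a schema-based shredding, not the paper's mapping; it only shows how a recursive subtree can be reassembled by a single recursive query.

```python
# Reassemble a recursive XML subtree from a shredded node table with one
# recursive (SQL99-style WITH) query, using SQLite's WITH RECURSIVE.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node(id INTEGER PRIMARY KEY, parent INTEGER, tag TEXT);
    -- <part><part><part/></part></part> : an instance of a recursive schema
    INSERT INTO node VALUES (1, NULL, 'part'), (2, 1, 'part'), (3, 2, 'part');
""")
rows = conn.execute("""
    WITH RECURSIVE subtree(id, tag, depth) AS (
        SELECT id, tag, 0 FROM node WHERE id = 1          -- the queried root
        UNION ALL
        SELECT n.id, n.tag, s.depth + 1
        FROM node n JOIN subtree s ON n.parent = s.id     -- linear recursion
    )
    SELECT id, tag, depth FROM subtree ORDER BY depth
""").fetchall()
print(rows)   # [(1, 'part', 0), (2, 'part', 1), (3, 'part', 2)]
```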
{"title":"Recursive XML schemas, recursive XML queries, and relational storage: XML-to-SQL query translation","authors":"R. Krishnamurthy, Venkatesan T. Chakaravarthy, R. Kaushik, J. Naughton","doi":"10.1109/ICDE.2004.1319983","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1319983","url":null,"abstract":"We consider the problem of translating XML queries into SQL when XML documents have been stored in an RDBMS using a schema-based relational decomposition. Surprisingly, there is no published XML-to-SQL query translation algorithm for this scenario that handles recursive XML schemas. We present a generic algorithm to translate path expression queries into SQL in the presence of recursion in the schema and queries. This algorithm handles a general class of XML-to-relational mappings, which includes all techniques proposed in literature. Some of the salient features of this algorithm are: (i) It translates a path expression query into a single SQL query, irrespective of how complex the XML schema is, (ii) It uses the \"with\" clause in SQL99 to handle recursive queries even over nonrecursive schemas, (iii) It reconstructs recursive XML subtrees with a single SQL query and (iv) It shows that the support for linear recursion in SQL99 is sufficient for handling path expression queries over arbitrarily complex recursive XML schema.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134573817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An efficient algorithm for mining frequent sequences by a new strategy without support counting
D. Chiu, Yi-Hung Wu, Arbee L. P. Chen
Pub Date: 2004-03-30 | DOI: 10.1109/ICDE.2004.1320012
Mining sequential patterns in large databases is an important research topic. The main challenge is the high processing cost due to the large amount of data. We propose a new strategy called direct sequence comparison (abbreviated as DISC), which can find frequent sequences without computing the support counts of nonfrequent sequences. The main difference between the DISC strategy and previous work is the way nonfrequent sequences are pruned. Previous work relies on the antimonotone property, pruning nonfrequent sequences according to frequent sequences of shorter lengths. In contrast, the DISC strategy prunes nonfrequent sequences according to other sequences of the same length. Moreover, we summarize three strategies used in previous work and design an efficient algorithm called DISC-all that takes advantage of all four strategies. The experimental results show that the DISC-all algorithm outperforms the PrefixSpan algorithm for mining frequent sequences in large databases. In addition, we analyze these strategies to design a dynamic version of our algorithm, which achieves much better performance.
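For contrast, the Python sketch below shows the classical antimonotone-style pruning that the abstract says previous work relies on: only frequent length-k sequences are extended to length-(k+1) candidates. The DISC strategy's same-length comparison is not reproduced here; this is merely the baseline it improves on.

```python
# Classical candidate-generation mining with antimonotone pruning:
# a length-(k+1) candidate is counted only if its length-k prefix is frequent.
from itertools import product

def is_subsequence(pat, seq):
    """True if `pat` occurs in `seq` as an (ordered, not necessarily contiguous) subsequence."""
    it = iter(seq)
    return all(any(p == s for s in it) for p in pat)

def frequent_sequences(db, min_sup, max_len=3):
    items = sorted({x for seq in db for x in seq})
    frequent, level = {}, [(x,) for x in items]
    for _ in range(max_len):
        counts = {p: sum(is_subsequence(p, s) for s in db) for p in level}
        level_freq = {p: c for p, c in counts.items() if c >= min_sup}
        frequent.update(level_freq)
        # Antimonotone pruning: extend only the frequent length-k sequences.
        level = [p + (x,) for p, x in product(level_freq, items)]
    return frequent

db = [list("abcb"), list("abb"), list("acb"), list("bcb")]
print(frequent_sequences(db, min_sup=3))
```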
{"title":"An efficient algorithm for mining frequent sequences by a new strategy without support counting","authors":"D. Chiu, Yi-Hung Wu, Arbee L. P. Chen","doi":"10.1109/ICDE.2004.1320012","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1320012","url":null,"abstract":"Mining sequential patterns in large databases is an important research topic. The main challenge of mining sequential patterns is the high processing cost due to the large amount of data. We propose a new strategy called direct sequence comparison (abbreviated as DISC), which can find frequent sequences without having to compute the support counts of nonfrequent sequences. The main difference between the DISC strategy and the previous works is the way to prune nonfrequent sequences. The previous works are based on the antimonotone property, which prune the nonfrequent sequences according to the frequent sequences with shorter lengths. On the contrary, the DISC strategy prunes the nonfrequent sequences according to the other sequences with the same length. Moreover, we summarize three strategies used in the previous works and design an efficient algorithm called DISC-all to take advantages of all the four strategies. The experimental results show that the DISC-all algorithm outperforms the PrefixSpan algorithm on mining frequent sequences in large databases. In addition, we analyze these strategies to design the dynamic version of our algorithm, which achieves a much better performance.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133138604","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}