Learning schema mappings

IF 1.7 2区计算机科学 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS ACM Transactions on Database Systems Pub Date : 2013-11-01 DOI:10.1145/2539032.2539035

B. T. Cate, V. Dalmau, Phokion G. Kolaitis

{"title":"Learning schema mappings","authors":"B. T. Cate, V. Dalmau, Phokion G. Kolaitis","doi":"10.1145/2539032.2539035","DOIUrl":null,"url":null,"abstract":"A schema mapping is a high-level specification of the relationship between a source schema and a target schema. Recently, a line of research has emerged that aims at deriving schema mappings automatically or semi-automatically with the help of data examples, that is, pairs consisting of a source instance and a target instance that depict, in some precise sense, the intended behavior of the schema mapping. Several different uses of data examples for deriving, refining, or illustrating a schema mapping have already been proposed and studied.\n In this article, we use the lens of computational learning theory to systematically investigate the problem of obtaining algorithmically a schema mapping from data examples. Our aim is to leverage the rich body of work on learning theory in order to develop a framework for exploring the power and the limitations of the various algorithmic methods for obtaining schema mappings from data examples. We focus on GAV schema mappings, that is, schema mappings specified by GAV (Global-As-View) constraints. GAV constraints are the most basic and the most widely supported language for specifying schema mappings. We present an efficient algorithm for learning GAV schema mappings using Angluin's model of exact learning with membership and equivalence queries. This is optimal, since we show that neither membership queries nor equivalence queries suffice, unless the source schema consists of unary relations only. We also obtain results concerning the learnability of schema mappings in the context of Valiant's well-known PAC (Probably-Approximately-Correct) learning model, and concerning the learnability of restricted classes of GAV schema mappings. Finally, as a byproduct of our work, we show that there is no efficient algorithm for approximating the shortest GAV schema mapping fitting a given set of examples, unless the source schema consists of unary relations only.","PeriodicalId":50915,"journal":{"name":"ACM Transactions on Database Systems","volume":"44 1","pages":"28"},"PeriodicalIF":1.7000,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Database Systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/2539032.2539035","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 13

Abstract

A schema mapping is a high-level specification of the relationship between a source schema and a target schema. Recently, a line of research has emerged that aims at deriving schema mappings automatically or semi-automatically with the help of data examples, that is, pairs consisting of a source instance and a target instance that depict, in some precise sense, the intended behavior of the schema mapping. Several different uses of data examples for deriving, refining, or illustrating a schema mapping have already been proposed and studied. In this article, we use the lens of computational learning theory to systematically investigate the problem of obtaining algorithmically a schema mapping from data examples. Our aim is to leverage the rich body of work on learning theory in order to develop a framework for exploring the power and the limitations of the various algorithmic methods for obtaining schema mappings from data examples. We focus on GAV schema mappings, that is, schema mappings specified by GAV (Global-As-View) constraints. GAV constraints are the most basic and the most widely supported language for specifying schema mappings. We present an efficient algorithm for learning GAV schema mappings using Angluin's model of exact learning with membership and equivalence queries. This is optimal, since we show that neither membership queries nor equivalence queries suffice, unless the source schema consists of unary relations only. We also obtain results concerning the learnability of schema mappings in the context of Valiant's well-known PAC (Probably-Approximately-Correct) learning model, and concerning the learnability of restricted classes of GAV schema mappings. Finally, as a byproduct of our work, we show that there is no efficient algorithm for approximating the shortest GAV schema mapping fitting a given set of examples, unless the source schema consists of unary relations only.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

学习图式映射

模式映射是源模式和目标模式之间关系的高级规范。最近，出现了一系列旨在借助数据示例自动或半自动地派生模式映射的研究，即由源实例和目标实例组成的对，它们在某种程度上精确地描述了模式映射的预期行为。已经提出并研究了用于派生、精炼或说明模式映射的数据示例的几种不同用法。在本文中，我们使用计算学习理论的镜头来系统地研究从数据示例中获得算法模式映射的问题。我们的目标是利用学习理论的丰富工作来开发一个框架，用于探索从数据示例中获取模式映射的各种算法方法的功能和局限性。我们关注GAV模式映射，即由GAV(全局即视图)约束指定的模式映射。GAV约束是用于指定模式映射的最基本和最广泛支持的语言。利用Angluin的精确学习模型，提出了一种高效的GAV模式映射学习算法。这是最优的，因为我们证明了成员查询和等价查询都不够，除非源模式仅由一元关系组成。在Valiant著名的PAC (Probably-Approximately-Correct，大概正确)学习模型的背景下，我们也得到了关于模式映射可学习性的结果，以及关于GAV模式映射的受限类的可学习性的结果。最后，作为我们工作的副产品，我们表明，除非源模式仅由一元关系组成，否则没有有效的算法来近似拟合给定示例集的最短GAV模式映射。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

ACM Transactions on Database Systems 工程技术-计算机：软件工程

CiteScore

5.60

自引率

0.00%

发文量

审稿时长

>12 weeks

期刊介绍： Heavily used in both academic and corporate R&D settings, ACM Transactions on Database Systems (TODS) is a key publication for computer scientists working in data abstraction, data modeling, and designing data management systems. Topics include storage and retrieval, transaction management, distributed and federated databases, semantics of data, intelligent databases, and operations and algorithms relating to these areas. In this rapidly changing field, TODS provides insights into the thoughts of the best minds in database R&D.

期刊最新文献

Automated Category Tree Construction: Hardness Bounds and Algorithms Database Repairing with Soft Functional Dependencies Sharing Queries with Nonequivalent User-Defined Aggregate Functions A family of centrality measures for graph data based on subgraphs GraphZeppelin: How to Find Connected Components (Even When Graphs Are Dense, Dynamic, and Massive)