Self-Extending Peer Data Management

Datenbanksysteme für Business, Technologie und Web
Pub Date: 2005-03-01
DOI: 10.18452/9200

Ralf Heese, Sven Herschel, Felix Naumann, A. Roth


Citations: 25

Abstract

Peer data management systems (PDMS) are the natural extension of integrated information systems. Conventionally, a single integrating system manages an integrated schema, distributes queries to appropriate sources, and integrates incoming data into a common result. In contrast, a PDMS consists of a set of peers, each of which can play the role of an integrating component. A peer knows its neighboring peers through mappings, which help to translate queries and transform data. Queries submitted to one peer are answered by data residing at that peer and by data reached along paths of mappings through the network of peers. The only restriction that keeps a PDMS from covering unbounded data is the need to formulate at least one mapping from some known peer to each new data source. We propose a Semantic Web-based method that overcomes this restriction, albeit at a price. As sources are dynamically and automatically included in a PDMS, three factors diminish quality: the new source itself might store data of poor quality, the mapping to the PDMS might be incorrect, and the mapping to the PDMS might be incomplete. To compensate, we propose a quality model to measure this effect, a cost model to restrict query planning to the best paths through the PDMS, and techniques to answer queries efficiently in such a Web-scale PDMS.

1 An Ever-growing PDMS

The step from centralized database systems (DBMS) to distributed and then to federated database systems (FDBMS) removed the assumption that data must be located at the same site as the query. A federated database provides a global schema that represents the data it can access locally and remotely. The global schema is related to the local schemata via schema mappings, which specify how the schema of a local database maps to the global schema. The federated database accepts a query against its global schema and distributes it, according to the schema mappings, to the different sites where the data resides. Those sites execute the partial queries and send the results back to the requesting peer. Again, the schema mappings specify how data is to be translated to conform to the global schema. The results are then processed, combined, and finally fused into a single response to the user.

A natural extension of this paradigm is to remove the assumption that queries are asked only against a single integrating site. Peer data management systems (PDMS) are built of multiple peers, each of which provides a schema and accepts queries against that schema. Again, the peers are connected by mappings among their schemata. However, instead of forming a tree with a single root, each peer can be connected to any number of other peers. Queries against the schema of one peer can be answered using the data of the entire PDMS, as long as appropriate mappings have been formed (see Fig. 1). In general, a query
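
The idea that a query at one peer is answered by data reached along paths of mappings, with planning restricted to the best paths, can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the peer names, the per-mapping quality scores in (0, 1], the rule that the quality of a path is the product of its mapping scores, and the `min_quality` threshold. It is not the quality or cost model proposed by the authors.

```python
from collections import deque

# Hypothetical peer network: each directed edge is a schema mapping,
# annotated with an assumed quality score in (0, 1].
MAPPINGS = {
    "P1": {"P2": 0.9, "P3": 0.6},
    "P2": {"P4": 0.8},
    "P3": {"P4": 0.9},
    "P4": {},
}


def reachable_peers(mappings, start, min_quality=0.0):
    """Return {peer: best path quality} for peers reachable from `start`.

    Path quality is assumed to be the product of the mapping scores along
    the path. Paths whose quality falls below `min_quality` are pruned;
    because scores never exceed 1, a pruned path cannot recover later.
    """
    best = {start: 1.0}
    queue = deque([start])
    while queue:
        peer = queue.popleft()
        for neighbor, score in mappings.get(peer, {}).items():
            quality = best[peer] * score
            # Follow the mapping only if it is good enough and improves
            # on the best path found so far to that neighbor.
            if quality >= min_quality and quality > best.get(neighbor, 0.0):
                best[neighbor] = quality
                queue.append(neighbor)
    return best


if __name__ == "__main__":
    for peer, quality in sorted(reachable_peers(MAPPINGS, "P1", 0.7).items()):
        print(f"{peer}: {quality:.2f}")
```

With the threshold set to 0.7, a query posed at the hypothetical peer `P1` would be forwarded to `P2` and on to `P4`, while the weaker mapping to `P3` is skipped. Raising or lowering the threshold trades completeness of the answer against the number of peers contacted.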


