Scientific data integration: wrapping textual documents with a database view mechanism and an XML engine

Proceedings IEEE International Symposium on Bio-Informatics and Biomedical Engineering. Pub date: 2000-11-08. DOI: 10.1109/BIBE.2000.889591

Z. Lacroix


Citations: 7

Abstract

Building a digital library for scientific data requires accessing and manipulating data extracted from flat files or from documents retrieved from the World Wide Web. We present an approach to querying flat files as well as Web data sources through an object database view based on a database system and a wrapper. Generally, a wrapper has two tasks: first, it sends a query to the source to retrieve data; second, it builds the expected output with respect to the virtual structure. Scientific data servers, in particular those publicly available on the Web, usually provide information retrieval techniques to access data. Our wrappers perform these two tasks with two components: a retrieval component, based on an intermediate object view mechanism called 'search views' that maps the source capabilities to attributes, and an XML engine. While the retrieval component is specific to each data source, this approach shows that the extraction component (the XML engine) can be common. We describe our system and focus on the retrieval component of the Object-Web Wrapper (OWW) for Web sources. The originality of our approach consists of (1) a common wrapper architecture for flat files and Web data sources that shares an XML engine for data extraction, (2) a generic view mechanism to access data sources with limited capabilities, and (3) the representation of hyperlinks as abstract attributes in the object view, as well as their use in the search view. Our approach has been developed and demonstrated as part of a multidatabase system supporting queries via uniform Object Protocol Model (OPM) interfaces.
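The two-task wrapper described above can be illustrated with a small sketch. It is not the OWW implementation or its API; every name below (SearchView, Wrapper, extract, follow_link, and both toy sources) is hypothetical. It assumes that each source is modeled by a fetch function returning raw text, that this text can be tagged as XML (the Web source is assumed to return well-formed XHTML, which real pages often do not), and that a hyperlink attribute is stored as a "SOURCE; key" string. The search view maps object-view attributes to the only search fields a source supports, and the extraction function is shared by all sources.

```python
# Hypothetical sketch of a wrapper = source-specific retrieval + shared XML extraction.
# None of these names come from the paper; they are illustrative assumptions.
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, str]


@dataclass
class SearchView:
    """Maps object-view attributes to the search fields a source supports."""
    source_name: str
    capabilities: Dict[str, str]  # view attribute -> source search field

    def to_source_query(self, attribute: str, value: str) -> Dict[str, str]:
        # A source with limited capabilities accepts only mapped attributes.
        if attribute not in self.capabilities:
            raise ValueError(f"{self.source_name} cannot search on {attribute!r}")
        return {self.capabilities[attribute]: value}


def extract(root: ET.Element, paths: Dict[str, str]) -> Record:
    """Shared extraction engine: one ElementTree path per view attribute."""
    record = {}
    for attr, path in paths.items():
        node = root.find(path)
        record[attr] = node.text if node is not None and node.text else ""
    return record


@dataclass
class Wrapper:
    view: SearchView
    fetch: Callable[[Dict[str, str]], List[str]]  # source-specific retrieval
    to_xml: Callable[[str], ET.Element]           # tags raw text as XML
    paths: Dict[str, str] = field(default_factory=dict)

    def query(self, attribute: str, value: str) -> List[Record]:
        # Task 1: send the query to the source and retrieve raw documents.
        raw_docs = self.fetch(self.view.to_source_query(attribute, value))
        # Task 2: build output records with the common XML engine.
        return [extract(self.to_xml(doc), self.paths) for doc in raw_docs]


def follow_link(link: str, wrappers: Dict[str, Wrapper], key_attr: str) -> List[Record]:
    """Resolve a hyperlink attribute ('SOURCE; key') through the target's search view."""
    source, _, key = link.partition("; ")
    return wrappers[source].query(key_attr, key)


# Hypothetical flat-file source: one "KEY  value" pair per line.
FLAT_FILE = [
    "ID  P12345\nDE  Example protein\nDR  GENBANK; X00001",
    "ID  P67890\nDE  Another protein\nDR  GENBANK; X00002",
]


def fetch_flat(params: Dict[str, str]) -> List[str]:
    # This toy source can only match the ID line exactly.
    return [r for r in FLAT_FILE if r.splitlines()[0].split()[1] == params["ID"]]


def flat_to_xml(text: str) -> ET.Element:
    # Turn each "KEY  value" line into a <KEY>value</KEY> child element.
    root = ET.Element("entry")
    for line in text.splitlines():
        key, _, val = line.partition("  ")
        ET.SubElement(root, key).text = val.strip()
    return root


# Hypothetical Web source: returns a well-formed XHTML page per accession.
def fetch_web(params: Dict[str, str]) -> List[str]:
    acc = params["acc"]
    return [f"<html><title>{acc}</title><p class='seq'>ACGT</p></html>"]


flat = Wrapper(SearchView("FLAT", {"accession": "ID"}), fetch_flat, flat_to_xml,
               {"accession": "ID", "description": "DE", "xref": "DR"})
web = Wrapper(SearchView("GENBANK", {"accession": "acc"}), fetch_web, ET.fromstring,
              {"accession": "title", "sequence": "p"})

entry = flat.query("accession", "P12345")[0]
print(entry)
print(follow_link(entry["xref"], {"GENBANK": web}, "accession"))
```

Only `fetch` and `to_xml` differ between the two sources; `extract` is the same function for both, which mirrors the abstract's point that the extraction component can be common. In this sketch, a query on an attribute missing from a search view's capabilities is simply rejected; a fuller mediator might instead retrieve more broadly and filter locally.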





Latest articles from this journal

Classification and estimation of ultrasound speckle noise with neural networks
A digital retina-like low level vision processor
Achieving interoperability of genome databases through intelligent Web mediators
Reconstructing specimens using DIC microscope images
Gene mapping by haplotype pattern mining