Scientific data integration: wrapping textual documents with a database view mechanism and an XML engine

Z. Lacroix
Published in: Proceedings IEEE International Symposium on Bio-Informatics and Biomedical Engineering, 2000-11-08
DOI: 10.1109/BIBE.2000.889591 (https://doi.org/10.1109/BIBE.2000.889591)
Citations: 7

Abstract

Building a digital library for scientific data requires accessing and manipulating data extracted from flat files or from documents retrieved from the World Wide Web. We present an approach to querying flat files as well as Web data sources through an object database view based on a database system and a wrapper. Generally, a wrapper has two tasks: first, it sends a query to the source to retrieve data; second, it builds the expected output with respect to the virtual structure. Scientific data servers, in particular those publicly available on the Web, usually provide information retrieval techniques to access data. Our wrappers perform these two tasks with two components: a retrieval component, based on an intermediate object view mechanism called "search views" that maps the source capabilities to attributes, and an XML engine. Although the retrieval component is specific to each data source, this approach shows that the extraction component (the XML engine) can be common. We describe our system and focus on the retrieval component of the Object-Web Wrapper (OWW) for Web sources. The originality of our approach consists of (1) a common wrapper architecture for flat files and Web data sources sharing an XML engine for data extraction, (2) a generic view mechanism to access data sources with limited capabilities, and (3) the representation of hyperlinks as abstract attributes in the object view as well as their use in the search view. Our approach has been developed and demonstrated as part of a multidatabase system supporting queries via uniform Object Protocol Model (OPM) interfaces.
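The two-part wrapper described in the abstract can be illustrated with a minimal Python sketch. It is not the OWW implementation: every name here (SearchView, xml_engine, Wrapper, the two stand-in sources, the XML layout, and the example.org URLs) is a hypothetical chosen for illustration. The sketch assumes each source-specific retrieval component returns XML text, so that a single extraction function can serve both a flat file and a Web source, and it exposes hyperlinks as ordinary attributes of the resulting records.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SearchView:
    """Hypothetical search view: object-view attribute -> parameter the source can search on."""
    attribute_to_param: Dict[str, str]

    def to_source_query(self, constraints: Dict[str, str]) -> Dict[str, str]:
        # Reject constraints on attributes the source has no search capability for.
        unsupported = set(constraints) - set(self.attribute_to_param)
        if unsupported:
            raise ValueError(f"source cannot search on: {sorted(unsupported)}")
        return {self.attribute_to_param[a]: v for a, v in constraints.items()}


def xml_engine(document: str, record_tag: str) -> List[Dict[str, str]]:
    """Common extraction step: turn retrieved XML into flat attribute records."""
    records = []
    for node in ET.fromstring(document).iter(record_tag):
        record = {c.tag: (c.text or "").strip() for c in node if c.tag != "link"}
        # Hyperlinks become attributes of the object, named by their 'rel' value.
        for link in node.findall("link"):
            record[link.get("rel", "link")] = link.get("href", "")
        records.append(record)
    return records


@dataclass
class Wrapper:
    search_view: SearchView
    retrieve: Callable[[Dict[str, str]], str]  # source-specific retrieval component
    record_tag: str

    def query(self, **constraints: str) -> List[Dict[str, str]]:
        source_query = self.search_view.to_source_query(constraints)
        return xml_engine(self.retrieve(source_query), self.record_tag)


def fake_web_source(params: Dict[str, str]) -> str:
    """Stand-in for a Web source that can only be searched by a 'kw' parameter."""
    entries = [
        ("E1", "kinase", "http://example.org/E1/sequence"),
        ("E2", "protease", "http://example.org/E2/sequence"),
    ]
    body = "".join(
        f'<entry><id>{i}</id><keyword>{k}</keyword>'
        f'<link rel="sequence" href="{h}"/></entry>'
        for i, k, h in entries
        if k == params.get("kw")
    )
    return f"<results>{body}</results>"


FLAT_FILE = "E3|ligase\nE4|kinase\n"


def fake_flat_file_source(params: Dict[str, str]) -> str:
    """Stand-in for a flat file of 'id|keyword' lines, filtered by keyword."""
    rows = [line.split("|") for line in FLAT_FILE.splitlines()]
    body = "".join(
        f"<entry><id>{i}</id><keyword>{k}</keyword></entry>"
        for i, k in rows
        if k == params.get("keyword")
    )
    return f"<results>{body}</results>"


web = Wrapper(SearchView({"keyword": "kw"}), fake_web_source, "entry")
flat = Wrapper(SearchView({"keyword": "keyword"}), fake_flat_file_source, "entry")

web_result = web.query(keyword="kinase")
flat_result = flat.query(keyword="kinase")
```

In this sketch only the retrieve function and the search-view mapping differ between the two wrappers, while xml_engine is shared, which mirrors the split between a source-specific retrieval component and a common extraction component.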