An approach for automatic data virtualization

L. Weng, G. Agrawal, Ümit V. Çatalyürek, T. Kurç, S. Narayanan, J. Saltz
{"title":"An approach for automatic data virtualization","authors":"L. Weng, G. Agrawal, Ümit V. Çatalyürek, T. Kurç, S. Narayanan, J. Saltz","doi":"10.1109/HPDC.2004.2","DOIUrl":null,"url":null,"abstract":"Analysis of large and/or geographically distributed scientific datasets is emerging as a key component of grid computing. One challenge in this area is that scientific datasets are typically stored as binary or character flat-files, which makes specification of processing much harder. In view of this, there has been recent interest in data virtualization, and data services to support such virtualization. This paper presents an approach for automatically creating data services to support data virtualization. Specifically, we show how a relational table like data abstraction can be supported for complex multidimensional scientific datasets that are resident on a cluster. We have designed and implemented a tool that processes SQL queries (with select and where statements) on multi-dimensional datasets. We have designed a meta-data description language that is used for specifying the data layout. From such description, our tool automatically generates efficient data subsetting and access functions. We have extensively evaluated our system. The key observations from our experiments are as follows. First, our tool can correctly and efficiently handle a variety of different data layouts. Second, our system scales well as the number of nodes or the amount of data is scaled. Third, the performance of the automatically generated code for indexing and contracting functions is quite comparable to the performance of hand-written codes.","PeriodicalId":446429,"journal":{"name":"Proceedings. 13th IEEE International Symposium on High performance Distributed Computing, 2004.","volume":"78 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2004-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"44","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. 13th IEEE International Symposium on High performance Distributed Computing, 2004.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPDC.2004.2","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 44

Abstract

Analysis of large and/or geographically distributed scientific datasets is emerging as a key component of grid computing. One challenge in this area is that scientific datasets are typically stored as binary or character flat-files, which makes specification of processing much harder. In view of this, there has been recent interest in data virtualization, and data services to support such virtualization. This paper presents an approach for automatically creating data services to support data virtualization. Specifically, we show how a relational table like data abstraction can be supported for complex multidimensional scientific datasets that are resident on a cluster. We have designed and implemented a tool that processes SQL queries (with select and where statements) on multi-dimensional datasets. We have designed a meta-data description language that is used for specifying the data layout. From such description, our tool automatically generates efficient data subsetting and access functions. We have extensively evaluated our system. The key observations from our experiments are as follows. First, our tool can correctly and efficiently handle a variety of different data layouts. Second, our system scales well as the number of nodes or the amount of data is scaled. Third, the performance of the automatically generated code for indexing and contracting functions is quite comparable to the performance of hand-written codes.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
一种自动数据虚拟化的方法
分析大型和/或地理上分布的科学数据集正在成为网格计算的关键组成部分。这一领域的一个挑战是,科学数据集通常以二进制或字符平面文件的形式存储,这使得规范处理变得更加困难。鉴于此,最近出现了对数据虚拟化和支持这种虚拟化的数据服务的兴趣。本文提出了一种自动创建数据服务以支持数据虚拟化的方法。具体来说,我们将展示如何为驻留在集群上的复杂多维科学数据集支持像数据抽象这样的关系表。我们设计并实现了一个工具来处理多维数据集上的SQL查询(使用select和where语句)。我们设计了一种元数据描述语言,用于指定数据布局。根据这样的描述,我们的工具自动生成高效的数据子集和访问函数。我们对我们的系统进行了广泛的评估。我们实验的主要观察结果如下。首先,我们的工具可以正确有效地处理各种不同的数据布局。其次,我们的系统可以很好地扩展节点数量或数据量。第三,用于索引和收缩函数的自动生成代码的性能与手写代码的性能相当。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Measuring and understanding user comfort with resource borrowing Globus and PlanetLab resource management solutions compared FPN: a distributed hash table for commercial applications GAIS: grid advanced information service based on P2P mechanism Utilization of a local grid of Mac OS X-based computers using Xgrid
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1