SDS: a framework for scientific data services

Bin Dong, S. Byna, Kesheng Wu
Published in: Proceedings of the 8th Parallel Data Storage Workshop, 2013-11-17
DOI: 10.1145/2538542.2538563 (https://doi.org/10.1145/2538542.2538563)
Citations: 31

Abstract

Large-scale scientific applications typically write their data to parallel file systems using data organizations designed for fast writes. Analysis tasks frequently read the data in a pattern that differs from the write pattern, and therefore experience poor I/O performance. In this paper, we introduce a prototype framework for bridging the performance gap between the write and read stages of data access on parallel file systems. We call this framework Scientific Data Services, or SDS for short. This initial implementation of SDS focuses on reorganizing previously written files into data layouts that benefit read patterns, and on transparently directing read calls to the reorganized data. SDS follows a client-server architecture. The SDS Server manages partial or full replicas of reorganized datasets and serves SDS Clients' requests for data. The current version of the SDS client library supports the HDF5 programming interface for reading data. The client library intercepts HDF5 calls using the HDF5 Virtual Object Layer (VOL) and transparently redirects them to the reorganized data. The SDS client library also provides a querying interface for reading part of the data based on user-specified selection criteria. We describe the design and implementation of the SDS client-server architecture, and evaluate the response time of the SDS Server and the performance benefits of SDS.
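
The redirection idea in the abstract can be illustrated with a toy, in-memory sketch. This is not the SDS API or its HDF5 VOL plugin: the names `storage`, `catalog`, `reorganize_sorted`, and `query_range` are hypothetical, and sorting is used here only as one assumed example of a layout that benefits a range-query read pattern. The point is that the caller issues the same query before and after reorganization, and the lookup is redirected to the replica when one exists.

```python
import bisect

# Toy in-memory "file system": dataset path -> list of values (hypothetical).
storage = {"/run1/energy": [5.0, 1.5, 9.2, 3.3, 7.1, 0.4]}

# Hypothetical catalog, standing in for the server's record of replicas:
# original dataset path -> reorganized replica path.
catalog = {}


def reorganize_sorted(path):
    """Write a sorted replica of a dataset and record it in the catalog."""
    replica_path = path + ".sorted"
    storage[replica_path] = sorted(storage[path])
    catalog[path] = replica_path
    return replica_path


def query_range(path, lo, hi):
    """Return values v with lo <= v < hi in ascending order.

    If a sorted replica is registered, the read is redirected to it;
    otherwise the original layout is scanned in full.
    """
    replica = catalog.get(path)
    if replica is None:
        # No replica: every element of the original layout must be examined.
        return sorted(v for v in storage[path] if lo <= v < hi)
    data = storage[replica]
    # Sorted layout: two binary searches bound one contiguous slice.
    start = bisect.bisect_left(data, lo)
    stop = bisect.bisect_left(data, hi)
    return data[start:stop]
```

Calling `query_range("/run1/energy", 1.0, 6.0)` returns the same values whether or not `reorganize_sorted` has run; only the access path changes, which is the sense in which the redirection is transparent to the reader.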