SDS: a framework for scientific data services

Proceedings of the 8th Parallel Data Storage Workshop
Pub Date: 2013-11-17
DOI: 10.1145/2538542.2538563

Bin Dong, S. Byna, Kesheng Wu


Citations: 31

Abstract

Large-scale scientific applications typically write their data to parallel file systems using data organizations designed to achieve fast write speeds. Analysis tasks frequently read the data in a pattern that differs from the write pattern, and therefore experience poor I/O performance. In this paper, we introduce a prototype framework for bridging the performance gap between the write and read stages of data access on parallel file systems. We call this framework Scientific Data Services, or SDS for short. This initial implementation of SDS focuses on reorganizing previously written files into data layouts that benefit read patterns, and on transparently directing read calls to the reorganized data. SDS follows a client-server architecture. The SDS Server manages partial or full replicas of reorganized datasets and serves SDS Clients' requests for data. The current version of the SDS client library supports the HDF5 programming interface for reading data. The client library intercepts HDF5 calls using the HDF5 Virtual Object Layer (VOL) and transparently redirects them to the reorganized data. The SDS client library also provides a querying interface for reading part of the data based on user-specified selection criteria. We describe the design and implementation of the SDS client-server architecture, and evaluate the response time of the SDS Server and the performance benefits of SDS.
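The mismatch between write and read patterns, and the idea of redirecting a read to a reorganized replica, can be illustrated with a small sketch. This is not the SDS implementation or its API: the names `Layout`, `ToyServer`, `reorganize`, and `read_column` are hypothetical, the dataset is a flat in-memory list rather than an HDF5 file, and the count of contiguous extents is only a rough stand-in for I/O cost.

```python
from dataclasses import dataclass, field


@dataclass
class Layout:
    """A 2-D dataset stored as a flat list in row- or column-major order."""
    rows: int
    cols: int
    order: str  # "row" or "col"
    data: list

    def offset(self, r, c):
        # Position of element (r, c) in the flat storage.
        if self.order == "row":
            return r * self.cols + c
        return c * self.rows + r


def reorganize(src, order):
    """Build a replica of `src` that uses a different storage order."""
    dst = Layout(src.rows, src.cols, order, [None] * (src.rows * src.cols))
    for r in range(src.rows):
        for c in range(src.cols):
            dst.data[dst.offset(r, c)] = src.data[src.offset(r, c)]
    return dst


def contiguous_runs(offsets):
    """Count the separate contiguous extents a read touches."""
    offsets = sorted(offsets)
    return 1 + sum(1 for a, b in zip(offsets, offsets[1:]) if b != a + 1)


@dataclass
class ToyServer:
    """Hypothetical registry mapping a dataset name to its replicas."""
    replicas: dict = field(default_factory=dict)

    def register(self, name, layout):
        self.replicas.setdefault(name, []).append(layout)

    def best_for_column(self, name, c):
        # Pick the replica whose layout needs the fewest extents for this read.
        return min(
            self.replicas[name],
            key=lambda l: contiguous_runs([l.offset(r, c) for r in range(l.rows)]),
        )


def read_column(server, name, c):
    """Client-side read: the caller names the dataset, not the replica."""
    layout = server.best_for_column(name, c)
    offsets = [layout.offset(r, c) for r in range(layout.rows)]
    values = [layout.data[o] for o in offsets]
    return values, contiguous_runs(offsets), layout.order


# Data written row by row (fast to write), then read column-wise (analysis).
original = Layout(rows=4, cols=3, order="row", data=list(range(12)))
server = ToyServer()
server.register("d", original)
before = read_column(server, "d", 1)

# After a column-major replica is registered, the same call is redirected.
server.register("d", reorganize(original, "col"))
after = read_column(server, "d", 1)
```

In this sketch the caller's code does not change between the two reads; only the registry does. That mirrors, in a very simplified way, the transparency the abstract describes, where the real client library achieves redirection by intercepting HDF5 calls through the VOL.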

Recent papers from this venue

Proceedings of the 8th Parallel Data Storage Workshop
Structuring PLFS for extensibility
SDS: a framework for scientific data services
Efficient transactions for parallel data movement
Fourier-assisted machine learning of hard disk drive access time models