Enabling active storage on parallel I/O software stacks

2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST). Pub Date: 2010-05-03. DOI: 10.1109/MSST.2010.5496981

S. Son, S. Lang, P. Carns, R. Ross, R. Thakur, Berkin Özisikyilmaz, Prabhat Kumar, W. Liao, A. Choudhary

Citations: 77

Abstract

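As data sizes continue to grow, active storage is well suited to many data analysis kernels. Although the concept has been investigated and deployed in a number of forms, enabling it from within the parallel I/O software stack has been largely unexplored. In this paper, we propose and evaluate an active storage system that allows data analysis, mining, and statistical operations to be executed from within a parallel I/O interface. In our scheme, common analysis kernels are embedded in parallel file systems, and we expose the semantics of these kernels to the file system through an enhanced runtime interface so that the embedded kernels can execute on the server. To allow complete server-side operation without manipulating the file format or layout, our scheme adjusts the file I/O buffer to the computational unit boundary on the fly. It also uses server-side collective communication primitives, built on interserver communication, for reduction and aggregation. We have implemented a prototype of our active storage system and demonstrate its benefits using four data analysis benchmarks. Our experimental results show that the proposed system improves the overall performance of all four benchmarks by 50.9% on average, and that the compute-intensive portion of the k-means clustering kernel can be improved by 58.4% through GPU offloading when executed with a larger computational load. We also show that our scheme consistently outperforms the traditional storage model across a wide variety of input dataset sizes, numbers of nodes, and computational loads.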
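
The two mechanisms named in the abstract, aligning each server's I/O buffer to the computational unit boundary and combining per-server results with a reduction, can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions and not the authors' implementation: the round-robin striping, the 8-byte record size, and every function name below are hypothetical, and the "servers" are plain Python lists rather than real file-system daemons.

```python
import struct
from typing import List, Tuple

RECORD_SIZE = 8  # assumption: one computational unit is a little-endian float64


def stripe_ranges(file_size: int, stripe_size: int,
                  n_servers: int) -> List[List[Tuple[int, int]]]:
    """Assign byte ranges [start, end) to servers in round-robin order."""
    ranges: List[List[Tuple[int, int]]] = [[] for _ in range(n_servers)]
    offset, index = 0, 0
    while offset < file_size:
        end = min(offset + stripe_size, file_size)
        ranges[index % n_servers].append((offset, end))
        offset, index = end, index + 1
    return ranges


def align_to_unit(start: int, end: int, unit: int) -> Tuple[int, int]:
    """Round both ends up to the next unit boundary.

    A record that straddles a stripe boundary is owned by the stripe in
    which it starts, so every record is processed exactly once.
    """
    aligned_start = -(-start // unit) * unit  # ceiling to a multiple of unit
    aligned_end = -(-end // unit) * unit
    return aligned_start, aligned_end


def server_partial_sum(data: bytes, ranges: List[Tuple[int, int]]) -> float:
    """Sum the records one simulated server owns after alignment.

    In a real system the tail of a straddling record would have to be
    fetched from the neighbouring server; here all bytes are in memory.
    """
    total = 0.0
    for start, end in ranges:
        a, b = align_to_unit(start, end, RECORD_SIZE)
        for off in range(a, b, RECORD_SIZE):
            (value,) = struct.unpack_from("<d", data, off)
            total += value
    return total


def active_sum(values: List[float], stripe_size: int,
               n_servers: int) -> Tuple[float, List[float]]:
    """Compute per-server partial sums, then reduce them to one result."""
    data = struct.pack(f"<{len(values)}d", *values)
    per_server = stripe_ranges(len(data), stripe_size, n_servers)
    partials = [server_partial_sum(data, r) for r in per_server]
    return sum(partials), partials  # stands in for a server-side reduction


# Stripe size 20 is deliberately not a multiple of the 8-byte record size.
total, partials = active_sum([float(i) for i in range(1, 11)], 20, 3)
print(total)     # 55.0
print(partials)  # [25.0, 9.0, 21.0]
```

Only the small partial results cross the simulated server boundary, which is the data-movement saving that active storage aims for. A real deployment would replace the final `sum` with an interserver collective operation.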
