Feature-preserving Lossy Compression for In Situ Data Analysis

Workshop Proceedings of the 49th International Conference on Parallel Processing | Pub Date: 2020-08-17 | DOI: 10.1145/3409390.3409400

I. Yakushin, Kshitij Mehta, Jieyang Chen, M. Wolf, Ian T Foster, S. Klasky, T. Munson


Citations: 7

Abstract

The traditional model of having simulations write data to disk for offline analysis can be prohibitively expensive on computers with limited storage capacity or I/O bandwidth. In situ data analysis has emerged as a necessary paradigm to address this issue and is expected to play an important role in exascale computing. We demonstrate the various aspects and challenges involved in setting up a comprehensive in situ data analysis pipeline that consists of a simulation coupled with compression and feature tracking routines, a framework for assessing compression quality, a middleware library for I/O and data management, and a workflow tool for composing and running the pipeline. We perform studies of compression mechanisms and parameters on two supercomputers, Summit at Oak Ridge National Laboratory and Theta at Argonne National Laboratory, for two example application pipelines. We show that the optimal choice of compression parameters varies with data, time, and analysis, and that periodic retuning of the in situ pipeline can improve compression quality. Finally, we discuss our perspective on the wider adoption of in situ data analysis and management practices and technologies in the HPC community.
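
The claim that the best compression parameters change with data, time, and analysis can be illustrated with a small sketch. It is not the pipeline or the compressors evaluated in the paper: uniform quantization followed by zlib stands in for an error-bounded lossy compressor, the "feature" is simply the number of contiguous regions above a threshold, and every name here (`field`, `count_regions`, `tune`, `retuned_bounds`, the candidate bounds, the threshold) is a hypothetical choice made for illustration.

```python
import math
import struct
import zlib


def compress(values, error_bound):
    """Toy error-bounded compressor: uniform quantization followed by zlib."""
    step = 2 * error_bound  # bin width; rounding error is at most error_bound
    codes = [round(v / step) for v in values]
    return zlib.compress(struct.pack(f"<{len(codes)}i", *codes))


def decompress(blob, error_bound):
    """Invert compress() up to the quantization error."""
    raw = zlib.decompress(blob)
    codes = struct.unpack(f"<{len(raw) // 4}i", raw)
    step = 2 * error_bound
    return [c * step for c in codes]


def count_regions(values, threshold):
    """Feature metric (assumed): contiguous runs with value > threshold."""
    count, inside = 0, False
    for v in values:
        above = v > threshold
        if above and not inside:
            count += 1
        inside = above
    return count


def tune(values, candidates, threshold):
    """Return (largest bound that keeps the region count, compression ratio)."""
    reference = count_regions(values, threshold)
    raw_size = 8 * len(values)  # size if stored as float64
    best = None
    for eb in sorted(candidates):
        blob = compress(values, eb)
        if count_regions(decompress(blob, eb), threshold) == reference:
            best = (eb, raw_size / len(blob))
    return best


def field(step, n=512):
    """Hypothetical simulation output: a wave whose amplitude decays over time."""
    amp = 0.8 / (1 + step)
    return [amp * math.sin(2 * math.pi * 4 * i / n) for i in range(n)]


def retuned_bounds(steps, every, candidates, threshold):
    """Retune every `every` steps; return {step: chosen error bound}."""
    chosen = {}
    for step in range(0, steps, every):
        result = tune(field(step), candidates, threshold)
        chosen[step] = result[0] if result else None
    return chosen


if __name__ == "__main__":
    print(retuned_bounds(10, 3, [1e-4, 1e-3, 1e-2, 1e-1], threshold=0.05))
```

In this toy setting the loosest acceptable bound tightens as the signal amplitude decays, which is why a bound tuned once at the start would eventually erase the feature. A real pipeline would swap in its own compressor, feature tracker, and quality metric.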
