I. Yakushin, Kshitij Mehta, Jieyang Chen, M. Wolf, Ian T. Foster, S. Klasky, T. Munson
DOI: 10.1145/3409390.3409400
Venue: Workshop Proceedings of the 49th International Conference on Parallel Processing
Published: 2020-08-17 (Journal Article)
Citations: 7
Feature-preserving Lossy Compression for In Situ Data Analysis
The traditional model of having simulations write data to disk for offline analysis can be prohibitively expensive on computers with limited storage capacity or I/O bandwidth. In situ data analysis has emerged as a necessary paradigm to address this issue and is expected to play an important role in exascale computing. We demonstrate the various aspects and challenges involved in setting up a comprehensive in situ data analysis pipeline that consists of a simulation coupled with compression and feature tracking routines, a framework for assessing compression quality, a middleware library for I/O and data management, and a workflow tool for composing and running the pipeline. We perform studies of compression mechanisms and parameters on two supercomputers, Summit at Oak Ridge National Laboratory and Theta at Argonne National Laboratory, for two example application pipelines. We show that the optimal choice of compression parameters varies with data, time, and analysis, and that periodic retuning of the in situ pipeline can improve compression quality. Finally, we discuss our perspective on the wider adoption of in situ data analysis and management practices and technologies in the HPC community.
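The paper's own compressors and quality framework are not reproduced here, but the core idea it studies, error-bounded lossy compression whose quality is then measured against the original field, can be sketched generically. The example below is a minimal illustration, not the authors' method: it uses simple uniform quantization with a user-chosen absolute error bound (production pipelines would use a dedicated compressor) and reports two common quality metrics, maximum point-wise error and PSNR.

```python
import numpy as np

def lossy_compress(data, abs_error):
    """Uniform quantization with a fixed absolute error bound.

    Each value is snapped to the nearest multiple of 2*abs_error,
    which guarantees |x - decompress(x)| <= abs_error.
    Returns integer codes (compressible further by an entropy coder)
    and the quantization step needed for reconstruction.
    """
    step = 2.0 * abs_error
    codes = np.round(data / step).astype(np.int64)
    return codes, step

def decompress(codes, step):
    # Reconstruct the field from the integer codes.
    return codes.astype(np.float64) * step

def psnr(original, reconstructed):
    # Peak signal-to-noise ratio in dB, using the data range as the peak.
    mse = np.mean((original - reconstructed) ** 2)
    peak = original.max() - original.min()
    return 10.0 * np.log10(peak ** 2 / mse)

# Synthetic stand-in for a simulation field (hypothetical data, for
# illustration only): a smooth wave plus small noise.
rng = np.random.default_rng(0)
field = np.sin(np.linspace(0, 8 * np.pi, 10_000)) \
        + 0.01 * rng.standard_normal(10_000)

codes, step = lossy_compress(field, abs_error=1e-3)
recon = decompress(codes, step)

max_err = np.abs(field - recon).max()   # bounded by abs_error
quality = psnr(field, recon)            # higher is better
```

A pipeline such as the one the paper describes would evaluate metrics like these in situ, and could retune the error bound over time as the data and downstream analyses change, which is the "periodic retuning" the abstract refers to.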