NeurLZ: On Systematically Enhancing Lossy Compression Performance for Scientific Data based on Neural Learning with Error Control

arXiv - CS - Distributed, Parallel, and Cluster Computing | Pub Date: 2024-09-09 | DOI: arxiv-2409.05785

Wenqi Jia, Youyuan Liu, Zhewen Hu, Jinzhen Wang, Boyuan Zhang, Wei Niu, Junzhou Huang, Stavros Kalafatis, Sian Jin, Miao Yin



Abstract

Large-scale scientific simulations generate massive datasets that pose significant challenges for storage and I/O. While traditional lossy compression techniques can improve performance, balancing compression ratio, data quality, and throughput remains difficult. To address this, we propose NeurLZ, a novel cross-field learning-based and error-controlled compression framework for scientific data. By integrating skipping DNN models, cross-field learning, and error control, our framework aims to substantially enhance lossy compression performance. Our contributions are three-fold: (1) We design a lightweight skipping model to provide high-fidelity detail retention, further improving prediction accuracy. (2) We adopt a cross-field learning approach to significantly improve data prediction accuracy, resulting in a substantially improved compression ratio. (3) We develop an error control approach to provide strict error bounds according to user requirements. We evaluated NeurLZ on several real-world HPC application datasets, including Nyx (cosmological simulation), Miranda (large turbulence simulation), and Hurricane (weather simulation). Experiments demonstrate that our framework achieves up to a 90% relative reduction in bit rate under the same data distortion, compared to the best existing approach.
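The abstract's third contribution, strict error bounds on top of a learned enhancer, can be illustrated with a minimal sketch. It is not the paper's implementation. Everything here is an assumption made for illustration: `quantize` stands in for a conventional error-bounded compressor, `enhance` is a simple moving average standing in for the lightweight DNN, and the per-point fallback mask is one generic way to keep the user's bound, not necessarily the mechanism NeurLZ uses.

```python
import numpy as np

def quantize(x, eb):
    """Stand-in error-bounded compressor: uniform quantization with |x - x_hat| <= eb."""
    return np.round(x / (2 * eb)) * (2 * eb)

def enhance(x_hat, width=9):
    """Hypothetical enhancer: moving average of the decompressed data.
    A learned model would take this role; it only sees decoder-side data."""
    kernel = np.ones(width) / width
    padded = np.pad(x_hat, width // 2, mode="edge")  # keep the output length equal to the input
    return np.convolve(padded, kernel, mode="valid")

def compress_side(x, eb):
    """The compressor has the original, so it can check the enhancer's output
    and record the points where the enhanced value would violate the bound."""
    x_hat = quantize(x, eb)
    fallback = np.abs(x - enhance(x_hat)) > eb
    return x_hat, fallback

def decompress_side(x_hat, fallback):
    """The decoder re-runs the enhancer and reverts flagged points to x_hat,
    so every output value stays within the error bound."""
    return np.where(fallback, x_hat, enhance(x_hat))

# Usage: a smooth 1D signal with an absolute error bound of 1e-2.
x = np.sin(np.linspace(0.0, 4.0 * np.pi, 2000))
eb = 1e-2
x_hat, fallback = compress_side(x, eb)
out = decompress_side(x_hat, fallback)
print("max error:", np.max(np.abs(x - out)), "fallback points:", int(fallback.sum()))
```

The design choice to note is that the bound is enforced at compression time, where the original data is available. The cost is storing the fallback mask alongside the compressed stream, which a real system would need to encode compactly.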

