NeurLZ: On Systematically Enhancing Lossy Compression Performance for Scientific Data based on Neural Learning with Error Control

Wenqi Jia, Youyuan Liu, Zhewen Hu, Jinzhen Wang, Boyuan Zhang, Wei Niu, Junzhou Huang, Stavros Kalafatis, Sian Jin, Miao Yin
{"title":"NeurLZ:基于误差控制的神经学习,系统地提高科学数据的有损压缩性能","authors":"Wenqi Jia, Youyuan Liu, Zhewen Hu, Jinzhen Wang, Boyuan Zhang, Wei Niu, Junzhou Huang, Stavros Kalafatis, Sian Jin, Miao Yin","doi":"arxiv-2409.05785","DOIUrl":null,"url":null,"abstract":"Large-scale scientific simulations generate massive datasets that pose\nsignificant challenges for storage and I/O. While traditional lossy compression\ntechniques can improve performance, balancing compression ratio, data quality,\nand throughput remains difficult. To address this, we propose NeurLZ, a novel\ncross-field learning-based and error-controlled compression framework for\nscientific data. By integrating skipping DNN models, cross-field learning, and\nerror control, our framework aims to substantially enhance lossy compression\nperformance. Our contributions are three-fold: (1) We design a lightweight\nskipping model to provide high-fidelity detail retention, further improving\nprediction accuracy. (2) We adopt a cross-field learning approach to\nsignificantly improve data prediction accuracy, resulting in a substantially\nimproved compression ratio. (3) We develop an error control approach to provide\nstrict error bounds according to user requirements. We evaluated NeurLZ on\nseveral real-world HPC application datasets, including Nyx (cosmological\nsimulation), Miranda (large turbulence simulation), and Hurricane (weather\nsimulation). Experiments demonstrate that our framework achieves up to a 90%\nrelative reduction in bit rate under the same data distortion, compared to the\nbest existing approach.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"12 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"NeurLZ: On Systematically Enhancing Lossy Compression Performance for Scientific Data based on Neural Learning with Error Control\",\"authors\":\"Wenqi Jia, Youyuan Liu, Zhewen Hu, Jinzhen Wang, Boyuan Zhang, Wei Niu, Junzhou Huang, Stavros Kalafatis, Sian Jin, Miao Yin\",\"doi\":\"arxiv-2409.05785\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Large-scale scientific simulations generate massive datasets that pose\\nsignificant challenges for storage and I/O. While traditional lossy compression\\ntechniques can improve performance, balancing compression ratio, data quality,\\nand throughput remains difficult. To address this, we propose NeurLZ, a novel\\ncross-field learning-based and error-controlled compression framework for\\nscientific data. By integrating skipping DNN models, cross-field learning, and\\nerror control, our framework aims to substantially enhance lossy compression\\nperformance. Our contributions are three-fold: (1) We design a lightweight\\nskipping model to provide high-fidelity detail retention, further improving\\nprediction accuracy. (2) We adopt a cross-field learning approach to\\nsignificantly improve data prediction accuracy, resulting in a substantially\\nimproved compression ratio. (3) We develop an error control approach to provide\\nstrict error bounds according to user requirements. We evaluated NeurLZ on\\nseveral real-world HPC application datasets, including Nyx (cosmological\\nsimulation), Miranda (large turbulence simulation), and Hurricane (weather\\nsimulation). 
Experiments demonstrate that our framework achieves up to a 90%\\nrelative reduction in bit rate under the same data distortion, compared to the\\nbest existing approach.\",\"PeriodicalId\":501422,\"journal\":{\"name\":\"arXiv - CS - Distributed, Parallel, and Cluster Computing\",\"volume\":\"12 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Distributed, Parallel, and Cluster Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.05785\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Distributed, Parallel, and Cluster Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.05785","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Large-scale scientific simulations generate massive datasets that pose significant challenges for storage and I/O. While traditional lossy compression techniques can improve performance, balancing compression ratio, data quality, and throughput remains difficult. To address this, we propose NeurLZ, a novel cross-field learning-based and error-controlled compression framework for scientific data. By integrating skipping DNN models, cross-field learning, and error control, our framework aims to substantially enhance lossy compression performance. Our contributions are three-fold: (1) We design a lightweight skipping model to provide high-fidelity detail retention, further improving prediction accuracy. (2) We adopt a cross-field learning approach to significantly improve data prediction accuracy, resulting in a substantially improved compression ratio. (3) We develop an error control approach to provide strict error bounds according to user requirements. We evaluated NeurLZ on several real-world HPC application datasets, including Nyx (cosmological simulation), Miranda (large turbulence simulation), and Hurricane (weather simulation). Experiments demonstrate that our framework achieves up to a 90% relative reduction in bit rate under the same data distortion, compared to the best existing approach.
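The abstract credits cross-field learning for the prediction gains but does not specify an architecture. The toy sketch below illustrates the general idea under assumed details: correlated fields from the same simulation (e.g., Nyx's baryon density and temperature) are stacked as input channels of a small 3D convolutional network that predicts one target field. The class name, layer sizes, and field pairing are hypothetical, not taken from the paper.

```python
import torch
import torch.nn as nn

class CrossFieldPredictor(nn.Module):
    """Toy cross-field predictor (hypothetical architecture): stacks
    several correlated simulation fields as input channels and predicts
    one target field with a small 3D CNN."""

    def __init__(self, n_input_fields: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(n_input_fields, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv3d(16, 1, kernel_size=3, padding=1),
        )

    def forward(self, fields: torch.Tensor) -> torch.Tensor:
        # fields: (batch, n_input_fields, depth, height, width)
        # returns: (batch, 1, depth, height, width) predicted target field
        return self.net(fields)
```

Likewise, the abstract promises strict user-specified error bounds without describing the mechanism. A standard way to obtain such a guarantee on top of a learned predictor is SZ-style linear quantization of the prediction residual; the sketch below assumes that scheme and is not necessarily NeurLZ's actual approach.

```python
import numpy as np

def error_bounded_residuals(original: np.ndarray,
                            predicted: np.ndarray,
                            eps: float):
    """SZ-style linear quantization of prediction residuals (assumed
    scheme, not necessarily NeurLZ's): every reconstructed value is
    guaranteed to lie within +/- eps of the original."""
    residual = original - predicted
    # Bin width 2*eps: rounding introduces at most half a bin of error,
    # i.e. at most eps per value.
    codes = np.round(residual / (2.0 * eps)).astype(np.int64)
    reconstruction = predicted + codes * (2.0 * eps)
    # Sanity check (small relative slack for floating-point rounding).
    assert np.all(np.abs(original - reconstruction) <= eps * (1.0 + 1e-9))
    return codes, reconstruction
```

When the predictor is accurate, most quantization codes cluster at or near zero and entropy-code compactly, which is how improved prediction accuracy translates into a lower bit rate at the same distortion.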