NeurLZ: On Systematically Enhancing Lossy Compression Performance for Scientific Data based on Neural Learning with Error Control
Authors: Wenqi Jia, Youyuan Liu, Zhewen Hu, Jinzhen Wang, Boyuan Zhang, Wei Niu, Junzhou Huang, Stavros Kalafatis, Sian Jin, Miao Yin
Venue: arXiv - CS - Distributed, Parallel, and Cluster Computing
Publication date: 2024-09-09
DOI: arxiv-2409.05785 (https://doi.org/arxiv-2409.05785)
Citations: 0
Abstract
Large-scale scientific simulations generate massive datasets that pose
significant challenges for storage and I/O. While traditional lossy compression
techniques can improve performance, balancing compression ratio, data quality,
and throughput remains difficult. To address this, we propose NeurLZ, a novel
cross-field learning-based and error-controlled compression framework for
scientific data. By integrating skipping DNN models, cross-field learning, and
error control, our framework aims to substantially enhance lossy compression
performance. Our contributions are three-fold: (1) We design a lightweight
skipping model to provide high-fidelity detail retention, further improving
prediction accuracy. (2) We adopt a cross-field learning approach to
significantly improve data prediction accuracy, resulting in a substantially
improved compression ratio. (3) We develop an error control approach to provide
strict error bounds according to user requirements. We evaluated NeurLZ on
several real-world HPC application datasets, including Nyx (cosmological
simulation), Miranda (large turbulence simulation), and Hurricane (weather
simulation). Experiments demonstrate that our framework achieves up to a 90%
relative reduction in bit rate under the same data distortion, compared to the
best existing approach.
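
The abstract does not say how the error control step works, so what follows is only a minimal sketch of one generic way to keep a learned correction inside a user-specified bound, not the NeurLZ implementation. It assumes a base compressor that already guarantees a pointwise absolute error bound, and a model that predicts a residual to add to the decompressed values. Wherever the corrected value would violate the bound, the sketch falls back to the base value and records the index. The names `enforce_error_bound` and `relative_bit_rate_reduction` and all sample numbers are hypothetical.

```python
from typing import List, Tuple


def enforce_error_bound(
    original: List[float],
    decompressed: List[float],
    predicted_residual: List[float],
    error_bound: float,
) -> Tuple[List[float], List[int]]:
    """Apply a learned residual only where it keeps the error bounded.

    Hypothetical helper, not taken from the paper. Assumes the base
    compressor guarantees |original - decompressed| <= error_bound.
    """
    enhanced: List[float] = []
    fallback_indices: List[int] = []
    for i, (x, x_dec, r) in enumerate(
        zip(original, decompressed, predicted_residual)
    ):
        candidate = x_dec + r
        if abs(x - candidate) <= error_bound:
            enhanced.append(candidate)
        else:
            # The correction would break the bound: keep the base value and
            # record the index so a decompressor could skip it as well.
            enhanced.append(x_dec)
            fallback_indices.append(i)
    return enhanced, fallback_indices


def relative_bit_rate_reduction(baseline_bits: float, improved_bits: float) -> float:
    """Fraction by which the bit rate (bits per value) drops versus a baseline."""
    return (baseline_bits - improved_bits) / baseline_bits


# Made-up sample data; the absolute error bound is 0.1.
original = [1.00, 2.00, 3.00, 4.00]
decompressed = [1.05, 1.95, 3.08, 4.00]         # base output, within the bound
predicted_residual = [-0.04, 0.04, 0.10, 0.00]  # third prediction has the wrong sign
enhanced, fallback = enforce_error_bound(
    original, decompressed, predicted_residual, error_bound=0.1
)
print(enhanced, fallback)
print(relative_bit_rate_reduction(2.0, 0.2))
```

In this toy input only the third point falls back to the base value, so every reconstructed value stays within the bound. The second helper makes the headline metric concrete: a drop from 2.0 to 0.2 bits per value is a 90% relative reduction. Assuming 32-bit floating-point input, that corresponds to the compression ratio rising from 16 to 160.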