{"title":"NeurLZ:基于误差控制的神经学习,系统地提高科学数据的有损压缩性能","authors":"Wenqi Jia, Youyuan Liu, Zhewen Hu, Jinzhen Wang, Boyuan Zhang, Wei Niu, Junzhou Huang, Stavros Kalafatis, Sian Jin, Miao Yin","doi":"arxiv-2409.05785","DOIUrl":null,"url":null,"abstract":"Large-scale scientific simulations generate massive datasets that pose\nsignificant challenges for storage and I/O. While traditional lossy compression\ntechniques can improve performance, balancing compression ratio, data quality,\nand throughput remains difficult. To address this, we propose NeurLZ, a novel\ncross-field learning-based and error-controlled compression framework for\nscientific data. By integrating skipping DNN models, cross-field learning, and\nerror control, our framework aims to substantially enhance lossy compression\nperformance. Our contributions are three-fold: (1) We design a lightweight\nskipping model to provide high-fidelity detail retention, further improving\nprediction accuracy. (2) We adopt a cross-field learning approach to\nsignificantly improve data prediction accuracy, resulting in a substantially\nimproved compression ratio. (3) We develop an error control approach to provide\nstrict error bounds according to user requirements. We evaluated NeurLZ on\nseveral real-world HPC application datasets, including Nyx (cosmological\nsimulation), Miranda (large turbulence simulation), and Hurricane (weather\nsimulation). Experiments demonstrate that our framework achieves up to a 90%\nrelative reduction in bit rate under the same data distortion, compared to the\nbest existing approach.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"12 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"NeurLZ: On Systematically Enhancing Lossy Compression Performance for Scientific Data based on Neural Learning with Error Control\",\"authors\":\"Wenqi Jia, Youyuan Liu, Zhewen Hu, Jinzhen Wang, Boyuan Zhang, Wei Niu, Junzhou Huang, Stavros Kalafatis, Sian Jin, Miao Yin\",\"doi\":\"arxiv-2409.05785\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Large-scale scientific simulations generate massive datasets that pose\\nsignificant challenges for storage and I/O. While traditional lossy compression\\ntechniques can improve performance, balancing compression ratio, data quality,\\nand throughput remains difficult. To address this, we propose NeurLZ, a novel\\ncross-field learning-based and error-controlled compression framework for\\nscientific data. By integrating skipping DNN models, cross-field learning, and\\nerror control, our framework aims to substantially enhance lossy compression\\nperformance. Our contributions are three-fold: (1) We design a lightweight\\nskipping model to provide high-fidelity detail retention, further improving\\nprediction accuracy. (2) We adopt a cross-field learning approach to\\nsignificantly improve data prediction accuracy, resulting in a substantially\\nimproved compression ratio. (3) We develop an error control approach to provide\\nstrict error bounds according to user requirements. We evaluated NeurLZ on\\nseveral real-world HPC application datasets, including Nyx (cosmological\\nsimulation), Miranda (large turbulence simulation), and Hurricane (weather\\nsimulation). 
Experiments demonstrate that our framework achieves up to a 90%\\nrelative reduction in bit rate under the same data distortion, compared to the\\nbest existing approach.\",\"PeriodicalId\":501422,\"journal\":{\"name\":\"arXiv - CS - Distributed, Parallel, and Cluster Computing\",\"volume\":\"12 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Distributed, Parallel, and Cluster Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.05785\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Distributed, Parallel, and Cluster Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.05785","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
NeurLZ: On Systematically Enhancing Lossy Compression Performance for Scientific Data based on Neural Learning with Error Control
Large-scale scientific simulations generate massive datasets that pose
significant challenges for storage and I/O. While traditional lossy compression
techniques can improve performance, balancing compression ratio, data quality,
and throughput remains difficult. To address this, we propose NeurLZ, a novel
cross-field learning-based and error-controlled compression framework for
scientific data. By integrating skipping DNN models, cross-field learning, and
error control, our framework aims to substantially enhance lossy compression
performance. Our contributions are three-fold: (1) We design a lightweight
skipping model to provide high-fidelity detail retention, further improving
prediction accuracy. (2) We adopt a cross-field learning approach to
significantly improve data prediction accuracy, resulting in a substantially
improved compression ratio. (3) We develop an error control approach to provide
strict error bounds according to user requirements (one standard way to enforce
such bounds is sketched below). We evaluate NeurLZ on
several real-world HPC application datasets, including Nyx (cosmological
simulation), Miranda (large turbulence simulation), and Hurricane (weather
simulation). Experiments demonstrate that our framework achieves up to a 90%
relative reduction in bit rate under the same data distortion, compared to the
best existing approach.
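
The abstract does not spell out the error-control mechanism. A common way for prediction-based scientific compressors to guarantee a user-specified absolute error bound is linear-scaling quantization of the prediction residuals, as in SZ-style compressors. The sketch below is a minimal NumPy illustration of that idea under this assumption; it is not NeurLZ's actual implementation, and the function names here are hypothetical.

```python
import numpy as np

def encode_residuals(data: np.ndarray, prediction: np.ndarray, eps: float) -> np.ndarray:
    """Quantize prediction residuals so the decoded field satisfies
    |decoded - data| <= eps. Hypothetical SZ-style linear-scaling
    quantization; NeurLZ's actual error-control scheme may differ."""
    residual = data - prediction
    # Each quantization bin is 2*eps wide, so rounding to the nearest
    # bin center introduces at most eps of absolute error.
    return np.round(residual / (2.0 * eps)).astype(np.int64)

def decode_residuals(prediction: np.ndarray, codes: np.ndarray, eps: float) -> np.ndarray:
    """Reconstruct the field from the prediction plus dequantized residuals."""
    return prediction + codes * (2.0 * eps)

# Toy usage: a noisy copy of the data stands in for a DNN prediction.
rng = np.random.default_rng(0)
data = rng.normal(size=(64, 64))
prediction = data + rng.normal(scale=0.05, size=data.shape)
eps = 1e-2

codes = encode_residuals(data, prediction, eps)
decoded = decode_residuals(prediction, codes, eps)
# The strict bound holds up to floating-point rounding.
assert np.max(np.abs(decoded - data)) <= eps * (1.0 + 1e-9)
```

In a complete compressor the integer codes would then be entropy-coded; the more accurate the (cross-field) prediction, the more the codes concentrate near zero and the lower the resulting bit rate, which is consistent with the compression-ratio gains the abstract reports.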