Zhiwei Ai, Juelin Leng, Fang Xia, Huawei Wang, Yi Cao
计算机辅助设计与图形学学报 (Journal of Computer-Aided Design & Computer Graphics), vol. 33, published 2021-12-01. DOI: 10.3724/sp.j.1089.2021.19263 (https://doi.org/10.3724/sp.j.1089.2021.19263)
Error-Controlled Data Reduction Approach for Large-Scale Structured Datasets
The massive datasets generated by scientific and engineering simulations have reached terabytes (TB) or even petabytes (PB) in size. Data reduction has thus become one of the most important tools for cutting I/O and storage costs. To support high-precision visualization and analysis, an error-controlled data reduction approach is proposed for large-scale structured datasets. First, taking the difference between the reduced data and the original data as a constraint, a multi-level, structured, adaptively refined background grid is constructed according to the spatial distribution of the underlying physical fields. Second, the original data is interpolated onto the background grid, yielding data with far fewer cells and a lower storage cost. Finally, the reduced data is exported to the parallel file system in real time. The proposed algorithm is implemented on the parallel programming framework JASMIN, so it can be coupled directly with numerical simulation programs developed with JASMIN. Test results demonstrate that the parallel algorithm scales to tens of thousands of CPU cores. The algorithm has been successfully applied to the electromagnetic simulation of unmanned aerial vehicle irradiation: the cell count of a structured dataset with one hundred billion cells is reduced by 99.8%, with a relative error of less than 10%. The peak signal-to-noise ratio (PSNR) between the two images rendered from the reduced data and the original data is 47.08 dB, which indicates high similarity and thus satisfies the precision requirement of visualization.
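The idea of an error-constrained, adaptively refined grid can be illustrated with a small serial sketch. It is not the authors' JASMIN implementation: it assumes a 2D square field with a power-of-two side, uses a quadtree as a stand-in for the multi-level background grid, replaces interpolation with a block mean, and measures error as the maximum deviation relative to the field's peak magnitude (the abstract does not define its error metric). The names `reduce_field`, `reconstruct`, `psnr`, and `demo_field` are hypothetical.

```python
import math


def reduce_block(field, x0, y0, size, tol, scale, cells):
    """Keep a square block as one coarse cell if its error is within tol, else split it in four."""
    values = [field[y][x] for y in range(y0, y0 + size) for x in range(x0, x0 + size)]
    mean = sum(values) / len(values)
    # Assumed error metric: worst deviation from the block mean, relative to the field's peak.
    err = max(abs(v - mean) for v in values) / scale
    if err <= tol or size == 1:
        cells.append((x0, y0, size, mean))
        return
    half = size // 2
    for dy in (0, half):
        for dx in (0, half):
            reduce_block(field, x0 + dx, y0 + dy, half, tol, scale, cells)


def reduce_field(field, tol):
    """Return (x0, y0, size, value) cells that tile an n-by-n field, n a power of two."""
    n = len(field)
    if n == 0 or n & (n - 1) or any(len(row) != n for row in field):
        raise ValueError("field must be square with a power-of-two side")
    scale = max(abs(v) for row in field for v in row) or 1.0  # avoid dividing by zero
    cells = []
    reduce_block(field, 0, 0, n, tol, scale, cells)
    return cells


def reconstruct(cells, n):
    """Expand the coarse cells back onto the original n-by-n grid."""
    recon = [[0.0] * n for _ in range(n)]
    for x0, y0, size, value in cells:
        for y in range(y0, y0 + size):
            for x in range(x0, x0 + size):
                recon[y][x] = value
    return recon


def psnr(a, b, peak):
    """Peak signal-to-noise ratio in dB between two equally sized 2D arrays."""
    n = len(a)
    mse = sum((a[y][x] - b[y][x]) ** 2 for y in range(n) for x in range(n)) / (n * n)
    return math.inf if mse == 0 else 10.0 * math.log10(peak * peak / mse)


def demo_field(n):
    """Synthetic test field (an assumption): a narrow Gaussian peak on a flat background."""
    c = (n - 1) / 2
    return [[math.exp(-((x - c) ** 2 + (y - c) ** 2) / (0.002 * n * n))
             for x in range(n)] for y in range(n)]


if __name__ == "__main__":
    n = 64
    field = demo_field(n)
    cells = reduce_field(field, tol=0.1)
    recon = reconstruct(cells, n)
    print(f"cells: {n * n} -> {len(cells)}")
    print(f"PSNR: {psnr(field, recon, peak=1.0):.2f} dB")
```

Because a block is split whenever its error exceeds `tol`, and a single-cell block has zero error, every kept cell satisfies the bound by construction. Smooth regions collapse into a few large cells while the area around the peak stays refined, which is the mechanism behind large cell-count reductions at a bounded error. Here the PSNR is computed on the field values themselves, whereas the abstract reports it for rendered images.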