Reducing the Training Overhead of the HPC Compression Autoencoder via Dataset Proportioning

Tong Liu, Shakeel Alibhai, Jinzhen Wang, Qing Liu, Xubin He

2021 IEEE International Conference on Networking, Architecture and Storage (NAS), October 2021. DOI: 10.1109/nas51552.2021.9605407
As the storage overhead of high-performance computing (HPC) data reaches into the petabyte or even exabyte scale, it could be useful to find new methods of compressing such data. The compression autoencoder (CAE) has recently been proposed to compress HPC data with a very high compression ratio. However, this machine learning-based method suffers from the major drawback of lengthy training time. In this paper, we attempt to mitigate this problem by proposing a proportioning scheme to reduce the amount of data that is used for training relative to the amount of data to be compressed. We show that this method drastically reduces the training time without, in most cases, significantly increasing the error. We further explain how this scheme can even improve the accuracy of the CAE on certain datasets. Finally, we provide some guidance on how to determine a suitable proportion of the training dataset to use in order to train the CAE for a given dataset.
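The abstract describes the proportioning scheme only at a high level. The sketch below illustrates the core idea, training an autoencoder on a fraction of the data blocks and then compressing the entire dataset with the resulting model. Everything in it is an illustrative assumption rather than the paper's configuration: the small fully connected CAE architecture, the block and latent sizes, the synthetic data, and the proportion value of 0.1.

```python
# A minimal sketch of dataset proportioning for a compression autoencoder.
# The architecture, sizes, data, and proportion are illustrative assumptions,
# not taken from the paper.
import torch
import torch.nn as nn

class CAE(nn.Module):
    def __init__(self, block_size=256, latent_size=32):
        super().__init__()
        # Encoder maps a data block to a small latent code (the compressed form).
        self.encoder = nn.Sequential(
            nn.Linear(block_size, 128), nn.ReLU(),
            nn.Linear(128, latent_size))
        # Decoder reconstructs the block from the latent code.
        self.decoder = nn.Sequential(
            nn.Linear(latent_size, 128), nn.ReLU(),
            nn.Linear(128, block_size))

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Synthetic stand-in for an HPC dataset split into fixed-size blocks.
data = torch.randn(10_000, 256)

# Dataset proportioning: train on only a fraction of the blocks to be
# compressed. 0.1 is an arbitrary example; the paper's guidance concerns
# how to choose this value for a given dataset.
proportion = 0.1
n_train = int(len(data) * proportion)
train_blocks = data[torch.randperm(len(data))[:n_train]]

model = CAE()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(20):
    optimizer.zero_grad()
    loss = loss_fn(model(train_blocks), train_blocks)
    loss.backward()
    optimizer.step()

# Compress the *entire* dataset with the model trained on the small proportion,
# and measure reconstruction error over all blocks.
with torch.no_grad():
    latent = model.encoder(data)           # compressed representation
    reconstruction = model.decoder(latent)
    err = loss_fn(reconstruction, data)
print(f"latent/input size ratio: {latent.shape[1] / data.shape[1]:.3f}, "
      f"full-dataset MSE: {err.item():.4f}")
```

The trade-off the paper studies is visible in this setup: shrinking `proportion` cuts training time roughly linearly (fewer blocks per epoch), while the reconstruction error on the full dataset indicates how much accuracy, if any, is lost by training on the smaller sample.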