Reducing the Training Overhead of the HPC Compression Autoencoder via Dataset Proportioning

Tong Liu, Shakeel Alibhai, Jinzhen Wang, Qing Liu, Xubin He
{"title":"Reducing the Training Overhead of the HPC Compression Autoencoder via Dataset Proportioning","authors":"Tong Liu, Shakeel Alibhai, Jinzhen Wang, Qing Liu, Xubin He","doi":"10.1109/nas51552.2021.9605407","DOIUrl":null,"url":null,"abstract":"As the storage overhead of high-performance computing (HPC) data reaches into the petabyte or even exabyte scale, it could be useful to find new methods of compressing such data. The compression autoencoder (CAE) has recently been proposed to compress HPC data with a very high compression ratio. However, this machine learning-based method suffers from the major drawback of lengthy training time. In this paper, we attempt to mitigate this problem by proposing a proportioning scheme to reduce the amount of data that is used for training relative to the amount of data to be compressed. We show that this method drastically reduces the training time without, in most cases, significantly increasing the error. We further explain how this scheme can even improve the accuracy of the CAE on certain datasets. Finally, we provide some guidance on how to determine a suitable proportion of the training dataset to use in order to train the CAE for a given dataset.","PeriodicalId":135930,"journal":{"name":"2021 IEEE International Conference on Networking, Architecture and Storage (NAS)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE International Conference on Networking, Architecture and Storage (NAS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/nas51552.2021.9605407","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

As the storage overhead of high-performance computing (HPC) data reaches into the petabyte or even exabyte scale, it could be useful to find new methods of compressing such data. The compression autoencoder (CAE) has recently been proposed to compress HPC data with a very high compression ratio. However, this machine learning-based method suffers from the major drawback of lengthy training time. In this paper, we attempt to mitigate this problem by proposing a proportioning scheme to reduce the amount of data that is used for training relative to the amount of data to be compressed. We show that this method drastically reduces the training time without, in most cases, significantly increasing the error. We further explain how this scheme can even improve the accuracy of the CAE on certain datasets. Finally, we provide some guidance on how to determine a suitable proportion of the training dataset to use in order to train the CAE for a given dataset.
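Below is a minimal sketch, not the authors' implementation, of the dataset-proportioning idea the abstract describes: train a compression autoencoder on only a fraction of the data blocks, then use the trained model to compress the full dataset. The block size, latent size, network shape, and the `train_fraction` value are illustrative assumptions, as is the synthetic data standing in for an HPC dataset.

```python
# Hypothetical sketch of dataset proportioning for a compression autoencoder (CAE).
# Assumptions: data is split into fixed-size float blocks; network shape,
# BLOCK, LATENT, and train_fraction are illustrative, not from the paper.
import torch
import torch.nn as nn

BLOCK = 256            # values per input block (assumed)
LATENT = 16            # latent size -> roughly 16x per-block compression (assumed)
train_fraction = 0.1   # proportion of blocks used for training (assumed)

class CAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(BLOCK, 64), nn.ReLU(), nn.Linear(64, LATENT))
        self.decoder = nn.Sequential(nn.Linear(LATENT, 64), nn.ReLU(), nn.Linear(64, BLOCK))

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Synthetic stand-in for an HPC dataset: N blocks of BLOCK floats each.
data = torch.randn(10_000, BLOCK)

# Proportioning: sample only a fraction of the blocks for training.
n_train = int(train_fraction * len(data))
train_blocks = data[torch.randperm(len(data))[:n_train]]

model = CAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(20):                       # few epochs; illustrative only
    for i in range(0, n_train, 128):
        batch = train_blocks[i:i + 128]
        opt.zero_grad()
        loss = loss_fn(model(batch), batch)
        loss.backward()
        opt.step()

# Compress (encode) the *entire* dataset with the model trained on the small proportion,
# then check the reconstruction error introduced by the lossy compression.
with torch.no_grad():
    latent = model.encoder(data)              # compressed representation
    recon = model.decoder(latent)             # decompressed data
    err = (recon - data).abs().max().item()
print(f"trained on {n_train}/{len(data)} blocks, max abs reconstruction error: {err:.4f}")
```

The point of the sketch is the trade-off the paper studies: the training cost scales with `n_train` rather than with the size of the data to be compressed, while the reconstruction error measured over the full dataset indicates how much accuracy the reduced training proportion costs.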