Exploiting Input Tensor Dynamics in Activation Checkpointing for Efficient Training on GPU

Jian Liao, Mingzhen Li, Hailong Yang, Qingxiao Sun, Biao Sun, Jiwei Hao, Tianyu Feng, F. Yu, Shengdong Chen, Ye Tao, Zicheng Zhang, Zhongzhi Luan, D. Qian
{"title":"Exploiting Input Tensor Dynamics in Activation Checkpointing for Efficient Training on GPU","authors":"Jian Liao, Mingzhen Li, Hailong Yang, Qingxiao Sun, Biao Sun, Jiwei Hao, Tianyu Feng, F. Yu, Shengdong Chen, Ye Tao, Zicheng Zhang, Zhongzhi Luan, D. Qian","doi":"10.1109/IPDPS54959.2023.00025","DOIUrl":null,"url":null,"abstract":"Larger deep learning models usually lead to higher model quality, however with an ever-increasing GPU memory footprint. Although several tensor checkpointing techniques have been proposed to enable training under a restricted GPU memory budget, they fail to exploit the input tensor dynamics due to diverse datasets and subsequent data augmentation, and thus leave the training optimization on table. In this paper, we propose Mimose, an input-aware tensor checkpointing planner respecting the memory budget while enabling efficient model training on GPU. Mimose builds a lightweight but accurate prediction model of GPU memory usage online, without pre-analyzing the model. It generates a tensor checkpointing plan based on per-layer memory prediction and applies it to the training process on the fly. Our experiments show that Mimose achieves superior training throughput compared to state-of-the-art checkpointing frameworks under the same GPU memory budgets.","PeriodicalId":343684,"journal":{"name":"2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPS54959.2023.00025","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Larger deep learning models usually lead to higher model quality, but at the cost of an ever-increasing GPU memory footprint. Although several tensor checkpointing techniques have been proposed to enable training under a restricted GPU memory budget, they fail to exploit the input tensor dynamics that arise from diverse datasets and subsequent data augmentation, and thus leave training optimization opportunities on the table. In this paper, we propose Mimose, an input-aware tensor checkpointing planner that respects the memory budget while enabling efficient model training on GPU. Mimose builds a lightweight yet accurate prediction model of GPU memory usage online, without pre-analyzing the model. It generates a tensor checkpointing plan based on per-layer memory prediction and applies it to the training process on the fly. Our experiments show that Mimose achieves superior training throughput compared to state-of-the-art checkpointing frameworks under the same GPU memory budgets.
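
To make the idea concrete, below is a minimal PyTorch sketch of selective per-layer activation checkpointing driven by a per-batch memory estimate. It is not the authors' Mimose implementation: the `estimate_activation_bytes` heuristic and the greedy `make_plan` planner are hypothetical stand-ins for Mimose's online memory-prediction model and checkpointing planner, and the model is a toy MLP. The sketch only illustrates how a plan recomputed for each input batch can decide, layer by layer, whether to store activations or recompute them via `torch.utils.checkpoint`.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


def estimate_activation_bytes(batch: torch.Tensor, out_features: int) -> int:
    # Hypothetical per-layer estimate: one activation tensor of shape
    # (batch_size, out_features) in the batch's dtype.
    return batch.shape[0] * out_features * batch.element_size()


def make_plan(batch: torch.Tensor, blocks: nn.ModuleList,
              budget_bytes: int) -> list[bool]:
    # Greedy stand-in planner: keep activations resident until the
    # predicted footprint exceeds the budget, then mark the remaining
    # blocks for recomputation (checkpointing).
    plan, total = [], 0
    for block in blocks:
        cost = estimate_activation_bytes(batch, block[0].out_features)
        if total + cost > budget_bytes:
            plan.append(True)   # recompute this block in the backward pass
        else:
            plan.append(False)  # keep this block's activations in memory
            total += cost
    return plan


class PlannedMLP(nn.Module):
    def __init__(self, dim: int = 1024, depth: int = 8):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(depth)
        )

    def forward(self, x: torch.Tensor, plan: list[bool]) -> torch.Tensor:
        for block, ckpt in zip(self.blocks, plan):
            if ckpt:
                # Discard intermediate activations now; recompute them
                # during the backward pass.
                x = checkpoint(block, x, use_reentrant=False)
            else:
                x = block(x)
        return x


if __name__ == "__main__":
    model = PlannedMLP()
    batch = torch.randn(32, 1024, requires_grad=True)
    # Because input shapes vary across iterations (the "input tensor
    # dynamics"), the plan is rebuilt per batch rather than fixed offline.
    plan = make_plan(batch, model.blocks, budget_bytes=512 * 1024)
    model(batch, plan).sum().backward()
```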