Exploiting Input Tensor Dynamics in Activation Checkpointing for Efficient Training on GPU
Jian Liao, Mingzhen Li, Hailong Yang, Qingxiao Sun, Biao Sun, Jiwei Hao, Tianyu Feng, F. Yu, Shengdong Chen, Ye Tao, Zicheng Zhang, Zhongzhi Luan, D. Qian
2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2023. DOI: 10.1109/IPDPS54959.2023.00025
Larger deep learning models usually lead to higher model quality, but at the cost of an ever-increasing GPU memory footprint. Although several tensor checkpointing techniques have been proposed to enable training under a restricted GPU memory budget, they fail to exploit the input tensor dynamics that arise from diverse datasets and subsequent data augmentation, and thus leave training optimization opportunities on the table. In this paper, we propose Mimose, an input-aware tensor checkpointing planner that respects the memory budget while enabling efficient model training on GPU. Mimose builds a lightweight yet accurate prediction model of GPU memory usage online, without pre-analyzing the model. It generates a tensor checkpointing plan based on per-layer memory prediction and applies it to the training process on the fly. Our experiments show that Mimose achieves superior training throughput compared to state-of-the-art checkpointing frameworks under the same GPU memory budgets.
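For context, activation (tensor) checkpointing trades compute for memory: selected intermediate activations are discarded after the forward pass and recomputed during the backward pass, lowering peak GPU memory at the cost of extra forward computation. The sketch below is a minimal, generic illustration of this technique using PyTorch's torch.utils.checkpoint API on a toy model; it is not the Mimose planner, and the model structure, layer sizes, and the choice to checkpoint every block are hypothetical assumptions made only for illustration.

```python
# Minimal sketch of activation checkpointing in PyTorch (generic technique,
# not the Mimose planner). The model, layer sizes, and the decision to
# checkpoint every block are hypothetical; Mimose instead derives which
# tensors to checkpoint per input batch from an online per-layer memory
# prediction under a given memory budget.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class Block(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return self.net(x)


class CheckpointedModel(nn.Module):
    def __init__(self, dim: int = 1024, depth: int = 8, use_checkpoint: bool = True):
        super().__init__()
        self.blocks = nn.ModuleList(Block(dim) for _ in range(depth))
        self.use_checkpoint = use_checkpoint

    def forward(self, x):
        for block in self.blocks:
            if self.use_checkpoint and self.training:
                # Activations inside the block are not kept for backward;
                # they are recomputed when gradients are needed.
                x = checkpoint(block, x, use_reentrant=False)
            else:
                x = block(x)
        return x


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = CheckpointedModel().to(device)
    x = torch.randn(32, 1024, device=device, requires_grad=True)
    loss = model(x).sum()
    loss.backward()  # checkpointed blocks are re-executed here
```

A static plan like the one above fixes the checkpointed layers regardless of the input; the paper's point is that when input tensor shapes vary across batches (diverse datasets, data augmentation), an input-aware planner can checkpoint fewer tensors on small batches and more on large ones, keeping within the memory budget while avoiding unnecessary recomputation.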