OLM2: Automatic Optimal Strategy Generating for Large-Scale Model Training with Limited-Memory
Zhilin Yang, Yu Tang, Linbo Qiao, Xi Yang, Zhen Huang
2023 IEEE International Conference on Joint Cloud Computing (JCC), July 2023
DOI: 10.1109/JCC59055.2023.00006 (https://doi.org/10.1109/JCC59055.2023.00006)
Abstract
The scale of model parameters and the amount of training data are growing exponentially, and GPU memory demand grows accordingly with the number of model parameters. Recomputation and swapping are the two main memory optimization methods and have been studied extensively, including strategies that combine them. However, most existing approaches rely on heuristic search, which does not explore the complete solution space and therefore cannot guarantee optimal solutions. An optimal search strategy with tensor-level recomputation and swapping is desirable for large-scale model training. In this paper, we propose an optimal strategy search algorithm that combines tensor-level recomputation and swapping. Specifically, the memory swapping strategy is reformulated as an optimization problem: the memory constraints are expressed as a mixed integer program, which is solved to find the optimal memory optimization strategy. By leveraging the advantages of both recomputation and swapping, this approach minimizes computation cost without exceeding the available memory. Experimental results show that our method reduces memory requirements during training by about 60%. Furthermore, it reduces overall training time compared with existing algorithms; relative to Checkmate, it cuts computation cost per iteration by about 0.3–0.9%.
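To make the mixed-integer-programming idea concrete, the sketch below shows how per-tensor recomputation/swap decisions can be cast as a MIP and solved with an off-the-shelf solver (PuLP/CBC). This is a minimal illustration, not the paper's OLM2 formulation: the tensor names, sizes, costs, and the single peak-memory constraint are assumed values, whereas the actual method models tensor lifetimes and memory over the full training schedule.

```python
# Minimal illustrative sketch: choose, for each activation tensor, whether to
# keep it resident in GPU memory, recompute it, or swap it to host memory,
# minimizing extra time subject to a memory budget. All numbers are hypothetical.
import pulp

# Hypothetical tensors: name -> (size in MB, recompute cost in ms, swap cost in ms)
tensors = {
    "act1": (512, 4.0, 9.0),
    "act2": (1024, 6.5, 18.0),
    "act3": (256, 2.0, 4.5),
    "act4": (2048, 12.0, 36.0),
}
MEMORY_BUDGET_MB = 2048  # assumed GPU memory left for activations

prob = pulp.LpProblem("tensor_recompute_swap", pulp.LpMinimize)

# Binary decision variables: exactly one policy per tensor.
keep = pulp.LpVariable.dicts("keep", tensors, cat=pulp.LpBinary)
recompute = pulp.LpVariable.dicts("recompute", tensors, cat=pulp.LpBinary)
swap = pulp.LpVariable.dicts("swap", tensors, cat=pulp.LpBinary)
for t in tensors:
    prob += keep[t] + recompute[t] + swap[t] == 1

# Objective: minimize extra time paid for recomputation and swapping.
prob += pulp.lpSum(
    tensors[t][1] * recompute[t] + tensors[t][2] * swap[t] for t in tensors
)

# Peak-memory constraint: only tensors kept resident occupy GPU memory.
prob += pulp.lpSum(tensors[t][0] * keep[t] for t in tensors) <= MEMORY_BUDGET_MB

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for t in tensors:
    if keep[t].value() == 1:
        policy = "keep"
    elif recompute[t].value() == 1:
        policy = "recompute"
    else:
        policy = "swap"
    print(f"{t}: {policy}")
print("extra time (ms):", pulp.value(prob.objective))
```

Because every decision is an explicit binary variable, the solver searches the complete solution space and returns a provably optimal assignment for this toy model, in contrast to heuristic strategies that may miss the optimum.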