OLM2: Automatic Optimal Strategy Generating for Large-Scale Model Training with Limited-Memory

Zhilin Yang, Yu Tang, Linbo Qiao, Xi Yang, Zhen Huang
{"title":"OLM2: Automatic Optimal Strategy Generating for Large-Scale Model Training with Limited-Memory","authors":"Zhilin Yang, Yu Tang, Linbo Qiao, Xi Yang, Zhen Huang","doi":"10.1109/JCC59055.2023.00006","DOIUrl":null,"url":null,"abstract":"The scale of model parameters and the amount of training data is exponentially increasing. It requires more GPU memory with the exponential increasement of model parameters. Recomputation and swapping are two main memory optimization methods that have been extensively studied, and there are also optimization strategies that combine the two methods. However, most of them are based on heuristic search strategies, which do not explore the complete solution space and can’t guarantee the optimality of the solution results. An optimal search strategy with tensor-level recomputation and swapping is expected in large-scale model training. In this paper, we propose an optimal strategy searching algorithm combining tensor-based recomputation and swapping. Specifically, the memory swapping strategy is reformulated as an optimization problem, which converts the memory constraints into mixed integer programming, to find the optimal memory optimization strategy. By leveraging the advantages of both recomputation and swapping, this approach minimizes computation consumption without exceeding the available memory limitation. Experimental results show that our method exhibits about 60% reduction in memory requirements during the training process. Furthermore, our method can reduce the overall training time beyond the existing algorithms. Compared to Checkmate, our approach achieves about 0.3–0.9% reduction in computation cost per iteration.","PeriodicalId":117254,"journal":{"name":"2023 IEEE International Conference on Joint Cloud Computing (JCC)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE International Conference on Joint Cloud Computing (JCC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/JCC59055.2023.00006","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

The scale of model parameters and the volume of training data are growing exponentially, and GPU memory demand grows with them. Recomputation and swapping are the two main memory optimization methods and have been studied extensively, including strategies that combine them. However, most existing approaches rely on heuristic search: they do not explore the complete solution space and cannot guarantee the optimality of the resulting strategy. Large-scale model training therefore calls for an optimal search strategy with tensor-level recomputation and swapping. In this paper, we propose an optimal strategy search algorithm that combines tensor-based recomputation and swapping. Specifically, the memory swapping strategy is reformulated as an optimization problem: the memory constraints are encoded as a mixed integer program whose solution yields the optimal memory optimization strategy. By leveraging the advantages of both recomputation and swapping, this approach minimizes computation cost without exceeding the available memory. Experimental results show that our method reduces memory requirements during training by about 60%. Furthermore, our method reduces overall training time compared with existing algorithms. Compared to Checkmate, our approach achieves about a 0.3–0.9% reduction in computation cost per iteration.
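To make the reformulation concrete, the sketch below shows how per-tensor recompute/swap decisions can be encoded as a mixed integer program with a memory-budget constraint. It is a minimal illustration written against the PuLP solver interface; the tensor names, sizes, cost numbers, and the simple keep/recompute/swap cost model are assumptions made for the example, not the paper's actual formulation, which works at tensor level over the full execution schedule.

```python
# Minimal illustrative MILP: choose, per tensor, whether to keep it resident in
# GPU memory, recompute it in the backward pass, or swap it to host memory.
# All sizes and costs below are made-up placeholders for illustration only.
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary, PULP_CBC_CMD

tensors = ["t0", "t1", "t2", "t3"]
size_mb = {"t0": 512, "t1": 256, "t2": 1024, "t3": 128}        # activation sizes
recompute_cost = {"t0": 3.0, "t1": 1.5, "t2": 6.0, "t3": 0.8}  # ms to recompute
swap_cost = {"t0": 2.0, "t1": 1.0, "t2": 4.0, "t3": 0.5}       # ms of exposed transfer
memory_budget_mb = 1024                                        # GPU memory limit

prob = LpProblem("recompute_swap_selection", LpMinimize)

# One binary decision variable per (tensor, action); exactly one action per tensor.
keep = {t: LpVariable(f"keep_{t}", cat=LpBinary) for t in tensors}
recomp = {t: LpVariable(f"recomp_{t}", cat=LpBinary) for t in tensors}
swap = {t: LpVariable(f"swap_{t}", cat=LpBinary) for t in tensors}
for t in tensors:
    prob += keep[t] + recomp[t] + swap[t] == 1

# Objective: minimize extra time paid for recomputation and exposed swap transfers.
prob += lpSum(recompute_cost[t] * recomp[t] + swap_cost[t] * swap[t] for t in tensors)

# Memory constraint: only tensors kept resident count against the GPU budget.
prob += lpSum(size_mb[t] * keep[t] for t in tensors) <= memory_budget_mb

prob.solve(PULP_CBC_CMD(msg=False))
for t in tensors:
    action = "keep" if keep[t].value() == 1 else ("recompute" if recomp[t].value() == 1 else "swap")
    print(t, "->", action)
```

The general structure carries over to the full problem: binary decision variables per tensor, a linear objective over recomputation and transfer overheads, and linear memory constraints, which is what lets an off-the-shelf MIP solver return a provably optimal strategy rather than a heuristic one.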