{"title":"MemFlow: Memory-Aware Distributed Deep Learning","authors":"Neil Band","doi":"10.1145/3318464.3384416","DOIUrl":null,"url":null,"abstract":"As the number of layers and the amount of training data increases, the trend is to train deep neural networks in parallel across devices. In such scenarios, neural network training is increasingly bottlenecked by high memory requirements posed by intermediate results, or feature maps, that are produced during the forward pass and consumed during the backward pass. We recognize that the best-performing device parallelization configurations should consider memory usage in addition to the canonical metric of computation time. Towards this we introduce MemFlow, an optimization framework for distributed deep learning that performs joint optimization over memory usage and computation time when searching for a parallelization strategy. MemFlow consists of: (i) a task graph with memory usage estimates; (ii) a memory-aware execution simulator; and (iii) a Markov Chain Monte Carlo search algorithm that considers various degrees of recomputation i.e., discarding feature maps during the forward pass and recomputing them during the backward pass. Our experiments demonstrate that under memory constraints, MemFlow can readily locate valid and superior parallelization strategies unattainable with previous frameworks.","PeriodicalId":436122,"journal":{"name":"Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data","volume":"52 6","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3318464.3384416","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 9
Abstract
As the number of layers and the amount of training data increases, the trend is to train deep neural networks in parallel across devices. In such scenarios, neural network training is increasingly bottlenecked by high memory requirements posed by intermediate results, or feature maps, that are produced during the forward pass and consumed during the backward pass. We recognize that the best-performing device parallelization configurations should consider memory usage in addition to the canonical metric of computation time. Towards this we introduce MemFlow, an optimization framework for distributed deep learning that performs joint optimization over memory usage and computation time when searching for a parallelization strategy. MemFlow consists of: (i) a task graph with memory usage estimates; (ii) a memory-aware execution simulator; and (iii) a Markov Chain Monte Carlo search algorithm that considers various degrees of recomputation i.e., discarding feature maps during the forward pass and recomputing them during the backward pass. Our experiments demonstrate that under memory constraints, MemFlow can readily locate valid and superior parallelization strategies unattainable with previous frameworks.