{"title":"Accelerating Data-Parallel Neural Network Training with Weighted-Averaging Reparameterisation","authors":"Sterling Ramroach, A. Joshi","doi":"10.1142/S0129626421500092","DOIUrl":null,"url":null,"abstract":"Recent advances in artificial intelligence has shown a direct correlation between the performance of a network and the number of hidden layers within the network. The Compute Unified Device Architecture (CUDA) framework facilitates the movement of heavy computation from the CPU to the graphics processing unit (GPU) and is used to accelerate the training of neural networks. In this paper, we consider the problem of data-parallel neural network training. We compare the performance of training the same neural network on the GPU with and without data parallelism. When data parallelism is used, we compare with both the conventional averaging of coefficients and our proposed method. We set out to show that not all sub-networks are equal and thus, should not be treated as equals when normalising weight vectors. The proposed method achieved state of the art accuracy faster than conventional training along with better classification performance in some cases.","PeriodicalId":422436,"journal":{"name":"Parallel Process. Lett.","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Parallel Process. Lett.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1142/S0129626421500092","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Recent advances in artificial intelligence have shown a direct correlation between the performance of a network and the number of hidden layers within it. The Compute Unified Device Architecture (CUDA) framework facilitates moving heavy computation from the CPU to the graphics processing unit (GPU) and is used to accelerate the training of neural networks. In this paper, we consider the problem of data-parallel neural network training. We compare the performance of training the same neural network on the GPU with and without data parallelism. When data parallelism is used, we compare the conventional averaging of coefficients with our proposed method. We set out to show that not all sub-networks are equal and thus should not be treated as equals when normalising weight vectors. The proposed method reached state-of-the-art accuracy faster than conventional training and, in some cases, achieved better classification performance.
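To make the contrast concrete, the sketch below compares plain averaging of replica coefficients with a weighted average in which sub-networks contribute unequally. This is a minimal illustration only: the scoring rule (per-replica validation accuracy) and the function names are assumptions introduced here, since the abstract does not specify the paper's exact weighting scheme.

```python
# Minimal sketch of coefficient averaging in data-parallel training.
# Conventional averaging treats every sub-network (data-parallel replica)
# equally; the weighted variant scores each replica and averages accordingly.
# The use of validation accuracy as the score is a hypothetical choice for
# illustration, not necessarily the weighting used in the paper.
import numpy as np

def average_coefficients(replica_weights):
    """Conventional data-parallel update: plain mean over replicas."""
    return np.mean(replica_weights, axis=0)

def weighted_average_coefficients(replica_weights, replica_scores):
    """Weighted update: replicas with higher scores contribute more."""
    scores = np.asarray(replica_scores, dtype=np.float64)
    weights = scores / scores.sum()  # normalise to a convex combination
    # Sum over the replica axis: result has the shape of one weight vector.
    return np.tensordot(weights, replica_weights, axes=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Four replicas of a flattened parameter vector of length 10.
    replicas = rng.normal(size=(4, 10))
    # Hypothetical per-replica validation accuracies used as scores.
    scores = [0.91, 0.88, 0.95, 0.80]
    print(average_coefficients(replicas))
    print(weighted_average_coefficients(replicas, scores))
```

Both functions return a single parameter vector that would be broadcast back to all replicas after a synchronisation step; the weighted variant simply biases that vector toward the better-performing sub-networks.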