{"title":"Performance Analysis of GPU-Based Convolutional Neural Networks","authors":"Xiaqing Li, Guangyan Zhang, H. Howie Huang, Zhufan Wang, Weimin Zheng","doi":"10.1109/ICPP.2016.15","DOIUrl":null,"url":null,"abstract":"As one of the most important deep learning models, convolutional neural networks (CNNs) have achieved great successes in a number of applications such as image classification, speech recognition and nature language understanding. Training CNNs on large data sets is computationally expensive, leading to a flurry of research and development of open-source parallel implementations on GPUs. However, few studies have been performed to evaluate the performance characteristics of those implementations. In this paper, we conduct a comprehensive comparison of these implementations over a wide range of parameter configurations, investigate potential performance bottlenecks and point out a number of opportunities for further optimization.","PeriodicalId":409991,"journal":{"name":"2016 45th International Conference on Parallel Processing (ICPP)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"109","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 45th International Conference on Parallel Processing (ICPP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICPP.2016.15","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 109
Abstract
As one of the most important deep learning models, convolutional neural networks (CNNs) have achieved great successes in a number of applications such as image classification, speech recognition and nature language understanding. Training CNNs on large data sets is computationally expensive, leading to a flurry of research and development of open-source parallel implementations on GPUs. However, few studies have been performed to evaluate the performance characteristics of those implementations. In this paper, we conduct a comprehensive comparison of these implementations over a wide range of parameter configurations, investigate potential performance bottlenecks and point out a number of opportunities for further optimization.