ConvBench: A Comprehensive Benchmark for 2D Convolution Primitive Evaluation
Lucas Alvarenga, Victor Ferrari, Rafael Souza, Marcio Pereira, Guido Araujo
arXiv - CS - Performance, 2024-07-15. DOI: arxiv-2407.10730
Abstract
Convolution is a compute-intensive operation at the heart of Convolutional Neural Networks (CNNs). It has motivated the development of many high-performance algorithms, such as Im2col-GEMM, Winograd, and Direct Convolution. However, comparing different convolution algorithms is an error-prone task, as each requires specific data layouts and system resources, and failing to account for these requirements can introduce unwanted time penalties. Thus, considering all processing steps within a convolution algorithm is essential to comprehensively evaluate and fairly compare performance. Furthermore, most existing convolution benchmarks adopt ad-hoc testing suites with limited coverage and hand-crafted operations. This paper proposes ConvBench, a primitive-level benchmark for evaluating and comparing convolution algorithms. It assesses 9243 convolution operations derived from 1097 real-world deep learning models and produces performance and execution-breakdown graphs for detailed evaluation. ConvBench's capabilities are demonstrated on the Sliced Convolution (SConv) algorithm: in the experiments, SConv outperformed Im2col-GEMM in 93.6% of the convolutions. Examining the remaining 6.4% of underperforming convolutions with ConvBench uncovered an average slowdown of 79.5% in SConv's packing step. This analysis highlights a potential optimization target for SConv and opens new paths for convolution designers to improve their algorithms.
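
To make the abstract's point about per-step accounting concrete, below is a minimal, illustrative Python/NumPy sketch of an Im2col-GEMM convolution that times its layout-transformation (im2col) and GEMM steps separately, in the spirit of ConvBench's execution-breakdown graphs. This is not ConvBench's or SConv's actual code; the function names, shapes, and timing scheme are assumptions made for illustration only.

```python
# Illustrative sketch: Im2col-GEMM convolution with a per-step time breakdown.
# All names and shapes are hypothetical, not taken from ConvBench or SConv.
import time
import numpy as np

def im2col(x, kh, kw, stride=1):
    """Unfold a (C, H, W) input into a (C*kh*kw, out_h*out_w) patch matrix."""
    c, h, w = x.shape
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    cols = np.empty((c * kh * kw, out_h * out_w), dtype=x.dtype)
    row = 0
    for ci in range(c):
        for i in range(kh):
            for j in range(kw):
                # Strided window covering every output position for this tap.
                patch = x[ci,
                          i:i + stride * out_h:stride,
                          j:j + stride * out_w:stride]
                cols[row] = patch.reshape(-1)
                row += 1
    return cols, out_h, out_w

def conv2d_im2col(x, weights, stride=1):
    """weights: (K, C, kh, kw). Returns (K, out_h, out_w) and per-step times."""
    k, c, kh, kw = weights.shape
    t0 = time.perf_counter()
    cols, out_h, out_w = im2col(x, kh, kw, stride)   # layout/packing step
    t1 = time.perf_counter()
    out = weights.reshape(k, -1) @ cols              # GEMM step
    t2 = time.perf_counter()
    breakdown = {"im2col": t1 - t0, "gemm": t2 - t1}
    return out.reshape(k, out_h, out_w), breakdown

# Example convolution layer (sizes chosen arbitrarily for illustration).
x = np.random.rand(64, 56, 56).astype(np.float32)
w = np.random.rand(128, 64, 3, 3).astype(np.float32)
y, steps = conv2d_im2col(x, w)
print(y.shape, steps)  # (128, 54, 54) {'im2col': ..., 'gemm': ...}
```

Timing the im2col transformation and the GEMM as separate steps, rather than the convolution as an opaque whole, is what exposes cases like the one the paper reports, where a single stage (here, packing) dominates the slowdown.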