Layer Sensitivity Aware CNN Quantization for Resource Constrained Edge Devices
Alptekin Vardar, Li Zhang, Susu Hu, Saiyam Bherulal Jain, Shaown Mojumder, N. Laleni, A. Shrivastava, S. De, T. Kämpfe
2022 9th International Conference on Soft Computing & Machine Intelligence (ISCMI), published 2022-11-26
DOI: 10.1109/ISCMI56532.2022.10068464
Abstract
Edge computing is rapidly becoming the de facto method for deploying AI applications. However, latency, area, and energy consumption remain the main bottlenecks. To address this problem, a hardware-aware approach has to be adopted. Quantizing the activations vastly reduces the number of Multiply-Accumulate (MAC) operations, improving latency and energy consumption, while quantizing the weights decreases both the memory footprint and the number of MAC operations, which also helps reduce area. In this paper, it is demonstrated that by adapting an intra-layer mixed quantization training technique to both weights and activations, according to layer sensitivities, in a ResNet-20 architecture on the CIFAR-10 dataset, a memory reduction of 73% can be achieved compared even to its all-8-bit counterpart, while sacrificing only around 2.3% accuracy. Moreover, it is demonstrated that, depending on the needs of the application, the balance between accuracy and resource usage can easily be adjusted using different mixed-quantization schemes.
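To make the idea of layer-sensitivity-aware mixed quantization concrete, the following is a minimal sketch (an assumption for illustration, not the paper's implementation): uniform symmetric fake quantization with a straight-through estimator, wrapped in a Conv2d layer that accepts per-layer weight and activation bit-widths. The bit_plan mapping at the end is hypothetical; the paper assigns bit-widths based on measured layer sensitivities.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fake_quantize(x: torch.Tensor, n_bits: int) -> torch.Tensor:
    """Symmetric uniform fake quantization; gradients pass straight through."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = x.detach().abs().max().clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale
    return x + (q - x).detach()  # straight-through estimator

class QuantConv2d(nn.Conv2d):
    """Conv2d whose input activations and weights are fake-quantized per layer."""
    def __init__(self, *args, w_bits: int = 8, a_bits: int = 8, **kwargs):
        super().__init__(*args, **kwargs)
        self.w_bits, self.a_bits = w_bits, a_bits

    def forward(self, x):
        x = fake_quantize(x, self.a_bits)            # quantize activations
        w = fake_quantize(self.weight, self.w_bits)  # quantize weights
        return F.conv2d(x, w, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)

# Hypothetical sensitivity-driven plan: keep more sensitive early/late stages at
# higher precision, push less sensitive middle stages to fewer bits.
# Entries are (weight bits, activation bits).
bit_plan = {"stage1": (8, 8), "stage2": (4, 8), "stage3": (2, 4)}
```

With a scheme like this, the memory saving follows directly from the bit-width plan: a layer stored at 2 bits occupies a quarter of the memory of its 8-bit counterpart, so assigning low precision to the less sensitive (and typically larger) layers is what drives aggregate reductions of the kind reported in the paper.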