BiQGEMM: Matrix Multiplication with Lookup Table for Binary-Coding-Based Quantized DNNs

Yongkweon Jeon, Baeseong Park, Se Jung Kwon, Byeongwook Kim, Jeongin Yun, Dongsoo Lee

SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, 2020. DOI: 10.1109/SC41405.2020.00099
The number of parameters in deep neural networks (DNNs) is growing rapidly to support complicated tasks and to improve model accuracy. The amount of computation and the required memory footprint grow correspondingly. Quantization is an efficient way to address these concerns: it compresses DNNs so that computation is simplified and the required storage footprint is significantly reduced. Unfortunately, commercial CPUs and GPUs do not fully support quantization, because they allow only fixed-width data transfers (such as 32 bits). As a result, even when weights are quantized into a few bits by a non-uniform quantization scheme, CPUs and GPUs cannot fetch multiple quantized weights without wasting memory bandwidth. The practical success of quantization therefore hinges on an efficient computation engine, especially for matrix multiplication, the basic computational kernel of most DNNs. In this paper, we propose BiQGEMM, a novel matrix multiplication method dedicated to quantized DNNs. BiQGEMM accesses multiple quantized weights simultaneously with a single instruction. Moreover, because quantization restricts weights to a small set of values, many intermediate results are highly redundant; BiQGEMM pre-computes these results once, stores them in lookup tables, and reuses them, lowering the overall amount of computation. Our extensive experimental results show that BiQGEMM outperforms conventional schemes when DNNs are quantized.
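To make the lookup-table idea concrete, here is a minimal NumPy sketch of a matrix-vector product with binary-coded weights (W ≈ alpha * B, with B in {-1, +1}^(m x n)). It illustrates the pre-compute-and-reuse principle from the abstract rather than the paper's actual kernel: the function names (build_lut, biqgemv_sketch), the bit-packing layout, and the group width mu are our own illustrative choices, and a single binary basis matrix is assumed (a multi-bit binary-coding scheme would sum several such terms).

import numpy as np

def build_lut(x_sub):
    # Precompute all 2**mu signed sums of a length-mu activation slice.
    # Entry k holds sum_i s_i * x_sub[i], where s_i = +1 if bit i of k
    # is set and -1 otherwise.
    mu = len(x_sub)
    lut = np.empty(2 ** mu)
    for k in range(2 ** mu):
        signs = np.where([(k >> i) & 1 for i in range(mu)], 1.0, -1.0)
        lut[k] = signs @ x_sub
    return lut

def biqgemv_sketch(b_bits, alpha, x, mu=8):
    # Compute y ~= alpha * (B @ x), where B in {-1,+1}^(m, n) is stored
    # bit-packed: b_bits[r, g] is the mu-bit pattern of row r, group g
    # (bit set -> +1, bit clear -> -1), and n = n_groups * mu.
    m, n_groups = b_bits.shape
    y = np.zeros(m)
    for g in range(n_groups):
        # One small table per input slice, shared by all m output rows.
        lut = build_lut(x[g * mu:(g + 1) * mu])
        # A single table lookup per row replaces mu multiply-adds, and
        # loading one mu-bit index fetches mu quantized weights at once.
        y += lut[b_bits[:, g]]
    return alpha * y

# Reference check against an unpacked dense multiply.
rng = np.random.default_rng(0)
mu, m, n_groups = 8, 4, 3
b_bits = rng.integers(0, 2 ** mu, size=(m, n_groups))
x = rng.standard_normal(n_groups * mu)
B = np.array([[1.0 if (b_bits[r, g] >> i) & 1 else -1.0
               for g in range(n_groups) for i in range(mu)]
              for r in range(m)])
assert np.allclose(biqgemv_sketch(b_bits, 0.1, x, mu), 0.1 * (B @ x))

The payoff is that each length-mu input slice admits only 2**mu distinct signed sums, so one table built per slice is reused across all m output rows; the cost of building the tables is amortized over the typically much larger number of rows.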