{"title":"利用稀疏cnn的高能效推理加速器","authors":"Ning Li","doi":"10.1145/3404555.3404626","DOIUrl":null,"url":null,"abstract":"The significantly growing computation and memory demands have become a bottleneck for the application of convolutional neural networks (CNNs). Model compression is an efficient method to accelerate CNNs. However, the commonly designed architectures are not suitable for compressed models and waste large computational resources on zero operands. In this work, we propose a flexible CNNs inference accelerator on FPGA utilizing uniform sparsity introduced by pattern pruning to achieve high performance. Our accelerator architecture exploits different input & output parallelism for sparse computation to maximize the utilization of computing arrays. A dynamically adjustable mechanism is designed to deal with the unbalanced workload. What's more, a novel data buffering structure with slightly rearranged sequences is applied to address the challenge of access conflict. The experiments show that our accelerator can achieve 316.4 GOP/s ~ 343.5 GOP/s for VGG-16 and ResNet-50.","PeriodicalId":220526,"journal":{"name":"Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence","volume":"50 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"A High Energy-Efficiency Inference Accelerator Exploiting Sparse CNNs\",\"authors\":\"Ning Li\",\"doi\":\"10.1145/3404555.3404626\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The significantly growing computation and memory demands have become a bottleneck for the application of convolutional neural networks (CNNs). Model compression is an efficient method to accelerate CNNs. However, the commonly designed architectures are not suitable for compressed models and waste large computational resources on zero operands. In this work, we propose a flexible CNNs inference accelerator on FPGA utilizing uniform sparsity introduced by pattern pruning to achieve high performance. Our accelerator architecture exploits different input & output parallelism for sparse computation to maximize the utilization of computing arrays. A dynamically adjustable mechanism is designed to deal with the unbalanced workload. What's more, a novel data buffering structure with slightly rearranged sequences is applied to address the challenge of access conflict. 
The experiments show that our accelerator can achieve 316.4 GOP/s ~ 343.5 GOP/s for VGG-16 and ResNet-50.\",\"PeriodicalId\":220526,\"journal\":{\"name\":\"Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence\",\"volume\":\"50 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-04-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3404555.3404626\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3404555.3404626","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
The rapidly growing computation and memory demands of convolutional neural networks (CNNs) have become a bottleneck for their application. Model compression is an effective way to accelerate CNNs. However, commonly used accelerator architectures are not well suited to compressed models and waste substantial computational resources on zero operands. In this work, we propose a flexible CNN inference accelerator on FPGA that exploits the uniform sparsity introduced by pattern pruning to achieve high performance. Our accelerator architecture applies different degrees of input and output parallelism to sparse computation in order to maximize the utilization of the computing arrays. A dynamically adjustable mechanism is designed to handle unbalanced workloads. Moreover, a novel data buffering structure with slightly rearranged sequences is applied to address the challenge of access conflicts. Experiments show that our accelerator achieves 316.4 GOP/s to 343.5 GOP/s on VGG-16 and ResNet-50.
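The abstract cites pattern pruning as the source of the uniform sparsity the accelerator exploits, but does not spell out the procedure. As a rough illustration only, the sketch below shows one simplified variant of pattern pruning in which every 3x3 kernel of a layer keeps the same fixed set of nonzero positions, chosen by average weight magnitude; the function name `pattern_prune`, the magnitude-based selection, and the choice of four kept positions per kernel are assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def pattern_prune(weights, pattern_size=4):
    """Minimal sketch of pattern pruning (illustrative assumption, not the
    authors' exact method).

    weights: array of shape (out_ch, in_ch, 3, 3).
    Keeps `pattern_size` positions per 3x3 kernel; the kept positions are the
    ones with the largest mean absolute weight across all kernels, so every
    kernel in the layer shares one sparsity pattern (uniform sparsity).
    """
    oc, ic, kh, kw = weights.shape
    flat = weights.reshape(oc * ic, kh * kw)
    # Rank the kh*kw kernel positions by mean absolute weight over all kernels.
    importance = np.abs(flat).mean(axis=0)
    keep = np.argsort(importance)[-pattern_size:]  # indices of positions to keep
    mask = np.zeros(kh * kw, dtype=weights.dtype)
    mask[keep] = 1.0
    pruned = (flat * mask).reshape(oc, ic, kh, kw)
    return pruned, mask.reshape(kh, kw)

# Example: prune a random VGG-like layer to 4 nonzeros per 3x3 kernel (~56% sparsity).
w = np.random.randn(64, 64, 3, 3).astype(np.float32)
pruned_w, pattern = pattern_prune(w)
print("kept pattern:\n", pattern)
```

Because every kernel has its nonzeros in the same positions under this scheme, multiply-accumulate operations can be scheduled with a regular, predictable access pattern, which is consistent with the uniform sparsity that the proposed accelerator is said to exploit for high computing-array utilization.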