Hardware-Aware Bayesian Neural Architecture Search of Quantized CNNs
Authors: Mathieu Perrin; William Guicquero; Bruno Paille; Gilles Sicard
Journal: IEEE Embedded Systems Letters, vol. 17, no. 1, pp. 42-45
Published: 2024-07-26
DOI: 10.1109/LES.2024.3434379
URL: https://ieeexplore.ieee.org/document/10611734/
Advances in neural architecture search (NAS) now provide crucial assistance in designing hardware-efficient neural networks (NNs). This letter presents NAS for resource-efficient, weight-quantized convolutional NNs (CNNs) under computational complexity constraints (model size and number of computations). Bayesian optimization is used to efficiently search for traceable CNN architectures within a continuous embedding space. This embedding is the latent space of a neural architecture autoencoder, regularized with a maximum mean discrepancy penalization and a convex latent predictor of parameters. On CIFAR-100, and without quantization, we obtain 75% test accuracy with fewer than 2.5M parameters and 600M operations. NAS experiments on STL-10 with 32-, 8-, and 4-bit weights outperform some high-end architectures while enabling a drastic model size reduction (from 6 Mb to 840 kb). These results demonstrate our method’s ability to discover lightweight and high-performing models, while showcasing the importance of quantization to improve the tradeoff between accuracy and model size.
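To make the link between weight bit width and model size concrete, the following minimal sketch computes the storage needed for a network's weights at the three bit widths named in the abstract. It is an illustration under stated assumptions, not the authors' accounting: the parameter count is hypothetical and not taken from the paper, "Mb" and "kb" are read as megabits and kilobits with decimal prefixes, and biases, activations, and any layers kept at full precision are ignored.

```python
def model_size_bits(num_params: int, weight_bits: int) -> int:
    """Return the storage needed for the weights alone, in bits."""
    if num_params < 0 or weight_bits <= 0:
        raise ValueError("num_params must be >= 0 and weight_bits must be > 0")
    return num_params * weight_bits


def format_size(bits: int) -> str:
    """Format a bit count with decimal prefixes (assumption: 1 Mb = 10**6 bits)."""
    if bits >= 1_000_000:
        return f"{bits / 1_000_000:.2f} Mb"
    return f"{bits / 1_000:.0f} kb"


# Hypothetical parameter count chosen for illustration; not a figure from the paper.
HYPOTHETICAL_PARAMS = 200_000

# Weight storage at each bit width mentioned in the abstract.
sizes = {bits: model_size_bits(HYPOTHETICAL_PARAMS, bits) for bits in (32, 8, 4)}

for bits, size in sizes.items():
    print(f"{bits:>2}-bit weights: {format_size(size)}")
```

For a fixed architecture, weight storage scales linearly with bit width, so moving from 32-bit to 4-bit weights cuts it by a factor of eight. The sizes reported in the abstract come from architectures found by the search, so they need not follow this fixed-architecture ratio exactly.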
About the journal:
IEEE Embedded Systems Letters (ESL) provides a forum for rapid dissemination of the latest technical advances in embedded systems and related areas in embedded software. The emphasis is on models, methods, and tools that ensure secure, correct, efficient, and robust design of embedded systems and their applications.