Modeling and optimizing PE utilization rate for systolic array based CNN accelerators
Minhui Hu, Jianhua Fan, Yongyang Hu, Rui Xu, Yang Guo
International Conference on Electronic Technology and Information Science, vol. 12715, published 20 June 2023. DOI: 10.1117/12.2682498 (https://doi.org/10.1117/12.2682498)
Owing to their efficiency, energy savings, and abundant data reuse, systolic arrays have become a popular choice for Convolutional Neural Network (CNN) accelerators. The dataflow of a systolic array defines its computation mapping strategy and memory access pattern, and it is one of the most important design choices for an accelerator. Most conventional accelerator designs choose a single dataflow and optimize around it, which can lower the Processing Element (PE) utilization rate and waste computing resources and energy. This work introduces a self-paced method to alleviate this problem. We analyze and quantify the PE utilization rate of the three basic dataflows and build a model, PEU-sim, to explore workload-oriented flexible dataflow. Experiments show that combining the three dataflows raises the PE utilization rate by more than 10% for most neural networks, with the largest gain, 12.4%, for MobileNet.
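To make dataflow-dependent PE utilization concrete, here is a minimal Python sketch. It is not PEU-sim. It assumes that the three basic dataflows are weight-stationary (WS), output-stationary (OS), and input-stationary (IS), that each dense convolution is lowered to a GEMM and mapped onto the array in the style of SCALE-Sim-like analytical models, and that utilization is lost only to tiling ("folding") when a GEMM dimension does not fill the array. Pipeline fill/drain and memory stalls are ignored. The layer shapes, the 32×32 array size, and all function names are hypothetical.

```python
from dataclasses import dataclass

# Assumed set of "basic" dataflows: weight-, output-, input-stationary.
DATAFLOWS = ("WS", "OS", "IS")


@dataclass(frozen=True)
class ConvLayer:
    """Hypothetical descriptor for a dense (non-grouped) convolution, batch size 1."""
    name: str
    out_h: int
    out_w: int
    out_ch: int  # number of filters
    k_h: int
    k_w: int
    in_ch: int


def ceil_div(x, y):
    """Integer ceiling division."""
    return -(-x // y)


def gemm_dims(layer):
    """Lower the convolution to GEMM sizes (Sr, Sc, T)."""
    sr = layer.out_h * layer.out_w           # output pixels
    sc = layer.out_ch                        # filters
    t = layer.k_h * layer.k_w * layer.in_ch  # MACs per output value
    return sr, sc, t


def mapping(dataflow, sr, sc, t):
    """(spatial rows, spatial cols, temporal steps) under an assumed SCALE-Sim-style mapping."""
    if dataflow == "OS":
        return sr, sc, t
    if dataflow == "WS":
        return t, sc, sr
    if dataflow == "IS":
        return t, sr, sc
    raise ValueError(f"unknown dataflow: {dataflow}")


def cycles(layer, dataflow, rows, cols):
    """Compute cycles = folds x temporal steps (no fill/drain, no memory stalls)."""
    a, b, temporal = mapping(dataflow, *gemm_dims(layer))
    folds = ceil_div(a, rows) * ceil_div(b, cols)
    return folds * temporal


def best_dataflow(layer, rows, cols):
    """Workload-oriented choice: the dataflow needing the fewest cycles for this layer."""
    return min(DATAFLOWS, key=lambda d: cycles(layer, d, rows, cols))


def network_utilization(layers, choose, rows, cols):
    """PE utilization = useful MACs / (number of PEs x total cycles)."""
    macs = sum(sr * sc * t for sr, sc, t in map(gemm_dims, layers))
    total = sum(cycles(l, choose(l), rows, cols) for l in layers)
    return macs / (rows * cols * total)


# Illustrative layer shapes (made up, not taken from the paper).
LAYERS = [
    ConvLayer("conv_a", 14, 14, 96, 3, 3, 48),
    ConvLayer("fc_like", 1, 1, 1000, 1, 1, 1024),
    ConvLayer("conv_b", 112, 112, 32, 3, 3, 3),
]

if __name__ == "__main__":
    R = C = 32  # assumed 32x32 systolic array
    for df in DATAFLOWS:
        u = network_utilization(LAYERS, lambda l, df=df: df, R, C)
        print(f"{df} only: {u:.3f}")
    flex = network_utilization(LAYERS, lambda l: best_dataflow(l, R, C), R, C)
    print(f"per-layer best: {flex:.3f}")
```

Because the per-layer choice never needs more cycles than any fixed dataflow, the flexible schedule's utilization in this model is at least that of the best single dataflow. In the toy example, the 1×1 `fc_like` layer leaves most of the array idle under OS or IS, while `conv_b`, with only three input channels, fills the array best under OS.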