{"title":"The Brain Memory Architecture HW/SW Co-Design Platform with Adaptive CNN Algorithm","authors":"J. Chiu, Yu-Yi Wang, W. Lin","doi":"10.1109/ICS51289.2020.00047","DOIUrl":null,"url":null,"abstract":"As the demand for machine learning, edge computing, and the Internet of Things technology increases, computing efficiency and energy consumption has become an important basis for computing choices. Although the graphics processing unit(GPU) has a high degree of parallel computing capability, its energy consumption is large, and the data transmission is limited by the system bus bandwidth. Therefore, our laboratory previously proposed the Brain Memory Architecture prototype architecture, which integrates FPGA and memory as a computing architecture, which has the advantages of high-efficiency, and low-power computing and does not require data exchange through the system bus. Based on this prototype architecture, this paper constructs the Brain Memory Architecture HW/SW Co-Design Platform (BMCD platform) to provide a good user interface so that users can easily build a hardware and software collaborative design computing environment. Through the library provided by the platform to establish the data transmission and calculation between acceleration hardware and memory to solve the bandwidth limitation of the traditional system bus. In this platform, the AXI4-stream interconnect core is provided as a standard interface for data handshaking with acceleration hardware, which reduces user design complexity and maintains the scalability of connection with other computing IP cores. In platform evaluation, design and adaptive CNN algorithm for hardware and software design platform, provide data quantization methods to reduce data bits to reduce the required data bandwidth and storage space and propose a dynamic adjustment algorithm for integer and decimal ratios to correct the accuracy and design problems that may be caused by data quantization. With this adaptive CNN algorithm architecture and BMCD platform to construct a rapid data transmission. This paper finally analyzes the comparison of the weight transmission time of different CNN models with the CPU and the GPU. The method proposed in this paper can reach about 20 times faster than the CPU and about 10 times faster than the GPU.","PeriodicalId":176275,"journal":{"name":"2020 International Computer Symposium (ICS)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 International Computer Symposium (ICS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICS51289.2020.00047","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
As the demand for machine learning, edge computing, and Internet of Things technology increases, computing efficiency and energy consumption have become important criteria when choosing a computing platform. Although the graphics processing unit (GPU) offers a high degree of parallelism, its energy consumption is large and its data transmission is limited by the system bus bandwidth. Our laboratory therefore previously proposed the Brain Memory Architecture prototype, which integrates an FPGA and memory into a single computing architecture; it offers high-efficiency, low-power computing and does not require data exchange over the system bus. Building on this prototype, this paper constructs the Brain Memory Architecture HW/SW Co-Design Platform (BMCD platform), which provides a user-friendly interface so that users can easily set up a hardware/software co-design computing environment. The library provided by the platform establishes data transmission and computation between the acceleration hardware and memory, overcoming the bandwidth limitation of the traditional system bus. The platform provides an AXI4-Stream interconnect core as a standard interface for data handshaking with the acceleration hardware, which reduces user design complexity and preserves scalability when connecting to other computing IP cores. For the platform evaluation, we design an adaptive CNN algorithm for the co-design platform: data quantization methods reduce the number of data bits, lowering the required data bandwidth and storage space, and a dynamic adjustment algorithm for the integer-to-fraction bit ratio corrects the accuracy and design problems that quantization may introduce. Combining this adaptive CNN algorithm with the BMCD platform yields rapid data transmission. Finally, this paper compares the weight transmission time of different CNN models on the proposed platform against a CPU and a GPU: the proposed method is about 20 times faster than the CPU and about 10 times faster than the GPU.
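The abstract does not spell out the quantization scheme, so the following is a minimal Python sketch of one way a fixed-point quantization with a dynamically chosen integer/fraction bit split could work. The function names (`choose_format`, `quantize`, `dequantize`) and the 8-bit word width are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def choose_format(values, total_bits=8):
    """Choose an integer/fraction bit split for a signed fixed-point word.

    Enough integer bits are allocated to cover the largest-magnitude value;
    the remaining bits (after the sign bit) are used for the fraction.
    This mirrors the idea of dynamically adjusting the integer-to-fraction
    ratio, but is only a sketch of one possible policy.
    """
    max_mag = float(np.max(np.abs(values)))
    int_bits = max(0, int(np.ceil(np.log2(max_mag)))) if max_mag > 1.0 else 0
    frac_bits = max(0, total_bits - 1 - int_bits)  # reserve 1 bit for the sign
    return int_bits, frac_bits

def quantize(values, frac_bits, total_bits=8):
    """Round float values to signed fixed-point integers and clip to range."""
    scale = 2.0 ** frac_bits
    lo, hi = -(2 ** (total_bits - 1)), 2 ** (total_bits - 1) - 1
    q = np.clip(np.round(values * scale), lo, hi).astype(np.int32)
    return q, scale

def dequantize(q, scale):
    """Map fixed-point integers back to floats to check quantization error."""
    return q.astype(np.float32) / scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = rng.normal(scale=0.5, size=1024).astype(np.float32)
    int_bits, frac_bits = choose_format(weights, total_bits=8)
    q, scale = quantize(weights, frac_bits, total_bits=8)
    err = float(np.max(np.abs(dequantize(q, scale) - weights)))
    print(f"int_bits={int_bits} frac_bits={frac_bits} max_abs_error={err:.4f}")
```

Reducing each weight from 32-bit floating point to an 8-bit fixed-point word, as in this sketch, cuts the data that must be moved to the acceleration hardware by roughly a factor of four, which is the bandwidth and storage motivation the abstract describes.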