{"title":"Implementation of a Round Robin Processing Element for Deep Learning Accelerator","authors":"Eunchong Lee, Yongseok Lee, Sang-Seol Lee, Byoung-Ho Choi","doi":"10.1109/ISOCC50952.2020.9333012","DOIUrl":null,"url":null,"abstract":"The performance of deep learning acceleration hardware is greatly affected by its Processing Elements (PEs). To apply deep learning accelerators to mobile devices, an optimized PE must be designed as an ASIC. To improve PE performance, we focused on parallelization and on minimizing external memory access. As a result, we propose a deep learning accelerator architecture consisting of 512 PEs operating in parallel and present the results of its FPGA implementation.","PeriodicalId":270577,"journal":{"name":"2020 International SoC Design Conference (ISOCC)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 International SoC Design Conference (ISOCC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISOCC50952.2020.9333012","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Implementation of a Round Robin Processing Element for Deep Learning Accelerator
The performance of deep learning acceleration hardware is greatly affected by its Processing Elements (PEs). To apply deep learning accelerators to mobile devices, an optimized PE must be designed as an ASIC. To improve PE performance, we focused on parallelization and on minimizing external memory access. As a result, we propose a deep learning accelerator architecture consisting of 512 PEs operating in parallel and present the results of its FPGA implementation.
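The abstract does not specify how work is distributed across the 512 PEs. As a rough illustration of the general idea only, the following Python sketch dispatches independent multiply-accumulate (MAC) tasks to PEs in round-robin order. The function names (`round_robin_assign`, `pe_mac`, `run_layer`) and the one-output-per-task granularity are assumptions made for this sketch, not details from the paper, and the software loop models the mapping rather than the hardware timing.

```python
from collections import defaultdict

NUM_PES = 512  # PE count stated in the abstract


def round_robin_assign(num_tasks, num_pes=NUM_PES):
    """Map each task index to a PE index in round-robin order (assumed scheme)."""
    return [task % num_pes for task in range(num_tasks)]


def pe_mac(inputs, weights):
    """Multiply-accumulate that a single PE would perform for one output value."""
    acc = 0
    for x, w in zip(inputs, weights):
        acc += x * w
    return acc


def run_layer(inputs, weight_rows, num_pes=NUM_PES):
    """Compute one output per weight row, assigning rows to PEs round-robin.

    Returns the outputs and a dict of how many tasks each PE handled.
    """
    assignment = round_robin_assign(len(weight_rows), num_pes)
    outputs = [0] * len(weight_rows)
    load = defaultdict(int)  # tasks handled per PE
    for task, pe in enumerate(assignment):
        outputs[task] = pe_mac(inputs, weight_rows[task])
        load[pe] += 1
    return outputs, dict(load)
```

Under this assumed scheme, task `i` lands on PE `i % 512`, so the per-PE load differs by at most one task regardless of layer size.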