{"title":"利用 GEMM 和 Systolic 阵列设计和实现基于 NoC 的卷积架构","authors":"S. Ortega-Cisneros","doi":"10.1109/LES.2023.3321019","DOIUrl":null,"url":null,"abstract":"Neural networks have been used for a long time for image detection and recognition applications due to their ability and efficiency in complex problem solving. Several researchers have chosen to design and develop hardware accelerators for the convolution layer due to the large computational expense consumed by this layer. For that reason, a system that performs indirect GEMM convolution is implemented in a FPGA in this letter. Thus, the input data is segmented and distributed into acceleration modules in a parallel and distributed manner using the Network-on-Chip (NoC) paradigm, and a systolic array (SA) is implemented for the matrix multiplication operation as processing blocks within each NoC Node. Synthesis and performance results show that the implementation of this system presents better results compared to the state of the art in areas, such as acceleration factor, consumption of resources, latency, and operational frequency.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"16 1","pages":"49-52"},"PeriodicalIF":1.7000,"publicationDate":"2023-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Design and Implementation of an NoC-Based Convolution Architecture With GEMM and Systolic Arrays\",\"authors\":\"S. Ortega-Cisneros\",\"doi\":\"10.1109/LES.2023.3321019\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Neural networks have been used for a long time for image detection and recognition applications due to their ability and efficiency in complex problem solving. Several researchers have chosen to design and develop hardware accelerators for the convolution layer due to the large computational expense consumed by this layer. For that reason, a system that performs indirect GEMM convolution is implemented in a FPGA in this letter. Thus, the input data is segmented and distributed into acceleration modules in a parallel and distributed manner using the Network-on-Chip (NoC) paradigm, and a systolic array (SA) is implemented for the matrix multiplication operation as processing blocks within each NoC Node. Synthesis and performance results show that the implementation of this system presents better results compared to the state of the art in areas, such as acceleration factor, consumption of resources, latency, and operational frequency.\",\"PeriodicalId\":56143,\"journal\":{\"name\":\"IEEE Embedded Systems Letters\",\"volume\":\"16 1\",\"pages\":\"49-52\"},\"PeriodicalIF\":1.7000,\"publicationDate\":\"2023-09-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Embedded Systems Letters\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10268006/\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Embedded Systems Letters","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10268006/","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Design and Implementation of an NoC-Based Convolution Architecture With GEMM and Systolic Arrays
Neural networks have been used for a long time for image detection and recognition applications due to their ability and efficiency in complex problem solving. Several researchers have chosen to design and develop hardware accelerators for the convolution layer due to the large computational expense consumed by this layer. For that reason, a system that performs indirect GEMM convolution is implemented in a FPGA in this letter. Thus, the input data is segmented and distributed into acceleration modules in a parallel and distributed manner using the Network-on-Chip (NoC) paradigm, and a systolic array (SA) is implemented for the matrix multiplication operation as processing blocks within each NoC Node. Synthesis and performance results show that the implementation of this system presents better results compared to the state of the art in areas, such as acceleration factor, consumption of resources, latency, and operational frequency.
期刊介绍:
The IEEE Embedded Systems Letters (ESL), provides a forum for rapid dissemination of latest technical advances in embedded systems and related areas in embedded software. The emphasis is on models, methods, and tools that ensure secure, correct, efficient and robust design of embedded systems and their applications.