A 507 GMACs/J 256-Core Domain Adaptive Systolic-Array-Processor for Wireless Communication and Linear-Algebra Kernels in 12nm FINFET
Kuan-Yu Chen, Chi-Sheng Yang, Yu-Hsiu Sun, Chien-Wei Tseng, Morteza Fayazi, Xin He, Siying Feng, Y. Yue, T. Mudge, R. Dreslinski, Hun-Seok Kim, D. Blaauw
2022 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits), published 2022-06-12
DOI: 10.1109/vlsitechnologyandcir46769.2022.9830330
Citations: 3
Abstract
We present DAP (Domain Adaptive Processor), an adaptive systolic-array processor of 256 programmable cores in 12 nm CMOS for wireless communication workloads. DAP uses a globally homogeneous but locally heterogeneous architecture, decode-less reconfiguration instructions for data streaming, single-cycle data communication between functional units (FUs), and lightweight nested-loop control. We show how configuration flexibility and fast program loading allow a wide range of communication workloads to be mapped and swapped in under a microsecond, supporting continually evolving communication standards such as 5G. DAP achieves 507 GMACs/J and a peak performance of 264 GMACs.
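As a back-of-the-envelope check, the two headline figures can be restated in per-operation and per-core terms. This is simple unit arithmetic on the quoted numbers, not data reported by the authors. The per-core figure assumes the 264 GMACs peak is spread evenly across all 256 cores, which the abstract does not state. The efficiency and peak-throughput figures may come from different operating points (voltage and frequency), so dividing one by the other need not give the chip's power, and that division is deliberately not done here.

```python
# Headline figures quoted in the abstract.
EFFICIENCY_MACS_PER_J = 507e9  # 507 GMACs/J (energy efficiency)
PEAK_MACS_PER_S = 264e9        # 264 GMACs (peak throughput, MACs per second)
NUM_CORES = 256


def energy_per_mac_pj(macs_per_joule: float) -> float:
    """Energy per MAC in picojoules: the reciprocal of MACs per joule, scaled by 1e12."""
    return 1e12 / macs_per_joule


def per_core_gmacs(total_macs_per_s: float, cores: int) -> float:
    """Average peak throughput per core in GMAC/s.

    Assumption (not from the paper): peak throughput is divided evenly across cores.
    """
    return total_macs_per_s / cores / 1e9


if __name__ == "__main__":
    print(f"{energy_per_mac_pj(EFFICIENCY_MACS_PER_J):.2f} pJ/MAC")
    print(f"{per_core_gmacs(PEAK_MACS_PER_S, NUM_CORES):.2f} GMAC/s per core")
```

Running it prints roughly 1.97 pJ/MAC and 1.03 GMAC/s per core.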