Alex Carsello, Kathleen Feng, Taeyoung Kong, Kalhan Koul, Qiaoyi Liu, J. Melchert, Gedeon Nyengele, Maxwell Strange, Kecheng Zhang, Ankita Nayak, Jeff Setter, James J. Thomas, Kavya Sreedhar, Po-Han Chen, Nikhil Bhagdikar, Zachary Myers, Brandon D'Agostino, Pranil Joshi, S. Richardson, Rick Bahr, Christopher Torng, M. Horowitz, Priyanka Raina
2022 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits), published 2022-06-12. DOI: https://doi.org/10.1109/vlsitechnologyandcir46769.2022.9830509
Amber: A 367 GOPS, 538 GOPS/W 16nm SoC with a Coarse-Grained Reconfigurable Array for Flexible Acceleration of Dense Linear Algebra
Amber is a system-on-chip (SoC) with a coarse-grained reconfigurable array (CGRA) for acceleration of dense linear algebra applications such as machine learning (ML), image processing, and computer vision. It achieves a peak energy efficiency of 538.0 INT16 GOPS/W and 483.3 BFloat16 GFLOPS/W. We maximize CGRA utilization and minimize reconfigurability overhead through (1) dynamic partial reconfiguration of the CGRA that enables higher resource utilization by allowing multiple applications to run at once, (2) efficient streaming memory controllers supporting affine access patterns, and (3) low-overhead transcendental and complex arithmetic operations. Compared to a CPU, a GPU, and an FPGA, Amber achieves up to 3902x, 152x, and 88x better energy-delay product (EDP).
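To make the phrase "affine access patterns" concrete, the following is a minimal sketch of what such a pattern computes: each address is a fixed offset plus a weighted sum of loop indices. The function name `affine_addresses` and its parameters (`offset`, `extents`, `strides`) are hypothetical names chosen for illustration; this is not Amber's memory controller design, only a software model of the general idea.

```python
from itertools import product


def affine_addresses(offset, extents, strides):
    """Yield addresses offset + sum(i_k * stride_k) over a loop nest.

    extents and strides are listed outermost loop first. All names here
    are illustrative assumptions, not taken from the Amber design.
    """
    if len(extents) != len(strides):
        raise ValueError("extents and strides must have the same length")
    # product() iterates the innermost (last) index fastest, like a loop nest.
    for idx in product(*(range(n) for n in extents)):
        yield offset + sum(i * s for i, s in zip(idx, strides))


# Read a 2x3 tile starting at element 4 of a row-major buffer 8 elements wide.
row_major_tile = list(affine_addresses(offset=4, extents=(2, 3), strides=(8, 1)))

# Same tile, traversed column by column, by swapping the loop order.
col_major_tile = list(affine_addresses(offset=4, extents=(3, 2), strides=(1, 8)))
```

Under these assumptions, changing only the extents and strides reorders or reshapes the stream without changing the generator itself, which is the property that makes affine patterns a natural fit for streaming dense linear algebra data.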