{"title":"利用残差放大进行输入调节的 2-10-b 输出精度可重构内存计算巨集","authors":"Balaji Vijayakumar;Ashwin Balagopal Sundar;Janakiraman Viraraghavan;Varchas Bharadwaj","doi":"10.1109/LSSC.2024.3415476","DOIUrl":null,"url":null,"abstract":"Artificial intelligence workloads demand a wide range of multiply and accumulate (MAC) precision. Pitch-matching constraints in compute-in-memory (CIM) engines limit the analog-to-digital converter (ADC) precision to about 8 bits. This letter demonstrates a method of mapping a suitable input conditioned MAC range to the input dynamic range of the on-chip 7-b ADC, thereby achieving up to 10 bits of output MAC precision. A 424 Kb SRAM CIM macro was fabricated in TSMC 28 nm, which computes 72 MACs in parallel per cycle. Measurement results at nominal supply voltage show an energy efficiency of 196.6–102 TOPS/W/b for a 2–10 bit output MAC precision. Inference results on MNIST, CIFAR10, and CIFAR100 are shown with \n<inline-formula> <tex-math>$\\leq 1\\%$ </tex-math></inline-formula>\n accuracy loss from the software baseline.","PeriodicalId":13032,"journal":{"name":"IEEE Solid-State Circuits Letters","volume":"7 ","pages":"219-222"},"PeriodicalIF":2.2000,"publicationDate":"2024-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A 2-to-10-b Output Precision Reconfigurable Compute-In-Memory Macro Leveraging Input Conditioning Using Residue Amplification\",\"authors\":\"Balaji Vijayakumar;Ashwin Balagopal Sundar;Janakiraman Viraraghavan;Varchas Bharadwaj\",\"doi\":\"10.1109/LSSC.2024.3415476\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Artificial intelligence workloads demand a wide range of multiply and accumulate (MAC) precision. Pitch-matching constraints in compute-in-memory (CIM) engines limit the analog-to-digital converter (ADC) precision to about 8 bits. This letter demonstrates a method of mapping a suitable input conditioned MAC range to the input dynamic range of the on-chip 7-b ADC, thereby achieving up to 10 bits of output MAC precision. A 424 Kb SRAM CIM macro was fabricated in TSMC 28 nm, which computes 72 MACs in parallel per cycle. Measurement results at nominal supply voltage show an energy efficiency of 196.6–102 TOPS/W/b for a 2–10 bit output MAC precision. Inference results on MNIST, CIFAR10, and CIFAR100 are shown with \\n<inline-formula> <tex-math>$\\\\leq 1\\\\%$ </tex-math></inline-formula>\\n accuracy loss from the software baseline.\",\"PeriodicalId\":13032,\"journal\":{\"name\":\"IEEE Solid-State Circuits Letters\",\"volume\":\"7 \",\"pages\":\"219-222\"},\"PeriodicalIF\":2.2000,\"publicationDate\":\"2024-06-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Solid-State Circuits Letters\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10559382/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Solid-State Circuits Letters","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10559382/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
A 2-to-10-b Output Precision Reconfigurable Compute-In-Memory Macro Leveraging Input Conditioning Using Residue Amplification
Artificial intelligence workloads demand a wide range of multiply and accumulate (MAC) precision. Pitch-matching constraints in compute-in-memory (CIM) engines limit the analog-to-digital converter (ADC) precision to about 8 bits. This letter demonstrates a method of mapping a suitable input conditioned MAC range to the input dynamic range of the on-chip 7-b ADC, thereby achieving up to 10 bits of output MAC precision. A 424 Kb SRAM CIM macro was fabricated in TSMC 28 nm, which computes 72 MACs in parallel per cycle. Measurement results at nominal supply voltage show an energy efficiency of 196.6–102 TOPS/W/b for a 2–10 bit output MAC precision. Inference results on MNIST, CIFAR10, and CIFAR100 are shown with
$\leq 1\%$
accuracy loss from the software baseline.