A 28nm, 4.69TOPS/W Training, 2.34µJ/Image Inference, On-chip Training Accelerator with Inference-compatible Back Propagation

Haitao Ge, Weiwei Shan, Yicheng Lu, Jun Yang
2022 IEEE International Conference on Integrated Circuits, Technologies and Applications (ICTA), 2022-10-28
DOI: 10.1109/ICTA56932.2022.9963098

Abstract: Previous on-chip training accelerators improved training efficiency but seldom considered inference efficiency. We propose converting back propagation to be compatible with inference, using interleaved memory allocation to reduce external memory accesses, and applying zero-skipping loss propagation. Running at 40MHz with a 0.48V core voltage, our 28nm single-core on-chip-training (OCT) chip achieves a peak training efficiency of 4.69TOPS/W and a best-in-class inference energy of 2.34µJ/image, 9.1× better than the state-of-the-art.
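The abstract names "zero-skipping loss propagation" but does not detail it. As a rough, hypothetical illustration of the general idea (assuming ReLU activations, which zero the gradient wherever the pre-activation was non-positive), the backward pass of a layer y = relu(x @ w) can skip every multiply-accumulate at positions with zero gradient; the function name and shapes below are illustrative, not from the paper:

```python
def backprop_relu_zero_skip(grad_out, pre_act, x, w):
    """Backward pass for y = relu(x @ w), visiting only nonzero gradients.

    Hypothetical sketch of zero-skipping in loss propagation: with ReLU,
    the gradient is zero wherever pre_act <= 0, so all K multiply-accumulates
    for that (n, m) position are skipped rather than computed.

    Shapes (plain nested lists): x is N x K, w is K x M,
    pre_act and grad_out are N x M.
    """
    N, K, M = len(x), len(w), len(w[0])
    grad_w = [[0.0] * M for _ in range(K)]  # gradient w.r.t. weights
    grad_x = [[0.0] * K for _ in range(N)]  # gradient w.r.t. inputs
    for n in range(N):
        for m in range(M):
            if pre_act[n][m] <= 0:
                continue  # ReLU killed this gradient: skip K MACs
            g = grad_out[n][m]
            for k in range(K):
                grad_w[k][m] += g * x[n][k]
                grad_x[n][k] += g * w[k][m]
    return grad_w, grad_x
```

Since the skipped positions contribute exactly zero to every accumulation, the result matches a dense backward pass while avoiding work proportional to the activation sparsity, which is the usual motivation for such hardware.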