J. K. Kim, Phil C. Knag, Thomas Chen, Zhengya Zhang
{"title":"实现片上学习和推理的6.67mW稀疏编码ASIC","authors":"J. K. Kim, Phil C. Knag, Thomas Chen, Zhengya Zhang","doi":"10.1109/VLSIC.2014.6858385","DOIUrl":null,"url":null,"abstract":"A sparse coding ASIC is designed to learn visual receptive fields and infer the sparse representation of images for encoding, feature detection and recognition. 256 leaky integrate-and-fire neurons are connected in a 2-layer network of 2D local grids linked in a 4-stage systolic ring to reduce the communication latency. Spike collisions are kept sparse enough to be tolerated to save power. Memory is divided into a core section to support inference, and an auxiliary section that is only powered on for learning. An approximate learning tracks only significant neuron activities to save memory and power. The 3.06mm2 65nm CMOS ASIC achieves an inference throughput of 1.24Gpixel/s at 1.0V and 310MHz, and on-chip learning can be completed in seconds. Memory supply voltage can be reduced to 440mV to exploit the soft algorithm that tolerates errors, reducing the inference power to 6.67mW for a 140Mpixel/s throughput at 35MHz.","PeriodicalId":381216,"journal":{"name":"2014 Symposium on VLSI Circuits Digest of Technical Papers","volume":"88 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":"{\"title\":\"A 6.67mW sparse coding ASIC enabling on-chip learning and inference\",\"authors\":\"J. K. Kim, Phil C. Knag, Thomas Chen, Zhengya Zhang\",\"doi\":\"10.1109/VLSIC.2014.6858385\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A sparse coding ASIC is designed to learn visual receptive fields and infer the sparse representation of images for encoding, feature detection and recognition. 256 leaky integrate-and-fire neurons are connected in a 2-layer network of 2D local grids linked in a 4-stage systolic ring to reduce the communication latency. Spike collisions are kept sparse enough to be tolerated to save power. Memory is divided into a core section to support inference, and an auxiliary section that is only powered on for learning. An approximate learning tracks only significant neuron activities to save memory and power. The 3.06mm2 65nm CMOS ASIC achieves an inference throughput of 1.24Gpixel/s at 1.0V and 310MHz, and on-chip learning can be completed in seconds. Memory supply voltage can be reduced to 440mV to exploit the soft algorithm that tolerates errors, reducing the inference power to 6.67mW for a 140Mpixel/s throughput at 35MHz.\",\"PeriodicalId\":381216,\"journal\":{\"name\":\"2014 Symposium on VLSI Circuits Digest of Technical Papers\",\"volume\":\"88 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-06-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"13\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 Symposium on VLSI Circuits Digest of Technical Papers\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/VLSIC.2014.6858385\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 Symposium on VLSI Circuits Digest of Technical Papers","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/VLSIC.2014.6858385","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A 6.67mW sparse coding ASIC enabling on-chip learning and inference
A sparse coding ASIC is designed to learn visual receptive fields and infer the sparse representation of images for encoding, feature detection and recognition. 256 leaky integrate-and-fire neurons are connected in a 2-layer network of 2D local grids linked in a 4-stage systolic ring to reduce the communication latency. Spike collisions are kept sparse enough to be tolerated to save power. Memory is divided into a core section to support inference, and an auxiliary section that is only powered on for learning. An approximate learning tracks only significant neuron activities to save memory and power. The 3.06mm2 65nm CMOS ASIC achieves an inference throughput of 1.24Gpixel/s at 1.0V and 310MHz, and on-chip learning can be completed in seconds. Memory supply voltage can be reduced to 440mV to exploit the soft algorithm that tolerates errors, reducing the inference power to 6.67mW for a 140Mpixel/s throughput at 35MHz.