Negative photoconductivity triggered with visible light in wide bandgap oxide-based optoelectronic crossbar memristive array for photograph sensing and neuromorphic computing applications
Pub Date: 2024-07-25 | DOI: 10.1038/s44335-024-00001-5 | npj Unconventional Computing, pp. 1-9 | PDF: https://www.nature.com/articles/s44335-024-00001-5.pdf
Dayanand Kumar, Hanrui Li, Amit Singh, Manoj Kumar Rajbhar, Abdul Momin Syed, Hoonkyung Lee, Nazek El-Atab
Photoresponsivity studies of wide-bandgap oxide-based devices have emerged as a vibrant and popular research area. Researchers have explored various material systems in their quest to develop devices capable of responding to illumination. In this study, we engineered a mature wide-bandgap oxide-based bilayer heterostructure synaptic memristor to emulate the human brain for applications in neuromorphic computing and photograph sensing. The device exhibits advanced electric and electrophotonic synaptic functions, such as long-term potentiation (LTP), long-term depression (LTD), and paired-pulse facilitation (PPF), under successive electric and photonic pulses. Moreover, the device exhibits exceptional electrical SET and photonic RESET endurance, maintaining its stability for a minimum of 1200 cycles without any degradation. Density functional theory calculations of the band structures provide insights into the conduction mechanism of the device. Based on this memristor array, we developed an autoencoder and a convolutional neural network for noise-reduction and image-recognition tasks, which achieve a peak signal-to-noise ratio of 562 and a high accuracy of 84.23%, while consuming four orders of magnitude less energy than a Tesla P40 GPU. This research not only opens doors for the integration of our device into image processing but also represents a significant advancement toward combining in-memory computing and photograph sensing in a single cell.
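The abstract gives no implementation details, so the following is only a minimal NumPy sketch of the general principle it describes: synaptic weights stored as memristor conductances in a crossbar, in-memory multiply-accumulate via Ohm's and Kirchhoff's laws, and LTP/LTD emulated as incremental conductance changes from successive pulses. All device parameters (conductance window, pulse step, array size) are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Hypothetical crossbar: weights stored as device conductances (siemens).
G_MIN, G_MAX = 1e-6, 1e-4          # assumed conductance window, not the paper's values
rows, cols = 8, 4
rng = np.random.default_rng(0)
G = rng.uniform(G_MIN, G_MAX, size=(rows, cols))

def crossbar_mac(v_in, G):
    """In-memory multiply-accumulate: column currents I = G^T . V (Ohm's + Kirchhoff's laws)."""
    return G.T @ v_in

def apply_pulses(G, n_pulses, step=2e-6, potentiate=True):
    """Emulate LTP (e.g. electrical SET) / LTD (e.g. photonic RESET) as gradual conductance updates."""
    delta = step if potentiate else -step
    return np.clip(G + delta * n_pulses, G_MIN, G_MAX)

v = rng.uniform(0.0, 0.5, size=rows)                   # read voltages applied to the rows
print("column currents before training:", crossbar_mac(v, G))
G = apply_pulses(G, n_pulses=10, potentiate=True)      # LTP-like weight increase
print("column currents after potentiation:", crossbar_mac(v, G))
```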
{"title":"Negative photo conductivity triggered with visible light in wide bandgap oxide-based optoelectronic crossbar memristive array for photograph sensing and neuromorphic computing applications","authors":"Dayanand Kumar, Hanrui Li, Amit Singh, Manoj Kumar Rajbhar, Abdul Momin Syed, Hoonkyung Lee, Nazek El-Atab","doi":"10.1038/s44335-024-00001-5","DOIUrl":"10.1038/s44335-024-00001-5","url":null,"abstract":"Photoresponsivity studies of wide-bandgap oxide-based devices have emerged as a vibrant and popular research area. Researchers have explored various material systems in their quest to develop devices capable of responding to illumination. In this study, we engineered a mature wide-bandgap oxide-based bilayer heterostructure synaptic memristor to emulate the human brain for applications in neuromorphic computing and photograph sensing. The device exhibits advanced electric and electrophotonic synaptic functions, such as long-term potentiation (LTP), long-term depression (LTD), and paired-pulse facilitation (PPF), by applying successive electric and photonic pulses. Moreover, the device exhibits exceptional electrical SET and photonic RESET endurance, maintaining its stability for a minimum of 1200 cycles without any degradation. Density functional theory calculations of the band structures provide insights into the conduction mechanism of the device. Based on this memristor array, we developed an autoencoder and convolutional neural network for noise reduction and image recognition tasks, which achieves a peak signal-to-noise ratio of 562 and high accuracy of 84.23%, while consuming lower energy by four orders of magnitude compared with the Tesla P40 GPU. This groundbreaking research not only opens doors for the integration of our device into image processing but also represents a significant advancement in the realm of in-memory computing and photograph-sensing features in a single cell.","PeriodicalId":501715,"journal":{"name":"npj Unconventional Computing","volume":" ","pages":"1-9"},"PeriodicalIF":0.0,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s44335-024-00001-5.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141968494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PIM-GPT: a hybrid process-in-memory accelerator for autoregressive transformers
Pub Date: 2024-07-25 | DOI: 10.1038/s44335-024-00004-2 | npj Unconventional Computing, pp. 1-13 | PDF: https://www.nature.com/articles/s44335-024-00004-2.pdf
Yuting Wu, Ziyu Wang, Wei D. Lu
Decoder-only Transformer models such as Generative Pre-trained Transformers (GPT) have demonstrated exceptional performance in text generation by autoregressively predicting the next token. However, the efficiency of running GPT on current hardware systems is bounded by a low compute-to-memory ratio and high memory access. In this work, we propose a processing-in-memory (PIM) GPT accelerator, PIM-GPT, which achieves end-to-end acceleration of GPT inference with high performance and high energy efficiency. PIM-GPT leverages DRAM-based PIM designs to execute multiply-accumulate (MAC) operations directly in the DRAM chips, eliminating the need to move matrix data off-chip. Non-linear functions and data communication are supported by an application-specific integrated circuit (ASIC). At the software level, mapping schemes are designed to maximize data locality and computation parallelism. Overall, PIM-GPT achieves 41-137× and 631-1074× speedup, and 123-383× and 320-602× energy efficiency, over GPU and CPU baselines, respectively, on 8 GPT models with up to 1.4 billion parameters.
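The sketch below is not the paper's implementation; it only illustrates, under assumed sizes, the division of labor the abstract describes: matrix-vector MACs executed bank-parallel where the weights reside (standing in for DRAM-based PIM), with the non-linear function applied afterwards by the supporting ASIC. The model dimensions, bank count, and the GELU choice are assumptions for illustration.

```python
import numpy as np

# Hypothetical sizes; the paper's GPT configurations are not reproduced here.
d_model, d_ff, n_banks = 768, 3072, 16
rng = np.random.default_rng(1)
W = rng.standard_normal((d_ff, d_model)).astype(np.float32)
x = rng.standard_normal(d_model).astype(np.float32)

def pim_gemv(W, x, n_banks):
    """Bank-parallel matrix-vector product: each 'DRAM bank' holds a row slice of W
    and performs its MAC operations locally, so only partial results leave the array."""
    partials = [rows @ x for rows in np.array_split(W, n_banks, axis=0)]
    return np.concatenate(partials)

def gelu(z):
    """Non-linear function assumed to run on the supporting ASIC, not inside the DRAM."""
    return 0.5 * z * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (z + 0.044715 * z**3)))

h = gelu(pim_gemv(W, x, n_banks))   # one feed-forward projection of a GPT block
print(h.shape)
```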
{"title":"PIM GPT a hybrid process in memory accelerator for autoregressive transformers","authors":"Yuting Wu, Ziyu Wang, Wei D. Lu","doi":"10.1038/s44335-024-00004-2","DOIUrl":"10.1038/s44335-024-00004-2","url":null,"abstract":"Decoder-only Transformer models such as Generative Pre-trained Transformers (GPT) have demonstrated exceptional performance in text generation by autoregressively predicting the next token. However, the efficiency of running GPT on current hardware systems is bounded by low compute-to-memory-ratio and high memory access. In this work, we propose a Process-in-memory (PIM) GPT accelerator, PIM-GPT, which achieves end-to-end acceleration of GPT inference with high performance and high energy efficiency. PIM-GPT leverages DRAM-based PIM designs for executing multiply-accumulate (MAC) operations directly in the DRAM chips, eliminating the need to move matrix data off-chip. Non-linear functions and data communication are supported by an application specific integrated chip (ASIC). At the software level, mapping schemes are designed to maximize data locality and computation parallelism. Overall, PIM-GPT achieves 41 − 137 × , 631 − 1074 × speedup and 123 − 383 × , 320 − 602 × energy efficiency over GPU and CPU baseline on 8 GPT models with up to 1.4 billion parameters.","PeriodicalId":501715,"journal":{"name":"npj Unconventional Computing","volume":" ","pages":"1-13"},"PeriodicalIF":0.0,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s44335-024-00004-2.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141805370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Spiking neural networks for nonlinear regression of complex transient signals on sustainable neuromorphic processors
Pub Date: 2024-07-25 | DOI: 10.1038/s44335-024-00002-4 | npj Unconventional Computing, pp. 1-15 | PDF: https://www.nature.com/articles/s44335-024-00002-4.pdf
Marcus Stoffel, Saurabh Balkrishna Tandale
In recent years, spiking neural networks (SNNs) were introduced as the third generation of artificial neural networks, enabling tremendous energy savings on neuromorphic processors. This effect stems from the sparse nature of signal processing between spiking neurons, which requires far fewer scalar multiplications than second-generation networks. The efficiency of spiking neurons is further enhanced by their inherently recurrent nature, which is useful for recursive function approximation. We believe that a general regression framework for SNNs is needed to explore the high potential of neuromorphic computation. However, while the literature contains many classification studies with SNNs, nonlinear neuromorphic regression analysis remains a gap in research. Hence, we propose a general SNN approach for function approximation applicable to complex transient signal processing, using surrogate gradients to handle the discontinuous spike representation. To address the high memory demands of communication in deep SNNs, additional spiking Legendre Memory Units are introduced into the neuromorphic architecture, allowing path dependencies and the evolution of signals to be captured. Furthermore, interfaces between real physical values and binary spiking values are necessary; to this end, a hybrid approach is introduced that employs an autoencoding strategy between dense and spiking layers. To verify the presented nonlinear regression framework for a wide spectrum of scientific purposes, realistic complex transient short-time signals must be obtained from an extensive experimental set-up. Hence, a measurement technique for benchmark experiments is proposed, in which high-frequency oscillations measured by capacitive and piezoelectric sensors yield wave propagations and inelastic solid deformations to be predicted by the developed SNN regression analysis. The proposed nonlinear regression framework can thus be deployed to a wide range of scientific and technical applications.
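The authors' framework is not reproduced here; the following minimal sketch only illustrates two ingredients the abstract mentions, under simplified assumptions: a leaky integrate-and-fire neuron producing sparse binary spikes from a transient input, and a surrogate derivative (here a fast-sigmoid form, one common choice) used in place of the non-differentiable spike function during training. The spiking Legendre Memory Units and the dense/spiking autoencoding interface are not modeled, and all parameters are illustrative.

```python
import numpy as np

def lif_forward(inputs, tau=0.9, v_th=1.0):
    """Leaky integrate-and-fire neuron: the membrane potential decays, integrates the input,
    and emits a binary spike when the threshold is crossed (followed by a soft reset)."""
    v, spikes = 0.0, []
    for x in inputs:
        v = tau * v + x
        s = 1.0 if v >= v_th else 0.0
        v -= v_th * s
        spikes.append(s)
    return np.array(spikes)

def surrogate_grad(v, v_th=1.0, beta=5.0):
    """Surrogate derivative (fast-sigmoid shape) used instead of the derivative of the
    Heaviside step when backpropagating through the spike non-linearity."""
    return 1.0 / (beta * np.abs(v - v_th) + 1.0) ** 2

signal = 0.6 + 0.4 * np.sin(np.linspace(0, 4 * np.pi, 50))   # toy transient input signal
print(lif_forward(signal)[:10])                               # sparse binary spike train
print(surrogate_grad(np.array([0.8, 1.0, 1.2])))              # gradient proxy near threshold
```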
{"title":"Spiking neural networks for nonlinear regression of complex transient signals on sustainable neuromorphic processors","authors":"Marcus Stoffel, Saurabh Balkrishna Tandale","doi":"10.1038/s44335-024-00002-4","DOIUrl":"10.1038/s44335-024-00002-4","url":null,"abstract":"In recent years, spiking neural networks were introduced in science as the third generation of artificial neural networks leading to a tremendous energy saving on neuromorphic processors. This sustainable effect is due to the sparse nature of signal processing in-between spiking neurons leading to much less scalar multiplications as in second-generation networks. The spiking neuron’s efficiency is even more pronounced by their inherently recurrent nature being useful for recursive function approximations. We believe that there is a need for a general regression framework for SNNs to explore the high potential of neuromorphic computations. However, besides many classification studies with SNNs in the literature, nonlinear neuromorphic regression analysis represents a gap in research. Hence, we propose a general SNN approach for function approximation applicable for complex transient signal processing taking surrogate gradients due to the discontinuous spike representation into account. However, to pay attention to the need for high memory access during deep SNN network communications, additional spiking Legrendre Memory Units are introduced in the neuromorphic architecture. Path-dependencies and evolutions of signals can be tackled in this way. Furthermore, interfaces between real physical and binary spiking values are necessary. Following this intention, a hybrid approach is introduced, exhibiting an autoencoding strategy between dense and spiking layers. However, to verify the presented framework of nonlinear regression for a wide spectrum of scientific purposes, we see the need for obtaining realistic complex transient short-time signals by an extensive experimental set-up. Hence, a measurement technique for benchmark experiments is proposed with high-frequency oscillations measured by capacitive and piezoelectric sensors resulting in wave propagations and inelastic solid deformations to be predicted by the developed SNN regression analysis. Hence, the proposed nonlinear regression framework can be deployed to a wide range of scientific and technical applications.","PeriodicalId":501715,"journal":{"name":"npj Unconventional Computing","volume":" ","pages":"1-15"},"PeriodicalIF":0.0,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s44335-024-00002-4.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141802813","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}