Pub Date: 2024-09-12 | DOI: 10.1038/s44335-024-00011-3
Wei D. Lu, Christof Teuscher, Stephen A. Sarles, Yuchao Yang, Aida Todri-Sanial, Xiao-Bo Zhu
Title: A perfect storm and a new dawn for unconventional computing technologies | Journal: npj Unconventional Computing, pp. 1-2 | PDF: https://www.nature.com/articles/s44335-024-00011-3.pdf
Pub Date: 2024-09-02 | DOI: 10.1038/s44335-024-00008-y
Joao Henrique Quintino Palhares, Nikhil Garg, Pierre-Antoine Mouny, Yann Beilliard, J. Sandrini, F. Arnaud, Lorena Anghel, Fabien Alibart, Dominique Drouin, Philippe Galy
Seeking to circumvent conventional computing bottlenecks, hardware alternatives, from brain-inspired designs to cryogenic quantum systems, necessitate integrating emerging non-volatile memories. Yet the immaturity and unreliability of cryogenic-compatible memories hinder scalable computing advancements. This study characterizes 28 nm FD-SOI substrate-embedded Ge-rich Ge2Sb2Te5 phase change memories (ePCMs) down to 12 K to overcome these hurdles. It reveals that ePCMs are cryogenic-compatible and can encode multiple resistance states with minimal drift, which is essential for advanced computing solutions. Through simulations, the ePCM’s impact on a spiking neural network (SNN) performing MNIST classification is evaluated. The SNN maintains high accuracy over an extended period of 2 years at cryogenic temperatures, while an accuracy drop of 10.8% is observed at room temperature. These results highlight the potential of multilevel ePCMs in brain-inspired cryogenic computing applications, offering a promising avenue for the evolution of unconventional computing systems.
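To make the notion of resistance drift concrete, the sketch below uses the empirical power law commonly applied to phase change memories, R(t) = R0 (t / t0)^ν. The abstract does not state which drift model or exponents the study uses, so the function name and the two exponent values are purely illustrative assumptions.

```python
def drifted_resistance(r0_ohm, t_s, nu, t0_s=1.0):
    """Empirical power-law drift: R(t) = R0 * (t / t0) ** nu."""
    return r0_ohm * (t_s / t0_s) ** nu

TWO_YEARS_S = 2 * 365 * 24 * 3600

# Hypothetical drift exponents, chosen only to contrast weak and strong drift.
for label, nu in [("near-zero drift", 0.001), ("stronger drift", 0.1)]:
    ratio = drifted_resistance(1.0, TWO_YEARS_S, nu)
    print(f"{label}: R(2 years) / R0 = {ratio:.3f}")
```

A small exponent keeps the programmed resistance levels separated over long times, which is why low drift matters when several levels are stored per cell.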
Title: 28 nm FDSOI embedded PCM exhibiting near zero drift at 12 K for cryogenic SNNs | Journal: npj Unconventional Computing, pp. 1-9 | PDF: https://www.nature.com/articles/s44335-024-00008-y.pdf
Pub Date: 2024-09-02 | DOI: 10.1038/s44335-024-00007-z
Shiva Asapu, Taehwan Moon, Krishnamurthy Mahalingam, Kurt G. Eyink, James Nicolas Pagaduan, Ruoyu Zhao, Sabyasachi Ganguli, Reika Katsumata, Qiangfei Xia, R. Stanley Williams, J. Joshua Yang
We have measured the dynamical response of ZrO2 capacitors to applied triangular voltage waveforms with varying frequencies and amplitudes to determine the voltage and charge on the devices as a function of time. We have fit our experimental results to a Landau–Khalatnikov dynamical equation with a sixth-order Landau–Ginzburg–Devonshire polynomial to represent the static charge-voltage behavior, and obtained coefficients of determination R² > 0.99 for the fits. Analysis of the resulting quantitative model reveals an extremely small range of negative differential capacitance (<16 mV). The hysteresis loops in the dynamical charge-voltage curves are found to result primarily from energy loss during the ferroelectric transitions, as represented by a frequency-dependent series resistance in the model.
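As a rough illustration of this kind of model, the sketch below integrates a Landau–Khalatnikov equation of the generic form ρ dQ/dt = V(t) − dU/dQ, with a sixth-order polynomial energy U(Q) = (α/2)Q² + (β/4)Q⁴ + (γ/6)Q⁶, under a triangular drive. The coefficient values, the normalized units, the constant ρ, and the function names are assumptions for illustration; they are not the fitted parameters or the exact formulation of the paper.

```python
def lgd_voltage(q, alpha, beta, gamma):
    """Static voltage dU/dQ for U = alpha/2 Q^2 + beta/4 Q^4 + gamma/6 Q^6."""
    return alpha * q + beta * q**3 + gamma * q**5

def triangle(t, amplitude, period):
    """Symmetric triangular waveform that starts at 0 and rises first."""
    phase = (t / period) % 1.0
    if phase < 0.25:
        return 4 * amplitude * phase
    if phase < 0.75:
        return amplitude * (2 - 4 * phase)
    return amplitude * (4 * phase - 4)

def simulate(alpha, beta, gamma, rho, amplitude, period, steps=20000):
    """Forward-Euler integration of rho * dQ/dt = V(t) - dU/dQ over one period."""
    dt = period / steps
    q, trace = 0.0, []
    for i in range(steps):
        v = triangle(i * dt, amplitude, period)
        q += dt * (v - lgd_voltage(q, alpha, beta, gamma)) / rho
        trace.append((v, q))
    return trace

# Hypothetical normalized coefficients; gamma > 0 keeps the energy bounded.
trace = simulate(alpha=1.0, beta=-1.5, gamma=1.0, rho=0.1,
                 amplitude=1.0, period=10.0)
```

Plotting the charge against the voltage in `trace` gives a loop whose width grows with ρ, which mirrors the abstract's point that hysteresis is dominated by a resistive loss term.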
Title: Accurate compact nonlinear dynamical model for a volatile ferroelectric ZrO2 capacitor | Journal: npj Unconventional Computing, pp. 1-7 | PDF: https://www.nature.com/articles/s44335-024-00007-z.pdf
Pub Date: 2024-08-21 | DOI: 10.1038/s44335-024-00006-0
Yifei Yu, Shaocong Wang, Meng Xu, Woyu Zhang, Bo Wang, Jichang Yang, Songqi Wang, Yue Zhang, Xiaoshan Wu, Hegan Chen, Dingchen Wang, Xi Chen, Ning Lin, Xiaojuan Qi, Dashan Shang, Zhongrui Wang
The broad integration of 3D sensors into devices like smartphones and AR/VR headsets has led to a surge in 3D data, with point clouds becoming a mainstream representation method. Efficient real-time learning of point cloud data on edge devices is crucial for applications such as autonomous vehicles and embodied AI. Traditional machine learning models on digital processors face limitations, with software challenges such as high training complexity and hardware challenges such as large time and energy overheads due to the von Neumann bottleneck. To address this, we propose a software-hardware co-designed random memristor-based dynamic graph CNN (RDGCNN). Software-wise, we transform the point cloud into a graph and propose random EdgeConv for efficient extraction of hierarchical and topological features. Hardware-wise, leveraging the memristor’s intrinsic stochasticity and in-memory computing capability, we achieve significant reductions in training complexity and energy consumption. RDGCNN demonstrates high accuracy and efficiency across various point cloud tasks, paving the way for future edge 3D vision.
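The idea of turning a point cloud into a graph and applying an EdgeConv layer with random, untrained weights can be sketched in NumPy as follows. This is a software-only illustration: the pseudo-random weight matrix stands in for memristor stochasticity, and the layer sizes, the value of k, and the function names are assumptions rather than details taken from the paper.

```python
import numpy as np

def knn_indices(points, k):
    """Indices of the k nearest neighbours of each point (excluding itself)."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    return np.argsort(d2, axis=1)[:, :k]

def random_edgeconv(features, neighbours, weights):
    """EdgeConv with fixed weights: max over neighbours of ReLU([x_i, x_j - x_i] @ W)."""
    x_i = np.repeat(features[:, None, :], neighbours.shape[1], axis=1)
    x_j = features[neighbours]
    edge = np.concatenate([x_i, x_j - x_i], axis=-1)
    return np.maximum(edge @ weights, 0.0).max(axis=1)

rng = np.random.default_rng(0)
points = rng.normal(size=(128, 3))      # toy point cloud
weights = rng.normal(size=(6, 16))      # random and never trained
out = random_edgeconv(points, knn_indices(points, k=8), weights)
print(out.shape)
```

Because the convolution weights stay fixed, only a lightweight readout on top of `out` would need training, which is one plausible reading of how random layers reduce training complexity.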
Title: Random memristor-based dynamic graph CNN for efficient point cloud learning at the edge | Journal: npj Unconventional Computing, pp. 1-9 | PDF: https://www.nature.com/articles/s44335-024-00006-0.pdf
Pub Date: 2024-08-05 | DOI: 10.1038/s44335-024-00005-1
Fiona Knoll, John Daly, Jess Meyer
Decades of exponential scaling in high-performance computing (HPC) efficiency are coming to an end. Transistor-based logic in complementary metal-oxide semiconductor (CMOS) technology is approaching physical limits beyond which further miniaturization will be impossible. Future HPC efficiency gains will necessarily rely on new technologies and paradigms of computing. The Ising model shows particular promise as a future framework for highly energy-efficient computation. Ising systems are able to operate at energies approaching thermodynamic limits for energy consumption of computation. Ising systems can function as both logic and memory. Thus, they have the potential to significantly reduce energy costs inherent to CMOS computing by eliminating costly data movement. The challenge in creating Ising-based hardware lies in optimizing useful circuits that produce correct results on fundamentally nondeterministic hardware. The contribution of this paper is a novel machine learning approach, a combination of deep neural networks and random forests, for efficiently solving optimization problems that minimize sources of error in the Ising model. In addition, we provide a process to express a Boltzmann probability optimization problem as a supervised machine learning problem.
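For readers unfamiliar with the underlying quantity, the sketch below enumerates the Boltzmann distribution of a tiny Ising system, with P(s) proportional to exp(−βE(s)). The sign convention for the energy, the function names, and the two-spin example are assumptions for illustration; the paper's optimization problem and learning pipeline are not reproduced here.

```python
import itertools
import math

def ising_energy(spins, h, J):
    """E(s) = sum_i h_i s_i + sum_{i<j} J_ij s_i s_j, with s_i in {-1, +1}."""
    n = len(spins)
    field = sum(h[i] * spins[i] for i in range(n))
    coupling = sum(J[i][j] * spins[i] * spins[j]
                   for i in range(n) for j in range(i + 1, n))
    return field + coupling

def boltzmann_distribution(h, J, beta=1.0):
    """Exact state probabilities by brute-force enumeration (small n only)."""
    states = list(itertools.product((-1, 1), repeat=len(h)))
    weights = [math.exp(-beta * ising_energy(s, h, J)) for s in states]
    z = sum(weights)
    return {s: w / z for s, w in zip(states, weights)}

# Two spins with a coupling that favours alignment under this sign convention.
probs = boltzmann_distribution(h=[0.0, 0.0], J=[[0.0, -1.0], [0.0, 0.0]])
for state, p in probs.items():
    print(state, round(p, 3))
```

Designing an Ising circuit then amounts to choosing h and J so that the states encoding correct outputs carry most of this probability mass, which is where nondeterminism enters.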
Title: Solving Boltzmann optimization problems with deep learning | Journal: npj Unconventional Computing, pp. 1-7 | PDF: https://www.nature.com/articles/s44335-024-00005-1.pdf
Pub Date: 2024-07-25 | DOI: 10.1038/s44335-024-00003-3
Yang Lv, Brandon R. Zink, Robert P. Bloom, Hüsrev Cılasun, Pravin Khanal, Salonik Resch, Zamshed Chowdhury, Ali Habiboglu, Weigang Wang, Sachin S. Sapatnekar, Ulya Karpuzcu, Jian-Ping Wang
The conventional computing paradigm struggles to fulfill the rapidly growing demands of emerging applications, especially those for machine intelligence, because much of the power and energy is consumed by constant data transfers between logic and memory modules. A new paradigm, called “computational random-access memory (CRAM),” has emerged to address this fundamental limitation. CRAM performs logic operations directly using the memory cells themselves, without the data ever leaving the memory. The energy and performance benefits of CRAM for both conventional and emerging applications have been well established by prior numerical studies. However, there is a lack of experimental demonstration and study of CRAM to evaluate its computational accuracy, which is a realistic and application-critical metric for its technological feasibility and competitiveness. In this work, a CRAM array based on magnetic tunnel junctions (MTJs) is experimentally demonstrated. First, basic memory operations, as well as 2-, 3-, and 5-input logic operations, are studied. Then, a 1-bit full adder with two different designs is demonstrated. Based on the experimental results, a suite of models has been developed to characterize the accuracy of CRAM computation. Scalar addition, multiplication, and matrix multiplication, which are essential building blocks for many conventional and machine intelligence applications, are evaluated and show promising accuracy. With the confirmation of MTJ-based CRAM’s accuracy, there is a strong case that this technology will have a significant impact on power- and energy-demanding applications of machine intelligence.
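One way to see how 3- and 5-input logic operations can compose into a 1-bit full adder is the majority-gate construction sketched below. The abstract does not specify which two adder designs were demonstrated, so treating one of them as majority-based is an assumption, and the gates here are ideal and error-free, unlike the physical devices whose accuracy the paper characterizes.

```python
def majority(*bits):
    """Return 1 when more than half of the input bits are 1."""
    return int(sum(bits) > len(bits) // 2)

def full_adder(a, b, carry_in):
    """1-bit full adder built from one 3-input and one 5-input majority gate."""
    carry_out = majority(a, b, carry_in)
    not_carry = 1 - carry_out
    # The inverted carry is fed in twice so it can outvote a single set input.
    total = majority(a, b, carry_in, not_carry, not_carry)
    return total, carry_out

for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            print(a, b, c, full_adder(a, b, c))
```

In a real array each gate would have a nonzero error rate, so the accuracy of the adder depends on how errors propagate through the two stages.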
Title: Experimental demonstration of magnetic tunnel junction-based computational random-access memory | Journal: npj Unconventional Computing, pp. 1-10 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11287819/pdf/
Pub Date: 2024-07-25 | DOI: 10.1038/s44335-024-00001-5
Dayanand Kumar, Hanrui Li, Amit Singh, Manoj Kumar Rajbhar, Abdul Momin Syed, Hoonkyung Lee, Nazek El-Atab
Photoresponsivity studies of wide-bandgap oxide-based devices have emerged as a vibrant and popular research area. Researchers have explored various material systems in their quest to develop devices capable of responding to illumination. In this study, we engineered a mature wide-bandgap oxide-based bilayer heterostructure synaptic memristor to emulate the human brain for applications in neuromorphic computing and photograph sensing. The device exhibits advanced electric and electrophotonic synaptic functions, such as long-term potentiation (LTP), long-term depression (LTD), and paired-pulse facilitation (PPF), under successive electric and photonic pulses. Moreover, the device exhibits exceptional electrical SET and photonic RESET endurance, maintaining its stability for a minimum of 1200 cycles without any degradation. Density functional theory calculations of the band structures provide insights into the conduction mechanism of the device. Based on this memristor array, we developed an autoencoder and a convolutional neural network for noise reduction and image recognition tasks, which achieve a peak signal-to-noise ratio of 562 and a high accuracy of 84.23%, while consuming four orders of magnitude less energy than the Tesla P40 GPU. This groundbreaking research not only opens doors for the integration of our device into image processing but also represents a significant advancement in combining in-memory computing and photograph-sensing features in a single cell.
Title: Negative photo conductivity triggered with visible light in wide bandgap oxide-based optoelectronic crossbar memristive array for photograph sensing and neuromorphic computing applications | Journal: npj Unconventional Computing, pp. 1-9 | PDF: https://www.nature.com/articles/s44335-024-00001-5.pdf
Pub Date: 2024-07-25 | DOI: 10.1038/s44335-024-00004-2
Yuting Wu, Ziyu Wang, Wei D. Lu
Decoder-only Transformer models such as Generative Pre-trained Transformers (GPT) have demonstrated exceptional performance in text generation by autoregressively predicting the next token. However, the efficiency of running GPT on current hardware systems is bounded by a low compute-to-memory ratio and high memory access. In this work, we propose a Process-in-memory (PIM) GPT accelerator, PIM-GPT, which achieves end-to-end acceleration of GPT inference with high performance and high energy efficiency. PIM-GPT leverages DRAM-based PIM designs for executing multiply-accumulate (MAC) operations directly in the DRAM chips, eliminating the need to move matrix data off-chip. Non-linear functions and data communication are supported by an application specific integrated chip (ASIC). At the software level, mapping schemes are designed to maximize data locality and computation parallelism. Overall, PIM-GPT achieves 41-137× and 631-1074× speedup and 123-383× and 320-602× energy efficiency over the GPU and CPU baselines, respectively, on 8 GPT models with up to 1.4 billion parameters.
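The low compute-to-memory ratio of autoregressive decoding can be illustrated with a roofline-style estimate of arithmetic intensity (floating-point operations per byte moved) for a single weight matrix. This is a back-of-the-envelope sketch, not the paper's performance model: it assumes 16-bit values, counts each operand exactly once, and ignores caches; the function name is hypothetical.

```python
def arithmetic_intensity(rows, cols, tokens, bytes_per_value=2):
    """FLOPs per byte for Y = W @ X, with W (rows x cols) and X (cols x tokens)."""
    flops = 2 * rows * cols * tokens  # one multiply and one add per weight per token
    traffic = bytes_per_value * (rows * cols + cols * tokens + rows * tokens)
    return flops / traffic

# One token at a time (autoregressive decoding) versus a 512-token batch.
print(arithmetic_intensity(1024, 1024, tokens=1))
print(arithmetic_intensity(1024, 1024, tokens=512))
```

With a single token, every weight fetched from memory is used for just one multiply-accumulate, so throughput is limited by memory bandwidth rather than compute, which is the regime that in-memory MAC execution targets.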
Title: PIM GPT a hybrid process in memory accelerator for autoregressive transformers | Journal: npj Unconventional Computing, pp. 1-13 | PDF: https://www.nature.com/articles/s44335-024-00004-2.pdf
Pub Date: 2024-07-25 | DOI: 10.1038/s44335-024-00002-4
Marcus Stoffel, Saurabh Balkrishna Tandale
In recent years, spiking neural networks (SNNs) were introduced as the third generation of artificial neural networks, leading to tremendous energy savings on neuromorphic processors. This sustainable effect is due to the sparse nature of signal processing between spiking neurons, which requires far fewer scalar multiplications than second-generation networks. The efficiency of spiking neurons is made even more pronounced by their inherently recurrent nature, which is useful for recursive function approximations. We believe that there is a need for a general regression framework for SNNs to explore the high potential of neuromorphic computations. However, despite many classification studies with SNNs in the literature, nonlinear neuromorphic regression analysis represents a gap in research. Hence, we propose a general SNN approach for function approximation, applicable to complex transient signal processing, that takes into account the surrogate gradients necessitated by the discontinuous spike representation. To address the need for high memory access during deep SNN network communications, additional spiking Legendre Memory Units are introduced in the neuromorphic architecture. Path-dependencies and evolutions of signals can be tackled in this way. Furthermore, interfaces between real physical and binary spiking values are necessary. To this end, a hybrid approach is introduced, exhibiting an autoencoding strategy between dense and spiking layers. To verify the presented framework of nonlinear regression for a wide spectrum of scientific purposes, we see the need for obtaining realistic complex transient short-time signals from an extensive experimental set-up. Hence, a measurement technique for benchmark experiments is proposed, with high-frequency oscillations measured by capacitive and piezoelectric sensors, resulting in wave propagations and inelastic solid deformations to be predicted by the developed SNN regression analysis. The proposed nonlinear regression framework can thus be deployed to a wide range of scientific and technical applications.
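The role of surrogate gradients can be illustrated with a single leaky integrate-and-fire (LIF) neuron: the spike is a hard threshold with zero gradient almost everywhere, so training substitutes a smooth function on the backward pass. The sketch below uses a fast-sigmoid surrogate, one common choice; the abstract does not say which surrogate, neuron model, or parameters the authors use, so all of these are assumptions.

```python
import numpy as np

def lif_forward(currents, beta=0.9, threshold=1.0):
    """Simulate one LIF neuron; return spikes and pre-reset membrane potentials."""
    v, spikes, potentials = 0.0, [], []
    for i_t in currents:
        v = beta * v + i_t            # leaky integration of the input current
        potentials.append(v)
        s = float(v >= threshold)     # non-differentiable threshold
        spikes.append(s)
        v -= s * threshold            # soft reset after a spike
    return np.array(spikes), np.array(potentials)

def surrogate_grad(potentials, threshold=1.0, slope=5.0):
    """Fast-sigmoid surrogate for d(spike)/d(v): 1 / (1 + slope * |v - thr|)^2."""
    return 1.0 / (1.0 + slope * np.abs(potentials - threshold)) ** 2

spikes, potentials = lif_forward([0.6, 0.6, 0.0])
print(spikes)
print(surrogate_grad(potentials))
```

The surrogate is largest when the membrane potential is close to the threshold, so weight updates concentrate on time steps where a small change could have flipped the spike.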
Title: Spiking neural networks for nonlinear regression of complex transient signals on sustainable neuromorphic processors | Journal: npj Unconventional Computing, pp. 1-15 | PDF: https://www.nature.com/articles/s44335-024-00002-4.pdf