Solving Boltzmann optimization problems with deep learning
Pub Date: 2024-08-05 | DOI: 10.1038/s44335-024-00005-1
Fiona Knoll, John Daly, Jess Meyer
Decades of exponential scaling in high-performance computing (HPC) efficiency are coming to an end. Transistor-based logic in complementary metal-oxide semiconductor (CMOS) technology is approaching physical limits beyond which further miniaturization will be impossible. Future HPC efficiency gains will therefore rely on new technologies and paradigms of computing. The Ising model shows particular promise as a framework for highly energy-efficient computation: Ising systems can operate at energies approaching the thermodynamic limits on the energy cost of computation, and they can function as both logic and memory. They thus have the potential to significantly reduce the energy costs inherent to CMOS computing by eliminating costly data movement. The challenge in creating Ising-based hardware lies in optimizing useful circuits that produce correct results on fundamentally nondeterministic hardware. The contribution of this paper is a novel machine learning approach, a combination of deep neural networks and random forests, for efficiently solving optimization problems that minimize sources of error in the Ising model. In addition, we provide a process for expressing a Boltzmann probability optimization problem as a supervised machine learning problem.
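The key step described above is recasting a Boltzmann probability objective as a supervised learning problem. As a rough, hypothetical illustration (not the authors' pipeline), the Python sketch below enumerates the exact Boltzmann distribution of a tiny Ising instance and derives a scalar "probability of correct output" label per instance; the energy convention, the 4-spin size, the inverse temperature, and the rule that the last spin encodes the circuit output are all assumptions made only for this example.

```python
# Minimal sketch (not the paper's code): enumerate Boltzmann probabilities of a
# small Ising instance and turn them into supervised training pairs.
import itertools
import numpy as np

def ising_energy(spins, J, h):
    """E(s) = -1/2 * s^T J s - h^T s for spins s_i in {-1, +1}, J symmetric, zero diagonal."""
    return -0.5 * spins @ J @ spins - h @ spins

def boltzmann_probabilities(J, h, beta=1.0):
    """Exact Boltzmann distribution over all 2^n spin configurations."""
    n = len(h)
    configs = np.array(list(itertools.product([-1, 1], repeat=n)))
    energies = np.array([ising_energy(s, J, h) for s in configs])
    weights = np.exp(-beta * (energies - energies.min()))  # shift for numerical stability
    return configs, weights / weights.sum()

# Toy supervised data set: features are the couplings/fields of a random instance,
# the target is the probability mass on configurations whose assumed output spin is +1.
rng = np.random.default_rng(0)
X, y = [], []
for _ in range(100):
    J = rng.normal(size=(4, 4)); J = np.triu(J, 1); J = J + J.T
    h = rng.normal(size=4)
    configs, probs = boltzmann_probabilities(J, h, beta=2.0)
    X.append(np.concatenate([J[np.triu_indices(4, 1)], h]))
    y.append(probs[configs[:, -1] == 1].sum())   # "correct result" = last spin up (assumption)
X, y = np.array(X), np.array(y)
# X, y could now be fed to a deep-network or random-forest regressor.
```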
{"title":"Solving Boltzmann optimization problems with deep learning","authors":"Fiona Knoll, John Daly, Jess Meyer","doi":"10.1038/s44335-024-00005-1","DOIUrl":"10.1038/s44335-024-00005-1","url":null,"abstract":"Decades of exponential scaling in high-performance computing (HPC) efficiency is coming to an end. Transistor-based logic in complementary metal-oxide semiconductor (CMOS) technology is approaching physical limits beyond which further miniaturization will be impossible. Future HPC efficiency gains will necessarily rely on new technologies and paradigms of computing. The Ising model shows particular promise as a future framework for highly energy-efficient computation. Ising systems are able to operate at energies approaching thermodynamic limits for energy consumption of computation. Ising systems can function as both logic and memory. Thus, they have the potential to significantly reduce energy costs inherent to CMOS computing by eliminating costly data movement. The challenge in creating Ising-based hardware is in optimizing useful circuits that produce correct results on fundamentally nondeterministic hardware. The contribution of this paper is a novel machine learning approach, a combination of deep neural networks and random forests, for efficiently solving optimization problems that minimize sources of error in the Ising model. In addition, we provide a process to express a Boltzmann probability optimization problem as a supervised machine learning problem.","PeriodicalId":501715,"journal":{"name":"npj Unconventional Computing","volume":" ","pages":"1-7"},"PeriodicalIF":0.0,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s44335-024-00005-1.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141968493","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Experimental demonstration of magnetic tunnel junction-based computational random-access memory
Pub Date: 2024-07-25 | DOI: 10.1038/s44335-024-00003-3
Yang Lv, Brandon R. Zink, Robert P. Bloom, Hüsrev Cılasun, Pravin Khanal, Salonik Resch, Zamshed Chowdhury, Ali Habiboglu, Weigang Wang, Sachin S. Sapatnekar, Ulya Karpuzcu, Jian-Ping Wang
The conventional computing paradigm struggles to fulfill the rapidly growing demands of emerging applications, especially those for machine intelligence, because much of the power and energy is consumed by constant data transfers between logic and memory modules. A new paradigm, called “computational random-access memory (CRAM),” has emerged to address this fundamental limitation. CRAM performs logic operations directly using the memory cells themselves, without the data ever leaving the memory. The energy and performance benefits of CRAM for both conventional and emerging applications have been well established by prior numerical studies. However, there has been no experimental demonstration and study of CRAM that evaluates its computational accuracy, a realistic and application-critical metric for its technological feasibility and competitiveness. In this work, a CRAM array based on magnetic tunnel junctions (MTJs) is experimentally demonstrated. First, basic memory operations, as well as 2-, 3-, and 5-input logic operations, are studied. Then, a 1-bit full adder with two different designs is demonstrated. Based on the experimental results, a suite of models has been developed to characterize the accuracy of CRAM computation. Scalar addition, multiplication, and matrix multiplication, which are essential building blocks for many conventional and machine intelligence applications, are evaluated and show promising accuracy. With the accuracy of MTJ-based CRAM confirmed, there is a strong case that this technology will have a significant impact on power- and energy-demanding applications of machine intelligence.
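To make the in-memory logic idea concrete, the following is a purely behavioral Python sketch (an assumption for illustration, not the authors' MTJ device model or either of their two adder designs): logic is emulated as column-wise operations on rows of a toy memory array, and a 1-bit full adder built from assumed XOR and 3-input majority primitives is checked against its truth table without the operands leaving the simulated array.

```python
# Behavioral sketch of CRAM-style in-memory logic (illustrative primitives, not the MTJ gate set).
import numpy as np

class ToyCRAM:
    """Each row stores one bit per column; a logic step writes its result into another row."""
    def __init__(self, rows, cols):
        self.mem = np.zeros((rows, cols), dtype=np.uint8)

    def write(self, row, bits):
        self.mem[row] = np.asarray(bits, dtype=np.uint8)

    def majority(self, in_rows, out_row):
        # 3-input majority applied column-wise (a commonly assumed in-memory primitive).
        self.mem[out_row] = (self.mem[in_rows].sum(axis=0) >= 2).astype(np.uint8)

    def xor(self, in_rows, out_row):
        self.mem[out_row] = np.bitwise_xor.reduce(self.mem[in_rows], axis=0)

# Exhaustively check a full adder: sum = a XOR b XOR cin, carry = MAJ(a, b, cin).
a   = [0, 0, 0, 0, 1, 1, 1, 1]
b   = [0, 0, 1, 1, 0, 0, 1, 1]
cin = [0, 1, 0, 1, 0, 1, 0, 1]
cram = ToyCRAM(rows=6, cols=8)
cram.write(0, a); cram.write(1, b); cram.write(2, cin)
cram.xor([0, 1, 2], 3)        # row 3 <- sum bit
cram.majority([0, 1, 2], 4)   # row 4 <- carry-out bit
expected_sum   = [(x ^ y ^ z) for x, y, z in zip(a, b, cin)]
expected_carry = [int(x + y + z >= 2) for x, y, z in zip(a, b, cin)]
assert cram.mem[3].tolist() == expected_sum
assert cram.mem[4].tolist() == expected_carry
```

In the actual device, the available gate set and how results are written back are determined by the MTJ physics reported in the paper; this sketch only mirrors the data-stays-in-memory structure of the computation.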
{"title":"Experimental demonstration of magnetic tunnel junction-based computational random-access memory","authors":"Yang Lv, Brandon R. Zink, Robert P. Bloom, Hüsrev Cılasun, Pravin Khanal, Salonik Resch, Zamshed Chowdhury, Ali Habiboglu, Weigang Wang, Sachin S. Sapatnekar, Ulya Karpuzcu, Jian-Ping Wang","doi":"10.1038/s44335-024-00003-3","DOIUrl":"10.1038/s44335-024-00003-3","url":null,"abstract":"The conventional computing paradigm struggles to fulfill the rapidly growing demands from emerging applications, especially those for machine intelligence because much of the power and energy is consumed by constant data transfers between logic and memory modules. A new paradigm, called “computational random-access memory (CRAM),” has emerged to address this fundamental limitation. CRAM performs logic operations directly using the memory cells themselves, without having the data ever leave the memory. The energy and performance benefits of CRAM for both conventional and emerging applications have been well established by prior numerical studies. However, there is a lack of experimental demonstration and study of CRAM to evaluate its computational accuracy, which is a realistic and application-critical metric for its technological feasibility and competitiveness. In this work, a CRAM array based on magnetic tunnel junctions (MTJs) is experimentally demonstrated. First, basic memory operations, as well as 2-, 3-, and 5-input logic operations, are studied. Then, a 1-bit full adder with two different designs is demonstrated. Based on the experimental results, a suite of models has been developed to characterize the accuracy of CRAM computation. Scalar addition, multiplication, and matrix multiplication, which are essential building blocks for many conventional and machine intelligence applications, are evaluated and show promising accuracy performance. With the confirmation of MTJ-based CRAM’s accuracy, there is a strong case that this technology will have a significant impact on power- and energy-demanding applications of machine intelligence.","PeriodicalId":501715,"journal":{"name":"npj Unconventional Computing","volume":" ","pages":"1-10"},"PeriodicalIF":0.0,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11287819/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141857556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Negative photo conductivity triggered with visible light in wide bandgap oxide-based optoelectronic crossbar memristive array for photograph sensing and neuromorphic computing applications
Pub Date: 2024-07-25 | DOI: 10.1038/s44335-024-00001-5
Dayanand Kumar, Hanrui Li, Amit Singh, Manoj Kumar Rajbhar, Abdul Momin Syed, Hoonkyung Lee, Nazek El-Atab
Photoresponsivity studies of wide-bandgap oxide-based devices have emerged as a vibrant and popular research area, with researchers exploring various material systems in their quest to develop devices capable of responding to illumination. In this study, we engineered a mature wide-bandgap oxide-based bilayer heterostructure synaptic memristor that emulates the human brain for applications in neuromorphic computing and photograph sensing. The device exhibits advanced electric and electrophotonic synaptic functions, such as long-term potentiation (LTP), long-term depression (LTD), and paired-pulse facilitation (PPF), under successive electric and photonic pulses. Moreover, the device exhibits exceptional electrical SET and photonic RESET endurance, maintaining stability for at least 1200 cycles without degradation. Density functional theory calculations of the band structures provide insights into the device's conduction mechanism. Based on this memristor array, we developed an autoencoder and a convolutional neural network for noise-reduction and image-recognition tasks, which achieve a peak signal-to-noise ratio of 562 and a high accuracy of 84.23% while consuming four orders of magnitude less energy than a Tesla P40 GPU. This research not only opens the door to integrating our device into image processing but also represents a significant advance in combining in-memory computing and photograph-sensing features in a single cell.
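For reference, the peak signal-to-noise ratio (PSNR) quoted above is the standard figure of merit for the denoising autoencoder. The sketch below shows a generic dB-based PSNR computation on synthetic data; it is not the paper's evaluation code, and the reported value of 562 may follow a different convention or unit than the decibel formula used here.

```python
# Generic PSNR sketch on synthetic 8-bit image data (not tied to the paper's data set).
import numpy as np

def psnr_db(reference, test, peak=255.0):
    """PSNR in decibels: 20*log10(peak) - 10*log10(MSE)."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 20 * np.log10(peak) - 10 * np.log10(mse)

rng = np.random.default_rng(1)
clean = rng.integers(0, 256, size=(28, 28)).astype(np.float64)
noisy = np.clip(clean + rng.normal(0, 25, size=clean.shape), 0, 255)
denoised = 0.5 * (noisy + clean)            # stand-in for an autoencoder's output
print(f"noisy   : {psnr_db(clean, noisy):.2f} dB")
print(f"denoised: {psnr_db(clean, denoised):.2f} dB")
```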
{"title":"Negative photo conductivity triggered with visible light in wide bandgap oxide-based optoelectronic crossbar memristive array for photograph sensing and neuromorphic computing applications","authors":"Dayanand Kumar, Hanrui Li, Amit Singh, Manoj Kumar Rajbhar, Abdul Momin Syed, Hoonkyung Lee, Nazek El-Atab","doi":"10.1038/s44335-024-00001-5","DOIUrl":"10.1038/s44335-024-00001-5","url":null,"abstract":"Photoresponsivity studies of wide-bandgap oxide-based devices have emerged as a vibrant and popular research area. Researchers have explored various material systems in their quest to develop devices capable of responding to illumination. In this study, we engineered a mature wide-bandgap oxide-based bilayer heterostructure synaptic memristor to emulate the human brain for applications in neuromorphic computing and photograph sensing. The device exhibits advanced electric and electrophotonic synaptic functions, such as long-term potentiation (LTP), long-term depression (LTD), and paired-pulse facilitation (PPF), by applying successive electric and photonic pulses. Moreover, the device exhibits exceptional electrical SET and photonic RESET endurance, maintaining its stability for a minimum of 1200 cycles without any degradation. Density functional theory calculations of the band structures provide insights into the conduction mechanism of the device. Based on this memristor array, we developed an autoencoder and convolutional neural network for noise reduction and image recognition tasks, which achieves a peak signal-to-noise ratio of 562 and high accuracy of 84.23%, while consuming lower energy by four orders of magnitude compared with the Tesla P40 GPU. This groundbreaking research not only opens doors for the integration of our device into image processing but also represents a significant advancement in the realm of in-memory computing and photograph-sensing features in a single cell.","PeriodicalId":501715,"journal":{"name":"npj Unconventional Computing","volume":" ","pages":"1-9"},"PeriodicalIF":0.0,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s44335-024-00001-5.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141968494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PIM GPT a hybrid process in memory accelerator for autoregressive transformers
Pub Date: 2024-07-25 | DOI: 10.1038/s44335-024-00004-2
Yuting Wu, Ziyu Wang, Wei D. Lu
Decoder-only Transformer models such as Generative Pre-trained Transformers (GPT) have demonstrated exceptional performance in text generation by autoregressively predicting the next token. However, the efficiency of running GPT on current hardware systems is bounded by a low compute-to-memory ratio and intensive memory access. In this work, we propose a process-in-memory (PIM) GPT accelerator, PIM-GPT, which achieves end-to-end acceleration of GPT inference with high performance and high energy efficiency. PIM-GPT leverages DRAM-based PIM designs to execute multiply-accumulate (MAC) operations directly in the DRAM chips, eliminating the need to move matrix data off-chip. Non-linear functions and data communication are supported by an application-specific integrated circuit (ASIC). At the software level, mapping schemes are designed to maximize data locality and computation parallelism. Overall, PIM-GPT achieves 41-137× and 631-1074× speedup, and 123-383× and 320-602× energy efficiency, over GPU and CPU baselines, respectively, on 8 GPT models with up to 1.4 billion parameters.
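The mapping idea of keeping weight matrices stationary in DRAM and performing the multiply-accumulate where the data already resides can be illustrated with a simple row-partitioned matrix-vector product. The sketch below is an assumption-laden emulation (the bank count, partitioning rule, and gather step are illustrative, not the PIM-GPT design): each "bank" computes its partial result on locally stored rows, and only the outputs are collected, mirroring how matrix data never needs to move off-chip.

```python
# Row-partitioned GEMV emulating per-bank MAC units (illustrative mapping, not the paper's).
import numpy as np

def banked_gemv(W, x, n_banks=16):
    """Matrix-vector product where each bank handles a contiguous block of rows."""
    rows_per_bank = -(-W.shape[0] // n_banks)              # ceiling division
    partials = []
    for b in range(n_banks):
        rows = slice(b * rows_per_bank, min((b + 1) * rows_per_bank, W.shape[0]))
        if rows.start >= W.shape[0]:
            break
        partials.append(W[rows] @ x)                       # "in-bank" MAC on local rows
    return np.concatenate(partials)                        # partial outputs gathered by the host/ASIC

rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 768)).astype(np.float32)    # e.g., one projection matrix
x = rng.standard_normal(768).astype(np.float32)            # current token's hidden state
assert np.allclose(banked_gemv(W, x), W @ x, atol=1e-3)
```

The design point this mirrors is that only the small output vector crosses the memory interface per layer, while the large weight matrix stays where it is stored.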
{"title":"PIM GPT a hybrid process in memory accelerator for autoregressive transformers","authors":"Yuting Wu, Ziyu Wang, Wei D. Lu","doi":"10.1038/s44335-024-00004-2","DOIUrl":"10.1038/s44335-024-00004-2","url":null,"abstract":"Decoder-only Transformer models such as Generative Pre-trained Transformers (GPT) have demonstrated exceptional performance in text generation by autoregressively predicting the next token. However, the efficiency of running GPT on current hardware systems is bounded by low compute-to-memory-ratio and high memory access. In this work, we propose a Process-in-memory (PIM) GPT accelerator, PIM-GPT, which achieves end-to-end acceleration of GPT inference with high performance and high energy efficiency. PIM-GPT leverages DRAM-based PIM designs for executing multiply-accumulate (MAC) operations directly in the DRAM chips, eliminating the need to move matrix data off-chip. Non-linear functions and data communication are supported by an application specific integrated chip (ASIC). At the software level, mapping schemes are designed to maximize data locality and computation parallelism. Overall, PIM-GPT achieves 41 − 137 × , 631 − 1074 × speedup and 123 − 383 × , 320 − 602 × energy efficiency over GPU and CPU baseline on 8 GPT models with up to 1.4 billion parameters.","PeriodicalId":501715,"journal":{"name":"npj Unconventional Computing","volume":" ","pages":"1-13"},"PeriodicalIF":0.0,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s44335-024-00004-2.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141805370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Spiking neural networks for nonlinear regression of complex transient signals on sustainable neuromorphic processors
Pub Date: 2024-07-25 | DOI: 10.1038/s44335-024-00002-4
Marcus Stoffel, Saurabh Balkrishna Tandale
In recent years, spiking neural networks (SNNs) have been introduced as the third generation of artificial neural networks, enabling tremendous energy savings on neuromorphic processors. This saving stems from the sparse nature of signal processing between spiking neurons, which requires far fewer scalar multiplications than in second-generation networks. The efficiency of spiking neurons is further enhanced by their inherently recurrent nature, which is useful for recursive function approximation. We believe there is a need for a general regression framework for SNNs to exploit the high potential of neuromorphic computation: while the literature contains many classification studies with SNNs, nonlinear neuromorphic regression analysis remains a gap in research. Hence, we propose a general SNN approach for function approximation applicable to complex transient signal processing, using surrogate gradients to account for the discontinuous spike representation. To address the need for memory during deep SNN computations, additional spiking Legendre Memory Units are introduced into the neuromorphic architecture; path dependencies and evolutions of signals can be tackled in this way. Furthermore, interfaces between real physical values and binary spiking values are necessary; to this end, a hybrid approach is introduced that employs an autoencoding strategy between dense and spiking layers. To verify the presented nonlinear regression framework for a wide spectrum of scientific purposes, realistic complex transient short-time signals are obtained from an extensive experimental set-up: a measurement technique for benchmark experiments is proposed in which high-frequency oscillations are measured by capacitive and piezoelectric sensors, yielding wave propagations and inelastic solid deformations to be predicted by the developed SNN regression analysis. The proposed nonlinear regression framework can thus be deployed to a wide range of scientific and technical applications.
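A central ingredient named above is the surrogate gradient, which replaces the undefined derivative of the spike nonlinearity during backpropagation. The PyTorch sketch below is a minimal, generic illustration of that idea, assuming a fast-sigmoid surrogate and a plain leaky integrate-and-fire layer; the paper's actual architecture additionally uses spiking Legendre Memory Units and an autoencoding interface between dense and spiking layers, which are not reproduced here.

```python
# Generic surrogate-gradient LIF sketch (illustrative, not the paper's architecture).
import torch

class SurrogateSpike(torch.autograd.Function):
    """Heaviside spike in the forward pass, fast-sigmoid derivative in the backward pass."""
    @staticmethod
    def forward(ctx, v_minus_threshold):
        ctx.save_for_backward(v_minus_threshold)
        return (v_minus_threshold > 0).float()

    @staticmethod
    def backward(ctx, grad_output):
        (v,) = ctx.saved_tensors
        return grad_output / (1.0 + 10.0 * v.abs()) ** 2   # smooth stand-in gradient

def lif_layer(inputs, weight, beta=0.9, threshold=1.0):
    """Leaky integrate-and-fire layer unrolled over time; inputs has shape (T, batch, in_dim)."""
    T, batch, _ = inputs.shape
    mem = torch.zeros(batch, weight.shape[0])
    spikes = []
    for t in range(T):
        mem = beta * mem + inputs[t] @ weight.t()          # leak + integrate
        s = SurrogateSpike.apply(mem - threshold)          # fire
        mem = mem - s * threshold                          # soft reset
        spikes.append(s)
    return torch.stack(spikes)

# Toy regression objective: push the total spike count toward a target value.
torch.manual_seed(0)
weight = torch.randn(8, 4, requires_grad=True)
x = torch.rand(20, 1, 4)                                   # 20 time steps, batch of 1
target = torch.tensor([[30.0]])
rate = lif_layer(x, weight).sum(dim=(0, 2)).unsqueeze(1)
loss = ((rate - target) ** 2).mean()
loss.backward()                                            # gradients flow via the surrogate
```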
{"title":"Spiking neural networks for nonlinear regression of complex transient signals on sustainable neuromorphic processors","authors":"Marcus Stoffel, Saurabh Balkrishna Tandale","doi":"10.1038/s44335-024-00002-4","DOIUrl":"10.1038/s44335-024-00002-4","url":null,"abstract":"In recent years, spiking neural networks were introduced in science as the third generation of artificial neural networks leading to a tremendous energy saving on neuromorphic processors. This sustainable effect is due to the sparse nature of signal processing in-between spiking neurons leading to much less scalar multiplications as in second-generation networks. The spiking neuron’s efficiency is even more pronounced by their inherently recurrent nature being useful for recursive function approximations. We believe that there is a need for a general regression framework for SNNs to explore the high potential of neuromorphic computations. However, besides many classification studies with SNNs in the literature, nonlinear neuromorphic regression analysis represents a gap in research. Hence, we propose a general SNN approach for function approximation applicable for complex transient signal processing taking surrogate gradients due to the discontinuous spike representation into account. However, to pay attention to the need for high memory access during deep SNN network communications, additional spiking Legrendre Memory Units are introduced in the neuromorphic architecture. Path-dependencies and evolutions of signals can be tackled in this way. Furthermore, interfaces between real physical and binary spiking values are necessary. Following this intention, a hybrid approach is introduced, exhibiting an autoencoding strategy between dense and spiking layers. However, to verify the presented framework of nonlinear regression for a wide spectrum of scientific purposes, we see the need for obtaining realistic complex transient short-time signals by an extensive experimental set-up. Hence, a measurement technique for benchmark experiments is proposed with high-frequency oscillations measured by capacitive and piezoelectric sensors resulting in wave propagations and inelastic solid deformations to be predicted by the developed SNN regression analysis. Hence, the proposed nonlinear regression framework can be deployed to a wide range of scientific and technical applications.","PeriodicalId":501715,"journal":{"name":"npj Unconventional Computing","volume":" ","pages":"1-15"},"PeriodicalIF":0.0,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s44335-024-00002-4.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141802813","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}