
Latest Articles in IEEE Journal on Exploratory Solid-State Computational Devices and Circuits

2024 Index IEEE Journal on Exploratory Solid-State Computational Devices and Circuits Vol. 10
IF 2 | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-01-17 | DOI: 10.1109/JXCDC.2025.3531616
Vol. 10, pp. 187-194
Citations: 0
IEEE Journal on Exploratory Solid-State Computational Devices and Circuits publication information
IF 2 | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-01-16 | DOI: 10.1109/JXCDC.2024.3499815
Vol. 10, p. C2
Citations: 0
INFORMATION FOR AUTHORS
IF 2 | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-01-16 | DOI: 10.1109/JXCDC.2024.3499819
Vol. 10, p. C3
Citations: 0
Special Topic on 3-D Logic and Memory for Energy Efficient Computing
IF 2 | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-01-07 | DOI: 10.1109/JXCDC.2024.3518312
Editorial
Vol. 10, pp. iii-iv
Citations: 0
E-MAC: Enhanced In-SRAM MAC Accuracy via Digital-to-Time Modulation
IF 2 | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-12-16 | DOI: 10.1109/JXCDC.2024.3518633
Saeed Seyedfaraji;Salar Shakibhamedan;Amire Seyedfaraji;Baset Mesgari;Nima Taherinejad;Axel Jantsch;Semeen Rehman
In this article, we introduce a novel technique called E-MAC (where MAC denotes multiplication and accumulation), aimed at improving the energy efficiency, latency, and accuracy of analog in-static random-access memory (SRAM) MAC accelerators. Our approach uses a digital-to-time word-line (WL) modulation technique that encodes the WL voltage while preserving the linear voltage drop needed for precise computation, which eliminates the need for an additional digital-to-analog converter (DAC) in the design. Furthermore, the SRAM-based logical weight encoding scheme we present reduces reliance on capacitance-based techniques, which typically add area overhead to the circuit. This approach ensures consistent voltage drops for all equivalent cases [i.e., $(a \times b) = (b \times a)$], addressing a persistent issue in existing state-of-the-art methods. Compared with state-of-the-art analog in-SRAM techniques, our E-MAC approach demonstrates significant energy savings ($1.89\times$) and improved accuracy (73.25%) per MAC computation from a 1-V power supply, while achieving an $11.84\times$ energy efficiency improvement over baseline digital approaches. Our application analysis shows only a marginal overall reduction in accuracy: 0.1% for a LeNet5-based CNN trained on MNIST and 0.17% for VGG16 trained on ImageNet.
Vol. 10, pp. 178-186
Citations: 0
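The operand symmetry claimed in the E-MAC abstract, $(a \times b) = (b \times a)$, is easy to see in an idealized first-order model of time-encoded bitline discharge. In the sketch below (our assumption, not the article's circuit), an input value sets the word-line pulse width in unit time steps, a weight sets how many unit cells discharge the bitline in parallel, and the drop follows $\Delta V = I t / C_{BL}$. The class name `BitlineModel` and all constants are hypothetical placeholders; mismatch, leakage, and the nonlinearity of real SRAM cells are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BitlineModel:
    """Hypothetical first-order model of a time-encoded in-SRAM MAC bitline."""
    i_cell: float = 1e-6     # assumed discharge current of one unit cell (A)
    t_unit: float = 100e-12  # assumed word-line pulse width per input LSB (s)
    c_bl: float = 100e-15    # assumed bitline capacitance (F)
    vdd: float = 1.0         # supply voltage (V); the article reports results at 1 V

    def drop(self, a: int, b: int) -> float:
        """Voltage drop for input a (pulse width a * t_unit) and weight b (b active cells).

        dV = I * t / C with I = b * i_cell and t = a * t_unit. The integer product
        a * b is formed first, so drop(a, b) == drop(b, a) holds exactly.
        """
        return (a * b) * (self.i_cell * self.t_unit) / self.c_bl

    def mac(self, inputs, weights) -> float:
        """Total drop when several products discharge the same bitline."""
        total = sum(self.drop(a, b) for a, b in zip(inputs, weights))
        if total > self.vdd:
            # Beyond this point the bitline saturates and the linear model breaks down.
            raise ValueError("accumulated drop exceeds the supply voltage")
        return total
```

With these placeholder values, `BitlineModel().drop(3, 5)` and `BitlineModel().drop(5, 3)` both give a drop of about 15 mV. In a real array, the equality holds only as far as the cell current and the time encoding stay linear, which is the property the abstract says the digital-to-time scheme is meant to preserve.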
Evaluation of a Plasmon-Based Optical Integrated Circuit for Error-Tolerant Streaming Applications
IF 2 | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-12-04 | DOI: 10.1109/JXCDC.2024.3510684
Samantha Lubaba Noor;Xuan Wu;Dennis Lin;Pol van Dorpe;Francky Catthoor;Patrick Reynaert;Azad Naeemi
In this work, we designed and modeled an integrated plasmonic computing module that operates at a 200-GHz clock frequency for high-end streaming-algorithm applications. Our work covers the design of the individual optical components (modulator, logic gate, and photodetector) and of the high-speed electronic driver circuits, as well as the integration of these components with their interactions taken into account. We also holistically evaluated the system-level performance of the computing module in terms of power consumption, operating speed, physical footprint, and average temperature. Through rigorous numerical analyses, we found that, with existing technology and available materials, the best bit-error ratio (BER) the plasmonic computing module can achieve is $10^{-1}$. Performance can be improved by using a material with a high electro-optic coefficient in the phase shifter and by increasing the driver circuit's swing beyond 1 V.
Vol. 10, pp. 170-177
Citations: 0
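For context on the BER figure in the plasmonic-module abstract: under the common textbook assumption of binary on-off signaling corrupted by Gaussian noise, BER and the Q-factor are related by $\mathrm{BER} = \tfrac{1}{2}\,\mathrm{erfc}(Q/\sqrt{2})$. The sketch below evaluates this relation and inverts it by bisection. The Gaussian-noise model and the function names are our assumptions, not necessarily the noise model used in the article; the sketch only shows how far $10^{-1}$ is from the error rates usually targeted by optical links.

```python
import math


def ber_from_q(q: float) -> float:
    """BER = 0.5 * erfc(Q / sqrt(2)) for binary signaling with Gaussian noise."""
    return 0.5 * math.erfc(q / math.sqrt(2.0))


def q_from_ber(ber: float, lo: float = 0.0, hi: float = 40.0, iters: int = 200) -> float:
    """Invert ber_from_q by bisection (ber_from_q decreases monotonically in Q)."""
    if not 0.0 < ber < 0.5:
        raise ValueError("BER must lie in (0, 0.5)")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if ber_from_q(mid) > ber:
            lo = mid  # error rate still too high: a larger Q is needed
        else:
            hi = mid
    return 0.5 * (lo + hi)


for target in (1e-1, 1e-3, 1e-9):
    print(f"BER {target:.0e} needs Q = {q_from_ber(target):.2f}")
```

Under this model, a BER of $10^{-1}$ corresponds to $Q \approx 1.28$, whereas $10^{-9}$ requires $Q \approx 6$, which is consistent with the article's focus on error-tolerant streaming applications.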
Ferroelectric Transistor-Based Synaptic Crossbar Arrays: The Impact of Ferroelectric Thickness and Device-Circuit Interactions
IF 2 | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-11-18 | DOI: 10.1109/JXCDC.2024.3502053
Chunguang Wang;Sumeet Kumar Gupta
Crossbar arrays based on ferroelectric transistors (FeFETs) have shown immense promise for computing-in-memory (CiM) architectures targeting neural accelerators. Offering CMOS compatibility, nonvolatility, a compact bit cell, and CiM-amenable features such as multilevel storage and voltage-driven conductance tuning, FeFETs are among the foremost candidates for synaptic devices. However, device and circuit nonidealities in FeFET-based crossbar arrays cause the output currents to deviate from their expected values, which can induce errors in CiM of matrix-vector multiplications (MVMs). In this article, we analyze the impact of ferroelectric thickness ($T_{\text{FE}}$) and cross-layer interactions in FeFET-based synaptic crossbar arrays, accounting for device-circuit nonidealities. First, using a physics-based model of multidomain FeFETs calibrated to experiments, we analyze the impact of $T_{\text{FE}}$ on the characteristics of FeFETs as synaptic devices, highlighting the connections between the multidomain physics and the synaptic attributes. Based on this analysis, we investigate the impact of $T_{\text{FE}}$, in conjunction with other design parameters such as the number of bits stored per device (bit slice), wordline (WL) activation schemes, and FeFET width, on the error probability, area, energy, and latency of CiM at the array level. Our results show that FeFETs with $T_{\text{FE}}$ around 7 nm achieve the highest CiM robustness, while FeFETs with $T_{\text{FE}}$ around 10 nm offer the lowest CiM energy and latency. Although CiM robustness with a bit slice of 2 is lower than with a bit slice of 1, it can be brought to a target level via additional design techniques such as partial wordline activation and optimization of the FeFET width.
Vol. 10, pp. 144-152
Citations: 0
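Bit slicing, described in the FeFET abstract as the number of bits stored per device, can be illustrated with an idealized crossbar model: each multi-bit weight is split into slices of `bits_per_slice` bits, each slice is stored as one device's conductance level, each column current is the sum of activated rows' conductances, and slice results are recombined by shift-and-add. The function names and the Gaussian error term `sigma` are hypothetical stand-ins for the device-circuit nonidealities studied in the article; this is not the article's physics-based multidomain FeFET model. Because `sigma` is specified per conductance level, the toy model also does not capture the tighter level spacing that comes with storing more bits in the same conductance range.

```python
import random


def slice_weight(w: int, bits_per_slice: int, n_slices: int) -> list[int]:
    """Split an unsigned weight into n_slices chunks, least significant first."""
    mask = (1 << bits_per_slice) - 1
    return [(w >> (k * bits_per_slice)) & mask for k in range(n_slices)]


def crossbar_mvm(x, weights, bits_per_slice=1, weight_bits=4, sigma=0.0, seed=0):
    """Idealized bit-sliced crossbar MVM approximating y[j] = sum_i x[i] * weights[i][j].

    x: binary word-line activations (0 or 1), one per row.
    weights: rows x cols matrix of unsigned ints below 2**weight_bits.
    sigma: std. dev. of a Gaussian conductance error, in units of one level
           (a hypothetical stand-in for device-circuit nonidealities).
    """
    rng = random.Random(seed)
    n_slices = -(-weight_bits // bits_per_slice)  # ceiling division
    sliced = [[slice_weight(w, bits_per_slice, n_slices) for w in row] for row in weights]
    y = [0.0] * len(weights[0])
    for j in range(len(y)):
        for k in range(n_slices):
            # Column current of slice k: activated rows add their device conductance.
            current = 0.0
            for i, xi in enumerate(x):
                g = sliced[i][j][k] + (rng.gauss(0.0, sigma) if sigma else 0.0)
                current += xi * g
            # A real array would digitize this current with an ADC; that step is omitted.
            y[j] += current * (1 << (k * bits_per_slice))  # shift-and-add recombination
    return y


x = [1, 0, 1, 1]
W = [[3, 15], [7, 0], [12, 5], [1, 9]]
print(crossbar_mvm(x, W, bits_per_slice=2))             # ideal: [16.0, 29.0]
print(crossbar_mvm(x, W, bits_per_slice=2, sigma=0.1))  # with toy conductance error
```

With `sigma=0` every bit-slice setting reproduces the exact product; any nonzero `sigma` makes the output drift from it, which is the kind of MVM error the article quantifies at the array level.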
SpecPCM: A Low-Power PCM-Based In-Memory Computing Accelerator for Full-Stack Mass Spectrometry Analysis
IF 2 | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-11-15 | DOI: 10.1109/JXCDC.2024.3498837
Keming Fan;Ashkan Moradifirouzabadi;Xiangjin Wu;Zheyu Li;Flavio Ponzina;Anton Persson;Eric Pop;Tajana Rosing;Mingu Kang
Mass spectrometry (MS) is essential for proteomics and metabolomics but faces impending challenges in efficiently processing vast volumes of data. This article introduces SpecPCM, an in-memory computing (IMC) accelerator designed to substantially improve energy and delay efficiency for both MS spectral clustering and database (DB) search. SpecPCM employs analog processing with a low voltage swing and uses recently introduced phase change memory (PCM) devices based on superlattice materials, optimized for low-voltage and low-power programming. Our approach integrates contributions across multiple levels: application, algorithm, circuit, device, and instruction set. We leverage a robust hyperdimensional computing (HD) algorithm with a novel dimension-packing method and develop specialized hardware for the end-to-end MS pipeline to overcome the nonideal behavior of PCM devices. We further optimize multilevel PCM devices for different tasks by using different materials. We also perform a comprehensive design exploration to improve energy and delay efficiency while maintaining accuracy, exploring various combinations of hardware and software parameters controlled by the instruction set architecture (ISA). With up to three bits per cell, SpecPCM achieves speedups of up to $82\times$ and $143\times$ for MS clustering and DB search, respectively, along with a four-orders-of-magnitude improvement in energy efficiency compared with state-of-the-art (SoA) CPU/GPU tools.
Vol. 10, pp. 161-169
Citations: 0
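The hyperdimensional (HD) computing mentioned in the SpecPCM abstract can be illustrated with a generic encoder: each m/z bin receives a random bipolar ID hypervector, each quantized intensity receives a level hypervector, a spectrum is encoded by binding (element-wise multiplication) each peak's ID and level vectors and bundling the results (summation followed by sign), and a query is matched to the library entry with the largest dot product. The dimension `D`, the number of levels, the function names, and the use of uncorrelated random level vectors are our assumptions; the sketch does not reproduce SpecPCM's dimension-packing method, its PCM mapping, or its clustering pipeline.

```python
import random

D = 2048      # hypervector dimension (assumed)
N_LEVELS = 8  # number of intensity quantization levels (assumed)


def random_hv(rng):
    """Random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(D)]


def make_codebooks(n_bins, seed=0):
    """ID hypervectors for m/z bins and level hypervectors for intensities."""
    rng = random.Random(seed)
    ids = [random_hv(rng) for _ in range(n_bins)]
    levels = [random_hv(rng) for _ in range(N_LEVELS)]
    return ids, levels


def encode(spectrum, ids, levels):
    """spectrum: {mz_bin: intensity in [0, 1]} -> bipolar hypervector."""
    acc = [0] * D
    for mz_bin, intensity in spectrum.items():
        lvl = min(int(intensity * N_LEVELS), N_LEVELS - 1)
        id_hv, lvl_hv = ids[mz_bin], levels[lvl]
        for d in range(D):
            acc[d] += id_hv[d] * lvl_hv[d]  # bind (multiply), then bundle (sum)
    return [1 if a >= 0 else -1 for a in acc]  # binarize; ties map to +1


def search(query_hv, library_hvs):
    """Index of the library hypervector with the highest dot-product similarity."""
    scores = [sum(q * h for q, h in zip(query_hv, hv)) for hv in library_hvs]
    return max(range(len(scores)), key=scores.__getitem__)
```

Binding, bundling, and search reduce to element-wise products, sums, signs, and dot products, which is the kind of workload that maps naturally onto in-memory dot-product hardware such as the PCM arrays the article describes.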
X-TIME: Accelerating Large Tree Ensembles Inference for Tabular Data With Analog CAMs
IF 2 | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-11-14 | DOI: 10.1109/JXCDC.2024.3495634
Giacomo Pedretti;John Moon;Pedro Bruel;Sergey Serebryakov;Ron M. Roth;Luca Buonanno;Archit Gajjar;Lei Zhao;Tobias Ziegler;Cong Xu;Martin Foltin;Paolo Faraboschi;Jim Ignowski;Catherine E. Graves
Structured, or tabular, data are the most common format in data science. While deep learning models have proven formidable at learning from unstructured data such as images or speech, they are less accurate than simpler approaches when learning from tabular data. In contrast, modern tree-based machine learning (ML) models excel at extracting relevant information from structured data. An essential requirement in data science is to reduce model inference latency, for example, when models are used in a closed loop with simulation to accelerate scientific discovery. However, the hardware acceleration community has mostly focused on deep neural networks and largely ignored other forms of ML. Previous work has described the use of an analog content addressable memory (CAM) component for efficiently mapping random forests (RFs). In this work, we develop an analog-digital architecture that implements a novel increased-precision analog CAM and a programmable chip for inference of state-of-the-art tree-based ML models, such as eXtreme Gradient Boosting (XGBoost), Categorical Boosting (CatBoost), and others. Thanks to hardware-aware training, X-TIME reaches state-of-the-art accuracy and, compared with a state-of-the-art GPU, achieves $119\times$ higher throughput, $9740\times$ lower latency, and $>150\times$ better energy efficiency for models with up to 4096 trees and a depth of 8, at a 19-W peak power consumption.
Vol. 10, pp. 116-124
Citations: 0
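The X-TIME abstract builds on earlier work that mapped random forests onto an analog CAM. The core idea can be sketched in software: every root-to-leaf path of a decision tree is a conjunction of threshold tests, and therefore one interval [low, high) per feature, so each path can be stored as one CAM row of intervals, and a single parallel search returns the only row (leaf) whose intervals all contain the input. The nested-tuple tree format and all function names below are hypothetical; the sketch ignores analog precision limits and the hardware-aware training that X-TIME relies on.

```python
import math

# Hypothetical toy tree format (not X-TIME's): ("leaf", value) or
# ("split", feature, threshold, left, right), going left when x[feature] < threshold.


def tree_to_cam_rows(node, n_features, bounds=None):
    """One CAM row per root-to-leaf path: ([(low, high) per feature], leaf_value)."""
    if bounds is None:
        bounds = [(-math.inf, math.inf)] * n_features
    if node[0] == "leaf":
        return [(list(bounds), node[1])]
    _, f, thr, left, right = node
    lo, hi = bounds[f]
    left_bounds, right_bounds = list(bounds), list(bounds)
    left_bounds[f] = (lo, min(hi, thr))   # path condition x[f] < thr
    right_bounds[f] = (max(lo, thr), hi)  # path condition x[f] >= thr
    return (tree_to_cam_rows(left, n_features, left_bounds)
            + tree_to_cam_rows(right, n_features, right_bounds))


def cam_search(rows, x):
    """Software stand-in for a parallel analog CAM search over all rows of one tree."""
    hits = [value for intervals, value in rows
            if all(lo <= xi < hi for xi, (lo, hi) in zip(x, intervals))]
    if len(hits) != 1:
        raise RuntimeError("the paths of a tree should partition the input space")
    return hits[0]


def traverse(node, x):
    """Reference result: conventional sequential traversal of the same tree."""
    while node[0] == "split":
        _, f, thr, left, right = node
        node = left if x[f] < thr else right
    return node[1]


def ensemble_predict(rows_per_tree, x):
    """Boosting-style raw score (before any base score or link function)."""
    return sum(cam_search(rows, x) for rows in rows_per_tree)
```

In hardware, all rows are compared against the input at once, so in principle the lookup does not get slower with tree depth the way sequential traversal does; the Python loop here only emulates that behavior.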
Approximated 2-Bit Adders for Parallel In-Memristor Computing With a Novel Sum-of-Product Architecture
IF 2 | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-11-13 | DOI: 10.1109/JXCDC.2024.3497720
Christian Simonides;Dominik Gausepohl;Peter M. Hinkel;Fabian Seiler;Nima Taherinejad
Conventional computing methods struggle with the exponentially increasing demand for computational power driven by applications such as image processing and machine learning (ML). Novel computing paradigms such as in-memory computing (IMC) and approximate computing (AxC) offer promising solutions to this problem. Owing to their low energy consumption and inherent ability to store data in a nonvolatile fashion, memristors are an increasingly popular choice in these fields. A wide range of logic forms is compatible with memristive IMC, each offering different advantages. We present a novel mixed-logic solution that exploits properties of the sum-of-product (SOP) representation and propose a full-adder circuit that operates efficiently on 2-bit units. To further improve speed, area usage, and energy consumption, we propose two additional approximate (Ax) 2-bit adders that exhibit inherent parallelization capabilities. We apply the proposed adders in selected image processing applications, where our Ax approach reduces energy consumption by 31% to 40% and improves speed by 50%. To demonstrate the potential gains of our approximations in more complex applications, we applied them in ML. Our experiments indicate that with up to $6/16$ Ax adders, there is no accuracy degradation in a convolutional neural network (CNN) evaluated on MNIST. Our Ax approach can save up to 125.6 mJ of energy and 505 million steps compared with our exact approach.
{"title":"Approximated 2-Bit Adders for Parallel In-Memristor Computing With a Novel Sum-of-Product Architecture","authors":"Christian Simonides;Dominik Gausepohl;Peter M. Hinkel;Fabian Seiler;Nima Taherinejad","doi":"10.1109/JXCDC.2024.3497720","DOIUrl":"https://doi.org/10.1109/JXCDC.2024.3497720","url":null,"abstract":"Conventional computing methods struggle with the exponentially increasing demand for computational power, caused by applications including image processing and machine learning (ML). Novel computing paradigms such as in-memory computing (IMC) and approximate computing (AxC) provide promising solutions to this problem. Due to their low energy consumption and inherent ability to store data in a nonvolatile fashion, memristors are an increasingly popular choice in these fields. There is a wide range of logic forms compatible with memristive IMC, each offering different advantages. We present a novel mixed-logic solution that utilizes properties of the sum-of-product (SOP) representation and propose a full-adder circuit that works efficiently in 2-bit units. To further improve the speed, area usage, and energy consumption, we propose two additional approximate (Ax) 2-bit adders that exhibit inherent parallelization capabilities. We apply the proposed adders in selected image processing applications, where our Ax approach reduces the energy consumption by 31%–40% and improves the speed by 50%. To demonstrate the potential gains of our approximations in more complex applications, we applied them in ML. Our experiments indicate that with up to 6/16 Ax adders, there is no accuracy degradation when applied in a convolutional neural network (CNN) that is evaluated on MNIST. Our approach can save up to 125.6 mJ of energy and 505 million steps compared to our exact approach.","PeriodicalId":54149,"journal":{"name":"IEEE Journal on Exploratory Solid-State Computational Devices and Circuits","volume":"10 ","pages":"135-143"},"PeriodicalIF":2.0,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10752571","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142798030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
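The abstract of this entry mentions a full-adder circuit built on the sum-of-product (SOP) representation that operates in 2-bit units. As a hedged, purely logical illustration (not the authors' memristive mixed-logic design, whose gate-level details are not given here), the sketch below writes a one-bit full adder in textbook SOP form and chains two of them into a 2-bit unit. The function names `sop_full_adder`, `add_2bit_unit`, and `add_nbit` are hypothetical.

```python
from __future__ import annotations

from itertools import product


def sop_full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder written in sum-of-products form.

    Each output is an OR of AND terms. This is the textbook Boolean SOP
    form, NOT a model of the memristive circuit described in the paper.
    """
    na, nb, nc = 1 - a, 1 - b, 1 - cin
    # Sum is 1 when an odd number of inputs are 1: four minterms.
    s = (na & nb & cin) | (na & b & nc) | (a & nb & nc) | (a & b & cin)
    # Carry-out is the majority function: three product terms.
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def add_2bit_unit(a: int, b: int, cin: int) -> tuple[int, int]:
    """Add two 2-bit operands (0..3) plus a carry-in; return (2-bit sum, carry-out)."""
    s0, c0 = sop_full_adder(a & 1, b & 1, cin)
    s1, c1 = sop_full_adder((a >> 1) & 1, (b >> 1) & 1, c0)
    return (s1 << 1) | s0, c1


def add_nbit(a: int, b: int, width: int) -> int:
    """Exact width-bit addition processed in 2-bit units (width must be even)."""
    if width % 2:
        raise ValueError("width must be a multiple of 2")
    result, carry = 0, 0
    for i in range(0, width, 2):
        # The carry passed between units is the serial dependency.
        chunk, carry = add_2bit_unit((a >> i) & 3, (b >> i) & 3, carry)
        result |= chunk << i
    return result | (carry << width)


if __name__ == "__main__":
    # Exhaustive check against Python's integer addition for 4-bit operands.
    for a, b in product(range(16), repeat=2):
        assert add_nbit(a, b, 4) == a + b
    print(add_nbit(11, 6, 4))  # 17
```

In this exact version each 2-bit unit must wait for the carry from the unit below it. The paper describes its approximate adders as having inherent parallelization capabilities, but this entry does not say how they handle that carry, so no approximation is modeled here.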