
Proceedings of the Great Lakes Symposium on VLSI 2022: Latest Articles

Side-Channel Analysis of the Random Number Generator in STM32 MCUs
Pub Date : 2022-06-06 DOI: 10.1145/3526241.3530324
Kalle Ngo, E. Dubrova
The hardware random number generator (RNG) integrated in STM32 MCUs is intended to ensure that the numbers it generates cannot be guessed with a probability higher than a random guess. The RNG is based on several ring oscillators whose outputs are combined and post-processed to produce a 32-bit random number per round of computation. In this paper, we show that it is possible to train a neural network capable of recovering the Hamming weight of these random numbers from power traces with a probability higher than 60%. This is a 4-fold improvement over the 14% probability of guessing the most likely Hamming weight.
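The 14% baseline quoted above can be checked with a short calculation. The sketch below assumes the RNG output is a uniformly random 32-bit word, so the Hamming weight follows a binomial distribution; the 60% figure is simply the lower bound stated in the abstract, not a measured value.

```python
from math import comb

BITS = 32  # width of one RNG output word, per the abstract

# For a uniformly random 32-bit word, P(HW = k) = C(32, k) / 2^32.
pmf = [comb(BITS, k) / 2**BITS for k in range(BITS + 1)]

# The best blind strategy is to always guess the most likely weight.
best_hw = max(range(BITS + 1), key=pmf.__getitem__)
baseline = pmf[best_hw]

reported = 0.60  # lower bound on the attack's success rate, from the abstract

print(f"most likely Hamming weight: {best_hw}")
print(f"baseline guess probability: {baseline:.4f}")
print(f"improvement factor at 60%:  {reported / baseline:.2f}")
```

Under this assumption the most likely weight is 16, and the ratio of 60% to the baseline comes out slightly above 4, consistent with the "4-fold" wording.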
Citations: 6
Benchmark Comparisons of Spike-based Reconfigurable Neuroprocessor Architectures for Control Applications
Pub Date : 2022-06-06 DOI: 10.1145/3526241.3530381
Adam Z. Foshie, Charles Rizzo, Hritom Das, Chaohui Zheng, J. Plank, G. Rose
Neuromorphic computing is a leading option for non-von Neumann computing architectures. With it, neural networks are developed that derive architectural inspiration from how the brain operates with neurons, synapses, and spikes. These networks are often implemented in either software- or hardware-based neuroprocessors designed to handle specific tasks efficiently. Even when the target is hardware, software emulation is instrumental in determining the worthwhile features and capabilities of the architecture. In this work, two novel neuroprocessors are introduced: the software-based RISP neuroprocessor and the RAVENS hardware neuroprocessor. Several benchmark tests using control applications are performed with each neuroprocessor configured in various ways to evaluate their comparative performance and training properties.
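To make the neuron/spike vocabulary concrete, here is a minimal sketch of a generic integrate-and-fire neuron. It is an illustration only: the `IFNeuron` class, its reset-to-zero behavior, and the input values are assumptions and do not describe the actual RISP or RAVENS neuron models.

```python
from dataclasses import dataclass


@dataclass
class IFNeuron:
    """Hypothetical integrate-and-fire neuron (not the RISP/RAVENS model)."""
    threshold: float
    potential: float = 0.0

    def step(self, weighted_input: float) -> bool:
        """Accumulate input; fire and reset once the threshold is reached."""
        self.potential += weighted_input
        if self.potential >= self.threshold:
            self.potential = 0.0  # assumed reset-to-zero after a spike
            return True
        return False


def run(neuron: IFNeuron, inputs: list[float]) -> list[bool]:
    """Feed one weighted input per time step and record the spike train."""
    return [neuron.step(x) for x in inputs]


spikes = run(IFNeuron(threshold=3.0), [1.0, 1.0, 1.0, 0.0, 2.0, 2.0])
print(spikes)
```

The neuron fires on the third and sixth time steps, when the accumulated charge reaches the threshold.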
Citations: 7
An Effective Test Method for Block RAMs in Heterogeneous FPGAs Based on a Novel Partial Bitstream Relocation Technique
Pub Date : 2022-06-06 DOI: 10.1145/3526241.3530317
Wei-Xi Xiong, Yanze Li, Changpeng Sun, Huanlin Luo, Jiafeng Liu, Jian Wang, Jinmei Lai, G. Qu
Block RAMs (BRAMs) play an important role in modern heterogeneous FPGAs, so testing them comprehensively and effectively has become a major concern. The on-chip Partial Bitstream Relocation (PBR) technique, based on FPGA Dynamic Partial Reconfiguration (DPR), can decrease the time spent configuring modules in an FPGA while reducing the memory overhead of storing partial bitstreams for the reconfigurable modules. Previous PBR techniques are difficult to combine directly with BRAM testing because they are tedious, unsuitable for large-scale designs, or limited to specific devices. In addition, the fault model for BRAM testing is still incomplete, and testing algorithms need to be improved to achieve higher fault coverage. This paper proposes an effective BRAM test method based on a novel PBR technique. The test method establishes a complete fault model for BRAMs and improves the testing algorithms for faults in BRAM ECC circuits and intra-word coupling faults in SRAM cells. On-board experiments are carried out on a Xilinx xc7vx690t device, where 14 BRAM configurations are used to fully test the BRAMs. In conjunction with the proposed PBR technique, the number of configurations can be reduced to 10, which leads to a 35.7% time saving.
Citations: 0
Session details: Session 3B: VLSI for Machine Learning and Artificial Intelligence 1
Pub Date : 2022-06-06 DOI: 10.1145/3542687
J. Hu
Citations: 0
Design and Evaluation of In-Exact Compressor based Approximate Multipliers
Pub Date : 2022-06-06 DOI: 10.1145/3526241.3530320
Prashanth H. C., Soujanya S. R., Bindu G. Gowda, M. Rao
VLSI implementations of arithmetic functions are in high demand, given the rise in hardware realizations of image and digital signal processing modules for various autonomous applications. Hardware implementation offers faster results and desirable outcomes, but achieving the same power, footprint, and delay metrics on tiny, resource-constrained decision-making edge devices requires design improvisation. Approximate computing promises to meet the required hardware metrics in error-resilient applications, where the inexact output does not deviate much from the expected one and the decision made remains unchanged. Multiplier blocks are heavily used in multimedia chips, and introducing approximation in these blocks benefits the design metrics and chip cost of the resulting system-on-chip (SoC). The proposed work attempts to design and use approximate AND-OR re-coded compressors of various sizes in the multiple reduction stages, along with various fast adders in the final addition stage of the multiplier. Further, the design metrics and resources used by the different multiplier designs were characterized in ASIC and FPGA synthesis flows, respectively, along with their error statistics. The designed approximate multipliers were employed in a Gaussian smoothing application to evaluate the quality versus hardware-resource trade-off of approximation.
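The idea of an inexact compressor and its error statistics can be sketched as follows. The gate equations below are a hypothetical approximate 4:2 compressor chosen for illustration; they are not the authors' AND-OR re-coded design, and the metric names (`error_rate`, `mean_error_distance`) are assumptions as well.

```python
from itertools import product


def approx_compressor_4_2(x1: int, x2: int, x3: int, x4: int) -> int:
    """Hypothetical approximate 4:2 compressor (illustrative only).

    An exact compressor encodes x1 + x2 + x3 + x4; this one uses a few
    simple gates and returns the value encoded as sum + 2 * carry.
    """
    s = (x1 ^ x2) | (x3 ^ x4)      # approximate sum bit
    carry = (x1 & x2) | (x3 & x4)  # approximate carry bit
    return s + 2 * carry


# Exhaustively compare against the exact count of ones over all 16 inputs.
errors = [
    approx_compressor_4_2(*bits) - sum(bits)
    for bits in product((0, 1), repeat=4)
]

error_rate = sum(e != 0 for e in errors) / len(errors)
mean_error_distance = sum(abs(e) for e in errors) / len(errors)

print(f"error rate:          {error_rate}")
print(f"mean error distance: {mean_error_distance}")
```

Exhaustive enumeration is feasible here because a 4:2 compressor has only 16 input patterns; for a full multiplier, error statistics are usually gathered over all or sampled operand pairs in the same way.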
Citations: 7
MI2D: Accelerating Matrix Inversion with 2-Dimensional Tile Manipulations
Pub Date : 2022-06-06 DOI: 10.1145/3526241.3530314
Lingfeng Chen, Tian Xia, Wenzhe Zhao, Pengju Ren
Matrix inversion is critical in mathematics and scientific applications. Large-scale dense matrix inversion is especially challenging for modern computers because of the heavy dependencies among matrix elements and the poor temporal data locality. In this paper, we propose a novel accelerator termed MI2D, which converts matrix inversion into regular matrix multiplications using 2-dimensional cross-tile operations and novel algorithms for efficient data reuse and computation. Our evaluations show that MI2D can be easily integrated with existing matrix engines in modern high-end CPUs and NPUs, and that it accelerates matrix inversion with a 2.7× speedup over an Intel Skylake CPU and 24× over an NVIDIA RTX 2080 Ti.
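As a generic illustration of how inversion can be expressed purely as matrix multiplications, the sketch below uses the Newton-Schulz iteration X ← X(2I − AX). This is a swapped-in textbook technique, not the MI2D cross-tile algorithm; the helper names and the iteration count are assumptions.

```python
def matmul(a, b):
    """Plain dense matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def newton_schulz_inverse(a, iterations=30):
    """Approximate the inverse of a using only matrix multiplications.

    Starts from X0 = A^T / (||A||_1 * ||A||_inf), a standard choice that
    makes the iteration converge for any nonsingular A.
    """
    n = len(a)
    norm_1 = max(sum(abs(a[i][j]) for i in range(n)) for j in range(n))
    norm_inf = max(sum(abs(v) for v in row) for row in a)
    scale = norm_1 * norm_inf
    x = [[a[j][i] / scale for j in range(n)] for i in range(n)]
    for _ in range(iterations):
        ax = matmul(a, x)
        # correction = 2I - AX
        correction = [[(2.0 if i == j else 0.0) - ax[i][j] for j in range(n)]
                      for i in range(n)]
        x = matmul(x, correction)
    return x


inv = newton_schulz_inverse([[4.0, 1.0], [2.0, 3.0]])
print(inv)
```

Each iteration costs two matrix products and has no element-level dependencies beyond them, which is what makes multiplication-based formulations attractive for hardware built around matrix engines.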
Citations: 0
Advanced Environment Modeling and Interaction in an Open Source RISC-V Virtual Prototype
Pub Date : 2022-06-06 DOI: 10.1145/3526241.3530374
Pascal Pieper, V. Herdt, R. Drechsler
RISC-V is a modern Instruction Set Architecture (ISA) whose open nature, combined with a clean and modular design, gives it enormous potential to become a game changer in the Internet of Things (IoT) era. Recently, SystemC-based Virtual Prototypes (VPs) have been introduced into the RISC-V ecosystem to lay the foundation for advanced, industry-proven system-level use cases. However, VP-driven environment modeling and interaction have been mostly neglected in the RISC-V context. In this paper, we propose such an extension to broaden the application domain of virtual prototyping in the RISC-V context. As a foundation, we build upon the open source RISC-V VP available on GitHub. To visualize the environment, we designed a Graphical User Interface (GUI) and appropriate libraries that offer hardware communication interfaces such as GPIO and SPI from the VP to an interactive environment model. Our approach is designed to be integrated with SystemC-based VPs that leverage a Transaction Level Modeling (TLM) communication system to favor speed-optimized simulation. To show the practicability of an environment model, we provide a set of building blocks such as buttons, LEDs, and an OLED display, and configure them in two demonstration environments. Our evaluation with three different case studies demonstrates that our approach builds virtual environments effectively and that they correctly match the real physical systems. To advance the RISC-V community and stimulate further research, we provide our extended VP platform with the environment configuration and visualization toolbox, as well as both case studies, as open source on GitHub.
Citations: 2
Thermal and Power-Aware Run-time Performance Management of 3D MPSoCs with Integrated Flow Cell Arrays
Pub Date : 2022-06-06 DOI: 10.1145/3526241.3530309
Halima Najibi, A. Levisse, G. Ansaloni, Marina Zapater, David Atienza Alonso
Flow Cell Arrays (FCA) technology employs microchannels filled with an electrolytic fluid to concurrently provide cooling and power generation to integrated circuits (ICs). This solution is particularly appealing for Three-Dimensional Multi-Processor Systems-on-Chip (3D MPSoCs) realized in deeply scaled technologies, as their extreme power densities result in significant thermal and voltage supply challenges. FCAs provide them with extra power to boost performance. However, the dual effects of FCAs (cooling and power supply) have conflicting trends leading to a complex interplay between temperature, voltage stability, and performance. In this paper, we explore this trade-off by introducing a novel methodology that controls the operating frequency of computing components and the electrolytic coolant flow rate at run-time. Our strategy enables tangible performance gains while abiding by timing, voltage drop, and temperature constraints. We showcase its benefits by targeting a 4-layer 3D MPSoC, achieving up to 24% increase in the operating frequencies and resulting in application speedups of up to 17%, while reducing the costs related to FCA liquid pumping energy.
Citations: 1
Leveraging Machine Learning for Gate-level Timing Estimation Using Current Source Models and Effective Capacitance
Pub Date : 2022-06-06 DOI: 10.1145/3526241.3530343
Dimitrios Garyfallou, Anastasis Vagenas, Charalampos Antoniadis, Y. Massoud, G. Stamoulis
With process technology scaling, accurate gate-level timing analysis becomes even more challenging. Highly resistive on-chip interconnects have an ever-increasing impact on timing, signals no longer resemble smooth saturated ramps, while gate-interconnect interdependencies are stronger. Moreover, efficiency is a serious concern since repeatedly invoking a signoff tool during incremental optimization of modern VLSI circuits has become a major bottleneck. In this paper, we introduce a novel machine learning approach for timing estimation of gate-level stages using current source models and the concept of multiple slew and effective capacitance values. First, we exploit a fast iterative algorithm for initial stage timing estimation and feature extraction, and then we employ four artificial neural networks to correlate the initial delay and slew estimates for both the driver and interconnect with golden SPICE results. Contrary to prior works, our method uses fewer and more accurate features to represent the stage, leading to more efficient models. Experimental evaluation on driver-interconnect stages implemented in 7 nm FinFET technology indicates that our method leads to 0.99% (0.90 ps) and 2.54% (2.59 ps) mean error against SPICE for stage delay and slew, respectively. Furthermore, it has a small memory footprint (1.27 MB) and performs 35× faster than a commercial signoff tool. Thus, it may be integrated into timing-driven optimization steps to provide signoff accuracy and expedite timing closure.
Citations: 4
MnM: A Fast and Efficient Min/Max Searching in MRAM
Pub Date : 2022-06-06 DOI: 10.1145/3526241.3530349
Amitesh Sridharan, Fan Zhang, Deliang Fan
In-Memory Computing (IMC) technology is considered a promising approach to solving the well-known memory-wall challenge for data-intensive applications. In this paper, we are the first to propose MnM, a novel IMC system with innovative architecture and circuit designs for fast and efficient Min/Max searching in emerging Spin-Orbit Torque Magnetic Random Access Memory (SOT-MRAM). Our proposed SOT-MRAM-based in-memory logic circuits are specially optimized to perform parallel, one-cycle XNOR logic operations, which are heavily used in the Min/Max searching-in-memory algorithm. Our novel in-memory XNOR circuit also has an overhead of just two transistors per row, whereas most prior methodologies typically use multiple sense amplifiers or complex CMOS logic gates. We also design all other required peripheral circuits for implementing complete Min/Max searching-in-MRAM computation. Our cross-layer comprehensive experiments on Dijkstra's algorithm and other sorting algorithms on real-world datasets show that MnM achieves significant performance improvements over CPUs, GPUs, and other competing IMC platforms based on RRAM/MRAM/DRAM.
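One common way XNOR matching is used for min/max search is a bit-serial scan from the most significant bit down, where every stored word is compared against a search bit in parallel. The software sketch below illustrates that general idea under stated assumptions; the function name and the candidate-mask bookkeeping are hypothetical and do not reproduce the MnM circuits or their exact algorithm.

```python
def bit_serial_search(values, width, find_max=False):
    """Bit-serial min/max search over unsigned integers (illustrative).

    For each bit position, from MSB to LSB, every stored bit is XNOR-ed
    with the search bit (0 for min, 1 for max). Rows that match stay
    candidates; if no candidate matches, the candidate set is unchanged.
    """
    search_bit = 1 if find_max else 0
    candidates = [True] * len(values)
    for bit in range(width - 1, -1, -1):
        column = [(v >> bit) & 1 for v in values]
        # XNOR is 1 exactly when the stored bit equals the search bit.
        match = [1 ^ (b ^ search_bit) for b in column]
        hits = [c and bool(m) for c, m in zip(candidates, match)]
        if any(hits):
            candidates = hits
    return values[candidates.index(True)]


data = [13, 7, 9, 7, 20]
smallest = bit_serial_search(data, width=5)
largest = bit_serial_search(data, width=5, find_max=True)
print(smallest, largest)
```

The loop runs once per bit of the word width regardless of how many rows are stored, which is why a row-parallel XNOR primitive maps well onto this kind of search.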
Citations: 1
Journal
Proceedings of the Great Lakes Symposium on VLSI 2022