
Latest publications from the 2023 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS)

A Low-power Neural 3D Rendering Processor with Bio-inspired Visual Perception Core and Hybrid DNN Acceleration
Pub Date : 2023-04-19 DOI: 10.1109/COOLCHIPS57690.2023.10122036
Donghyeon Han, Junha Ryu, Sangyeob Kim, Sangjin Kim, Jongjun Park, H. Yoo
This paper presents a low-power neural 3D rendering processor that supports both inference (INF) and training of the deep neural network (DNN). The processor is realized with four key features: 1) a bio-inspired visual perception core (VPC), 2) neural engines using hybrid sparsity exploitation, 3) a dynamic neural network allocation (DNNA) core with centrifugal sampling (CS), and 4) a hierarchical weight memory (HWM) with an input-channel (iCh) pre-fetcher. Thanks to the VPC and the proposed DNN acceleration architecture, it improves throughput by 4174x and demonstrates >30 FPS rendering while consuming 133 mW of power.
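The abstract does not spell out how the neural engines exploit hybrid sparsity; one common interpretation is to skip a multiply-accumulate whenever either operand is zero, combining static weight sparsity with dynamic activation sparsity. A minimal sketch under that assumption (all names hypothetical, not taken from the paper):

```python
def sparse_matvec(weights, activations):
    """Matrix-vector product that issues a MAC only when both operands
    are nonzero, counting every skipped multiply."""
    out = [0.0] * len(weights)
    skipped = 0
    for i, row in enumerate(weights):
        for j, w in enumerate(row):
            if w == 0.0 or activations[j] == 0.0:
                skipped += 1          # zero operand: no MAC issued
                continue
            out[i] += w * activations[j]
    return out, skipped

W = [[0.0, 2.0], [1.0, 0.0]]    # static weight sparsity
x = [3.0, 0.0]                  # dynamic activation sparsity
print(sparse_matvec(W, x))      # ([0.0, 3.0], 3)
```

In hardware the same idea amounts to gating the MAC units, which saves energy in proportion to the skipped-operation count.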
Citations: 0
Keynote and Invited Speakers Biography
Pub Date : 2023-04-19 DOI: 10.1109/coolchips57690.2023.10122034
Citations: 0
Lookup Table Modular Reduction: A Low-Latency Modular Reduction for Fast ECC Processor
Pub Date : 2023-04-19 DOI: 10.1109/COOLCHIPS57690.2023.10122002
Anawin Opasatian, M. Ikeda
Modular multiplication is used extensively in many cryptosystems, such as Elliptic Curve Cryptography (ECC), so its speed has a high impact on the overall speed of cryptographic computation. Recent works using a lookup table to infer values have shown a promising way to compute modular reduction quickly, which can be used to construct a much faster modular multiplier on FPGA than conventional methods. In this work, we explore an alternative way to implement this technique, which we call Lookup Table Modular Reduction (LUTMR). We show that the modulus used to generate the modular reduction circuit has a high impact on the generated circuit's efficiency. With the LUTMR technique, three modular multipliers, for the curves Secp256k1, NIST-P384, and BLS12-381, are implemented on FPGA and shown to be the fastest compared to recent works. A NIST-P384 ECC processor is also implemented with the designed modular multiplier. It can compute a scalar multiplication in 75.08 μs, the fastest, with the lowest time-area product, among recent works.
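The abstract leaves the lookup-table mechanism to the paper body. A common software analogue precomputes `(h << w) mod p` for every possible high word `h`, so reduction becomes a table lookup plus an addition instead of a division. A minimal sketch under that assumption (function names hypothetical; valid for inputs below `2^(2*word_bits)` with `p < 2^word_bits`):

```python
def make_lut(p, word_bits):
    # table[h] = (h << word_bits) mod p for every possible high word h
    return [(h << word_bits) % p for h in range(1 << word_bits)]

def lut_reduce(x, p, word_bits, table):
    # Fold the high word of x through the table until x fits in one word,
    # then finish with at most a few conditional subtractions.
    mask = (1 << word_bits) - 1
    while x >> word_bits:
        x = (x & mask) + table[x >> word_bits]
    while x >= p:
        x -= p
    return x

# reduce a double-word product modulo a toy prime
table = make_lut(97, 8)
print(lut_reduce(200 * 150, 97, 8, table), (200 * 150) % 97)  # 27 27
```

In hardware the loop unrolls into a short, fixed-latency chain of table reads and adders, which is where the low latency comes from; the table contents depend on the modulus, matching the paper's observation that the chosen modulus drives circuit efficiency.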
Citations: 0
FPGA Emulation of Through-Silicon-Via (TSV) Dataflow Network for 3D Standard Chip Stacking System
Pub Date : 2023-04-19 DOI: 10.1109/COOLCHIPS57690.2023.10122025
Takeshi Ohkawa, M. Aoyagi
Through-Silicon-Via (TSV) technology is expected to realize high-performance, low-power, and low-cost 3D-LSI (Large Scale Integration) systems by integrating pre-manufactured chips into a 3D Standard Chip Stacking System (3D-SCSS) through a standard-bus TSV connection. However, it is difficult to define a standard chip-connection mechanism. This paper proposes an FPGA emulation of the TSV dataflow network for evaluating the performance of 3D-SCSS. To emulate 3D-SCSS, multiple clock domains are assumed to overcome the problem of jitter in the global clock, i.e., a separated-clock-domain model. Simple dataflow experiments are performed in which processes are deployed to different chips and communicate among the chips in the 3D-SCSS. The evaluation shows that the emulation method is suitable for measuring the latency performance of the proposed TSV dataflow network. (Keywords: 3D-LSI, TSV, FPGA, Emulation, Dataflow, 3D-SCSS)
Citations: 0
Flexibly Controllable Dynamic Cooling Methods for Solid-State Annealing Processors to Improve Combinatorial Optimization Performance
Pub Date : 2023-04-19 DOI: 10.1109/COOLCHIPS57690.2023.10121990
Genta Inoue, Daiki Okonogi, Thiem Van Chu, Jaehoon Yu, Masato Motomura, Kazushi Kawamura
A recently proposed dynamic cooling method enables automatic pseudo-temperature control in the computing process on solid-state annealing processors. Though it may be a practical approach to improving optimization performance, its effectiveness has been verified on only one annealing policy. On the other hand, another work has claimed that annealing computation can be sped up by adaptively utilizing multiple policies. In this paper, we propose a flexibly controllable dynamic cooling method effective for various policies, followed by a method to reduce the sampling frequency on an annealing system. Simulation results demonstrate that our approach works well for several policies and can be introduced into annealing processors efficiently.
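The control law behind the dynamic cooling method is not given in the abstract. As an illustrative stand-in, the sketch below adjusts a pseudo-temperature by feedback so the observed acceptance rate tracks a target, which is one generic way to make cooling "automatic" across different annealing policies; all names and constants are hypothetical:

```python
import math
import random

def anneal(energy, neighbor, x0, steps=5000, target_acc=0.3, seed=0):
    """Annealing loop with feedback-controlled pseudo-temperature:
    every 100 steps, cool if the acceptance rate exceeds target_acc,
    heat otherwise. Returns the best solution seen."""
    rng = random.Random(seed)
    x, t = x0, 1.0
    best_x, best_e = x0, energy(x0)
    accepted = 0
    for step in range(1, steps + 1):
        y = neighbor(x, rng)
        d_e = energy(y) - energy(x)
        if d_e <= 0 or rng.random() < math.exp(-d_e / t):
            x = y
            accepted += 1
            e = energy(x)
            if e < best_e:
                best_x, best_e = x, e
        if step % 100 == 0:           # periodic feedback update
            rate = accepted / 100
            t = t * 0.9 if rate > target_acc else min(t * 1.1, 100.0)
            accepted = 0
    return best_x

# toy problem: minimize (v - 3)^2 over the reals
best = anneal(lambda v: (v - 3) ** 2,
              lambda v, rng: v + rng.uniform(-1, 1), x0=10.0)
```

Because the update depends only on the acceptance statistics, not on how candidate moves are proposed or accepted, the same controller can be layered over different annealing policies, which is the flexibility the paper targets.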
Citations: 0
Low power implementation of Geometric High-order Decorrelation-based Source Separation on an FPGA board
Pub Date : 2023-04-19 DOI: 10.1109/COOLCHIPS57690.2023.10121954
Ziquan Qin, Kaijie Wei, H. Amano, K. Nakadai
HARK, open-source software for robot audition, aims to be the "OpenCV" of audio signal processing, providing comprehensive functions from multichannel audio input to sound localization, sound source separation, and automatic speech recognition. Since each of these HARK modules consumes considerable energy when executed on a PC, we propose implementing each module on a connected FPGA board called M-KUBOS. Here, we focus on the most computationally expensive function of HARK, sound source separation, and implement it on a Zynq UltraScale+ board. More than a twofold performance improvement over software execution on a 64-core Ryzen 3990X server was achieved by using sound-frequency-level parallelization in the HLS description. Power evaluation of the real board showed that the energy consumption is only 1/23.4 that of the server.
Citations: 0
A Real-Time Keyword Spotting System Based on an End-To-End Binary Convolutional Neural Network in FPGA
Pub Date : 2023-04-19 DOI: 10.1109/COOLCHIPS57690.2023.10121981
Jinsung Yoon, Dong-Hwi Lee, Neungyun Kim, Su-Jung Lee, Gil-Ho Kwak, Tae-Hwan Kim
This paper presents a real-time keyword spotting (KWS) system in an FPGA. The proposed system performs the entire KWS task based on a binary convolutional neural network (BCNN) without involving any other complicated processing. The BCNN inference is carried out efficiently by skipping redundant operations. With all the essential components integrated, the proposed system has been implemented with only 8475 look-up tables in an FPGA. The proposed system processes a one-second frame in 19.8 ms, exhibiting a spotting accuracy of 91.64%.
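A binary convolution reduces each dot product to an XNOR followed by a popcount, which is what makes a BCNN cheap enough for an 8475-LUT budget. A minimal sketch of that identity (bit encodings and names are illustrative, not taken from the paper):

```python
def bin_dot(a_bits, b_bits, n):
    """Binary dot product via XNOR + popcount.
    Each bit encodes +1 (bit 1) or -1 (bit 0); over n positions the
    real-valued dot product equals (#matches) - (#mismatches)."""
    mask = (1 << n) - 1
    matches = bin(~(a_bits ^ b_bits) & mask).count("1")  # XNOR, popcount
    return 2 * matches - n

# (+1,-1,+1,+1) . (+1,+1,-1,+1) = 1 - 1 - 1 + 1 = 0
print(bin_dot(0b1011, 0b1101, 4))  # 0
```

Replacing multiply-accumulate arrays with XNOR gates and a popcount tree is the standard trick behind end-to-end BCNN accelerators.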
Citations: 0
MazeCov-Q: An Efficient Maze-Based Reinforcement Learning Accelerator for Coverage
Pub Date : 2023-04-19 DOI: 10.1109/COOLCHIPS57690.2023.10122120
Infall Syafalni, Mohamad Imam Firdaus, A. M. R. Ilmy, N. Sutisna, T. Adiono
Reinforcement learning (RL) is unsupervised machine learning that does not require pre-assigned labeled data to learn. It is applied in many areas, such as robotics, games, finance, health, transportation, and energy. In this paper, we present a reinforcement learning accelerator for covering an area and its implementation in a mobile robot, called MazeCov-Q (Maze-Based Coverage Q-Learning). We define a novel state that is divided into two conditions: directions and visit counters for the Q-value calculation. Experimental results show that MazeCov-Q achieves more than 74% path efficiency on average. Moreover, our coverage-based Q-learning accelerator (MazeCov-Q) achieves 48.3 Mps and 169.05 Mps on a 50 MHz Pynq Z1 board and a 175 MHz ZCU104 board, respectively. This research is useful for surveillance, resource allocation, environmental monitoring, and autonomous navigation.
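The abstract describes the novel state as a pair of conditions, direction and visit counter, fed into the Q-value calculation. The sketch below applies the standard tabular Q-learning update to such a state; the reward scheme and constants are hypothetical, not from the paper:

```python
from collections import defaultdict

ALPHA, GAMMA = 0.5, 0.9           # learning rate, discount (hypothetical)
ACTIONS = ["N", "E", "S", "W"]

Q = defaultdict(float)            # Q[(state, action)], zero-initialized

def q_update(state, action, reward, next_state):
    """Standard tabular Q-learning update; the state mimics MazeCov-Q's
    (heading direction, per-cell visit counter) pair."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next
                                   - Q[(state, action)])

# hypothetical transition: facing North in a cell visited once, moving
# East onto an unvisited cell earns a coverage reward
s, s2 = ("N", 1), ("E", 0)
q_update(s, "E", reward=1.0, next_state=s2)
print(Q[(s, "E")])  # 0.5
```

Folding the visit counter into the state lets the learned policy prefer unvisited cells, which is what drives coverage rather than mere goal-seeking.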
Citations: 0
A 2.41-μW/MHz, 437-PE/mm2 CGRA in 22 nm FD-SOI With RISC-Like Code Generation
Pub Date : 2023-04-19 DOI: 10.1109/COOLCHIPS57690.2023.10121985
Tobias Kaiser, F. Gerfers
While coarse-grained reconfigurable arrays (CGRAs) have the potential to improve energy efficiency in general-purpose computing beyond the limitations of von Neumann architectures, they suffer from challenges in code generation. Pasithea-l is a CGRA architecture that aims to combine high energy efficiency with RISC-like programmability. This paper presents its first silicon prototype and a C compiler that uses conventional CPU compiler techniques. Compared to code generation for traditional CGRAs, which requires expensive place-and-route steps, this method of code generation reduces compile times and compiler complexity significantly. Performance and power were measured for a set of benchmark programs written in C. On average, an energy efficiency of 195.1 int32 MIPS/mW and an active power of 2.41 μW/MHz were achieved. A peak energy efficiency of 558.2 MIPS/mW and a peak performance of 97.5 MIPS were measured. Load/store instructions and instruction transfers are identified as critical factors for energy efficiency in Pasithea. In comparison to an MCU with state-of-the-art energy efficiency, Pasithea achieves higher energy efficiency in four of the benchmarked programs. Switched capacitance per benchmark run was reduced by a factor of approximately 1.4, on average. Its 0.75 mm2 core area and fabric density of 437 PEs/mm2 enable use in cost-sensitive applications and permit further upscaling.
Citations: 0
Cachet: A High-Performance Joint-Subtree Integrity Verification for Secure Non-Volatile Memory
Pub Date : 2023-04-19 DOI: 10.1109/COOLCHIPS57690.2023.10122117
Tatsuya Kubo, Shinya Takamaeda-Yamazaki
Data confidentiality, integrity, and persistence are essential in secure non-volatile memory (NVM) systems. However, the cost of persisting all affected security metadata is high and leads to non-negligible overheads, including performance degradation, memory lifetime reduction, and high energy consumption. This is because integrity trees, which are typically used for data authentication of NVMs, require additional cryptographic calculations and memory accesses to persist the metadata for recovery. In this paper, we propose Cachet, a novel integrity verification scheme that leverages set hash functions to achieve high performance and crash consistency. Specifically, Cachet maintains two set hash values representing the metadata cache state to enable lazy updates of the integrity tree in a joint-subtree manner with minimal overheads. The observation that underlies Cachet is that, regarding the metadata cache, the integrity of each cached node is never verified individually, and the recovery process requires just a digest of the cached metadata. Our evaluation results show that Cachet reduces application execution time by 21%, NVM writes by 30%, and hash calculations by 36% compared to state-of-the-art solutions.
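The abstract credits set hash functions for the joint-subtree digest. One standard construction is an additive (incremental) set hash, where the digest of a set is the order-independent sum of per-element hashes, so cached nodes can be added and removed in O(1). A sketch of that construction, not necessarily Cachet's exact one:

```python
import hashlib

MOD = 1 << 64

def h(item: bytes) -> int:
    # per-element hash, truncated to 64 bits
    return int.from_bytes(hashlib.sha256(item).digest()[:8], "big")

class SetHash:
    """Additive set hash: digest of a set = sum of element hashes mod 2^64.
    Add/remove are O(1) and the digest is insertion-order independent."""
    def __init__(self):
        self.digest = 0
    def add(self, item: bytes):
        self.digest = (self.digest + h(item)) % MOD
    def remove(self, item: bytes):
        self.digest = (self.digest - h(item)) % MOD

a, b = SetHash(), SetHash()
a.add(b"node1"); a.add(b"node2")
b.add(b"node2"); b.add(b"node1")     # different order, same digest
print(a.digest == b.digest)          # True
a.remove(b"node2"); b.remove(b"node2")
print(a.digest == b.digest)          # True
```

Order independence is what allows a single digest to stand in for the whole cached-metadata set during recovery, instead of persisting and re-verifying each node individually.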
Citations: 0