2022 35th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design (SBCCI): Latest Articles


A Time-Efficient Defect Simulation Framework for Analog and Mixed Signal (AMS) Circuits

2022 35th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design (SBCCI)

Pub Date : 2022-08-22 DOI: 10.1109/SBCCI55532.2022.9893224

M. Saikiran, Mona Ganji, Degang Chen

Defect simulation time for AMS circuits is growing rapidly as circuit complexity increases, especially in safety-critical automotive applications, which must meet very high defect coverage (usually >90%). Reducing defect simulation time translates directly into shorter overall development time. In this work, we propose a time-efficient framework to simulate various defects during pre-silicon testing of AMS circuits. The proposed method uses Verilog-A modules to realize a given defect model and, depending on the defect-detection scheme, tests nearly all the defects in a circuit with a single test run (for a given test condition). To validate the framework, we use two distinct defect-detection schemes for operational amplifiers. The first is the intentional offset injection (IOI) method, which is predominantly a DC testing scheme. For this scheme, the proposed framework achieved a time-saving factor of more than 10X compared to the conventional framework. The second is the oscillation test method (OTM), which is a transient testing scheme. For OTM, we show that the proposed framework can reduce the simulation time to less than 50% of the conventional simulation time. We also show that the proposed framework has no negative impact on defect coverage.
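The two figures of merit quoted in this abstract, defect coverage and time-saving factor, can be written down as a small sketch. The function names and the sample numbers below are hypothetical and are not taken from the paper; they only illustrate how the ">90%", "more than 10X" and "less than 50%" statements relate to raw counts and run times.

```python
def defect_coverage(detected: int, total: int) -> float:
    """Fraction of injected defects that the test scheme detects."""
    if total <= 0:
        raise ValueError("total must be positive")
    return detected / total


def time_saving_factor(conventional_s: float, proposed_s: float) -> float:
    """How many times faster the proposed flow is than the conventional one."""
    if proposed_s <= 0:
        raise ValueError("proposed_s must be positive")
    return conventional_s / proposed_s


# Hypothetical numbers, for illustration only.
coverage = defect_coverage(detected=95, total=100)
ioi_factor = time_saving_factor(conventional_s=1200.0, proposed_s=100.0)
otm_factor = time_saving_factor(conventional_s=1000.0, proposed_s=400.0)
```

A time-saving factor above 2 is the same statement as "less than 50% of the conventional simulation time", which is how the OTM result is phrased.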



Citations: 4

Energy-Efficient Forwarding Routing Algorithm with bidirectional link quality estimator for Wireless Sensor Networks

2022 35th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design (SBCCI)

Pub Date : 2022-08-22 DOI: 10.1109/SBCCI55532.2022.9893221

Hugo Rodríguez, Jimmy Tarrillo

This work proposes a low-energy forwarding routing algorithm for Wireless Sensor Networks (WSN). The proposed algorithm is based on the Minimum Cost Forwarding Algorithm (MCFA) and uses a cost function to determine the transmission routes with the lowest energy consumption. The cost function is calculated from the link quality between nodes and the transmission cost of the neighboring nodes. The link quality is estimated bidirectionally, meaning that it considers both reception quality and transmission quality, and it is power-aware. Reception quality is estimated with WMEWMA (Window Mean with Exponentially Weighted Moving Average); transmission quality considers the transceiver power and the number of transmission attempts. The performance of the proposed algorithm is tested in three scenarios and compared with that of MCFA using WMEWMA as its cost function in the same scenarios. For testing, physical nodes were designed and built using the ATmega328P microprocessor and the nRF24L01 transceiver.
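As a rough illustration of the ingredients named in this abstract, the sketch below combines a WMEWMA reception estimate with a minimum-cost field computed from the sink. The `link_cost` formula, the smoothing factor `alpha`, and the tiny topology are assumptions made for this example; the paper's actual cost function is not given in the abstract.

```python
import heapq


def wmewma(prev_estimate: float, received: int, expected: int, alpha: float = 0.6) -> float:
    """Blend the previous estimate with the reception ratio of the latest window."""
    window_mean = received / expected
    return alpha * prev_estimate + (1 - alpha) * window_mean


def link_cost(rx_quality: float, tx_power_mw: float, attempts: int) -> float:
    """Hypothetical bidirectional cost: energy spent transmitting, penalized by poor reception."""
    return tx_power_mw * attempts / rx_quality


def min_cost_field(links: dict, sink: str) -> dict:
    """Minimum accumulated cost from every node to the sink (Dijkstra from the sink)."""
    cost = {sink: 0.0}
    heap = [(0.0, sink)]
    while heap:
        c, node = heapq.heappop(heap)
        if c > cost.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, weight in links.get(node, []):
            candidate = c + weight
            if candidate < cost.get(neighbor, float("inf")):
                cost[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return cost


# Hypothetical symmetric topology: node -> [(neighbor, link cost)].
links = {
    "sink": [("a", 1.0), ("b", 4.0)],
    "a": [("sink", 1.0), ("b", 1.5)],
    "b": [("sink", 4.0), ("a", 1.5)],
}
field = min_cost_field(links, "sink")
```

In a minimum-cost forwarding scheme, each node then forwards toward the neighbor that lies on its lowest-cost path to the sink; here node `b` would relay through `a` rather than use its direct but more expensive link.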


Citations: 0

Deep Neural Network Feasibility Using Analog Spiking Neurons

2022 35th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design (SBCCI)

Pub Date : 2022-08-22 DOI: 10.1109/SBCCI55532.2022.9893216

Thomas Soupizet, Zalfa Jouni, João F. Sulzbach, A. Benlarbi-Delai, Pietro M. Ferreira

Novel non-von-Neumann solutions based on artificial intelligence (AI) have emerged, such as neuromorphic spiking processors in either the analog or the digital domain. This paper studies the feasibility of deep neural networks on ultra-low-power eNeuron technology and highlights the trade-offs between deep learning capability and energy efficiency. The study reveals that published eNeurons and synapses satisfy linear fittings for an excitation current greater than 200 pA and a spiking frequency higher than 150 kHz, where energy efficiency is optimal. Thus, deep learning and energy efficiency are mutually exclusive for the studied analog spiking neurons.
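The "linear fitting" mentioned here can be pictured as an ordinary least-squares line through (excitation current, spiking frequency) points. The sketch below is a generic fit in plain Python; the data points are synthetic and chosen only to exercise the function, not measurements from the paper.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two or more paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic points lying on f = 2 * i + 1 (arbitrary units, illustration only).
currents = [1.0, 2.0, 3.0, 4.0]
frequencies = [3.0, 5.0, 7.0, 9.0]
slope, intercept = linear_fit(currents, frequencies)
```

A neuron whose transfer curve is well described by such a line over a range of currents behaves like a linear activation in that range, which is the property the abstract ties to the 200 pA and 150 kHz thresholds.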


Citations: 3

35th SBCCI 2022 Proceedings

2022 35th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design (SBCCI)

Pub Date : 2022-08-22 DOI: 10.1109/sbcci55532.2022.9893225

Citations: 0

PANACA: An Open-Source Configurable Network-on-Chip Simulation Platform

2022 35th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design (SBCCI)

Pub Date : 2022-08-22 DOI: 10.1109/SBCCI55532.2022.9893260

Julian Haase, Alexander Groß, Maximilian Feichter, D. Göhringer

Network-on-Chip (NoC) is the central communication infrastructure of modern Multi-Processor Systems-on-Chip (MPSoCs), as the number of processing elements integrated on a single chip is continuously increasing. The exploration of the huge design space offered by novel NoC-based MPSoC architectures requires early and accurate system modeling and simulation. This paper introduces PANACA, an open-source, highly configurable NoC simulator written in SystemC-TLM. PANACA enables fast simulation of MPSoCs using NoC-based architectures and is designed for modular, flexible and precise modeling of network elements. It offers a wide set of accurate configurable parameters, such as topology, routing algorithm and flow control. The provided simulation and exploration management allows a detailed and automated evaluation of the huge design space.
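To make "topology" and "routing algorithm" concrete, here is a minimal sketch of dimension-ordered (XY) routing on a 2D mesh, a common choice in NoC studies. It is written in Python for brevity and does not reflect PANACA's SystemC-TLM interfaces; whether PANACA ships XY routing is not stated in the abstract, so treat the choice as an assumption.

```python
def xy_route(src, dst):
    """Return the list of mesh coordinates visited from src to dst.

    The packet first travels along the X dimension until the column matches,
    then along the Y dimension.
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


route = xy_route((0, 0), (2, 1))
hops = len(route) - 1  # number of router-to-router links traversed
```

Swapping this function for another policy while keeping the rest of a simulator unchanged is the kind of configurability the abstract describes.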


Citations: 1

Thermal-Aware Thread and Turbo Frequency Throttling Optimization for Parallel Applications

2022 35th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design (SBCCI)

Pub Date : 2022-08-22 DOI: 10.1109/SBCCI55532.2022.9893245

Sandro M. Marques, F. Rossi, M. C. Luizelli, A. C. S. Beck, A. Lorenzon

The number of processing cores in multicore processors has been rising to deliver the levels of performance required by modern applications. Concomitantly, the operating temperature of hardware components has become a primary concern for economic and environmental reasons. Hence, different software (e.g., thread throttling) and hardware (e.g., dynamic voltage and frequency scaling, DVFS) strategies have been applied to reduce processor temperature without jeopardizing application performance. While thread throttling strategies artificially tune the degree of thread-level parallelism of applications to improve hardware resource utilization according to their scalability issues, turbo frequencies have been employed to speed up the execution of a given application by raising the processor's frequency above the base. Given that, we propose Urano, a thermal-aware strategy that combines thread throttling and turbo mode optimization to lower the processor operating temperature without penalizing application performance. Through the execution of twelve well-known parallel applications on a modern multicore architecture, we demonstrate that Urano decreases the peak temperature by up to 17% compared to the standard way parallel applications are executed, with minimal impact on performance.
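One way to picture a combined thread-throttling and turbo decision is as a constrained selection: among measured (thread count, turbo on/off) configurations, keep those that stay within a tolerated slowdown and pick the coolest. The sketch below does exactly that. The `Sample` fields, the 5% slowdown budget, and the measurements are hypothetical, and the abstract does not say that Urano works this way.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    threads: int
    turbo: bool
    exec_time_s: float
    peak_temp_c: float


def pick_config(samples, baseline, max_slowdown=0.05):
    """Coolest configuration whose run time stays within the slowdown budget."""
    limit = baseline.exec_time_s * (1 + max_slowdown)
    candidates = [s for s in samples if s.exec_time_s <= limit]
    if not candidates:
        return baseline  # nothing meets the performance constraint
    return min(candidates, key=lambda s: s.peak_temp_c)


# Hypothetical measurements, for illustration only.
baseline = Sample(threads=16, turbo=True, exec_time_s=10.0, peak_temp_c=85.0)
samples = [
    baseline,
    Sample(threads=12, turbo=True, exec_time_s=10.2, peak_temp_c=74.0),
    Sample(threads=8, turbo=False, exec_time_s=13.0, peak_temp_c=60.0),
]
chosen = pick_config(samples, baseline)
```

The 8-thread configuration is the coolest overall but is rejected because it exceeds the slowdown budget, which mirrors the "without penalizing performance" requirement in the abstract.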



Citations: 0

High-Throughput Multifilter VLSI Design for the AV1 Fractional Motion Estimation

2022 35th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design (SBCCI)

Pub Date : 2022-08-22 DOI: 10.1109/SBCCI55532.2022.9893255

Daiane Freitas, Bruna Nagai, M. Grellert, C. Diniz, G. Corrêa

A main challenge for emerging video encoders is the high complexity introduced by their new encoding tools. In the royalty-free AV1 (AOMedia Video 1) codec, a large part of this complexity is concentrated in the inter prediction stage. This is particularly true of fractional motion estimation (FME), where a large number of FIR (Finite Impulse Response) filters are used in the interpolation process that generates fractional-position samples from integer-position samples. Therefore, strategies to mitigate this complexity, such as designing hardware accelerators, are needed. Another recurring concern is power dissipation, as many users consume video on battery-constrained devices. Based on that, this work introduces a dedicated multifilter hardware architecture for the AV1 interpolation filters, with a focus on the motion estimation stage. The proposal implements the Regular, Sharp and Smooth filter families, using the operand isolation technique to avoid unnecessary power consumption. The designed architecture achieves a processing throughput of 3187.5 Msamples/s for ME (Motion Estimation) operation and can interpolate 8K video at 60 frames per second in the MC (Motion Compensation) scenario.
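The interpolation step described here, producing a fractional-position sample from neighboring integer-position samples with an FIR filter, can be sketched in a few lines. The 8-tap kernel below is illustrative only: its taps sum to 128 so that a 7-bit rounding shift normalizes the result, but the values are an assumption and should be checked against the AV1 specification before being treated as one of the Regular, Sharp or Smooth kernels. Clipping to the valid pixel range is omitted.

```python
# Illustrative half-sample kernel (assumption, not confirmed AV1 coefficients).
TAPS = (0, 2, -14, 76, 76, -14, 2, 0)


def interpolate_half(samples, taps=TAPS, shift=7):
    """Slide the FIR kernel over a row of integer-position samples.

    Each output is the rounded, normalized weighted sum of len(taps)
    consecutive inputs; no clipping to the pixel range is applied.
    """
    n = len(taps)
    rounding = 1 << (shift - 1)
    out = []
    for i in range(len(samples) - n + 1):
        acc = sum(t * s for t, s in zip(taps, samples[i:i + n]))
        out.append((acc + rounding) >> shift)
    return out


flat = interpolate_half([100] * 10)
ramp = interpolate_half([0, 10, 20, 30, 40, 50, 60, 70])
```

On a flat row the filter returns the same value, and on a linear ramp it returns the midpoint between the two central samples, which is the expected behavior for a half-sample interpolator.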



Citations: 3

Exploring Machine Learning for Electrical Behavior Prediction: The CMOS Inverter Case Study

2022 35th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design (SBCCI)

Pub Date : 2022-08-22 DOI: 10.1109/SBCCI55532.2022.9893261

Gabriel Lima Jacinto, L. Y. Imamura, M. Grellert, C. Meinhardt

With the advancement of integrated circuit manufacturing technology, a growing number of aspects must be considered during the electrical characterization of circuits in order to address challenges such as the effect of process variability. This increases characterization time, because the techniques in use rely on exhaustive electrical simulations. Machine learning techniques are consistently being employed to assist digital design at many levels of abstraction, with various successful applications. Thus, the main objective of this work is to evaluate machine learning regression algorithms as an alternative to exhaustive electrical simulation in the cell characterization project. In this step, multiple linear regression, support vector regression, decision trees, and random forest algorithms are considered. This work presents the results of a first case study: an inverter in bulk CMOS technology. Specifically, the energy values and propagation times of this circuit are predicted separately. A comparative analysis of the models is done for each dependent variable in order to understand which is the best regression model for the task. The algorithm with the lowest cost function proved to be random forest, with an R² above 98% for all predicted variables.
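The comparison described here rests on two standard quantities: a cost function (mean squared error is assumed below, since the abstract does not name it) and the coefficient of determination R². A plain-Python sketch of both follows; the target and prediction values are synthetic.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic targets (e.g. propagation delays in arbitrary units).
targets = [1.0, 2.0, 3.0]
perfect = r2_score(targets, [1.0, 2.0, 3.0])
mean_only = r2_score(targets, [2.0, 2.0, 2.0])
```

An R² of 1 means the model reproduces every simulated value, while a model that always predicts the mean scores 0, so "above 98%" indicates predictions that track the electrical simulations closely.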



Citations: 0

Low-Frequency Non-Separable Transform Hardware System Design for the VVC Encoder

2022 35th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design (SBCCI)

Pub Date : 2022-08-22 DOI: 10.1109/SBCCI55532.2022.9893228

J. Goebel, L. Agostini, B. Zatt, M. Porto

This paper presents a dedicated hardware system design for the Low-Frequency Non-Separable Transform (LFNST) of the Versatile Video Coding (VVC/H.266) standard. The LFNST is a secondary transform applied to the coefficients produced by the DCT-II transform. The developed design exploits two clock domains: the LFNST core works at 746.48 MHz, while the primary transform can operate at a slower clock of only 186.62 MHz, so that the system can process Ultra-High Definition (UHD) videos with $4098\times 2160$ pixels (4K) at 60 frames per second. The whole LFNST hardware system design presents an area utilization of 57.3 Kgates and a power dissipation of 32.22 mW (processing the LFNST $4\times 4$ through TU size of $4\times 4$) when synthesized for an ASIC implementation with a 40 nm technology standard cells library.
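The numbers in this abstract can be cross-checked with simple arithmetic: the ratio between the two clock domains and the pixel rate implied by the stated resolution and frame rate. The sketch below uses the figures exactly as they appear in the abstract; the helper name is hypothetical.

```python
LFNST_CLOCK_MHZ = 746.48    # LFNST core clock, as stated in the abstract
PRIMARY_CLOCK_MHZ = 186.62  # primary transform clock, as stated in the abstract


def pixel_rate(width: int, height: int, fps: int) -> int:
    """Pixels that must be processed per second for a given video format."""
    return width * height * fps


clock_ratio = LFNST_CLOCK_MHZ / PRIMARY_CLOCK_MHZ
required_pixels_per_s = pixel_rate(4098, 2160, 60)
```

The LFNST core runs at four times the primary-transform clock, which suggests that the secondary transform needs about four cycles for every primary-transform cycle to keep pace; that interpretation is an inference from the ratio, not a statement from the abstract.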


Citations: 1

CSIP: A Compact Scrypt IP design with single PBKDF2 core for Blockchain mining

2022 35th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design (SBCCI)

Pub Date : 2022-08-22 DOI: 10.1109/SBCCI55532.2022.9893217

V. Le, Pham Hoai Luan, T. Tran, Y. Nakashima

Developing compact and energy-efficient Scrypt hardware for power-constrained devices is necessary to balance the distribution of blockchain networks. However, existing Scrypt circuits struggle to be both compact and energy-efficient, since they focus only on maximizing hash performance. Therefore, this paper proposes a Compact Scrypt IP (CSIP) architecture to reduce power consumption while maintaining hashing performance for blockchain mining. Specifically, CSIP uses only one SHA-256 core inside one PBKDF2 core to minimize hardware resources, thus decreasing power consumption significantly. Furthermore, CSIP supports parameter configuration to suit the various requirements of blockchain mining. The CSIP design is successfully implemented and verified on a Xilinx Zynq UltraScale+ MPSoC ZCU102 FPGA at the system-on-chip level. The energy efficiency of the CSIP on the ZCU102 FPGA is 322 times and 9 times higher than that of an Intel i9-10940X CPU and an Nvidia Tesla V100 GPU, respectively. Finally, experimental results on a Xilinx Virtex-7 VC707 FPGA show that the proposed CSIP is significantly better than existing Scrypt architectures in area, power, and energy efficiency.
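Scrypt wraps its memory-hard mixing stage between two PBKDF2-HMAC-SHA256 calls that use an iteration count of 1. With a single iteration, each PBKDF2 output block reduces to one HMAC-SHA256 evaluation, which helps explain why a design can be built around a single SHA-256 core. The sketch below checks that reduction with Python's standard library; the password and salt are placeholder values, and nothing here models the CSIP hardware itself.

```python
import hashlib
import hmac


def pbkdf2_first_block(password: bytes, salt: bytes) -> bytes:
    """First PBKDF2 block for one iteration: HMAC(password, salt || INT(1))."""
    block_index = (1).to_bytes(4, "big")  # 32-bit big-endian block counter
    return hmac.new(password, salt + block_index, hashlib.sha256).digest()


password = b"password"  # placeholder input
salt = b"salt"          # placeholder input

# Library PBKDF2 with 1 iteration and a 32-byte output (one SHA-256 block).
derived = hashlib.pbkdf2_hmac("sha256", password, salt, 1, dklen=32)
manual = pbkdf2_first_block(password, salt)
```

Both values are identical, so every PBKDF2 block in this setting costs one HMAC, and each HMAC is itself a short sequence of SHA-256 compressions that one shared core can execute back to back.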



Citations: 2
