Pub Date: 2022-08-22; DOI: 10.1109/SBCCI55532.2022.9893224
M. Saikiran, Mona Ganji, Degang Chen
Defect simulation time in AMS circuits is growing rapidly with increasing circuit complexity, especially in safety-critical automotive applications, which need to meet very high defect coverage (usually >90%). A reduction in defect simulation time translates directly into a reduction in overall development time. In this work, we propose a time-efficient framework to simulate various defects during pre-silicon testing of AMS circuits. The proposed method uses Verilog-A modules to realize a given defect model and, depending on the defect-detection scheme, tests nearly all the defects in a circuit with a single test run (for a given test condition). To validate our framework, we use two distinct defect-detection schemes for operational amplifiers. The first is the intentional offset injection (IOI) method, which is predominantly a DC testing scheme. For this scheme, the proposed framework achieved a time-saving factor of more than 10X compared to the conventional framework. The second is the oscillation test method (OTM), which is a transient testing scheme. For the OTM scheme, we show that the proposed framework can reduce the simulation time to less than 50% of the conventional simulation time. We also show that the proposed framework has no negative impact on defect coverage.
Title: A Time-Efficient Defect Simulation Framework for Analog and Mixed Signal (AMS) Circuits. Venue: 2022 35th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design (SBCCI).
Pub Date: 2022-08-22; DOI: 10.1109/SBCCI55532.2022.9893221
Hugo Rodríguez, Jimmy Tarrillo
This work proposes a low-energy forwarding routing algorithm for Wireless Sensor Networks (WSN). The proposed algorithm is based on the Minimum Cost Forwarding Algorithm (MCFA) and uses a cost function to determine the transmission routes with the lowest energy consumption. The cost function is calculated from the link quality between nodes and the transmission cost of the neighboring nodes. The link quality is estimated bidirectionally, meaning that it considers both reception quality and transmission quality, and it is power aware. WMEWMA is used for the reception quality, while the transmission quality takes into account the power of the transceiver and the number of transmission attempts. The performance of the proposed algorithm is tested in three scenarios and compared with the performance, in the same scenarios, of MCFA as the routing algorithm with WMEWMA as the cost function. For testing, physical nodes were designed and built using the ATmega328P microprocessor and the nRF24L01 transceiver.
Title: Energy-Efficient Forwarding Routing Algorithm with bidirectional link quality estimator for Wireless Sensor Networks
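The cost-field idea in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `wmewma` follows the usual window-mean-with-EWMA definition, `link_cost` is a hypothetical way to combine reception quality, transmit power and attempts (the abstract does not give the formula), and `minimum_costs` computes the converged MCFA-style cost field centrally with Dijkstra's algorithm rather than through the distributed advertisement messages a real network would exchange.

```python
import heapq


def wmewma(prev_estimate, received, expected, alpha=0.6):
    """Window Mean with EWMA: blend the previous estimate with the
    packet reception ratio observed in the latest window."""
    window_mean = received / expected if expected else 0.0
    return alpha * prev_estimate + (1 - alpha) * window_mean


def link_cost(rx_quality, tx_power_mw, tx_attempts):
    """Hypothetical power-aware link cost (an assumption, not the paper's
    formula): transmit effort divided by reception quality."""
    return tx_power_mw * tx_attempts / max(rx_quality, 1e-6)


def minimum_costs(links, sink):
    """Cost field in the MCFA sense: each node's cost is the cheapest sum
    of link costs to the sink. `links[u]` lists (neighbor, cost) pairs for
    the links that reach node u."""
    cost = {sink: 0.0}
    heap = [(0.0, sink)]
    while heap:
        current, node = heapq.heappop(heap)
        if current > cost.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, weight in links.get(node, []):
            candidate = current + weight
            if candidate < cost.get(neighbor, float("inf")):
                cost[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return cost
```

With such a field in place, a node forwards a packet to whichever neighbor advertises the lowest cost, so no explicit routing table is needed.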
Pub Date: 2022-08-22; DOI: 10.1109/SBCCI55532.2022.9893216
Thomas Soupizet, Zalfa Jouni, João F. Sulzbach, A. Benlarbi-Delai, Pietro M. Ferreira
Novel non-von-Neumann solutions based on artificial intelligence (AI) have surfaced, such as neuromorphic spiking processors in either the analog or the digital domain. This paper studies the feasibility of deep neural networks on ultra-low-power eNeuron technology. The trade-offs in terms of deep learning capabilities and energy efficiency are highlighted. This study reveals that published eNeurons and synapses satisfy linear fittings for an excitation current greater than 200 pA and a spiking frequency higher than 150 kHz, where energy efficiency is optimal. Thus, deep learning and energy efficiency are mutually exclusive for studied analog spiking neurons.
Title: Deep Neural Network Feasibility Using Analog Spiking Neurons
Pub Date: 2022-08-22; DOI: 10.1109/SBCCI55532.2022.9893260
Julian Haase, Alexander Groß, Maximilian Feichter, D. Göhringer
Network-on-Chip (NoC) is the central communication infrastructure of modern Multi-Processor Systems-on-Chip (MPSoCs), as the number of processing elements integrated on a single chip is continuously increasing. The exploration of the huge design space offered by novel NoC-based MPSoC architectures requires early and accurate system modeling and simulation. This paper introduces PANACA, an open-source, highly configurable NoC simulator written in SystemC-TLM. PANACA enables fast simulation of MPSoCs using NoC-based architectures and is designed for modular, flexible and precise modeling of network elements. It offers a wide set of accurate configurable parameters, such as topology, routing algorithm and flow control. The provided simulation and exploration management allows a detailed and automated evaluation of the huge design space.
Title: PANACA: An Open-Source Configurable Network-on-Chip Simulation Platform
Pub Date: 2022-08-22; DOI: 10.1109/SBCCI55532.2022.9893245
Sandro M. Marques, F. Rossi, M. C. Luizelli, A. C. S. Beck, A. Lorenzon
The number of processing cores in multicore processors has been rising to deal with the levels of performance required by modern applications. Concomitantly, the operating temperature of hardware components has become a primary concern from economic and environmental perspectives. Hence, different software (e.g., thread throttling) and hardware (e.g., dynamic voltage and frequency scaling, DVFS) strategies have been applied to reduce processor temperature levels without jeopardizing the application's performance. While thread throttling strategies artificially tune the degree of thread-level parallelism of applications to improve hardware resource utilization according to their scalability issues, turbo frequencies have been employed to speed up the execution of a given application by increasing the processor's frequencies above the base. Given that, we propose Urano, a thermal-aware strategy that combines thread throttling and turbo mode optimization to reduce the processor operating temperature without penalizing the performance of the application. Through the execution of twelve well-known parallel applications on a modern multicore architecture, we demonstrate that Urano decreases the peak temperature by up to 17% compared to how parallel applications are executed, with minimal impact on performance.
Title: Thermal-Aware Thread and Turbo Frequency Throttling Optimization for Parallel Applications
Pub Date: 2022-08-22; DOI: 10.1109/SBCCI55532.2022.9893255
Daiane Freitas, Bruna Nagai, M. Grellert, C. Diniz, G. Corrêa
A main challenge for emerging video encoders is the high complexity introduced by their new encoding tools. In the royalty-free AV1 (AOMedia Video 1) codec, a large part of this complexity is concentrated in the inter prediction stage. This is particularly due to fractional motion estimation (FME), where a large number of FIR (Finite Impulse Response) filters are used in the interpolation process that generates fractional-position samples from the integer-position samples given as input. Therefore, strategies to mitigate this complexity, such as designing hardware accelerators, are needed. Another recurring concern is power dissipation, as many users consume video media on battery-constrained devices. Based on that, this work introduces a dedicated multifilter hardware architecture for the AV1 codec interpolation filters with a focus on the motion estimation stage. The proposal implements the Regular, Sharp and Smooth filter families, using the operand isolation technique to avoid unnecessary power consumption. The designed architecture achieves a processing throughput of 3187.5 Msamples/sec for ME (Motion Estimation) operation, and can interpolate 8K-resolution video at 60 frames per second in the MC (Motion Compensation) scenario.
Title: High-Throughput Multifilter VLSI Design for the AV1 Fractional Motion Estimation
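The interpolation step described in the abstract above, where an FIR filter produces fractional-position samples from integer-position samples, can be sketched in software as shown below. This is only an illustration under assumptions: the 8-tap coefficients are hypothetical (they merely sum to 128 so that a 7-bit shift normalizes the result) and are not the AV1 Regular, Sharp or Smooth coefficients; border clamping and 8-bit clipping are also assumed choices.

```python
# Hypothetical symmetric half-sample taps; they sum to 128 (= 1 << 7).
HYPOTHETICAL_TAPS = [0, 2, -12, 74, 74, -12, 2, 0]


def interpolate_half_samples(row, taps=HYPOTHETICAL_TAPS, shift=7, max_value=255):
    """Return, for each index i, the filtered sample halfway between
    row[i] and row[i + 1], using integer arithmetic only."""
    half = len(taps) // 2
    rounding = 1 << (shift - 1)
    last = len(row) - 1
    out = []
    for i in range(len(row)):
        acc = 0
        for k, tap in enumerate(taps):
            # Tap k reads the sample at offset (k - half + 1) from i;
            # indices outside the row are clamped to the nearest border.
            idx = min(max(i + k - half + 1, 0), last)
            acc += tap * row[idx]
        value = (acc + rounding) >> shift  # round and normalize
        out.append(min(max(value, 0), max_value))  # clip to sample range
    return out
```

A hardware design would typically evaluate the multiply-accumulate terms in parallel; the loop here only shows the arithmetic performed per output sample.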
Pub Date: 2022-08-22; DOI: 10.1109/SBCCI55532.2022.9893261
Gabriel Lima Jacinto, L. Y. Imamura, M. Grellert, C. Meinhardt
With the advancement of integrated circuit manufacturing technology, a growing number of aspects must be considered during the electrical characterization of circuits in order to address challenges such as the effect of process variability. This increases the characterization time because the techniques used rely on exhaustive electrical simulations. Machine learning techniques are consistently being employed to assist digital design at many levels of abstraction, with various successful applications. Thus, the main objective of this work is to evaluate machine learning regression algorithms as an alternative to exhaustive electrical simulation in cell characterization. In this step, multiple linear regression, support vector regression, decision trees, and random forest algorithms are considered. This work presents the results of a first case study: an inverter in bulk CMOS technology. Specifically, the energy values and propagation times of this circuit are predicted separately. A comparative analysis between the models is carried out for each dependent variable in order to understand which regression model is best for the task. The algorithm with the lowest cost function proved to be Random Forests, with an R² above 98% for all predicted variables.
Title: Exploring Machine Learning for Electrical Behavior Prediction: The CMOS Inverter Case Study
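The abstract above ranks regression models by their R² score. As a small sketch of that comparison step, the helper below computes the coefficient of determination and picks the best-scoring model from a set of predictions. The function names and the idea of passing predictions as a dictionary are assumptions made for illustration; the paper's actual tooling and data are not described in the abstract.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def best_model(y_true, predictions):
    """Score each model's predictions (hypothetical dict of
    model name -> predicted values) and return the best name and all scores."""
    scores = {name: r_squared(y_true, pred) for name, pred in predictions.items()}
    return max(scores, key=scores.get), scores
```

An R² of 1.0 means the predictions match the simulated values exactly, while a model that always predicts the mean scores 0.0.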
Pub Date: 2022-08-22; DOI: 10.1109/SBCCI55532.2022.9893228
J. Goebel, L. Agostini, B. Zatt, M. Porto
This paper presents a dedicated hardware system design for the Low-Frequency Non-Separable Transform (LFNST) of the Versatile Video Coding (VVC/H.266) standard. The LFNST is a secondary transform applied to the coefficients produced by the DCT-II transform. The design exploits two clock domains: the LFNST core runs at 746.48 MHz, while the primary transform can operate at a slower clock of only 186.62 MHz, so that the system can process Ultra-High Definition (UHD) videos with $4098\times 2160$ pixels (4K) at 60 frames per second. The whole LFNST hardware system design presents an area utilization of 57.3 Kgates and a power dissipation of 32.22 mW (processing the LFNST $4\times 4$ through TU size of $4\times 4$) when synthesized for an ASIC implementation with a 40 nm technology standard cells library.
Title: Low-Frequency Non-Separable Transform Hardware System Design for the VVC Encoder
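To illustrate what "non-separable" means for the transform in the abstract above: instead of filtering rows and columns in two passes, the block of primary-transform coefficients is flattened into a vector and multiplied by a single kernel matrix. The sketch below shows only that structure. The row-major flattening and the kernel values are assumptions for illustration; the real LFNST kernels and scan order are defined by the VVC standard and are not reproduced here. The last line simply checks that the two clock frequencies quoted in the abstract differ by a factor of four.

```python
def nonseparable_transform(block, kernel):
    """Flatten a 2-D coefficient block (row-major, an assumed order) and
    apply one matrix-vector product with the given kernel."""
    vector = [coeff for row in block for coeff in row]
    return [sum(k * v for k, v in zip(kernel_row, vector)) for kernel_row in kernel]


def identity_kernel(size):
    """Placeholder kernel (hypothetical): leaves the coefficients unchanged."""
    return [[1 if r == c else 0 for c in range(size)] for r in range(size)]


# Ratio between the LFNST core clock and the primary-transform clock (MHz).
CLOCK_RATIO = 746.48 / 186.62
```

A kernel with fewer rows than the flattened vector has entries yields fewer output coefficients, which is how a reduced transform can be expressed with the same function.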
Pub Date: 2022-08-22; DOI: 10.1109/SBCCI55532.2022.9893217
V. Le, Pham Hoai Luan, T. Tran, Y. Nakashima
Developing compact and energy-efficient Scrypt hardware for power-constrained devices is necessary to balance the distribution of blockchain networks. However, existing Scrypt circuits struggle to be both compact and energy-efficient, since they focus only on maximizing hash performance. Therefore, this paper proposes a Compact Scrypt IP (CSIP) architecture that reduces power consumption while maintaining hashing performance for blockchain mining. Specifically, CSIP uses only one SHA-256 core inside one PBKDF2 core to minimize hardware resources, thus decreasing power consumption significantly. Furthermore, CSIP supports parameter configuration to suit the various requirements of blockchain mining. The CSIP design is successfully implemented and verified on a Xilinx Zynq UltraScale+ MPSoC ZCU102 FPGA at the system-on-chip level. The energy efficiency of CSIP on the ZCU102 FPGA is 322 times and 9 times higher than that of an Intel i9-10940X CPU and an Nvidia Tesla V100 GPU, respectively. Finally, the experimental results on a Xilinx Virtex-7 VC707 FPGA show that the proposed CSIP is significantly better than existing Scrypt architectures in area, power, and energy efficiency.
Title: CSIP: A Compact Scrypt IP design with single PBKDF2 core for Blockchain mining
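For readers unfamiliar with why PBKDF2 and SHA-256 appear in the abstract above: in scrypt as specified in RFC 7914, a single-iteration PBKDF2-HMAC-SHA256 call expands the input into `p` blocks of `128 * r` bytes, a memory-hard mixing stage processes them, and a second single-iteration PBKDF2 call produces the final hash. The software sketch below uses Python's standard `hashlib` to show the first PBKDF2 stage and a complete scrypt call. The default parameters (N = 1024, r = 1, p = 1, 32-byte output) are an assumption chosen because they are common in scrypt-based mining; the abstract does not state which parameters CSIP was evaluated with, and `hashlib.scrypt` requires a Python build linked against a recent OpenSSL.

```python
import hashlib


def scrypt_initial_blocks(password, salt, r=1, p=1):
    """First stage of scrypt (RFC 7914): one PBKDF2-HMAC-SHA256 iteration
    expands the input into p blocks of 128 * r bytes."""
    return hashlib.pbkdf2_hmac("sha256", password, salt, 1, dklen=128 * r * p)


def scrypt_hash(header, n=1024, r=1, p=1, dklen=32):
    """Full scrypt over a block header, using the header as both password
    and salt (an assumed convention for this sketch)."""
    return hashlib.scrypt(header, salt=header, n=n, r=r, p=p, dklen=dklen)
```

A miner repeats `scrypt_hash` over candidate headers that differ only in a nonce field and compares each digest against a target, which is why hash throughput per unit of energy is the figure of merit in the abstract.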