Pub Date : 2022-08-22 DOI: 10.1109/SBCCI55532.2022.9893247
Zalfa Jouni, Thomas Soupizet, Siqi Wang, A. Benlarbi-Delai, Pietro M. Ferreira
Building low-cost devices with ultra-low-power radio capabilities is a major challenge for smart devices, while a permanently-on receiver is required for smart communication. This paper proposes a wake-up radio with a neuromorphic pre-processing system, both biased in the weak-inversion region. The system can receive a 2.4 GHz signal, demodulate it, and recognize bit patterns based on the spiking frequency of a neuron. The total power consumption is 1.2 nW, at least three orders of magnitude less than that of conventional RF envelope detectors. Further, the responsiveness of the spiking frequency to input power suggests that the proposed system can distinguish different signals at 2.4 GHz. The proposed system achieves an energy efficiency of 1.2 pJ/bit with a minimum detectable signal of -27 dBm.
{"title":"1.2 nW Neuromorphic Enhanced Wake-Up Radio","authors":"Zalfa Jouni, Thomas Soupizet, Siqi Wang, A. Benlarbi-Delai, Pietro M. Ferreira","doi":"10.1109/SBCCI55532.2022.9893247","DOIUrl":"https://doi.org/10.1109/SBCCI55532.2022.9893247","journal":{"name":"2022 35th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design (SBCCI)"},"publicationDate":"2022-08-22","platform":"Semanticscholar","paperid":"123146987"}
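The headline figures of the wake-up radio abstract can be related with a little arithmetic. The sketch below assumes that energy per bit is simply total power divided by bit rate (the abstract does not state how the 1.2 pJ/bit figure was derived) and converts the -27 dBm sensitivity to watts. The function names are illustrative only.

```python
def dbm_to_watts(p_dbm: float) -> float:
    """Convert a power level in dBm to watts (0 dBm = 1 mW)."""
    return 1e-3 * 10 ** (p_dbm / 10)


def implied_bit_rate(power_w: float, energy_per_bit_j: float) -> float:
    """Bit rate implied by power and energy per bit.

    Assumption (not stated in the abstract): E_bit = P_total / bit_rate.
    """
    return power_w / energy_per_bit_j


# Figures quoted in the abstract.
TOTAL_POWER_W = 1.2e-9       # 1.2 nW total power consumption
ENERGY_PER_BIT_J = 1.2e-12   # 1.2 pJ/bit energy efficiency
MDS_DBM = -27.0              # minimum detectable signal

bit_rate_bps = implied_bit_rate(TOTAL_POWER_W, ENERGY_PER_BIT_J)
mds_w = dbm_to_watts(MDS_DBM)

print(f"implied bit rate: {bit_rate_bps:.0f} bit/s")
print(f"minimum detectable signal: {mds_w * 1e6:.2f} uW")
```

Under that assumption the two quoted numbers are consistent with a bit rate on the order of 1 kbit/s; the paper itself should be consulted for the actual data rate.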
Pub Date : 2022-08-22 DOI: 10.1109/SBCCI55532.2022.9893257
Pham Hoai Luan, T. Tran, V. Le, Y. Nakashima
Flexible and energy-efficient BLAKE-256/2s hardware has recently become necessary, since BLAKE-256 and BLAKE2s are important cryptographic hash functions for enhancing reliability and security in blockchain-based IoT applications. However, previous BLAKE-256/2s architectures struggle to achieve high flexibility and energy efficiency. Therefore, this paper proposes a BLAKE-256/2s co-processor that achieves both for blockchain-based IoT applications. The proposed co-processor uses three novel optimization techniques. First, a configurable hashing core is proposed to enhance flexibility. Second, a pipelined permutation and compression architecture is developed to improve throughput and hardware efficiency. Third, a mining transmission mechanism is introduced to optimize the performance of the co-processor at the system-on-chip level. The proposed co-processor is implemented and verified on a Xilinx Zynq UltraScale+ MPSoC ZCU102 FPGA. The power and energy efficiency of the co-processor on the ZCU102 FPGA are significantly better than those of the Intel i9 10940X CPU and the RTX 3090 GPU. Moreover, experimental results on several FPGAs show that the proposed co-processor achieves considerably higher throughput, area efficiency, and flexibility than related FPGA-based works.
{"title":"A Flexible and Energy-Efficient BLAKE-256/2s Co-Processor for Blockchain-based IoT Applications","authors":"Pham Hoai Luan, T. Tran, V. Le, Y. Nakashima","doi":"10.1109/SBCCI55532.2022.9893257","DOIUrl":"https://doi.org/10.1109/SBCCI55532.2022.9893257","journal":{"name":"2022 35th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design (SBCCI)"},"publicationDate":"2022-08-22","platform":"Semanticscholar","paperid":"122464749"}
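For readers unfamiliar with the workload such a co-processor accelerates, the following is a minimal software sketch using BLAKE2s from Python's standard `hashlib`. It is a reference illustration only, not the paper's hardware design: the toy `mine` loop, its header bytes, the 8-byte little-endian nonce encoding, and the difficulty rule are all assumptions made for the example, and BLAKE-256 (the SHA-3 finalist) is not available in `hashlib`, so only BLAKE2s is shown.

```python
import hashlib


def blake2s_digest(data: bytes, digest_size: int = 32) -> bytes:
    """Return the BLAKE2s digest of data (at most 32 bytes for BLAKE2s)."""
    return hashlib.blake2s(data, digest_size=digest_size).digest()


def mine(header: bytes, difficulty_bits: int, max_nonce: int = 1_000_000):
    """Toy proof-of-work search (hypothetical, for illustration only).

    Looks for a nonce such that the 256-bit digest of header + nonce,
    read as a big-endian integer, has difficulty_bits leading zero bits.
    Returns (nonce, digest), or None if no nonce below max_nonce works.
    """
    target = 1 << (256 - difficulty_bits)
    for nonce in range(max_nonce):
        digest = blake2s_digest(header + nonce.to_bytes(8, "little"))
        if int.from_bytes(digest, "big") < target:
            return nonce, digest
    return None


result = mine(b"example block header", difficulty_bits=8)
if result is not None:
    nonce, digest = result
    print(f"nonce={nonce} digest={digest.hex()}")
```

Each loop iteration is one full hash of a short message, which is the repetitive, data-parallel pattern that makes hashing a natural candidate for a pipelined hardware core.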
Pub Date : 2022-08-22 DOI: 10.1109/SBCCI55532.2022.9893229
Hector Gerardo Muñoz Hernandez, Florian Fricke, Muhammed Al Kadi, M. Reichenbach, Michael Hübner
Using Graphics Processing Units (GPUs) to accelerate applications such as Deep Neural Networks (DNNs) or image processing tasks has attracted considerable attention because of their high performance. This has in turn driven interest in toolflows that no longer require hardware design expertise, making the technology accessible to as many users as possible. However, hardware knowledge is still important to make the toolflows run efficiently on the target platform. In this paper, we introduce a framework that uses Jupyter Notebooks, a browser-based interactive computing environment, and PYNQ, which serves as a high-level abstraction interface between the Programmable Logic (PL) and the Processing System (PS), from which the user can unlock the full potential of a highly customizable soft-core GPU running on a Field-Programmable Gate Array (FPGA). The framework is open-source and requires only a few set-up steps to get applications running by re-using the existing Jupyter Notebooks as templates, making it ideal for fast prototyping and educational purposes. Moreover, the architecture of the target hardware can be customized to fit performance, resource utilization, and functional requirements. The framework also supports floating-point operations and can be ported to System on Chip (SoC) devices such as the Xilinx Zynq-7000 family, among others.
{"title":"Edge GPU based on an FPGA Overlay Architecture using PYNQ","authors":"Hector Gerardo Muñoz Hernandez, Florian Fricke, Muhammed Al Kadi, M. Reichenbach, Michael Hübner","doi":"10.1109/SBCCI55532.2022.9893229","DOIUrl":"https://doi.org/10.1109/SBCCI55532.2022.9893229","journal":{"name":"2022 35th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design (SBCCI)"},"publicationDate":"2022-08-22","platform":"Semanticscholar","paperid":"122241971"}
Pub Date : 2022-08-22 DOI: 10.1109/SBCCI55532.2022.9893243
M. Saikiran, Mona Ganji, Degang Chen
The low-dropout (LDO) voltage regulator is one of the most commonly used blocks in integrated circuits (ICs). In contrast to classic LDOs, flipped voltage follower (FVF) LDOs can source high load currents and provide high bandwidth (increased PSRR) thanks to a fast local loop. In mission-critical applications such as automotive, industrial, and space applications, functional safety (FuSa) is a very important requirement. To emphasize this requirement, the ISO 26262 standard for functional safety recommends that an automotive IC have very high defect coverage (usually greater than 90%). In this work, we propose an extremely simple and low-cost defect detection methodology for a folded FVF LDO that provides high defect coverage. Furthermore, because the proposed method is time-efficient, it can also be incorporated into wafer-level production testing of the SoC to reduce the test time. The proposed design-for-test (DfT) defect detection method uses completely digital injection and detection circuits, making the method robust and easy to implement. Additionally, the digital nature of the method makes it an ideal candidate for an SoC where a digital control and monitor bus (such as IJTAG) is already available. The circuit under test (CUT) used in this work is designed in 65 nm UMC technology. In addition to the defect coverage results obtained with the proposed method, we also present, for comparison, defect coverage results for our CUT with defect detection methods proposed in the literature. Transistor-level fault simulations confirm that the proposed method achieves a high fault coverage of 94% with less than 4% area overhead, making it extremely area-efficient.
{"title":"Digital Defect-Oriented Test Methodology for Flipped Voltage Follower Low Dropout (LDO) Voltage Regulators","authors":"M. Saikiran, Mona Ganji, Degang Chen","doi":"10.1109/SBCCI55532.2022.9893243","DOIUrl":"https://doi.org/10.1109/SBCCI55532.2022.9893243","journal":{"name":"2022 35th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design (SBCCI)"},"publicationDate":"2022-08-22","platform":"Semanticscholar","paperid":"126999712"}
Pub Date : 2022-08-22 DOI: 10.1109/SBCCI55532.2022.9893259
Alcides S. Costa, Leonardo Droves Silveira, A. Reis
This paper presents a novel approach for prototyping digital circuits and teaching digital design. The approach uses a client-server architecture and aims to reduce the computing resources and software maintenance required of the student. This reduction occurs because the system emulates digital circuits on the server side, while the student interacts with the system on the client side through a graphical user interface containing a virtual printed circuit board. In preliminary results, the proposed architecture emulated designs of up to 8,000 two-input NAND gates at a 1 Hz system clock without loss of responsiveness.
{"title":"A Virtual Board Approach for Prototyping and Teaching Digital Design","authors":"Alcides S. Costa, Leonardo Droves Silveira, A. Reis","doi":"10.1109/SBCCI55532.2022.9893259","DOIUrl":"https://doi.org/10.1109/SBCCI55532.2022.9893259","journal":{"name":"2022 35th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design (SBCCI)"},"publicationDate":"2022-08-22","platform":"Semanticscholar","paperid":"131917227"}
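To make the idea of emulating a design as a network of two-input NAND gates concrete, here is a minimal sketch of a netlist evaluator. It is not the paper's server implementation, which the abstract does not describe; the netlist format, the signal names, and the assumption that gates are listed in topological order are all hypothetical.

```python
def nand(a: int, b: int) -> int:
    """Two-input NAND on single bits (0 or 1)."""
    return 1 - (a & b)


def evaluate(netlist, inputs):
    """Evaluate a combinational NAND netlist.

    netlist: list of (output_name, (input_a, input_b)) tuples, assumed to be
             in topological order so every input is known when it is used.
    inputs:  dict mapping primary input names to 0 or 1.
    Returns a dict with the value of every signal.
    """
    signals = dict(inputs)
    for out, (a, b) in netlist:
        signals[out] = nand(signals[a], signals[b])
    return signals


# Classic four-gate NAND realisation of XOR, used here as a small example.
XOR_NETLIST = [
    ("n1", ("a", "b")),
    ("n2", ("a", "n1")),
    ("n3", ("b", "n1")),
    ("y", ("n2", "n3")),
]

for a in (0, 1):
    for b in (0, 1):
        y = evaluate(XOR_NETLIST, {"a": a, "b": b})["y"]
        print(f"a={a} b={b} -> y={y}")
```

A server-side emulator in this style would re-evaluate the netlist once per system clock tick and send only the resulting pin states to the client, which is one plausible reading of why the student's machine needs so few resources.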
Pub Date : 2022-08-22 DOI: 10.1109/SBCCI55532.2022.9893262
K. Antonopoulos, D. Karadimas, Alexandros Spournias, C. Panagiotou, Ignantios Fwtiou, Ioannis Symeonidis, Christos D. Antonopoulos, Michael Hübner, N. Voros
Over the last few years, advancements in technological domains such as embedded systems, IoT, and distributed communication have greatly propelled the development of platforms and systems that enable people with disabilities to participate equally in everyday life activities. The added value of such platforms is even more pronounced when the activities are of utmost importance, such as going to the supermarket and using all the functions and services that a person without impairments can enjoy. In this context, this paper presents and analyzes an end-to-end architecture in which distributed embedded systems intelligence, distributed data communication techniques, and open data interface approaches are leveraged so that people with mobility impairments can be fully independent and functional when visiting a supermarket. In-depth analysis is offered on critical components and services such as the specialized cart movement control unit, multifaceted localization techniques, and automated pricing and merchandise management services. This work is the outcome of a Greek national research project supported by one of the biggest supermarket companies in Greece.
{"title":"A distributed Embedded Systems IoT platform and Associated services Supporting Shopping Cart for Disabled People","authors":"K. Antonopoulos, D. Karadimas, Alexandros Spournias, C. Panagiotou, Ignantios Fwtiou, Ioannis Symeonidis, Christos D. Antonopoulos, Michael Hübner, N. Voros","doi":"10.1109/SBCCI55532.2022.9893262","DOIUrl":"https://doi.org/10.1109/SBCCI55532.2022.9893262","journal":{"name":"2022 35th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design (SBCCI)"},"publicationDate":"2022-08-22","platform":"Semanticscholar","paperid":"134200723"}
Pub Date : 2022-08-22 DOI: 10.1109/SBCCI55532.2022.9893246
Reza Ranjandish
Biphasic electrical stimulation is used to prevent charge accumulation during electrical stimulation and to keep the stimulation safe. One method of providing biphasic stimulation is passive discharging by electrode shorting. However, discharging an electrode with an unknown impedance may generate a large, unsafe current that can cause unintended stimulation. This paper presents an all-digital programmable current-limited discharge circuitry for safe electrical stimulation. Implementing the current-limited discharge circuitry in the digital domain enhances the controllability of the system and reduces the complexity of the design. In addition, the proposed system detects end-of-discharge and monitors the performance of the system in real time. The correct performance of the proposed charge balancer is validated by simulation results obtained from a behavioral model of the system using ideal components.
{"title":"An All-digital Programmable Current-limited Discharge Circuitry for a Safe Electrical Stimulation","authors":"Reza Ranjandish","doi":"10.1109/SBCCI55532.2022.9893246","DOIUrl":"https://doi.org/10.1109/SBCCI55532.2022.9893246","journal":{"name":"2022 35th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design (SBCCI)"},"publicationDate":"2022-08-22","platform":"Semanticscholar","paperid":"122920954"}
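The difference between plain passive discharge and a current-limited one can be illustrated with a very small behavioral model. This is a sketch under strong assumptions and not the paper's circuit: the electrode is modelled as an ideal capacitor holding a residual voltage in series with a resistor, the limiter is an ideal clamp, end-of-discharge is a simple voltage threshold, and all component values are invented for the example.

```python
def simulate_discharge(v0, c, r, i_limit, v_end, dt=1e-6, t_max=0.1):
    """Forward-Euler model of a current-limited capacitor discharge.

    v0:      residual electrode voltage in volts (assumed value)
    c, r:    series capacitance (F) and resistance (ohm) of the ideal model
    i_limit: maximum allowed discharge current in amperes
    v_end:   end-of-discharge detection threshold in volts
    Returns (time_to_end_of_discharge_s, peak_current_a).
    """
    v, t, peak = v0, 0.0, 0.0
    while abs(v) > v_end and t < t_max:
        i_unlimited = v / r                       # what a plain short would draw
        i = max(-i_limit, min(i_limit, i_unlimited))  # ideal current clamp
        peak = max(peak, abs(i))
        v -= i * dt / c                           # dV = -I * dt / C
        t += dt
    return t, peak


# Hypothetical values: 1 V residual, 100 nF, 1 kOhm, 100 uA limit, 10 mV threshold.
t_end, i_peak = simulate_discharge(1.0, 100e-9, 1e3, 100e-6, 0.01)
print(f"end of discharge after {t_end * 1e3:.2f} ms, peak current {i_peak * 1e6:.0f} uA")
```

With these illustrative values an unlimited short would start at 1 mA (1 V across 1 kOhm), whereas the clamp holds the current at 100 uA until the voltage is low enough for the resistor alone to keep it below the limit.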
Pub Date : 2022-08-22 DOI: 10.1109/SBCCI55532.2022.9893252
Gayas Sayed, M. Kuhl
This paper presents a stochastic-binary hybrid architecture for miniaturizing FIR filters and compares it with established filter implementations. FPGA realizations of 4-, 16- and 64-tap FIR filters with 8-bit word length confirm the value of the proposed architecture in densely packed systems. FPGA resource usage drops by $\sim 53\%$ in a 4-tap FIR filter, and the architecture's area efficiency increases with the number of filter taps. This hybrid architecture, which performs multiplication in the stochastic domain and accumulation in the binary domain, requires only 2 random number generators, one for the tapped delay line and the other for the filter coefficients. Thanks to additive-recurrence-based low-discrepancy sequences in the stochastic number generators and non-scaled binary accumulation, the proposed stochastic architecture achieves best-in-class performance. This is validated via a detailed analysis of a 16-tap filter that achieves a pass-band ripple of $A_{p}=0.58\,\mathrm{dB}$ and a stop-band attenuation of $A_{st}=-31.46\,\mathrm{dB}$, making it indistinguishable from binary implementations. Additionally, a mathematical approach is presented to estimate the error in the filter output well before its actual realization. ASIC synthesis of a 16-tap FIR filter based on the proposed architecture, with an area of only 30,165 $\mu m^{2}$, is superior to all discussed filter structures: it grants a 68% area reduction compared to its binary counterpart and consumes an energy per operation of 0.429 nJ, a value at least $1.8\times$ lower than previous stochastic designs.
{"title":"Miniaturized Sign-Magnitude Stochastic-Binary FIR Filter Architecture with Enhanced Accuracy","authors":"Gayas Sayed, M. Kuhl","doi":"10.1109/SBCCI55532.2022.9893252","DOIUrl":"https://doi.org/10.1109/SBCCI55532.2022.9893252","journal":{"name":"2022 35th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design (SBCCI)"},"publicationDate":"2022-08-22","platform":"Semanticscholar","paperid":"122938718"}
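The combination of stochastic multiplication and binary accumulation can be sketched in software. The code below is an illustration of the general technique, not the authors' design: it encodes magnitudes as bitstreams by comparing them against two additive-recurrence sequences, multiplies with a bitwise AND, handles signs separately, and sums the per-tap results without scaling. The step constants (derived from the plastic number), the stream length, and all function names are assumptions chosen for the example.

```python
def additive_recurrence(alpha: float, n: int) -> list:
    """Low-discrepancy sequence r_k = (k * alpha) mod 1 for k = 1..n."""
    return [(k * alpha) % 1.0 for k in range(1, n + 1)]


def to_stream(magnitude: float, sequence: list) -> list:
    """Stochastic number generator: emit 1 whenever the sequence value is below the magnitude."""
    return [1 if r < magnitude else 0 for r in sequence]


# Two generators, one for the samples and one for the coefficients.
# The irrational steps below are an assumption made for this sketch.
_G = 1.32471795724474602596  # plastic number
STEP_X = 1.0 / _G
STEP_W = 1.0 / (_G * _G)


def sm_multiply(x: float, w: float, length: int = 1024) -> float:
    """Sign-magnitude stochastic product of x and w, both in [-1, 1]."""
    sx = to_stream(abs(x), additive_recurrence(STEP_X, length))
    sw = to_stream(abs(w), additive_recurrence(STEP_W, length))
    ones = sum(a & b for a, b in zip(sx, sw))   # AND gate acts as the multiplier
    sign = -1 if (x < 0) != (w < 0) else 1      # XOR of the sign bits
    return sign * ones / length


def fir_output(samples: list, coeffs: list, length: int = 1024) -> float:
    """One FIR output: stochastic products accumulated in ordinary (binary) arithmetic."""
    return sum(sm_multiply(x, h, length) for x, h in zip(samples, coeffs))


print(sm_multiply(0.5, -0.75))
print(fir_output([0.5, -0.25, 0.75, 0.1], [0.2, 0.4, 0.4, -0.2]))
```

Because every tap reuses the same two sequences, the sketch mirrors the abstract's point that only two generators are needed regardless of the number of taps; the accuracy of each product depends on how evenly the paired sequences cover the unit square.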
Pub Date : 2022-08-22 DOI: 10.1109/SBCCI55532.2022.9893238
C. Salazar-García, A. Chacón-Rodríguez, R. Rímolo-Donadío, R. García-Ramírez, David Solórzano-Pacheco, Jeferson González-Gómez, C. Strydis
The development of multi-FPGA systems focused on high-performance computing requires high-speed channels with low bandwidth overhead and low latency. In this paper, we propose a multi-FPGA interconnection framework aimed at distributed processing applications. Our solution allows efficient communication between different processing elements distributed among the FPGAs. To evaluate our proposal, we built a multi-FPGA system composed of five Zynq ZC706 FPGA boards capable of hosting a diverse number of coprocessors distributed over our custom network. With an aggregate bandwidth of up to 25 Gbps per FPGA board, the interconnection framework reaches a latency of only 200.36 ns, one of the lowest reported in the literature. Experimental results show a computational efficiency of 97.25% with a sustained throughput of 21.4 GFLOPS. Furthermore, the proposed network interconnection architecture is easily portable to the latest generation FPGAs. This makes the current proposal a competitive option for distributed processing in multi-FPGA systems.
{"title":"A custom interconnection multi-FPGA framework for distributed processing applications","authors":"C. Salazar-García, A. Chacón-Rodríguez, R. Rímolo-Donadío, R. García-Ramírez, David Solórzano-Pacheco, Jeferson González-Gómez, C. Strydis","doi":"10.1109/SBCCI55532.2022.9893238","DOIUrl":"https://doi.org/10.1109/SBCCI55532.2022.9893238","journal":{"name":"2022 35th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design (SBCCI)"},"publicationDate":"2022-08-22","platform":"Semanticscholar","paperid":"127072842"}
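The quoted interconnect figures can be put in perspective with two short calculations. Both rest on assumptions the abstract does not confirm: that computational efficiency means sustained throughput divided by peak throughput, and that the 25 Gbps bandwidth and 200.36 ns latency apply to the same path, so that their product gives the amount of data in flight.

```python
# Figures quoted in the abstract.
SUSTAINED_GFLOPS = 21.4
EFFICIENCY = 0.9725        # 97.25 %
BANDWIDTH_BPS = 25e9       # aggregate bandwidth per FPGA board
LATENCY_S = 200.36e-9      # reported latency


def implied_peak(sustained: float, efficiency: float) -> float:
    """Peak throughput, assuming efficiency = sustained / peak."""
    return sustained / efficiency


def bits_in_flight(bandwidth_bps: float, latency_s: float) -> float:
    """Bandwidth-delay product: data on the wire during one latency period."""
    return bandwidth_bps * latency_s


peak_gflops = implied_peak(SUSTAINED_GFLOPS, EFFICIENCY)
in_flight = bits_in_flight(BANDWIDTH_BPS, LATENCY_S)

print(f"implied peak throughput: {peak_gflops:.2f} GFLOPS")
print(f"bandwidth-delay product: {in_flight:.0f} bits ({in_flight / 8:.0f} bytes)")
```

Read this way, the communication overhead costs under 3% of the peak compute rate, and only a few hundred bytes are outstanding per board at any moment.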
Pub Date : 2022-08-22 DOI: 10.1109/SBCCI55532.2022.9893254
Mariana Siniscalchi, C. Galup-Montoro, S. Bourdel, F. Silveira
The design and simulation results of a cross-coupled LC VCO in 28 nm FD-SOI for the 4.8-5 GHz band are presented. Following previous work, 0.26 V is predicted to be the minimum supply voltage of this VCO. Simulations verify that the VCO can operate with a 0.25 V supply, consuming 90 μW with a phase noise of -90.2 dBc/Hz at 1 MHz, while complying with the requirements of the following stage. As a consequence of operating at frequencies above the transition frequency of the transistors, the predictions based on the previous study are less accurate but still provide a good starting point for the design. Moreover, lifting the previous study's hypothesis that the oscillation frequency is at least a decade below the transition frequency of the transistors has allowed the minimum supply voltage to be lowered further.
{"title":"Limits for Low Supply Voltage Operation of a 5 GHz VCO to Drive a 4-Path Mixer","authors":"Mariana Siniscalchi, C. Galup-Montoro, S. Bourdel, F. Silveira","doi":"10.1109/SBCCI55532.2022.9893254","DOIUrl":"https://doi.org/10.1109/SBCCI55532.2022.9893254","journal":{"name":"2022 35th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design (SBCCI)"},"publicationDate":"2022-08-22","platform":"Semanticscholar","paperid":"130817555"}
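The VCO numbers can be combined into the widely used oscillator figure of merit, FoM = L(Δf) − 20·log10(f0/Δf) + 10·log10(P/1 mW). The abstract does not report a FoM, so the value computed below is derived here and is not a result from the paper. It assumes a 4.9 GHz carrier (the middle of the 4.8-5 GHz band) and that "at 1 MHz" refers to the offset frequency.

```python
import math


def vco_fom(phase_noise_dbc_hz: float, f0_hz: float, offset_hz: float, power_w: float) -> float:
    """Standard VCO figure of merit in dBc/Hz (more negative is better)."""
    return (phase_noise_dbc_hz
            - 20 * math.log10(f0_hz / offset_hz)
            + 10 * math.log10(power_w / 1e-3))


# Figures quoted in the abstract, plus one assumed carrier frequency.
PHASE_NOISE_DBC_HZ = -90.2
OFFSET_HZ = 1e6
POWER_W = 90e-6
SUPPLY_V = 0.25
F0_HZ = 4.9e9              # assumption: mid-band carrier

fom = vco_fom(PHASE_NOISE_DBC_HZ, F0_HZ, OFFSET_HZ, POWER_W)
supply_current_a = POWER_W / SUPPLY_V

print(f"FoM: {fom:.2f} dBc/Hz")
print(f"supply current: {supply_current_a * 1e6:.0f} uA")
```

The same function can be re-run with 4.8 GHz or 5 GHz to see how little the band edge choice moves the result compared with the phase noise and power terms.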