2022 35th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design (SBCCI): Latest Publications

A distributed Embedded Systems IoT platform and Associated services Supporting Shopping Cart for Disabled People

Pub Date : 2022-08-22 DOI: 10.1109/SBCCI55532.2022.9893262

K. Antonopoulos, D. Karadimas, Alexandros Spournias, C. Panagiotou, Ignantios Fwtiou, Ioannis Symeonidis, Christos D. Antonopoulos, Michael Hübner, N. Voros

Over the last few years, advances in technological domains such as embedded systems, IoT, and distributed communication have greatly propelled the development of platforms and systems that enable people with disabilities to participate equally in everyday activities. The added value of such platforms is even more pronounced when the activities are of utmost importance, such as going to the supermarket and using all the functions and services available to a person without impairments. In this context, this paper presents and analyzes an end-to-end architecture in which distributed embedded-systems intelligence, distributed data communication techniques, and open data interface approaches are leveraged so that people with mobility impairments can be fully independent and functional when visiting a supermarket. In-depth analysis is offered of critical components and services, such as the specialized cart movement control unit, multifaceted localization techniques, and automated pricing and merchandise management services. This work is the outcome of a Greek national research project supported by one of the largest supermarket companies in Greece.

Citations: 0

A Virtual Board Approach for Prototyping and Teaching Digital Design

Pub Date : 2022-08-22 DOI: 10.1109/SBCCI55532.2022.9893259

Alcides S. Costa, Leonardo Droves Silveira, A. Reis

This paper presents a novel approach for prototyping digital circuits and teaching digital design. The approach uses a client-server architecture and aims to reduce the computing resources and software maintenance required of the student. The reduction occurs because the system emulates digital circuits on the server side, while the student interacts with the system through a client-side graphical user interface containing a virtual printed circuit board. In preliminary results, the proposed architecture has emulated designs of up to 8,000 two-input NAND gates at a 1 Hz system clock without loss of responsiveness.
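
As a rough illustration of what server-side emulation of a gate-level design can look like, the sketch below evaluates a small netlist of two-input NAND gates for one clock tick. The netlist format and the names `Netlist`, `evaluate`, and `XOR_NETLIST` are assumptions made for this example; the abstract does not describe the authors' actual emulation engine.

```python
from typing import Dict, List, Tuple

# Hypothetical netlist format: each gate is (output_net, input_a, input_b).
# Gates are listed in topological order, so inputs exist before they are used.
Netlist = List[Tuple[str, str, str]]


def evaluate(netlist: Netlist, inputs: Dict[str, int]) -> Dict[str, int]:
    """Evaluate a combinational netlist made only of two-input NAND gates."""
    nets = dict(inputs)  # copy so the caller's dictionary is not modified
    for out, a, b in netlist:
        nets[out] = 1 - (nets[a] & nets[b])  # NAND of two single-bit values
    return nets


# An XOR function built from four NAND gates, used here as a tiny test design.
XOR_NETLIST: Netlist = [
    ("n1", "a", "b"),
    ("n2", "a", "n1"),
    ("n3", "b", "n1"),
    ("y", "n2", "n3"),
]

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, evaluate(XOR_NETLIST, {"a": a, "b": b})["y"])
```

In a client-server setup of the kind described, a loop like this would run on the server once per emulated clock tick, and only the resulting net values would be sent to the virtual board in the browser.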

Citations: 1

A Flexible and Energy-Efficient BLAKE-256/2s Co-Processor for Blockchain-based IoT Applications

Pub Date : 2022-08-22 DOI: 10.1109/SBCCI55532.2022.9893257

Pham Hoai Luan, T. Tran, V. Le, Y. Nakashima

Developing flexible and energy-efficient BLAKE-256/2s hardware has recently become necessary because BLAKE-256 and BLAKE2s are important cryptographic hash functions for enhancing reliability and security in blockchain-based IoT applications. However, previous BLAKE-256/2s architectures struggle to achieve high flexibility and energy efficiency. Therefore, this paper proposes a BLAKE-256/2s co-processor that achieves high flexibility and energy efficiency for blockchain-based IoT applications. The proposed BLAKE-256/2s accelerator uses three novel optimization techniques to achieve those goals. First, a configurable hashing core is proposed to enhance flexibility. Second, a pipelined permutation and compression architecture is developed to improve throughput and hardware efficiency. Third, a mining transmission mechanism is introduced to optimize the performance of the co-processor at the system-on-chip level. The proposed co-processor is implemented and verified on a Xilinx Zynq UltraScale+ MPSoC ZCU102 FPGA. The power and energy efficiency of the co-processor on the ZCU102 FPGA are significantly better than those of the Intel i9 10940X CPU and the RTX 3090 GPU. Moreover, experimental results on several FPGAs show that the proposed co-processor achieves considerably higher throughput, area efficiency, and flexibility than related FPGA-based works.
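
For readers who want a software point of reference for what such a co-processor accelerates, the sketch below computes BLAKE2s digests with Python's standard `hashlib` and runs a toy nonce search. It is an illustration only: the header and nonce layout, the leading-zero-byte target, and the function names `blake2s_digest` and `mine` are assumptions for this example, not the authors' mining scheme, and BLAKE-256 is not covered because `hashlib` does not provide it.

```python
import hashlib
from typing import Optional, Tuple


def blake2s_digest(data: bytes) -> bytes:
    """Return the 32-byte BLAKE2s digest of data."""
    return hashlib.blake2s(data, digest_size=32).digest()


def mine(header: bytes, zero_bytes: int = 1,
         max_nonce: int = 1_000_000) -> Optional[Tuple[int, bytes]]:
    """Toy proof-of-work: find a nonce whose digest starts with zero bytes.

    The header || nonce layout (4-byte little-endian nonce) is hypothetical.
    """
    target = b"\x00" * zero_bytes
    for nonce in range(max_nonce):
        digest = blake2s_digest(header + nonce.to_bytes(4, "little"))
        if digest.startswith(target):
            return nonce, digest
    return None  # no nonce found within the search limit


if __name__ == "__main__":
    result = mine(b"example block header")
    if result is not None:
        nonce, digest = result
        print(nonce, digest.hex())
```

Each loop iteration is one full hash evaluation, which is the operation a pipelined hardware core would execute many times in parallel or in sequence.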

Citations: 4

Digital Defect-Oriented Test Methodology for Flipped Voltage Follower Low Dropout (LDO) Voltage Regulators

Pub Date : 2022-08-22 DOI: 10.1109/SBCCI55532.2022.9893243

M. Saikiran, Mona Ganji, Degang Chen

The low-dropout (LDO) voltage regulator is one of the most commonly used blocks in integrated circuits (ICs). In contrast to classic LDOs, flipped voltage follower (FVF) LDOs can source high load currents and provide high bandwidth (increased PSRR) owing to a fast local loop. In mission-critical applications such as automotive, industrial, and space systems, functional safety (FuSa) is a very important requirement. To emphasize this requirement, the ISO 26262 functional safety standard recommends that an automotive IC have very high defect coverage (usually greater than 90%). In this work, we propose an extremely simple and low-cost defect detection methodology for a folded FVF LDO that provides high defect coverage. Furthermore, because the proposed method is time-efficient, it can also be incorporated into wafer-level production testing of the SoC to reduce test time. The proposed design-for-test (DfT) defect detection method uses completely digital injection and detection circuits, making the method robust and easy to implement. Additionally, the digital nature of the method makes it an ideal candidate for an SoC where a digital control and monitor bus (such as IJTAG) is already available. The circuit under test (CUT) used in this work is designed in UMC 65 nm technology. In addition to the defect coverage results obtained with the proposed method, we also present, for comparison, defect coverage results for our CUT obtained with defect detection methods proposed in the literature. Transistor-level fault simulations confirm that the proposed method achieves a high fault coverage of 94% with less than 4% area overhead, making it extremely area-efficient.

Citations: 4

Edge GPU based on an FPGA Overlay Architecture using PYNQ

Pub Date : 2022-08-22 DOI: 10.1109/SBCCI55532.2022.9893229

Hector Gerardo Muñoz Hernandez, Florian Fricke, Muhammed Al Kadi, M. Reichenbach, Michael Hübner

Using Graphics Processing Units (GPUs) to accelerate applications such as Deep Neural Networks (DNNs) or image processing tasks has been gaining attention for some time because of their high performance. This has drawn interest to toolflows that no longer require hardware design expertise, making the technology accessible to as many users as possible. However, hardware knowledge is still important for making the toolflows run efficiently on the target platform. In this paper, we introduce a framework that uses Jupyter Notebooks, a browser-based interactive computing environment, and PYNQ, which serves as a high-level abstraction interface between the Programmable Logic (PL) and the Processing System (PS). Through this interface, the user can unlock the full potential of a highly customizable soft-core GPU running on a Field-Programmable Gate Array (FPGA). The framework is open source and requires only a few set-up steps to get applications running by reusing the existing Jupyter Notebooks as templates, making it ideal for fast prototyping and educational purposes. Moreover, the architecture of the target hardware can be customized to fit performance, resource utilization, and functional requirements. The framework also supports floating-point operations and can be ported to System on Chip (SoC) devices such as the Xilinx Zynq-7000 family, among others.

Citations: 0

1.2 nW Neuromorphic Enhanced Wake-Up Radio

Pub Date : 2022-08-22 DOI: 10.1109/SBCCI55532.2022.9893247

Zalfa Jouni, Thomas Soupizet, Siqi Wang, A. Benlarbi-Delai, Pietro M. Ferreira

Building low-cost devices with ultra-low-power radio capabilities is a major challenge in smart devices, because smart communication requires a permanently-on receiver. This paper proposes a wake-up radio with a neuromorphic pre-processing system, both biased in the weak-inversion region. The system can receive a 2.4 GHz signal, demodulate it, and recognize bit patterns based on the spiking frequency of a neuron. Significant performance is obtained with 1.2 nW of total power consumption, which is at least three orders of magnitude less than that of conventional RF envelope detectors. Further, the responsiveness of the spiking frequency to input power suggests that the proposed system can distinguish different signals at 2.4 GHz. The proposed system achieves an energy efficiency of 1.2 pJ/bit with a minimum detectable signal of -27 dBm.
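
The headline figures can be related by simple arithmetic. The sketch below assumes that energy per bit equals total power divided by bit rate, which the abstract does not state explicitly, and converts the minimum detectable signal from dBm to watts. The function names are hypothetical helpers for this example.

```python
def dbm_to_watts(p_dbm: float) -> float:
    """Convert a power level in dBm to watts (0 dBm = 1 mW)."""
    return 1e-3 * 10 ** (p_dbm / 10)


def implied_bit_rate(power_w: float, energy_per_bit_j: float) -> float:
    """Bit rate implied by power and energy per bit, assuming E_bit = P / rate."""
    return power_w / energy_per_bit_j


# Values quoted in the abstract.
TOTAL_POWER_W = 1.2e-9        # 1.2 nW
ENERGY_PER_BIT_J = 1.2e-12    # 1.2 pJ/bit
MIN_SIGNAL_DBM = -27.0

bit_rate = implied_bit_rate(TOTAL_POWER_W, ENERGY_PER_BIT_J)
min_signal_w = dbm_to_watts(MIN_SIGNAL_DBM)

if __name__ == "__main__":
    print(f"implied bit rate: {bit_rate:.0f} bit/s")
    print(f"minimum detectable signal: {min_signal_w * 1e6:.2f} uW")
```

Under that assumption the two power figures imply a bit rate of about 1 kbit/s, and -27 dBm corresponds to roughly 2 µW at the antenna.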

Citations: 2

An All-digital Programmable Current-limited Discharge Circuitry for a Safe Electrical Stimulation

Pub Date : 2022-08-22 DOI: 10.1109/SBCCI55532.2022.9893246

Reza Ranjandish

Biphasic electrical stimulation is used to prevent charge accumulation during electrical stimulation and to provide safe stimulation. One method of providing biphasic stimulation is passive discharging by electrode shorting. However, discharging an electrode with an unknown impedance may generate a large and unsafe current that can cause unintended stimulation. This paper presents an all-digital programmable current-limited discharge circuitry for safe electrical stimulation. Implementing the current-limited discharge circuitry in the digital domain enhances the controllability of the system and reduces the complexity of the design. In addition, the proposed system detects the end of discharge and monitors the performance of the system in real time. The correct operation of the proposed charge balancer is validated by simulation results obtained from a behavioral model of the system using ideal components.
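
To make the idea of a current-limited passive discharge concrete, the following behavioral sketch discharges a residual electrode voltage through a shorting path whose current is clamped to a programmable limit, and reports when the end of discharge is reached. The ideal capacitor-plus-resistor electrode model, all component values, and the function name `discharge` are assumptions for illustration; the abstract does not give the authors' circuit or model details.

```python
from typing import Tuple


def discharge(v0: float, c: float, r: float, i_limit: float,
              dt: float, v_end: float,
              max_steps: int = 1_000_000) -> Tuple[float, float]:
    """Simulate a current-limited discharge of a capacitor (forward Euler).

    v0      residual voltage on the electrode capacitance (V)
    c       electrode capacitance (F)
    r       resistance of the discharge path (ohm)
    i_limit programmed current limit (A)
    dt      simulation time step (s)
    v_end   end-of-discharge threshold (V)
    Returns (time until end of discharge, peak discharge current).
    """
    v = v0
    peak = 0.0
    for step in range(max_steps):
        if abs(v) <= v_end:               # end-of-discharge detection
            return step * dt, peak
        i = max(-i_limit, min(i_limit, v / r))  # clamp the shorting current
        peak = max(peak, abs(i))
        v -= i * dt / c                   # charge removed during this step
    raise RuntimeError("end of discharge not reached")


if __name__ == "__main__":
    t_end, i_peak = discharge(v0=0.5, c=100e-9, r=100.0,
                              i_limit=100e-6, dt=1e-6, v_end=1e-3)
    print(f"discharged in {t_end * 1e6:.0f} us, peak current {i_peak * 1e6:.0f} uA")
```

With these example values an unlimited short would start at 0.5 V / 100 Ω = 5 mA, whereas the clamped discharge never exceeds the programmed 100 µA, at the cost of a longer discharge time.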

Citations: 0

Miniaturized Sign-Magnitude Stochastic-Binary FIR Filter Architecture with Enhanced Accuracy

Pub Date : 2022-08-22 DOI: 10.1109/SBCCI55532.2022.9893252

Gayas Sayed, M. Kuhl

This paper presents a stochastic-binary hybrid architecture for miniaturizing FIR filters and compares it with established filter implementations. FPGA realizations of 4-, 16-, and 64-tap FIR filters with 8-bit word length confirm the value of the proposed architecture in densely packed systems. FPGA resource usage is reduced by $\sim 53\%$ in a 4-tap FIR filter, and the area efficiency of the architecture increases with the number of filter taps. This hybrid architecture, which performs multiplication in the stochastic domain and accumulation in the binary domain, requires only 2 random number generators, one for the tapped delay line and the other for the filter coefficients. Thanks to additive-recurrence-based low-discrepancy sequences in the stochastic number generators and non-scaled binary accumulation, the proposed stochastic architecture achieves best-in-class performance. This is validated by a detailed analysis of a 16-tap filter that achieves a pass-band ripple of $A_{p}=0.58$ dB and a stop-band attenuation of $A_{st}=-31.46$ dB, making it indistinguishable from binary implementations. Additionally, a mathematical approach is presented to estimate the error in the filter output well before its actual realization. ASIC synthesis of a 16-tap FIR filter based on the proposed architecture, with an area of only 30,165 $\mu m^{2}$, is superior to all discussed filter structures: it yields a 68% area reduction compared to its binary counterpart and consumes an energy per operation of 0.429 nJ, a value at least $1.8\times$ lower than previous stochastic designs.
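
The combination of stochastic multiplication and binary accumulation can be sketched in software as follows. Two additive-recurrence sequences $x_n = \{n\alpha\}$ stand in for the two number generators, one shared by all delay-line samples and one shared by all coefficients; magnitudes are multiplied by ANDing the resulting bit streams, signs are combined by XOR, and the signed counts are summed as ordinary integers without scaling. The step sizes $\sqrt{2}-1$ and $\sqrt{3}-1$, the stream length of 4096, and all function names are assumptions for this example rather than the parameters of the proposed design.

```python
import math
from typing import List, Sequence


def additive_recurrence(alpha: float, length: int) -> List[float]:
    """Low-discrepancy sequence x_n = frac(n * alpha) for n = 1..length."""
    return [(n * alpha) % 1.0 for n in range(1, length + 1)]


STREAM_LEN = 4096  # assumed bit-stream length
SEQ_X = additive_recurrence(math.sqrt(2) - 1, STREAM_LEN)  # delay-line generator
SEQ_H = additive_recurrence(math.sqrt(3) - 1, STREAM_LEN)  # coefficient generator


def sm_multiply_count(x: float, h: float) -> int:
    """Signed count of ones for the product x * h, with x and h in [-1, 1].

    A stream bit is 1 when the sequence value is below the magnitude; the
    product stream is the bitwise AND of the two magnitude streams.
    """
    mag_x, mag_h = abs(x), abs(h)
    ones = sum(1 for u, v in zip(SEQ_X, SEQ_H) if u < mag_x and v < mag_h)
    negative = (x < 0) != (h < 0)  # XOR of the two sign bits
    return -ones if negative else ones


def fir_output(samples: Sequence[float], coeffs: Sequence[float]) -> float:
    """One FIR output sample: stochastic products, binary accumulation."""
    total = sum(sm_multiply_count(x, h) for x, h in zip(samples, coeffs))
    return total / STREAM_LEN


if __name__ == "__main__":
    samples = [0.5, -0.25, 0.75, 0.1]
    coeffs = [0.2, 0.4, -0.3, 0.9]
    exact = sum(x * h for x, h in zip(samples, coeffs))
    print(f"stochastic: {fir_output(samples, coeffs):+.4f}, exact: {exact:+.4f}")
```

Because the per-tap counts are added as integers, the sum is not scaled down the way a multiplexer-based stochastic adder would scale it, which is one plausible reading of "non-scaled binary accumulation".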

Citations: 0

A custom interconnection multi-FPGA framework for distributed processing applications

Pub Date : 2022-08-22 DOI: 10.1109/SBCCI55532.2022.9893238

C. Salazar-García, A. Chacón-Rodríguez, R. Rímolo-Donadío, R. García-Ramírez, David Solórzano-Pacheco, Jeferson González-Gómez, C. Strydis

The development of multi-FPGA systems focused on high-performance computing requires high-speed channels, low bandwidth overhead, and low latency. In this paper, we propose a multi-FPGA interconnection framework aimed at distributed processing applications. Our solution allows efficient communication between different processing elements distributed among the FPGAs. To evaluate our proposal, we built a multi-FPGA system composed of five Zynq ZC706 FPGA boards capable of hosting a diverse number of coprocessors distributed over our custom network. With an aggregate bandwidth of up to 25 Gbps per FPGA board, the interconnection framework reaches a latency of only 200.36 ns, one of the lowest reported in the literature. Experimental results show a computational efficiency of 97.25% with a sustained throughput of 21.4 GFLOPS. Furthermore, the proposed network interconnection architecture is easily portable to the latest generation of FPGAs. This makes the current proposal a competitive option for distributed processing in multi-FPGA systems.
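
Two back-of-the-envelope quantities follow from the reported numbers. The sketch below computes the bandwidth-latency product (how much data is in flight on a fully loaded link) and the peak throughput implied by the efficiency figure. It assumes that computational efficiency is defined as sustained throughput divided by peak throughput, which the abstract does not confirm, and the function names are hypothetical.

```python
def bits_in_flight(bandwidth_bps: float, latency_s: float) -> float:
    """Bandwidth-latency product: bits in transit on a saturated link."""
    return bandwidth_bps * latency_s


def implied_peak(sustained: float, efficiency: float) -> float:
    """Peak throughput, assuming efficiency = sustained / peak."""
    return sustained / efficiency


# Values quoted in the abstract.
BANDWIDTH_BPS = 25e9       # up to 25 Gbps aggregate per board
LATENCY_S = 200.36e-9      # 200.36 ns
SUSTAINED_GFLOPS = 21.4
EFFICIENCY = 0.9725        # 97.25 %

in_flight_bits = bits_in_flight(BANDWIDTH_BPS, LATENCY_S)
peak_gflops = implied_peak(SUSTAINED_GFLOPS, EFFICIENCY)

if __name__ == "__main__":
    print(f"in flight: {in_flight_bits:.0f} bits ({in_flight_bits / 8:.0f} bytes)")
    print(f"implied peak: {peak_gflops:.2f} GFLOPS")
```

Under these assumptions roughly 5,000 bits (about 626 bytes) are in transit per board at full load, and the implied peak is close to 22 GFLOPS.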

Citations: 2

Limits for Low Supply Voltage Operation of a 5 GHz VCO to Drive a 4-Path Mixer

Pub Date : 2022-08-22 DOI: 10.1109/SBCCI55532.2022.9893254

Mariana Siniscalchi, C. Galup-Montoro, S. Bourdel, F. Silveira

The design and simulation results of a cross-coupled LC VCO in 28 nm FD-SOI for the 4.8 - 5 GHz band are presented. Based on a previous work, the minimum supply voltage of this VCO is predicted to be 0.26 V. Simulations verify that the VCO can operate from a 0.25 V supply, consuming 90 μW with a phase noise of -90.2 dBc/Hz at 1 MHz, while complying with the requirements of the following stage. Because the VCO operates at frequencies above the transition frequency of the transistors, the predictions based on the previous study are less accurate but still provide a good starting point for the design. Moreover, lifting the previous study's hypothesis that the oscillation frequency is at least a decade below the transition frequency of the transistors has allowed the minimum supply voltage to be lowered further.
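
To put the reported numbers in context, the sketch below evaluates the resonance frequency of an ideal LC tank and the commonly used oscillator figure of merit, $\mathrm{FoM} = L(\Delta f) - 20\log_{10}(f_0/\Delta f) + 10\log_{10}(P_{DC}/1\,\mathrm{mW})$. It assumes that the quoted 1 MHz is the offset frequency and that $f_0 = 5$ GHz, the top of the stated band; the abstract itself does not report a figure of merit, and the example inductance and capacitance are illustrative values only.

```python
import math


def lc_resonance_hz(inductance_h: float, capacitance_f: float) -> float:
    """Resonance frequency of an ideal LC tank: 1 / (2 * pi * sqrt(L * C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


def vco_fom(phase_noise_dbc_hz: float, f0_hz: float,
            offset_hz: float, power_w: float) -> float:
    """Oscillator figure of merit in dBc/Hz (more negative is better)."""
    return (phase_noise_dbc_hz
            - 20.0 * math.log10(f0_hz / offset_hz)
            + 10.0 * math.log10(power_w / 1e-3))


# Phase noise and power from the abstract; f0 and offset are assumptions.
fom = vco_fom(phase_noise_dbc_hz=-90.2, f0_hz=5e9, offset_hz=1e6, power_w=90e-6)

if __name__ == "__main__":
    print(f"FoM: {fom:.1f} dBc/Hz")
    # Illustrative tank values (not from the paper): 1 nH and 1 pF.
    print(f"LC resonance: {lc_resonance_hz(1e-9, 1e-12) / 1e9:.2f} GHz")
```

The power term dominates the result here: at 90 μW the last term contributes about -10.5 dB, which is how a modest phase noise figure can still translate into a competitive figure of merit.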

Citations: 0
