
2019 32nd Symposium on Integrated Circuits and Systems Design (SBCCI): Latest Papers

An SVM-based Hardware Accelerator for Onboard Classification of Hyperspectral Images
Pub Date : 2019-08-26 DOI: 10.1145/3338852.3339869
Lucas A. Martins, Guilherme A. M. Sborz, Felipe Viel, C. Zeferino
Hyperspectral images (HSIs) have been used in civil and military scenarios for ground recognition, urban development management, rare mineral identification, and various other purposes. However, HSIs carry a large volume of information and require high computational power, especially for real-time processing in embedded applications such as onboard computers in satellites. These issues have driven the development of hardware-based solutions able to provide the processing power necessary to meet such requirements. In this paper, we present a hardware accelerator to enhance the performance of one of the most computationally expensive stages of HSI processing: classification. We employed the Entropy Multiple Correlation Ratio procedure to select the spectral bands used in the training process. For the classification step, we applied a Support Vector Machine classifier with a Hamming Distance decision approach. The proposed custom processor was implemented on an FPGA and compared with high-level implementations. The results demonstrate that the processor has a lower silicon cost than similar solutions, classifies a pixel in real time in 0.1 ms, and achieves a state-of-the-art accuracy of 99.7%.
Citations: 5
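The abstract of the SVM accelerator entry above does not spell out how the Hamming Distance decision works. One common reading, sketched here purely as an assumption, is that several binary linear SVMs each emit one decision bit, and the class whose code word is closest in Hamming distance to the emitted bits wins. The weights, biases, code words, and the names `svm_bits`, `hamming`, and `classify` are illustrative placeholders, not values or identifiers from the paper.

```python
def svm_bits(x, weights, biases):
    """Evaluate each binary linear SVM; return one decision bit per classifier."""
    return [int(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b > 0)
            for w, b in zip(weights, biases)]


def hamming(a, b):
    """Number of positions where two equal-length bit lists differ."""
    return sum(p != q for p, q in zip(a, b))


def classify(x, weights, biases, codebook):
    """Pick the class whose code word is nearest to the SVM output bits.

    Ties resolve to the lowest class index (an arbitrary choice for this sketch).
    """
    bits = svm_bits(x, weights, biases)
    distances = [hamming(bits, code) for code in codebook]
    return distances.index(min(distances))


# Hypothetical one-vs-rest setup: 3 classes, 2 selected spectral bands.
WEIGHTS = [[-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]]
BIASES = [1.5, -1.5, -1.5]
CODEBOOK = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # one code word per class

pixel = [3.0, 0.2]
print(svm_bits(pixel, WEIGHTS, BIASES), classify(pixel, WEIGHTS, BIASES, CODEBOOK))
```

In hardware, the dot products map to multiply-accumulate units and the Hamming distance to an XOR followed by a population count, which is one reason such a decision rule is attractive for an FPGA design.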
CMOS Analog Four-Quadrant Multiplier Free of Voltage Reference Generators
Pub Date : 2019-08-26 DOI: 10.1145/3338852.3339870
Antonio José Sobrinho de Sousa, F. Andrade, Hildeloi dos Santos, Gabriele Costa Goncalves, M. D. Pereira, E. Santana, A. Cunha
This work presents a CMOS four-quadrant analog multiplier architecture for use as the synapse element in analog cellular neural networks. The circuit has voltage-mode inputs and a current-mode output and includes a signal application method that avoids voltage or current reference generators. Simulations were carried out in a 130 nm CMOS technology, showing a $\pm 50\,\mathrm{mV}$ input voltage range, $60\,\mu\mathrm{W}$ static power, and −25 dB maximum THD. The active area is $346\,\mu\mathrm{m}^{2}$.
Citations: 2
DNAr-Logic: A constructive DNA logic circuit design library in R language for Molecular Computing
Pub Date : 2019-08-26 DOI: 10.1145/3338852.3339854
Renan A. Marks, Daniel K. S. Vieira, Marcos V. Guterres, Poliana A. C. Oliveira, O. V. Neto
This paper describes DNAr-Logic, a software package in the R language that brings ease of use and scalability to the design of digital logic circuits in molecular computing, and more specifically DNA computing. These devices may be used in vitro, in vivo, or even replace CMOS technology in some applications. Using a technique known as DNA strand displacement reaction (DSD) in conjunction with chemical reaction networks (CRNs), DNA strands can be used as “wet” hardware to construct molecular logic circuits analogous to digital electronic designs. Circuits designed with DNAr-Logic can be created in a constructive manner and simulated without requiring knowledge of chemistry or of the DSD mechanism. The package implements all the main logic gates. We describe the design of a majority gate (also available in the package) and of a full-adder circuit that uses only this gate. We describe the results and simulation of our design.
Citations: 3
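To make the majority-gate full adder mentioned in the DNAr-Logic abstract concrete, here is a plain Boolean sketch of the well-known three-majority-gate construction. It is written in Python rather than with the DNAr-Logic package, whose API is not shown in this listing, and it assumes inverters are available alongside the majority gates. Whether the paper's circuit has exactly this structure is an assumption.

```python
from itertools import product


def maj(a, b, c):
    """3-input majority gate: 1 when at least two inputs are 1."""
    return int(a + b + c >= 2)


def full_adder(a, b, cin):
    """Full adder built from three majority gates plus two inverters."""
    cout = maj(a, b, cin)
    # Sum bit: majority of (NOT cout), cin, and MAJ(a, b, NOT cin).
    s = maj(1 - cout, maj(a, b, 1 - cin), cin)
    return s, cout


# Exhaustive check against integer addition over all 8 input combinations.
for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    assert 2 * cout + s == a + b + cin
    print(a, b, cin, "->", s, cout)
```

The carry is a single majority gate, which is why majority logic suits adders in technologies where that gate is the native primitive.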
An Adaptive Discrete Particle Swarm Optimization for Mapping Real-Time Applications onto Network-on-a-Chip based MPSoCs
Pub Date : 2019-08-26 DOI: 10.1145/3338852.3339835
J. B. D. Barros, R. C. Sampaio, C. Llanos
This paper presents a modified version of the well-known Particle Swarm Optimization (PSO) algorithm as an alternative to the single-objective Genetic Algorithm (GA) that is currently the state-of-the-art method for mapping real-time application tasks onto Multiprocessor Systems-on-a-Chip (MPSoCs) that use preemption-capable, wormhole-based Networks-on-a-Chip (NoCs) as their communication architecture. A statistical study based on an experimental setup was performed to compare the GA-based task mapper with the proposed method, using a real-time application as a benchmark as well as a group of randomly generated ones. Preliminary results show that our method converges faster than the GA-based method, and it even produces better results when the application utilization is smaller than the available processing capacity, i.e., when a fully schedulable mapping solution exists.
Citations: 5
A sub-1mA Highly Linear Inductorless Wideband LNA with Low IP3 sensitivity to Variability for IoT Applications
Pub Date : 2019-08-26 DOI: 10.1145/3338852.3339858
A. L. T. Costa, H. Klimach, S. Bampi
This paper proposes a wideband 0.4-2 GHz cascode common-gate LNA that can be used as a building block for a noise-canceling topology (in which its noise is canceled at the output node). The design strategy is to set the operating point by analyzing the third-order coefficient $(\alpha_{3})$ of the output current and the output voltage, using a load composed of a diode-connected PMOS transistor in parallel with a resistor. This operating point allows a reasonable $V_{GS}$ spread while maintaining a high IIP3, which implies a low IIP3 sensitivity to process variability. The design strategy also achieves a current consumption under 1 mA and, depending on the $V_{DD}$ of the technology node (130 nm CMOS in this case), it can consume under 1 mW of power. This makes the wideband LNA suitable for IoT applications. Monte Carlo simulations were carried out to demonstrate the sensitivity of the operating region to variability. They yield a worst case of $\mathrm{IIP3}_{\mu} = +0.2\,\mathrm{dBm}$ with $\sigma = 0.8\,\mathrm{dBm}$ (at 2 GHz), up to a nominal 2.75 dBm at 900 MHz, $S_{11} < -23\,\mathrm{dB}$, $\mathrm{NF} < 5.5\,\mathrm{dB}$ (canceled by virtue of its topology), a voltage gain of 11.6-14.6 dB ($S_{21} = 6.4$ to $9.4\,\mathrm{dB}$ with a buffer to $50\,\Omega$), and a power consumption of just 1.19 mW from a 1.2 V supply.
Citations: 1
An FPGA-Based Evaluation Platform for Energy Harvesting Embedded Systems
Pub Date : 2019-08-26 DOI: 10.1145/3338852.3339863
Roberto Paulo Dias Alcantara Filho, O. A. D. L. Junior, C. Júnior
Extremely low-power embedded systems are essential in Smart Cities and the Internet of Things, since these systems are responsible for acquiring, processing, and transmitting valuable environmental data. Some of these systems should run for a very long time without any human intervention, not even for battery replacement. Energy harvesting technologies allow embedded systems to be powered from the environment by converting surrounding energy sources into electrical energy. However, energy-harvesting embedded systems (EHES) depend heavily on the nature of the energy sources, which are mostly uncontrollable and unpredictable. To improve the evaluation of energy management techniques in EHES, we propose emulating the I-V curves of low-power energy harvesting transducers. An FPGA-based platform controls the energy source emulation and is combined with an integrated logic analyzer, which allows real-time data gathering from the EHES in multiple evaluation scenarios. The experiments show that the platform replicates solar energy scenarios with only 0.56% mean error.
Citations: 2
Approximate Interpolation Filters for the Fractional Motion Estimation in HEVC Encoders and their VLSI Design
Pub Date : 2019-08-26 DOI: 10.1145/3338852.3339859
Rafael da Silva, Ícaro Siqueira, M. Grellert
Motion Estimation (ME) is one of the most complex HEVC steps, consuming more than 60% of the average encoding time, most of which is spent on its fractional part (Fractional Motion Estimation, FME), in which sub-pixel samples are interpolated and searched to find motion vectors with higher precision. This paper presents hardware designs for the sub-pixel interpolation unit of the FME step. The designs employ approximate computing techniques, reducing the number of taps in each filter to cut memory accesses and hardware cost. The approximate filters were implemented in the HEVC reference software to assess their impact on coding performance. A complete interpolation architecture was implemented in VHDL and synthesized with different filter precisions and input sizes in order to assess the effect of these parameters on hardware area and performance. The approximate designs reduce the number of adders/subtractors by up to 67.65% and memory bandwidth by up to 75% with a tolerable loss in coding performance (less than 1% using the Bjontegaard Delta bitrate metric). When synthesized to an FPGA device, 52.9% fewer logic elements are required, with a modest increase in frequency.
Citations: 9
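The tap-reduction idea in the HEVC interpolation abstract above can be illustrated with a small sketch. The 8-tap coefficients are the standard HEVC luma half-sample filter. The paper's approximate coefficients are not given in this listing, so the 4-tap filter below is the HEVC chroma half-sample filter, used only as a stand-in for a reduced-tap filter. The function name `half_pel`, the border clamping, and the direct 8-bit rounding are simplifying assumptions.

```python
# Standard HEVC luma half-sample filter (coefficients sum to 64).
LUMA_HALF_8TAP = (-1, 4, -11, 40, 40, -11, 4, -1)
# Stand-in reduced-tap filter: HEVC chroma half-sample coefficients (sum 64).
# The paper's actual approximate filters are not reproduced here.
HALF_4TAP = (-4, 36, 36, -4)


def half_pel(samples, pos, taps):
    """Interpolate the half-sample between samples[pos] and samples[pos + 1]."""
    first = pos - (len(taps) // 2 - 1)  # leftmost integer sample the filter reads
    acc = 0
    for k, coeff in enumerate(taps):
        idx = min(max(first + k, 0), len(samples) - 1)  # clamp at row borders
        acc += coeff * samples[idx]
    value = (acc + 32) >> 6  # round, then divide by 64
    return min(max(value, 0), 255)  # clip to the 8-bit sample range


row = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
exact = half_pel(row, 4, LUMA_HALF_8TAP)   # reads 8 integer samples
approx = half_pel(row, 4, HALF_4TAP)       # reads 4 integer samples
print(exact, approx)
```

On a linear ramp both filters return the midpoint, so the two outputs agree here; differences appear on textured content, which is where the coding-performance loss reported in the abstract would come from. Halving the taps also halves the integer samples fetched per output in this sketch.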
Performance Evaluation of HEVC RCL Applications Mapped onto NoC-Based Embedded Platforms
Pub Date : 2019-08-26 DOI: 10.1145/3338852.3339868
Wagner Penny, D. Palomino, M. Porto, B. Zatt, L. Indrusiak
Today, several applications running on embedded systems have to fulfill soft or hard timing constraints. Video applications, such as the modern High Efficiency Video Coding (HEVC), most often have soft real-time constraints. However, in specific scenarios, such as robotic surgeries or the coupling of satellites, harder timing constraints arise and become a huge challenge. Although implementing such applications on Networks-on-Chip (NoCs) is an alternative to reduce their algorithmic complexity and meet real-time constraints, a performance evaluation of the mapped NoC and a schedulability analysis for the given application are mandatory. In this work, we evaluate the performance of the HEVC Residual Coding Loop (RCL) mapped onto a NoC-based embedded platform, considering the encoding of a single $1920 \times 1080$ pixel frame. A set of analyses exploring combinations of different NoC sizes and task mapping strategies was performed, showing, for the typical and upper-bound workload cases, the scenarios in which the application is schedulable and meets the real-time constraints.
Citations: 0
Reduction of Neural Network Circuits by Constant and Nearly Constant Signal Propagation
Pub Date : 2019-08-26 DOI: 10.1145/3338852.3339874
A. Berndt, A. Mishchenko, P. Butzen, A. Reis
This work focuses on optimizing circuits representing neural networks (NNs) in the form of and-inverter graphs (AIGs). The optimization is done by analyzing the training set of the neural network to find constant bit values at the primary inputs. The constant values are then propagated through the AIG, which results in removing unnecessary nodes. Furthermore, a trade-off between neural network accuracy and its reduction due to constant propagation is investigated by replacing with constants those inputs that are likely to be zero or one. The experimental results show a significant reduction in circuit size with negligible loss in accuracy.
Citations: 1
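The constant and nearly constant propagation described in the neural network circuit abstract above can be sketched as follows. This is a minimal illustration, not the authors' tool: it assumes AIGER-style literals (literal = 2 * variable + inversion bit, literal 0 is false, literal 1 is true), AND nodes listed in topological order, and a hypothetical `threshold` parameter for deciding when an input counts as nearly constant.

```python
def nearly_constant_inputs(samples, threshold=1.0):
    """Find primary inputs that are (nearly) constant over the training set.

    samples: list of input bit-vectors; column i corresponds to variable i + 1.
    threshold=1.0 keeps only strictly constant inputs; lower values trade
    accuracy for more reduction.
    """
    n = len(samples)
    fixed = {}
    for i in range(len(samples[0])):
        ones = sum(s[i] for s in samples)
        if ones >= threshold * n:
            fixed[i + 1] = 1
        elif ones <= (1 - threshold) * n:
            fixed[i + 1] = 0
    return fixed


def propagate_constants(ands, const_inputs):
    """Propagate fixed inputs through an AIG and drop simplified AND nodes.

    ands: list of (out_var, lit_a, lit_b) in topological order.
    Returns (kept_ands, subst), where subst maps a variable to the literal
    that replaces it.
    """
    subst = dict(const_inputs)  # constant values 0/1 coincide with literals 0/1

    def resolve(lit):
        var, inv = lit >> 1, lit & 1
        return subst[var] ^ inv if var in subst else lit

    kept = []
    for out, a, b in ands:
        a, b = resolve(a), resolve(b)
        if a == 0 or b == 0 or a == (b ^ 1):
            subst[out] = 0          # AND with false, or x AND NOT x
        elif a == 1:
            subst[out] = b          # true AND b = b
        elif b == 1 or a == b:
            subst[out] = a          # a AND true = a, a AND a = a
        else:
            kept.append((out, a, b))
    return kept, subst


# Inputs x1..x3 are variables 1..3 (literals 2, 4, 6).
# n4 = x1 AND x2, n5 = n4 AND x3, n6 = (NOT n4) AND x3.
ANDS = [(4, 2, 4), (5, 8, 6), (6, 9, 6)]
print(propagate_constants(ANDS, {1: 0}))
print(propagate_constants(ANDS, {1: 1}))
```

With x1 fixed to 0, all three AND nodes disappear and n6 collapses to x3. With x1 fixed to 1, one node is removed and the others are rewired to x2.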
Toward Nanometric Scale Integration: An Automatic Routing Approach for NML Circuits
Pub Date : 2019-08-26 DOI: 10.1145/3338852.3339862
P. A. Silva, O. P. V. Neto, J. Nacif
In recent years, many technologies have been studied to replace or complement CMOS. Some of these emerging technologies are known as Field Coupled Nanocomputing. However, these new technologies introduce the need for tools that perform circuit mapping, placement, and routing. Nanomagnetic Logic Circuit (NML) is one of these emerging technologies. It relies on the magnetization of nanomagnets to perform operations through majority logic. In this work, we propose an approach to map a gate-level circuit to an NML layout automatically. We use Breadth-First Search to perform the placement and the A* algorithm to traverse the circuit and build the routes for each node. To evaluate the effectiveness of our approach, we use a series of ISCAS'85 benchmarks. Our results show an area reduction varying from 20% to 60%.
Citations: 2
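The A* routing step mentioned in the Silva et al. abstract above can be illustrated with a generic grid router. This is a minimal sketch under stated assumptions, not the authors' tool: the function name `astar_route`, the 4-connected grid, the unit step cost, and the Manhattan-distance heuristic are all choices made here for illustration, and NML-specific constraints such as clocking zones or magnet geometry are not modeled.

```python
import heapq
from typing import List, Optional, Tuple

Cell = Tuple[int, int]


def astar_route(grid: List[str], start: Cell, goal: Cell) -> Optional[List[Cell]]:
    """Find a shortest 4-connected route on a grid where '#' marks blocked cells.

    Returns the list of cells from start to goal, or None if no route exists.
    """
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell: Cell) -> int:
        # Manhattan distance never overestimates on a unit-cost 4-connected grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]  # (estimated total, cost so far, cell)
    best_cost = {start: 0}
    came_from = {}

    while open_heap:
        _, cost, current = heapq.heappop(open_heap)
        if current == goal:
            # Walk the predecessor links back to the start.
            path = [current]
            while current in came_from:
                current = came_from[current]
                path.append(current)
            return path[::-1]
        if cost > best_cost[current]:
            continue  # stale heap entry; a cheaper route to this cell was found
        r, c = current
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                new_cost = cost + 1
                if new_cost < best_cost.get(nxt, float("inf")):
                    best_cost[nxt] = new_cost
                    came_from[nxt] = current
                    heapq.heappush(open_heap, (new_cost + heuristic(nxt), new_cost, nxt))
    return None


# A wall in the middle column forces the route around the bottom row.
layout = [".#.",
          ".#.",
          "..."]
route = astar_route(layout, (0, 0), (0, 2))
blocked = astar_route([".#."], (0, 0), (0, 2))
```

In a placement-and-routing flow, a router like this would be called once per connection, with cells already occupied by placed gates or earlier routes marked as blocked.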