
Journal of Low Power Electronics and Applications: Latest Articles

Electrical Impedance Tomography for Hand Gesture Recognition for HMI Interaction Applications
IF 2.1 Q3 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2022-07-18 DOI: 10.3390/jlpea12030041
Noelia Vaquero-Gallardo, H. Martínez-García
Electrical impedance tomography (EIT) is based on the physical principle of bioimpedance, defined as the opposition that biological tissues exhibit to the flow of a rotating alternating electrical current. Consequently, here, we propose studying the characterization and classification of bioimpedance patterns based on EIT by measuring, on the forearm with eight electrodes in a non-invasive way, the potential drops resulting from the execution of six hand gestures. The starting point was the acquisition of bioimpedance patterns studied by means of principal component analysis (PCA), validated through the cross-validation technique, and classified using the k-nearest neighbor (kNN) classification algorithm. As a result, it is concluded that reduction and classification are feasible, with a sensitivity of 0.89 in the worst case, for each of the reduced bioimpedance patterns, leading to the following direct advantage: a reduction in the number of electrodes and electronics required. In this work, bioimpedance patterns were investigated for monitoring subjects’ mobility. Existing solutions are generally based on sensor systems with moving parts, which suffer from significant wear, lack of adaptability to the patient, and lack of resolution. By contrast, the proposal implemented in this prototype, based on electrical impedance tomography, does not have these problems.
Citations: 2
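The kNN classification and worst-case sensitivity mentioned in the abstract above can be illustrated with a minimal sketch. It assumes hand-made two-dimensional feature vectors standing in for PCA-reduced bioimpedance patterns, two hypothetical gesture labels, and k = 3 with Euclidean distance; none of the data, names, or parameter values come from the paper.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def per_class_sensitivity(true_labels, predicted_labels):
    """Sensitivity (recall) per class: true positives / all samples of that class."""
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: hits[label] / totals[label] for label in totals}

# Hypothetical training set: (feature vector, gesture label)
train = [
    ((0.0, 0.0), "fist"), ((0.1, 0.2), "fist"), ((0.2, 0.1), "fist"),
    ((1.0, 1.0), "open"), ((1.1, 0.9), "open"), ((0.9, 1.2), "open"),
]
# Hypothetical test set; the last sample lies close to the "fist" cluster on purpose
test = [
    ((0.05, 0.1), "fist"), ((0.15, 0.15), "fist"),
    ((1.0, 1.1), "open"), ((0.3, 0.3), "open"),
]

true_labels = [label for _, label in test]
predicted = [knn_predict(train, features) for features, _ in test]
sensitivity = per_class_sensitivity(true_labels, predicted)
worst = min(sensitivity.values())  # the "worst case" figure is the minimum over classes
print(sensitivity, worst)
```

The worst-case sensitivity is simply the minimum over the per-class values, which is how a figure such as the 0.89 quoted in the abstract would typically be read.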
Efficiency of Priority Queue Architectures in FPGA
IF 2.1 Q3 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2022-07-14 DOI: 10.3390/jlpea12030039
L. Kohútka
This paper presents a novel SRAM-based architecture of a data structure that represents a set of multiple priority queues that can be implemented in FPGA or ASIC. The proposed architecture is based on shift registers, systolic arrays and SRAM memories. Such architecture, called MultiQueue, is optimized for minimum chip area costs, which leads to lower energy consumption too. The MultiQueue architecture has constant time complexity, constant critical path length and constant latency. Therefore, it is highly predictable and very suitable for real-time systems too. The proposed architecture was verified using a simplified version of UVM and applying millions of instructions with randomly generated input values. Achieved FPGA synthesis results are presented and discussed. These results show significant savings in FPGA Look-Up Tables consumption in comparison to existing solutions. More than 63% of Look-Up Tables can be saved using the MultiQueue architecture instead of the existing priority queues.
Citations: 2
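To illustrate why a shift-register-based priority queue can run in constant time, here is a small software model of the general technique. It is a sketch of the textbook shift-register queue, not the MultiQueue architecture from the paper; the class and method names are hypothetical. In hardware every cell would evaluate its rule in the same clock cycle, whereas this model loops over the cells.

```python
class ShiftRegisterPQ:
    """Software model of a shift-register priority queue (smallest key first)."""

    def __init__(self, capacity):
        self.cells = [None] * capacity  # cell 0 always holds the minimum

    def insert(self, key):
        old = self.cells[:]  # snapshot: all cells decide from the same old state
        if old[-1] is not None:
            raise OverflowError("queue is full")
        for i in range(len(old)):
            left = old[i - 1] if i > 0 else None
            if old[i] is not None and old[i] <= key:
                continue  # own item sorts before the new key: keep it
            if i == 0 or (left is not None and left <= key):
                self.cells[i] = key  # new key belongs exactly here
            else:
                self.cells[i] = left  # shift right to make room

    def pop_min(self):
        if self.cells[0] is None:
            raise IndexError("queue is empty")
        smallest = self.cells[0]
        self.cells = self.cells[1:] + [None]  # every cell shifts left by one
        return smallest


queue = ShiftRegisterPQ(capacity=4)
for key in (5, 3, 4, 1):
    queue.insert(key)
print(queue.cells)  # [1, 3, 4, 5]
```

Each cell only needs its own value, its left neighbour's value, and the broadcast key, which is what keeps the critical path independent of the queue size in a hardware realisation.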
Analysis and Comparison of Different Approaches to Implementing a Network-Based Parallel Data Processing Algorithm
IF 2.1 Q3 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2022-07-09 DOI: 10.3390/jlpea12030038
I. Skliarova
It is well known that network-based parallel data processing algorithms are well suited to implementation in reconfigurable hardware using either Field-Programmable Gate Arrays (FPGA) or Programmable Systems-on-Chip (PSoC). The intrinsic parallelism of these devices makes it possible to execute several data-independent network operations in parallel. However, the approaches to designing the respective systems vary significantly with the experience and background of the engineer in charge. In this paper, we analyze and compare the pros and cons of using an embedded processor, high-level synthesis methods, and register-transfer low-level design in terms of design effort, performance, and power consumption for implementing a parallel algorithm to find the two smallest values in a dataset. This problem is easy to formulate, has a number of practical applications (for instance, in low-density parity check decoders), and is very well suited to parallel implementation based on comparator networks.
Citations: 0
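The problem of finding the two smallest values with a comparator network can be sketched as a tree of merge nodes. This is one common formulation, assumed here for illustration; the paper's actual network may differ, and the function names are hypothetical. All merges within one level are data-independent, so hardware could evaluate a whole level in parallel.

```python
import math

def merge(a, b):
    """Combine two (smallest, second_smallest) pairs using two comparisons."""
    (a1, a2), (b1, b2) = a, b
    if a1 <= b1:
        return (a1, min(a2, b1))
    return (b1, min(b2, a1))

def two_smallest(values):
    """Return the two smallest values via a tree of merge nodes."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    # Leaves: each value paired with +infinity as its placeholder runner-up
    level = [(v, math.inf) for v in values]
    while len(level) > 1:
        # Merges in one level do not depend on each other (parallel in hardware)
        merged = [merge(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            merged.append(level[-1])  # odd element passes through unchanged
        level = merged
    return level[0]

print(two_smallest([7, 3, 9, 1, 4, 8]))  # (1, 3)
```

For n inputs the tree has about log2(n) levels, which is the latency a fully parallel implementation would see.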
Deep Learning Approaches to Source Code Analysis for Optimization of Heterogeneous Systems: Recent Results, Challenges and Opportunities
IF 2.1 Q3 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2022-07-05 DOI: 10.3390/jlpea12030037
Francesco Barchi, Emanuele Parisi, Andrea Bartolini, A. Acquaviva
To cope with the increasing complexity of digital systems programming, deep learning techniques have recently been proposed to enhance software deployment by analysing source code for different purposes, ranging from performance and energy improvement to debugging and security assessment. As embedded platforms for cyber-physical systems are characterised by increasing heterogeneity and parallelism, one of the most challenging and specific problems is efficiently allocating computational kernels to available hardware resources. In this field, deep learning applied to source code can be a key enabler to face this complexity. However, due to the rapid development of such techniques, it is not easy to understand which of those are suitable and most promising for this class of systems. For this purpose, we discuss recent developments in deep learning for source code analysis, and focus on techniques for kernel mapping on heterogeneous platforms, highlighting recent results, challenges and opportunities for their applications to cyber-physical systems.
Citations: 0
±0.3V Bulk-Driven Fully Differential Buffer with High Figures of Merit
IF 2.1 Q3 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2022-06-22 DOI: 10.3390/jlpea12030035
Manaswini Gangineni, J. Ramírez-Angulo, H. Vázquez-Leal, J. Huerta-Chua, A. López-Martín, R. Carvajal
A high performance bulk-driven rail-to-rail fully differential buffer operating from ±0.3V supplies in 180 nm CMOS technology is reported. It has a differential–difference input stage and common mode feedback circuits implemented with no-tail, high CMRR bulk-driven pseudo-differential cells. It operates in subthreshold, has infinite input impedance, low output impedance (1.4 kΩ), 86.77 dB DC open-loop gain, 172.91 kHz bandwidth and 0.684 μW static power dissipation with a 50-pF load capacitance. The buffer has power efficient class AB operation, a small signal figure of merit FOM_SS = 12.69 MHz·pF/μW, a large signal figure of merit FOM_LS = 34.89 (V/μs)·pF/μW, CMRR = 102 dB, PSRR+ = 109 dB, PSRR− = 100 dB, 1.1 μV/√Hz input noise spectral density, 0.3 mVrms input noise and 3.5 mV input DC offset voltage.
Citations: 2
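As a rough plausibility check on the small-signal figure of merit quoted above, the arithmetic can be reproduced under one assumption: that the figure of merit is defined as bandwidth times load capacitance divided by static power, a definition commonly used for buffers and amplifiers. The abstract does not state the definition, so the formula and variable names below are assumptions.

```python
# Values taken from the abstract
bandwidth_mhz = 0.17291  # 172.91 kHz expressed in MHz
load_pf = 50.0           # load capacitance in pF
power_uw = 0.684         # static power dissipation in microwatts

# Assumed definition: FOM_SS = bandwidth * C_load / P_static, in MHz*pF/uW
fom_ss = bandwidth_mhz * load_pf / power_uw
print(round(fom_ss, 2))  # 12.64
```

The result is close to, but not exactly, the reported 12.69; the small gap may come from rounding of the published values or from a slightly different definition in the paper.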
Implementing a Timing Error-Resilient and Energy-Efficient Near-Threshold Hardware Accelerator for Deep Neural Network Inference
IF 2.1 Q3 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2022-06-06 DOI: 10.3390/jlpea12020032
N. D. Gundi, Pramesh Pandey, Sanghamitra Roy, Koushik Chakraborty
Increasing processing requirements in the Artificial Intelligence (AI) realm have led to the emergence of domain-specific architectures for Deep Neural Network (DNN) applications. Tensor Processing Unit (TPU), a DNN accelerator by Google, has emerged as a front runner outclassing its contemporaries, CPUs and GPUs, in performance by 15×–30×. TPUs have been deployed in Google data centers to cater to the performance demands. However, a TPU’s performance enhancement is accompanied by very high power consumption. In the pursuit of lowering the energy utilization, this paper proposes PREDITOR, a low-power TPU operating in the Near-Threshold Computing (NTC) realm. PREDITOR uses mathematical analysis to mitigate the undetectable timing errors by boosting the voltage of selected multiplier-and-accumulator units at specific intervals to enhance the performance of the NTC TPU, thereby ensuring a high inference accuracy at low voltage. PREDITOR offers up to 3×–5× improved performance in comparison to the leading-edge error mitigation schemes with a minor loss in accuracy.
Citations: 1
The Potential of SoC FPAAs for Emerging Ultra-Low-Power Machine Learning
IF 2.1 Q3 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2022-06-06 DOI: 10.3390/jlpea12020033
J. Hasler
Large-scale field-programmable analog arrays (FPAA) have the potential to handle machine inference and learning applications with significantly low energy requirements, potentially alleviating the high cost of these processes today, even in cloud-based systems. FPAA devices enable embedded machine learning, one form of physical mixed-signal computing, enabling machine learning and inference on low-power embedded platforms, particularly edge platforms. This discussion reviews the current capabilities of large-scale field-programmable analog arrays (FPAA), as well as considering the future potential of these SoC FPAA devices, including questions that enable ubiquitous use of FPAA devices similar to FPGA devices. Today’s FPAA devices include integrated analog and digital fabric, as well as specialized processors and infrastructure, becoming a platform of mixed-signal development and analog-enabled computing. We address and show that next-generation FPAAs can handle the required load of 10,000–10,000,000,000 PMAC, required for present and future large fielded applications, at orders of magnitude of lower energy levels than those expected by current technology, motivating the need to develop these new generations of FPAA devices.
Citations: 4
A Methodology to Design Static NCL Libraries
IF 2.1 Q3 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2022-06-06 DOI: 10.3390/jlpea12020031
Toi Le Thanh, Lac Truong Tri, Trang Hoang
The Null Convention Logic (NCL) based asynchronous design technique has interested researchers because it overcomes disadvantages of the synchronous technique, such as noise, glitches, clock skew and power. However, using the NCL-based asynchronous design method is difficult for university students and researchers because of the lack of standard NCL cell libraries. Therefore, in this paper, a novel flow is proposed to design NCL cell libraries. These libraries are used to synthesize NCL-based asynchronous designs. We chose the static NCL cell library to illustrate the proposed design solution because this library is one of the most basic NCL libraries. Static NCL cells in this library are designed based on the Process Design Kit 45nm technology and are implemented by the Virtuoso and the Design Compiler (DC) tool. In addition, the Ocean script and Electronic Design Automation (EDA) environment are used for supporting designs and simulations. A complete library of 27 NCL cells was designed to serve for study and research. We also implemented synthesis for NCL full adders using this library and compared our synthesis results with the results of other authors. The comparison indicated that our results showed a 20% improvement in power consumption.
Citations: 1
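For readers unfamiliar with NCL cells, the behaviour of a basic threshold gate with hysteresis can be modelled in a few lines. This is general NCL background rather than anything described in the abstract: an m-of-n gate asserts its output once at least m inputs are high, deasserts only when all inputs are low, and otherwise holds its previous value. The class below is a behavioural sketch with hypothetical names; it ignores weighted-input gate variants and says nothing about the transistor-level static implementation.

```python
class ThresholdGate:
    """Behavioural model of an NCL m-of-n threshold gate with hysteresis."""

    def __init__(self, threshold, n_inputs):
        if not 1 <= threshold <= n_inputs:
            raise ValueError("threshold must be between 1 and n_inputs")
        self.threshold = threshold
        self.n_inputs = n_inputs
        self.output = 0  # gate starts in the NULL (deasserted) state

    def update(self, inputs):
        if len(inputs) != self.n_inputs:
            raise ValueError("wrong number of inputs")
        ones = sum(inputs)
        if ones >= self.threshold:
            self.output = 1  # set: enough inputs asserted
        elif ones == 0:
            self.output = 0  # reset: all inputs back to NULL
        # otherwise: hold the previous output (hysteresis)
        return self.output


th23 = ThresholdGate(threshold=2, n_inputs=3)
trace = [th23.update(v) for v in ([1, 0, 0], [1, 1, 0], [1, 0, 0], [0, 0, 0])]
print(trace)  # [0, 1, 1, 0]
```

The third step shows the hysteresis: only one input is high, yet the output stays asserted until every input has returned to zero.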
Embedded Object Detection with Custom LittleNet, FINN and Vitis AI DCNN Accelerators
IF 2.1 Q3 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2022-05-20 DOI: 10.3390/jlpea12020030
Michal Machura, M. Danilowicz, T. Kryjak
Object detection is an essential component of many systems used, for example, in advanced driver assistance systems (ADAS) or advanced video surveillance systems (AVSS). Currently, the highest detection accuracy is achieved by solutions using deep convolutional neural networks (DCNN). Unfortunately, these come at the cost of a high computational complexity; hence, the work on the widely understood acceleration of these algorithms is very important and timely. In this work, we compare three different DCNN hardware accelerator implementation methods: coarse-grained (a custom accelerator called LittleNet), fine-grained (FINN) and sequential (Vitis AI). We evaluate the approaches in terms of object detection accuracy, throughput and energy usage on the VOT and VTB datasets. We also present the limitations of each of the methods considered. We describe the whole process of DNNs implementation, including architecture design, training, quantisation and hardware implementation. We used two custom DNN architectures to obtain a higher accuracy, higher throughput and lower energy consumption. The first was implemented in SystemVerilog and the second with the FINN tool from AMD Xilinx. Next, both approaches were compared with the Vitis AI tool from AMD Xilinx. The final implementations were tested on the Avnet Ultra96-V2 development board with the Zynq UltraScale+ MPSoC ZCU3EG device. For two different DNNs architectures, we achieved a throughput of 196 fps for our custom accelerator and 111 fps for FINN. The same networks implemented with Vitis AI achieved 123.3 fps and 53.3 fps, respectively.
Citations: 4
Low-Overhead Reinforcement Learning-Based Power Management Using 2QoSM
IF 2.1 Q3 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2022-05-19 DOI: 10.3390/jlpea12020029
Michael J. Giardino, D. Schwyn, Bonnie H. Ferri, A. Ferri
With the computational systems of even embedded devices becoming ever more powerful, there is a need for more effective and proactive methods of dynamic power management. The work presented in this paper demonstrates the effectiveness of a reinforcement-learning-based dynamic power manager placed in a software framework. This combination of Q-learning for determining policy and software abstractions provides many of the benefits of co-design, namely good performance, responsiveness and application guidance, together with the flexibility of easily changing policies or platforms. The Q-learning-based Quality of Service Manager (2QoSM) is implemented on an autonomous robot built on a complex, powerful embedded single-board computer (SBC) and a high-resolution path-planning algorithm. We find that the 2QoSM reduces power consumption by up to 42% compared to the Linux on-demand governor and by 10.2% compared to a state-of-the-art situation-aware governor. Moreover, performance as measured by path error is improved by up to 6.1%, all while saving power.
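To make the Q-learning idea in this abstract concrete, here is a minimal tabular sketch of a power governor. It is not the authors' 2QoSM implementation: the three workload levels, three frequency levels, the reward shape, `MISS_PENALTY`, and the repeating workload trace are all hypothetical choices made only for illustration.

```python
import random

# Hypothetical toy model (not from the paper):
# 3 workload levels (states) and 3 CPU frequency levels (actions).
N_STATES, N_ACTIONS = 3, 3
MISS_PENALTY = 5.0  # assumed cost per level of unmet demand


def reward(state, action):
    """Negative cost: power grows with frequency, QoS misses are penalised."""
    power_cost = float(action)
    qos_miss = max(0, state - action)
    return -power_cost - MISS_PENALTY * qos_miss


def q_update(q, state, action, r, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning step towards r + gamma * max_a' Q(s', a')."""
    target = r + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy choice of a frequency level."""
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)
    row = q[state]
    return row.index(max(row))


def train(steps=20000, epsilon=0.1, seed=0):
    """Learn a workload -> frequency policy on the toy model."""
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    state = 0
    for _ in range(steps):
        action = choose_action(q, state, epsilon, rng)
        # Assumed repeating workload trace: low -> medium -> high -> low ...
        next_state = (state + 1) % N_STATES
        q_update(q, state, action, reward(state, action), next_state)
        state = next_state
    # Greedy policy: best frequency level for each workload level.
    return [row.index(max(row)) for row in q]


if __name__ == "__main__":
    print(train())
```

In this toy model the learned policy matches the frequency level to the workload level, since running faster than needed wastes power and running slower incurs the larger QoS penalty. A real governor would need a state built from measured application and platform metrics, which the abstract does not specify.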
Citations: 1