IEEE Embedded Systems Letters: Latest Articles

Dynamic Segmented Bus for Energy-Efficient Last-Level Cache in Advanced Interconnect-Dominant Nodes
IF 1.7; Zone 4 (Computer Science); Q3 (COMPUTER SCIENCE, HARDWARE & ARCHITECTURE); Pub Date: 2024-12-05; DOI: 10.1109/LES.2024.3444711
Mahta Mayahinia;Tommaso Marinelli;Zhenlin Pei;Hsiao-Hsuan Liu;Chenyun Pan;Zsolt Tokei;Francky Catthoor;Mehdi B. Tahoori
To deal with the stagnating performance and energy improvements of successive technology scaling, system-technology co-optimization (STCO) comes to the rescue: it co-optimizes the important system parameters from the high-level application all the way down to the low-level technology. This article addresses the interconnect dominance issue in advanced nodes as a bottleneck in energy-efficient static RAM (SRAM)-based last-level cache (LLC) and aims to mitigate it through an STCO mechanism. Our main approach in this work is the utilization of a workload-aware controlled dynamic segmented bus (DSB) as the intramacro (interbank) interconnect. Based on our results, our approach can improve the energy efficiency of the SRAM-based LLC by an average of 35%.
IEEE Embedded Systems Letters, vol. 16, no. 4, pp. 321-324
Citations: 0
SPELL: An End-to-End Tool Flow for LLM-Guided Secure SoC Design for Embedded Systems
IF 1.7; Zone 4 (Computer Science); Q3 (COMPUTER SCIENCE, HARDWARE & ARCHITECTURE); Pub Date: 2024-12-05; DOI: 10.1109/LES.2024.3447691
Sudipta Paria;Aritra Dasgupta;Swarup Bhunia
Modern embedded systems and Internet of Things (IoT) devices contain system-on-chips (SoCs) as their hardware backbone, which increasingly contain many critical assets (secure communication keys, configuration bits, firmware, sensitive data, etc.). These critical assets must be protected against a wide array of potential vulnerabilities to uphold the system’s confidentiality, integrity, and availability. Today’s SoC designs contain diverse intellectual property (IP) blocks, often acquired from multiple third-party IP vendors. Secure hardware design using them inevitably relies on the accrued domain knowledge of well-trained security experts. In this letter, we introduce SPELL, a novel end-to-end framework for the automated development of secure SoC designs. It leverages conversational large language models (LLMs) to automatically identify security vulnerabilities in a target SoC and map them to the evolving database of common weakness enumerations (CWEs); SPELL then filters the relevant CWEs and converts them to SystemVerilog assertions (SVAs) for verification; finally, it addresses the vulnerabilities via centralized security policy enforcement. We have implemented the SPELL framework using popular LLMs, such as ChatGPT and GEMINI, to analyze their efficacy in generating appropriate CWEs from user-defined SoC specifications and to implement corresponding security policies for an open-source SoC benchmark. We have also explored the limitations of existing pretrained conversational LLMs in this context.
IEEE Embedded Systems Letters, vol. 16, no. 4, pp. 365-368
Citations: 0
Heterogeneous Accelerator Design for Multi-DNN Workloads via Heuristic Optimization
IF 1.7; Zone 4 (Computer Science); Q3 (COMPUTER SCIENCE, HARDWARE & ARCHITECTURE); Pub Date: 2024-12-05; DOI: 10.1109/LES.2024.3443628
Konstantinos Balaskas;Heba Khdr;Mohammed Bakr Sikal;Fabian Kreß;Kostas Siozios;Jürgen Becker;Jörg Henkel
The significant advancements of deep neural networks (DNNs) in a wide range of application domains have spawned the need for more specialized, sophisticated solutions in the form of multi-DNN workloads. Heterogeneous DNN accelerators have emerged as an elegant solution to tackle the workloads’ inherent diversity, achieving significant improvements compared to homogeneous solutions. However, utilizing off-the-shelf architectures provides suboptimal adaptability to given workloads, whereas custom design approaches offer limited heterogeneity, and thus reduced gains. In this letter, we combat these shortcomings and propose an exploration-based framework to holistically design heterogeneous accelerators, tailored for multi-DNN workloads. Our framework is workload-agnostic and leverages architectural heterogeneity to its full potential by integrating low-precision arithmetic and custom structural parameters. We explore the resulting design space, aiming to minimize the system’s energy-delay product (EDP) via heuristic techniques. Our proposed accelerators achieve, on average, a significant $5.5\times$ reduction in EDP compared to the state of the art across various multi-DNN workloads.
IEEE Embedded Systems Letters, vol. 16, no. 4, pp. 317-320
Citations: 0
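The abstract of "Heterogeneous Accelerator Design for Multi-DNN Workloads via Heuristic Optimization" names the energy-delay product (EDP) as its optimization target. As a minimal sketch of that metric only: EDP is the energy consumed multiplied by the execution time, so a design can lower it by saving energy, by running faster, or both. The function names and all numbers below are hypothetical and chosen for illustration; they are not results or code from the letter.

```python
def energy_delay_product(energy_joules: float, delay_seconds: float) -> float:
    """Return the energy-delay product (EDP) in joule-seconds."""
    return energy_joules * delay_seconds


def edp_reduction(baseline_edp: float, candidate_edp: float) -> float:
    """Return how many times smaller the candidate EDP is than the baseline."""
    return baseline_edp / candidate_edp


# Hypothetical figures, not taken from the letter.
baseline = energy_delay_product(energy_joules=2.0, delay_seconds=0.5)    # 1.0 J*s
candidate = energy_delay_product(energy_joules=1.0, delay_seconds=0.25)  # 0.25 J*s
print(edp_reduction(baseline, candidate))  # 4.0
```

Because the metric is a product, halving both energy and delay in this made-up case yields a 4x reduction rather than 2x.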
Toward Precision-Aware Safe Neural-Controlled Cyber–Physical Systems
IF 1.7; Zone 4 (Computer Science); Q3 (COMPUTER SCIENCE, HARDWARE & ARCHITECTURE); Pub Date: 2024-12-05; DOI: 10.1109/LES.2024.3444004
Harikishan Thevendhriya;Sumana Ghosh;Debasmita Lohar
The safety of neural network (NN) controllers is crucial, specifically in the context of safety-critical cyber-physical system (CPS) applications. Current safety verification focuses on reachability analysis, considering bounded errors from noisy environments or inaccurate implementations. However, it assumes real-valued arithmetic and does not account for the fixed-point quantization often used in embedded systems. Some recent efforts have focused on generating sound quantized NN implementations in fixed point, ensuring specific target error bounds, but they assume the safety of the NNs is already proven. To bridge this gap, we introduce Nexus, a novel two-phase framework combining reachability analysis with sound NN quantization. Nexus provides an end-to-end solution that ensures CPS safety within bounded errors while generating mixed-precision fixed-point implementations for the NN controllers. Additionally, we optimize these implementations for automated parallelization on FPGAs using a commercial HLS compiler, reducing the machine cycles significantly.
IEEE Embedded Systems Letters, vol. 16, no. 4, pp. 397-400
Citations: 0
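The abstract of "Toward Precision-Aware Safe Neural-Controlled Cyber–Physical Systems" contrasts real-valued arithmetic with the fixed-point quantization used in embedded implementations. The sketch below is not the Nexus framework; it only illustrates, under the assumption of round-to-nearest quantization with no overflow or saturation, why each fixed-point value carries a bounded error of at most half a quantization step. The function names and the example weight are hypothetical.

```python
def quantize_fixed_point(value: float, fractional_bits: int) -> float:
    """Round value to the nearest multiple of 2**-fractional_bits."""
    scale = 1 << fractional_bits
    return round(value * scale) / scale


def worst_case_rounding_error(fractional_bits: int) -> float:
    """Half of one quantization step: the round-to-nearest error bound."""
    return 2.0 ** -(fractional_bits + 1)


# Hypothetical controller weight, quantized at two precisions.
weight = 0.7364
for bits in (4, 8):
    quantized = quantize_fixed_point(weight, bits)
    error = abs(weight - quantized)
    # The observed error never exceeds the half-step bound.
    assert error <= worst_case_rounding_error(bits)
    print(bits, quantized, error)
```

Adding fractional bits tightens the bound, which is the trade-off a mixed-precision implementation tunes per value.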
Methodology for Formal Verification of Hardware Safety Strategies Using SMT
IF 1.7; Zone 4 (Computer Science); Q3 (COMPUTER SCIENCE, HARDWARE & ARCHITECTURE); Pub Date: 2024-12-05; DOI: 10.1109/LES.2024.3439859
Anthony Faure-Gignoux;Kevin Delmas;Adrien Gauffriau;Claire Pagetti
Safety-critical embedded systems must maintain their functionality even in the presence of a single permanent hardware failure. Naive redundancy of hardware is often unaffordable and impractical; therefore, alternative strategies must be explored for minimal-cost fault tolerance. The objective of this article is to propose a methodology to formally evaluate safety strategies using satisfiability modulo theory solvers. Practically, the approach consists of providing a bounded model checking demonstration applied to the formal model of the hardware. We show the capabilities of the approach on an efficient hardware accelerator designed to perform parallel computations of matrix multiplications and convolutions.
IEEE Embedded Systems Letters, vol. 16, no. 4, pp. 381-384
Citations: 0
Co-Designing Perception-Based Autonomous Systems on CPU-GPU Platforms
IF 1.7; Zone 4 (Computer Science); Q3 (COMPUTER SCIENCE, HARDWARE & ARCHITECTURE); Pub Date: 2024-12-05; DOI: 10.1109/LES.2024.3443135
Suraj Singh;Ashiqur Rahaman Molla;Arijit Mondal;Soumyajit Dey
Perception-based autonomous system design methods are widely adopted in various domains like transportation, industrial robotics, etc. However, attaining safe and predictable execution in such systems depends on the platform-level integration of perception and control tasks. This letter presents a novel methodology to co-optimize these tasks, assuming a CPU-GPU-based real-time platform, a common choice of compute resource in this domain. Unlike traditional methods that address AI-based sensing and control concerns separately, we consider that the overall performance of the system depends on both the inferencing accuracy of the perception tasks and the performance of the control tasks iteratively executing in a feedback loop. We propose a design-space exploration methodology that accounts for this dependency and validate it on an autonomous driving use case using a novel simulation setup.
IEEE Embedded Systems Letters, vol. 16, no. 4, pp. 357-360
Citations: 0
Reducing ADC Front-End Costs During Training of On-Sensor Printed Multilayer Perceptrons
IF 1.7; Zone 4 (Computer Science); Q3 (COMPUTER SCIENCE, HARDWARE & ARCHITECTURE); Pub Date: 2024-12-05; DOI: 10.1109/LES.2024.3447412
Florentia Afentaki;Paula Carolina Lozano Duarte;Georgios Zervakis;Mehdi B. Tahoori
Printed electronics (PEs) technology offers a cost-effective and fully customizable solution to computational needs beyond the capabilities of traditional silicon technologies, offering advantages such as on-demand manufacturing and conformal, low-cost hardware. However, the low-resolution fabrication of PEs, which results in large feature sizes, poses a challenge for integrating complex designs like those of machine learning (ML) classification systems. Current literature optimizes only the multilayer perceptron (MLP) circuit within the classification system, while the cost of analog-to-digital converters (ADCs) is overlooked. Printed applications frequently require on-sensor processing, yet while the digital classifier has been extensively optimized, the analog-to-digital interfacing, specifically the ADCs, dominates the total area and energy consumption. In this letter, we target digital printed MLP classifiers and propose the design of customized ADCs for each MLP input, which involves minimizing the number of distinct values represented for each input, thus simplifying the ADC’s circuitry. Incorporating this ADC optimization in the MLP training enables eliminating ADC levels and the respective comparators, while still maintaining high classification accuracy. Our approach achieves $11.2\times$ lower ADC area for less than 5% accuracy drop across varying MLPs.
IEEE Embedded Systems Letters, vol. 16, no. 4, pp. 353-356
Citations: 0
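The abstract of "Reducing ADC Front-End Costs During Training of On-Sensor Printed Multilayer Perceptrons" links fewer distinct input values to fewer ADC levels and comparators. A rough sketch of that relationship follows, assuming a flash-style ADC in which each threshold between adjacent levels costs one comparator; the abstract does not state the ADC architecture, so this is an assumption, and the function names and the 4-bit example are hypothetical.

```python
def comparators_needed(distinct_levels: int) -> int:
    """Comparators (thresholds) needed to tell apart the given number of levels."""
    if distinct_levels < 1:
        raise ValueError("an input needs at least one level")
    return distinct_levels - 1


def full_resolution_comparators(bits: int) -> int:
    """Comparators in a conventional flash ADC with 2**bits output codes."""
    return comparators_needed(2 ** bits)


# Hypothetical example: a 4-bit input that training narrowed to 5 distinct values.
print(full_resolution_comparators(4))  # 15
print(comparators_needed(5))           # 4
```

Under this assumption, every level that training shows to be unnecessary removes one comparator from that input's converter.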
Characterizing CNN Throughput and Energy Under Multithreaded and Multiaccelerator Execution
IF 1.7; Zone 4 (Computer Science); Q3 (COMPUTER SCIENCE, HARDWARE & ARCHITECTURE); Pub Date: 2024-12-05; DOI: 10.1109/LES.2024.3446896
M A Muneeb;Rajesh Kedia
Emerging applications and batch-processing convolutional neural network (CNN) workloads require executing multiple CNNs concurrently. A wide variety of CNN accelerators are available today, and we characterize the support for CNN concurrency in such accelerators. We use a commercial off-the-shelf CNN accelerator in multithreading and multiaccelerator modes and find that up to $3.98\times$ improvement in throughput and $3.20\times$ improvement in energy per inference can be obtained even with just a single accelerator. Our detailed characterization of 104 CNN models, for three different sizes of accelerator, reveals many insights that connect CNN characteristics to improvement in throughput and energy. We also present a design space and a low-error throughput estimation model to explore such a design space.
IEEE Embedded Systems Letters, vol. 16, no. 4, pp. 369-372
Citations: 0
MONO: Enhancing Bit-Flip Resilience With Bit Homogeneity for Neural Networks
IF 1.7; Zone 4 (Computer Science); Q3 (COMPUTER SCIENCE, HARDWARE & ARCHITECTURE); Pub Date: 2024-12-05; DOI: 10.1109/LES.2024.3444921
Maryam Eslami;Yuhao Liu;Salim Ullah;Mostafa E. Salehi;Reshad Hosseini;Seyed Ahmad Mirsalari;Akash Kumar
Deep neural networks (DNNs) have been applied across diverse domains, including safety-critical applications. Past studies indicate that DNNs are very sensitive to changes in weights and activations due to the uneven bit-weight distribution in standard number formats such as fixed point, which can cause significant output accuracy fluctuations. To address this issue, we introduce a new data type called MONO to enhance bit-flip resilience using uniformity at the bit level by employing symmetric weights for all bit positions. On average, MONO has improved error resilience more effectively than the fixed-point data type, even when utilizing triple modular redundancy (TMR) and most significant bit (MSB) protection, while maintaining low overhead.
{"title":"MONO: Enhancing Bit-Flip Resilience With Bit Homogeneity for Neural Networks","authors":"Maryam Eslami;Yuhao Liu;Salim Ullah;Mostafa E. Salehi;Reshad Hosseini;Seyed Ahmad Mirsalari;Akash Kumar","doi":"10.1109/LES.2024.3444921","DOIUrl":"https://doi.org/10.1109/LES.2024.3444921","url":null,"abstract":"Deep neural networks (DNNs) have been applied across diverse domains, including safety-critical applications. Past studies indicate that DNNs are very sensitive to changes in weights and activations due to uneven bit-weight distribution in standard number formats like fixed points, which can cause significant output accuracy fluctuations. To address this issue, we introduce a new data type called MONO to enhance bit-flip resilience using uniformity at the bit level by employing symmetric weights for all bit positions. On average, MONO has improved error resilience more effectively than the fixed-point data type, even when utilizing triple modular redundancy (TMR) and most significant bit (MSB) protection, while maintaining low overhead.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"16 4","pages":"333-336"},"PeriodicalIF":1.7,"publicationDate":"2024-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142789076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Enhancing HLS Performance Prediction on FPGAs Through Multimodal Representation Learning
IF 1.7 Zone 4, Computer Science Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date : 2024-12-05 DOI: 10.1109/LES.2024.3446797
Longshan Shang;Teng Wang;Lei Gong;Chao Wang;Xuehai Zhou
The emergence of design space exploration (DSE) technology has reduced the cost of searching for pragma configurations that lead to optimal performance microarchitecture. However, obtaining synthesis reports for a single design candidate can be time-consuming, sometimes taking several hours or even tens of hours, rendering this process prohibitively expensive. Researchers have proposed many solutions to address this issue. Previous studies have focused on extracting features from a single modality, leading to challenges in comprehensively evaluating the quality of designs. To overcome this limitation, this letter introduces a novel modal-aware representation learning method for the evaluation of high-level synthesis (HLS) design, named MORPH, which integrates information from three data modalities to characterize HLS designs, including code, graph, and code description (caption) modality. Remarkably, our model outperforms the baseline, demonstrating a 6%–25% improvement in root mean squared error loss. Moreover, the transferability of our predictor has also been notably enhanced.
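The reported metric can be made concrete with a small sketch of how a root mean squared error and its relative improvement over a baseline are computed. The numbers are invented for illustration and are not taken from the paper. Reading "improvement" as a relative reduction in RMSE is an assumption, since the abstract does not define it. The function names `rmse` and `relative_improvement` are hypothetical.

```python
import math


def rmse(predicted: list[float], actual: list[float]) -> float:
    """Root mean squared error between predictions and ground truth."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def relative_improvement(baseline_rmse: float, new_rmse: float) -> float:
    """Fraction by which new_rmse is lower than baseline_rmse (assumed definition)."""
    return (baseline_rmse - new_rmse) / baseline_rmse


# Made-up latency targets and two sets of predictions, for illustration only.
actual = [10.0, 20.0, 30.0, 40.0]
baseline_pred = [14.0, 16.0, 34.0, 36.0]  # every prediction is off by 4
new_pred = [13.0, 17.0, 33.0, 37.0]       # every prediction is off by 3

baseline_rmse = rmse(baseline_pred, actual)
new_rmse = rmse(new_pred, actual)
improvement = relative_improvement(baseline_rmse, new_rmse)
print(f"baseline={baseline_rmse}, new={new_rmse}, improvement={improvement:.0%}")
```

With these toy values the RMSE falls from 4.0 to 3.0, a 25% relative reduction, which corresponds to the upper end of the range quoted in the abstract under the assumed definition.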
IEEE Embedded Systems Letters, vol. 16, no. 4, pp. 385-388. Published 2024-12-05. https://doi.org/10.1109/LES.2024.3446797
Citations: 0