
Latest publications in IEEE Embedded Systems Letters

NvMISC: Toward an FPGA-Based Emulation Platform for RISC-V and Nonvolatile Memories
IF 1.6 | CAS Region 4, Computer Science | JCR Q3, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2023-09-25 | DOI: 10.1109/LES.2023.3299202
Yuankang Zhao;Salim Ullah;Siva Satyendra Sahoo;Akash Kumar
The emerging nonvolatile memories (NVMs), such as spin transfer torque random access memory (STT-RAM) and racetrack memory (RTM), offer a promising solution to satisfy the memory and performance requirements of modern applications. Compared to the commonly utilized volatile static random-access memories (SRAMs), NVMs provide better capacity and energy efficiency. However, many of these NVMs are still in the development phase and require thorough evaluation to assess the system-level impact of their use. Therefore, there is a need to design functional- and cycle-accurate simulators/emulators to evaluate the performance of these memory technologies. To this end, this work focuses on implementing a RISC-V-based emulation platform for evaluating NVMs. The proposed framework provides interfaces to integrate various types of NVMs, with RTMs and STT-RAMs used as test cases. The efficacy of the framework is evaluated by executing benchmark applications.
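The abstract describes pluggable NVM models behind a common memory interface. A hedged software sketch of that idea follows; the real artifact is FPGA RTL, and the class names, port counts, and cycle figures here are invented purely for illustration.

```python
# Illustrative sketch (not the NvMISC RTL): two NVM timing models behind one
# interface, so an emulated core can charge each access the right latency.

class NVMModel:
    """Common interface an emulated memory controller would program against."""
    def read_cycles(self, addr):
        raise NotImplementedError
    def write_cycles(self, addr):
        raise NotImplementedError

class STTRAM(NVMModel):
    """STT-RAM: fixed, asymmetric read/write latencies (illustrative values)."""
    def read_cycles(self, addr):
        return 2
    def write_cycles(self, addr):
        return 10

class RTM(NVMModel):
    """Racetrack memory: access cost includes shifting the track so the
    target bit sits under the nearest access port."""
    def __init__(self, ports=4, track_len=64):
        self.track_len = track_len
        self.port_pos = [i * (track_len // ports) for i in range(ports)]

    def _shift(self, addr):
        offset = addr % self.track_len
        return min(abs(offset - p) for p in self.port_pos)

    def read_cycles(self, addr):
        return 1 + self._shift(addr)

    def write_cycles(self, addr):
        return 2 + self._shift(addr)
```

A benchmark run would then charge each load/store the cycles reported by whichever model is plugged in, which is how technology-specific overheads such as RTM shifts become visible at the system level.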
IEEE Embedded Systems Letters, vol. 15, no. 4, pp. 170–173.
Citations: 0
No-Multiplication Deterministic Hyperdimensional Encoding for Resource-Constrained Devices
IF 1.6 | CAS Region 4, Computer Science | JCR Q3, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2023-09-25 | DOI: 10.1109/LES.2023.3298732
Mehran Shoushtari Moghadam;Sercan Aygun;M. Hassan Najafi
Hyperdimensional vector processing is a nascent computing approach that mimics the brain structure and offers lightweight, robust, and efficient hardware solutions for different learning and cognitive tasks. For image recognition and classification, hyperdimensional computing (HDC) utilizes the intensity values of captured images and the positions of image pixels. Traditional HDC systems represent the intensity and positions with binary hypervectors of 1K–10K dimensions. The intensity hypervectors are cross-correlated for closer values and uncorrelated for distant values in the intensity range. The position hypervectors are pseudo-random binary vectors generated iteratively for the best classification performance. In this study, we propose a radically new approach for encoding image data in HDC systems. Position hypervectors are no longer needed by encoding pixel intensities using a deterministic approach based on quasi-random sequences. The proposed approach significantly reduces the number of operations by eliminating the position hypervectors and the multiplication operations in the HDC system. Additionally, we suggest a hybrid technique for generating hypervectors by combining two deterministic sequences, achieving higher classification accuracy. Our experimental results show up to a 102× reduction in runtime and significant memory-usage savings with improved accuracy compared to a baseline HDC system with conventional hypervector encoding.
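The encoding idea above — deterministic, correlation-preserving intensity hypervectors built from a quasi-random sequence, combined without multiplications — can be sketched as follows. The base-2 van der Corput construction and all parameters are our illustrative choices, not necessarily the sequence used in the letter.

```python
# Conceptual sketch (ours, not the authors' code): deterministic intensity
# hypervectors from a quasi-random sequence, bundled by majority vote --
# additions and comparisons only, no multiplies.

def van_der_corput(n, base=2):
    """n-th element of the van der Corput low-discrepancy sequence in [0, 1)."""
    q, denom = 0.0, 1.0
    while n:
        denom *= base
        n, rem = divmod(n, base)
        q += rem / denom
    return q

def intensity_hypervector(level, dim=1024, levels=256):
    """Bit d is set when the quasi-random value falls below the normalized
    intensity. Nearby levels flip few bits (stay correlated); distant levels
    flip many (decorrelate), mirroring the property described in the text."""
    thr = level / levels
    return [1 if van_der_corput(d + 1) < thr else 0 for d in range(dim)]

def bundle(hypervectors):
    """Majority-vote bundling of binary hypervectors."""
    half = len(hypervectors) / 2
    return [1 if sum(bits) > half else 0 for bits in zip(*hypervectors)]
```

Because the sequence is deterministic, no iteratively tuned position hypervectors need to be stored, and encoding reduces to comparisons and population counts.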
IEEE Embedded Systems Letters, vol. 15, no. 4, pp. 210–213.
Citations: 2
An Approximate Parallel Annealing Ising Machine for Solving Traveling Salesman Problems
IF 1.6 | CAS Region 4, Computer Science | JCR Q3, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2023-09-25 | DOI: 10.1109/LES.2023.3298739
Qichao Tao;Tingting Zhang;Jie Han
Annealing-based Ising machines have emerged as high-performance solvers for combinatorial optimization problems (COPs). As a typical COP with constraints imposed on the solution, traveling salesman problems (TSPs) are difficult to solve using conventional methods. To address this challenge, we design an approximate parallel annealing Ising machine (APAIM) based on an improved parallel annealing algorithm. In this design, adders are reused in the local field accumulator units (LAUs), with half-precision floating-point representation of the coefficients in the Ising model. The momentum scaling factor is approximated by a linear, incremental function to save hardware. To improve the solution quality, a buffer-based energy calculation unit selects the best solution among the candidates found over multiple iterations. Finally, approximate adders are applied in the design to improve the speed of accumulation in the LAUs. The design and synthesis of a 64-spin APAIM show the potential of this methodology in efficiently solving complicated constrained COPs.
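To make the parallel-annealing idea concrete, here is a toy software model (not the APAIM hardware): every spin's local field is computed from one snapshot of the state, then spins update concurrently under decaying thermal noise. The update schedule and noise model are our illustrative simplifications.

```python
# Toy parallel annealing on a small Ising model (illustrative, not APAIM).
import random

def ising_energy(J, h, s):
    """E = -sum_{i<j} J[i][j]*s_i*s_j - sum_i h[i]*s_i, spins in {-1, +1}."""
    n = len(s)
    e = -sum(h[i] * s[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            e -= J[i][j] * s[i] * s[j]
    return e

def parallel_anneal(J, h, steps=300, t0=2.0, seed=0):
    rng = random.Random(seed)
    n = len(h)
    s = [rng.choice([-1, 1]) for _ in range(n)]
    for step in range(steps):
        temp = t0 * (1 - step / steps) + 1e-3
        # local fields from the same snapshot -> all spins could update in parallel
        fields = [h[i] + sum(J[i][j] * s[j] for j in range(n) if j != i)
                  for i in range(n)]
        for i in range(n):
            if rng.random() < 0.5:  # update a random half to damp oscillations
                s[i] = 1 if fields[i] + rng.gauss(0.0, temp) > 0 else -1
    return s
```

In hardware, the per-spin field computations map naturally onto the LAUs, which is where the letter's adder reuse and approximate adders pay off.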
IEEE Embedded Systems Letters, vol. 15, no. 4, pp. 226–229.
Citations: 1
DynaFuse: Dynamic Fusion for Resource Efficient Multimodal Machine Learning Inference
IF 1.6 | CAS Region 4, Computer Science | JCR Q3, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2023-09-25 | DOI: 10.1109/LES.2023.3298738
Hamidreza Alikhani;Anil Kanduri;Pasi Liljeberg;Amir M. Rahmani;Nikil Dutt
Multimodal machine learning (MMML) applications combine results from different modalities in the inference phase to improve prediction accuracy. Existing MMML fusion strategies use static modality weight assignment, based on the intrinsic value of sensor modalities determined during the training phase. However, input data perturbations in practical scenarios affect the intrinsic value of modalities in the inference phase, lowering prediction accuracy and draining computational and energy resources. In this letter, we present dynamic fusion (DynaFuse), a framework for dynamic and adaptive fusion of MMML inference to set modality weights, considering run-time parameters of input data quality and sensor energy budgets. We determine the insightfulness of modalities by combining the design-time intrinsic value with the run-time extrinsic value of different modalities to assign updated modality weights, catering to both accuracy requirements and energy conservation demands. The DynaFuse approach achieves up to 22% gain in prediction accuracy and an average energy savings of 34% on exemplary MMML applications of human activity recognition and stress monitoring in comparison with state-of-the-art static fusion approaches.
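A minimal sketch of the weighting idea, assuming a multiplicative combination of the design-time intrinsic value with run-time input quality and energy budget; the exact combination rule and function names are our stand-ins, not the paper's formulas.

```python
# Illustrative dynamic late fusion: per-modality weights that react to
# run-time input quality and energy budgets (our toy combination rule).

def dynamic_weights(intrinsic, quality, energy_budget):
    """Per-modality weights, combined multiplicatively and normalized to 1."""
    raw = [i * q * e for i, q, e in zip(intrinsic, quality, energy_budget)]
    total = sum(raw)
    return [r / total for r in raw]

def fuse(predictions, weights):
    """Late fusion: weighted average of each modality's class-score vector."""
    return [sum(w * p[c] for w, p in zip(weights, predictions))
            for c in range(len(predictions[0]))]
```

A degraded sensor (low quality score) or a depleted energy budget then automatically shifts weight toward the remaining modalities at inference time, instead of keeping the static training-time assignment.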
IEEE Embedded Systems Letters, vol. 15, no. 4, pp. 222–225.
Citations: 0
Should We Even Optimize for Execution Energy? Rethinking Mapping for MAGIC Design Style
IF 1.6 | CAS Region 4, Computer Science | JCR Q3, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2023-09-25 | DOI: 10.1109/LES.2023.3298740
Simranjeet Singh;Chandan Kumar Jha;Ankit Bende;Phrangboklang Lyngton Thangkhiew;Vikas Rana;Sachin Patkar;Rolf Drechsler;Farhad Merchant
Memristor-based logic-in-memory (LiM) has become popular as a means to overcome the von Neumann bottleneck in traditional data-intensive computing. Recently, the memristor-aided logic (MAGIC) design style has gained immense traction for LiM due to its simplicity. However, understanding the energy distribution during the design of logic operations within the memristive memory is crucial in assessing such an implementation’s significance. The current energy estimation methods rely on coarse-grained techniques, which underestimate the energy consumption of MAGIC-styled operations performed on a memristor crossbar. To address this issue, we analyze the energy breakdown in MAGIC operations and propose a solution that utilizes mapping from the SIMPLER MAGIC tool to achieve accurate energy estimation through SPICE simulations. In contrast to existing research that primarily focuses on optimizing execution energy, our findings reveal that the memristor’s initialization energy in the MAGIC design style is, on average, 68× higher than the execution energy. We demonstrate that this initialization energy significantly dominates the overall energy consumption. By highlighting this aspect, we aim to redirect the attention of designers toward developing algorithms and strategies that prioritize optimizations in initializations rather than execution for more effective energy savings.
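A back-of-envelope model of the letter's point: if every mapped MAGIC gate pays one output-cell initialization plus one execution, and initialization averages roughly 68× the execution energy (the ratio quoted in the abstract), initialization dominates the total. The absolute energy values below are invented for illustration.

```python
# Toy energy accounting for MAGIC-style LiM (illustrative numbers; only the
# ~68x init/exec ratio comes from the abstract).

E_EXEC_PJ = 1.0                  # per-gate execution energy (assumed)
E_INIT_PJ = 68.0 * E_EXEC_PJ     # per-gate initialization energy (~68x)

def total_energy(num_gates):
    """Return (total energy in pJ, fraction spent on initialization)."""
    init = num_gates * E_INIT_PJ
    execute = num_gates * E_EXEC_PJ
    return init + execute, init / (init + execute)
```

At a 68:1 ratio, initialization accounts for about 98.6% of the total regardless of circuit size, which is why the letter argues that optimization effort belongs in initialization rather than execution.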
IEEE Embedded Systems Letters, vol. 15, no. 4, pp. 230–233.
Citations: 3
Efficient Partial Weight Update Techniques for Lightweight On-Device Learning on Tiny Flash-Embedded MCUs
IF 1.6 | CAS Region 4, Computer Science | JCR Q3, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2023-09-25 | DOI: 10.1109/LES.2023.3298731
Jisu Kwon;Daejin Park
Typical training procedures involve read and write operations for weight updates during backpropagation. However, on-device training on microcontroller units (MCUs) presents two challenges. First, the on-chip SRAM has insufficient capacity to store the weights. Second, the large flash memory, whose write access is constrained, becomes necessary to accommodate the network for on-device training on MCUs. To tackle these memory constraints, we propose a partial weight update technique based on gradient delta computation. The weights are stored in flash memory, and the part of the weights to be updated is selectively copied from flash memory to the SRAM. We implemented this approach for training a fully connected network on an on-device MNIST digit classification task using only 20-kB SRAM and 1912-kB flash memory on an MCU. The proposed technique achieves reasonable accuracy with only 18.52% partial weight updates, which is comparable to state-of-the-art results. Furthermore, we achieved a reduction of up to 46.9% in the area-power-delay product compared to a commercially available high-performance MCU capable of embedding the entire model parameters, taking into account the area scale factor.
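A conceptual sketch (not the authors' implementation) of the core mechanism: update only the fraction of weights with the largest gradient deltas, so most of the flash-resident weight array is never rewritten. The top-magnitude selection rule is our illustrative stand-in for the paper's gradient-delta criterion; the 18.52% default echoes the ratio reported in the abstract.

```python
# Illustrative partial weight update: SGD applied only to the top-|gradient|
# fraction of weights; untouched weights need no flash rewrite.

def partial_update(weights, grads, lr=0.1, update_ratio=0.1852):
    """Return new weights where only the k largest-|grad| entries change."""
    n = len(weights)
    k = max(1, int(n * update_ratio))
    # indices of the k largest gradient magnitudes
    top = set(sorted(range(n), key=lambda i: abs(grads[i]), reverse=True)[:k])
    return [w - lr * g if i in top else w
            for i, (w, g) in enumerate(zip(weights, grads))]
```

Only the selected entries would be copied into SRAM, updated, and written back, keeping both SRAM footprint and flash write traffic small.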
IEEE Embedded Systems Letters, vol. 15, no. 4, pp. 206–209.
Citations: 0
LOCoCAT: Low-Overhead Classification of CAN Bus Attack Types
IF 1.6 | CAS Region 4, Computer Science | JCR Q3, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2023-09-25 | DOI: 10.1109/LES.2023.3299217
Caio Batista de Melo;Nikil Dutt
Although research has shown vulnerabilities and shortcomings of the controller area network bus (CAN bus) and proposed alternatives, the CAN bus protocol is still the industry standard and present in most vehicles. Due to its vulnerability to potential intruders that can hinder execution or even take control of the vehicles, much work has focused on detecting intrusions on the CAN bus. However, most of the literature does not provide mechanisms to reason about or respond to the attacks so that the system can continue to execute safely despite the intruder. This letter proposes a low-overhead methodology to automatically classify intrusions into predefined types once detected. Our framework: 1) groups messages of the same attacks into blocks; 2) extracts relevant features from each block; and 3) predicts the type of attack using a lightweight classifier model. The initial models depicted in this letter show an accuracy of up to 99.16% within the first 50 ms of the attack, allowing the system to quickly react to the intrusion before the malicious actor can conclude their attack. We believe this letter lays the groundwork for vehicles to have specialized runtime reactions based on the attack type.
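The three-stage pipeline (block grouping, per-block features, lightweight classifier) can be sketched as below. The feature set and the nearest-centroid model are our stand-ins for illustration; the letter does not publish this exact code.

```python
# Illustrative LOCoCAT-style pipeline: window CAN traffic into blocks,
# extract cheap features, classify with a tiny nearest-centroid model.

def group_into_blocks(messages, window_ms=50.0):
    """messages: time-sorted (timestamp_ms, can_id, payload_len) tuples,
    grouped into fixed windows measured from the first message."""
    if not messages:
        return []
    start = messages[0][0]
    blocks = {}
    for ts, cid, plen in messages:
        blocks.setdefault(int((ts - start) // window_ms), []).append((ts, cid, plen))
    return [blocks[k] for k in sorted(blocks)]

def block_features(block):
    """Cheap per-block features: message rate, unique-ID count, mean payload."""
    n = len(block)
    span_ms = max(1e-3, block[-1][0] - block[0][0])
    return (n / span_ms,
            len({cid for _, cid, _ in block}),
            sum(plen for _, _, plen in block) / n)

def classify(feat, centroids):
    """Nearest-centroid classifier over the feature space."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist(feat, centroids[label]))
```

Because every stage is counting and comparison, such a pipeline can run within the tight latency budget the abstract targets (a decision inside the first 50 ms of an attack).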
IEEE Embedded Systems Letters, vol. 15, no. 4, pp. 178–181.
Citations: 0
CNN Workloads Characterization and Integrated CPU–GPU DVFS Governors on Embedded Systems
IF 1.6 | CAS Region 4, Computer Science | JCR Q3, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2023-09-25 | DOI: 10.1109/LES.2023.3299335
Meruyert Karzhaubayeva;Aidar Amangeldi;Jurn-Gyu Park
Dynamic power management (DPM) techniques on mobile systems are indispensable for deep learning (DL) inference optimization, which is mainly performed on battery-based mobile and/or embedded platforms with constrained resources. To this end, we characterize CNN workloads using object detection applications of YOLOv4/-tiny and YOLOv3/-tiny, and then propose integrated CPU–GPU DVFS governor policies that scale integrated pairs of CPU and GPU frequencies to improve energy–delay product (EDP) with negligible inference execution time degradation. Our results show up to 16.7% EDP improvements with negligible (mostly less than 2%) performance degradation using object detection applications on NVIDIA Jetson TX2.
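An illustrative sketch of what choosing an integrated CPU–GPU frequency pair by predicted energy–delay product (EDP) could look like. The frequency tables and the analytical delay/power model are assumptions made up for the sketch, not measurements from the Jetson TX2 or the paper's governor policy.

```python
# Toy integrated DVFS decision: enumerate CPU/GPU frequency pairs and pick
# the one minimizing a modeled EDP (all numbers illustrative).

CPU_FREQS = [0.8, 1.2, 2.0]   # GHz (assumed operating points)
GPU_FREQS = [0.5, 0.9, 1.3]   # GHz

def predicted_edp(cpu_f, gpu_f, cpu_work=1.0, gpu_work=2.0):
    """Toy model: delay is set by the slower side; power grows ~f^2."""
    delay = max(cpu_work / cpu_f, gpu_work / gpu_f)
    power = 0.5 * cpu_f ** 2 + 1.0 * gpu_f ** 2
    return (power * delay) * delay   # energy x delay

def best_pair():
    """Governor decision: the frequency pair minimizing predicted EDP."""
    return min(((c, g) for c in CPU_FREQS for g in GPU_FREQS),
               key=lambda p: predicted_edp(*p))
```

In this toy model the GPU-bound workload pushes the governor toward a high GPU frequency while the CPU can stay at a low one, which is the kind of coupled decision a static per-device governor misses.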
IEEE Embedded Systems Letters, vol. 15, no. 4, pp. 202–205.
Citations: 0
Swift-CNN: Leveraging PCM Memory’s Fast Write Mode to Accelerate CNNs
IF 1.6 | CAS Region 4, Computer Science | JCR Q3, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2023-09-25 | DOI: 10.1109/LES.2023.3298742
Lokesh Siddhu;Hassan Nassar;Lars Bauer;Christian Hakert;Nils Hölscher;Jian-Jia Chen;Joerg Henkel
Nonvolatile memories [especially phase change memories (PCMs)] offer scalability and higher density. However, reduced write performance has limited their use as main memory. Researchers have explored using the fast write mode available in PCM to alleviate the challenges. The fast write mode offers lower write latency and energy consumption. However, the fast-written data are retained for a limited time and need to be refreshed. Prior works perform fast writes when the memory is busy and use slow writes to refresh the data during memory idle phases. Such policies do not consider the retention time requirement of a variable and repeat all the writes made during the busy phase. In this work, we suggest a retention-time-aware selection of write modes. As a case study, we use convolutional neural networks (CNNs) and present a novel algorithm, Swift-CNN, that assesses each CNN layer’s memory access behavior and retention time requirement and suggests an appropriate PCM write mode. Our results show that Swift-CNN decreases inference and training execution time and memory energy compared to state-of-the-art techniques and achieves execution time close to the ideal (fast write-only) policy.
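A hedged sketch of the policy idea: pick the PCM write mode per access from its required retention time. The latency, energy, and retention numbers below are invented for illustration; only the fast-write-versus-slow-write trade-off mirrors the text.

```python
# Illustrative retention-time-aware write-mode selection for PCM
# (all latency/energy/retention figures are assumptions, not measurements).

FAST = {"latency_ns": 50, "energy_pj": 10, "retention_s": 1.0}
SLOW = {"latency_ns": 150, "energy_pj": 30, "retention_s": 10 * 365 * 24 * 3600.0}

def choose_write_mode(required_retention_s):
    """Fast write when the data's lifetime fits the fast mode's retention;
    otherwise use the slow, long-retention write (no refresh needed)."""
    return "fast" if required_retention_s <= FAST["retention_s"] else "slow"

def write_cost(retention_requirements_s):
    """Total (latency_ns, energy_pj) for a sequence of writes under the policy."""
    lat = energy = 0
    for r in retention_requirements_s:
        mode = FAST if choose_write_mode(r) == "fast" else SLOW
        lat += mode["latency_ns"]
        energy += mode["energy_pj"]
    return lat, energy
```

Short-lived data such as intermediate CNN activations would take the cheap fast writes and expire before a refresh is due, while long-lived weights take slow writes once, avoiding the blanket refresh traffic of busy/idle-phase policies.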
非易失性存储器(特别是相变存储器,PCM)提供可扩展性和更高的密度。然而,写性能的降低限制了它们作为主存的使用。研究人员已经探索了使用PCM中可用的快速写入模式来缓解挑战。快速写模式具有较低的写时延和较低的写能耗。但是,快速写入的数据保留有限的时间,并且需要刷新。先前的工作在内存繁忙时执行快速写入,并在内存空闲阶段使用慢速写入来刷新数据。这种策略不考虑变量的保留时间要求,而是重复繁忙阶段的所有写操作。在这项工作中,我们建议对写入模式进行保留时间感知选择。作为案例研究,我们使用卷积神经网络(CNN)并提出了一种新的算法Swift-CNN,该算法评估每个CNN层的内存访问行为和保留时间要求,并建议适当的PCM写入模式。我们的结果表明,与最先进的技术相比,Swift-CNN减少了推理和训练的执行时间和内存能量,并实现了接近理想(快速只写)策略的执行时间。
{"title":"Swift-CNN: Leveraging PCM Memory’s Fast Write Mode to Accelerate CNNs","authors":"Lokesh Siddhu;Hassan Nassar;Lars Bauer;Christian Hakert;Nils Hölscher;Jian-Jia Chen;Joerg Henkel","doi":"10.1109/LES.2023.3298742","DOIUrl":"10.1109/LES.2023.3298742","url":null,"abstract":"Nonvolatile memories [especially phase change memories (PCMs)] offer scalability and higher density. However, reduced write performance has limited their use as main memory. Researchers have explored using the fast write mode available in PCM to alleviate the challenges. The fast write mode offers lower write latency and energy consumption. However, the fast-written data are retained for a limited time and need to be refreshed. Prior works perform fast writes when the memory is busy and use slow writes to refresh the data during memory idle phases. Such policies do not consider the retention time requirement of a variable and repeat all the writes made during the busy phase. In this work, we suggest a retention-time-aware selection of write modes. As a case study, we use convolutional neural networks (CNNs) and present a novel algorithm, Swift-CNN, that assesses each CNN layer’s memory access behavior and retention time requirement and suggests an appropriate PCM write mode. Our results show that Swift-CNN decreases inference and training execution time and memory energy compared to state-of-the-art techniques and achieves execution time close to the ideal (fast write-only) policy.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"15 4","pages":"234-237"},"PeriodicalIF":1.6,"publicationDate":"2023-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135700343","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
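The core idea in the Swift-CNN abstract above — choosing a PCM write mode per CNN layer based on the retention time its data actually needs — can be sketched in a few lines. This is a minimal sketch under stated assumptions: the fast-mode retention value and the per-layer lifetimes are hypothetical, and the letter's actual policy additionally models each layer's memory access behavior.

```python
# Minimal sketch of a retention-time-aware write-mode choice in the
# spirit of Swift-CNN. A value written with the fast (low-latency,
# low-energy) PCM mode is only safe if it is last read before the
# fast-mode retention expires; otherwise it would need refreshes,
# and the slow mode is the better choice. Numbers are hypothetical.

FAST_RETENTION_S = 1.0  # assumed retention of the fast write mode

def choose_write_mode(lifetime_s):
    """Pick 'fast' when the data's lifetime (write to last read) fits
    within fast-mode retention, so no refresh is needed; else 'slow'."""
    return "fast" if lifetime_s <= FAST_RETENTION_S else "slow"

# Example lifetimes: activations are consumed by the next layer almost
# immediately; weights must persist for a whole training/inference run.
layers = {"conv1_act": 0.02, "fc_weights": 30.0}
modes = {name: choose_write_mode(t) for name, t in layers.items()}
# conv1_act -> fast, fc_weights -> slow
```

The design point this illustrates is the one the abstract criticizes in prior work: a policy keyed to memory busy/idle phases rewrites everything, whereas keying the decision to each variable's retention requirement avoids refreshing data that never needed the durable mode.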
On Automating FPGA Design Build Flow Using GitLab CI 使用 GitLab CI 实现 FPGA 设计构建流程自动化
IF 1.6 4区 计算机科学 Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date : 2023-09-12 DOI: 10.1109/LES.2023.3314148
Chimezie Eguzo;Benedikt Scherer;Daniel Keßel;Ilja Bekman;Matthias Streun;Mario Schlosser;Stefan van Waasen
Building and testing software for embedded systems can be challenging, with impacts on delivery time, design reproducibility, and collaboration among project contributors. To accelerate project development, presented here is an automated build flow that utilizes Xilinx PetaLinux and field programmable gate array (FPGA) hardware descriptions, and integrates with the GitLab continuous integration and continuous deployment (CI/CD) framework for embedded targets. This build flow automates the complete process of FPGA implementation, PetaLinux configuration, and cross-compilation of software essentials for the target system-on-chip (SoC). The system has been successfully deployed in cross-compiling the control and command toolset for the Positron Emission Tomography scanner (PhenoPET) and in implementing the message queuing telemetry transport (MQTT) service on a Xilinx Zynq Ultrascale MPSoC. This approach can be easily adapted to other projects with specific requirements.
嵌入式系统软件的构建和测试具有挑战性,会对交付时间、设计可重复性和项目贡献者之间的协作产生影响。为了加快项目开发,本文介绍了一种自动构建流程,它利用赛灵思 PetaLinux 和现场可编程门阵列(FPGA)硬件描述,并与针对嵌入式目标的 GitLab 持续集成和持续部署(CI/CD)框架集成。该构建流程可自动完成 FPGA 实施、PetaLinux 配置和目标片上系统(SoC)软件要件交叉编译的整个过程。该系统已成功应用于正电子发射断层扫描仪(PhenoPET)控制和命令工具集的交叉编译,以及 Xilinx Zynq Ultrascale MPSoC 上消息队列遥测传输(MQTT)服务的实施。这种方法可轻松适用于具有特定要求的其他项目。
{"title":"On Automating FPGA Design Build Flow Using GitLab CI","authors":"Chimezie Eguzo;Benedikt Scherer;Daniel Keßel;Ilja Bekman;Matthias Streun;Mario Schlosser;Stefan van Waasen","doi":"10.1109/LES.2023.3314148","DOIUrl":"10.1109/LES.2023.3314148","url":null,"abstract":"Building and testing software for embedded systems can be challenging with an impact on delivery time, design reproducibility, and collaboration among project contributors. To accelerate project development, presented here is an automated build flow that utilizes Xilinx PetaLinux, and field programmable gate array (FPGA) hardware description and integrates with the GitLab continuous integration and continuous deployment (CI/CD) framework for embedded targets. This build flow automates the complete process of FPGA implementation, PetaLinux configuration, and cross-compilation of software essentials for the target system-on-chip (SoC). The system has been successfully deployed in cross-compiling the control and command toolset for the Positron Emission Tomography scanner (PhenoPET) and the implementation of the message queuing telemetry transport (MQTT) service on a Xilinx Zynq Ultrascale MPSoC. This approach can be easily adapted to other projects with specific requirements.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"16 2","pages":"227-230"},"PeriodicalIF":1.6,"publicationDate":"2023-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135400128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
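The three-step flow the abstract above describes (FPGA implementation, PetaLinux configuration, cross-compilation) maps naturally onto GitLab CI stages. The fragment below is a hypothetical `.gitlab-ci.yml` sketch of such a pipeline, not the authors' actual configuration: the stage names, runner tags, script file names, and artifact paths are all assumptions, though `vivado -mode batch -source`, `petalinux-config --get-hw-description`, and `petalinux-build` are the standard Xilinx tool invocations.

```yaml
# Hypothetical .gitlab-ci.yml sketch of an FPGA + PetaLinux build flow.
# Stage names, tags, file names, and paths are illustrative assumptions.
stages:
  - fpga_bitstream
  - petalinux
  - cross_compile

fpga_bitstream:
  stage: fpga_bitstream
  tags: [vivado]                 # runner with the Xilinx toolchain installed
  script:
    - vivado -mode batch -source build_bitstream.tcl
  artifacts:
    paths: [build/system.xsa]    # hardware description handed to PetaLinux

petalinux:
  stage: petalinux
  tags: [petalinux]
  script:
    - petalinux-config --get-hw-description build/system.xsa --silentconfig
    - petalinux-build
  artifacts:
    paths: [images/linux/]       # kernel, rootfs, boot images

cross_compile:
  stage: cross_compile
  script:
    - source environment-setup-*          # PetaLinux/Yocto SDK environment
    - make -C control_toolset             # target-SoC binaries (e.g., MQTT service)
```

Because each stage publishes artifacts consumed by the next, a change to the hardware description automatically triggers a reproducible rebuild of the whole stack, which is the delivery-time and reproducibility benefit the letter emphasizes.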