
Latest Publications in IEEE Embedded Systems Letters

Empowering Edge Devices With Processing-in-Memory for On-Device Language Inference
IF 2.0 | CAS Tier 4, Computer Science | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-02-05 | DOI: 10.1109/LES.2025.3538827
Jimin Lee;Soonhoi Ha
The rapid advancement of deep learning (DL) models has created a pressing need for efficient on-device DL solutions, particularly on edge devices with limited resources. Processing-in-memory (PIM) is considered a promising technology for addressing the worsening memory-wall problem, as it integrates processing capabilities directly into memory modules. This letter evaluates the potential of Samsung's PIM technology for enhancing the performance of on-device language inference. We assess the impact of PIM on the inference stage of three transformer models, Gemma, Qwen2, and TinyBERT, demonstrating an average 1.92x speed-up in end-to-end latency over the CPU by offloading all linear layers to PIM. Notably, Qwen2, whose characteristics are favorable to PIM, achieves a 1.25x speed-up in end-to-end latency over the GPU. Our findings emphasize the importance of understanding model characteristics for effective PIM deployment. The results demonstrate the PIM solution's efficiency in enabling on-device language models and its potential for edge deployment.
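The end-to-end gain from offloading only the linear layers is bounded by how much of total inference time those layers occupy. A minimal Amdahl-style sketch of this relationship (the 80% linear-layer fraction and 4x per-layer PIM speedup below are illustrative assumptions, not figures from the letter):

```python
# Hypothetical Amdahl-style model of PIM offload speedup.
# linear_fraction and pim_speedup are illustrative assumptions.

def offload_speedup(linear_fraction: float, pim_speedup: float) -> float:
    """End-to-end speedup when only the linear layers run on PIM."""
    remaining = (1.0 - linear_fraction) + linear_fraction / pim_speedup
    return 1.0 / remaining

# If 80% of CPU inference time is linear layers and PIM runs them 4x
# faster, the end-to-end gain is well below 4x.
print(offload_speedup(0.8, 4.0))  # 2.5x
```

This is why the letter stresses model characteristics: models whose runtime is dominated by linear layers benefit most from PIM offload.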
IEEE Embedded Systems Letters, vol. 17, no. 4, pp. 244–247. Citations: 0
Hybrid Recursive Karatsuba Multiplications on FPGAs
IF 2.0 | CAS Tier 4, Computer Science | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-02-04 | DOI: 10.1109/LES.2025.3538470
Monalisa Das;Babita Jajodia
The demand for large-integer polynomial multiplication has become increasingly significant in modern cryptographic algorithms. The practical implementation of such multipliers is an active research area focused on optimizing hardware designs for space and time complexity. In this letter, the authors propose an efficient polynomial multiplier based on a hybrid recursive Karatsuba multiplication (HRKM) algorithm. The overall performance of the proposed design is evaluated using the area-time-product (ATP) metric. The proposed architecture is implemented on a Virtex-7 FPGA device using the Xilinx ISE platform. Hardware implementation results show that, compared to hybrid (nonrecursive) Karatsuba multiplication, the proposed HRKM architecture achieves ATP reductions of 67.885%, 70.128%, and 65.869% for 128-, 256-, and 512-bit operands, respectively.
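The recursion the HRKM hardware accelerates can be sketched in a few lines: Karatsuba replaces one n-bit multiply with three half-width multiplies, and a hybrid scheme falls back to schoolbook multiplication below a cutoff. A minimal software reference (the cutoff value is an arbitrary illustrative choice):

```python
# Recursive Karatsuba with a schoolbook base case, mirroring the
# "hybrid" idea of switching algorithms below a bit-width cutoff.

def karatsuba(x: int, y: int, cutoff_bits: int = 8) -> int:
    if x.bit_length() <= cutoff_bits or y.bit_length() <= cutoff_bits:
        return x * y                                  # schoolbook base case
    m = max(x.bit_length(), y.bit_length()) // 2
    xh, xl = x >> m, x & ((1 << m) - 1)               # split operands
    yh, yl = y >> m, y & ((1 << m) - 1)
    z2 = karatsuba(xh, yh, cutoff_bits)               # high product
    z0 = karatsuba(xl, yl, cutoff_bits)               # low product
    z1 = karatsuba(xh + xl, yh + yl, cutoff_bits) - z2 - z0  # cross term
    return (z2 << (2 * m)) + (z1 << m) + z0

a, b = 0xDEADBEEFCAFEBABE, 0x0123456789ABCDEF
assert karatsuba(a, b) == a * b
```

Only three recursive multiplies are issued per level (z2, z0, and the combined z1 term), which is the source of Karatsuba's sub-quadratic complexity.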
IEEE Embedded Systems Letters, vol. 17, no. 4, pp. 240–243. Citations: 0
Adversarial Attack Bypass by Stochastic Computing
IF 2.0 | CAS Tier 4, Computer Science | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-02-04 | DOI: 10.1109/LES.2025.3538552
Faeze S. Banitaba;Sercan Aygun;Mehran Shoushtari Moghadam;Amirhossein Jalilvand;Bingzhe Li;M. Hassan Najafi
Deep learning excels by utilizing vast datasets and sophisticated training algorithms, achieving superior performance on many machine learning challenges compared to traditional methods. However, deep neural networks (DNNs) are not flawless; they are particularly susceptible to adversarial samples during the inference phase. These inputs are deliberately crafted by attackers to exploit the networks' vulnerabilities and cause DNNs to make incorrect classifications. This letter proposes a novel perspective for fortifying neural network (NN) defenses against adversarial attacks. We enhance NN security by employing an emerging model of computation, namely stochastic computing (SC). We show that strengthening an NN with SC counteracts the adverse effects of these attacks on the NN output and adds a vital defense layer. Our evaluation results reveal that SC notably increases NN robustness and decreases susceptibility to interference, creating secure, reliable NN systems. The proposed method improves accuracy and reduces hardware footprint and energy consumption by up to 85%, 88%, and 95%, respectively.
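For readers unfamiliar with SC, its core trick is encoding a value in [0, 1] as the density of 1s in a random bitstream, so a single AND gate multiplies two values. A minimal unipolar sketch (stream length and seed are illustrative choices, not parameters from the letter):

```python
# Unipolar stochastic computing: multiply two probabilities with AND.
import random

def to_stream(value: float, length: int, rng: random.Random) -> list:
    # Bitstream whose fraction of 1s approximates `value`.
    return [1 if rng.random() < value else 0 for _ in range(length)]

def sc_multiply(a: float, b: float, length: int = 1 << 14, seed: int = 0) -> float:
    rng = random.Random(seed)
    sa = to_stream(a, length, rng)
    sb = to_stream(b, length, rng)
    anded = [x & y for x, y in zip(sa, sb)]   # one AND gate per bit
    return sum(anded) / length                # decode by counting 1s

print(sc_multiply(0.5, 0.5))  # approximately 0.25
```

The inherent randomness of the encoding is also what blurs the precisely tuned perturbations an adversarial sample relies on.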
IEEE Embedded Systems Letters, vol. 17, no. 4, pp. 234–239. Citations: 0
A Template-Based Methodology for Efficient DNNs Inference on FPGA Devices With HW-SW Co-Design
IF 2.0 | CAS Tier 4, Computer Science | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-02-03 | DOI: 10.1109/LES.2025.3538159
Swati;Shantanu Banarjee;Pinalkumar Engineer
Convolutional neural networks (CNNs) are the epitome of artificial intelligence (AI)-based applications. The computationally intensive convolution operation is the core of the entire architecture. Accelerating CNN-based applications requires several algorithm-level manipulations and optimizations for resource-constrained devices. In this work, we propose a template-based methodology for CNN acceleration on field-programmable gate array (FPGA) hardware by designing reusable cores for individual layers such as convolution, pooling, and dense layers. We explored various optimization techniques, including data reuse and design-space exploration, to arrive at the best hardware design strategy. We verified our methodology on LeNet-5 with 5×5 kernels and on a custom CNN with 3×3 kernels for classification. The hardware-system design was validated on a Xilinx XC7Z020 FPGA. Our proposed methodology achieves 2.9 GOP/s, outperforming an existing implementation by 1.28×.
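The arithmetic such a reusable convolution core performs is a nest of multiply-accumulate (MAC) loops. A plain software reference model of a valid-mode 2D convolution (loop order and sizes are illustrative; an HLS core pipelines the same computation):

```python
# Software reference for a convolution "core": valid-mode 2D convolution.

def conv2d_valid(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    oh = len(image) - kh + 1
    ow = len(image[0]) - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            acc = 0.0
            for ki in range(kh):          # MAC loops: the part mapped
                for kj in range(kw):      # to DSP slices on the FPGA
                    acc += image[i + ki][j + kj] * kernel[ki][kj]
            out[i][j] = acc
    return out

img = [[1, 2, 3, 0], [4, 5, 6, 0], [7, 8, 9, 0], [0, 0, 0, 0]]
k = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]     # identity kernel: picks the center
print(conv2d_valid(img, k))               # [[5.0, 6.0], [8.0, 9.0]]
```

Data-reuse optimizations such as line buffers exist precisely to avoid re-reading the overlapping `image[i + ki][j + kj]` windows from external memory.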
IEEE Embedded Systems Letters, vol. 17, no. 4, pp. 280–283. Citations: 0
Privacy-Preserving Anomaly Detection With Homomorphic Encryption for Industrial Control Systems in Critical Infrastructure
IF 2.0 | CAS Tier 4, Computer Science | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-02-03 | DOI: 10.1109/LES.2025.3538013
Dahoon Jeong;Yooshin Kim;Donghoon Shin
Critical infrastructure (CI) is essential for societal and economic stability, making it a prime target for cyber threats. Traditional anomaly detection models such as LSTMs and Transformers require substantial computational resources, which are often unavailable in CI environments. Cloud computing offers on-demand resources but introduces privacy concerns, since sensitive data must be transmitted to cloud servers. Homomorphic encryption (HE) enables secure processing of encrypted data but is computationally intensive, particularly due to operations like bootstrapping. This letter proposes a bootstrapping-free, lightweight anomaly detection model optimized for homomorphically encrypted data that leverages CI's operational characteristics. The model employs a two-stage data-separation process and introduces state-vectors for normal-operation detection, forming an allowlist anomaly detection approach. Experimental results on the SWaT and WADI datasets demonstrate the model's competitive performance and efficiency, with significantly reduced training times while maintaining robust security.
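The allowlist idea can be sketched in plaintext terms: keep state-vectors of known-normal operation and flag a sample as anomalous when it is far from every stored vector. The distance metric and threshold below are assumptions for illustration, not the letter's encrypted-domain formulation:

```python
# Plaintext sketch of allowlist anomaly detection with state-vectors.
import math

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def is_anomalous(sample, normal_states, threshold):
    # Anomalous if the sample is far from *every* allowed state.
    return min(l2(sample, s) for s in normal_states) > threshold

normal_states = [[0.0, 1.0], [0.1, 0.9], [0.2, 1.1]]  # learned normal ops
assert not is_anomalous([0.05, 0.95], normal_states, threshold=0.5)
assert is_anomalous([3.0, -2.0], normal_states, threshold=0.5)
```

An allowlist only ever models normal behavior, which is why it needs no labeled attack data and maps well to the restricted operation set of HE.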
IEEE Embedded Systems Letters, vol. 17, no. 4, pp. 276–279. Citations: 0
Online Internal Resistance Computation-Based Early Sensing of Thermal Runaway for Smart Fault Handling System (FHS) of Li-Ion Batteries
IF 2.0 | CAS Tier 4, Computer Science | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-01-28 | DOI: 10.1109/LES.2025.3535836
Abhijit Dey;Supratik Mondal;Biswajit Chakraborty;Sovan Dalai;Kesab Bhattacharya
In this letter, a microcontroller-based smart fault handling system (FHS) is proposed that can sense and manage thermal runaway (TR) events early in a real-time battery management system (BMS) through an online internal resistance (IR) computation method. In the overcharging region of lithium-ion (Li-ion) batteries (LIBs), TR is one of the critical issues arising in electric vehicles (EVs) and battery energy storage systems (BESSs). A proper subsystem for detecting a TR event early is therefore essential in the BMS, as it can automatically protect the battery modules from critical accidents such as fire and explosion. The developed smart FHS uses an efficient, cost-effective, and reliable online IR-sensing-based early TR sensing (ETRS) system that detects the TR event ~3.9 min before the TR onset point (outperforming the other detection methods) and shuts down the charging mechanism. Additionally, the system sends an IoT-based short message service (SMS) alert notification to the users, allowing them to take the necessary preventive steps.
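The basic online IR estimate is simply the voltage response to a current step, R = ΔV/ΔI. A minimal sketch of that computation plus a drift check (the 20% drift threshold and the sample values are illustrative assumptions, not the letter's calibration):

```python
# Online internal-resistance estimate from two (current, voltage) samples,
# with a hypothetical drift threshold as a thermal-runaway precursor flag.

def internal_resistance(v1: float, i1: float, v2: float, i2: float) -> float:
    # R = dV/dI across a current step, in ohms.
    return abs(v2 - v1) / abs(i2 - i1)

def tr_warning(ir_now: float, ir_baseline: float, drift: float = 0.20) -> bool:
    # Flag when IR has drifted more than `drift` from its baseline.
    return abs(ir_now - ir_baseline) / ir_baseline > drift

baseline = internal_resistance(4.10, 1.0, 4.05, 2.0)   # ~0.05 ohm
print(baseline, tr_warning(0.08, baseline))
```

On a microcontroller this fits in the BMS sampling loop: each charge-current step yields a fresh IR point, and the flag gates the charger cutoff and SMS alert.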
IEEE Embedded Systems Letters, vol. 17, no. 4, pp. 284–287. Citations: 0
Lightweight Surveillance Image Classification Through Hardware-Software Co-Design
IF 2.0 | CAS Tier 4, Computer Science | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-01-27 | DOI: 10.1109/LES.2025.3534237
Abhishek Yadav;Vyom Kumar Gupta;Binod Kumar
This work designs and implements a custom hardware accelerator for single-object classification from drone imagery, for surveillance applications. A lightweight attention-based convolutional neural network (CNN) is developed and translated into a hardware implementation as an IP/core. This accelerator is implemented as programmable logic (PL) and further optimized with buffer incorporation through high-level synthesis (HLS). The optimized PL is integrated with a processing system (PS), i.e., a ZYNQ UltraScale+ MPSoC, enabling a hardware/software co-design paradigm for enhancing the one-versus-rest classification task. The system architecture is tested through the PYNQ overlay process. Experimental results show the architecture is lightweight (1.97 MB) and requires 8.9 million trainable parameters on the targeted ZCU104 embedded FPGA board. It performs 53.8 Giga MAC operations, achieves an inference time of 1.05 ms, and a throughput of 947.2 frames per second (FPS). It consumes 5.65 W at 100 MHz, achieves an efficiency of 9.52 GOPs/W, and dissipates 0.006 J of energy per inference. Codes and subsequent files are available at https://shorturl.at/iX0jw.
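The energy figure follows directly from the quoted power and latency, and the throughput is roughly the inverse of the latency. A quick arithmetic cross-check using only numbers from the abstract (1/1.05 ms is ~952 FPS, slightly above the reported 947.2, presumably due to framing overhead):

```python
# Cross-checking the reported per-inference figures.

power_w = 5.65          # reported power
latency_s = 1.05e-3     # reported inference time

energy_j = power_w * latency_s   # joules per inference
fps = 1.0 / latency_s            # upper bound on frames per second

print(round(energy_j, 3), round(fps, 1))  # ~0.006 J, ~952.4 FPS
```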
IEEE Embedded Systems Letters, vol. 17, no. 4, pp. 222–225. Citations: 0
Kol-4-Gen: Stacked Kolmogorov-Arnold and Generative Adversarial Networks for Malware Binary Classification Through Visual Analysis
IF 2.0 | CAS Tier 4, Computer Science | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-01-23 | DOI: 10.1109/LES.2025.3529625
Anurag Dutta;Satya Prakash Nayak;Ruchira Naskar;Rajat Subhra Chakraborty
Malware identification and classification is an active field of research. A popular approach is to classify malware binaries through visual analysis: the binaries are converted into images, which reveal distinct class-specific patterns. To develop a highly accurate multiclass malware classifier, in this letter we propose Kol-4-Gen, a set of four novel deep learning models based on the Kolmogorov-Arnold Network (KAN) with trainable activation functions, together with a generative adversarial network (GAN) to address data imbalance (where applicable) during training. Our models, tested on the standard Malimg (grayscale, imbalanced, 25 classes), Malevis (RGB, balanced, 26 classes), and miniature Virus-MNIST (grayscale, imbalanced, 10 classes) datasets, outperform state-of-the-art (S-O-T-A) models, achieving ≈99.36%, ≈95.44%, and ≈92.12% validation accuracy, respectively.
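The visual-analysis preprocessing step is straightforward: the raw bytes of a binary are reinterpreted as grayscale pixels in a fixed-width image. A minimal sketch (the row width and zero-padding policy are illustrative choices, not the letter's exact pipeline):

```python
# Turn a binary blob into a fixed-width grayscale "image":
# one byte per pixel, values 0-255, last row zero-padded.

def bytes_to_image(blob: bytes, width: int = 4):
    padded = blob + b"\x00" * (-len(blob) % width)   # pad to a full row
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]

rows = bytes_to_image(b"\x00\x10\xff\x7f\x01", width=4)
print(rows)  # [[0, 16, 255, 127], [1, 0, 0, 0]]
```

Different malware families tend to share code and data sections, so these byte-images exhibit family-specific textures a CNN or KAN classifier can learn.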
IEEE Embedded Systems Letters, vol. 17, no. 4, pp. 268–271 (open access). Citations: 0
A RISC-V-Based High-Throughput Accelerator for Sparse Winograd CNN Inference on FPGA
IF 2.0 | CAS Tier 4, Computer Science | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-01-17 | DOI: 10.1109/LES.2025.3531251
Shabirahmed Badashasab Jigalur;Chang-Ling Tsai;Yu-Chi Shih;Yen-Cheng Kuan
This letter proposes a RISC-V-based accelerator for inferring a model that uses efficient sparse Winograd convolutional neural networks. This accelerator consists of a RISC-V processor (Andes NX27V) and a coprocessor; the latter performs the Winograd-ReLU convolutions and fully connected layers of the network. The pooling and ReLU layers of the network are executed by the processor in parallel with the coprocessor to increase throughput. In addition, on-chip buffers are used for the input/output data and filter weights to ensure pipelined operation. Implemented on an AMD VCU118 FPGA platform operating at 250 MHz, the accelerator achieves an average throughput of 5104.6 GOP/s when inferring a VGG16-based model.
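The multiply savings behind Winograd convolution are easiest to see in the 1-D F(2,3) case: two convolution outputs of a 3-tap filter cost 4 elementwise multiplies instead of 6. A sketch using the standard F(2,3) transform matrices (this illustrates the general technique, not the letter's sparse variant):

```python
# 1-D Winograd F(2,3): Y = A^T [ (G g) . (B^T d) ], standard matrices.

BT = [[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]]
G  = [[1, 0, 0], [0.5, 0.5, 0.5], [0.5, -0.5, 0.5], [0, 0, 1]]
AT = [[1, 1, 1, 0], [0, 1, -1, -1]]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def winograd_f23(d, g):
    u = matvec(G, g)                   # transform the 3-tap filter
    v = matvec(BT, d)                  # transform the 4-sample input tile
    m = [a * b for a, b in zip(u, v)]  # only 4 elementwise multiplies
    return matvec(AT, m)               # inverse transform -> 2 outputs

d, g = [1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0]
direct = [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]
assert winograd_f23(d, g) == direct    # matches direct sliding dot product
```

Pruning zeros out entries of the transformed filter u, which is what a *sparse* Winograd accelerator exploits to skip multiplies entirely.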
{"title":"A RISC-V-Based High-Throughput Accelerator for Sparse Winograd CNN Inference on FPGA","authors":"Shabirahmed Badashasab Jigalur;Chang-Ling Tsai;Yu-Chi Shih;Yen-Cheng Kuan","doi":"10.1109/LES.2025.3531251","DOIUrl":"https://doi.org/10.1109/LES.2025.3531251","url":null,"abstract":"This letter proposes a RISC-V-based accelerator for inferring a model that uses efficient sparse Winograd convolutional neural networks. This accelerator consists of a RISC-V processor (Andes NX27V) and a coprocessor; the latter performs the Winograd-ReLU convolutions and fully connected layers of the network. The pooling and ReLU layers of the network are executed by the processor in parallel with the coprocessor to increase throughput. In addition, on-chip buffers are used for the input/output data and filter weights to ensure pipelined operation. Implemented on an AMD VCU118 FPGA platform operating at 250 MHz, the accelerator achieves an average throughput of 5104.6 GOP/s when inferring a VGG16-based model.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"17 4","pages":"256-259"},"PeriodicalIF":2.0,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144843061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Design, Construction, and Measurement of Branchline Coupler
IF 2, CAS Quartile 4 (Computer Science), Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE, Pub Date: 2025-01-16, DOI: 10.1109/LES.2025.3528326
Brian Maximiliano Gluzman;Ramiro Avalos Ribas;Jorge Castiñeira Moreira;Alejandro José Uriz;Juan Alberto Etcheverry
This letter presents the design, construction, and measurement of a 2 GHz quadrature coupler in microstrip technology. The device splits the input signal into two outputs, each ideally 3 dB below the input power and with a 90° phase difference between them. The main novelty is the design of four coupler models with different proposals for the intersections between the feed lines and the coupler sections. Based on the simulation results obtained, the best-performing alternative is selected and its dimensions are adjusted to operate at the design frequency. After construction, the device was validated with a vector network analyzer (VNA).
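A first-order sizing of such a branchline hybrid can be sketched in a few lines. All numbers here are illustrative assumptions, not the letter's measured design: in particular `eps_eff` (the microstrip's effective permittivity) depends on the real board stack-up and trace widths. The textbook branchline topology uses series arms of Z0/√2 and shunt arms of Z0, each a quarter guided wavelength long at the design frequency.

```python
import math

# Illustrative first-order sizing of a 50-ohm branchline (quadrature) coupler.
c = 299_792_458.0   # speed of light, m/s
f = 2.0e9           # design frequency, Hz
Z0 = 50.0           # system impedance, ohm
eps_eff = 3.0       # ASSUMED effective permittivity of the microstrip

z_series = Z0 / math.sqrt(2)           # impedance of the series arms
lam_g = c / (f * math.sqrt(eps_eff))   # guided wavelength, m
arm_len_mm = lam_g / 4 * 1e3           # quarter-wave arm length, mm

split_db = 10 * math.log10(0.5)        # ideal power at each output port

print(f"series arms: {z_series:.2f} ohm, shunt arms: {Z0:.0f} ohm")
print(f"quarter-wave arm length: {arm_len_mm:.1f} mm")
print(f"ideal per-port power: {split_db:.2f} dB")
```

With the assumed eps_eff = 3.0 this gives roughly 35.36 Ω series arms and 21.6 mm quarter-wave sections; the ideal 10·log10(0.5) ≈ -3.01 dB split is the "3 dB lower" figure the abstract refers to. A real design would refine these lengths in an EM simulator, as the letter does across its four coupler variants.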
Brian Maximiliano Gluzman; Ramiro Avalos Ribas; Jorge Castiñeira Moreira; Alejandro José Uriz; Juan Alberto Etcheverry, "Design, Construction, and Measurement of Branchline Coupler," IEEE Embedded Systems Letters, vol. 17, no. 6, pp. 370-373, 2025, DOI: 10.1109/LES.2025.3528326.