
Latest articles in Microprocessors and Microsystems

ViT-LoRA: Optimized vision transformer for efficient edge computing in medical imaging
IF 2.6 | CAS Tier 4 (Computer Science) | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-24 | DOI: 10.1016/j.micpro.2026.105251
Premalatha R , Jayanthi K B , Rajasekaran C , Sureshkumar R
Vision Transformer (ViT) models have demonstrated excellent performance in medical image processing, but their deployment in resource-constrained settings is limited by their high computational complexity and memory requirements. Although Low-Rank Adaptation (LoRA) enables parameter-efficient tuning of ViT models, its use on real-time clinical datasets and in edge-device deployments remains largely unexplored. Using a real-time lung infection dataset, this research assesses ViT-LoRA's effectiveness in real-world medical imaging scenarios and investigates its generalisation potential on a public COVID-19 CT dataset. Four ViT fine-tuning procedures are compared: LoRA-based tuning (ViT-LoRA), adapter-based tuning (ViT-APT), partial fine-tuning (ViT-PFT), and full fine-tuning (ViT-FFT). ViT-LoRA attains a testing accuracy of 98.50 % with only 2.104 million trainable parameters, reducing the memory footprint to 24.08 MB. The optimized ViT-LoRA model was deployed on an NVIDIA Jetson Nano and evaluated on 30 test images, achieving an average inference time of 3.44 seconds per image for real-time edge-based medical imaging applications.
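The LoRA mechanism that makes this parameter-efficient tuning possible can be sketched in a few lines. The NumPy sketch below is a minimal illustration of the general technique, not the authors' implementation; the layer dimensions, rank, and scaling factor are illustrative:

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """LoRA forward pass: frozen weight W plus a low-rank update (alpha/r) * B @ A."""
    r = A.shape[0]                           # rank of the adapter
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

rng = np.random.default_rng(0)
d_in, d_out, r = 768, 768, 8                 # illustrative ViT-Base projection dims
W = rng.standard_normal((d_out, d_in))       # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01    # small random init
B = np.zeros((d_out, r))                     # B starts at zero: no change at init

x = rng.standard_normal((1, d_in))
y = lora_forward(x, W, A, B)

# Only A and B are trainable: a tiny fraction of the frozen layer's parameters.
trainable = A.size + B.size
frozen = W.size
print(trainable, frozen)  # prints: 12288 589824
```

With B initialized to zero, the adapted layer starts out exactly equal to the frozen layer, which is why LoRA fine-tuning is stable from the first step.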
Microprocessors and Microsystems, vol. 121, Article 105251
Citations: 0
Edge computing System-on-Chip architecture for a Non-Intrusive Load Monitoring sensor in ambient intelligence applications
IF 2.6 | CAS Tier 4 (Computer Science) | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-21 | DOI: 10.1016/j.micpro.2026.105250
Rubén Nieto , Laura de Diego-Otón , Miguel Tapiador , Víctor M. Navarro , Santiago Murano , Álvaro Hernández , Jesús Ureña
Non-Intrusive Load Monitoring (NILM) systems disaggregate the individual consumption of different appliances from aggregate electrical measurements, for applications such as improving energy efficiency at home. In other contexts, NILM techniques are also useful for promoting independent living for the elderly, as they enable the inference and monitoring of behavior through the analysis of energy consumption and the identification of appliance usage patterns. To achieve this, aggregated voltage and current signals are collected at the entrance of the house using a NILM sensor system. This analysis often involves sending the collected data to the cloud for further processing, which can result in significant bandwidth usage, especially when a high sampling rate is employed. In this work, a System-on-Chip (SoC) architecture based on an FPGA (Field-Programmable Gate Array) device is proposed for NILM processing performed entirely at the edge. The architecture targets Ambient Intelligence for Independent Living (AIIL) of the elderly. Voltage and current data are acquired at 4 kSPS (kilosamples per second); on/off switching events of appliances are detected, delimiting a window of 4096 samples around both signals. These windows are processed by a Convolutional Neural Network (CNN) that performs the load identification. Unlike prior works that primarily focus on algorithmic enhancements, this study introduces a complete hardware/software design of an FPGA-based SoC architecture and its real-time validation. The proposed architecture achieves an inference latency of 56 ms and a classification accuracy of 84.7% for fourteen classes (ON/OFF states of seven appliances), while reducing bandwidth usage by transmitting only the final identification instead of raw signals. These results demonstrate the feasibility of real-time edge implementations of NILM applications with competitive performance.
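The event-driven windowing step (detect a switching event in the aggregate current, then cut a 4096-sample window around it) can be sketched as follows. This is an illustrative NumPy reconstruction assuming 50 Hz mains and a hypothetical cycle-RMS jump detector, not the authors' design:

```python
import numpy as np

FS = 4000          # 4 kSPS, as in the paper
WIN = 4096         # samples per event window

def detect_event_windows(current, threshold=0.5):
    """Detect abrupt changes in cycle-level RMS current and cut a
    WIN-sample window centred on each switching event (illustrative logic)."""
    cycle = FS // 50                       # samples per 50 Hz mains cycle
    n_cycles = len(current) // cycle
    rms = np.sqrt(np.mean(
        current[:n_cycles * cycle].reshape(n_cycles, cycle) ** 2, axis=1))
    jumps = np.flatnonzero(np.abs(np.diff(rms)) > threshold)
    windows = []
    for j in jumps:
        centre = (j + 1) * cycle           # sample index of the RMS jump
        lo = max(0, centre - WIN // 2)
        windows.append(current[lo:lo + WIN])
    return windows

# Synthetic check: an appliance switching on halfway through 8 s of signal.
t = np.arange(8 * FS) / FS
i = 1.0 * np.sin(2 * np.pi * 50 * t)
i[len(i) // 2:] += 2.0 * np.sin(2 * np.pi * 50 * t[len(t) // 2:])
wins = detect_event_windows(i)
print(len(wins), len(wins[0]))  # prints: 1 4096
```

Each extracted window would then be fed to the CNN classifier; only the resulting appliance label needs to leave the device.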
Microprocessors and Microsystems, vol. 121, Article 105250
Citations: 0
High-precision positioning and timing method of GNSS receiver for mobile communication networks
IF 2.6 | CAS Tier 4 (Computer Science) | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-12-31 | DOI: 10.1016/j.micpro.2025.105242
Haodong Zhao, Junna Shang
Currently, high-precision GNSS receivers are expensive, and the cost of using them in mobile communication networks is extremely high. To reduce the construction cost of positioning and timing capabilities in mobile communication networks, the ordinary GNSS receivers already deployed in the network are used to form a self-differential enhanced iterative network that achieves high-precision positioning in local areas. Based on this high-precision positioning, the various delay errors in the receiver's 1PPS (one pulse per second) output are corrected using differential information data to recover the precise time of the local clock, thereby improving timing accuracy. In engineering applications, the self-differential enhanced iterative network algorithm is embedded into the antenna parameter sensor commonly used in mobile communication networks. The improved antenna parameter sensor gains high-precision positioning and timing functions on top of its original attitude and direction measurement functions. Its positioning accuracy can reach millimeter level, and its timing accuracy can reach 20 nanoseconds.
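The timing correction itself reduces to simple arithmetic once the self-differential network has estimated the delay budget: subtract the estimated delay terms from the raw 1PPS edge time. The sketch below is purely illustrative; the function name and delay values are hypothetical:

```python
def corrected_pps(local_edge_ns, delay_terms_ns):
    """Subtract the estimated delay budget (ns) from the raw 1PPS edge time (ns)."""
    return local_edge_ns - sum(delay_terms_ns)

# Hypothetical delay budget in nanoseconds: antenna cable delay, receiver
# pipeline delay, and a residual estimated from the self-differential network.
delays = [120.0, 45.0, -8.0]
edge = 1_000_000_157.0               # raw 1PPS edge timestamp, ns
print(corrected_pps(edge, delays))   # prints: 1000000000.0
```

The hard part in practice is estimating the delay terms; the differential data makes those observable because common errors cancel between receivers.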
Microprocessors and Microsystems, vol. 121, Article 105242
Citations: 0
A digital beamforming receiver architecture implemented on a FPGA for space applications
IF 2.6 | CAS Tier 4 (Computer Science) | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-12-30 | DOI: 10.1016/j.micpro.2025.105243
Eduardo Ortega , Agustín Martínez , Antonio Oliva , Fernando Sanz , Óscar Rodríguez , Manuel Prieto , Pablo Parra , Antonio da Silva , Sebastián Sánchez
The burgeoning interest within the space community in digital beamforming is largely attributable to the superior flexibility that satellites with active antenna systems offer for a wide range of applications, notably communication services. This paper presents the analysis and practical implementation of a Digital Beamforming and Digital Down Conversion (DDC) chain, leveraging a high-speed Analog-to-Digital Converter (ADC) certified for space applications alongside a high-performance Field-Programmable Gate Array (FPGA). The proposed design strategy optimizes resource efficiency and minimizes power consumption by placing the beamformer ahead of the complex down-conversion operation. This approach applies demodulation and low-pass filtering exclusively to the aggregated beam channel, markedly reducing the requisite digital signal processing resources relative to traditional, more resource-intensive digital beamforming and DDC architectures. For experimental validation, an evaluation board integrating a high-speed ADC and an FPGA was used, and the design's efficacy was verified empirically by applying various RF input signals to the digital beamforming receiver system. The ADC is capable of high-resolution signal processing, while the FPGA provides the computational flexibility and speed needed for real-time digital signal processing tasks. The findings underscore the potential of this design to significantly enhance the efficiency and performance of digital beamforming systems in space applications.
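The resource saving rests on linearity: because every antenna element would be mixed with the same numerically controlled oscillator (NCO), beamforming before down-conversion is mathematically identical to down-converting every channel first, but it needs only one mixer/filter chain instead of one per element. A NumPy sketch of that equivalence (sample rate, IF, weights, and channel count are illustrative; the low-pass filter stage is omitted):

```python
import numpy as np

FS = 100e6           # illustrative sample rate
F_IF = 10e6          # illustrative intermediate frequency
N_CH, N = 8, 4096    # antenna elements, samples

rng = np.random.default_rng(1)
t = np.arange(N) / FS
# Per-element complex weights steering the beam (illustrative values).
w = np.exp(1j * np.linspace(0, np.pi / 4, N_CH))
x = rng.standard_normal((N_CH, N))          # real ADC samples per element

nco = np.exp(-2j * np.pi * F_IF * t)        # shared NCO for down-conversion

# Conventional order: down-convert every channel, then beamform (N_CH mixers).
y_ddc_first = (w[:, None] * (x * nco)).sum(axis=0)

# Proposed order: beamform first, then one shared down-conversion (1 mixer).
y_beam_first = (w[:, None] * x).sum(axis=0) * nco

print(np.allclose(y_ddc_first, y_beam_first))  # prints: True
```

Since the two orderings produce the same samples, moving the beamformer ahead of the DDC removes N_CH - 1 mixer and filter chains from the design.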
Microprocessors and Microsystems, vol. 121, Article 105243
Citations: 0
Scalable hardware designs for median filters based on separable sorting networks
IF 2.6 | CAS Tier 4 (Computer Science) | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-12-25 | DOI: 10.1016/j.micpro.2025.105241
Cameron Vogeli, Daniel Llamocca
We present scalable and generalized hardware designs for k × k median filters based on the separability of sorting networks, processing 4 pixels at a time. The fully customizable (performance, bit-width) hardware architectures allow for design space exploration to establish trade-offs between processing time and resource usage. Results are presented in terms of resources, processing cycles, and throughput. The architectures are truly scalable: hardware resources and processing time grow only linearly with k, with the growth becoming even less pronounced for large k. As far as we are aware, there are no competing works (that use separability) for k > 5. The proposed architectures, validated on modern FPGAs for k = 3, 5, 7, 9, 11, are expected to be used as building blocks in a variety of image processing applications.
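For the 3 × 3 case, separability rests on a classic sorting-network identity: after sorting each 3-pixel column of the window, the window median equals the median of {max of the column minima, median of the column medians, min of the column maxima}. The NumPy sketch below illustrates that identity in software; it is a reconstruction of the general technique, not the paper's hardware design:

```python
import numpy as np

def median3x3(img):
    """3x3 median filter via the separable sorting-network identity:
    median of 9 = med( max(col mins), med(col meds), min(col maxs) )."""
    H, W = img.shape
    p = np.pad(img, 1, mode='edge')
    # Vertical stage: sort every 3-pixel column of each window position.
    cols = np.stack([p[i:i + H, :] for i in range(3)])
    cols.sort(axis=0)            # cols[0]=mins, cols[1]=meds, cols[2]=maxs

    def hstage(a, reduce_fn):
        # Horizontal stage: reduce over each 3-pixel neighbourhood.
        return reduce_fn(np.stack([a[:, i:i + W] for i in range(3)]), axis=0)

    max_of_mins = hstage(cols[0], np.max)
    med_of_meds = hstage(cols[1], np.median)
    min_of_maxs = hstage(cols[2], np.min)
    return np.median(np.stack([max_of_mins, med_of_meds, min_of_maxs]), axis=0)

img = np.array([[5, 6, 7],
                [1, 2, 9],
                [3, 4, 8]], dtype=float)
print(median3x3(img)[1, 1])  # prints: 5.0 (median of the full 3x3 window)
```

The payoff in hardware is that the column sorts can be reused across neighbouring windows, which is what keeps the comparator count growing only gently with k.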
Microprocessors and Microsystems, vol. 121, Article 105241
Citations: 0
Automatic Linux malware detection using binary inspection and runtime opcode tracing
IF 2.6 | CAS Tier 4 (Computer Science) | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-12-12 | DOI: 10.1016/j.micpro.2025.105237
Martí Alonso , Andreu Gironés , Juan-José Costa , Enric Morancho , Stefano Di Carlo , Ramon Canal
The fast-paced evolution of cyberattacks on digital infrastructures requires new protection mechanisms to counter them. Malware attacks, a class of cyberattack ranging from viruses and worms to ransomware and spyware, have traditionally been detected using signature-based methods. With new variants of malware, however, this approach is no longer sufficient, and machine learning tools look promising. In this paper we present two methods to detect Linux malware using machine learning models: (1) a dynamic approach that traces the instructions (opcodes) an application executes while it runs; and (2) a static approach that inspects the binary application files before execution. We evaluate (1) five machine learning models (Support Vector Machine, k-Nearest Neighbor, Naive Bayes, Decision Tree, and Random Forest) and (2) a deep neural network using a Long Short-Term Memory architecture with word embedding. We describe the methodology, the initial dataset preparation, the infrastructure used to obtain traces of executed instructions, and the evaluation of the results for the different models. The results show that the dynamic approach with a Random Forest classifier reaches 90% accuracy or higher, while the static approach reaches 98% accuracy.
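A common way to feed opcode traces to such classifiers is to turn each trace into n-gram count features. The sketch below is a toy illustration of that feature-extraction step, with a hypothetical opcode vocabulary and a nearest-centroid stand-in for the classifiers evaluated in the paper:

```python
import numpy as np
from collections import Counter

# Hypothetical opcode vocabulary; real traces would come from the
# instruction-tracing infrastructure described in the paper.
VOCAB = ['mov', 'add', 'cmp', 'jne', 'call', 'ret', 'xor', 'pop']

def bigram_vector(trace):
    """Turn an opcode trace into a normalised bigram-count feature vector."""
    idx = {op: i for i, op in enumerate(VOCAB)}
    counts = Counter(zip(trace, trace[1:]))
    v = np.zeros((len(VOCAB), len(VOCAB)))
    for (a, b), n in counts.items():
        v[idx[a], idx[b]] = n
    v = v.ravel()
    return v / (v.sum() or 1.0)

# Toy "centroids" built from one hand-written trace each (illustrative).
benign = bigram_vector(['mov', 'add', 'cmp', 'jne', 'mov', 'add', 'call', 'ret'])
malicious = bigram_vector(['pop', 'ret', 'pop', 'ret', 'xor', 'pop', 'ret'])  # ret-heavy

def classify(trace):
    v = bigram_vector(trace)
    near_mal = np.linalg.norm(v - malicious) < np.linalg.norm(v - benign)
    return 'malware' if near_mal else 'benign'

print(classify(['pop', 'ret', 'pop', 'ret', 'xor', 'pop', 'ret', 'pop', 'ret']))
# prints: malware
```

In the paper, vectors like these feed the five evaluated classifiers; the nearest-centroid rule here only stands in for that final step.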
Microprocessors and Microsystems, vol. 120, Article 105237
Citations: 0
SHAX: Evaluation of SVM hardware accelerator for detecting and preventing ROP on Xtensa
IF 2.6 | CAS Tier 4 (Computer Science) | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-12-04 | DOI: 10.1016/j.micpro.2025.105236
Adebayo Omotosho , Sirine Ilahi , Ernesto Cristopher Villegas Castillo , Christian Hammer , Hans-Martin Bluethgen
Return-oriented programming (ROP) chains together sequences of instructions residing in executable pages of memory to compromise a program’s control flow. On embedded systems, ROP detection is challenging because such devices lack the resources to run sophisticated software-based detection techniques directly, as these are memory- and CPU-intensive.
However, a Field Programmable Gate Array (FPGA) can enhance the capabilities of an embedded device to handle resource-intensive tasks. Hence, this paper presents the first performance evaluation of a Support Vector Machine (SVM) hardware accelerator for automatic ROP classification on Xtensa-embedded devices using hardware performance counters (HPCs).
In addition to meeting security requirements, modern cyber–physical systems must exhibit high reliability against hardware failures to ensure correct functionality. To assess the reliability level of our proposed SVM architecture, we perform simulation-based fault injection at the RT-level. To improve the efficiency of this evaluation, we utilize a hybrid virtual prototype that integrates the RT-level model of the SVM accelerator with the Tensilica LX7 Instruction Set Simulator. This setup enables early-stage reliability assessment, helping to identify vulnerabilities and reduce the need for extensive fault injection campaigns during later stages of the design process.
Our evaluation results show that an SVM accelerator targeting an FPGA device can detect and prevent ROP attacks on an embedded processor with high accuracy in real time. In addition, we explore the most vulnerable locations of our SVM design to permanent faults, enabling the exploration of safety mechanisms that increase fault coverage in future works.
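At its core, the inference such an accelerator performs for a linear SVM is just a dot product, an add, and a sign test per window of hardware-performance-counter samples. The toy NumPy sketch below illustrates that decision function; the feature names, weights, and bias are hypothetical, not the trained SHAX model:

```python
import numpy as np

def svm_decide(x, w, b):
    """Linear-SVM decision: one dot product, one add, one sign test
    per sampling window (what the accelerator computes)."""
    return 1 if np.dot(w, x) + b > 0 else -1   # +1 = ROP suspected

# Hypothetical normalised HPC features per window:
# [branch_misses, ret_instructions, icache_misses, itlb_misses]
w = np.array([0.8, 1.5, 0.3, 0.4])   # illustrative weights, trained offline
b = -2.0                             # illustrative bias

normal_window = np.array([0.2, 0.1, 0.3, 0.1])
rop_window = np.array([1.5, 2.0, 0.9, 0.7])   # ROP gadget chains inflate ret/branch events

print(svm_decide(normal_window, w, b), svm_decide(rop_window, w, b))  # prints: -1 1
```

Because the decision reduces to a fixed-size multiply-accumulate, it maps naturally onto a small FPGA datapath and runs in real time alongside the monitored core.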
Microprocessors and Microsystems, vol. 120, Article 105236
Citations: 0
Hardware and software design of APEnetX: A custom high-speed interconnect for scientific computing
IF 2.6 CAS Tier 4, Computer Science Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date : 2025-11-21 DOI: 10.1016/j.micpro.2025.105224
Roberto Ammendola , Andrea Biagioni , Carlotta Chiarini , Paolo Cretaro , Ottorino Frezza , Francesca Lo Cicero , Alessandro Lonardo , Michele Martinelli , Pier Stanislao Paolucci , Elena Pastorelli , Pierpaolo Perticaroli , Luca Pontisso , Cristian Rossi , Francesco Simula , Piero Vicini
High-speed interconnects are critical for providing robust and highly efficient services to every user in a cluster. Several commercial offerings – many of which are now firmly established in the market – have arisen over the years, spanning the many possible tradeoffs between cost, reconfigurability, performance, resiliency, and support for a variety of processing architectures. Custom interconnects, on the other hand, can be an appealing solution for applications requiring cost-effectiveness, customizability, and flexibility.
In this regard, the APEnet project was started in 2003, focusing on the design of PCIe FPGA-based custom Network Interface Cards (NICs) for cluster interconnects with a 3D torus topology. In this work, we highlight the main features of APEnetX, the latest version of the APEnet NIC. Implemented on the Xilinx Alveo U200 card, it performs Remote Direct Memory Access (RDMA) transactions using both Xilinx UltraScale+ IPs and custom hardware and software components to ensure efficient data transfer without involving the host operating system. The software stack lets the user interface with the NIC either directly via a low-level driver or through a plug-in for the OpenMPI stack, aligning our NIC with the application-layer standards of the HPC community. The APEnetX architecture integrates a Quality-of-Service (QoS) scheme to enforce a minimum level of performance during network congestion events. Finally, APEnetX is accompanied by an OMNeT++-based simulator that enables probing the network's performance when its size is pushed to node counts otherwise unattainable for cost and/or practicality reasons.
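The abstract does not detail the QoS mechanism; as a hypothetical sketch of what enforcing per-class service under congestion can look like, here is a weighted round-robin arbiter over per-class packet queues (class names and weights are invented for illustration, not taken from APEnetX):

```python
from collections import deque

def weighted_round_robin(queues, weights):
    """Drain per-class packet queues, giving class i up to weights[i]
    packets per arbitration round. Returns the global service order."""
    order = []
    while any(queues):                 # loop until every queue is empty
        for q, w in zip(queues, weights):
            for _ in range(w):
                if q:
                    order.append(q.popleft())
    return order

# Two hypothetical traffic classes: bulk RDMA writes (weight 1) and
# latency-sensitive control messages (weight 2): control traffic gets
# twice the service opportunities per round.
bulk = deque(["b0", "b1", "b2"])
ctrl = deque(["c0", "c1", "c2"])
schedule = weighted_round_robin([bulk, ctrl], [1, 2])
```

A hardware arbiter would make the same per-round decision with credit counters instead of software queues, but the bandwidth-sharing behavior is the same.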
Citations: 0
ALFA: Design of an accuracy-configurable and low-latency fault-tolerant adder
IF 2.6 CAS Tier 4, Computer Science Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date : 2025-11-19 DOI: 10.1016/j.micpro.2025.105226
Ioannis Tsounis, Dimitris Agiakatsikas, Mihalis Psarakis
Low-Latency Approximate Adders (LLAAs) are high-performance adder models that perform either approximate addition with a configurable accuracy loss or accurate addition by integrating circuitry that detects and corrects the expected approximation error. Due to their block-based structure, these adder models offer lower latency at the expense of configurable accuracy loss and area overhead. However, hardware accelerators employing such adders are susceptible to hardware (HW) faults, which can cause errors (i.e., HW errors) beyond the expected approximation errors during operation. In this work, we propose ALFA, a novel Accuracy-configurable, Low-latency, and Fault-tolerant Adder that offers 100% fault coverage with respect to the required accuracy level. Our approach exploits the resemblance between HW errors and approximation errors to build a scheme based on selective Triple Modular Redundancy (TMR) that detects and corrects all errors violating the accuracy threshold. For approximate operation, the proposed ALFA model achieves significant performance gains with minimal area overhead compared to state-of-the-art Reduced Precision Redundancy (RPR) Ripple Carry Adders (RCAs) with the same level of fault tolerance. Furthermore, the accurate ALFA model outperforms the RCA with classical TMR in terms of performance.
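The correction mechanism rests on majority voting across redundant copies. A minimal sketch of the underlying operation (plain bitwise 2-of-3 TMR over three integer results, not ALFA's selective, threshold-aware variant):

```python
def tmr_vote(a, b, c):
    """Bitwise 2-of-3 majority over three redundant results: any single
    faulty replica is out-voted bit by bit."""
    return (a & b) | (a & c) | (b & c)

# A single-event upset flips one bit in one replica; the vote masks it.
golden = 0b10110110               # fault-free adder output
faulty = golden ^ 0b00000100      # same output with one flipped bit
voted = tmr_vote(golden, golden, faulty)
```

ALFA's selective variant applies this protection only where a fault could push the result past the accuracy threshold, which is what keeps its area overhead below full TMR.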
Citations: 0
A runtime-adaptive transformer neural network accelerator on FPGAs
IF 2.6 CAS Tier 4, Computer Science Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date : 2025-11-17 DOI: 10.1016/j.micpro.2025.105223
Ehsan Kabir , Jason D. Bakos , David Andrews , Miaoqing Huang
Transformer neural networks (TNNs) excel in natural language processing (NLP), machine translation, and computer vision (CV) without relying on recurrent or convolutional layers. However, they have high computational and memory demands, particularly on resource-constrained devices like FPGAs. Moreover, transformer models vary in processing time across applications, requiring custom models with specific parameters. Designing a custom accelerator for each model is complex and time-intensive. Some custom accelerators exist without runtime adaptability, and they often rely on sparse matrices to reduce latency; however, such hardware designs become more challenging due to the need for application-specific sparsity patterns. This paper introduces ADAPTOR, a runtime-adaptive accelerator for dense matrix computations in transformer encoders and decoders on FPGAs. ADAPTOR improves the utilization of processing elements and on-chip memory, enhancing parallelism and reducing latency. It incorporates efficient matrix tiling to distribute resources across FPGA platforms and is fully quantized for computational efficiency and portability. Evaluations on Xilinx Alveo U55C data center cards and embedded platforms like the VC707 and ZCU102 show that our design is 1.2× and 2.87× more power-efficient than the NVIDIA K80 GPU and the i7-8700K CPU, respectively. Additionally, it achieves a speedup of 1.7× to 2.25× compared to some state-of-the-art FPGA-based accelerators.
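Matrix tiling, which the accelerator uses to distribute the dense matrix work across processing elements and on-chip memory, can be illustrated in software as a blocked matrix multiply (tile size and operands chosen arbitrarily; the real hardware schedule is of course more involved):

```python
def tiled_matmul(A, B, tile=2):
    """Blocked GEMM: C = A @ B computed tile by tile -- the access pattern
    that lets an accelerator keep one tile of each operand on chip while
    it is reused across an entire block of output elements."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    for i0 in range(0, n, tile):            # iterate over output row tiles
        for p0 in range(0, k, tile):        # iterate over reduction tiles
            for j0 in range(0, m, tile):    # iterate over output col tiles
                for i in range(i0, min(i0 + tile, n)):
                    for p in range(p0, min(p0 + tile, k)):
                        a = A[i][p]         # reused across the j loop
                        for j in range(j0, min(j0 + tile, m)):
                            C[i][j] += a * B[p][j]
    return C

A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
C = tiled_matmul(A, B, tile=2)
```

The result is identical to an untiled multiply; only the memory access order changes, which is what the tiling buys on hardware.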
Citations: 0