
IEEE Embedded Systems Letters: Latest Articles

Functional Validation of the RISC-V Unlimited Vector Extension
IF 1.7, Zone 4 (Computer Science), Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE. Pub Date: 2024-06-19. DOI: 10.1109/LES.2024.3416820
Ana Fernandes;Luís Crespo;Nuno Neves;Pedro Tomás;Nuno Roma;Gabriel Falcao
Data streaming and data-flow computing paradigms have been on the rise, aiming to improve the performance of general-purpose processors. However, supporting data streaming typically requires the definition of new instruction set architecture (ISA) extensions, which must be thoroughly validated before being implemented in hardware. This step is usually carried out using instruction set simulators (ISSs), to which the necessary streaming support must be added. Accordingly, this work proposes a new validation simulator for the recently presented stream-based RISC-V ISA unlimited vector extension (UVE). The proposed tool is based on Spike, the golden-reference ISS for RISC-V extensions. It can process a wide range of memory access patterns and provides the mechanisms needed to validate the target extension and to evaluate the resulting instruction reduction gains.
IEEE Embedded Systems Letters, vol. 17, no. 1, pp. 2-5.
Citations: 0
M-HLS: Malevolent High-Level Synthesis for Watermarked Hardware IPs
IF 1.7, Zone 4 (Computer Science), Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE. Pub Date: 2024-06-19. DOI: 10.1109/LES.2024.3416422
Anirban Sengupta;Aditya Anshul;Vishal Chourasia;Nitish Kumar
Hardware Trojan insertion in intellectual property (IP) designs generated by high-level synthesis (HLS) can pose a serious security concern for designers. Backdoor hardware Trojans can be inserted in the HLS design flow to compromise the produced register transfer level (RTL) IP design. This letter presents a novel malevolent HLS (M-HLS) framework that introduces the possibility of two different kinds of hardware Trojan insertion [i.e., a performance degradation hardware Trojan (PD-HT) and a denial-of-service hardware Trojan (DoS-HT)] in the multiplexer (mux)-based interconnect stage of an HLS-generated watermarked IP design. The proposed framework is validated on the watermarked MESA Horner Bezier IP, which indicates that an attacker can achieve strong performance degradation and DoS at minimal area and power overhead.
IEEE Embedded Systems Letters, vol. 16, no. 4, pp. 497-500.
Citations: 0
Reliable Methodology to FPGA Design Verification and Noise Analysis for Digital Lock-In Amplifiers
IF 1.7, Zone 4 (Computer Science), Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE. Pub Date: 2024-06-18. DOI: 10.1109/LES.2024.3415651
Jose Alejandro Galaviz-Aguilar;Cesar Vargas-Rosales;Francisco Falcone
Lock-in amplifier (LIA) instruments provide signal conditioning for precision measurement systems, extracting signals from extremely noisy environments. Digital LIA designs often require a verification process to ensure hardware performance. Hardware description languages (HDLs) combined with functional verification strategies therefore offer a powerful tool for providing a field-programmable gate array (FPGA) integrated solution. In this letter, we propose a methodology for the design and verification of an all-digital LIA and an additive white Gaussian noise (AWGN) module able to measure extremely low signal-to-noise ratio (SNR) levels of $\approx 10^{-15}$ or down to −37 dB, while ensuring a wide spurious-free dynamic range (SFDR) reserve of up to 90 dB on the FPGA. To this end, an FPGA framework for quick, accurate, and comprehensive characterization of a given digital LIA is designed, implemented, and used to exercise the design under controllable AWGN noise-pattern stimuli.
IEEE Embedded Systems Letters, vol. 16, no. 3, pp. 307-310.
Citations: 0
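The lock-in principle summarized in the abstract above can be illustrated with a minimal software sketch. This is not the authors' FPGA design: the function names, the plain averaging used as the low-pass filter, and the sample rate and reference frequency are all assumptions chosen for illustration. The input is mixed with in-phase and quadrature references, averaged, and the magnitude taken; a helper derives the AWGN standard deviation that yields a requested SNR for a sinusoid.

```python
import math
import random


def noise_sigma_for_snr(amplitude, snr_db):
    """Std. deviation of white Gaussian noise that gives the requested SNR
    for a sinusoid of the given peak amplitude (signal power = A^2 / 2)."""
    signal_power = amplitude ** 2 / 2.0
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return math.sqrt(noise_power)


def lock_in_amplitude(samples, f_ref, f_s):
    """Estimate the peak amplitude of the f_ref component of `samples`."""
    i_sum = 0.0
    q_sum = 0.0
    for k, x in enumerate(samples):
        phase = 2.0 * math.pi * f_ref * k / f_s
        i_sum += x * math.cos(phase)  # in-phase mixer
        q_sum += x * math.sin(phase)  # quadrature mixer
    n = len(samples)
    # Plain averaging stands in for the low-pass filter (an assumption);
    # the factor 2 compensates for the 1/2 introduced by mixing.
    return math.hypot(2.0 * i_sum / n, 2.0 * q_sum / n)


def make_stimulus(amplitude, f_ref, f_s, n, snr_db, seed=0, phase=0.3):
    """Sinusoid at f_ref buried in AWGN at the requested SNR (hypothetical helper)."""
    rng = random.Random(seed)
    sigma = noise_sigma_for_snr(amplitude, snr_db)
    return [
        amplitude * math.cos(2.0 * math.pi * f_ref * k / f_s + phase)
        + rng.gauss(0.0, sigma)
        for k in range(n)
    ]


if __name__ == "__main__":
    x = make_stimulus(1.0, 1000.0, 100000.0, 200000, snr_db=-20.0)
    print("estimated amplitude:", lock_in_amplitude(x, 1000.0, 100000.0))
```

In this sketch the variance of each averaged component falls as 1/n, so recovering a signal at a lower SNR requires a proportionally longer averaging window.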
Virtually Contiguous Memory Allocation in Embedded Systems: A Performance Analysis
IF 1.7, Zone 4 (Computer Science), Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE. Pub Date: 2024-06-18. DOI: 10.1109/LES.2024.3416192
Yacine Hadjadj;Chakib Mustapha Anouar Zouaoui;Nasreddine Taleb
In an era dominated by embedded systems, where efficient memory management is crucial, this study delves into the effectiveness of VCMalloc, a novel memory allocator that ensures virtual contiguity, on the Raspberry Pi 4 platform. Through a series of meticulously designed experiments, we compared the performance of VCMalloc with that of the conventional Malloc allocator. Our comprehensive findings reveal VCMalloc’s notable superiority across various performance metrics, positioning it as a highly promising solution for memory management in embedded systems, especially those leveraging virtual memory and memory management units (MMUs).
IEEE Embedded Systems Letters, vol. 17, no. 1, pp. 26-29.
Citations: 0
76.5-Gb/s Viterbi Decoder for Convolutional Codes on GPU
IF 1.7, Zone 4 (Computer Science), Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE. Pub Date: 2024-06-18. DOI: 10.1109/LES.2024.3416401
Zhanxian Liu;Chufan Liu;Haijun Zhang;Ling Zhao
This letter presents an optimized Viterbi decoder of convolutional codes on graphics processing unit (GPU) for software defined radio (SDR) platforms. Before the forward process, channel messages are interleaved with coalesced global memory access and the interleaved messages are represented with 4 bits to improve shared memory efficiency. Moreover, we optimize on-chip memory allocations of the forward process to accelerate instruction execution. Excluding the data transfer latency between host and device, the proposed Viterbi decoder achieves 22.2 and 76.5-Gb/s throughput on Tesla V100 and RTX4090, respectively. Compared with related works, the throughput speedups achieved by the proposed decoder are from $2.06\times$ to $2.93\times$.
IEEE Embedded Systems Letters, vol. 17, no. 1, pp. 22-25.
Citations: 0
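The abstract above mentions representing channel messages with 4 bits to save shared memory. The letter's actual quantizer and memory layout are not given here, so the following is only a sketch under assumptions: soft values are clamped to a signed 4-bit range and two of them are packed per byte, low nibble first. All function names are hypothetical.

```python
def quantize_4bit(value, scale=1.0):
    """Clamp a scaled soft value to the signed 4-bit range [-8, 7]."""
    q = int(round(value * scale))
    return max(-8, min(7, q))


def pack_nibbles(values):
    """Pack signed 4-bit integers two per byte (low nibble first)."""
    out = bytearray()
    for i in range(0, len(values), 2):
        lo = values[i] & 0xF
        hi = (values[i + 1] & 0xF) if i + 1 < len(values) else 0
        out.append(lo | (hi << 4))
    return bytes(out)


def unpack_nibbles(packed, count):
    """Recover `count` signed 4-bit integers from packed bytes."""
    values = []
    for byte in packed:
        for nibble in (byte & 0xF, byte >> 4):
            # Nibbles 8..15 encode the negative values -8..-1.
            values.append(nibble - 16 if nibble >= 8 else nibble)
    return values[:count]


if __name__ == "__main__":
    soft = [3.2, -1.7, 9.9, -12.0, 0.4]
    q = [quantize_4bit(v) for v in soft]
    packed = pack_nibbles(q)
    print(q, len(packed), unpack_nibbles(packed, len(q)))
```

Packing this way halves the storage per message relative to one byte per value, which is the kind of saving the abstract attributes to the 4-bit representation.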
Wireless Tag Sensor Network for Apnea Detection and Posture Recognition Using LSTM
IF 1.7, Zone 4 (Computer Science), Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE. Pub Date: 2024-06-14. DOI: 10.1109/LES.2024.3410024
Rafik Saddaoui;Massine Gana;Hamid Hamiche;Mourad Laghrouche
We have developed a low-cost, high-accuracy, and energy-efficient wearable tag sensor for apnea detection. The sensor can detect different types of breathing problems by monitoring the small movements of the chest wall compartments during each respiration cycle. The tag sensor also sends apnea events, digital respiration rate, and patient posture data using an ultrahigh-frequency radio frequency identification (UHF RFID) reader. The reader is based on the recent AS3993 chip connected to a Raspberry Pi 4 controller, which acts as a local server and is connected to the cloud to share acquired data with the treating doctor. Sleep disorder detection and classification across several positions, using a long short-term memory (LSTM) network algorithm, is implemented in real time on the embedded Arm microcontroller STM32F407. The proposed apnea detection method exhibits low error, enabling it to meet clinical requirements. Apnea events and positions were correctly detected in over 93% of cases. We have also evaluated six different classification techniques optimized by considering the proposed feature extraction and regularization of classifier parameters.
IEEE Embedded Systems Letters, vol. 16, no. 4, pp. 469-472.
Citations: 0
Evaluating NTT/INTT Implementation Styles for Post-Quantum Cryptography
IF 1.7, Zone 4 (Computer Science), Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE. Pub Date: 2024-06-06. DOI: 10.1109/LES.2024.3410516
Malik Imran;Safiullah Khan;Ayesha Khalid;Ciara Rafferty;Yasir Ali Shah;Samuel Pagliarini;Muhammad Rashid;Máire O’Neill
Unifying the forward and inverse operations of the number theoretic transform (NTT) into a single hardware module is a common practice when designing the polynomial coefficient multiplier accelerators used in post-quantum cryptographic algorithms. This letter shows experimentally that this design unification is not always advantageous. In this context, we present three NTT hardware architectures: 1) a forward NTT (FNTT) architecture; 2) an inverse NTT (INTT) architecture; and 3) a unified NTT (UNTT) architecture that performs both the FNTT and INTT computations in a single design. We benchmark our throughput/area and energy/area evaluations on Xilinx Virtex-7 field-programmable gate array (FPGA) and 28-nm application-specific integrated circuit (ASIC) platforms. The standalone FNTT and INTT designs, on average on FPGA, exhibit $4.66\times$ and $3.75\times$ higher throughput/area and energy/area values, respectively, than the UNTT design. Similarly, the individual FNTT and INTT designs, on average on ASIC, achieve $1.25\times$ and $1.09\times$ higher throughput/area and energy/area values, respectively, compared to the UNTT design.
IEEE Embedded Systems Letters, vol. 16, no. 4, pp. 485-488.
Citations: 0
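To make the forward/inverse relationship in the abstract above concrete, here is a minimal reference sketch of the NTT in software. It is a naive O(n²) textbook transform, not any of the hardware architectures evaluated in the letter, and the modulus q = 257 and length n = 8 are assumptions picked so the numbers stay small. The inverse transform reuses the forward one with the inverse root of unity and a final scaling by n⁻¹, which is the structural similarity that makes a unified module tempting.

```python
def find_root_of_unity(n, q):
    """Find a primitive n-th root of unity mod prime q (n a power of two)."""
    assert (q - 1) % n == 0, "n must divide q - 1"
    for g in range(2, q):
        w = pow(g, (q - 1) // n, q)
        # For power-of-two n, w^(n/2) == -1 guarantees the order is exactly n.
        if pow(w, n // 2, q) == q - 1:
            return w
    raise ValueError("no primitive root of unity found")


def ntt(a, w, q):
    """Forward NTT: A[i] = sum_j a[j] * w^(i*j) mod q (naive O(n^2))."""
    n = len(a)
    return [sum(a[j] * pow(w, i * j, q) for j in range(n)) % q for i in range(n)]


def intt(a_hat, w, q):
    """Inverse NTT: same transform with w^-1, then scale by n^-1 mod q."""
    n = len(a_hat)
    w_inv = pow(w, -1, q)  # modular inverse, Python 3.8+
    n_inv = pow(n, -1, q)
    return [(n_inv * x) % q for x in ntt(a_hat, w_inv, q)]


def cyclic_multiply(a, b, w, q):
    """Cyclic polynomial product via pointwise multiplication in the NTT domain."""
    a_hat, b_hat = ntt(a, w, q), ntt(b, w, q)
    return intt([(x * y) % q for x, y in zip(a_hat, b_hat)], w, q)


if __name__ == "__main__":
    q, n = 257, 8
    w = find_root_of_unity(n, q)
    a = [1, 2, 3, 4, 0, 0, 0, 0]
    b = [5, 6, 7, 0, 0, 0, 0, 0]
    print(cyclic_multiply(a, b, w, q))
```

Production implementations use O(n log n) butterfly networks and, for lattice schemes, negacyclic variants; this sketch omits both for clarity.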
FEARLESS: A Federated Reinforcement Learning Orchestrator for Serverless Edge Swarms
IF 1.7, Zone 4 (Computer Science), Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE. Pub Date: 2024-06-06. DOI: 10.1109/LES.2024.3410892
Christos Sad;Dimosthenis Masouros;Kostas Siozios
The rise of edge computing, characterized by swarms of edge devices, marks a significant shift in cloud-edge computing landscapes, moving data processing closer to the source of data generation. However, this paradigm introduces complexities in orchestration, as traditional centralized methods become inadequate for effectively managing distributed, dynamic edge environments. In this letter, we introduce FEARLESS, a distributed orchestration framework tailored for swarms of edge devices. FEARLESS employs a vertical federated reinforcement learning approach to efficiently orchestrate function invocation requests in serverless swarms. Experimental results demonstrate that FEARLESS reduces the quality-of-service violations of the scheduled tasks by up to 57% compared to a centralized “least-CPU-utilization” approach and a “local-execution” approach, while also achieving an average total energy reduction of up to approximately 20%.
IEEE Embedded Systems Letters, vol. 17, no. 1, pp. 34-37.
Citations: 0
Speed Record of AES-CTR and AES-ECB Bit-Sliced Implementation on GPUs
IF 1.7, Zone 4 (Computer Science), Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE. Pub Date: 2024-06-05. DOI: 10.1109/LES.2024.3409725
Wai-Kong Lee;Seog Chung Seo;Hwajeong Seo;Dong Cheon Kim;Seong Oun Hwang
The advanced encryption standard (AES) has been widely used to protect digital data in various applications, such as secure IoT communication, file encryption, and pseudorandom number generation. The efficient implementation of AES on parallel architectures, such as the graphics processing unit (GPU), has attracted considerable interest over the past decade. Prior studies mainly implemented the AES electronic codebook (ECB) and counter (CTR) modes using the table-based approach. In this brief, we set a new speed record for AES-ECB and AES-CTR on GPU based on the proposed bit-sliced implementation techniques. Our implementation is 2.6% (ECB) and 9% (CTR) faster than the state-of-the-art table-based implementation on an RTX3080 GPU. Evaluated on an embedded GPU (Jetson Orin Nano), our work also achieves a high throughput of 60 Gb/s, which is 1.9% (ECB) and 7% (CTR) faster than the state of the art.
IEEE Embedded Systems Letters, vol. 16, no. 4, pp. 481-484.
Citations: 0
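Bit-slicing, the technique named in the abstract above, can be illustrated on a single small AES building block. The sketch below is not the letter's GPU implementation: it only shows the general idea, using the GF(2^8) doubling step (often called xtime, used in MixColumns) as the example, with hypothetical function names. Bytes from many independent lanes are transposed into eight bit-planes, after which a handful of XORs on whole words processes every lane at once, with no table lookups.

```python
def to_bitplanes(byte_values):
    """Transpose bytes into 8 integers; bit `lane` of plane k is bit k of byte `lane`."""
    planes = [0] * 8
    for lane, value in enumerate(byte_values):
        for bit in range(8):
            if (value >> bit) & 1:
                planes[bit] |= 1 << lane
    return planes


def from_bitplanes(planes, count):
    """Inverse transpose: rebuild `count` bytes from 8 bit-planes."""
    return [
        sum(((planes[bit] >> lane) & 1) << bit for bit in range(8))
        for lane in range(count)
    ]


def xtime_bitsliced(p):
    """Multiply every lane by 2 in GF(2^8) mod x^8 + x^4 + x^3 + x + 1.
    The left shift becomes a renaming of planes; the conditional XOR with
    0x1B becomes XORs of plane 7 into planes 0, 1, 3, and 4."""
    return [p[7], p[0] ^ p[7], p[1], p[2] ^ p[7], p[3] ^ p[7], p[4], p[5], p[6]]


def xtime_reference(value):
    """Byte-at-a-time reference for comparison."""
    return ((value << 1) ^ (0x1B if value & 0x80 else 0)) & 0xFF


if __name__ == "__main__":
    data = list(range(256))
    sliced = from_bitplanes(xtime_bitsliced(to_bitplanes(data)), len(data))
    print(sliced == [xtime_reference(v) for v in data])
```

On real hardware the planes would be fixed-width registers (for example 32-bit words holding 32 lanes); Python's arbitrary-precision integers are used here only to keep the sketch short.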
LoRa, Sigfox, and NB-IoT: An Empirical Comparison for IoT LPWAN Technologies in the Agribusiness
IF 1.7, Zone 4 (Computer Science), Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE. Pub Date: 2024-04-29. DOI: 10.1109/LES.2024.3394446
Juan Pablo Becoña;Marcel Grané;Matías Miguez;Alfredo Arnaud
In this letter, three battery-powered, custom Internet of Things (IoT) sensor nodes for agribusiness are presented: first, a Sigfox-based temperature-humidity index (THI) sensor to monitor the impact of heat stress in livestock; then a LoRaWAN version of an estrus detection collar for dairy farms; and finally an NB-IoT low-power A-GPS geolocation device for animals. Detailed power consumption measurements are presented and compared to highlight the benefits of each low-power wide-area network technology for the industry. The measured energy to transmit a single packet with a 10-byte payload was 90, 20, and 90 mJ for Sigfox, LoRa, and NB-IoT, respectively. With an adequate power management strategy, the nodes could operate for up to 10 years in the case of the THI sensor and estrus detector, and for more than 1 year in the case of the GPS tracker, powered by a single 1900 mA$\cdot$h LiSOCl$_2$ battery.
IEEE Embedded Systems Letters, vol. 16, no. 3, pp. 283-286.
Citations: 0
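The per-packet energies and battery capacity quoted in the abstract above allow a rough transmit-only budget. The sketch below is a back-of-the-envelope calculation, not the letter's power model: it assumes a nominal cell voltage of 3.6 V (typical for Li-SOCl2 cells, but not stated in the abstract) and ignores sleep current, sensing, GPS fixes, and self-discharge, so the results are upper bounds on packet counts.

```python
# Measured transmit energy per 10-byte-payload packet, from the abstract (mJ).
TX_ENERGY_MJ = {"Sigfox": 90.0, "LoRa": 20.0, "NB-IoT": 90.0}

CAPACITY_MAH = 1900.0  # from the abstract
NOMINAL_VOLTAGE = 3.6  # assumption: typical Li-SOCl2 nominal voltage


def battery_energy_joules(capacity_mah, voltage):
    """Total stored energy: (mAh / 1000) A*h * 3600 s/h * V."""
    return capacity_mah / 1000.0 * 3600.0 * voltage


def max_packets(capacity_mah, voltage, tx_energy_mj):
    """Upper bound on packets if transmission were the only energy drain."""
    return battery_energy_joules(capacity_mah, voltage) / (tx_energy_mj / 1000.0)


def packets_per_day(capacity_mah, voltage, tx_energy_mj, years):
    """Transmit-only packet rate that exhausts the battery in `years`."""
    return max_packets(capacity_mah, voltage, tx_energy_mj) / (years * 365.25)


if __name__ == "__main__":
    total = battery_energy_joules(CAPACITY_MAH, NOMINAL_VOLTAGE)
    print(f"battery energy: {total:.0f} J")
    for tech, e_mj in TX_ENERGY_MJ.items():
        n = max_packets(CAPACITY_MAH, NOMINAL_VOLTAGE, e_mj)
        rate = packets_per_day(CAPACITY_MAH, NOMINAL_VOLTAGE, e_mj, years=10)
        print(f"{tech}: {n:.0f} packets total, {rate:.1f} per day over 10 years")
```

Under these assumptions the 4.5-fold lower packet energy of LoRa translates directly into 4.5 times as many packets from the same cell; a real lifetime estimate would also need the idle and sensing currents that the letter measures.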