
Latest Publications in IEEE Embedded Systems Letters

External Timed I/O Semantics Preserving Utilization Optimization for LET-Based Effect Chain
IF 1.6 | CAS Tier 4, Computer Science | Q3, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2023-09-25 | DOI: 10.1109/LES.2023.3298741
Bo Zhang;Caixu Zhao;Xi Li
In real-time systems, it is essential to verify the end-to-end constraints that regulate the external input/output (I/O) semantics of the head and tail tasks in each effect chain during the design phase and preserve them during implementation. The logical execution time (LET) model has been adopted by the industry due to the predictability and composability of its timed behavior. However, during the execution of LET-based effect chains, there are ineffective jobs whose outputs are redundant or unused and do not contribute to the external I/O behavior. This letter proposes an offline optimization method for deriving multiframe tasks that achieve the external timed I/O semantics of the LET-based effect chains with reduced utilization. The method first removes ineffective jobs from each effect chain and further explores the benefits of removing jobs for single and crossing effect chains by loosening the LET interval. The method is evaluated using synthetic benchmarks that mimic real-world automotive applications.
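The notion of an ineffective job can be sketched in a few lines (hypothetical producer/consumer periods; the letter's actual method derives multiframe tasks, which this toy simulation does not): under LET, a producer job publishes at the end of its interval and a consumer reads the most recently published value at the start of its own interval, so any producer job whose output is never the freshest value at a read point is ineffective and a candidate for removal.

```python
def effective_jobs(prod_period, cons_period, horizon):
    # Producer job k publishes at (k + 1) * prod_period (end of its LET
    # interval); consumer job j reads at j * cons_period (start of its
    # LET interval). A producer job is effective iff its output is the
    # freshest published value at some consumer read instant.
    publishes = [(k + 1) * prod_period for k in range(horizon // prod_period)]
    reads = [j * cons_period for j in range(horizon // cons_period)]
    effective = set()
    for r in reads:
        latest = None
        for k, p in enumerate(publishes):
            if p <= r:
                latest = k  # most recent publish at or before the read
        if latest is not None:
            effective.add(latest)
    return effective
```

With a 2 ms producer feeding a 5 ms consumer over 20 ms, only a minority of producer jobs are effective; the rest could be dropped without changing the external timed I/O behavior, which is the utilization reduction the letter exploits.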
Published in IEEE Embedded Systems Letters, vol. 15, no. 4, pp. 198-201. Citations: 0
Flipping Bits Like a Pro: Precise Rowhammering on Embedded Devices
IF 1.6 | CAS Tier 4, Computer Science | Q3, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2023-09-25 | DOI: 10.1109/LES.2023.3298737
Anandpreet Kaur;Pravin Srivastav;Bibhas Ghoshal
In this article, we introduce Flip-On-Chip, the first end-to-end tool that thoroughly examines the vulnerability of embedded DRAM to Rowhammer bit flips. Our tool, Flip-On-Chip, utilizes DRAM address-mapping information to efficiently and deterministically perform a double-sided Rowhammer test. We evaluated Flip-On-Chip on two DRAM modules: 1) LPDDR2 and 2) LPDDR4. Compared to state-of-the-art approaches in the literature, our proposed tool increases the number of bit flips by 7.34% on LPDDR2 and by 99.97% on LPDDR4. Additionally, Flip-On-Chip takes into account a number of system-level parameters to evaluate their influence on triggering Rowhammer bit flips.
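The core of a double-sided test can be sketched as follows (the row-extraction shift is a hypothetical mapping; real controllers mix in bank/rank bits, and recovering the actual vendor-specific mapping is precisely what the tool does):

```python
def row_of(phys_addr, row_shift=16):
    # Hypothetical mapping: row index in the upper physical address bits.
    # Real DRAM controllers scramble bank/rank/row bits; a practical tool
    # must reverse-engineer that mapping first.
    return phys_addr >> row_shift

def double_sided_aggressors(victim_row):
    # Double-sided Rowhammer: repeatedly activate both rows physically
    # adjacent to the victim, maximizing the disturbance on it.
    return victim_row - 1, victim_row + 1
```

Knowing the mapping is what makes the test deterministic: without it, the two "aggressor" addresses may not actually be the victim's physical neighbors, and flips become hit-or-miss.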
Published in IEEE Embedded Systems Letters, vol. 15, no. 4, pp. 218-221. Citations: 1
Differentiable Slimming for Memory-Efficient Transformers
IF 1.6 | CAS Tier 4, Computer Science | Q3, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2023-09-25 | DOI: 10.1109/LES.2023.3299638
Nikolay Penkov;Konstantinos Balaskas;Martin Rapp;Joerg Henkel
Transformer models are continuously achieving state-of-the-art performance on a wide range of benchmarks. To meet demanding performance targets, the number of model parameters is continuously increased. As a result, state-of-the-art Transformers require substantial computational resources prohibiting their deployment on consumer-grade hardware. In the literature, overparameterized Transformers are successfully reduced in size with the help of pruning strategies. Existing works lack the ability to optimize the full architecture, without incurring significant overheads, in a fully differentiable manner. Our work proposes a single-stage approach for training a Transformer for memory-efficient inference and various resource-constrained scenarios. Transformer blocks are extended with trainable gate parameters, which attribute importance and control information flow. Their integration into a differentiable pruning-aware training scheme allows the extraction of extremely sparse subnetworks at runtime, with minimal performance degradation. Evaluative pruning results, at the attention head and layer levels, illustrate the memory efficiency of our trained subnetworks under various memory budgets.
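A minimal sketch of the runtime side of such a scheme, assuming per-head gate values have already been trained (function names and the pruning threshold are illustrative, not the paper's):

```python
def slim_attention_heads(head_outputs, gates, threshold=0.05):
    # head_outputs: one (seq_len x d_model) matrix per head, as nested lists;
    # gates: learned values in [0, 1], one per head. Heads whose gate fell
    # below the threshold are dropped at runtime; survivors are scaled by
    # their gate and summed into the block output.
    kept = [(g, h) for g, h in zip(gates, head_outputs) if g >= threshold]
    seq_len, d_model = len(head_outputs[0]), len(head_outputs[0][0])
    merged = [[sum(g * h[i][j] for g, h in kept) for j in range(d_model)]
              for i in range(seq_len)]
    return merged, len(kept)  # merged output and number of surviving heads
```

Because the gates multiply the head outputs, they stay differentiable during training; a sparsity penalty can then drive unimportant gates toward zero, and the near-zero heads never need to be stored or computed at inference time.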
Published in IEEE Embedded Systems Letters, vol. 15, no. 4, pp. 186-189. Citations: 0
High-Flexibility Designs of Quantized Runtime Reconfigurable Multi-Precision Multipliers
IF 1.6 | CAS Tier 4, Computer Science | Q3, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2023-09-25 | DOI: 10.1109/LES.2023.3298736
Yuhao Liu;Shubham Rai;Salim Ullah;Akash Kumar
Recent research has widely explored quantization schemes on hardware. However, for recent accelerators that support only 8-bit quantization, such as the Google TPU, lower-precision inputs, such as the 1/2-bit quantized neural network models in FINN, must have their data width extended to meet the hardware interface requirements. This conversion hurts communication and computing efficiency. To improve the flexibility and throughput of quantized multipliers, we explore two novel runtime reconfigurable multi-precision multiplier designs, based on the multiplier-tree and bit-serial architectures, that can repartition the number of input channels at runtime according to input precision and reconfigure the signed/unsigned multiplication modes. We evaluated our designs by implementing a systolic array and a single-layer neural network accelerator on the Ultra96 FPGA platform. The results show the flexibility of our implementation and the high speedup achieved for low-precision quantized multiplication under a fixed hardware interface data width.
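The interface problem can be illustrated with a small packing sketch (helper names are hypothetical): a fixed 8-bit channel can carry four 2-bit operands at once if the multiplier can repartition its inputs, instead of widening each operand to 8 bits and wasting three quarters of the bandwidth.

```python
def pack_2bit(vals):
    # Pack four 2-bit unsigned operands into one 8-bit interface word,
    # least-significant operand first.
    assert len(vals) == 4 and all(0 <= v < 4 for v in vals)
    word = 0
    for i, v in enumerate(vals):
        word |= v << (2 * i)
    return word

def unpack_2bit(word):
    # Recover the four 2-bit operands from the packed word.
    return [(word >> (2 * i)) & 0x3 for i in range(4)]
```

A runtime-reconfigurable multiplier that understands this packing processes four channel values per interface word; a fixed 8-bit multiplier would need four separate transfers for the same data.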
Published in IEEE Embedded Systems Letters, vol. 15, no. 4, pp. 194-197. Citations: 0
Hardware Trojan Detection Method Against Balanced Controllability Trigger Design
IF 1.6 | CAS Tier 4, Computer Science | Q3, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2023-09-25 | DOI: 10.1109/LES.2023.3318591
Wei-Ting Hsu;Pei-Yu Lo;Chi-Wei Chen;Chin-Wei Tien;Sy-Yen Kuo
Hardware Trojans (HTs) have become a serious threat to the Internet of Things due to the globalization of the integrated circuit industry. To evade functional verification, HTs tend to have at least one trigger signal in the gate-level netlist with a very low transition probability. Based on this property, previous studies use imbalanced controllability as a feature to detect HTs, assuming that signals with imbalanced controllability are always accompanied by low transition probability. However, this study presents a way to create a new type of HT that has low transition probability yet balanced controllability, defeating those previous methods. Hence, current imbalanced-controllability detectors are inadequate in this scenario. To address this limitation, we propose a probability-based detection method that uses unsupervised anomaly analysis to detect HTs. Our proposed method detects not only the proposed HT but also the 580 Trojan benchmarks on Trusthub. Experimental results show that our proposed detector outperforms other detectors, achieving an overall 100% true positive rate and a 0.37% false positive rate on the 580 benchmarks.
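The low-transition trigger signature that detectors look for follows from basic signal probabilities; a sketch under an independence assumption: an AND tree over n equiprobable inputs has output probability 2^-n, and its cycle-to-cycle transition probability 2p(1-p) is correspondingly tiny.

```python
def and_signal_prob(input_probs):
    # Probability that an AND gate's output is 1, assuming
    # independent inputs.
    p = 1.0
    for pi in input_probs:
        p *= pi
    return p

def transition_prob(p1):
    # Probability the signal value differs between two independent
    # cycles: P(0 -> 1) + P(1 -> 0) = 2 * p1 * (1 - p1).
    return 2 * p1 * (1 - p1)
```

An 8-input AND trigger fed by unbiased signals fires with probability 1/256 and toggles in under 0.8% of cycle pairs, which is why low transition probability is the classic trigger fingerprint; the letter's point is that such a trigger can still be built from signals with balanced controllability scores.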
Published in IEEE Embedded Systems Letters, vol. 16, no. 2, pp. 178-181. Citations: 0
Effects of Runtime Reconfiguration on PUFs Implemented as FPGA-Based Accelerators
IF 1.6 | CAS Tier 4, Computer Science | Q3, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2023-09-25 | DOI: 10.1109/LES.2023.3299214
Hassan Nassar;Lars Bauer;Jörg Henkel
Physical unclonable functions (PUFs) are a handy security primitive for resource-constrained devices. They offer an alternative to the resource-intensive classical hash algorithms. Using the IC differences resulting from the fabrication process, PUFs give device-specific outputs (responses) when given the same inputs (challenges). Hence, without using a device-specific key, PUFs can generate device-specific responses. FPGAs are one of the platforms heavily studied as candidates for PUF implementation. The idea is that a PUF designed as HDL code can be used as part of the static design or as a dynamic accelerator. Previous works studied PUF implementation as part of the static design. In contrast to the state of the art, this letter studies PUFs when used as runtime reconfigurable accelerators. In this letter, we find that not all regions of an FPGA are equally suitable for implementing different PUF types. Regions where clock routing resources exist are the worst suited for PUF implementation. Moreover, we find that for certain PUF types, the property of dynamic partial reconfiguration can lead to performance degradation if not applied carefully. When static routing passing through the region increases, the PUF performance degrades significantly.
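The challenge-response idea can be sketched with a toy additive-delay arbiter-style PUF model (Gaussian stage delays stand in for process variation; real responses come from silicon measurements, and this sketch says nothing about the letter's reconfiguration effects):

```python
import random

def make_puf(n_stages, device_seed):
    # Per-stage delay contributions, fixed per device by process
    # variation (modeled here as Gaussian draws keyed by a seed).
    rng = random.Random(device_seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_stages)]

def puf_response(stage_delays, challenge):
    # Toy additive-delay model: each challenge bit selects the sign of
    # its stage's contribution; an arbiter compares the accumulated
    # delay difference against zero to produce the response bit.
    delta = sum(d if c else -d for d, c in zip(stage_delays, challenge))
    return 1 if delta > 0 else 0
```

The same device always answers a given challenge the same way, while a different device (different process variation, here a different seed) generally answers differently; that determinism-per-device is what a runtime-reconfigured PUF region must preserve.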
Published in IEEE Embedded Systems Letters, vol. 15, no. 4, pp. 174-177. Citations: 0
FPGA Implementation of Modified SNOW 3G Stream Ciphers Using Fast and Resource Efficient Substitution Box
IF 1.6 | CAS Tier 4, Computer Science | Q3, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2023-09-25 | DOI: 10.1109/LES.2023.3298743
Sushree Sila P. Goswami;Gaurav Trivedi
Security plays a vital role in electronic communication, particularly in wireless networks such as long term evolution (LTE), where safeguarding data and resources from malicious activities is crucial. Cryptographic algorithms are at the core of security mechanisms, ensuring the protection of sensitive information. While software implementations of these algorithms are relatively straightforward, they often lack the speed needed for real-time use in communication devices such as mobile phones. Consequently, implementing these cryptographic algorithms as hardware crypto processors becomes necessary. This letter presents a novel implementation of the SNOW 3G crypto processor architecture for 4G LTE security applications, focusing on area, power, and efficiency. The proposed modified SNOW 3G architecture utilizes only 0.31% of the available area when implemented on an FPGA Zynq ZC702 and achieves an efficiency, quantified as the ratio of throughput to area, of 28.34. Furthermore, it consumes a total power of 0.142 mW. These low power and area requirements make the design highly suitable for integration into mobile devices, meeting their specific constraints and enabling efficient cryptographic operations.
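SNOW 3G itself drives a 16-word LFSR through a finite-state machine with S-boxes; the underlying stream cipher principle, a deterministic keystream expanded from a seeded feedback register, can be sketched with a toy Galois LFSR (illustrative register size and feedback polynomial, not SNOW 3G's):

```python
def galois_lfsr(state, taps, n_bits):
    # Emit n_bits keystream bits from a Galois-configuration LFSR.
    # taps=0xB400 gives a maximal-length 16-bit register; SNOW 3G uses
    # a much larger word-oriented LFSR plus a nonlinear FSM on top.
    out = []
    for _ in range(n_bits):
        lsb = state & 1
        out.append(lsb)
        state >>= 1
        if lsb:
            state ^= taps  # apply the feedback polynomial
    return out
```

Hardware implementations win here because every keystream bit is a handful of shift-and-XOR operations, which map directly onto FPGA fabric; the letter's contribution concentrates on making the S-box stage of SNOW 3G equally cheap.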
Published in IEEE Embedded Systems Letters, vol. 15, no. 4, pp. 238-241. Citations: 0
Vector-Based Dedicated Processor Architecture for Efficient Tracking in VSLAM Systems
IF 1.6 | CAS Tier 4, Computer Science | Q3, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2023-09-25 | DOI: 10.1109/LES.2023.3298900
Dejian Li;Xi Feng;Chongfei Shen;Qi Chen;Lixin Yang;Sihai Qiu;Xin Jin;Meng Liu
This letter introduces a dedicated processor architecture, called MEGACORE, which leverages vector technology to enhance tracking performance in visual simultaneous localization and mapping (VSLAM) systems. By harnessing the inherent parallelism of vector processing and incorporating a floating point unit (FPU), MEGACORE achieves significant acceleration in the tracking task of VSLAM. Through careful optimizations, we achieved notable improvements compared to the baseline design. Our optimizations resulted in a 14.9% reduction in the area parameter and a 4.4% reduction in power consumption. Furthermore, by conducting application benchmarks, we determined that the average speedup ratio across all stages of the tracking process is 3.25. These findings highlight the effectiveness of MEGACORE in improving the efficiency and performance of VSLAM systems, making it a promising solution for real-world implementations in embedded systems.
Published in IEEE Embedded Systems Letters, vol. 15, no. 4, pp. 182-185. Citations: 0
Optimized Local Path Planner Implementation for GPU-Accelerated Embedded Systems
IF 1.6 | CAS Tier 4, Computer Science | Q3, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2023-09-25 | DOI: 10.1109/LES.2023.3298733
Filippo Muzzini;Nicola Capodieci;Federico Ramanzin;Paolo Burgio
Autonomous vehicles are latency-sensitive systems. The planning phase is a critical component of such systems, during which the in-vehicle compute platform is responsible for determining the future maneuvers that the vehicle will follow. In this letter, we present a GPU-accelerated optimized implementation of the Frenet Path Planner, a widely known path planning algorithm. Unlike the current state of the art, our implementation accelerates the entire algorithm, including the path generation and collision avoidance phases. We measure the execution time of our implementation and demonstrate dramatic speedups compared to the CPU baseline implementation. Additionally, we evaluate the impact of different precision types (double, float, and half) on trajectory errors to investigate the tradeoff between completion latencies and computation precision.
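The path-generation phase of the Frenet Path Planner classically fits the lateral offset d(t) as a quintic polynomial between boundary states; a sketch of that standard step (closed-form coefficients, independent of the letter's GPU implementation):

```python
def quintic_lateral(d0, dv0, da0, dT, dvT, daT, T):
    # Coefficients a0..a5 of d(t) = sum(a_i * t^i) matching lateral
    # position, velocity, and acceleration at t = 0 and t = T.
    a0, a1, a2 = d0, dv0, da0 / 2.0
    T2, T3, T4, T5 = T * T, T**3, T**4, T**5
    b0 = dT - (a0 + a1 * T + a2 * T2)   # residual position at T
    b1 = dvT - (a1 + 2 * a2 * T)        # residual velocity at T
    b2 = daT - 2 * a2                   # residual acceleration at T
    # Closed-form solution of the remaining 3x3 linear system.
    a3 = (10 * b0 - 4 * b1 * T + 0.5 * b2 * T2) / T3
    a4 = (-15 * b0 + 7 * b1 * T - b2 * T2) / T4
    a5 = (6 * b0 - 3 * b1 * T + 0.5 * b2 * T2) / T5
    return [a0, a1, a2, a3, a4, a5]

def eval_poly(coeffs, t):
    return sum(c * t**i for i, c in enumerate(coeffs))
```

The planner evaluates many (target offset, duration) candidates of this form and scores each for cost and collisions; since every candidate is independent, the whole sweep parallelizes naturally, which is what makes end-to-end GPU acceleration attractive.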
Published in IEEE Embedded Systems Letters, vol. 15, no. 4, pp. 214-217. Citations: 0
Hardware–Software Co-Optimization of Long-Latency Stochastic Computing
IF 1.6. CAS Tier 4 (Computer Science). Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE. Pub Date: 2023-09-25. DOI: 10.1109/LES.2023.3298734
Sercan Aygun;Lida Kouhalvandi;M. Hassan Najafi;Serdar Ozoguz;Ece Olcay Gunes
Stochastic computing (SC) is an emerging paradigm that offers hardware-efficient solutions for developing low-cost and noise-robust architectures. In SC, deterministic logic systems are employed along with bit-stream sources to process scalar values. However, using long bit-streams introduces challenges, such as increased latency and significant energy consumption. To address these issues, we present an optimization-oriented approach for modeling and sizing new logic gates, which results in optimal latency. The optimization process is automated using hardware–software cooperation by integrating Cadence and MATLAB environments. Initially, we optimize the circuit topology by leveraging the design parameters of two-input basic logic gates. This optimization is performed using a multiobjective approach based on a deep neural network. Subsequently, we employ the proposed gates to demonstrate favorable solutions targeting SC-based operations.
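The latency problem the letter targets comes directly from how SC encodes values: a scalar in [0, 1] becomes the probability of observing a 1 in a bit-stream, and a single AND gate then multiplies two independently encoded values. A minimal Python sketch (illustrative only, not from the letter) of unipolar SC multiplication and its accuracy-versus-stream-length tradeoff:

```python
import numpy as np

def to_bitstream(p, length, rng):
    # Unipolar SC encoding: scalar p in [0, 1] -> bit-stream whose
    # per-position probability of a 1 equals p.
    return (rng.random(length) < p).astype(np.uint8)

def sc_multiply(x, y, length, rng):
    # With independent input streams, one AND gate multiplies the
    # encoded values, since P(a AND b) = P(a) * P(b).
    a = to_bitstream(x, length, rng)
    b = to_bitstream(y, length, rng)
    return float(np.mean(a & b))

# Longer streams shrink the estimation error of 0.5 * 0.8 = 0.4,
# at the cost of proportionally more clock cycles (latency).
for n in (64, 1024, 65536):
    est = sc_multiply(0.5, 0.8, n, np.random.default_rng(0))
    print(n, est)
```

Halving the stream length roughly doubles the variance of the estimate, which is why cutting latency without losing accuracy calls for the kind of gate-level optimization the letter proposes.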
IEEE Embedded Systems Letters, vol. 15, no. 4, pp. 190-193. DOI: 10.1109/LES.2023.3298734
Citations: 0