ACM Transactions on Embedded Computing Systems: Latest Articles (Page 2)

SPIMulator: A Spintronic Processing-In-Memory Simulator for Racetracks

IF 2 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

ACM Transactions on Embedded Computing Systems

Pub Date : 2024-02-08 DOI: 10.1145/3645112

Pavia Bera, Stephen Cahoon, Sanjukta Bhanja, Alex Jones

In-memory processing is becoming a popular method to alleviate the memory bottleneck of the von Neumann computing model. To improve both the latency and the energy cost of such in-memory processing, emerging non-volatile memory technologies such as spintronic magnetic memory are of particular interest: they can provide near-SRAM read/write performance and eliminate nearly all static energy without experiencing endurance limitations. Spintronic Racetrack Memory (RM) further addresses the density concerns of spin-transfer torque memory (STT-MRAM). Moreover, it has recently been demonstrated that portions of RM nanowires can function as a polymorphic gate, which can be leveraged to implement multi-operand bulk bitwise operations. With more complex control, they can also be leveraged to build integer and floating-point arithmetic processing-in-memory (PIM) primitives. This paper proposes SPIMulator, a spintronic PIM simulator that models both the storage and the PIM architecture involved in executing PIM commands in Racetrack memory. SPIMulator functionally models the polymorphic gate properties recently proposed for Racetrack memory, which allow a transverse access that determines the number of '1's in a segment of each Racetrack nanowire. From this simulation, SPIMulator can report real-time performance statistics such as cycle count and energy. Thus, SPIMulator simulates the recently proposed multi-operand bitwise logic operations and can be easily extended to implement new PIM operations as they are developed. Because SPIMulator is functional, it can also serve as a programming environment for developing PIM-based code and verifying new acceleration algorithms.
We demonstrate the value of SPIMulator by modeling and estimating the performance and energy consumption of a variety of example applications, including the Advanced Encryption Standard (AES), an encryption scheme based primarily on logical and look-up operations; matrix multiplication, a frequent requirement in scientific, signal processing, and machine learning algorithms; and bitmap indices, a common search table employed for database lookups.
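
As a rough illustration of how a single transverse-read count could yield several multi-operand bitwise results, the sketch below models one nanowire segment as a list of bits. The function names and the set of derived operations are assumptions made for illustration; they are not SPIMulator's actual interface.

```python
from typing import Dict, List


def transverse_read(segment: List[int]) -> int:
    """Return the number of '1's stored in one nanowire segment."""
    return sum(segment)


def polymorphic_gate(segment: List[int]) -> Dict[str, int]:
    """Derive several multi-operand bitwise results from one count."""
    n = len(segment)
    ones = transverse_read(segment)
    return {
        "or": int(ones >= 1),           # at least one operand bit is 1
        "and": int(ones == n),          # every operand bit is 1
        "xor": ones % 2,                # parity of the operand bits
        "majority": int(2 * ones > n),  # more than half of the bits are 1
    }


def bulk_bitwise(rows: List[List[int]], op: str) -> List[int]:
    """Apply `op` across all operands, one nanowire (bit column) at a time."""
    return [polymorphic_gate(list(column))[op] for column in zip(*rows)]


operands = [[1, 0, 1, 1],
            [1, 1, 0, 1],
            [0, 0, 1, 1]]
print(bulk_bitwise(operands, "xor"))       # [0, 1, 0, 1]
print(bulk_bitwise(operands, "majority"))  # [1, 0, 1, 1]
```

Each column of `operands` stands in for the bits that share a nanowire, so one count per column serves every operation in the dictionary.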



Citations: 0

STDF: Spatio-Temporal Deformable Fusion for Video Quality Enhancement on Embedded Platforms

IF 2 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

ACM Transactions on Embedded Computing Systems

Pub Date : 2024-02-08 DOI: 10.1145/3645113

Jianing Deng, Shunjie Dong, Lvcheng Chen, Jingtong Hu, Cheng Zhuo

With the development of embedded systems and deep learning, it has become feasible to combine them to offer a variety of convenient human-centered services that rely on high-quality (HQ) videos. However, due to limits on video traffic load and unavoidable noise, the visual quality of an image from an edge camera may degrade significantly, affecting overall video and service quality. To maintain video stability, video quality enhancement (QE), which aims to recover HQ videos from their distorted low-quality (LQ) sources, has attracted increasing attention in recent years. The key challenge for video quality enhancement lies in how to effectively aggregate complementary information from multiple frames (i.e., temporal fusion). To handle diverse motion in videos, existing methods commonly apply motion compensation before temporal fusion. However, the motion field estimated from the distorted LQ video tends to be inaccurate and unreliable, resulting in ineffective fusion and restoration. In addition, motion estimation for consecutive frames is generally conducted in a pairwise manner, which makes the computation expensive and inefficient. In this paper, we propose a fast yet effective temporal fusion scheme for video QE that incorporates a novel Spatio-Temporal Deformable Convolution (STDC) to compensate motion and aggregate temporal information simultaneously. Specifically, the proposed temporal fusion scheme takes a target frame along with its adjacent reference frames as input and jointly estimates an offset field that deforms the spatio-temporal sampling positions of the convolution. As a result, complementary information from multiple frames can be fused within the STDC operation in one forward pass. Extensive experimental results on three benchmark datasets show that our method compares favorably with the state of the art in terms of accuracy and efficiency.
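
To make the idea of deforming the spatio-temporal sampling positions more concrete, here is a minimal single-pixel sketch. It assumes a 3x3 sampling grid per frame, bilinear interpolation at fractional positions, and offsets that are supplied by hand rather than predicted by a network; the names `bilinear` and `stdc_pixel` are hypothetical and do not come from the paper.

```python
import math
from typing import List, Tuple

Frame = List[List[float]]


def bilinear(frame: Frame, y: float, x: float) -> float:
    """Sample `frame` at a fractional position; outside the frame reads as 0."""
    h, w = len(frame), len(frame[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= yi < h and 0 <= xi < w:
                weight = (1 - abs(y - yi)) * (1 - abs(x - xi))
                value += weight * frame[yi][xi]
    return value


def stdc_pixel(frames: List[Frame],
               weights: List[List[float]],
               offsets: List[List[Tuple[float, float]]],
               py: int, px: int) -> float:
    """Fuse all frames at output pixel (py, px) in one pass.

    weights[t][k] is the kernel weight and offsets[t][k] = (dy, dx) the
    learned displacement for frame t and grid position k (assumed inputs).
    """
    grid = [(gy, gx) for gy in (-1, 0, 1) for gx in (-1, 0, 1)]
    out = 0.0
    for t, frame in enumerate(frames):
        for k, (gy, gx) in enumerate(grid):
            dy, dx = offsets[t][k]
            out += weights[t][k] * bilinear(frame, py + gy + dy, px + gx + dx)
    return out
```

Because the offsets for every frame enter the same weighted sum, motion compensation and temporal aggregation happen together instead of in separate pairwise steps.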



Citations: 0

A Space-Grained Cleaning Method to Reduce Long-Tail Latency of DM-SMR Disks

IF 2 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

ACM Transactions on Embedded Computing Systems

Pub Date : 2024-02-05 DOI: 10.1145/3643827

Chin-Hsien Wu, Cheng-Tze Lee, Yi-Ren Tsai, Cheng-Yen Wu

DM-SMR (device-managed shingled magnetic recording) disks allocate a portion of disk space as a persistent cache (PC) to address the issue of overlapping tracks during data updates. When the PC space becomes insufficient, space cleaning is triggered to reclaim invalid space. However, space cleaning is time-consuming and contributes to the long-tail latency of DM-SMR disks. In this paper, we propose a space-grained cleaning method that leverages various idle periods to effectively reduce the long-tail latency of DM-SMR disks. The objective is to clean a suitable space region, at a suitable granularity, during an appropriate time period, thereby preventing delays in subsequent I/O requests and reducing long-tail latency. The experimental results demonstrate that the proposed method substantially reduces the long-tail latency of DM-SMR disks.


Citations: 0

Compact Instruction Set Extensions for Dilithium

IF 2 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

ACM Transactions on Embedded Computing Systems

Pub Date : 2024-02-02 DOI: 10.1145/3643826

Lu Li, Qi Tian, Guofeng Qin, Shuaiyu Chen, Weijia Wang

Post-quantum cryptography is considered to provide security against both traditional and quantum computer attacks. Dilithium is a digital signature algorithm that derives its security from the difficulty of finding short vectors in lattices. It has been selected for standardization in the NIST post-quantum cryptography project. Hardware-software co-design is a commonly adopted strategy for addressing various implementation challenges, including limited resources, high performance, and flexibility requirements. In this study, we investigate compact instruction set extensions (ISEs) for Dilithium, aiming to improve software efficiency with low hardware overhead. To begin with, we propose tightly coupled accelerators that are deeply integrated into the RISC-V processor. These accelerators target the most computationally demanding components on resource-constrained processors, such as polynomial generation, the Number Theoretic Transform (NTT), and modular arithmetic. Next, we design a set of custom instructions that integrate seamlessly with the RISC-V base instruction formats, completing the accelerators in a compact manner. Subsequently, we implement our ISEs in a chip design for the Hummingbird E203 core and benchmark the performance of Dilithium using these ISEs. Additionally, we evaluate the resource consumption of the ISEs on FPGA and ASIC technologies. Compared to the reference software implementation on the RISC-V core, our co-design achieves a speedup factor ranging from 6.95 to 9.96. This improvement comes at the cost of additional hardware resources: a 35% increase in LUTs, a 14% increase in FFs, 7 additional DSPs, and no additional RAM. Furthermore, compared to the state-of-the-art approach, our work achieves higher speed at a reduced circuit cost.
Specifically, the usage of additional LUTs, FFs, and RAMs is reduced by 47.53%, 50.43%, and 100%, respectively. On ASIC technology, our approach requires 12,412 cells. Our co-design provides a better trade-off between speed and circuit overhead.
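
For readers unfamiliar with the arithmetic such accelerators target, the sketch below shows the modular butterfly operations at the core of an NTT over Dilithium's modulus q = 8380417. It is a plain software illustration, not the paper's custom instructions; the twiddle value used in the example is an arbitrary assumption, and real implementations typically use Montgomery or Barrett reduction instead of Python's `%`.

```python
Q = 8380417  # Dilithium modulus, equal to 2**23 - 2**13 + 1


def ct_butterfly(a: int, b: int, zeta: int) -> tuple:
    """Cooley-Tukey butterfly used in the forward NTT."""
    t = (zeta * b) % Q
    return (a + t) % Q, (a - t) % Q


def gs_butterfly(a: int, b: int, zeta_inv: int) -> tuple:
    """Gentleman-Sande butterfly used in the inverse NTT."""
    return (a + b) % Q, ((a - b) * zeta_inv) % Q


# Round trip on one coefficient pair with an example twiddle factor.
zeta = 1753                  # assumed example value, coprime to Q
zeta_inv = pow(zeta, -1, Q)  # modular inverse (Python 3.8+)
u, v = ct_butterfly(5, 7, zeta)
print(gs_butterfly(u, v, zeta_inv))  # (10, 14): inputs scaled by 2
```

The factor of 2 left over after each inverse butterfly is the reason an inverse NTT ends with a multiplication by the inverse of the transform length.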



Citations: 0

Flexible Updating of Internet of Things Computing Functions through Optimizing Dynamic Partial Reconfiguration

IF 2 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

ACM Transactions on Embedded Computing Systems

Pub Date : 2024-02-01 DOI: 10.1145/3643825

George Kornaros, Svoronos Leivadaros, Filippos Kolimbianakis

With applications becoming increasingly compute- and data-intensive and requiring more processing power, many internet-of-things (IoT) platforms in robots, drones, and autonomous vehicles that implement neural network inference, cryptographic functions, or signal processing (e.g., multimedia, communication) employ field programmable gate arrays (FPGAs). At the same time, dynamic partial reconfiguration (DPR) in modern FPGAs enables changing the function of one part of the FPGA by dynamically loading new bitstreams into its logic regions without affecting the function of other parts. This is especially useful for updating the functions of IoT devices while in operation, whether for bug fixes or functionality adjustments, and even more so when these IoT devices integrate low-cost FPGAs that can hardly accommodate many hard accelerators. To address one of the major limitations of using partial reconfiguration in IoT devices, this work introduces techniques for using DPR flexibly, named FLEXDPR, by sharing reconfigurable partitions among different accelerator functions and by supporting virtual relocation of these functions. Experimental results on the Xilinx ZYNQ-7000 platform reveal energy and latency efficiency improvements of about 20% on average. Overall, the suggested approach can reduce partial reconfiguration overhead while easing the scheduler's decisions on deploying hardware functions across time and space in a performance-conscious manner.



Citations: 0

Customized FPGA Implementation of Authenticated Lightweight Cipher Fountain for IoT Systems

IF 2 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

ACM Transactions on Embedded Computing Systems

Pub Date : 2024-01-26 DOI: 10.1145/3643039

Zhengyuan Shi, Cheng Chen, Gangqiang Yang, Hongchao Zhou, Hailiang Xiong, Zhiguo Wan

Authenticated Encryption with Associated Data (AEAD) can ensure both the confidentiality and the integrity of information in encrypted communication. Distinct variants can be customized from an AEAD scheme to satisfy various requirements. In this paper, we take the 128-bit lightweight AEAD stream cipher Fountain as an example. We provide a general cryptographic solution with three Fountain variants, which are for encryption, message authentication code (MAC) generation, and authenticated encryption with associated data, respectively. In addition, we propose area-saving and throughput-improving strategies for the FPGA implementation of Fountain. The conventional parallel hardware implementation consumes considerably more resources as the parallel width increases. We propose a hybrid architecture that uses parallel and serial update modes simultaneously. We also analyze the trade-off between area occupation and authentication latency for the two architectures. According to our discussion, the hybrid architecture performs efficiently, with higher throughput than most ciphers, including Grain-128 x32. Our Fountain keystream generator occupies 46 slices on Spartan-3 FPGAs, which is smaller than most ciphers with the same security level and even smaller than the 80-bit security level cipher Trivium. In summary, the customized Fountain with optimized FPGA implementations is suitable for various applications in the field of IoT.



Citations: 0

Intelligent Caching for Vehicular Dew Computing in Poor Network Connectivity Environments

IF 2 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

ACM Transactions on Embedded Computing Systems

Pub Date : 2024-01-25 DOI: 10.1145/3643038

Liang Zhao, Hongxuan Li, Enchao Zhang, Ammar Hawbani, Mingwei Lin, Shaohua Wan, Mohsen Guizani

In vehicular networks, some edge servers may not function properly due to the time-varying load condition and the uneven computing resource distribution, resulting in a low quality of caching services. To overcome this challenge, we develop a Vehicular dew computing (VDC) architecture for the first time by combining dew computing with vehicular networks, which can achieve wireless communication between vehicles in a resource-constrained environment. Consequently, it is crucial to develop an adaptive caching scheme that empowers vehicles to form efficient cooperation in VDC. In this paper, we propose an intelligent caching scheme based on VDC architecture, which includes two parts. First, to meet the dynamic nature of VDC, a spatiotemporal vehicle clustering algorithm is proposed to establish adaptive cooperation to assist content caching for vehicles. Second, the multi-armed bandit algorithm is employed to select suitable content for caching in vehicles based on real-time file popularity, and a model is established to dynamically update each vehicle’s request preferences. Extensive experiments are conducted to demonstrate that the proposed scheme has excellent performance in terms of cluster head stability and cache hit rate.
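
The abstract does not say which multi-armed bandit policy is used, so the following is only a sketch under the assumption of the standard UCB1 rule: each file is an arm, a cache hit is a reward, and a vehicle caches the files with the highest upper confidence bounds. The function names and the hit-count bookkeeping are hypothetical.

```python
import math
from typing import List


def ucb1_scores(counts: List[int], hits: List[float], round_no: int) -> List[float]:
    """UCB1 score per file: mean hit rate plus an exploration bonus.

    counts[i] is how many rounds file i has been cached so far and
    hits[i] the total reward (cache hits) observed in those rounds.
    """
    scores = []
    for n, r in zip(counts, hits):
        if n == 0:
            scores.append(math.inf)  # never-tried files are explored first
        else:
            scores.append(r / n + math.sqrt(2 * math.log(round_no) / n))
    return scores


def choose_cache(counts: List[int], hits: List[float],
                 round_no: int, cache_size: int) -> List[int]:
    """Return the indices of the `cache_size` files with the best scores."""
    scores = ucb1_scores(counts, hits, round_no)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:cache_size])


# Three files tried 10 times each and one untried file; two cache slots.
print(choose_cache([10, 10, 10, 0], [1, 9, 5, 0], round_no=30, cache_size=2))  # [1, 3]
```

Updating `counts` and `hits` after every request round is what lets the selection follow real-time file popularity.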



Citations: 0

PolyARBerNN: A Neural Network Guided Solver and Optimizer for Bounded Polynomial Inequalities

IF 2 Zone 3 Computer Science Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

ACM Transactions on Embedded Computing Systems

Pub Date : 2024-01-24 DOI: 10.1145/3632970

Wael Fatnassi, Yasser Shoukry

Constraint solvers play a significant role in the analysis, synthesis, and formal verification of complex cyber-physical systems. In this paper, we study the problem of designing a scalable constraint solver for an important class of constraints named polynomial constraint inequalities (also known as nonlinear real arithmetic theory). We introduce a solver named PolyARBerNN that uses convex polynomials as abstractions for highly nonlinear polynomials. Such abstractions were previously shown to be powerful for pruning the search space and restricting the use of sound and complete solvers to small search spaces. Compared with previous efforts on using convex abstractions, PolyARBerNN provides three main contributions, namely (i) a neural network guided abstraction refinement procedure that helps select the right abstraction out of a set of pre-defined abstractions, (ii) a Bernstein polynomial-based search space pruning mechanism that computes tight estimates of a polynomial's maximum and minimum values, which can be used as an additional abstraction of the polynomials, and (iii) an optimizer that transforms polynomial objective functions into polynomial constraints (on the gradient of the objective function) whose solutions are guaranteed to be close to the global optima. Together, these enhancements allow the PolyARBerNN solver to solve complex instances and scale more favorably than state-of-the-art nonlinear real arithmetic solvers while maintaining soundness and completeness. In particular, our experiments show that PolyARBerNN achieved a 100X speedup compared with Z3 8.9, Yices 2.6, and PVS (a solver that uses Bernstein expansion to solve multivariate polynomial constraints) on a variety of standard test benches. Finally, we implemented an optimizer called PolyAROpt that uses PolyARBerNN to solve constrained polynomial optimization problems. Numerical results show that PolyAROpt is able to solve high-dimensional and high-order polynomial optimization problems faster than the built-in optimizer in the Z3 8.9 solver.
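The Bernstein-based bounding idea in contribution (ii) can be sketched for the simplest case. The code below is not the PolyARBerNN implementation: it assumes a univariate polynomial on the unit interval [0, 1] given by power-basis coefficients, and it relies on the standard property that a polynomial's values on that interval lie between its smallest and largest Bernstein coefficients. The function names are hypothetical.

```python
from math import comb


def bernstein_coefficients(a):
    """Convert power-basis coefficients a[0] + a[1]*x + ... + a[n]*x**n
    into Bernstein coefficients of degree n on [0, 1]."""
    n = len(a) - 1
    return [
        sum(comb(j, i) / comb(n, i) * a[i] for i in range(j + 1))
        for j in range(n + 1)
    ]


def bernstein_bounds(a):
    """Return (lower, upper) bounds that enclose the polynomial's range on [0, 1]."""
    b = bernstein_coefficients(a)
    return min(b), max(b)


def evaluate(a, x):
    """Evaluate the power-basis polynomial at x with Horner's rule (used for checking)."""
    result = 0.0
    for coeff in reversed(a):
        result = result * x + coeff
    return result
```

For p(x) = x^2 - x + 0.5, `bernstein_bounds([0.5, -1.0, 1.0])` gives a lower bound of 0.0 and an upper bound of 0.5. The true range on [0, 1] is [0.25, 0.5], so the enclosure is valid but not exact, and the non-negative lower bound already certifies p(x) >= 0 on the interval without invoking a complete solver. That is the kind of cheap pruning the abstract describes.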



Citations: 0

Adversarial Transferability in Embedded Sensor Systems: An Activity Recognition Perspective

IF 2 Zone 3 Computer Science Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

ACM Transactions on Embedded Computing Systems

Pub Date : 2024-01-22 DOI: 10.1145/3641861

Ramesh Kumar Sah, Hassan Ghasemzadeh

Machine learning algorithms are increasingly used for inference and decision-making in embedded systems. Data from sensors are used to train machine learning models for various smart functions of embedded and cyber-physical systems, with applications ranging from healthcare to autonomous vehicles and national security. However, recent studies have shown that machine learning models can be fooled by adding adversarial noise to their inputs. The perturbed inputs are called adversarial examples. Furthermore, adversarial examples designed to fool one machine learning system are often effective against another system as well. This property is called adversarial transferability and has not been explored in wearable systems to date. In this work, we take the first step in studying adversarial transferability in wearable sensor systems from four viewpoints: (1) transferability between machine learning models; (2) transferability across users/subjects of the embedded system; (3) transferability across sensor body locations; and (4) transferability across datasets used for model training. We present a set of carefully designed experiments to investigate these transferability scenarios. We also propose a threat model describing the interactions of an adversary with the source and target sensor systems in different transferability settings. In most cases, we found high untargeted transferability, whereas targeted transferability success scores varied from 0% to 80%. The transferability of adversarial examples depends on many factors, such as the inclusion of data from all subjects, sensor body position, number of samples in the dataset, type of learning algorithm, and the distribution of the source and target system datasets. Transferability decreased sharply as the data distributions of the source and target systems became more distinct. We also provide guidelines and suggestions for the community on designing robust sensor systems. The code and dataset used in our analysis are publicly available.
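The notion of untargeted transferability between models can be made concrete with a toy sketch. Everything here is an assumption made for illustration: the abstract does not name the attack used, so the one-step fast gradient sign method (FGSM) is swapped in, and the two logistic-regression "systems" with hand-picked weights stand in for the wearable activity-recognition models studied in the paper.

```python
import math


def predict(w, b, x):
    """Logistic-regression probability that x belongs to class 1."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, eps):
    """One-step untargeted FGSM: shift each feature by eps along the sign
    of the cross-entropy loss gradient with respect to the input."""
    p = predict(w, b, x)
    grad = [(p - y) * wi for wi in w]  # d(loss)/dx for a logistic model
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


def transfers(source, target, x, y, eps):
    """True if an example crafted on `source` is misclassified by `target`."""
    x_adv = fgsm(*source, x, y, eps)
    return (predict(*target, x_adv) >= 0.5) != bool(y)


# Hypothetical source and target systems, each given as (weights, bias).
source = ([1.0, -2.0, 0.5], 0.0)
target = ([0.8, -1.5, 0.9], 0.1)
sample, label = [0.5, -0.2, 0.4], 1
```

With these made-up numbers, `transfers(source, target, sample, label, 0.5)` is true while `transfers(source, target, sample, label, 0.1)` is false: the larger perturbation crafted against the source model also flips the target model's decision, whereas the smaller one does not. How similar the two decision boundaries are governs this outcome, which is in line with the abstract's observation that transferability drops as source and target data distributions diverge.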



Citations: 0

Stash: Flexible Energy Storage for Intermittent Sensors

IF 2 Zone 3 Computer Science Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

ACM Transactions on Embedded Computing Systems

Pub Date : 2024-01-19 DOI: 10.1145/3641511

Arwa Alsubhi, Simeon Babatunde, Nicole Tobias, Jacob Sorber

Batteryless sensors promise a sustainable future for sensing, but they face significant challenges when storing and using environmental energy. Incoming energy can fluctuate unpredictably between periods of scarcity and abundance, and device performance depends on both incoming energy and how much a device can store. Existing batteryless devices have used fixed or run-time selectable front-end capacitor banks to meet the energy needs of different tasks. Neither approach adapts well to rapidly changing energy harvesting conditions, nor does it allow devices to store excess energy during times of abundance without sacrificing performance.

This paper presents Stash, a hardware back-end energy storage technique that allows batteryless devices to charge quickly and store excess energy when it is abundant, extending their operating time and carrying out additional tasks without compromising the main ones. Stash performs like a small capacitor device when small capacitors excel and like a large capacitor device when large capacitors excel, with no additional software complexity and negligible power overhead. We evaluate Stash using two applications—temperature sensing and wearable activity monitoring—under both synthetic solar energy and recorded solar and thermal traces from various human activities. Our results show that Stash increased sensor coverage by up to 15% under variable energy-harvesting conditions when compared to competitor configurations that used fixed small, large, and reconfigurable front-end energy storage.
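The trade-off between small and large capacitors that Stash targets follows from basic capacitor arithmetic, sketched below. This is an idealized model and not the authors' hardware design: it assumes lossless charging at constant harvested power, and the capacitor sizes, turn-on voltage, and power level in the usage note are hypothetical values chosen only to show the scale of the effect.

```python
def stored_energy(capacitance_f, voltage_v):
    """Energy in joules held by a capacitor: E = 1/2 * C * V^2."""
    return 0.5 * capacitance_f * voltage_v ** 2


def charge_time(capacitance_f, v_start, v_on, power_w):
    """Seconds to charge from v_start to v_on at constant harvested power.

    Idealized: ignores leakage, converter losses, and harvester dynamics.
    """
    delta = stored_energy(capacitance_f, v_on) - stored_energy(capacitance_f, v_start)
    return delta / power_w
```

Under these assumptions, a 100 µF capacitor charged from 0 V to a 3 V turn-on threshold at 1 mW needs about 0.45 s and holds about 0.45 mJ, while a 10 mF capacitor needs about 45 s and holds about 45 mJ. The small capacitor starts the device quickly but cannot bank surplus energy, and the large one does the opposite, which is the gap the abstract says Stash is designed to close.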



Citations: 0