
IEEE Transactions on Emerging Topics in Computing: Latest Publications

FHEmem: A Processing In-Memory Accelerator for Fully Homomorphic Encryption
IF 5.4, CAS Zone 2 (Computer Science), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2025-01-17. DOI: 10.1109/TETC.2025.3528862
Minxuan Zhou;Yujin Nam;Pranav Gangwar;Weihong Xu;Arpan Dutta;Chris Wilkerson;Rosario Cammarota;Saransh Gupta;Tajana Rosing
Fully Homomorphic Encryption (FHE) is a technique that allows arbitrary computations to be performed on encrypted data without the need for decryption, making it ideal for secure computation outsourcing. However, computation on FHE-encrypted data is significantly slower than that on plain data, primarily due to the explosive increases in data size and computation complexity after encryption. To enable real-world FHE applications, recent research has proposed several custom hardware accelerators that provide orders of magnitude speedup over conventional systems. However, the performance of existing FHE accelerators is severely bounded by memory bandwidth, even with expensive on-chip buffers. Processing In-Memory (PIM) is a promising technology that can accelerate data-intensive workloads with extensive internal bandwidth. Unfortunately, existing PIM accelerators cannot efficiently support FHE due to the limited throughput to support FHE’s complex computing and data movement operations. To tackle such challenges, we propose FHEmem, an FHE accelerator using a novel PIM architecture for high-throughput FHE acceleration. Furthermore, we present an optimized end-to-end processing flow with an automated mapping framework to maximize the hardware utilization of FHEmem. Our evaluation shows that FHEmem achieves at least 4.0× speedup and 6.9× energy-delay-area efficiency improvement over state-of-the-art FHE accelerators on popular FHE applications.
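The memory-bandwidth pressure described in the abstract follows from ciphertext expansion. As a back-of-the-envelope illustration (the ring dimension, limb count, and CKKS-style packing below are generic assumptions, not parameters from the paper), the size of a single ciphertext can be estimated as follows:

```python
# Back-of-the-envelope ciphertext-size estimate for an RNS-based FHE scheme.
# All parameters are generic assumptions, not those used by FHEmem.

def ciphertext_bytes(ring_dim: int, num_limbs: int,
                     limb_bits: int = 64, num_polys: int = 2) -> int:
    """One ciphertext = num_polys polynomials of ring_dim coefficients,
    each coefficient stored as num_limbs machine words."""
    return num_polys * ring_dim * num_limbs * (limb_bits // 8)

ring_dim = 1 << 16          # N = 65536 coefficients per polynomial (assumed)
num_limbs = 30              # RNS limbs for a deep circuit (assumed)
ct = ciphertext_bytes(ring_dim, num_limbs)

# A CKKS-style ciphertext packs roughly N/2 plaintext values (e.g., doubles).
pt = (ring_dim // 2) * 8
print(f"ciphertext: {ct / 2**20:.1f} MiB, "
      f"plaintext: {pt / 2**10:.1f} KiB, "
      f"expansion: {ct / pt:.0f}x")
```

Under these assumptions one ciphertext occupies about 30 MiB against 256 KiB of packed plaintext, a roughly 120x blow-up, which is why off-chip traffic rather than raw compute tends to dominate.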
Vol. 13, No. 4, pp. 1367-1382.
Citations: 0
Scatter-Gather DMA Performance Analysis Within an SoC-Based Control System for Trapped-Ion Quantum Computing
IF 5.4, CAS Zone 2 (Computer Science), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2025-01-17. DOI: 10.1109/TETC.2025.3528899
Tiamike Dudley;Jim Plusquellic;Eirini Eleni Tsiropoulou;Joshua Goldberg;Daniel Stick;Daniel Lobser
Scatter-gather dynamic-memory-access (SG-DMA) is utilized in applications that require high bandwidth and low latency data transfers between memory and peripherals, where data blocks, described using buffer descriptors (BDs), are distributed throughout the memory system. The data transfer organization and requirements of a Trapped-Ion Quantum Computer (TIQC) possess characteristics similar to those targeted by SG-DMA. In particular, the ion qubits in a TIQC are manipulated by applying control sequences consisting primarily of modulated laser pulses. These optical pulses are defined by parameters that are (re)configured by the electrical control system. Variations in the operating environment and equipment make it necessary to create and run a wide range of control sequence permutations, which can be well represented as BD regions distributed across the main memory. In this article, we experimentally evaluate the latency and throughput of SG-DMA on Xilinx radiofrequency SoC (RFSoC) devices under a variety of BD and payload sizes as a means of determining the benefits and limitations of an RFSoC system architecture for TIQC applications.
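For readers unfamiliar with scatter-gather DMA, the sketch below is a minimal software model of a buffer-descriptor chain; the field names are illustrative and do not mirror the Xilinx AXI DMA register layout used on the RFSoC.

```python
# Minimal software model of a scatter-gather buffer-descriptor (BD) chain.
# Field names are illustrative and do not mirror the Xilinx AXI DMA layout.
from dataclasses import dataclass
from typing import Optional

@dataclass
class BufferDescriptor:
    address: int                                   # start of the data block
    length: int                                    # payload bytes in this block
    next_bd: Optional["BufferDescriptor"] = None   # link to the next descriptor

def chained_payload(head: BufferDescriptor) -> int:
    """Walk the BD chain the way an SG-DMA engine does and return the
    total payload moved by one chained transfer."""
    total, bd = 0, head
    while bd is not None:
        total += bd.length
        bd = bd.next_bd
    return total

# Three scattered pulse-parameter blocks gathered into a single transfer.
bd2 = BufferDescriptor(address=0x3000_0000, length=256)
bd1 = BufferDescriptor(address=0x2000_0000, length=1024, next_bd=bd2)
bd0 = BufferDescriptor(address=0x1000_0000, length=512, next_bd=bd1)
print(chained_payload(bd0))   # 1792 bytes
```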
Vol. 13, No. 3, pp. 841-852.
Citations: 0
Improving Deep Neural Network Reliability via Transient-Fault-Aware Design and Training
IF 5.4, CAS Zone 2 (Computer Science), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2025-01-10. DOI: 10.1109/TETC.2024.3520672
Fernando Fernandes dos Santos;Niccolò Cavagnero;Marco Ciccone;Giuseppe Averta;Angeliki Kritikakou;Olivier Sentieys;Paolo Rech;Tatiana Tommasi
Deep Neural Networks (DNNs) have revolutionized several fields, including safety- and mission-critical applications, such as autonomous driving and space exploration. However, recent studies have highlighted that transient hardware faults can corrupt the model's output, leading to high misprediction probabilities. Since traditional reliability strategies, based on modular hardware, software replications, or matrix multiplication checksum impose a high overhead, there is a pressing need for efficient and effective hardening solutions tailored for DNNs. In this article we present several network design choices and a training procedure that increase the robustness of standard deep models and thoroughly evaluate these strategies with experimental analyses on vision classification tasks. We name DieHardNet the specialized DNN obtained by applying all our hardening techniques that combine knowledge from experimental hardware faults characterization and machine learning studies. We conduct extensive ablation studies to quantify the reliability gain of each hardening component in DieHardNet. We perform over 10,000 instruction-level fault injections to validate our approach and expose DieHardNet executed on GPUs to an accelerated neutron beam equivalent to more than 570,000 years of natural radiation. Our evaluation demonstrates that DieHardNet can reduce the critical error rate (i.e., errors that modify the inference) up to 100 times compared to the unprotected baseline model, without causing any increase in inference time.
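The premise that a single transient fault can corrupt a model's output is easy to reproduce in software. The snippet below flips one bit in the IEEE-754 representation of a float32 weight; it is a generic illustration of the fault model, not the paper's instruction-level or beam-based methodology.

```python
# Single-event-upset illustration: flip one bit of a float32 weight and
# observe the effect (generic sketch, not the paper's injection framework).
import numpy as np

def flip_bit(value: float, bit: int) -> np.float32:
    """Flip one bit of the IEEE-754 float32 representation of value."""
    as_u32 = np.frombuffer(np.float32(value).tobytes(), dtype=np.uint32)[0]
    flipped = as_u32 ^ np.uint32(1 << bit)
    return np.frombuffer(flipped.tobytes(), dtype=np.float32)[0]

w = 0.05                      # a typical small DNN weight
print(flip_bit(w, 3))         # low mantissa bit: negligible change
print(flip_bit(w, 30))        # high exponent bit: weight explodes to ~1e37
```

A flip in a low mantissa bit is usually benign, while a flip in a high exponent bit can inflate a weight by dozens of orders of magnitude, which is the kind of corruption that turns into a critical misprediction.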
Vol. 13, No. 3, pp. 829-840.
Citations: 0
Energy Efficient Approximate Computing Framework for DNN Acceleration Using a Probabilistic-Oriented Method
IF 5.4, CAS Zone 2 (Computer Science), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2025-01-02. DOI: 10.1109/TETC.2024.3522307
Pengfei Huang;Ke Chen;Chenghua Wang;Weiqiang Liu
Approximate computing (AxC) has recently emerged as a successful approach for optimizing energy consumption in error-tolerant applications, such as deep neural networks (DNNs). The enormous model size and high computation cost of DNNs present significant challenges for deployment in energy-efficient and resource-constrained computing systems. Emerging DNN hardware accelerators based on AxC designs selectively approximate the non-critical segments of computation to address these challenges. However, a systematic and principled approach that incorporates domain knowledge and approximate hardware for optimal approximation is still lacking. In this paper, we propose a probabilistic-oriented AxC (PAxC) framework that provides high energy savings with acceptable quality by considering the overall probability effect of approximation. To achieve aggressive approximate designs, we utilize the minimum likelihood error to determine the AxC synergy profile at both application and circuit levels. This enables effective coordination of the trade-off between energy and accuracy. Compared with a baseline design, the power-delay product (PDP) is significantly reduced by up to 83.66% with an acceptable accuracy reduction. Simulation and a case study of the image process validate the effectiveness of the proposed framework.
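As a concrete stand-in for the kind of circuit-level approximation such a framework coordinates (not the PAxC design itself), the sketch below characterizes the error probability and mean error of a simple truncation-based approximate multiplier over random 8-bit inputs.

```python
# Error characterization of a truncation-based approximate multiplier,
# a generic stand-in for circuit-level approximation (not the PAxC design).
import random

def approx_mul(a: int, b: int, dropped_bits: int = 4) -> int:
    """8-bit multiply that zeroes the low operand bits before multiplying,
    shrinking the partial-product array at the cost of accuracy."""
    mask = ~((1 << dropped_bits) - 1)
    return (a & mask) * (b & mask)

random.seed(0)
pairs = [(random.randrange(256), random.randrange(256)) for _ in range(10_000)]
errors = [a * b - approx_mul(a, b) for a, b in pairs]     # always >= 0
p_err = sum(e > 0 for e in errors) / len(errors)
mean_norm = sum(errors) / len(errors) / 255**2            # vs. max product
print(f"P(error > 0) = {p_err:.2f}, mean error / max product = {mean_norm:.4f}")
```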
Vol. 13, No. 3, pp. 816-828.
Citations: 0
3D Invisible Cloak: A Robust Person Stealth Attack Against Object Detector in Complex 3D Physical Scenarios
IF 5.4, CAS Zone 2 (Computer Science), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2024-12-13. DOI: 10.1109/TETC.2024.3513392
Mingfu Xue;Can He;Yushu Zhang;Zhe Liu;Weiqiang Liu
In this article, we propose a novel physical stealth attack against the person detectors in real world. For the first time, we consider the impacts of those complex and challenging 3D physical constraints (e.g., radian, wrinkle, occlusion, angle, etc.) on person stealth attacks, and propose 3D transformations to generate robust 3D invisible cloak. We launch the person stealth attacks in 3D physical space instead of 2D plane by printing the adversarial patches on real clothes. Anyone wearing the cloak can evade the detection of person detectors and achieve stealth under challenging and complex 3D physical scenarios. Experimental results in various indoor and outdoor physical scenarios show that, the proposed person stealth attack method is robust and effective even under those complex and challenging physical conditions, such as the cloak is wrinkled, obscured, curved, and from different/large angles. The attack success rate of the generated adversarial patch in digital domain (Inria dataset) is 86.56% against YOLO v2 and 80.32% against YOLO v5, while the static and dynamic stealth attack success rates of the generated 3D invisible cloak in physical world are 100%, 77% against YOLO v2 and 100%, 83.95% against YOLO v5, respectively, which are significantly better than state-of-the-art works.
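Robustness to physical conditions is typically obtained by optimizing the patch over many randomly sampled renderings, in the style of expectation over transformation. The sketch below only illustrates that sampling step with crude stand-ins for lighting, print noise, and occlusion; the paper's actual 3D transformations (radian, wrinkle, angle) are not reproduced here.

```python
# Expectation-over-transformation style sampling of physical conditions.
# Brightness, noise (a crude wrinkle/print proxy), and occlusion are generic
# illustrations, not the paper's 3D transformations.
import numpy as np

rng = np.random.default_rng(0)

def sample_physical_view(patch: np.ndarray) -> np.ndarray:
    """Return one randomly perturbed rendering of the patch, in [0, 1]."""
    view = patch * rng.uniform(0.6, 1.4)                   # lighting change
    view = view + rng.normal(0.0, 0.05, size=patch.shape)  # print/wrinkle noise
    h, w = patch.shape[:2]
    y, x = rng.integers(0, h // 2), rng.integers(0, w // 2)
    view[y:y + h // 4, x:x + w // 4] = 0.0                 # partial occlusion
    return np.clip(view, 0.0, 1.0)

patch = rng.uniform(size=(64, 64, 3))                      # candidate patch
batch = [sample_physical_view(patch) for _ in range(8)]    # views for one step
print(len(batch), batch[0].shape)
```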
Vol. 13, No. 3, pp. 799-815.
Citations: 0
Spiker+: A Framework for the Generation of Efficient Spiking Neural Networks FPGA Accelerators for Inference at the Edge
IF 5.4, CAS Zone 2 (Computer Science), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2024-12-11. DOI: 10.1109/TETC.2024.3511676
Alessio Carpegna;Alessandro Savino;Stefano Di Carlo
Including Artificial Neural Networks in embedded systems at the edge allows applications to exploit Artificial Intelligence capabilities directly within devices operating at the network periphery, containing sensitive data within the boundaries of the edge device. This facilitates real-time decision-making, reduces latency and power consumption, and enhances privacy and security. Spiking Neural Networks (SNNs) offer a promising computing paradigm in these environments. However, deploying efficient SNNs in resource-constrained edge devices requires highly parallel and reconfigurable hardware implementations. We introduce Spiker+, a comprehensive framework for generating efficient, low-power, and low-area SNN accelerators on Field Programmable Gate Arrays for inference at the edge. Spiker+ presents a configurable multi-layer SNN hardware architecture, a library of highly efficient neuron architectures, and a design framework to enable easy, Python-based customization of accelerators. Spiker+ is tested on three benchmark datasets: MNIST, Spiking Heidelberg Dataset (SHD), and AudioMNIST. On MNIST, it outperforms state-of-the-art SNN accelerators in terms of resource allocation, requiring 7,612 logic cells and 18 Block RAMs (BRAMs), and power consumption, drawing only 180 mW, with comparable latency (780 μs/img) and accuracy (97%). On SHD and AudioMNIST, Spiker+ requires 18,268 and 10,124 logic cells, respectively, requiring 51 and 16 BRAMs, consuming 430 mW and 290 mW, with an accuracy of 75% and 95%. These results underscore the significance of Spiker+ in the hardware-accelerated SNN landscape, making it an excellent solution for deploying configurable and tunable SNN architectures in resource- and power-constrained edge applications.
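The accelerators generated by such frameworks implement, per neuron and per timestep, an update of the leaky integrate-and-fire kind sketched below. This is a generic reference model in Python, not Spiker+ code or its neuron library.

```python
# Reference leaky integrate-and-fire (LIF) update of the kind an SNN
# accelerator computes per neuron per timestep (generic model, not Spiker+).
import numpy as np

def lif_step(v, spikes_in, weights, decay=0.9, threshold=1.0):
    """Leak the membrane, integrate weighted input spikes, fire where the
    threshold is crossed, and hard-reset the neurons that fired."""
    v = decay * v + weights @ spikes_in
    fired = v >= threshold
    v = np.where(fired, 0.0, v)
    return v, fired.astype(np.float64)

rng = np.random.default_rng(1)
weights = rng.normal(0.0, 0.5, size=(4, 16))   # 16 inputs -> 4 LIF neurons
v = np.zeros(4)
for _ in range(10):
    spikes_in = (rng.random(16) < 0.2).astype(np.float64)  # Bernoulli spikes
    v, spikes_out = lif_step(v, spikes_in, weights)
print(v, spikes_out)
```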
Vol. 13, No. 3, pp. 784-798. Open access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10794606
Citations: 0
Guest Editorial: Special Section on “Approximate Data Processing: Computing, Storage and Applications”
IF 5.1, CAS Zone 2 (Computer Science), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2024-12-05. DOI: 10.1109/TETC.2024.3488452
Ke Chen;Shanshan Liu;Weiqiang Liu;Fabrizio Lombardi;Nader Bagherzadeh
Vol. 12, No. 4, pp. 954-955. Open access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10779333
Citations: 0
IEEE Transactions on Emerging Topics in Computing Information for Authors
IF 5.1, CAS Zone 2 (Computer Science), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2024-12-05. DOI: 10.1109/TETC.2024.3499715
Vol. 12, No. 4, p. C2. Open access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10779345
Citations: 0
A Crowdsourcing-Driven AI Model Design Framework to Public Health Policy-Adherence Assessment
IF 5.4, CAS Zone 2 (Computer Science), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2024-11-18. DOI: 10.1109/TETC.2024.3496835
Yang Zhang;Ruohan Zong;Lanyu Shang;Dong Wang
This paper focuses on a public health policy-adherence assessment (PHPA) application that aims to automatically assess people's public health policy adherence during emergent global health crisis events (e.g., COVID-19, MonkeyPox) by leveraging massive public health policy adherence imagery data from the social media. In particular, we study an optimal AI model design problem in the PHPA application, where the goal is to leverage the crowdsourced human intelligence to accurately identify the optimal AI model design (i.e., network architecture and hyperparameter configuration combination) without the need of AI experts. However, two critical challenges exist in our problem: 1) it is challenging to effectively optimize the AI model design given the interdependence between network architecture and hyperparameter configuration; 2) it is non-trivial to leverage the human intelligence queried from ordinary crowd workers to identify the optimal AI model design in the PHPA application. To address these challenges, we develop CrowdDesign, a subjective logic-driven human-AI collaborative learning framework that explores the complementary strength of AI and human intelligence to jointly identify the optimal network architecture and hyperparameter configuration of an AI model in the PHPA application. The experimental results from two real-world PHPA applications demonstrate that CrowdDesign consistently outperforms the state-of-the-art baseline methods by achieving the best PHPA performance.
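The framework is described as subjective-logic-driven. The snippet below shows the standard binomial-opinion mapping from positive/negative evidence counts to belief, disbelief, and uncertainty, as a reference for the formalism only; the paper's fusion and querying strategy is not reproduced here.

```python
# Standard binomial-opinion mapping from subjective logic: evidence counts
# become (belief, disbelief, uncertainty) plus a projected probability.
# This illustrates the formalism only, not the paper's fusion rule.

def binomial_opinion(positive: int, negative: int,
                     base_rate: float = 0.5, prior_weight: float = 2.0):
    """Map crowd votes for/against a design choice to an opinion."""
    total = positive + negative + prior_weight
    belief = positive / total
    disbelief = negative / total
    uncertainty = prior_weight / total
    projected = belief + base_rate * uncertainty
    return belief, disbelief, uncertainty, projected

# Few answers -> large uncertainty mass; more answers -> the opinion sharpens.
print(binomial_opinion(3, 1))     # approx (0.50, 0.17, 0.33, 0.67)
print(binomial_opinion(30, 10))   # approx (0.71, 0.24, 0.05, 0.74)
```

With few votes the uncertainty mass dominates, which is exactly the signal a crowdsourcing loop can use to decide whether to query more workers.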
Vol. 13, No. 3, pp. 768-783. Open access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10756632
Citations: 0
Performability of Service Chains With Rejuvenation: A Multidimensional Universal Generating Function Approach
IF 5.1, CAS Zone 2 (Computer Science), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2024-11-18. DOI: 10.1109/TETC.2024.3496195
Luigi De Simone;Mario Di Mauro;Roberto Natella;Fabio Postiglione
Network Function Virtualization (NFV) converts legacy telecommunication systems into modular software appliances, known as service chains, running on the cloud. To address potential software aging-related issues, rejuvenation is often employed to clean up their state and maximize performance and availability. In this work, we propose a framework to model the performability of service chains with rejuvenation. Performance modeling uses queueing theory, specifically adopting an M/G/m model with the Allen-Cunneen approximation, to capture real-world aspects related to service times. Availability modeling is addressed through the Multidimensional Universal Generating Function (MUGF), a recent technique that achieves computational efficiency when dealing with systems with many sub-elements, particularly useful for multi-provider service chains. Additionally, we deploy an experimental testbed based on the Open5GS service chain to estimate key performance and availability parameters. Supported by experimental results, we evaluate the impact of rejuvenation on the performability of the Open5GS service chain. The numerical analysis shows that i) the configuration of replicas across nodes is important to meet availability goals; ii) rejuvenation can bring one additional “nine” of availability, depending on the time to recovery; and iii) MUGF can significantly reduce computational complexity through straightforward algebraic manipulations.
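The performance side of the model is the Allen-Cunneen approximation for an M/G/m queue, which scales the exact M/M/m mean waiting time by (Ca^2 + Cs^2)/2, with Ca^2 = 1 for Poisson arrivals. A minimal implementation of that textbook formula is sketched below; the arrival rate, service rate, replica count, and service-time variability are arbitrary example values, not the Open5GS measurements.

```python
# Allen-Cunneen approximation for the mean wait in an M/G/m queue
# (textbook formula; the numbers below are arbitrary examples, not the
# Open5GS measurements).
from math import factorial

def erlang_c(m: int, offered_load: float) -> float:
    """Probability that an arrival has to wait in an M/M/m queue."""
    rho = offered_load / m
    assert rho < 1.0, "queue must be stable"
    tail = (offered_load ** m / factorial(m)) / (1.0 - rho)
    head = sum(offered_load ** k / factorial(k) for k in range(m))
    return tail / (head + tail)

def allen_cunneen_wait(lam: float, mu: float, m: int, cs2: float) -> float:
    """Mean waiting time: Wq(M/M/m) scaled by (Ca^2 + Cs^2)/2, Ca^2 = 1."""
    a = lam / mu                                # offered load in Erlangs
    wq_mmm = erlang_c(m, a) / (m * mu - lam)    # exact M/M/m mean wait
    return wq_mmm * (1.0 + cs2) / 2.0

# 3 replicas, 800 req/s arriving, 300 req/s per replica, service-time CV^2 = 1.5
print(allen_cunneen_wait(lam=800.0, mu=300.0, m=3, cs2=1.5))   # ~0.010 s
```

With the example values the mean waiting time comes out near 10 ms, and the (1 + Cs^2)/2 factor makes explicit how service-time variability degrades the chain's performance.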
Vol. 13, No. 2, pp. 341-353.
Citations: 0