Radiation and its effect on neighboring nodes are critical not only for space applications but also for terrestrial applications at modern, smaller technology nodes, where they may cause SRAM failures through single- and multi-node upsets. Hence, this paper proposes a 14T radiation-hardened SRAM cell to overcome soft errors in space and critical terrestrial applications. Simulation results show that the proposed cell is resilient to any single event upset and single event double node upset at its storage nodes. The cell consumes less power than the other cells considered, and its hold, read, and write stability are higher than those of most of them. The higher critical charge of the proposed SRAM increases its radiation resistance. Simulation results demonstrate that, of all compared SRAMs, only DNUSRM and the proposed SRAM show a 0% probability of logical flipping. In addition, the total critical charge, write stability, read stability, hold stability, area, power, sensitive area, write speed, and read speed of the proposed SRAM are improved by -19.1%, 5.22%, 25.7%, -5.46%, 22.5%, 50.6%, 60.0%, 17.91%, and 0.74%, respectively, compared to the DNUSRM SRAM. Hence, the better balance among these parameters makes the proposed cell more suitable for space and critical terrestrial applications. Finally, post-layout and Monte Carlo simulations validate the efficiency of the SRAMs.
{"title":"SEDONUT: A Single Event Double Node Upset Tolerant SRAM for Terrestrial Applications","authors":"Govind Prasad, Bipin Chnadra Mandi, Maifuz Ali","doi":"10.1145/3651985","DOIUrl":"https://doi.org/10.1145/3651985","journal":{"name":"ACM Transactions on Design Automation of Electronic Systems"},"publicationDate":"2024-04-06","publicationTypes":"Journal Article"}
Ping-Xiang Chen, Dongjoo Seo, Changhoon Sung, Jongheum Park, Minchul Lee, Huaicheng Li, Matias Bjørling, Nikil Dutt
We present ZoneTrace, a runtime monitoring tool for the Flash-Friendly File System (F2FS) on Zoned Namespace (ZNS) SSDs. A ZNS SSD organizes its storage into zones that must be written sequentially. Because of this sequential-write constraint, F2FS, a log-structured file system, has recently been adopted to support ZNS SSDs. To expose how space is managed, zone by zone, between F2FS and the underlying ZNS SSD, we developed ZoneTrace, a tool that enables users to visualize and analyze the space management of F2FS on ZNS SSDs. ZoneTrace uses the extended Berkeley Packet Filter (eBPF) to trace updates to the segment bitmap in F2FS and visualizes the space usage of each zone accordingly. Furthermore, ZoneTrace analyzes file fragmentation in F2FS and provides users with a fragmentation histogram that serves as an indicator of file fragmentation. Using ZoneTrace's visualization, we identify that the current F2FS space management scheme cannot fully optimize space for streaming data recording in autonomous systems, which leads to serious file fragmentation on ZNS SSDs. Our evaluations with both synthetic and realistic workloads show that ZoneTrace is lightweight and gives users useful insights when monitoring F2FS on ZNS SSDs. We believe ZoneTrace can help users analyze F2FS with ease and open up research topics in space management for F2FS on ZNS SSDs.
{"title":"ZoneTrace: A Zone Monitoring Tool for F2FS on ZNS SSDs","authors":"Ping-Xiang Chen, Dongjoo Seo, Changhoon Sung, Jongheum Park, Minchul Lee, Huaicheng Li, Matias Bjørling, Nikil Dutt","doi":"10.1145/3656172","DOIUrl":"https://doi.org/10.1145/3656172","journal":{"name":"ACM Transactions on Design Automation of Electronic Systems"},"publicationDate":"2024-04-05","publicationTypes":"Journal Article"}
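The two views that the ZoneTrace abstract describes, per-zone space usage derived from a segment bitmap and a per-file fragmentation histogram, can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the fixed `segments_per_zone` mapping, and the list-based bitmap are hypothetical, and the eBPF tracing that collects the data in the real tool is not shown.

```python
from collections import Counter


def zone_usage(segment_bitmap, segments_per_zone):
    """Return the fraction of in-use segments for each zone.

    segment_bitmap: list of 0/1 flags, one per segment (1 = in use).
    segments_per_zone: assumed fixed number of segments mapped to one zone.
    """
    usage = []
    for start in range(0, len(segment_bitmap), segments_per_zone):
        zone = segment_bitmap[start:start + segments_per_zone]
        usage.append(sum(zone) / len(zone))
    return usage


def fragmentation_histogram(block_addresses):
    """Count the contiguous extents of one file, grouped by extent length.

    block_addresses: block numbers of the file in file-offset order.
    Returns a Counter mapping extent length -> number of extents.
    """
    histogram = Counter()
    if not block_addresses:
        return histogram
    run = 1
    for prev, cur in zip(block_addresses, block_addresses[1:]):
        if cur == prev + 1:
            run += 1                 # still inside the same extent
        else:
            histogram[run] += 1      # extent ended, record its length
            run = 1
    histogram[run] += 1              # record the final extent
    return histogram
```

A file whose histogram is dominated by short extents is heavily fragmented; a single long extent means the file is laid out contiguously.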
Wenyan Yan, Dongsheng Wei, Bin Fu, Renfa Li, Guoqi Xie
A network architecture in which Time-Sensitive Networking (TSN) serves as the backbone network and the Controller Area Network (CAN) serves as the intra-domain network is known as the CAN-TSN interconnection network architecture. It has gained considerable attention in industrial embedded networks, such as those in spacecraft, intelligent automobiles, and factory automation. The architecture employs the CAN-TSN gateway as a central hub for transmitting and managing a large volume of communications between the CAN domains and TSN. However, the rapid growth in data volume causes heavy congestion at the CAN-TSN gateway, making it difficult to effectively support the different time planning mechanisms provided by TSN. In this paper, we propose a two-stage mixed-criticality traffic scheduler. In the first stage, the scheduler uses a Message Optimization Algorithm (MOA) to aggregate multiple CAN messages, both critical and non-critical, into a single TSN message, which reduces the number of messages requiring transmission. In the second stage, the scheduler uses a Message Scheduling Optimization Algorithm (MSOA) to schedule critical TSN messages. This algorithm reassembles all the critical CAN messages contained in un-schedulable TSN messages to generate new TSN messages for rescheduling. Experimental results show that the proposed scheduler effectively improves the acceptance ratio of critical and non-critical CAN messages and outperforms the state-of-the-art message scheduling method in terms of acceptance ratio, while also improving the bandwidth utilization and the number of schedule table entries. We further construct a hardware platform to evaluate the performance of MSOA. The consistency between practical and theoretical results shows the effectiveness of MSOA.
{"title":"A Mixed-Criticality Traffic Scheduler with Mitigating Congestion for CAN-to-TSN Gateway","authors":"Wenyan Yan, Dongsheng Wei, Bin Fu, Renfa Li, Guoqi Xie","doi":"10.1145/3656173","DOIUrl":"https://doi.org/10.1145/3656173","journal":{"name":"ACM Transactions on Design Automation of Electronic Systems"},"publicationDate":"2024-04-04","publicationTypes":"Journal Article"}
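The first-stage idea in the CAN-to-TSN abstract above, packing several CAN messages into one TSN message, can be illustrated with a simple greedy first-fit packer. This is not the paper's MOA: the tuple layout, the `tsn_payload_bytes` capacity, and the "critical messages first" ordering are assumptions made only for the sketch, and timing constraints such as periods and deadlines are ignored.

```python
def aggregate_can_messages(can_messages, tsn_payload_bytes):
    """Greedily pack CAN messages into TSN frames (first-fit).

    can_messages: list of (message_id, size_bytes, is_critical) tuples.
    tsn_payload_bytes: assumed usable payload of one TSN frame.
    Returns a list of frames, each a list of the tuples packed into it.
    """
    frames = []   # messages packed into each frame
    free = []     # remaining payload bytes of each frame
    # Place critical messages first, and larger messages before smaller ones.
    ordered = sorted(can_messages, key=lambda m: (not m[2], -m[1]))
    for msg in ordered:
        size = msg[1]
        if size > tsn_payload_bytes:
            raise ValueError(f"message {msg[0]} does not fit in one frame")
        for i, room in enumerate(free):
            if size <= room:
                frames[i].append(msg)
                free[i] -= size
                break
        else:
            # No existing frame has room, so open a new one.
            frames.append([msg])
            free.append(tsn_payload_bytes - size)
    return frames
```

With four CAN messages of 8, 8, 4, and 6 bytes and an assumed 16-byte payload, the packer emits two frames instead of four separate transmissions, which is the kind of reduction the abstract attributes to aggregation.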
Concolic testing is a scalable solution for automated generation of directed tests for validation of hardware designs. Unfortunately, concolic testing fails to cover complex corner cases such as hard-to-activate branches. In this paper, we propose an incremental concolic testing technique to cover hard-to-activate branches in register-transfer level (RTL) models. We show that a complex branch condition can be viewed as a sequence of easy-to-activate events. We map the branch coverage problem to the coverage of a sequence of events. We propose an efficient algorithm to cover the sequence of events using concolic testing. Specifically, the test generated to activate the current event is used as the starting point to activate the next event in the sequence. Experimental results demonstrate that our approach can be used to generate directed tests to cover complex corner cases in RTL models while state-of-the-art methods fail to activate them.
{"title":"Incremental Concolic Testing of Register-Transfer Level Designs","authors":"Hasini Witharana, Aruna Jayasena, Prabhat Mishra","doi":"10.1145/3655621","DOIUrl":"https://doi.org/10.1145/3655621","journal":{"name":"ACM Transactions on Design Automation of Electronic Systems"},"publicationDate":"2024-03-30","publicationTypes":"Journal Article"}
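The incremental idea in the concolic testing abstract above, where the test that activates one event becomes the starting point for activating the next, can be shown on a toy design. Everything here is a stand-in: `step` is a hypothetical three-input unlock sequence rather than an RTL model, and a bounded breadth-first search over concrete inputs replaces the symbolic constraint solving that a real concolic engine would perform.

```python
from collections import deque

UNLOCK_SEQUENCE = [3, 1, 2]   # hypothetical inputs guarding the deep branch


def step(state, inp):
    """Toy design: the state advances only when the expected input arrives."""
    if state < len(UNLOCK_SEQUENCE) and inp == UNLOCK_SEQUENCE[state]:
        return state + 1
    return state


def extend_test(start_state, event, inputs, max_depth):
    """Search for a short input suffix that makes `event` true.

    Stand-in for the solver call; returns (reached_state, suffix) or None.
    """
    queue = deque([(start_state, [])])
    seen = {start_state}
    while queue:
        state, suffix = queue.popleft()
        if event(state):
            return state, suffix
        if len(suffix) == max_depth:
            continue
        for inp in inputs:
            nxt = step(state, inp)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, suffix + [inp]))
    return None


def incremental_test(events, inputs, max_depth=2):
    """Activate the events in order, reusing the test built so far."""
    state, test = 0, []
    for event in events:
        found = extend_test(state, event, inputs, max_depth)
        if found is None:
            return None          # this event could not be activated
        state, suffix = found
        test += suffix           # the next search starts from this test
    return test


# One easy-to-activate event per stage of the hard-to-activate branch.
events = [lambda s, k=k: s >= k for k in (1, 2, 3)]
print(incremental_test(events, inputs=range(4)))
```

Each search only has to cross one stage, so a depth bound of 2 is enough here, while a single search for the final event with the same bound would fail.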
Aritra Bagchi, Dharamjeet, Ohm Rishabh, Manan Suri, Preeti Ranjan Panda
Non-volatile memories (NVMs), with their high storage density and ultra-low leakage power, offer promising potential for redesigning the memory hierarchy in next-generation Multi-Processor Systems-on-Chip (MPSoCs). However, the adoption of NVMs in cache designs introduces challenges such as NVM write overheads and limited NVM endurance. The shared NVM cache in an MPSoC receives requests from different processor cores and, when the requested data is not present in the cache, responses from the off-chip memory. In addition, evictions of dirty data from higher-level caches create another source of write operations, known as writebacks. These two sources of write operations, writebacks and responses, further exacerbate the contention for the shared bandwidth of the NVM cache and create significant performance bottlenecks. Uncontrolled write operations can also reduce the endurance of the NVM cache, posing a threat to cache lifetime and system reliability. Existing strategies often address either performance or cache endurance individually, leaving a gap for a holistic solution. This study introduces the Performance Optimization and Endurance Management (POEM) methodology, a novel approach that aggressively bypasses cache writebacks and responses to alleviate NVM cache contention. In contrast to existing bypass policies, which pay too little attention to shared NVM cache contention and focus too much on cache data reuse, POEM's aggressive bypass significantly improves overall system performance, even at the expense of data reuse. POEM also employs effective wear leveling to enhance NVM cache endurance by carefully redistributing write operations across different cache lines. Across diverse workloads, POEM yields an average speedup of 34% over a naïve baseline and 28.8% over a state-of-the-art NVM cache bypass technique, while enhancing cache endurance by 15% over the baseline. POEM also explores diverse design choices by exploiting a key policy parameter that assigns varying priorities to the two system-level objectives.
{"title":"POEM: Performance Optimization and Endurance Management for Non-volatile Caches","authors":"Aritra Bagchi, Dharamjeet, Ohm Rishabh, Manan Suri, Preeti Ranjan Panda","doi":"10.1145/3653452","DOIUrl":"https://doi.org/10.1145/3653452","journal":{"name":"ACM Transactions on Design Automation of Electronic Systems"},"publicationDate":"2024-03-27","publicationTypes":"Journal Article"}
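The two mechanisms named in the POEM abstract above, bypassing writes when the shared cache is contended and spreading writes across cache lines, might look roughly like the sketch below. None of this is taken from the paper: the queue-occupancy test, the `priority` knob (an illustrative stand-in, not the policy parameter the abstract mentions), and the per-line write counters are all assumptions.

```python
def should_bypass(pending_writes, queue_capacity, priority):
    """Decide whether an incoming writeback or response skips the NVM cache.

    pending_writes: writes currently waiting for the shared cache.
    queue_capacity: assumed size of that write queue.
    priority: hypothetical knob in [0, 1]; values near 1 bypass earlier
              (favoring performance), values near 0 bypass later.
    """
    threshold = (1.0 - priority) * queue_capacity
    return pending_writes >= threshold


def least_worn_line(write_counts):
    """Wear leveling: pick the line in a set with the fewest recorded writes."""
    return min(range(len(write_counts)), key=write_counts.__getitem__)
```

With `priority=0.5` and an 8-entry queue, writes start bypassing once four are pending; a write that is not bypassed would then go to the line returned by `least_worn_line`.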
Bo Yang, Qi Xu, Hao Geng, Song Chen, Bei Yu, Yi Kang
In this paper, we focus on chip floorplanning, which aims to determine the location and orientation of circuit macros simultaneously so that the chip area and wirelength are minimized. As the highest level of abstraction in hierarchical physical design, floorplanning bridges the gap between system-level design and physical synthesis, and its quality directly influences downstream placement and routing. To tackle chip floorplanning, we propose an end-to-end reinforcement learning (RL) methodology with a hindsight experience replay technique. An edge-aware graph attention network (EAGAT) is developed to effectively encode the macro and connection features of the netlist graph. Moreover, we build a hierarchical decoder architecture, consisting mainly of a transformer and an attention pointer mechanism, to output floorplan actions. Since the RL agent automatically extracts knowledge about the solution space, the previously learned policy can be quickly transferred to optimize new, unseen netlists. Experimental results demonstrate that, compared with state-of-the-art floorplanners, the proposed end-to-end methodology significantly optimizes area and wirelength on the public GSRC and MCNC benchmarks.
{"title":"Floorplanning with Edge-Aware Graph Attention Network and Hindsight Experience Replay","authors":"Bo Yang, Qi Xu, Hao Geng, Song Chen, Bei Yu, Yi Kang","doi":"10.1145/3653453","DOIUrl":"https://doi.org/10.1145/3653453","journal":{"name":"ACM Transactions on Design Automation of Electronic Systems"},"publicationDate":"2024-03-22","publicationTypes":"Journal Article"}
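The two quantities the floorplanning abstract above seeks to minimize, chip area and wirelength, are commonly estimated as the bounding-box area of the placed macros and the half-perimeter wirelength (HPWL) of each net. The sketch below assumes that formulation; the data layout, the use of macro centers as pin locations, and the function names are illustrative and not taken from the paper.

```python
def bounding_area(macros):
    """Area of the smallest axis-aligned rectangle enclosing all macros.

    macros: dict name -> (x, y, width, height), with (x, y) the lower-left
            corner of the placed macro.
    """
    boxes = list(macros.values())
    x_min = min(x for x, y, w, h in boxes)
    y_min = min(y for x, y, w, h in boxes)
    x_max = max(x + w for x, y, w, h in boxes)
    y_max = max(y + h for x, y, w, h in boxes)
    return (x_max - x_min) * (y_max - y_min)


def hpwl(macros, nets):
    """Half-perimeter wirelength summed over all nets.

    nets: iterable of nets, each a sequence of macro names. Pins are assumed
          to sit at macro centers, which is a simplification.
    """
    total = 0.0
    for net in nets:
        xs = [macros[m][0] + macros[m][2] / 2 for m in net]
        ys = [macros[m][1] + macros[m][3] / 2 for m in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


macros = {"a": (0, 0, 2, 2), "b": (2, 0, 2, 4), "c": (0, 2, 2, 2)}
nets = [("a", "b"), ("a", "b", "c")]
print(bounding_area(macros), hpwl(macros, nets))
```

A learned floorplanner would typically combine such terms into a single reward; how the paper weights them is not stated in the abstract.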
Chunlin Li, Kun Jiang, Yong Zhang, Lincheng Jiang, Youlong Luo, Shaohua Wan
The convergence of unmanned aerial vehicle (UAV)-aided mobile edge computing (MEC) networks and blockchain transforms the existing mobile networking paradigm. However, in the temporary hotspot scenario for intelligent connected vehicles (ICVs) in UAV-aided MEC networks, deploying blockchain-based services and applications in vehicles is generally impossible because of their high computational and storage requirements. One possible solution is to offload part or all of the computational tasks to MEC servers wherever possible. Unfortunately, because of the limited availability and high mobility of the vehicles, simple solutions that can support low-latency and highly reliable networking services for ICVs are still lacking. In this paper, we study the task offloading problem of minimizing the total system latency and finding the optimal task offloading scheme, subject to constraints on the hover position coordinates of the UAV, the fixed bonuses, flexible transaction fees, transaction rates, mining difficulty, costs, and battery energy consumption of the UAV. The problem is shown to be a challenging integer linear programming problem, so we formulate it as a constrained Markov decision process (CMDP). Because Deep Reinforcement Learning (DRL) handles sequential decision-making problems in dynamic ICV environments well, we propose a novel distributed DRL-based P-D3QN approach that combines the Prioritized Experience Replay (PER) strategy with the dueling double deep Q-network (D3QN) algorithm to find the optimal task offloading policy effectively. Finally, experimental results show that, compared with the benchmark scheme, the P-D3QN algorithm improves latency by about 26.24% and increases offloading utility by about 42.26%.
{"title":"Deep Reinforcement Learning-Based Mining Task Offloading Scheme for Intelligent Connected Vehicles in UAV-Aided MEC","authors":"Chunlin Li, Kun Jiang, Yong Zhang, Lincheng Jiang, Youlong Luo, Shaohua Wan","doi":"10.1145/3653451","DOIUrl":"https://doi.org/10.1145/3653451","journal":{"name":"ACM Transactions on Design Automation of Electronic Systems"},"publicationDate":"2024-03-20","publicationTypes":"Journal Article"}
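The three standard ingredients that the name P-D3QN in the abstract above refers to, a dueling Q-value head, the double-DQN target, and prioritized experience replay, can each be written in a few lines. The sketch uses plain Python lists instead of neural networks and shows only the textbook definitions; the function names and the default `alpha` and `eps` values are assumptions, and nothing about the paper's state, action, or reward design is implied.

```python
def dueling_q_values(state_value, advantages):
    """Dueling head: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]


def double_dqn_target(reward, gamma, next_q_online, next_q_target, done):
    """Double DQN: the online network selects the next action and the
    target network evaluates it."""
    if done:
        return reward
    best = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best]


def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Prioritized replay: sample transition i with probability
    proportional to (|TD error_i| + eps) ** alpha."""
    priorities = [(abs(e) + eps) ** alpha for e in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]
```

Transitions with larger temporal-difference errors are replayed more often, while decoupling action selection from evaluation in the target reduces the overestimation that a single network tends to produce.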
Hongduo Liu, Yijian Qian, Youqiang Liang, Bin Zhang, Zhaohan Liu, Tao He, Wenqian Zhao, Jiangbo Lu, Bei Yu
In the digital era, the prevalence of low-quality images contrasts with the widespread use of high-definition displays, primarily due to low-resolution cameras and compression technologies. Image super-resolution (SR) techniques, particularly those leveraging deep learning, aim to enhance these images for high-definition presentation. However, real-time execution of deep neural network (DNN)-based SR methods at the edge poses challenges due to their high computational and storage requirements. To address this, field-programmable gate arrays (FPGAs) have emerged as a promising platform, offering flexibility, programmability, and adaptability to evolving models. Previous FPGA-based SR solutions have focused on reducing computational and memory costs through aggressive simplification techniques, often sacrificing the quality of the reconstructed images. This paper introduces a novel SR network specifically designed for edge applications, which maintains reconstruction performance while managing computation costs effectively. Additionally, we propose an architectural design that enables the real-time and end-to-end inference of the proposed SR network on embedded FPGAs. Our key contributions include a tailored SR algorithm optimized for embedded FPGAs, a DSP-enhanced design that achieves a significant four-fold speedup, a novel scalable cache strategy for handling large feature maps, optimization of DSP cascade consumption, and a constraint optimization approach for resource allocation. Experimental results demonstrate that our FPGA-specific accelerator surpasses existing solutions, delivering superior throughput, energy efficiency, and image quality.
{"title":"A High-Performance Accelerator for Real-Time Super-Resolution on Edge FPGAs","authors":"Hongduo Liu, Yijian Qian, Youqiang Liang, Bin Zhang, Zhaohan Liu, Tao He, Wenqian Zhao, Jiangbo Lu, Bei Yu","doi":"10.1145/3652855","DOIUrl":"https://doi.org/10.1145/3652855","journal":{"name":"ACM Transactions on Design Automation of Electronic Systems"},"publicationDate":"2024-03-16","publicationTypes":"Journal Article"}
The Newcomb-Benford law, also known as Benford's law or the law of anomalous numbers, states that in many real-life numerical datasets, including physical and statistical ones, numbers tend to have a small leading digit. This irregularity of numbers observed in nature raises a question: is the arithmetic logic unit, responsible for performing calculations in computers, optimal? Are there other architectures, less regular than the commonly used Parallel Prefix Adders (PPAs), that can perform better, especially when operating on datasets that are not purely random but irregular? In this article, structures of the propagate-generate tree are compared, including regular and irregular configurations, with gray cells only, with both gray and black cells, and with higher-valency cells. Performance is evaluated in terms of energy consumption. The evaluation was performed using the extended power model of static CMOS gates. The model is based on changes of vectors, naturally taking into account spatio-temporal correlations. The energy parameters of the designed cells were calculated on the basis of electrical (SPICE) simulation. Designs and simulations were done in the Cadence environment, and calculations of power dissipation were performed in MATLAB. The results clearly show that there are PPA structures that perform much better for a specific type of numerical data. Negligent design can more than double power consumption. The novel PPA architectures described in this work might find practical applications in specialized adders dealing with numerical datasets such as, for example, sine functions commonly used in digital signal processing.
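The two ideas in this abstract, Benford's leading-digit distribution and a propagate-generate prefix tree, can be illustrated with a minimal Python sketch. It is not the article's design: the function names `benford_probability` and `prefix_add` are hypothetical, and the tree shown is a regular Kogge-Stone arrangement built only from black cells (cells that produce both group generate and group propagate), chosen here purely because it is the simplest textbook case.

```python
import math


def benford_probability(digit: int) -> float:
    """Expected share of numbers whose leading digit is `digit` (1..9)."""
    return math.log10(1 + 1 / digit)


def prefix_add(a: int, b: int, width: int = 8) -> int:
    """Add two unsigned integers modulo 2**width with a Kogge-Stone tree."""
    # Bitwise generate (both bits set) and propagate (exactly one bit set).
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]

    group_g, group_p = g[:], p[:]
    dist = 1
    while dist < width:
        next_g, next_p = group_g[:], group_p[:]
        for i in range(dist, width):
            # Black cell: merge the span ending at bit i with the span
            # ending at bit i - dist.
            next_g[i] = group_g[i] | (group_p[i] & group_g[i - dist])
            next_p[i] = group_p[i] & group_p[i - dist]
        group_g, group_p = next_g, next_p
        dist *= 2

    # group_g[i] is now the carry out of bit i (carry-in assumed to be 0),
    # so the carry into bit i is group_g[i - 1].
    carries = [0] + group_g[:-1]
    total = 0
    for i in range(width):
        total |= (p[i] ^ carries[i]) << i
    return total
```

For instance, `prefix_add(100, 55)` returns 155, and `benford_probability(1)` is about 0.301, so roughly 30% of numbers in a Benford-distributed dataset start with the digit 1. The sketch models only the logic function; it says nothing about switching activity or energy, which the article evaluates with its own power model.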
{"title":"Comparative Analysis of Dynamic Power Consumption of Parallel Prefix Adder","authors":"Ireneusz Brzozowski","doi":"10.1145/3651984","DOIUrl":"https://doi.org/10.1145/3651984","url":null,"abstract":"<p>The Newcomb-Benford law is the law, also known as Benford's law, of anomalous numbers stating that in many real-life numerical datasets, including physical and statistical ones, numbers have small initial digit. Numbers irregularity observed in nature leads to the question, is the arithmetical-logical unit, responsible for performing calculations in computers optimal? Are there other architectures, not as regular as commonly used Parallel Prefix Adders that can perform better, especially when operating on the datasets that are not purely random, but irregular,? In this article, structures of propagate-generate tree are compared including regular and irregular configurations – various structures are examined: regular, irregular, with gray cells only, with both gray and black and with higher valency cells. Performance is evaluated in terms of energy consumption. The evaluation was performed using the extended power model of static CMOS gates. The model is based on changes of vectors, naturally taking into account spatio-temporal correlations. The energy parameters of the designed cells were calculated on the basis of electrical (Spice) simulation. Designs and simulations were done in Cadence environment, calculations of the power dissipation were performed in Matlab. The results clearly show that there are PPA structures that perform much better for a specific type of numerical data. Negligent design can lead to an increase greater than two times of power consumption. 
The novel architectures of PPA described in this work might find practical applications in specialized adders dealing with numerical datasets, such as, for example, sine functions commonly used in digital signal processing.</p>","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2024-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140128363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Md Moshiur Rahman, Jim Geist, Daniel Xing, Yuntao Liu, Ankur Srivastava, Travis Meade, Yier Jin, Swarup Bhunia
Due to the trend towards a fabless model of integrated circuit (IC) manufacturing, several untrusted entities get white-box access to proprietary intellectual property (IP) blocks from diverse vendors. As a result, these untrusted entities pose security threats in the form of piracy, cloning, and reverse engineering, sometimes threatening national security. Hardware obfuscation is a prominent countermeasure against such issues. Obfuscation prevents the use of IP blocks without authorization from the IP owners. In finite state machine (FSM) transformation-based hardware obfuscation, the design’s FSM is transformed to make it difficult for an attacker to reverse engineer the design. A secret key must be applied to make the FSM functional, thus preventing the use of the IP for unintended purposes. Although several hardware obfuscation techniques have been proposed, they are rarely analyzed from the attacker’s standpoint, so numerous vulnerabilities inherent to the obfuscation methods go undetected unless a true adversary discovers them. In this paper, we present a collaborative approach between two entities, one acting as an attacker (red team) and the other as a defender (blue team). It is the first systematic approach to replicate the real attacker-defender scenario in the hardware security domain, which in turn strengthens the FSM transformation-based obfuscation technique. The blue team transforms the underlying FSM of a gate-level netlist using state space obfuscation. The red team plays the role of an adversary or evaluator and tries to unlock the design by extracting the unlocking key or recovering the obfuscation circuitry. As the key outcome of this red team and blue team effort, a robust state space obfuscation methodology has evolved that shows promising security.
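The idea that a secret key must be applied before the FSM becomes functional can be pictured with a toy Python model. This is a sketch under stated assumptions, not the paper's obfuscation scheme: the class `ObfuscatedFSM`, the key-sequence unlocking rule, the fall-back-to-start behavior on a wrong symbol, and the mod-4 counter standing in for the original design are all hypothetical choices made for illustration.

```python
class ObfuscatedFSM:
    """Toy model: a mod-4 counter that only works after a key sequence."""

    def __init__(self, key_sequence):
        self._key = tuple(key_sequence)  # hypothetical secret unlocking key
        if not self._key:
            raise ValueError("key sequence must not be empty")
        self._progress = 0    # position within the added obfuscation states
        self.unlocked = False
        self.count = 0        # state of the functional (original) FSM

    def step(self, symbol: int) -> int:
        """Consume one input symbol (0 or 1) and return the output."""
        if not self.unlocked:
            # Obfuscation mode: advance only on the expected key symbol,
            # otherwise fall back to the first obfuscation state.
            if symbol == self._key[self._progress]:
                self._progress += 1
                if self._progress == len(self._key):
                    self.unlocked = True
            else:
                self._progress = 0
            return 0  # the output is not meaningful while locked
        # Functional mode: count the 1 symbols modulo 4.
        self.count = (self.count + symbol) % 4
        return self.count
```

With the key `[1, 0, 1, 1]`, feeding exactly those four symbols unlocks the machine, after which each input 1 increments the counter. An instance that never sees the full key keeps returning 0. In this picture a red team would try to recover the key sequence or identify and remove the added states, while a blue team would make both harder.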
{"title":"Security Evaluation of State Space Obfuscation of Hardware IP through a Red Team – Blue Team Practice","authors":"Md Moshiur Rahman, Jim Geist, Daniel Xing, Yuntao Liu, Ankur Srivastava, Travis Meade, Yier Jin, Swarup Bhunia","doi":"10.1145/3640461","DOIUrl":"https://doi.org/10.1145/3640461","url":null,"abstract":"<p>Due to the inclination towards a fab-less model of integrated circuit (IC) manufacturing, several untrusted entities get white-box access to the proprietary intellectual property (IP) blocks from diverse vendors. To this end, the untrusted entities pose security-breach threats in the form of piracy, cloning, and reverse engineering, sometimes threatening national security. Hardware obfuscation is a prominent countermeasure against such issues. Obfuscation allows for preventing the usage of the IP blocks without authorization from the IP owners. Due to finite state machine (FSM) transformation-based hardware obfuscation, the design’s FSM gets transformed to make it difficult for an attacker to reverse engineer the design. A secret key needs to be applied to make the FSM functional thus preventing the usage of the IP for unintended purposes. Although several hardware obfuscation techniques have been proposed, due to the inability to analyze the techniques from the attackers’ standpoint, numerous vulnerabilities inherent to the obfuscation methods go undetected unless a true adversary discovers them. In this paper, we present a collaborative approach between two entities - one acting as an attacker or <i>red team</i> and another as a defender or <i>blue team</i>, the first systematic approach to replicate the real attacker-defender scenario in the hardware security domain, which in return strengthens the FSM transformation-based obfuscation technique. The <i>blue team</i> transforms the underlying FSM of a gate-level netlist using state space obfuscation. 
The <i>red team</i> plays the role of an adversary or evaluator and tries to unlock the design by extracting the unlocking key or recovering the obfuscation circuitries. As the key outcome of this red team - blue team effort, a robust state space obfuscation methodology is evolved showing security promises.</p>","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2024-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140037170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}