This article introduces OpenLS-DGF, an adaptive logic synthesis dataset generation framework, to enhance machine-learning (ML) applications within the logic synthesis process. Previous dataset generation flows were tailored to specific tasks or lacked integrated ML capabilities. In contrast, OpenLS-DGF supports various ML tasks by encapsulating the three fundamental steps of logic synthesis: 1) Boolean representation; 2) logic optimization; and 3) technology mapping. It preserves the original circuit information in both Verilog and ML-friendly GraphML formats. The Verilog files offer semi-customizable capabilities, enabling researchers to insert additional steps and incrementally refine the generated dataset. Furthermore, OpenLS-DGF includes an adaptive circuit engine that facilitates final dataset management and downstream tasks. The generated OpenLS-D-v1 dataset comprises 46 combinational designs from established benchmarks, totaling over 966,000 Boolean circuits. OpenLS-D-v1 supports the integration of new data features, making it versatile for new tasks. This article demonstrates the versatility of OpenLS-D-v1 through four distinct downstream tasks: circuit classification, circuit ranking, quality-of-results (QoR) prediction, and probability prediction. Each task represents an essential step of logic synthesis, and the experimental results show that the dataset generated by OpenLS-DGF achieves notable diversity and applicability. The source code and datasets are available at https://github.com/Logic-Factory/ACE/blob/master/OpenLS-DGF.
{"title":"OpenLS-DGF: An Adaptive Open-Source Dataset Generation Framework for Machine-Learning Tasks in Logic Synthesis","authors":"Liwei Ni;Rui Wang;Miao Liu;Xingyu Meng;Xiaoze Lin;Junfeng Liu;Guojie Luo;Zhufei Chu;Weikang Qian;Xiaoyan Yang;Biwei Xie;Xingquan Li;Huawei Li","doi":"10.1109/TCAD.2025.3555506","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3555506","url":null,"abstract":"This article introduces OpenLS-DGF, an adaptive logic synthesis dataset generation framework, to enhance machine-learning (ML) applications within the logic synthesis process. Previous dataset generation flows were tailored for specific tasks or lacked integrated ML capabilities. While OpenLS-DGF supports various ML tasks by encapsulating the three fundamental steps of logic synthesis: 1) Boolean representation; 2) logic optimization; and 3) technology mapping. It preserves the original information in both Verilog and ML-friendly GraphML formats. The Verilog files offer semi-customizable capabilities, enabling researchers to insert additional steps and incrementally refine the generated dataset. Furthermore, OpenLS-DGF includes an adaptive circuit engine that facilitates the final dataset management and downstream tasks. The generated OpenLS-D-v1 dataset comprises 46 combinational designs from established benchmarks, totaling over 966 000 Boolean circuits. OpenLS-D-v1 supports integrating new data features, making it more versatile for new tasks. This article demonstrates the versatility of OpenLS-D-v1 through four distinct downstream tasks: circuit classification, circuit ranking, quality of results (QoR) prediction, and probability prediction. Each task is chosen to represent essential steps of logic synthesis, and the experimental results show the generated dataset from OpenLS-DGF achieves prominent diversity and applicability. The source code and datasets are available at <uri>https://github.com/Logic-Factory/ACE/blob/master/OpenLS-DGF</uri>.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 10","pages":"3830-3843"},"PeriodicalIF":2.9,"publicationDate":"2025-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145090249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-03-27 | DOI: 10.1109/TCAD.2025.3555513
Yang Wang;Hanlong Chen;Wang Lin;Zuohua Ding
Barrier certificate generation is an ingenious and powerful approach for the safety verification of cyber-physical systems. This article proposes a new learning-and-verification framework that balances representation ability against verification efficiency for neural barrier certificates. In the learning phase, it learns candidate barrier certificates represented as convex difference neural networks (CDiNNs). Because CDiNNs can be rewritten as difference-of-convex (DC) functions, which can express any twice-differentiable function, they offer outstanding representation ability and flexibility. In the verification phase, the framework formally verifies the validity of the neural candidates via an efficient DC-programming approach. Owing to their convexity-based structure, CDiNNs significantly facilitate the verification process. We conduct an experimental evaluation over a set of benchmarks, which validates that our method is considerably more efficient and effective than state-of-the-art approaches.
{"title":"Formal Synthesis of Neural Barrier Certificates for Dynamical Systems via DC Programming","authors":"Yang Wang;Hanlong Chen;Wang Lin;Zuohua Ding","doi":"10.1109/TCAD.2025.3555513","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3555513","url":null,"abstract":"Barrier certificate generation is an ingenious and powerful approach for safety verification of cyber-physical systems. This article suggests a new learning and verification framework that helps to achieve the balance between the representation ability and the verification efficiency for neural barrier certificates. In the learning phase, it learns candidate barrier certificates represented as convex difference neural networks (CDiNNs). Since CDiNNs can be rewritten as difference of convex (DC) functions that can express any twice differentiable function, thus have outstanding representation ability and flexibility. In the verification phase, it employs an efficient approach for formally verifying the validity of the neural candidates via DC programming. Due to the convexity-based structure, CDiNNs can significantly facilitate the verification process. We conduct an experimental evaluation over a set of benchmarks, which validates that our method is much more efficient and effective than the state-of-the-art approaches.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 10","pages":"4038-4042"},"PeriodicalIF":2.9,"publicationDate":"2025-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145100356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-03-26 | DOI: 10.1109/TCAD.2025.3573685
Liang Xiao;Shiju Lin;Jinwei Liu;Qinkai Duan;Tsung-Yi Ho;Evangeline F. Y. Young
Global routing plays a crucial role in electronic design automation (EDA), serving not only as a routing optimizer but also as a tool for estimating routability in earlier stages such as logic synthesis and physical planning. However, these scenarios often require global routing on unpartitioned large designs, posing unique scalability challenges in both runtime and design size. To tackle this issue, this article introduces techniques for parallelizing large-scale global routing that significantly increase parallelism and thus reduce runtime. We also propose a new flexible layer transition technique to increase the flexibility and routing quality of directed acyclic graph (DAG) routing. Building upon these techniques, we have developed an open-source GPU-based global router that achieves state-of-the-art results on the latest ISPD'24 Contest benchmarks, showcasing the effectiveness of our methods.
{"title":"InstantGR: Scalable GPU Parallelization for 3-D Global Routing","authors":"Liang Xiao;Shiju Lin;Jinwei Liu;Qinkai Duan;Tsung-Yi Ho;Evangeline F. Y. Young","doi":"10.1109/TCAD.2025.3573685","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3573685","url":null,"abstract":"Global routing plays a crucial role in electronic design automation (EDA), serving not only as a means of optimizing routing but also as a tool for estimating routability in earlier stages, such as logic synthesis and physical planning. However, these scenarios often require global routing on unpartitioned large designs, posing unique challenges in scalability, both in terms of runtime and design size. To tackle this issue, this article introduces useful techniques for parallelizing large-scale global routing that can significantly increase parallelism and thus reduce runtime. We also propose a new flexible layer transition technique to increase the flexibility and routing quality of directed acyclic graph (DAG) routing. Building upon these techniques, we have developed an open-source GPU-based global router that achieves state-of-the-art results in the latest ISPD’24 Contest benchmarks, thereby showcasing the effectiveness of our methods.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"45 1","pages":"441-452"},"PeriodicalIF":2.9,"publicationDate":"2025-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11015529","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145904326","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fully homomorphic encryption (FHE) enables high-level security but carries a heavy computation workload, necessitating software-hardware co-design for aggressive acceleration. Recent works on specialized accelerators for HE evaluation have made significant progress in supporting lightweight RNS-CKKS applications, especially those with high-density in-memory computing techniques. To fulfill the higher computational demands of more general applications, this article proposes the multicluster HE accelerating system (MCHEAS), comprising multiple in-situ HE processing accelerators, each functioning as a cluster, that perform large-parameter RNS-CKKS evaluation collaboratively. MCHEAS features optimization strategies including synchronous swap, preemptive swap, square-diagonal, and odd-even index separation. Using these strategies to compile the computation and transmission of number theoretic transform (NTT) coefficients, the method optimizes intercluster data swaps, a major bottleneck in NTT computations. Evaluations show that at 1 GHz, across different intercluster data transfer bandwidths, our approach accelerates NTT computations by 26.40% to 51.75%. MCHEAS also improves computing-unit utilization by 10.30% to 33.97%, with a peak utilization rate of up to 99.62%. MCHEAS achieves 17.63% to 34.67% speedups for HE operations involving NTT, and 15.12% to 30.62% speedups for the demonstrated applications, while enhancing computing-unit utilization by 5.18% to 21.87% during application execution. Furthermore, we compare MCHEAS with state-of-the-art designs under a specific intercluster data transfer bandwidth, achieving up to $81.45\times$ their area efficiency in applications.
{"title":"MCHEAS: Optimizing Large-Parameter NTT Over Multicluster In-Situ FHE Accelerating System","authors":"Zhenyu Guan;Yongqing Zhu;Luchang Lei;Hongyang Jia;Yi Chen;Bo Zhang;Changrui Ren;Jin Dong;Song Bian","doi":"10.1109/TCAD.2025.3555191","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3555191","url":null,"abstract":"Fully Homomorphic encryption (FHE) enables high-level security but with a heavy computation workload, necessitating software-hardware co-design for aggressive acceleration. Recent works on specialized accelerators for HE evaluation have made significant progress in supporting lightweight RNS-CKKS applications, especially those with high-density in-memory computing techniques. To fulfill higher computational demands for more general applications, this article proposes multicluster HE accelerating system (MCHEAS), an accelerating system comprising multiple in-situ HE processing accelerators, each functioning as a cluster to perform large-parameter RNS-CKKS evaluation collaboratively. MCHEAS features optimization strategies including the synchronous, preemptive swap, square-diagonal, and odd-even index separation. Using these strategies to compile the computation and transmission of number theoretic transform (NTT) coefficients, the method optimizes the intercluster data swaps, a major bottleneck in NTT computations. Evaluations show that under 1 GHz, with different intercluster data transfer bandwidths, our approach accelerates NTT computations by 26.40% to 51.75%. MCHEAS also improves computing unit utilization by 10.30% to 33.97%, with a maximum peak utilization rate of up to 99.62%. MCHEAS achieves 17.63% to 34.67% speedups for HE operations involving NTT, and 15.12% to 30.62% speedups for demonstrated applications, while enhancing the computing units’ utilization by 5.18% to 21.87% during application execution. Furthermore, we compare MCHEAS with SOTA designs under a specific intercluster data transfer bandwidth, achieving up to <inline-formula> <tex-math>$81.45times $ </tex-math></inline-formula> their area efficiencies in applications.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 10","pages":"3683-3696"},"PeriodicalIF":2.9,"publicationDate":"2025-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145090248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-03-26 | DOI: 10.1109/TCAD.2025.3555192
Haishuang Fan;Rui Meng;Qichu Sun;Jingya Wu;Wenyan Lu;Xiaowei Li;Guihai Yan
Graphs play an important role in various applications. With the rapid growth of vertex counts in real-world graphs, existing large-scale graph processing frameworks on CPUs and GPUs struggle to optimize cache usage due to irregular memory access patterns. To address this, graph reordering has been proposed to improve graph locality, but it introduces significant overhead without delivering substantial end-to-end performance improvement. While there are many FPGA-based graph processing accelerators, achieving high throughput often requires complex graph preprocessing on CPUs. Therefore, implementing an efficient end-to-end graph processing system remains challenging. This article introduces GRACE, an end-to-end FPGA-based graph processing accelerator with a graph reordering engine and a pull-based vertex-centric programming model (PL-VCPM) engine. First, GRACE employs a customized high-degree vertex cache (HDC) to improve memory access efficiency. Second, GRACE offloads graph preprocessing to the FPGA, using a customized, efficient graph reordering engine. Third, GRACE adopts a graph pruning strategy to remove activation and computation redundancy in graph processing. Finally, GRACE introduces a graph conflict board (GCB) to resolve data conflicts and a multiport cache to enhance parallel efficiency. Experimental results demonstrate that GRACE achieves a $7.1\times$ end-to-end speedup over CPU and $1.8\times$ over GPU, as well as $27.3\times$ and $8.7\times$ better energy efficiency than CPU and GPU, respectively. Moreover, GRACE delivers up to a $34.9\times$ speedup compared to the state-of-the-art FPGA accelerator.
{"title":"GRACE: An End-to-End Graph Processing Accelerator on FPGA With Graph Reordering Engine","authors":"Haishuang Fan;Rui Meng;Qichu Sun;Jingya Wu;Wenyan Lu;Xiaowei Li;Guihai Yan","doi":"10.1109/TCAD.2025.3555192","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3555192","url":null,"abstract":"Graphs play an important role in various applications. With the rapid expansion of vertices in real life, existing large-scale graph processing frameworks on CPUs and GPUs encounter challenges in optimizing cache usage due to irregular memory access patterns. To address this, graph reordering has been proposed to improve the locality of the graph, but introduces significant overhead without delivering substantial end-to-end performance improvement. While there have been many FPGA-based accelerators for graph processing, achieving high throughput often requires complex graph prepossessing on CPUs. Therefore, implementing an efficient end-to-end graph processing system remains challenging. This article introduces GRACE, an end-to-end FPGA-based graph processing accelerator with a graph reordering engine and a pull-based vertex-centric programming model (PL-VCPM) Engine. First, GRACE employs a customized high-degree vertex cache (HDC) to improve memory access efficiency. Second, GRACE offloads the graph preprocessing to FPGA. We customize an efficient graph reordering engine to complete preprocessing. Third, GRACE adopts a graph pruning strategy to remove the activation and computation redundancy in graph processing. Finally, GRACE introduces a graph conflict board (GCB) to resolve data conflicts and a multiport cache to enhance parallel efficiency. Experimental results demonstrate that GRACE achieves <inline-formula> <tex-math>$7.1 times $ </tex-math></inline-formula> end-to-end performance speedup over CPU and <inline-formula> <tex-math>$1.8 times $ </tex-math></inline-formula> over GPU, as well as <inline-formula> <tex-math>$27.3 times $ </tex-math></inline-formula> and <inline-formula> <tex-math>$8.7 times $ </tex-math></inline-formula> energy efficiency over CPU and GPU. Moreover, GRACE delivers up to <inline-formula> <tex-math>$34.9 times $ </tex-math></inline-formula> performance speedup compared to the state-of-the-art FPGA accelerator.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 10","pages":"3816-3829"},"PeriodicalIF":2.9,"publicationDate":"2025-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145090261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Different DNNs can exhibit opposite arithmetic intensities (ArIs): some operators are compute-bound while others are memory-bound. This challenges the generality of single acceleration architectures, including both dedicated on-chip processing (OCP) and near-data processing (NDP): neither can simultaneously achieve optimal energy efficiency and performance for operators with opposite ArI. A natural idea is to combine the respective advantages of OCP and NDP. However, few publications have addressed their real-time co-optimization, primarily due to the lack of a quantifiable offloading method. Here, we propose GPOS, a general and precise offloading strategy that supports highly general DNN acceleration. GPOS comprehensively considers the complex interactions between OCP and NDP, including hardware configurations, dataflow (DF), the DNN model, and interdie data movements (DMs). Three quantifiable indicators (ArI, execution cost (Ex-cost), and DM-cost) are employed to precisely evaluate the impacts of these interactions on energy and latency. GPOS adopts a four-step flow with progressive refinement: each of the first three steps focuses on a single indicator at the operator level, while the final step performs context-based calibration to address operator interdependencies and avoid offsetting NDP benefits. Narrowing down the offloading candidates in steps 1 and 3 significantly accelerates real-time quantitative analysis. Optimized mapping techniques and an NDP-input-stationary DF are proposed to reduce Ex-cost and extend the operator types supported by NDP. In addition, for the first time, sparsity, one of the most popular energy-optimization methods and one that can alter data reuse and ArI, is quantitatively investigated for its impact on offloading using GPOS. Our evaluations cover representative DNNs, including GPT-2, BERT, RNN, CNN, and MLP models. GPOS achieves the minimum energy and latency for each benchmark, with geometric-mean speedups of 49.0% and 94.1% and geometric-mean energy savings of 45.8% and 89.2% over All-OCP and All-NDP, respectively. GPOS also reduces offloading analysis latency by a geometric mean of 92.7% compared to an evaluation that traverses each operator and its relative combinations. On average, sparsity further improves performance and energy efficiency by increasing the number of operators offloaded to NDP. However, for DNNs where all operators exhibit either very high or very low ArI, the number of offloaded operators remains unchanged even after sparsity is applied.
{"title":"GPOS: A General and Precise Offloading Strategy for High Generality of DNN Acceleration by OCP and NDP Co-Optimizing","authors":"Zixu Li;Wang Wang;Manni Li;Jiayu Yang;Zijian Huang;Xin Zhong;Yinyin Lin;Chengchen Wang;Xiankui Xiong","doi":"10.1109/TCAD.2025.3555184","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3555184","url":null,"abstract":"The arithmetic intensity (ArI) of different DNNs can be opposite. This challenges the generality of single acceleration architectures, including both dedicated on-chip processing (OCP) and near-data processing (NDP). Neither architecture can simultaneously achieve optimal energy efficiency and performance for operators with opposite ArI. It is relatively straightforward to think of combining the respective advantages of OCP and NDP. However, few publications have addressed their real-time co-optimization, primarily due to the lack of a quantifiable offloading method. Here, we propose GPOS, a general and precise offloading strategy that supports high generality of DNN acceleration. GPOS comprehensively considers the complex interactions between OCP and NDP, including hardware configurations, dataflow (DF), DNN model, and interdie data movements (DMs). Three quantifiable indicators—ArI, execution cost (Ex-cost), and DM-cost—are employed to precisely evaluate the impacts of these interactions on energy and latency. GPOS adopts a four-step flow with progressive refinement: each of the first three steps focuses on a single indicator at the operator level, while the final step performs context-based calibration to address operator interdependencies and avoid offsetting NDP benefits. Narrowing down offloading candidates in step 1 and step 3 significantly accelerates real-time quantitative analysis. Optimized mapping techniques and NDP-input stationary DF are proposed to reduce Ex-cost and extend operator types supported by NDP. Next, for the first time, sparsity—one of the most popular methods for energy optimization that can alter data reuse or ArI—is quantitatively investigated for its impacts on offloading using GPOS. Our evaluations include representative DNNs, including GPT-2, Bert, RNN, CNN, and MLP. GPOS achieves the minimum energy and latency for each benchmark, with geometric mean speedups of 49.0% and 94.1%, and geometric mean energy savings of 45.8% and 89.2% over All-OCP and All-NDP, respectively. GPOS also reduces offloading analysis latency by a geometric mean of 92.7% compared to the evaluation that traverses each operator and its relative combinations. On average, sparsity further improves performance and energy efficiency by increasing the number of operators offloaded to NDP. However, for DNNs where all operators exhibit either very high or very low ArI, the number of offloaded operators remains unchanged, even after sparsity is applied.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 10","pages":"3776-3789"},"PeriodicalIF":2.9,"publicationDate":"2025-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145090055","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-03-25 | DOI: 10.1109/TCAD.2025.3554612
Alexandra Küster;Rainer Dorsch;Christian Haubelt
The development of modern heterogeneous systems requires early integration of the various domains to improve and verify the design. Heterogeneous virtual prototypes are a key enabler for reaching this goal, and their high simulation speed is of utmost importance for efficiently supporting development. This article introduces measures to speed up SystemC analog/mixed-signal (AMS) simulations, which are commonly used to simulate the AMS part jointly with the digital prototype in SystemC. Two approaches for integrating variable-step ordinary differential equation (ODE) solvers into the simulation semantics of SystemC AMS are presented. Both avoid global backtracking; one is well suited for feedback loops, and the other is favorable for systems that react dynamically to events. Moreover, a timestep quantization is developed that overcomes the recurrent matrix-inversion bottleneck of variable-step implicit solvers. A similar method is then used to increase the simulation speed of electrical linear network models with high switching activity. Various experiments from the context of smart sensors demonstrate the effectiveness of these measures in enhancing simulation speed.
{"title":"Toward Fast Heterogeneous Virtual Prototypes: Increasing the Solver Efficiency in SystemC AMS","authors":"Alexandra K端ster;Rainer Dorsch;Christian Haubelt","doi":"10.1109/TCAD.2025.3554612","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3554612","url":null,"abstract":"The development of modern heterogeneous systems requires early integration of the various domains to improve and verify the design. Heterogeneous virtual prototypes are a key enabler to reach this goal. In order to efficiently support the development, their high simulation speed is of utmost importance. This article introduces measures to speed-up SystemC analog/mixed-signal (AMS) simulations which are commonly used to simulate the AMS part jointly with the digital prototype in SystemC. Two approaches to integrate variable-step ordinary differential equation solvers into the simulation semantics of SystemC AMS are presented. Both of them avoid global backtracking. One is well suited for feedback loops and the other is favorable for systems dynamically reacting onto events. Moreover, a timestep quantization is developed that overcomes the recurrent matrix inversion bottleneck of variable-step implicit solvers. A similar method is then used to increase the simulation speed of electrical linear network models with high switching activity. Various experiments from the context of smart sensors are presented which prove the effectiveness for enhancing the simulation speed.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 10","pages":"3868-3881"},"PeriodicalIF":2.9,"publicationDate":"2025-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145090057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-03-24 | DOI: 10.1109/TCAD.2025.3554144
Jianhua Gao;Zhi Zhou;Xingze Huang;Juan Wang;Yizhuo Wang;Weixing Ji
The CPU-FPGA heterogeneous computing architecture is extensively employed in the embedded domain due to its low cost and power efficiency, with numerous sparse matrix-vector multiplication (SpMV) acceleration efforts already targeting this architecture. However, existing work rarely includes collaborative SpMV computations between CPU and FPGA, which limits the exploration of hybrid architectures that could potentially offer enhanced performance and flexibility. This article introduces an FPGA architecture design that supports multiprecision SpMV computations, including FP16, FP32, and FP64. Building on this, PTPS, a precision-aware SpMV task partitioning and dynamic scheduling algorithm tailored for the CPU-FPGA heterogeneous architecture, is proposed. The core idea of PTPS is lossless partitioning of sparse matrices across multiple precisions, prioritizing low-precision SpMV computations on the FPGA and high-precision computations on the CPU. PTPS not only leverages the strengths of CPU and FPGA for collaborative SpMV computations but also reduces data transmission overhead between them, thereby improving the overall computational efficiency. Experimental evaluation demonstrates that the proposed approach offers an average speedup of $1.57\times$ over the CPU-only approach and $2.58\times$ over the FPGA-only approach.
{"title":"PTPS: Precision-Aware Task Partitioning and Scheduling for SpMV on CPU-FPGA Heterogeneous Platforms","authors":"Jianhua Gao;Zhi Zhou;Xingze Huang;Juan Wang;Yizhuo Wang;Weixing Ji","doi":"10.1109/TCAD.2025.3554144","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3554144","url":null,"abstract":"The CPU-FPGA heterogeneous computing architecture is extensively employed in the embedded domain due to its low cost and power efficiency, with numerous sparse matrix-vector multiplication (SpMV) acceleration efforts already targeting this architecture. However, existing work rarely includes collaborative SpMV computations between CPU and FPGA, which limits the exploration of hybrid architectures that could potentially offer enhanced performance and flexibility. This article introduces an FPGA architecture design that supports multiprecision SpMV computations, including FP16, FP32, and FP64. Building on this, PTPS, a precision-aware SpMV task partitioning and dynamic scheduling algorithm tailored for the CPU-FPGA heterogeneous architecture, is proposed. The core idea of PTPS is lossless partitioning of sparse matrices across multiple precisions, prioritizing low-precision SpMV computations on the FPGA and high-precision computations on the CPU. PTPS not only leverages the strengths of CPU and FPGA for collaborative SpMV computations but also reduces data transmission overhead between them, thereby improving the overall computational efficiency. Experimental evaluation demonstrates that the proposed approach offers an average speedup of <inline-formula> <tex-math>$1.57times $ </tex-math></inline-formula> over the CPU-only approach and <inline-formula> <tex-math>$2.58times $ </tex-math></inline-formula> over the FPGA-only approach.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 10","pages":"3804-3815"},"PeriodicalIF":2.9,"publicationDate":"2025-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145090052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-03-23 | DOI: 10.1109/TCAD.2025.3573223
Irith Pomeranz
Ensuring correct functional operation of a chip requires extensive testing. Free of the constraints of maintaining functional operation conditions, structural (scan-based) tests allow high fault coverage to be achieved efficiently. To cover defects that are exhibited only under functional operation conditions, functional test sequences are used to complement scan-based tests. One limitation of functional test sequences is their length, making test compaction important. To avoid losing the functional properties of a sequence when test compaction is applied at the gate level, design-for-testability (DFT) logic can be used to keep the circuit in its functional state space. In this context, this article suggests the new concept of a modular functional test sequence consisting of subsequences that can be plugged in or out to increase the fault coverage or reduce the sequence length. To support modularity at the gate level, DFT logic is used for restoring functional states between subsequences. Modularity offers the key advantage that a single compact functional test sequence can be constructed from a given pool of functional test sequences, and the modular sequence can be updated as additional sequences become available in the pool or additional fault models are targeted. The article develops a procedure for generating and compacting modular sequences using subsequences from a given pool, and presents experimental results for benchmark circuits in an academic simulation environment to demonstrate its effectiveness and limitations.
{"title":"Modular Functional Test Sequences for Test Compaction","authors":"Irith Pomeranz","doi":"10.1109/TCAD.2025.3573223","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3573223","url":null,"abstract":"Ensuring correct functional operation of a chip requires extensive testing. Without the constraints of maintaining functional operation conditions, structural (scan-based) tests allow high-fault coverage to be achieved efficiently. To cover defects that are only exhibited under functional operation conditions, functional test sequences are used for complementing scan-based tests. One of the limitations of functional test sequences is their length, making it important to apply test compaction. To avoid losing the functional properties of a sequence when test compaction is applied at the gate level, design-for-testability (DFT) logic can be used for keeping the circuit in its functional state space. In this context, this article suggests the new concept of a modular functional test sequence consisting of subsequences that can be plugged in or out to increase the fault coverage or reduce the sequence length. To support modularity at the gate level, DFT logic is used for restoring functional states between subsequences. Modularity offers the key advantage that a single compact functional test sequence can be constructed from a given pool of functional test sequences, and the modular sequence can be updated as additional sequences become available in the pool, or additional fault models are targeted. The article develops a procedure for the generation and compaction of modular sequences using subsequences from a given pool, and presents experimental results for benchmark circuits in an academic simulation environment to demonstrate its effectiveness and limitations.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"45 1","pages":"407-417"},"PeriodicalIF":2.9,"publicationDate":"2025-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145904319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-03-22 | DOI: 10.1109/TCAD.2025.3572838
Yuan Zhang;Kuncai Zhong;Jiliang Zhang
As a vital security primitive, the true random number generator (TRNG) is a mandatory component for building the root of trust of any encryption system. However, existing TRNGs suffer from low throughput and high area and energy consumption. Additionally, electronic design automation (EDA) for TRNGs targeting specific applications remains an unexplored area. To address these issues, we propose compact, high-throughput TRNGs based on dynamic hybrid entropy, reinforcement strategies, and automated exploration. First, we present a dynamic hybrid entropy unit and reinforcement strategies that provide sufficient randomness. On this basis, we propose a high-efficiency dynamic hybrid TRNG (DH-TRNG) architecture. It is portable across field-programmable gate arrays (FPGAs) of distinct processes and passes both the NIST and AIS-31 tests without any post-processing. Experiments show it occupies only 8 slices while achieving the highest throughput of 670 and 620 Mb/s on Xilinx Virtex-6 and Artix-7, respectively. Compared to state-of-the-art TRNGs, DH-TRNG attains the highest Throughput/(Slices·Power), a $2.63\times$ increase. In addition, we propose an automated exploration scheme as a preliminary EDA flow for TRNGs targeting resource-constrained scenarios. The scheme automatically explores TRNG designs that meet the given requirements while further reducing hardware overhead, indicating broad application prospects for TRNG design automation. Finally, we apply DH-TRNG and the results of the automated exploration to stochastic computing (SC) for edge detection, achieving promising outcomes.
{"title":"High Throughput and Compact FPGA TRNGs Based on Hybrid Entropy, Reinforcement Strategies, and Automated Exploration","authors":"Yuan Zhang;Kuncai Zhong;Jiliang Zhang","doi":"10.1109/TCAD.2025.3572838","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3572838","url":null,"abstract":"As a vital security primitive, the true random number generator (TRNG) is a mandatory component to build trust roots for any encryption system. However, existing TRNGs suffer from bottlenecks of low throughput and high area-energy consumption. Additionally, the electronic design automation (EDA) design of TRNG for specific applications remains an unexplored area. To address these issues, in this work, we propose compact and high-throughput TRNGs based on dynamic hybrid, reinforcement strategies, and automated exploration. First, we present a dynamic hybrid entropy unit and reinforcement strategies to provide sufficient randomness. On this basis, we propose a high-efficiency dynamic hybrid TRNG (DH-TRNG) architecture. It exhibits portability to distinct process field programmable gate arrays (FPGAs) and passes both NIST and AIS-31 tests without any post-processing. The experiments show it incurs only 8 slices with the highest throughput of 670 and 620 Mb/s on Xilinx Virtex-6 and Artix-7, respectively. Compared to the state-of-the-art TRNGs, DH-TRNG has the highest (Throughput/Slices·Power) with <inline-formula> <tex-math>$2.63times $ </tex-math></inline-formula> increase. In addition, we propose an automated exploration scheme as a preliminary EDA design for TRNG to better apply to resource-constrained scenarios. This scheme automatically explores TRNGs to meet the design requirements and further reduces the hardware overhead, indicating broad application prospects in TRNG automation design. Finally, we apply the proposed DH-TRNG and the results of automated exploration to stochastic computing (SC) for edge detection, achieving promising outcomes.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"45 1","pages":"519-532"},"PeriodicalIF":2.9,"publicationDate":"2025-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145904305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}