Proceedings of the Great Lakes Symposium on VLSI 2022: Latest Papers

Distributed Logic Encryption: Essential Security Requirements and Low-Overhead Implementation

Proceedings of the Great Lakes Symposium on VLSI 2022

Pub Date : 2022-06-06 DOI: 10.1145/3526241.3530372

Raheel Afsharmazayejani, H. Sayadi, Amin Rezaei

Due to outsourced manufacturing, the semiconductor industry must deal with various hardware threats such as piracy and overproduction. To prevent illegal electronic products from functioning, the circuit can be encrypted using a protected key known only to the designer. However, an attacker can still decipher the secret key using a functioning circuit bought from the market and the encrypted layout leaked from an untrusted foundry. In this paper, after introducing essential conformity and mutuality features for secure logic encryption, we propose DLE, a novel Distributed Logic Encryption design that resists all known oracle-guided and structural attacks, including the newly proposed fault-aided SAT-based attack that iteratively injects a single stuck-at fault to thwart the locking effect. DLE forces the attacker to insert multiple stuck-at faults simultaneously at critical points to obtain a smaller but meaningful encrypted circuit, thus exponentially reducing the chance of hitting all the critical points with properly located stuck-at fault injections. Our experiments confirm that DLE maintains an exponentially high degree of security under diverse attacks with polynomial area and linear performance overheads.
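
As a generic illustration of the logic-encryption setting described above, the following sketch locks a toy circuit with XOR-style key gates and checks which keys restore the original function. It is not the DLE construction, which the abstract does not detail; the circuit, the two-bit key, and all function names are assumptions made for illustration only.

```python
from itertools import product

# Hypothetical 3-input circuit used only for illustration: y = (a AND b) OR c.
def original(a, b, c):
    return (a & b) | c

# Assumed secret key; each key bit drives one key gate.
CORRECT_KEY = (1, 0)

def locked(a, b, c, key):
    k0, k1 = key
    # XNOR-style key gate on the AND output: transparent only when k0 == 1.
    n1 = (a & b) ^ k0 ^ 1
    # XOR key gate on input c: transparent only when k1 == 0.
    n2 = c ^ k1
    return n1 | n2

def equivalent(key):
    # Exhaustively compare the locked circuit against the original.
    return all(
        locked(a, b, c, key) == original(a, b, c)
        for a, b, c in product((0, 1), repeat=3)
    )

# Keys under which the locked circuit behaves like the original one.
matching_keys = [k for k in product((0, 1), repeat=2) if equivalent(k)]
print(matching_keys)
```

In this toy case only the assumed correct key unlocks the function. An exhaustive check like this is feasible only because the example has three inputs and two key bits.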

Citations: 5

Inter-Architecture Portability of Artificial Neural Networks and Side Channel Attacks

Proceedings of the Great Lakes Symposium on VLSI 2022

Pub Date : 2022-06-06 DOI: 10.1145/3526241.3530356

Manoj Gopale, G. Ditzler, Roman L. Lysecky, Janet Roveda

Side-channel attacks (SCAs) have been studied for several decades, resulting in many techniques that use statistical models to extract system information from side channels. More recently, machine learning has shown significant promise in advancing the ability of SCAs to expose vulnerabilities. Artificial neural networks (ANNs) can effectively learn nonlinear relationships between features within a side channel. In this paper, we propose a multi-architecture data aggregation technique to profile power traces for a system with an embedded processor, based on three types of deep NNs: multi-layer perceptrons (MLP), convolutional neural networks (CNN), and recurrent neural networks (RNN). This is one of the first works to explore the inter-architecture portability of NNs and SCAs. We demonstrate the robustness of ANNs performing power-based SCAs on multiple architecture configurations with different architectural features, such as L1/L2 cache size and associativity, and system memory size. We provide a comprehensive set of benchmarks to demonstrate that architecturally identical devices are not essential for profile-based SCAs.

Citations: 0

Evolutionary Standard Cell Synthesis of Unconventional Designs

Proceedings of the Great Lakes Symposium on VLSI 2022

Pub Date : 2022-06-06 DOI: 10.1145/3526241.3530353

C. PrashanthH., M. Rao

Conventional synthesis algorithms transform a behavioral RTL design into a standard-cell-mapped gate-level netlist, with support for customizing the optimization effort of only a few operators. HDL description standards and current synthesis methods lack support for generating netlists of custom functions for quick validation and characterization of a design. Additionally, synthesis does not cater directly to various mathematical functions, so design effort towards approximating the desired function is needed. Hence, a synthesis method for realizing circuits applicable not only to arithmetic but also to non-linear functions would be highly valuable to the VLSI design community. This work employs the Cartesian Genetic Programming (CGP) algorithm, an evolutionary design methodology suitable for synthesizing digital circuits. CGP accelerates the design process and makes it easy to realize complex functions with little to no design effort. Activation functions are difficult to realize as combinational circuits using traditional design methods; this work validates the synthesis results for 6 non-linear activation functions using both classical and standard-cell-synthesis-oriented CGP. The ability to incorporate such unconventional designs into the traditional synthesis flow will be instrumental for implementing accelerators in the hardware space, and eventually for the efficient design of heterogeneous SoC systems.
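
To make the CGP idea concrete, here is a minimal sketch of a classical, single-output CGP with a (1 + lambda) evolution strategy. It is a simplified textbook variant, not the standard-cell-oriented CGP evaluated in the paper; the gate set, node count, target truth table, and every function name are assumptions chosen for illustration.

```python
import random

# Assumed gate library; a standard-cell flow would use the target library's cells.
GATES = [
    lambda a, b: a & b,        # AND
    lambda a, b: a | b,        # OR
    lambda a, b: a ^ b,        # XOR
    lambda a, b: 1 - (a & b),  # NAND
]
N_INPUTS, N_NODES = 2, 6

def random_node(i, rng):
    # Node i may read any primary input or any earlier node (feed-forward only).
    limit = N_INPUTS + i
    return (rng.randrange(len(GATES)), rng.randrange(limit), rng.randrange(limit))

def random_genotype(rng):
    nodes = [random_node(i, rng) for i in range(N_NODES)]
    return nodes, rng.randrange(N_INPUTS + N_NODES)

def evaluate(genotype, inputs):
    nodes, out = genotype
    values = list(inputs)
    for func, i1, i2 in nodes:
        values.append(GATES[func](values[i1], values[i2]))
    return values[out]

def fitness(genotype, truth_table):
    # Number of truth-table rows the candidate circuit reproduces.
    return sum(evaluate(genotype, x) == y for x, y in truth_table)

def mutate(genotype, rng):
    # Point mutation: redraw one node, or the output connection.
    nodes, out = genotype
    nodes = list(nodes)
    i = rng.randrange(N_NODES + 1)
    if i == N_NODES:
        out = rng.randrange(N_INPUTS + N_NODES)
    else:
        nodes[i] = random_node(i, rng)
    return nodes, out

def evolve(truth_table, generations=2000, lam=4, seed=0):
    rng = random.Random(seed)
    parent = random_genotype(rng)
    for _ in range(generations):
        children = [mutate(parent, rng) for _ in range(lam)]
        best = max(children, key=lambda g: fitness(g, truth_table))
        # Accepting equal fitness allows neutral drift through the search space.
        if fitness(best, truth_table) >= fitness(parent, truth_table):
            parent = best
        if fitness(parent, truth_table) == len(truth_table):
            break
    return parent

XNOR_TABLE = [((0, 0), 1), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
best = evolve(XNOR_TABLE)
print(fitness(best, XNOR_TABLE), "of", len(XNOR_TABLE), "rows matched")
```

The fitness here counts matching truth-table rows only. A synthesis-oriented variant would presumably also weigh area or delay, but the abstract does not specify how.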

Citations: 0

Protected ECC Still Leaks: A Novel Differential-Bit Side-channel Power Attack on ECDH and Countermeasures

Proceedings of the Great Lakes Symposium on VLSI 2022

Pub Date : 2022-06-06 DOI: 10.1145/3526241.3530342

Tianhong Xu, Cheng Gongye, Yunsi Fei

Over the past decade, a few side-channel attacks (SCAs) and countermeasures against implementations of Elliptic-Curve Cryptography (ECC), commonly used in embedded systems and Internet-of-Things (IoT) devices, have been presented. This work discovers a new side-channel power leakage in an ECDH hardware implementation protected against existing attacks, where the power leakage is not directly related to the key bits but to the differential of two consecutive key bits. We propose an unsupervised differential-bit horizontal clustering attack and implement it against an ECDH FPGA implementation. We also comprehensively analyze the related operations and circuits, and identify the root cause of such leakage as the different arrival times of inputs to combinational circuits. Such leakage generally exists in ECC hardware implementations, including FPGA and ASIC. We further propose several effective countermeasures to address this new vulnerability and evaluate the implementations.
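
The sketch below illustrates why leakage of differential bits is enough to threaten the key. It assumes the "differential of two consecutive key bits" can be modeled as their XOR, which the abstract does not state explicitly, and it uses a made-up 8-bit key. Under that assumption, knowing every differential bit leaves only two candidate keys, which are bitwise complements of each other.

```python
def differential_bits(key_bits):
    # Assumed model: d[i] = k[i] XOR k[i+1] for each pair of consecutive key bits.
    return [a ^ b for a, b in zip(key_bits, key_bits[1:])]

def candidates_from_differentials(diffs):
    # Guessing the first key bit fixes every following bit via the differentials.
    candidates = []
    for first in (0, 1):
        bits = [first]
        for d in diffs:
            bits.append(bits[-1] ^ d)
        candidates.append(bits)
    return candidates

# Hypothetical key used only for illustration.
key = [1, 0, 0, 1, 1, 0, 1, 0]
diffs = differential_bits(key)
candidates = candidates_from_differentials(diffs)
print(diffs)
print(candidates)
```

An attacker who clusters the per-iteration power traces into "bit changed" and "bit unchanged" groups would therefore need only one further check to pick the right candidate.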

Citations: 0

Session details: Session 5A: Hardware Security

Proceedings of the Great Lakes Symposium on VLSI 2022

Pub Date : 2022-06-06 DOI: 10.1145/3542690

K. Gaj

Citations: 0

RAPTA: A Hierarchical Representation Learning Solution For Real-Time Prediction of Path-Based Static Timing Analysis

Proceedings of the Great Lakes Symposium on VLSI 2022

Pub Date : 2022-06-06 DOI: 10.1145/3526241.3530831

Tanmoy Chowdhury, Ashka Vakil, B. S. Latibari, Seyed Aresh Beheshti Shirazi, Ali Mirzaeian, Xiaojie Guo, Sai Manoj Pudukotai Dinakarrao, H. Homayoun, I. Savidis, Liang Zhao, Avesta Sasan

This paper presents RAPTA, a customized Representation-learning Architecture for automating feature engineering and predicting the result of Path-based Timing-Analysis early in the physical design cycle. RAPTA offers multiple advantages over prior work: 1) it has superior accuracy, with the standard deviation of the error ranging from 3.9 ps to 16.05 ps in 32 nm technology; 2) its architecture does not change with feature-set size; and 3) it does not require manual input feature engineering. To the best of our knowledge, this is the first work in which Bidirectional Long Short-Term Memory (Bi-LSTM) representation learning is used to digest raw information for feature engineering, and in which the generation of latent features and the Multilayer Perceptron (MLP) based regression for timing prediction can be trained end-to-end.

Citations: 1

Fast Parallel High-Level Synthesis Design Space Explorer: Targeting FPGAs to accelerate ASIC Exploration

Proceedings of the Great Lakes Symposium on VLSI 2022

Pub Date : 2022-06-06 DOI: 10.1145/3526241.3530339

M. I. Rashid, B. C. Schafer

Raising the level of VLSI design abstraction to the behavioral level makes it possible to generate different micro-architectures from the same behavioral description simply by setting different synthesis options. These are typically synthesis directives in the form of pragmas that control how to synthesize arrays, loops, and functions. Out of all the combinations, the designer is typically only interested in the synthesis directive combinations that lead to Pareto-optimal designs. Unfortunately, this multi-objective optimization problem grows supra-linearly with the number of explorable operations. Thus, fast heuristics are needed. One additional way to accelerate the exploration process is to parallelize the explorer by creating multi-threaded versions. The main problem with this approach is that every time a new pragma combination is generated, the explorer needs to invoke the HLS process in order to evaluate the effect of these synthesis options on the resultant design. This tool invocation requires checking out an HLS tool license that is not released until the HLS process has finished. This implies that the maximum number of parallel threads is limited by the number of licenses available. In the ASIC case, these licenses are extremely expensive, often making it prohibitive for some companies to have more than one. By contrast, FPGA vendors provide their HLS tools for free. Thus, it is tempting to investigate whether FPGA HLS tools can be used to find the ASIC Pareto-optimal designs. To address this, in this work we present a dedicated multi-threaded parallel HLS DSE explorer that is able to accelerate HLS DSE for ASICs by first targeting FPGAs and then using machine learning to convert the exploration results obtained into the optimal ASIC equivalent. Experimental results show that our proposed approach is very efficient, speeding up the exploration process considerably.
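
The notion of Pareto-optimal designs used above can be sketched as a simple dominance filter over (area, latency) results. The design names and numbers below are hypothetical stand-ins for the output of HLS runs with different pragma combinations; the paper's explorer and its machine-learning conversion step are not reproduced here.

```python
def pareto_front(designs):
    """Return the designs not dominated in (area, latency); both are minimized."""
    front = []
    for name, area, latency in designs:
        # A design is dominated if another is no worse in both objectives
        # and strictly better in at least one.
        dominated = any(
            a <= area and l <= latency and (a < area or l < latency)
            for _, a, l in designs
        )
        if not dominated:
            front.append((name, area, latency))
    return sorted(front, key=lambda d: d[1])

# Hypothetical exploration results: (pragma combination, area, latency).
designs = [
    ("unroll=1", 100, 40),
    ("unroll=2", 150, 22),
    ("unroll=4", 260, 12),
    ("unroll=2,no_pipeline", 170, 30),
    ("unroll=4,array=reg", 300, 12),
]
front = pareto_front(designs)
for name, area, latency in front:
    print(name, area, latency)
```

Each entry in `designs` would cost one HLS invocation, and therefore one license checkout, which is why the number of parallel evaluations is bounded by the number of available licenses.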

Citations: 2

Session details: Session 2B: Computer-Aided Design (CAD)

Proceedings of the Great Lakes Symposium on VLSI 2022

Pub Date : 2022-06-06 DOI: 10.1145/3542685

E. Salman

Citations: 0

In-Memory Computing based Machine Learning Accelerators: Opportunities and Challenges

Proceedings of the Great Lakes Symposium on VLSI 2022

Pub Date : 2022-06-06 DOI: 10.1145/3526241.3530051

K. Roy

Traditional computing systems based on von Neumann architectures are fundamentally bottlenecked by the transfer speeds between memory and processor. With the growing computational needs of today's application space, dominated by Machine Learning (ML) workloads, there is a need to design special-purpose computing systems operating on the principle of co-located memory and processing units. Such an approach, commonly known as 'in-memory computing', can potentially eliminate expensive data movement costs by computing inside the memory array itself. To that effect, crossbars based on resistive switching Non-Volatile Memory (NVM) devices have shown immense promise as the building blocks of in-memory computing systems, as their high storage density can overcome the scaling challenges that plague CMOS technology today. In addition, the ability of resistive crossbars to accelerate the main computational kernel of ML workloads by performing massively parallel, in-situ matrix vector multiplication (MVM) operations makes them a promising candidate for building area- and energy-efficient systems. However, the analog nature of computing in resistive crossbars introduces approximations in MVM computations due to device- and circuit-level non-idealities. Further, analog systems impose costly peripheral circuit requirements for conversions between the analog and digital domains. Thus, there is a need to understand the entire system design stack, from device characteristics to architectures, and perform effective hardware-software co-design to truly realize the potential of resistive crossbars as future computing systems. In this talk, we will present a comprehensive overview of NVM crossbars for accelerating ML workloads. We describe, in detail, the design principles of the basic building blocks, such as the device and associated circuits, that constitute the crossbars. We explore non-idealities arising from the device characteristics and circuit behavior and study their impact on the MVM functionality of NVM crossbars for machine learning hardware.
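
A resistive crossbar computes a matrix vector product in the analog domain: applying voltage V_i to row i of devices with conductances G_ij yields column current I_j = sum_i V_i * G_ij. The sketch below shows the ideal computation next to a perturbed one. The multiplicative Gaussian variation is an assumed stand-in for device non-idealities, not a model taken from the talk, and the voltages and conductances are illustrative values.

```python
import random

def ideal_mvm(voltages, conductances):
    # Column current I_j = sum_i V_i * G_ij (Ohm's law plus Kirchhoff's current law).
    n_cols = len(conductances[0])
    return [
        sum(v * row[j] for v, row in zip(voltages, conductances))
        for j in range(n_cols)
    ]

def noisy_mvm(voltages, conductances, rel_sigma=0.05, seed=0):
    # Assumed non-ideality: multiplicative Gaussian variation on each conductance.
    rng = random.Random(seed)
    perturbed = [
        [g * (1.0 + rng.gauss(0.0, rel_sigma)) for g in row]
        for row in conductances
    ]
    return ideal_mvm(voltages, perturbed)

V = [0.2, 0.1, 0.3]      # input voltages in volts, one per crossbar row
G = [[1e-6, 2e-6],
     [3e-6, 1e-6],
     [2e-6, 4e-6]]       # device conductances in siemens
I_ideal = ideal_mvm(V, G)
I_noisy = noisy_mvm(V, G)
print(I_ideal)
print(I_noisy)
```

The gap between `I_ideal` and `I_noisy` is the kind of approximation error that hardware-software co-design would need to tolerate or compensate for.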

Citations: 0

RAFeL - Robust and Data-Aware Federated Learning-inspired Malware Detection in Internet-of-Things (IoT) Networks

Proceedings of the Great Lakes Symposium on VLSI 2022

Pub Date : 2022-06-06 DOI: 10.1145/3526241.3530378

Sanket Shukla, Gaurav Kolhe, H. Homayoun, S. Rafatirad, Sai Manoj Pudukotai Dinakarrao

Federated Learning (FL) is a decentralized machine learning approach in which the training data is distributed across Internet-of-Things (IoT) devices and a shared global model is learned by aggregating local updates. However, the training data can be poisoned and manipulated by malicious adversaries, contaminating locally computed updates. To prevent this, detecting malicious IoT devices is very important. Since the local updates are large because of the high volume of data, minimizing the communication overhead is also necessary. This paper proposes the "RAFeL" framework, comprising two techniques to tackle the above issues: (1) a robust defense technique and (2) a "Performance-aware bit-wise encoding" technique. "Robust and Active Protection with Intelligent Defense (RAPID)" is a defense system that detects malicious IoT devices and restricts the participation of the contaminated local updates computed by these malicious devices. To minimize communication cost, "Performance-aware bit-wise encoding" selects the appropriate encoding scheme for individual split bits based on their significance and effect on FL performance. The results show that the proposed framework achieves a 1.2-1.8x higher compression rate than lossy and lossless encoding techniques and has an average accuracy drop of 3% to 10% even with a fraction of malicious devices.
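
The aggregation step and the idea of excluding contaminated updates can be sketched as follows. The averaging function is plain federated averaging; the outlier filter based on distance to the coordinate-wise median is an assumed stand-in defense, since the abstract does not describe how RAPID detects malicious devices. The update vectors and the threshold are hypothetical.

```python
from statistics import median

def fed_avg(updates):
    # Element-wise mean of the client update vectors.
    n = len(updates)
    return [sum(u[i] for u in updates) / n for i in range(len(updates[0]))]

def filter_outliers(updates, threshold):
    # Stand-in defense (assumption): drop updates whose Euclidean distance
    # to the coordinate-wise median exceeds the threshold.
    med = [median(u[i] for u in updates) for i in range(len(updates[0]))]

    def distance(u):
        return sum((a - b) ** 2 for a, b in zip(u, med)) ** 0.5

    return [u for u in updates if distance(u) <= threshold]

# Hypothetical local updates from four devices; the last one is poisoned.
updates = [
    [0.10, -0.20],
    [0.12, -0.18],
    [0.08, -0.22],
    [5.00, 4.00],
]
trusted = filter_outliers(updates, threshold=1.0)
global_update = fed_avg(trusted)
print(len(trusted), global_update)
```

Without the filter, the single poisoned update would pull the averaged first coordinate from about 0.10 to about 1.3, which shows why restricting the participation of contaminated updates matters.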

Citations: 3
