XMA: a crossbar-aware multi-task adaption framework via shift-based mask learning method
Fan Zhang, Li Yang, Jian Meng, Jae-sun Seo, Yu Cao, Deliang Fan
DOI: 10.1145/3489517.3530458
The ReRAM crossbar array, a highly parallel, fast, and energy-efficient structure, has attracted much attention, especially for accelerating Deep Neural Network (DNN) inference on a single, specific task. However, due to the high energy consumption of weight re-programming and the low endurance of ReRAM cells, adapting the crossbar array to multiple tasks has not been well explored. In this paper, we propose XMA, the first crossbar-aware, shift-based mask learning method for multi-task adaption in ReRAM crossbar DNN accelerators. XMA leverages the benefits of mask-based learning algorithms to mitigate catastrophic forgetting: for each new task, it learns a task-specific, crossbar column-wise, shift-based multi-level mask over a frozen backbone model, rather than the commonly used element-wise binary mask. With this crossbar-aware design, the masking operation needed to adapt to a new task can be implemented in an existing crossbar-based convolution engine with minimal hardware/memory overhead and, more importantly, without power-hungry cell re-programming, unlike prior works. Extensive experimental results show that, compared with the state-of-the-art multi-task adaption method Piggyback [1], XMA achieves 3.19% higher accuracy on average while saving 96.6% of the memory overhead. Moreover, by eliminating cell re-programming, XMA achieves ~4.3x higher energy efficiency than Piggyback.
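The column-wise shift mask is the key hardware trick: scaling a whole crossbar column by a power of two is a bit-shift on the column output rather than a per-cell rewrite. A minimal NumPy sketch of that idea, with hypothetical names and 2-bit mask levels chosen purely for illustration (not the paper's exact formulation):

```python
import numpy as np

def apply_shift_mask(weights, shift_exponents):
    """Scale each crossbar column of a frozen weight tile by a
    power-of-two mask value 2**(-s), i.e. a cheap bit-shift in hardware.

    weights:         (rows, cols) frozen backbone weights, one crossbar tile
    shift_exponents: (cols,) integer shift amount learned per column
    """
    scale = np.power(2.0, -shift_exponents.astype(np.float64))  # 1, 1/2, 1/4, ...
    return weights * scale[np.newaxis, :]  # column-wise, no cell re-programming

rng = np.random.default_rng(0)
w = rng.standard_normal((128, 64))   # one 128x64 crossbar tile
s = rng.integers(0, 4, size=64)      # 2-bit shift mask, one value per column
w_task = apply_shift_mask(w, s)
print(w_task.shape)                  # (128, 64)
```

A Piggyback-style element-wise binary mask stores one bit per weight; a multi-level mask stored once per column needs only a few bits per 128 weights here, which is consistent with the large memory savings the abstract reports.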
{"title":"XMA: a crossbar-aware multi-task adaption framework via shift-based mask learning method","authors":"Fan Zhang, Li Yang, Jian Meng, Jae-sun Seo, Yu Cao, Deliang Fan","doi":"10.1145/3489517.3530458","DOIUrl":"https://doi.org/10.1145/3489517.3530458","url":null,"abstract":"ReRAM crossbar array as a high-parallel fast and energy-efficient structure attracts much attention, especially on the acceleration of Deep Neural Network (DNN) inference on one specific task. However, due to the high energy consumption of weight re-programming and the ReRAM cells' low endurance problem, adapting the crossbar array for multiple tasks has not been well explored. In this paper, we propose XMA, a novel crossbar-aware shift-based mask learning method for multiple task adaption in the ReRAM crossbar DNN accelerator for the first time. XMA leverages the popular mask-based learning algorithm's benefit to mitigate catastrophic forgetting and learn a task-specific, crossbar column-wise, and shift-based multi-level mask, rather than the most commonly used element-wise binary mask, for each new task based on a frozen backbone model. With our crossbar-aware design innovation, the required masking operation to adapt for a new task could be implemented in an existing crossbar-based convolution engine with minimal hardware/memory overhead and, more importantly, no need for power-hungry cell re-programming, unlike prior works. The extensive experimental results show that, compared with state-of-the-art multiple task adaption Piggyback method [1], XMA achieves 3.19% higher accuracy on average, while saving 96.6% memory overhead. Moreover, by eliminating cell re-programming, XMA achieves ~4.3x higher energy efficiency than Piggyback.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133595620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Trusting the trust anchor: towards detecting cross-layer vulnerabilities with hardware fuzzing
Chen Chen, Rahul Kande, Pouya Mahmoody, A. Sadeghi, J. Rajendran
DOI: 10.1145/3489517.3530638
The rise of complex, application-specific commercial and open-source hardware, combined with shrinking verification time, is causing numerous hardware-security vulnerabilities. Traditional verification techniques are limited in both scalability and completeness, and research in this direction is hindered by the lack of robust testing benchmarks. In this paper, in collaboration with our industry partners, we built an ecosystem mimicking the hardware-development cycle, injected bugs inspired by real-world vulnerabilities into a RISC-V SoC design, and organized an open-to-all bug-hunting competition. We equipped the participating researchers with industry-standard static and dynamic verification tools in a ready-to-use environment. The findings from our competition shed light on the strengths and weaknesses of existing verification tools and highlight the potential for future research into new vulnerability-detection techniques.
{"title":"Trusting the trust anchor: towards detecting cross-layer vulnerabilities with hardware fuzzing","authors":"Chen Chen, Rahul Kande, Pouya Mahmoody, A. Sadeghi, J. Rajendran","doi":"10.1145/3489517.3530638","DOIUrl":"https://doi.org/10.1145/3489517.3530638","url":null,"abstract":"The rise in the development of complex and application-specific commercial and open-source hardware and the shrinking verification time are causing numerous hardware-security vulnerabilities. Traditional verification techniques are limited in both scalability and completeness. Research in this direction is hindered due to the lack of robust testing benchmarks. In this paper, in collaboration with our industry partners, we built an ecosystem mimicking the hardware-development cycle where we inject bugs inspired by real-world vulnerabilities into RISC-V SoC design and organized an open-to-all bug-hunting competition. We equipped the participating researchers with industry-standard static and dynamic verification tools in a ready-to-use environment. The findings from our competition shed light on the strengths and weaknesses of the existing verification tools and highlight the potential for future research in developing new vulnerability detection techniques.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"105 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134373394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PHANES: ReRAM-based photonic accelerator for deep neural networks
Yinyi Liu, Jiaqi Liu, Yuxiang Fu, Shixi Chen, Jiaxu Zhang, Jiang Xu
DOI: 10.1145/3489517.3530397
Resistive random access memory (ReRAM) has demonstrated great promise for in-situ matrix-vector multiplication to accelerate deep neural networks. However, owing to the intrinsic properties of analog processing, most proposed ReRAM-based accelerators require excessive, costly ADCs/DACs to avoid distortion of analog electrical signals during inter-tile transmission. Moreover, because of bit-shifting before addition, prior works require more cycles to serially calculate partial sums than to perform the multiplications themselves, which dramatically restricts throughput and tends to stall the pipeline between layers of deep neural networks. In this paper, we present PHANES, a novel ReRAM-based photonic accelerator architecture that performs multiplications in ReRAM and weighted accumulations in parallel during optical transmission. The photonic paradigm also serves as a high-fidelity analog-to-analog link, further reducing ADC/DAC usage. To circumvent the memory-wall problem, we further propose a progressive bit-depth technique. Evaluations show that PHANES improves energy efficiency by 6.09x and throughput density by 14.7x compared to state-of-the-art designs. Our photonic architecture also has great potential to scale toward very-large-scale accelerators.
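The serial shift-and-add of partial sums that the abstract identifies as the bottleneck can be modeled in a few lines. A toy NumPy sketch assuming 2-bit cells (illustrative only; PHANES folds this weighting into the optical link instead of running the loop electronically):

```python
import numpy as np

def shift_add_partial_sums(bit_slices, cell_bits=2):
    """Combine per-slice crossbar outputs into one dot product by
    shift-and-add; electronically this loop costs one cycle per slice."""
    acc = np.zeros_like(bit_slices[0], dtype=np.int64)
    for i, s in enumerate(bit_slices):
        acc += s.astype(np.int64) << (i * cell_bits)  # weight slice by 2^(i*cell_bits)
    return acc

# 8-bit weights split into four 2-bit slices, each multiplied by the input.
rng = np.random.default_rng(1)
w = rng.integers(0, 256, size=(4, 16))                # 8-bit weight tile
x = rng.integers(0, 4, size=16)
slices = [((w >> (2 * i)) & 0b11) @ x for i in range(4)]
print(np.array_equal(shift_add_partial_sums(slices), w @ x))  # True
```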
{"title":"PHANES: ReRAM-based photonic accelerator for deep neural networks","authors":"Yinyi Liu, Jiaqi Liu, Yuxiang Fu, Shixi Chen, Jiaxu Zhang, Jiang Xu","doi":"10.1145/3489517.3530397","DOIUrl":"https://doi.org/10.1145/3489517.3530397","url":null,"abstract":"Resistive random access memory (ReRAM) has demonstrated great promises of in-situ matrix-vector multiplications to accelerate deep neural networks. However, subject to the intrinsic properties of analog processing, most of the proposed ReRAM-based accelerators require excessive costly ADC/DAC to avoid distortion of electronic analog signals during inter-tile transmission. Moreover, due to bit-shifting before addition, prior works require longer cycles to serially calculate partial sum compared to multiplications, which dramatically restricts the throughput and is more likely to stall the pipeline between layers of deep neural networks. In this paper, we present a novel ReRAM-based photonic accelerator (PHANES) architecture, which calculates multiplications in ReRAM and parallel weighted accumulations during optical transmission. Such photonic paradigm also serves as high-fidelity analog-analog links to further reduce ADC/DAC. To circumvent the memory wall problem, we further propose a progressive bit-depth technique. Evaluations show that PHANES improves the energy efficiency by 6.09x and throughput density by 14.7x compared to state-of-the-art designs. Our photonic architecture also has great potentials for scalability towards very-large-scale accelerators.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"104 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133378070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Thermal-aware optical-electrical routing codesign for on-chip signal communications
Yu-Sheng Lu, Kuan-Cheng Chen, Yu-Ling Hsu, Yao-Wen Chang
DOI: 10.1145/3489517.3530404
Optical interconnection is a promising solution for on-chip signal communication in modern system-on-chip (SoC) and heterogeneous-integration designs, providing large bandwidth and high-speed transmission with low power consumption. Previous works do not handle two main issues in on-chip optical-electrical (O-E) co-design: the thermal impact during O-E routing and the trade-offs among power consumption, wirelength, and congestion. As a result, thermal-induced band shift may cause transmission malfunctions, and power-consumption estimates are inaccurate, so only suboptimal results are obtained. To remedy these disadvantages, we present a thermal-aware optical-electrical routing co-design flow that minimizes power consumption, thermal impact, and wirelength. Experimental results based on the ISPD 2019 contest benchmarks show that our co-design flow significantly outperforms state-of-the-art works in power consumption, thermal impact, and wirelength.
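The trade-off such a flow optimizes can be pictured as a scalar route cost. A deliberately simple sketch in which the weights, the crossing-based power proxy, and the ring-temperature penalty are all invented for illustration (not the paper's actual model):

```python
def route_cost(wirelength_um, crossings, ring_temps_c, t_ref_c=85.0,
               w_wl=1.0, w_pow=2.0, w_th=4.0):
    """Hypothetical scalar cost for one candidate O-E route: trades off
    wirelength, power (proxied by signal crossings, each adding loss),
    and the thermal penalty of microrings far from their tuned temperature."""
    thermal_term = sum(abs(t - t_ref_c) for t in ring_temps_c)
    return w_wl * wirelength_um + w_pow * crossings + w_th * thermal_term

# A route through a hot spot pays a large thermal penalty even if shorter.
print(route_cost(1200.0, crossings=3, ring_temps_c=[84.0, 96.5]))
```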
{"title":"Thermal-aware optical-electrical routing codesign for on-chip signal communications","authors":"Yu-Sheng Lu, Kuan-Cheng Chen, Yu-Ling Hsu, Yao-Wen Chang","doi":"10.1145/3489517.3530404","DOIUrl":"https://doi.org/10.1145/3489517.3530404","url":null,"abstract":"The optical interconnection is a promising solution for on-chip signal communication in modern system-on-chip (SoC) and heterogeneous integration designs, providing large bandwidth and high-speed transmission with low power consumption. Previous works do not handle two main issues for on-chip optical-electrical (O-E) co-design: the thermal impact during O-E routing and the trade-offs among power consumption, wirelength, and congestion. As a result, the thermal-induced band shift might incur transmission malfunction; the power consumption estimation is inaccurate; thus, only suboptimal results are obtained. To remedy these disadvantages, we present a thermal-aware optical-electrical routing co-design flow to minimize power consumption, thermal impact, and wirelength. Experimental results based on the ISPD 2019 contest benchmarks show that our co-design flow significantly outperforms state-of-the-art works in power consumption, thermal impact, and wire-length.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"50 6","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114032203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The SODA approach: leveraging high-level synthesis for hardware/software co-design and hardware specialization: invited
Nicolas Bohm Agostini, S. Curzel, Ankur Limaye, Vinay C. Amatya, Marco Minutoli, Vito Giovanni Castellana, J. Manzano, Antonino Tumeo, Fabrizio Ferrandi
DOI: 10.1145/3489517.3530628
Novel "converged" applications combine phases of scientific simulation with data analysis and machine learning. Each computational phase can benefit from specialized accelerators. However, algorithms evolve so quickly that mapping them on existing accelerators is suboptimal or even impossible. This paper presents the SODA (Software Defined Accelerators) framework, a modular, multi-level, open-source, no-human-in-the-loop, hardware synthesizer that enables end-to-end generation of specialized accelerators. SODA is composed of SODA-Opt, a high-level frontend developed in MLIR that interfaces with domain-specific programming frameworks and allows performing system level design, and Bambu, a state-of-the-art high-level synthesis engine that can target different device technologies. The framework implements design space exploration as compiler optimization passes. We show how the modular, yet tight, integration of the high-level optimizer and lower-level HLS tools enables the generation of accelerators optimized for the computational patterns of converged applications. We then discuss some of the research opportunities that such a framework allows, including system-level design, profile driven optimization, and supporting new optimization metrics.
{"title":"The SODA approach: leveraging high-level synthesis for hardware/software co-design and hardware specialization: invited","authors":"Nicolas Bohm Agostini, S. Curzel, Ankur Limaye, Vinay C. Amatya, Marco Minutoli, Vito Giovanni Castellana, J. Manzano, Antonino Tumeo, Fabrizio Ferrandi","doi":"10.1145/3489517.3530628","DOIUrl":"https://doi.org/10.1145/3489517.3530628","url":null,"abstract":"Novel \"converged\" applications combine phases of scientific simulation with data analysis and machine learning. Each computational phase can benefit from specialized accelerators. However, algorithms evolve so quickly that mapping them on existing accelerators is suboptimal or even impossible. This paper presents the SODA (Software Defined Accelerators) framework, a modular, multi-level, open-source, no-human-in-the-loop, hardware synthesizer that enables end-to-end generation of specialized accelerators. SODA is composed of SODA-Opt, a high-level frontend developed in MLIR that interfaces with domain-specific programming frameworks and allows performing system level design, and Bambu, a state-of-the-art high-level synthesis engine that can target different device technologies. The framework implements design space exploration as compiler optimization passes. We show how the modular, yet tight, integration of the high-level optimizer and lower-level HLS tools enables the generation of accelerators optimized for the computational patterns of converged applications. We then discuss some of the research opportunities that such a framework allows, including system-level design, profile driven optimization, and supporting new optimization metrics.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115087169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Write or not: programming scheme optimization for RRAM-based neuromorphic computing
Ziqi Meng, Yanan Sun, Weikang Qian
DOI: 10.1145/3489517.3530558
One main fault-tolerance method for neural-network accelerators based on resistive random access memory (RRAM) crossbars is the programming-based method known as write-and-verify (W-V). In the basic W-V scheme, all devices in the crossbars are programmed repeatedly until they are close enough to their targets, which incurs huge overhead. To reduce this cost, we optimize the W-V scheme by proposing a probabilistic termination criterion for a single device and a systematic optimization method for multiple devices. Furthermore, we propose a joint algorithm that assists the novel W-V scheme with incremental retraining, further reducing the W-V cost. Compared to the basic W-V scheme, our proposed method improves accuracy by 0.23% for ResNet18 on CIFAR10 with only 9.7% of the W-V cost under variation with σ = 1.2.
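The baseline W-V loop the paper starts from is simple to state: read, compare against a tolerance, re-program, repeat. A minimal Python sketch with a toy device model (the fixed tolerance test here stands in for the paper's probabilistic termination criterion, which is not reproduced):

```python
import random

def write_and_verify(target_g, read_fn, program_fn, tol=0.05, max_iters=50):
    """Basic W-V loop for one RRAM cell: re-program until the read-back
    conductance is within `tol` of the target or the pulse budget runs out."""
    for it in range(max_iters):
        g = read_fn()
        if abs(g - target_g) <= tol:
            return g, it            # verified: close enough to the target
        program_fn(target_g)        # another costly programming pulse
    return read_fn(), max_iters

# Toy device model: each pulse lands near the target with Gaussian error.
state = {"g": 0.0}
g, pulses = write_and_verify(
    target_g=1.0,
    read_fn=lambda: state["g"],
    program_fn=lambda t: state.update(g=t + random.gauss(0, 0.1)),
)
print(f"final conductance {g:.3f} after {pulses} programming pulses")
```

Terminating this loop earlier on devices whose residual error barely affects accuracy is exactly where the reported 9.7% cost comes from.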
{"title":"Write or not: programming scheme optimization for RRAM-based neuromorphic computing","authors":"Ziqi Meng, Yanan Sun, Weikang Qian","doi":"10.1145/3489517.3530558","DOIUrl":"https://doi.org/10.1145/3489517.3530558","url":null,"abstract":"One main fault-tolerant method for a neural network accelerator based on resistive random access memory crossbars is the programming-based method, which is also known as write-and-verify (W-V). In the basic W-V scheme, all devices in crossbars are programmed repeatedly until they are close enough to their targets, which costs huge overhead. To reduce the cost, we optimize the W-V scheme by proposing a probabilistic termination criterion on a single device and a systematic optimization method on multiple devices. Furthermore, we propose a joint algorithm that assists the novel W-V scheme by incremental retraining, which further reduces the W-V cost. Compared to the basic W-V scheme, our proposed method improves the accuracy by 0.23% for ResNet18 on CIFAR10 with only 9.7% W-V cost under variation with σ = 1.2.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114768247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SRA: a secure ReRAM-based DNN accelerator
Lei Zhao, Youtao Zhang, Jun Yang
DOI: 10.1145/3489517.3530440
Deep Neural Network (DNN) accelerators are increasingly developed to pursue high efficiency in DNN computing. However, IP protection for the DNNs deployed on such accelerators is an important topic that has received little attention. Although previous works have targeted this problem for CMOS-based designs, there is still no solution for ReRAM-based accelerators, which pose new security challenges due to their crossbar structure and non-volatility. ReRAM's non-volatility retains data even after the system is powered off, making a stored DNN model vulnerable to attacks that simply read out the ReRAM content. Because the crossbar structure can only compute on plaintext data, conventionally encrypting the ReRAM content is not feasible in this scenario. In this paper, we propose SRA, a secure ReRAM-based DNN accelerator that stores DNN weights on crossbars in an encrypted format while still maintaining ReRAM's in-memory computing capability. The proposed encryption scheme also supports sharing bits among multiple weights, significantly reducing the storage overhead. In addition, SRA uses a novel high-bandwidth SC conversion scheme to protect each layer's intermediate results, which also contain sensitive information about the model. Our experimental results show that SRA effectively prevents pirating of the deployed DNN weights as well as the intermediate results with negligible accuracy loss, and achieves a 1.14x performance speedup and 9% energy reduction compared to ISAAC, a non-secure ReRAM-based baseline.
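The underlying constraint is that any transformation of the stored weights must commute with the analog matrix-vector product. A deliberately simple NumPy illustration using a secret column permutation as the "encryption" key (not SRA's actual scheme, just a demonstration of the constraint):

```python
import numpy as np

rng = np.random.default_rng(7)
w = rng.standard_normal((4, 8))      # plaintext weight tile
x = rng.standard_normal(8)           # input vector

# Secret key: a column permutation. The crossbar stores w[:, perm], which
# reveals no column correspondence; the accelerator permutes the input the
# same way at runtime, so the stored ciphertext still computes w @ x.
perm = rng.permutation(8)
w_enc = w[:, perm]                   # what an attacker reads out
y = w_enc @ x[perm]                  # in-memory MVM on "encrypted" weights
assert np.allclose(y, w @ x)         # correct plaintext result
```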
{"title":"SRA: a secure ReRAM-based DNN accelerator","authors":"Lei Zhao, Youtao Zhang, Jun Yang","doi":"10.1145/3489517.3530440","DOIUrl":"https://doi.org/10.1145/3489517.3530440","url":null,"abstract":"Deep Neural Network (DNN) accelerators are increasingly developed to pursue high efficiency in DNN computing. However, the IP protection of the DNNs deployed on such accelerators is an important topic that has been less addressed. Although there are previous works that targeted this problem for CMOS-based designs, there is still no solution for ReRAM-based accelerators which pose new security challenges due to their crossbar structure and non-volatility. ReRAM's non-volatility retains data even after the system is powered off, making the stored DNN model vulnerable to attacks by simply reading out the ReRAM content. Because the crossbar structure can only compute on plaintext data, encrypting the ReRAM content is no longer a feasible solution in this scenario. In this paper, we propose SRA - a secure ReRAM-based DNN accelerator that stores DNN weights on crossbars in an encrypted format while still maintaining ReRAM's in-memory computing capability. The proposed encryption scheme also supports sharing bits among multiple weights, significantly reducing the storage overhead. In addition, SRA uses a novel high-bandwidth SC conversion scheme to protect each layer's intermediate results, which also contain sensitive information of the model. Our experimental results show that SRA can effectively prevent pirating the deployed DNN weights as well as the intermediate results with negligible accuracy loss, and achieves 1.14X performance speedup and 9% energy reduction compared to ISAAC - a non-secure ReRAM-based baseline.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116964780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pref-X: a framework to reveal data prefetching in commercial in-order cores
Quentin Huppert, F. Catthoor, L. Torres, D. Novo
DOI: 10.1145/3489517.3530569
Computer-system simulators are major tools used by architecture researchers to develop and evaluate new ideas. Clearly, such evaluations are more conclusive when compared against commercial state-of-the-art architectures. However, the behavior of key components in existing processors is often not disclosed, complicating the construction of faithful reference models. The data-prefetching engine is one such obscured component, and it can have a significant impact on key metrics such as performance and energy. In this paper, we propose Pref-X, a framework to analyze the functional characteristics of data prefetching in commercial in-order cores. Our framework reveals data prefetches by X-raying the cache memory at request granularity, which allows linking memory-access patterns to changes in cache content. To demonstrate the power and accuracy of our methodology, we use Pref-X to replicate the data-prefetching mechanisms of two representative processors, the Arm Cortex-A7 and the Arm Cortex-A53, with 99.8% and 96.9% average accuracy, respectively.
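The core inference behind such cache X-raying is a set difference: lines that show up in the cache without a matching demand access must have been prefetched. A simplified sketch with hypothetical inputs (the real framework operates on hardware at request granularity, not on Python sets):

```python
def infer_prefetches(demand_accesses, lines_before, lines_after, line_bytes=64):
    """Infer prefetched cache lines by diffing cache content around a batch
    of demand accesses: any newly present line that no demand access maps
    to must have been brought in by the prefetcher."""
    demanded = {addr // line_bytes for addr in demand_accesses}
    new_lines = lines_after - lines_before
    return sorted(new_lines - demanded)

# Example: a stride-1 pattern over lines 100..103 also pulls in line 104.
before = {10, 11}
after = {10, 11, 100, 101, 102, 103, 104}
accesses = [line * 64 for line in (100, 101, 102, 103)]
print(infer_prefetches(accesses, before, after))  # [104] -> next-line prefetch
```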
{"title":"Pref-X: a framework to reveal data prefetching in commercial in-order cores","authors":"Quentin Huppert, F. Catthoor, L. Torres, D. Novo","doi":"10.1145/3489517.3530569","DOIUrl":"https://doi.org/10.1145/3489517.3530569","url":null,"abstract":"Computer system simulators are major tools used by architecture researchers to develop and evaluate new ideas. Clearly, such evaluations are more conclusive when compared to commercial state-of-the-art architectures. However, the behavior of key components in existing processors is often not disclosed, complicating the construction of faithful reference models. The data prefetching engine is one of such obscured components that can have a significant impact on key metrics such as performance and energy. In this paper, we propose Pref-X, a framework to analyze functional characteristics of data prefetching in commercial in-order cores. Our framework reveals data prefetches by X-raying into the cache memory at the request granularity, which allows linking memory access patterns with changes in the cache content. To demonstrate the power and accuracy of our methodology, we use Pref-X to replicate the data prefetching mechanisms of two representative processors, namely the Arm Cortex-A7 and the Arm Cortex-A53, with a 99.8% and 96.9% average accuracy, respectively.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"476 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116189453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
iMARS
Mengyuan Li, Ann Franchesca Laguna, D. Reis, Xunzhao Yin, M. Niemier, X. S. Hu
DOI: 10.1145/3489517.3530478
Recommendation systems (RecSys) suggest items to users by predicting their preferences from historical data. Typical RecSys handle large embedding tables and many embedding-table-related operations, and the memory size and bandwidth of conventional computer architectures restrict their performance. This work proposes iMARS, an in-memory-computing (IMC) architecture for accelerating the filtering and ranking stages of deep neural network-based RecSys. iMARS leverages IMC-friendly embedding tables implemented inside a ferroelectric-FET-based IMC fabric. Circuit-level and system-level evaluations show that iMARS achieves a 16.8x (713x) end-to-end latency (energy) improvement over the GPU counterpart on the MovieLens dataset.
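The embedding-table operation that dominates RecSys, and that an IMC fabric can fold into the memory array itself, is a gather-and-reduce. A tiny NumPy sketch of that op (illustrative; in iMARS the FeFET fabric performs the reduction in place rather than after many memory fetches):

```python
import numpy as np

def embedding_pool(table, indices):
    """Gather-and-reduce: fetch one embedding row per index and sum them
    into a single pooled feature vector for the downstream DNN stages."""
    return table[indices].sum(axis=0)

table = np.arange(20, dtype=np.float32).reshape(5, 4)  # 5 items, dim 4
print(embedding_pool(table, [0, 2, 3]))  # pooled vector for one user query
```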
{"title":"iMARS","authors":"Mengyuan Li, Ann Franchesca Laguna, D. Reis, Xunzhao Yin, M. Niemier, X. S. Hu","doi":"10.1145/3489517.3530478","DOIUrl":"https://doi.org/10.1145/3489517.3530478","url":null,"abstract":"Recommendation systems (RecSys) suggest items to users by predicting their preferences based on historical data. Typical RecSys handle large embedding tables and many embedding table related operations. The memory size and bandwidth of the conventional computer architecture restrict the performance of RecSys. This work proposes an in-memory-computing (IMC) architecture (iMARS) for accelerating the filtering and ranking stages of deep neural network-based RecSys. iMARS leverages IMC-friendly embedding tables implemented inside a ferroelectric FET based IMC fabric. Circuit-level and system-level evaluation show that iMARS achieves 16.8x (713x) end-to-end latency (energy) improvement compared to the GPU counterpart for the MovieLens dataset.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"506 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115336761","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}