Proceedings of the 59th ACM/IEEE Design Automation Conference: Latest Articles

Improving LUT-based optimization for ASICs

Proceedings of the 59th ACM/IEEE Design Automation Conference

Pub Date : 2022-07-10 DOI: 10.1145/3489517.3530461

Walter Lau Neto, L. Amarù, V. Possani, P. Vuillod, Jiong Luo, A. Mishchenko, P. Gaillardon

LUT-based optimization techniques are finding new applications in the synthesis of ASIC designs. Intuitively, packing logic into LUTs provides a better balance between functionality and structure in logic optimization. On this basis, the LUT-engine framework [1] was introduced to enhance ASIC synthesis. In this paper, we present key improvements at both the algorithmic and flow levels that yield a much stronger LUT-engine. We restructure the LUT-engine flow to benefit from a heterogeneous mixture of LUT sizes, and revisit its requirements for maximum scalability. We propose a dedicated LUT mapper for the new flow, based on FlowMap, that natively balances LUT count and NAND2 count for a wide range of LUT sizes. We describe a specialized Boolean factoring technique that exploits the fanin bounds in LUT networks, resulting in very fast LUT-based AIG minimization. Using the proposed methodology, we improve 9 of the best area results in the ongoing EPFL synthesis competition. Integrated into a complete EDA flow for ASICs, the new LUT-engine performs well on a set of 87 benchmarks: -4.60% area and -3.41% switching power at +5% runtime compared to the baseline flow without LUT-based optimizations, and -3.02% area and -2.54% switching power at -1% runtime compared to the original LUT-engine.
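
To make the mapping step more concrete, the sketch below enumerates k-feasible cuts in a small AND-inverter graph (AIG). This is a swapped-in illustration, not the paper's mapper: FlowMap finds depth-optimal cuts via a network-flow formulation, and the LUT-engine's balancing of LUT count against NAND2 count is not modeled. The graph, node names, and the function `enumerate_cuts` are hypothetical; inverted edges and cut pruning are ignored.

```python
from itertools import product


def enumerate_cuts(fanins, k):
    """Enumerate all k-feasible cuts of every AND node in a toy AIG.

    fanins: dict mapping each AND node to its two fanin names, given in
    topological order (dicts preserve insertion order). Names that never
    appear as keys are treated as primary inputs.
    Returns a dict: node -> set of cuts, each cut a frozenset of node names.
    """
    cuts = {}

    def cuts_of(node):
        # A primary input only has its trivial cut {node}.
        return cuts.get(node, {frozenset([node])})

    for node, (a, b) in fanins.items():
        node_cuts = {frozenset([node])}  # trivial cut: the node itself
        for ca, cb in product(cuts_of(a), cuts_of(b)):
            merged = ca | cb
            if len(merged) <= k:  # keep only cuts that fit in a k-input LUT
                node_cuts.add(merged)
        cuts[node] = node_cuts
    return cuts


# Hypothetical AIG: n1 = a & b, n2 = b & c, n3 = n1 & n2
aig = {"n1": ("a", "b"), "n2": ("b", "c"), "n3": ("n1", "n2")}
for node, node_cuts in enumerate_cuts(aig, k=3).items():
    print(node, sorted(sorted(c) for c in node_cuts))
```

Each cut of size at most k is a candidate k-input LUT rooted at that node; a mapper then selects one cut per node according to its cost function.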

Citations: 1

NAX

Proceedings of the 59th ACM/IEEE Design Automation Conference

Pub Date : 2022-07-10 DOI: 10.1145/3489517.3530476

Shubham Negi, I. Chakraborty, Aayush Ankit, K. Roy

Neural Architecture Search (NAS) has made it possible to design efficient deep neural networks (DNNs) tailored to different hardware platforms such as GPUs and CPUs. However, integrating NAS with In-Memory Computing (IMC) accelerators based on Memristive Crossbar Arrays (MCAs) remains an open problem. The hardware efficiency (energy, latency, and area) and the application accuracy (considering device and circuit non-idealities) of DNNs mapped to such hardware depend jointly on network parameters, such as kernel size and depth, and on hardware architecture parameters, such as crossbar size and the precision of the analog-to-digital converters. Co-optimizing both network and hardware parameters presents a challenging search space comprising different kernel sizes mapped to varying crossbar sizes. To that end, we propose NAX, an efficient neural architecture search engine that co-designs the neural network and the IMC-based hardware architecture. NAX explores this search space to determine the kernel size and corresponding crossbar size for each DNN layer, achieving optimal trade-offs between hardware efficiency and application accuracy. For CIFAR-10 and Tiny ImageNet, our models achieve 0.9% and 18.57% higher accuracy at 30% and -10.47% lower EDAP (energy-delay-area product) compared to baseline ResNet-20 and ResNet-18 models, respectively.
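
The search space described above pairs kernel sizes with crossbar sizes. A minimal sketch of how that pairing affects crossbar count and cell utilization, assuming a common weight-stationary mapping in which the unrolled kernel (kernel x kernel x input channels) occupies crossbar rows and each output channel occupies one column. Bit-slicing of weights, ADC precision, and non-idealities are ignored, and the function names are hypothetical rather than taken from NAX.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for positive integers."""
    return -(-a // b)


def crossbars_for_conv(kernel: int, c_in: int, c_out: int, xbar: int):
    """Tile one conv layer onto square xbar-by-xbar crossbars.

    Assumption: rows hold the unrolled kernel (kernel*kernel*c_in weights),
    columns hold output channels, one weight per cell.
    Returns (number_of_crossbars, fraction_of_cells_holding_weights).
    """
    rows = kernel * kernel * c_in
    cols = c_out
    count = ceil_div(rows, xbar) * ceil_div(cols, xbar)
    utilization = (rows * cols) / (count * xbar * xbar)
    return count, utilization


# Hypothetical sweep over the (kernel size, crossbar size) pairs of one layer.
for kernel in (1, 3, 5):
    for xbar in (64, 128, 256):
        count, util = crossbars_for_conv(kernel, c_in=64, c_out=64, xbar=xbar)
        print(f"k={kernel} xbar={xbar}: {count} crossbars, {util:.1%} utilized")
```

Larger crossbars amortize peripheral circuits but can leave many cells idle for small kernels, which is one reason per-layer co-selection of kernel and crossbar size is attractive.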

Citations: 5

Timing macro modeling with graph neural networks

Proceedings of the 59th ACM/IEEE Design Automation Conference

Pub Date : 2022-07-10 DOI: 10.1145/3489517.3530599

K. Chang, Chun-Yao Chiang, Pei-Yu Lee, I. Jiang

Due to rapidly growing design complexity, timing macro modeling has been widely adopted to enable hierarchical and parallel timing analysis. The main challenge of timing macro modeling is to identify timing-variant pins so as to achieve high timing accuracy while keeping the model compact. To tackle this challenge, prior work applied ad-hoc techniques and threshold settings. In this work, we present a novel timing macro modeling approach based on graph neural networks (GNNs). A timing sensitivity metric is proposed to precisely evaluate the influence of each pin on timing accuracy. Based on the timing sensitivity data and the circuit topology, the GNN model can effectively learn to capture timing-variant pins. Experimental results show that our GNN-based framework reduces model size by 10% while preserving the same timing accuracy as the state of the art. Furthermore, taking common path pessimism removal (CPPR) as an example, we empirically validate the generality and applicability of our framework across various timing analysis models and modes.

Citations: 3

Rethinking key-value store for byte-addressable Optane persistent memory

Proceedings of the 59th ACM/IEEE Design Automation Conference

Pub Date : 2022-07-10 DOI: 10.1145/3489517.3530535

Sung-Ming Wu, Li-Pin Chang

Optane Persistent Memory (PM) is a pioneering solution for byte-addressable PM in commodity systems. However, the performance of Optane PM is highly workload-sensitive, rendering many prior Key-Value (KV) store designs inefficient. To cope with this reality, we advocate rethinking KV store design for Optane PM. Our design follows a principle of Single-stream Writing with managed Multi-stream Reading (SWMR): incoming KV pairs are written to PM through a single write stream and managed by an ordered index in DRAM. By asynchronously sorting and rewriting large sets of KV pairs, range queries are served with a managed number of concurrent streams. YCSB results show that our design improves on existing designs by 116% in write-only throughput and by 21% in read-write throughput.
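
A toy sketch of the single-write-stream idea: every put appends to one log (standing in for the PM write stream), and an ordered index (standing in for the DRAM index) maps each key to its newest record. The class `SingleStreamKV` and its methods are hypothetical; the asynchronous sorting and rewriting that the design uses to bound concurrent read streams during range queries is only noted in a comment, not implemented.

```python
import bisect


class SingleStreamKV:
    """Hypothetical in-memory model of single-stream writes plus an ordered index."""

    def __init__(self):
        self.log = []      # append-only (key, value) records: the single write stream
        self.keys = []     # sorted live keys: stand-in for the ordered DRAM index
        self.offsets = {}  # key -> position of its newest record in self.log

    def put(self, key, value):
        self.log.append((key, value))          # every write is a sequential append
        if key not in self.offsets:
            bisect.insort(self.keys, key)      # keep the index ordered for scans
        self.offsets[key] = len(self.log) - 1  # point at the newest version

    def get(self, key):
        return self.log[self.offsets[key]][1]

    def scan(self, lo, hi):
        """Return (key, value) pairs with lo <= key < hi, in key order."""
        i = bisect.bisect_left(self.keys, lo)
        j = bisect.bisect_left(self.keys, hi)
        # These lookups hit scattered log positions; periodically sorting and
        # rewriting records (not modeled here) would make them sequential.
        return [(k, self.log[self.offsets[k]][1]) for k in self.keys[i:j]]


kv = SingleStreamKV()
for k, v in [("b", 2), ("a", 1), ("c", 3), ("b", 20)]:
    kv.put(k, v)
print(kv.scan("a", "c"))
```

Overwrites leave stale records in the log; a real store would reclaim them during the same background rewriting pass.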

Citations: 0

Memory-efficient training of binarized neural networks on the edge

Proceedings of the 59th ACM/IEEE Design Automation Conference

Pub Date : 2022-07-10 DOI: 10.1145/3489517.3530496

Mikail Yayla, Jian-Jia Chen

A visionary computing paradigm is to train resource-efficient neural networks on the edge using dedicated low-power accelerators instead of cloud infrastructure, eliminating communication overheads and privacy concerns. One promising resource-efficient approach for inference is binarized neural networks (BNNs), which binarize parameters and activations. However, training BNNs remains resource-demanding. State-of-the-art BNN training methods, such as the binary optimizer (Bop), need to store and update a large number of momentum values in floating-point (FP) format. In this work, we focus on memory-efficient FP encodings for the momentum values in Bop. To this end, we first investigate the impact of arbitrary FP encodings. We prove that when the FP format is not properly chosen, updates to the momentum values can be lost, degrading training quality. Building on these insights, we formulate a metric that determines how many momentum values remain unchanged in a training iteration because of the FP encoding. Based on this metric, we develop an algorithm to find FP encodings that are more memory-efficient than the standard FP encodings. In our experiments, memory usage in BNN training is reduced by factors of 2.47x, 2.43x, and 2.04x, depending on the BNN model, at minimal accuracy cost (less than 1%) compared to 32-bit FP encoding.
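
The lost-update effect can be reproduced in a few lines. This sketch assumes Bop's exponential-moving-average momentum update, m <- (1 - gamma) * m + gamma * g, computes it in double precision, and stores the result back in IEEE binary16 or binary32 using Python's `struct` module (format codes `'e'` and `'f'`). The function `count_lost_updates` is a hypothetical stand-in for the kind of metric the abstract describes, not the authors' formulation or search algorithm.

```python
import struct


def round_to_format(x: float, fmt: str) -> float:
    """Round a binary64 float to the nearest value in a narrower IEEE format.

    fmt: 'e' for binary16 (half precision) or 'f' for binary32.
    """
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


def count_lost_updates(momenta, grads, gamma, fmt):
    """Count momentum entries whose update is rounded away entirely.

    Assumes the update m <- (1 - gamma) * m + gamma * g, evaluated exactly
    (in binary64) and then stored back in the target format.
    """
    lost = 0
    for m, g in zip(momenta, grads):
        stored = round_to_format(m, fmt)            # value actually held in memory
        exact = (1.0 - gamma) * stored + gamma * g  # ideal updated momentum
        new = round_to_format(exact, fmt)           # what the format can represent
        if exact != stored and new == stored:       # update was real but vanished
            lost += 1
    return lost


momenta = [1.0, 0.5, 0.001]
grads = [1.5, -0.5, 0.0]
print("binary16 lost:", count_lost_updates(momenta, grads, 1e-4, "e"))
print("binary32 lost:", count_lost_updates(momenta, grads, 1e-4, "f"))
```

With gamma = 1e-4, each update here is smaller than half the binary16 spacing around the stored momentum, so all three entries round back to their old values, while binary32 retains every update.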

Citations: 4

Equivalence checking paradigms in quantum circuit design: a case study

Proceedings of the 59th ACM/IEEE Design Automation Conference

Pub Date : 2022-07-10 DOI: 10.1145/3489517.3530480

Tom Peham, Lukas Burgholzer, R. Wille

As state-of-the-art quantum computers become capable of running increasingly complex algorithms, the need for automated methods to design and test potential applications grows. Equivalence checking of quantum circuits is an important, yet barely automated, task in the development of the quantum software stack. Recently, new methods have been proposed that tackle this problem from widely different perspectives. However, there is no established baseline against which to judge current and future progress in equivalence checking of quantum circuits. To close this gap, we conduct a detailed case study of two of the most promising equivalence checking methodologies, one based on decision diagrams and one based on the ZX-calculus, and compare their strengths and weaknesses.
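
For contrast with the two methodologies studied, the most naive equivalence check builds each circuit's full unitary and compares the two up to a global phase. The sketch below does this for single-qubit gates in pure Python; its cost grows exponentially with the number of qubits, which is exactly what decision diagrams and the ZX-calculus aim to avoid. All helper names are hypothetical.

```python
import cmath
import math

Matrix = list[list[complex]]


def identity(n: int) -> Matrix:
    return [[1.0 + 0j if i == j else 0j for j in range(n)] for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def circuit_unitary(gates: list[Matrix]) -> Matrix:
    """Multiply gate matrices in application order (first gate acts first)."""
    u = identity(len(gates[0]))
    for g in gates:
        u = matmul(g, u)
    return u


def equivalent_up_to_global_phase(u: Matrix, v: Matrix, tol: float = 1e-9) -> bool:
    n = len(u)
    # Estimate the global phase from the first non-negligible entry of v.
    ref = next(((i, j) for i in range(n) for j in range(n) if abs(v[i][j]) > tol), None)
    if ref is None:
        return all(abs(u[i][j]) <= tol for i in range(n) for j in range(n))
    phase = u[ref[0]][ref[1]] / v[ref[0]][ref[1]]
    if abs(abs(phase) - 1.0) > tol:  # a global phase must have unit magnitude
        return False
    return all(abs(u[i][j] - phase * v[i][j]) <= tol
               for i in range(n) for j in range(n))


s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]

print(equivalent_up_to_global_phase(circuit_unitary([H, Z, H]), X))
print(equivalent_up_to_global_phase(circuit_unitary([H, H]), X))
```

Here H·Z·H is confirmed equivalent to X, while H·H (the identity) is not.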

Citations: 4

Solving traveling salesman problems via a parallel fully connected Ising machine

Proceedings of the 59th ACM/IEEE Design Automation Conference

Pub Date : 2022-07-10 DOI: 10.1145/3489517.3530595

Qichao Tao, Jie Han

Annealing-based Ising machines have shown promising results in solving combinatorial optimization problems. However, traveling salesman problems (TSPs), a typical class of these problems, are very challenging to solve because of the constraints imposed on the solution. This article proposes a parallel annealing algorithm for a fully connected Ising machine that significantly improves accuracy and performance in solving constrained combinatorial optimization problems such as the TSP. Unlike previous parallel annealing algorithms, this improved parallel annealing (IPA) algorithm efficiently solves TSPs using an exponential temperature function with a dynamic offset. Compared with digital annealing (DA) and momentum annealing (MA), IPA reduces the run time for a 14-city TSP by factors of 44.4 and 19.9, respectively. Large-scale TSPs can be solved more efficiently with a k-medoids clustering approach, which decreases the average travel distance of a 22-city TSP by 51.8% compared with DA and by 42.0% compared with MA. This approach groups neighboring cities into clusters to form a reduced TSP, which is then solved hierarchically using the IPA algorithm.
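
One common way to hand a TSP to an Ising machine is a penalty (QUBO) formulation over binary variables x[c][t], meaning city c is visited at position t; the one-city-per-position and one-position-per-city constraints become squared penalty terms, which is where the difficulty noted above comes from. The sketch below only evaluates that energy. The penalty weight is an assumed parameter, the paper does not state that it uses this exact encoding, and the IPA annealing schedule is not modeled. Binary variables map to Ising spins via s = 2x - 1.

```python
def tsp_qubo_energy(x, dist, penalty, weight=1.0):
    """Energy of a 0/1 assignment x[c][t] under a penalty TSP formulation.

    dist: symmetric n-by-n distance matrix; the tour is cyclic.
    penalty: weight of the constraint terms (an assumed tuning parameter).
    """
    n = len(dist)
    # Each city must occupy exactly one position.
    city_term = sum((1 - sum(x[c][t] for t in range(n))) ** 2 for c in range(n))
    # Each position must hold exactly one city.
    pos_term = sum((1 - sum(x[c][t] for c in range(n))) ** 2 for t in range(n))
    # Travel cost between consecutive positions (wrapping to the start).
    tour = sum(
        dist[u][v] * x[u][t] * x[v][(t + 1) % n]
        for t in range(n) for u in range(n) for v in range(n) if u != v
    )
    return penalty * (city_term + pos_term) + weight * tour


def assignment_from_order(order):
    """Build x[c][t] from a visiting order (order[t] is the city at position t)."""
    n = len(order)
    return [[1 if order[t] == c else 0 for t in range(n)] for c in range(n)]


dist = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
print(tsp_qubo_energy(assignment_from_order([0, 1, 2]), dist, penalty=10))
```

A valid tour scores exactly its length; an assignment that places city 0 at two positions pays 2 x penalty on top of its partial tour cost, so a large enough penalty makes every valid tour lower in energy than any invalid one.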

Citations: 6

A scalable symbolic simulation tool for low power embedded systems

Proceedings of the 59th ACM/IEEE Design Automation Conference

Pub Date : 2022-07-10 DOI: 10.1145/3489517.3530433

Subhash Sethumurugan, Shashank Hegde, Hari Cherupalli, J. Sartori

Recent work has demonstrated the effectiveness of symbolic simulation for hardware-software co-analysis of an application-processor pair, and has developed a variety of hardware and software design techniques and optimizations, ranging from system security guarantees to automated generation of application-specific bespoke processors. Despite their potential benefits, current state-of-the-art symbolic simulation tools for hardware-software co-analysis have limited applicability, since prior work relies on the costly process of building a custom simulation tool for each processor design to be simulated. Furthermore, prior work does not describe how to extend the symbolic analysis technique to other processor designs. To generalize the technique to any processor design, we propose a custom symbolic simulator that uses iverilog, an open-source synthesis and simulation tool, to perform symbolic behavioral simulation, yielding a design-agnostic symbolic simulation tool for hardware-software co-analysis. To demonstrate the generality of our tool, we apply symbolic analysis to three embedded processors with different ISAs: bm32 (a MIPS-based processor), darkRiscV (a RISC-V-based processor), and openMSP430 (based on MSP430). We use the analysis results to generate a bespoke processor for each design and observe gate count reductions of 27%, 16%, and 56%, respectively. Our results demonstrate the versatility of our simulation tool and the uniqueness of each design with respect to symbolic analysis and the bespoke methodology.
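
The core idea behind bespoke-processor analysis can be illustrated with three-valued logic: inputs the application leaves unknown are simulated as X, and gates whose outputs stay constant are candidates for removal. This minimal sketch evaluates a hypothetical gate-level netlist once; it is not iverilog-based, and a real co-analysis must cover every reachable cycle and processor state, not a single evaluation.

```python
X = "X"  # unknown (symbolic) value


def t_not(a):
    return X if a == X else 1 - a


def t_and(a, b):
    if a == 0 or b == 0:   # a controlling 0 forces the output
        return 0
    if a == X or b == X:
        return X
    return 1


def t_or(a, b):
    if a == 1 or b == 1:   # a controlling 1 forces the output
        return 1
    if a == X or b == X:
        return X
    return 0


OPS = {"AND": t_and, "OR": t_or, "NOT": t_not}


def simulate(netlist, inputs):
    """Evaluate a topologically ordered netlist of (name, op, fanin_names)."""
    values = dict(inputs)
    for name, op, args in netlist:
        values[name] = OPS[op](*(values[a] for a in args))
    return values


def constant_gates(netlist, inputs):
    """Gates whose output is a known constant: candidates for pruning."""
    values = simulate(netlist, inputs)
    return sorted(name for name, _, _ in netlist if values[name] != X)


# Hypothetical case: the application never asserts 'en'; 'd' and 'e' are data.
netlist = [
    ("g1", "AND", ("en", "d")),
    ("g2", "OR", ("g1", "e")),
    ("g3", "NOT", ("en",)),
]
print(constant_gates(netlist, {"en": 0, "d": X, "e": X}))
```

Here g1 and g3 never toggle for this input pattern, while g2 still depends on the unknown input e and must be kept.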

Citations: 0

AL-PA: cross-device profiled side-channel attack using adversarial learning

Proceedings of the 59th ACM/IEEE Design Automation Conference

Pub Date : 2022-07-10 DOI: 10.1145/3489517.3530517

Pei Cao, Hongyi Zhang, Dawu Gu, Yan Lu, Yidong Yuan

In this paper, we focus on the portability issue in profiled side-channel attacks (SCAs) that arises from significant device-to-device variations. Device discrepancy is inevitable in realistic attacks, but it is often neglected in research. We identify such device variations and take a further step toward leveraging the transferability of neural networks. We propose a novel adversarial learning-based profiled attack (AL-PA), which enables our neural network to learn device-invariant features. We evaluated our strategy on eight XMEGA microcontrollers. Without requiring target-specific preprocessing or multiple profiling devices, our approach outperforms the state-of-the-art methods.

Citations: 5

Beyond local optimality of buffer and splitter insertion for AQFP circuits

Proceedings of the 59th ACM/IEEE Design Automation Conference

Pub Date : 2022-07-10 DOI: 10.1145/3489517.3530661

Siang-Yun Lee, Heinz Riener, G. De Micheli

Adiabatic quantum-flux parametron (AQFP) is an energy-efficient superconducting technology. Buffer and splitter (B/S) cells must be inserted into an AQFP circuit to meet the technology-imposed constraints on path balancing and fanout branching. These cells account for a significant share of the circuit's area and delay. In this paper, we observe that B/S insertion is a scheduling problem and propose (a) a linear-time algorithm for locally optimal B/S insertion subject to a given schedule; (b) an SMT formulation to find the global optimum; and (c) an efficient heuristic for global B/S optimization. Experimental results show a 4% reduction in B/S cost and a 124x speed-up compared to the state-of-the-art algorithm, as well as the capability to scale to benchmarks an order of magnitude larger.
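
To illustrate why B/S insertion is tied to scheduling, the sketch below assigns each gate an ASAP level and counts the buffers needed so that every edge spans exactly one clock phase and all primary outputs reach the same level. It deliberately ignores splitters and buffer sharing between fanouts, which is where the paper's local and global optimizations operate, so it counts buffers for one fixed schedule only, not the paper's optimized B/S cost. The netlist and function names are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def asap_levels(fanins):
    """ASAP level per node; fanins maps node -> list of fanin nodes (PIs map to [])."""
    levels = {}
    for node in TopologicalSorter(fanins).static_order():
        preds = fanins.get(node, [])
        levels[node] = 1 + max(levels[p] for p in preds) if preds else 0
    return levels


def count_balancing_buffers(fanins, outputs):
    """Buffers needed under the ASAP schedule, one private chain per edge."""
    levels = asap_levels(fanins)
    buffers = 0
    for v, preds in fanins.items():
        for u in preds:
            # An edge spanning d levels needs d - 1 buffers in between.
            buffers += levels[v] - levels[u] - 1
    depth = max(levels[o] for o in outputs)
    for o in outputs:
        buffers += depth - levels[o]  # align all outputs to the deepest level
    return buffers


# Hypothetical netlist: g1 = f(a, b), g2 = f(g1, c); outputs g1 and g2.
fanins = {"a": [], "b": [], "c": [], "g1": ["a", "b"], "g2": ["g1", "c"]}
print(asap_levels(fanins))
print(count_balancing_buffers(fanins, outputs=["g2", "g1"]))
```

Here input c must be delayed by one buffer to meet g2, and output g1 needs one buffer to line up with g2. Choosing a different schedule (for example, moving a gate later) changes these counts, which is the degree of freedom the paper's algorithms exploit.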

Citations: 6
