Deep learning (DL) models are enabling a significant paradigm shift in a diverse range of fields, including natural language processing, computer vision, and the design and automation of complex integrated circuits. While deep models, and optimizations based on them such as deep reinforcement learning (RL), demonstrate superior performance and a great capability for automated representation learning, earlier works have revealed the vulnerability of DL models to various attacks, including adversarial samples, model poisoning, and fault injection. On the one hand, these security threats could divert the behavior of the DL model and lead to incorrect decisions in critical tasks. On the other hand, the susceptibility of DL to potential attacks might thwart trustworthy technology transfer as well as reliable DL deployment. In this work, we investigate existing defense techniques that protect DL against the above-mentioned security threats. In particular, we review end-to-end defense schemes for robust deep learning in both centralized and federated learning settings. Our comprehensive taxonomy and horizontal comparisons reveal the important fact that defense strategies developed using DL/software/hardware co-design outperform their DL/software-only counterparts, and show how co-design achieves efficient, latency-optimized defenses for real-world applications. We believe our systemization of knowledge sheds light on the promising performance of hardware-software co-design of DL security methodologies and can guide the development of future defenses.
{"title":"Systemization of Knowledge: Robust Deep Learning using Hardware-software co-design in Centralized and Federated Settings","authors":"Ruisi Zhang, Shehzeen Samarah Hussain, Huili Chen, Mojan Javaheripi, F. Koushanfar","doi":"10.1145/3616868","DOIUrl":"https://doi.org/10.1145/3616868","url":null,"abstract":"Deep learning (DL) models are enabling a significant paradigm shift in a diverse range of fields, including natural language processing, computer vision, as well as the design and automation of complex integrated circuits. While the deep models – and optimizations-based on them, e.g., Deep Reinforcement Learning (RL) – demonstrate a superior performance and a great capability for automated representation learning, earlier works have revealed the vulnerability of DLs to various attacks. The vulnerabilities include adversarial samples, model poisoning, and fault injection attacks. On the one hand, these security threats could divert the behavior of the DL model and lead to incorrect decisions in critical tasks. On the other hand, the susceptibility of DLs to potential attacks might thwart trustworthy technology transfer as well as reliable DL deployment. In this work, we investigate the existing defense techniques to protect DLs against the above-mentioned security threats. Particularly, we review end-to-end defense schemes for robust deep learning in both centralized and federated learning settings. Our comprehensive taxonomy and horizontal comparisons reveal an important fact that defense strategies developed using DL/software/hardware co-design outperform the DL/software-only counterparts and show how they can achieve very efficient and latency-optimized defenses for real-world applications. We believe our systemization of knowledge sheds light on the promising performance of hardware-software co-design of DL security methodologies and can guide the development of future defenses.","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43676970","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Debabrata Senapati, Kousik Rajesh, C. Karfa, A. Sarkar
To meet application-specific performance demands, recent embedded platforms often employ intricate micro-architectural designs and very small feature sizes, leading to complex multi-million-gate chips. Such ultra-high gate densities often make these chips susceptible to unacceptable surges in core temperature. Temperature surges above a specific threshold may throttle processor performance, increase cooling costs, and reduce processor life expectancy. This work proposes a generic temperature management strategy which can easily be employed to adapt existing state-of-the-art task graph schedulers so that the schedules they generate never violate stipulated thermal bounds. The overall temperature-aware task graph scheduling problem is first formally modeled as a constraint optimization formulation, whose solution is shown to be prohibitively expensive in terms of computational overhead. Based on insights obtained through the formal model, a fast and efficient heuristic algorithm called TMDS has been designed. Experimental evaluation over diverse test case scenarios shows that TMDS delivers lower schedule lengths than the temperature-aware versions of four prominent makespan-minimizing algorithms, namely HEFT, PEFT, PPTS, and PSLS. Additionally, a case study with an adaptive cruise controller in automotive systems exhibits the applicability of TMDS in real-world settings.
{"title":"TMDS: A Temperature-aware Makespan Minimizing DAG Scheduler for Heterogeneous Distributed Systems","authors":"Debabrata Senapati, Kousik Rajesh, C. Karfa, A. Sarkar","doi":"10.1145/3616869","DOIUrl":"https://doi.org/10.1145/3616869","url":null,"abstract":"To meet application-specific performance demands, recent embedded platforms often involve the use of intricate micro-architectural designs and very small feature sizes leading to complex chips with multi-million gates. Such ultra-high gate densities often make these chips susceptible to inappropriate surges in core temperatures. Temperature surges above a specific threshold may throttle processor performance, enhance cooling costs and reduce processor life expectancy. This work proposes a generic temperature management strategy which can be easily employed to adapt existing state-of-the-art task graph schedulers so that schedules generated by them never violate stipulated thermal bounds. The overall temperature-aware task graph scheduling problem has first been formally modeled as a constraint optimization formulation whose solution is shown to be prohibitively expensive in terms of computational overheads. Based on insights obtained through the formal model, a new fast and efficient heuristic algorithm called TMDS, has been designed. Experimental evaluation over diverse test case scenarios shows that TMDS is able to deliver lower schedule lengths compared to the temperature-aware versions of four prominent makespan minimizing algorithms, namely HEFT, PEFT, PPTS, PSLS. Additionally, a case study with an adaptive cruise controller in automotive systems has been included to exhibit the applicability of TMDS in real-world settings.","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47015221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yajing Chang, Yingjian Yan, Chunsheng Zhu, Yanjiang Liu
Post-quantum cryptography (PQC) has become the most promising approach to countering the threat that quantum computing poses to conventional public-key cryptographic schemes. Saber, a finalist in the third round of the PQC standardization procedure, presents an appealing option for embedded systems due to its high encryption efficiency and accessibility. However, side-channel attacks (SCAs) can easily reveal confidential information by analyzing physical leakage, and several works demonstrate that Saber is vulnerable to SCAs. In this work, a ciphertext comparison method for masking designs based on the bitslicing technique and a zero-test is proposed, which balances the trade-off between the performance and security of comparing two arrays. A mathematical description of the proposed ciphertext comparison method is provided, and its correctness and security are analyzed under the probe-isolating non-interference (PINI) notion. Moreover, a high-order masking approach based on the state of the art, including the hash functions, centered binomial sampling, masking conversions, and the proposed ciphertext comparison, is presented, using the bitslicing technique to improve throughput. As a proof of concept, the proposed implementation of Saber targets the ARM Cortex-M4. Performance results show that the run-time overhead factors of 1st-, 2nd-, and 3rd-order masking are 3.01x, 5.58x, and 8.68x, and the dynamic memory used is 17.4 kB, 24.0 kB, and 30.2 kB, respectively. The SCA-resilience evaluation illustrates that the first-order Test Vector Leakage Assessment (TVLA) reveals no exploitable leakage of the secret key with 100,000 traces.
{"title":"A High-Performance Masking Design Approach for Saber against High-order Side-channel Attack","authors":"Yajing Chang, Yingjian Yan, Chunsheng Zhu, Yanjiang Liu","doi":"10.1145/3611670","DOIUrl":"https://doi.org/10.1145/3611670","url":null,"abstract":"Post-quantum cryptography (PQC) has become the most promising cryptographic scheme against the threat of quantum computing to conventional public-key cryptographic schemes. Saber, as the finalist in the third round of the PQC standardization procedure, presents an appealing option for embedded systems due to its high encryption efficiency and accessibility. However, side-channel attack (SCA) can easily reveal confidential information by analyzing the physical manifestations, and several works demonstrate that Saber is vulnerable to SCAs. In this work, a ciphertext comparison method for masking design based on bitslicing technique and zerotest is proposed, which balances the trade-off between the performance and security of comparing two arrays. The mathematical description of the proposed ciphertext comparison method is provided, and its correctness and security metrics are analyzed under the concept of PINI. Moreover, a high-order masking approach based on the state-of-the-art, including the hash functions, centered binomial sampling, masking conversions, and proposed ciphertext comparison is presented, using the bitslicing technique to improve throughput. As a proof of concept, the proposed implementation of Saber is on the ARM Cortex-M4. The performance results show that the run-time overhead factor of 1st-, 2nd-, and 3rd-order masking is 3.01x, 5.58x, and 8.68x, and the dynamic memory used for 1st-, 2nd-, and 3rd-order masking is 17.4kB, 24.0kB, and 30.2kB, respectively. The SCA-resilience evaluation results illustrate that the first-order Test Vectors Leakage Assessment (TVLA) result fails to reveal the secret key with 100,000 traces.","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46682768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Muhtadi Choudhury, Minyan Gao, Avinash L. Varna, Elad Peer, Domenic Forte
Since finite state machines (FSMs) regulate the control flow in circuits, a computing system's security might be breached by attacking the FSM. Physical attacks are especially worrisome because they can bypass software countermeasures. For example, an attacker can gain illegal access to the sensitive states of an FSM through fault injection, leading to privilege escalation and/or information leakage. Laser fault injection (LFI) provides one of the most effective attack vectors by enabling adversaries to precisely flip the states of individual flip-flops. Although conventional error correction/detection methodologies have been employed to improve FSM resiliency, their substantial overhead makes them unattractive to circuit designers. In our prior work, a novel decision-diagram-based FSM encoding scheme called PATRON was proposed to resist LFI according to attack parameters, e.g., the number of simultaneous faults. Although PATRON bested traditional encodings while keeping overhead minimal, it produced numerous candidate FSM encodings, requiring exhaustive manual effort to select the optimal one. In this article, we automatically select an optimal candidate by enhancing PATRON with linear programming (LP). First, we exploit the proportionality between dynamic power dissipation and switching activity in digital CMOS circuits: our LP objective minimizes the number of FSM bit switches per transition, thereby lowering switching activity and hence total power consumption. Second, additional LP constraints, incorporated alongside the original PATRON rules, systematically enforce bidirectional flips of at least two state elements per FSM transition. This bestows protection against different types of fault injection, which we capture with a new unidirectionality metric. Enhanced PATRON (EP) achieves superior security at lower power consumption on average compared to PATRON, error coding, and traditional FSM encoding on five popular benchmarks.
{"title":"Enhanced PATRON: Fault Injection and Power-aware FSM Encoding Through Linear Programming","authors":"Muhtadi Choudhury, Minyan Gao, Avinash L. Varna, Elad Peer, Domenic Forte","doi":"10.1145/3611669","DOIUrl":"https://doi.org/10.1145/3611669","url":null,"abstract":"Since finite state machines (FSMs) regulate the control flow in circuits, a computing system’s security might be breached by attacking the FSM. Physical attacks are especially worrisome because they can bypass software countermeasures. For example, an attacker can gain illegal access to the sensitive states of an FSM through fault injection, leading to privilege escalation and/or information leakage. Laser fault injection (LFI) provides one of the most effective attack vectors by enabling adversaries to precisely overturn single flip-flops states. Although conventional error correction/detection methodologies have been employed to improve FSM resiliency, their substantial overhead makes them unattractive to circuit designers. In our prior work, a novel decision diagram-based FSM encoding scheme called PATRON was proposed to resist LFI according to attack parameters, e.g., number of simultaneous faults. Although PATRON bested traditional encodings keeping overhead minimum, it provided numerous candidates for FSM designs requiring exhaustive and manual effort to select one optimum candidate. In this article, we automatically select an optimum candidate by enhancing PATRON using linear programming (LP). First, we exploit the proportionality between dynamic power dissipation and switching activity in digital CMOS circuits. Thus, our LP objective minimizes the number of FSM bit switches per transition, for comparatively lower switching activity and hence total power consumption. Second, additional LP constraints along with incorporating the original PATRON rules, systematically enforce bidirectionality to at least two state elements per FSM transition. This bestows protection against different types of fault injection, which we capture with a new unidirectional metric. Enhanced PATRON (EP) achieves superior security at lower power consumption in average compared to PATRON, error-coding, and traditional FSM encoding on five popular benchmarks.","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45718877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A modified decoupled sense amplifier (MDSA) and a modified decoupled sense amplifier with an NMOS foot-switch (MDSANF) are proposed for improved sensing in differential SRAM under low-voltage operation at the 22 nm technology node. The MDSA and MDSANF both offer notable read-delay improvements over conventional voltage and current sense amplifiers. At an operating voltage of 0.8 V, the MDSA exhibited delay reductions of 28.6%, 41.79%, 37.74%, and 30.94% compared to the modified clamped sense amplifier (MCSA), double-tail sense amplifier (DTSA), modified hybrid sense amplifier (MHSA), and conventional latch-type sense amplifier (LSA), respectively. Similarly, the MDSANF demonstrated delay reductions of 26.13%, 39.78%, 35.58%, and 28.55% over the MCSA, DTSA, MHSA, and LSA, respectively. To validate the performance, the MDSA and MDSANF are evaluated under variations in delay and power consumption across supply voltages, process corners, input differential bit-line voltage (ΔVBL), bit-line capacitance (CBL), and the sizing of the decoupling transistors. Monte Carlo simulations were conducted to analyse the impact of threshold voltage variations on transistor mismatch, which leads to an increased occurrence of read failures and a decline in SRAM yield. A performance analysis of various voltage and current sense amplifiers is presented alongside the MDSA and MDSANF. Area is an important consideration when selecting a sensing scheme; accordingly, layouts of the MDSA and MDSANF were drawn conforming to the design rules, giving an estimated area of 0.297 μm² for the MDSA, whereas the MDSANF occupies 0.5192 μm².
{"title":"Modified Decoupled Sense Amplifier with Improved Sensing Speed for Low-Voltage Differential SRAM","authors":"Ayush, P. Mittal, Rajesh Rohilla","doi":"10.1145/3611672","DOIUrl":"https://doi.org/10.1145/3611672","url":null,"abstract":"A modified decoupled sense amplifier (MDSA) and modified decoupled sense amplifier with NMOS foot-switch is proposed for improved sensing in differential SRAM for low voltage operation at 22 nm technology node. The MDSA and MDSANF both offer notable improvements to read delay over conventional voltage and current sense amplifiers. At an operating voltage of 0.8 V, the MDSA exhibited a reduced delay of 28.6%, 41.79%, 37.74%, 30.94% compared to modified clamped sense amplifier (MCSA), double tail sense amplifier(DTSA), modified hybrid sense amplifier (MHSA) and conventional latch-type sense amplifier (LSA) respectively. Similarly, the MDSANF demonstrated a delay reduction of 26.13%, 39.78%, 35.58%, 28.55% over MCSA, DTSA, MHSA and LSA respectively. To validate the performance, the MDSA and MDSANF are evaluated using the variation in delay and power consumption across various supply voltages, process corners, input differential bit line voltage (ΔVBL), bit line capacitance CBL) and the sizing of decoupling transistors. Monte Carlo simulations were conducted to analyse the impact of voltage threshold variations on transistor mismatch which leads to an increased occurrence of read failures and a decline in SRAM yield. The performance analysis of various voltage and current sense amplifiers is presented along with MDSA and MDSANF. Area consideration for selection of sensing scheme is important and as such layout of MDSA and MDSANF was performed conforming to the design rules and estimated area for MDSA is 0.297 μm2 whereas MDSANF occupies 0.5192 μm2.","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43297669","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Over the past years, numerous studies have demonstrated the vulnerability of deep neural networks (DNNs) to small input noise that prevents correct classification. This motivated the formal analysis of DNNs to ensure they delineate acceptable behavior. However, when a DNN's behavior is unacceptable for the desired application, these qualitative approaches are ill-equipped to determine the precise degree to which the DNN behaves unacceptably. Towards this, we propose a novel quantitative DNN analysis framework, QuanDA, which not only checks whether the DNN delineates a certain behavior but also provides the estimated probability that the DNN delineates this particular behavior. Unlike the (few) available quantitative DNN analysis frameworks, QuanDA does not make any implicit assumptions about the probability distribution of the hidden nodes, which enables the framework to propagate close-to-real probability distributions of the hidden node values to each succeeding DNN layer. Furthermore, our framework leverages CUDA to parallelize the analysis, enabling a high-speed GPU implementation for fast analysis. The applicability of the framework is demonstrated using the ACAS Xu benchmark, providing reachability probability estimates for all network nodes. Moreover, this paper also discusses potential applications of QuanDA to the analysis of DNN safety properties.
{"title":"QuanDA: GPU Accelerated Quantitative Deep Neural Network Analysis","authors":"Mahum Naseer, Osman Hasan, Muhammad Shafique","doi":"10.1145/3611671","DOIUrl":"https://doi.org/10.1145/3611671","url":null,"abstract":"Over the past years, numerous studies demonstrated the vulnerability of deep neural networks (DNNs) to make correct classifications in the presence of small noise. This motivated the formal analysis of DNNs to ensure they delineate acceptable behavior. However, in case the DNN’s behavior is unacceptable for the desired application, these qualitative approaches are ill-equipped to determine the precise degree to which the DNN behaves unacceptably. Towards this, we propose a novel quantitative DNN analysis framework, QuanDA, which does not only check if the DNN delineates certain behavior, but also provides the estimated probability of the DNN to delineate this particular behavior. Unlike the (few) available quantitative DNN analysis frameworks, QuanDA does not use any implicit assumptions on the probability distribution of the hidden nodes, which enables the framework to propagate close to real probability distributions of the hidden node values to each proceeding DNN layer. Furthermore, our framework leverages CUDA to parallelize the analysis, enabling high-speed GPU implementation for fast analysis. The applicability of the framework is demonstrated using the ACAS Xu benchmark, to provide reachability probability estimates for all network nodes. Moreover, this paper also provides potential applications of QuanDA for the analysis of the DNN safety properties.","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":"1 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42124580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stylianos I. Venieris, J. Fernández-Marqués, Nicholas D. Lane
The unprecedented accuracy of convolutional neural networks (CNNs) across a broad range of AI tasks has led to their widespread deployment in mobile and embedded settings. In the pursuit of high-performance and energy-efficient inference, significant research effort has been invested in the design of FPGA-based CNN accelerators. In this context, single computation engines constitute a popular design approach that enables the deployment of diverse models without the overhead of fabric reconfiguration. Nevertheless, this flexibility often comes with significantly degraded performance on memory-bound layers and resource underutilisation due to the suboptimal mapping of certain layers on the engine's fixed configuration. In this work, we investigate the implications for CNN engine design of a class of models that introduce a pre-convolution stage to decompress the weights at run time; we refer to these approaches as on-the-fly. This paper presents unzipFPGA, a novel CNN inference system that counteracts the limitations of existing CNN engines. The proposed framework comprises a novel CNN hardware architecture with a weights generator module that enables on-chip, on-the-fly generation of weights, alleviating the negative impact of limited bandwidth on memory-bound layers. We further enhance unzipFPGA with an automated hardware-aware methodology that tailors the weights generation mechanism to the target CNN-device pair, leading to an improved accuracy-performance balance. Finally, we introduce an input-selective processing element (PE) design that balances the load between PEs in suboptimally mapped layers. Quantitative evaluation shows that the proposed framework yields hardware designs that achieve an average 2.57× performance efficiency gain over highly optimised GPU designs under the same power constraints and up to 3.94× higher performance density than a diverse range of state-of-the-art FPGA-based CNN accelerators.
{"title":"Mitigating Memory Wall Effects in CNN Engines with On-the-Fly Weights Generation","authors":"Stylianos I. Venieris, J. Fernández-Marqués, Nicholas D. Lane","doi":"10.1145/3611673","DOIUrl":"https://doi.org/10.1145/3611673","url":null,"abstract":"The unprecedented accuracy of convolutional neural networks (CNNs) across a broad range of AI tasks has led to their widespread deployment in mobile and embedded settings. In a pursuit for high-performance and energy-efficient inference, significant research effort has been invested in the design of FPGA-based CNN accelerators. In this context, single computation engines constitute a popular design approach that enables the deployment of diverse models without the overhead of fabric reconfiguration. Nevertheless, this flexibility often comes with significantly degraded performance on memory-bound layers and resource underutilisation due to the suboptimal mapping of certain layers on the engine’s fixed configuration. In this work, we investigate the implications in terms of CNN engine design for a class of models that introduce a pre-convolution stage to decompress the weights at run time. We refer to these approaches as on-the-fly. This paper presents unzipFPGA, a novel CNN inference system that counteracts the limitations of existing CNN engines. The proposed framework comprises a novel CNN hardware architecture that introduces a weights generator module that enables the on-chip on-the-fly generation of weights, alleviating the negative impact of limited bandwidth on memory-bound layers. We further enhance unzipFPGA with an automated hardware-aware methodology that tailors the weights generation mechanism to the target CNN-device pair, leading to an improved accuracy-performance balance. Finally, we introduce an input selective processing element (PE) design that balances the load between PEs in suboptimally mapped layers. Quantitative evaluation shows that the proposed framework yields hardware designs that achieve an average of 2.57 × performance efficiency gain over highly optimised GPU designs for the same power constraints and up to 3.94 × higher performance density over a diverse range of state-of-the-art FPGA-based CNN accelerators.","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42316070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
With the continuous shrinking of feature sizes, detection of lithography hotspots has been raised as one of the major concerns in Design for Manufacturability (DFM) of semiconductor processing. Hotspot detection, along with other DFM measures, trades turn-around time for IC manufacturing yield; thus, a simplified yet wide-coverage pattern definition is essential to the problem. Layout pattern clustering methods, which group geometrically similar layout clips into clusters, have been widely proposed to identify layout patterns efficiently. To minimize the cluster count for subsequent DFM processing, in this paper we propose a geometric-matching-based clip relocation technique to increase the opportunities for pattern clustering. In particular, we formulate the lower bound of the cluster count as a maximum-clique problem, and we prove that the clustering problem can be solved very efficiently from the maximum-clique result. Compared with the experimental results of state-of-the-art approaches on the ICCAD 2016 Contest benchmarks, the proposed method achieves optimal solutions for all benchmarks with very competitive run-time. To evaluate scalability, the ICCAD 2016 Contest benchmarks are extended and evaluated; experimental results on the extended benchmarks demonstrate that our method reduces the cluster number by 16.59% on average, while the run-time is 74.11% faster on large-scale benchmarks compared with previous works.
{"title":"A General Layout Pattern Clustering Using Geometric Matching Based Clip Relocation and Lower-Bound Aided Optimization","authors":"Xu He, Yao Wang, Zhiyong Fu, Yipei Wang, Yang Guo","doi":"10.1145/3610293","DOIUrl":"https://doi.org/10.1145/3610293","url":null,"abstract":"With the continuous shrinking of feature size, detection of lithography hotspots has been raised as one of the major concerns in Design-for-Manufacturability (DFM) of semiconductor processing. Hotspot detection, along with other DFM measures, trades off turn-around time for the yield of IC manufacturing, thus a simplified but wide-range-covered pattern definition is a key essential to the problem. Layout pattern clustering methods, which group geometrically similar layout clips into clusters, have been vastly proposed to identify layout patterns efficiently. To minimize the clustering number for subsequent DFM processing, in this paper, we propose a geometric-matching-based clip relocation technique to increase the opportunity of pattern clustering. Particularly, we formulate the lower-bound of the clustering number as a maximum-clique problem, and we have also proved that the clustering problem can be solved by the result of the maximum-clique very efficiently. Compared with the experimental results of the state-of-the-art approaches on ICCAD 2016 Contest benchmarks, the proposed method can achieve the optimal solutions for all benchmarks with very competitive run-time. To evaluate the scalability, the ICCAD 2016 Contest benchmarks are extended and evaluated. And experimental results on the extended benchmarks demonstrate that our method can reduce the cluster number by 16.59% on average, while the run-time is 74.11% faster on large-scale benchmarks compared with previous works.","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42946077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In modern systems-on-chip (SoCs), several hardware protocols are used for communication and interaction among different modules. These protocols are complex and need to be implemented correctly for correct operation of the SoC; therefore, protocol verification has received significant attention. However, this verification is often limited to checking high-level properties on a protocol specification or an implementation. Verifying these properties directly on an implementation faces scalability challenges due to its size and design complexity. Further, even after some high-level properties are verified, there is no guarantee that an implementation fully complies with a given specification, even if the same properties have also been checked on the specification. We address these challenges and gaps by adding a layer of component specifications, one for each component in the protocol implementation, and by specifying and verifying the interactions at the interfaces between each pair of communicating components. We use the recently proposed formal model termed the Instruction-Level Abstraction (ILA) as a component specification, which includes an interface specification for the interactions involved in composing different components. The use of ILA models as component specifications allows us to decompose the complete verification task into two sub-tasks: checking that the composition of ILAs is sequentially equivalent to a verified formal protocol specification, and checking that the protocol implementation is a refinement of the ILA composition. The latter check requires that each component implementation is a refinement of its ILA specification, and it includes interface checks guaranteeing that components interact with each other as specified. We have applied the proposed ILA-based methodology for protocol verification to several third-party design case studies, including an AXI on-chip communication protocol, an off-chip communication protocol, and a cache coherence protocol. For each system, we successfully detected bugs in the implementation, and we show that full formal verification can be completed in reasonable time and effort.
{"title":"SoC Protocol Implementation Verification Using Instruction-Level Abstraction (ILA) Specifications","authors":"Huaixi Lu, Yue Xing, Aarti Gupta, S. Malik","doi":"10.1145/3610292","DOIUrl":"https://doi.org/10.1145/3610292","url":null,"abstract":"In modern systems-on-chips (SoCs) several hardware protocols are used for communication and interaction among different modules. These protocols are complex and need to be implemented correctly for correct operation of the SoC. Therefore, protocol verification has received significant attention. However, this verification is often limited to checking high-level properties on a protocol specification or an implementation. Verifying these properties directly on an implementation faces scalability challenges due to its size and design complexity. Further, even after some high-level properties are verified, there is no guarantee that an implementation fully complies with a given specification, even if the same properties have also been checked on the specification. We address these challenges and gaps by adding a layer of component specifications, one for each component in the protocol implementation, and specifying and verifying the interactions at the interfaces between each pair of communicating components. We use the recently proposed formal model termed the Instruction-Level-Abstraction (ILA) as a component specification, which includes an interface specification for the interactions in composing different components. The use of ILA models as component specifications allows us to decompose the complete verification task into two sub-tasks – checking that the composition of ILAs is sequentially equivalent to a verified formal protocol specification, and checking that the protocol implementation is a refinement of the ILA composition. This check requires that each component implementation is a refinement of its ILA specification, and includes interface checks guaranteeing that components interact with each other as specified. We have applied the proposed ILA-based methodology for protocol verification to several third-party design case studies. These include an AXI on-chip communication protocol, an off-chip communication protocol, and a cache coherence protocol. For each system, we successfully detected bugs in the implementation, and show that the full formal verification can be completed in reasonable time and effort.","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49585589","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Qingsong Peng, Jingchang Bian, Zhengfeng Huang, Senling Wang, Aibin Yan
True random number generators (TRNGs), an important component of security systems, have received considerable research attention. Previous research has provided a large number of TRNG solutions; however, they still fail to reach a good trade-off across the various performance metrics. This paper presents a shift-register metastability-based TRNG, implemented with compact reference units and comparison units. By forcing the D flip-flops in the shift registers into the metastable state, it alleviates the excessive hardware consumption of conventional metastability-based entropy sources. A new method of metastable randomness extraction is used to reduce the bias of the metastable output. The proposed TRNG is implemented on Xilinx Spartan-6 and Virtex-6 FPGAs, where it generates random sequences that pass the NIST SP800-22 and NIST SP800-90B tests and shows excellent robustness to voltage and temperature variations. The TRNG consumes only 3 FPGA slices yet achieves a high throughput of 25 Mbit/s. In comparison with state-of-the-art FPGA-compatible TRNGs, the proposed TRNG achieves the highest figure of merit (FoM), meaning that it significantly outperforms previous work in the trade-off among hardware resources, throughput, and operating frequency.
{"title":"A Compact TRNG design for FPGA based on the Metastability of RO-Driven Shift Registers","authors":"Qingsong Peng, Jingchang Bian, Zhengfeng Huang, Senling Wang, Aibin Yan","doi":"10.1145/3610295","DOIUrl":"https://doi.org/10.1145/3610295","url":null,"abstract":"True random number generators (TRNGs) as an important component of security systems have received a lot of attention for their related research. The previous researches have provided a large number of TRNG solutions, however, they still failed to reach an excellent trade-off in various performance metrics. This paper presents a shift-registers metastability-based TRNG, which is implemented by compact reference units and comparison units. By forcing the D flip-flops in the shift-registers into the metastable state, it optimizes the problem that the conventional metastability entropy sources consume excessive hardware resources. And new method of metastable randomness extraction is used to reduce the bias of metastable output. The proposed TRNG is implemented in Xilinx Spartan-6 and Virtex-6 FPGAs, which generate random sequences that pass the NIST SP800-22, NIST SP800-90B tests and show excellent robustness to voltage and temperature variations. This TRNG can consume only 3 slices of the FPGA, but it has a high throughput rate of 25Mbit/s. In comparison with state-of-the-art FPGA-compatible TRNGs, the proposed TRNG achieves the highest figure of merit FOM, which means that the proposed TRNG significantly outperforms previous researches in terms of hardware resources, throughput rate, and operating frequency trade-offs.","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42235926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}