Multi-FPGA systems are widely used in circuit design areas such as hardware emulation, virtual prototypes, and chiplet design methodologies. However, inter-FPGA signals often outnumber the available I/O pins, and this physical resource conflict can become a bottleneck. Time-division multiplexing (TDM) was introduced to solve this problem, but the delay it adds may impair system performance, so a more efficient TDM solution is needed. In this work, we propose a new routing sequence strategy to improve the efficiency of TDM. Our strategy consists of two parts: a weighted routing algorithm and TDM assignment optimization. The routing algorithm takes the weight of each net into account to generate a high-quality routing topology. Then, a net-based TDM assignment is performed to obtain a lower TDM ratio for the multi-FPGA system. Experiments on the public dataset of the CAD Contest 2019 at ICCAD show that our routing sequence strategy achieves good results. In particular, for test cases with unbalanced designs, the performance of multi-FPGA systems was improved up to 2.63. Moreover, we outperformed the top two contest finalists in TDM results on most of the test cases.
{"title":"Sequential Routing-Based Time-Division Multiplexing Optimization for Multi-FPGA Systems","authors":"Wenxiong Lin, Haojie Wu, Peng Gao, Wenjun Luo, Shuting Cai, Xiaoming Xiong","doi":"10.1145/3626322","DOIUrl":"https://doi.org/10.1145/3626322","url":null,"PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135482532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Thanks to the enhanced computational capacity of modern computers, even sophisticated analog/RF circuit sizing problems can be solved with electronic design automation (EDA) tools. Recently, several analog/RF circuit optimization algorithms have been successfully applied to automate the analog/RF circuit design process. Conventionally, metaheuristic algorithms are widely used in the optimization process. Among the various nature-inspired algorithms, evolutionary algorithms (EAs) have been preferred for their advantages (robustness, efficiency, accuracy, etc.) over the other algorithms. Furthermore, EAs have diversified, and several distinguished analog/RF circuit optimization approaches for single-, multi-, and many-objective problems have been reported in the literature. However, there are conflicting claims about the performance of these algorithms, and no objective performance comparison has been reported yet. In previous work, only a few case-study circuits were tested to demonstrate the superiority of the chosen algorithm, so comparisons were limited to those specific circuits. The underlying reason is that the literature lacks a generic benchmark for the analog/RF circuit sizing problem. To address these issues, in this paper we present a comprehensive comparison of the two most popular evolutionary computation algorithms, namely the Non-dominated Sorting Genetic Algorithm II (NSGA-II) and the Multi-Objective Evolutionary Algorithm based on Decomposition (MOEA/D). For that purpose, we introduce two ad hoc testbenches for analog (ANLG) and radio frequency (RF) circuits that include the common building blocks. The comparison is made in both the multi- and many-objective domains, and the performance of the algorithms is quantified through well-known Pareto-optimal front quality metrics.
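The Pareto-front quality metrics mentioned in this abstract all rest on the notion of Pareto dominance. As a minimal sketch, assuming every objective is minimized, the following shows the dominance test and a brute-force extraction of the non-dominated set. The function names `dominates` and `non_dominated` are hypothetical illustrations and are not taken from the paper or from any NSGA-II or MOEA/D implementation.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is no worse than b in every objective and
    strictly better in at least one (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep the points that no other point dominates (O(n^2) scan)."""
    return [
        p for p in points
        if not any(dominates(q, p) for q in points if q is not p)
    ]


if __name__ == "__main__":
    # Illustrative objective vectors, e.g. (power, area); values are made up.
    candidates = [(1, 5), (2, 2), (3, 3), (5, 1)]
    print(non_dominated(candidates))
```

In this example (3, 3) is dropped because (2, 2) is better in both objectives, while the remaining three points trade one objective against the other.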
{"title":"MOEA/D vs. NSGA-II: A Comprehensive Comparison for Multi/Many Objective Analog/RF Circuit Optimization Through A Generic Benchmark","authors":"Enes Sağlıcan, Engin Afacan","doi":"10.1145/3626096","DOIUrl":"https://doi.org/10.1145/3626096","url":null,"PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135385291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bo Ding, Jinglei Huang, Junpeng Wang, Qi Xu, Song Chen, Yi Kang
Some field-programmable gate arrays (FPGAs) support partial dynamic reconfiguration and have heterogeneous resources distributed across the chip. FPGA-based partially dynamically reconfigurable systems (FPGA-PDRS) can be used to accelerate computing and improve computing flexibility. However, FPGA-PDRS are traditionally designed by hand. Automating their design requires solving the partitioning, scheduling, and floorplanning of task modules on heterogeneous resources. Existing works either solve only part of this automation problem or model the FPGA-PDRS resources as homogeneous. To better solve the problems in the automation process of FPGA-PDRS and narrow the gap between algorithm and application, in this paper we propose a complete workflow with three parts: pre-processing, which generates lists of candidate task module shapes according to the resource requirements; an exploration process, which searches for a solution to task module partitioning, scheduling, and floorplanning; and post-optimization, which improves the floorplan success rate. Experimental results show that, compared with state-of-the-art work, the pre-processing step reduces the occupied area of task modules by 6% on average, and the complete workflow improves performance by 9.6% and reduces communication cost by 14.2% while improving the reuse rate of the heterogeneous resources on the chip. Based on the solution generated by the exploration process, the post-optimization step improves the floorplan success rate by 11%.
{"title":"Task modules Partitioning, Scheduling and Floorplanning for Partially Dynamically Reconfigurable Systems with Heterogeneous Resources","authors":"Bo Ding, Jinglei Huang, Junpeng Wang, Qi Xu, Song Chen, Yi Kang","doi":"10.1145/3625295","DOIUrl":"https://doi.org/10.1145/3625295","url":null,"PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134958022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Graphics Processing Units (GPUs) are widely used as deep learning accelerators because of their high performance and low power consumption. Additionally, they remain secure against hardware-induced transient fault injection attacks, a classic type of attack that has been developed on other computing platforms. In this work, we demonstrate that well-trained machine learning models are robust against hardware fault injection attacks when the faults are generated randomly. However, we discover that these models have components, which we refer to as sensitive targets, that are vulnerable to faults. By exploiting this vulnerability, we propose the Lightning attack, which precisely strikes the model’s sensitive targets with hardware-induced transient faults based on Dynamic Voltage and Frequency Scaling (DVFS). We design a sensitive-target search algorithm to find the processing units of Deep Neural Network (DNN) models that are most critical in determining the inference results, and we develop a genetic algorithm to automatically optimize the DVFS attack parameters used to induce faults. Experiments on three commodity Nvidia GPUs with four widely used DNN models show that the proposed Lightning attack can reduce the inference accuracy by 69.1% on average for non-targeted attacks and, more interestingly, achieve a success rate of 67.9% for targeted attacks.
{"title":"Lightning: Leveraging DVFS-induced Transient Fault Injection to Attack Deep Learning Accelerator of GPUs","authors":"Rihui sun, Pengfei Qiu, Yongqiang Lyu, Jian Dong, Haixia Wang, Dongsheng Wang, Gang Qu","doi":"10.1145/3617893","DOIUrl":"https://doi.org/10.1145/3617893","url":null,"PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136308878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shanglin Zhou, Mikhail A. Bragin, Deniz Gurevin, Lynn Pepin, Fei Miao, Caiwen Ding
Network pruning is a widely used technique to reduce computation cost and model size for deep neural networks. However, the typical three-stage pipeline (i.e., training, pruning, and retraining (fine-tuning)) significantly increases the overall training time. In this article, we develop a systematic weight-pruning optimization approach based on surrogate Lagrangian relaxation (SLR), which is tailored to overcome difficulties caused by the discrete nature of the weight-pruning problem. We further prove that our method ensures fast convergence of the model compression problem, and that the convergence of SLR is accelerated by using quadratic penalties. Model parameters obtained by SLR during the training phase are much closer to their optimal values than those obtained by other state-of-the-art methods. We evaluate our method on image classification tasks using CIFAR-10 and ImageNet with state-of-the-art multi-layer-perceptron-based networks such as MLP-Mixer; attention-based networks such as Swin Transformer; and convolutional-neural-network-based models such as VGG-16, ResNet-18, ResNet-50, ResNet-110, and MobileNetV2. We also evaluate object detection and segmentation tasks on COCO, the KITTI benchmark, and the TuSimple lane detection dataset using a variety of models. Experimental results demonstrate that our SLR-based weight-pruning optimization approach achieves a higher compression rate than state-of-the-art methods under the same accuracy requirement, and higher accuracy under the same compression rate requirement. On classification tasks, our SLR approach converges to the desired accuracy × faster on both datasets. On object detection and segmentation tasks, SLR also converges 2× faster to the desired accuracy. Further, our SLR achieves high model accuracy even at the hard-pruning stage without retraining, which reduces the traditional three-stage pruning pipeline to a two-stage process. 
Given a limited budget of retraining epochs, our approach quickly recovers the model’s accuracy.
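To make the "hard-pruning stage" mentioned above concrete, here is a minimal sketch of one common way to hard-prune a weight vector: zero out the smallest-magnitude entries until a target sparsity is reached. This is plain magnitude pruning shown for illustration only; it is not the SLR optimization, and whether the article hard-prunes in exactly this way is an assumption. The function name `hard_prune` and its parameters are hypothetical.

```python
from typing import List


def hard_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the fraction `sparsity` of weights with smallest magnitude.

    Assumption: pruning is unstructured and applied to a flat weight list.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must lie in [0, 1]")
    num_to_drop = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    dropped = set(order[:num_to_drop])
    return [0.0 if i in dropped else w for i, w in enumerate(weights)]


if __name__ == "__main__":
    layer = [0.5, -0.1, 0.03, -0.8]  # made-up weights
    print(hard_prune(layer, sparsity=0.5))
```

The closer the trained weights already are to a sparse solution, the less accuracy such a projection step costs, which is the property the abstract attributes to SLR training.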
{"title":"Surrogate Lagrangian Relaxation: A Path To Retrain-free Deep Neural Network Pruning","authors":"Shanglin Zhou, Mikhail A. Bragin, Deniz Gurevin, Lynn Pepin, Fei Miao, Caiwen Ding","doi":"10.1145/3624476","DOIUrl":"https://doi.org/10.1145/3624476","url":null,"PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135063048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Iris Hru Jiang, David Chinnery, Gracieli Posser, Jens Lienig
equation and proximal group alternating direction method of multipliers (ADMM). A fast computation of 3D Poisson’s equation and a parameter updating scheme are presented to accelerate the convergence of the optimization problem. “A Fast Optimal Double Row Legalization Algorithm,” by Hougardy et al., improves the legalization step in standard-cell placement by minimizing cell displacement for both single-row and double-row height cells, assuming a fixed left-to-right ordering within each row. In doing so, the authors do not artificially bound the maximum cell movement and can guarantee to find an optimal solution with minimum cell displacement.
{"title":"Introduction to the Special Section on Advances in Physical Design Automation","authors":"Iris Hru Jiang, David Chinnery, Gracieli Posser, Jens Lienig","doi":"10.1145/3604593","DOIUrl":"https://doi.org/10.1145/3604593","url":null,"PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136192256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In Placement Legalization, it is often assumed that (almost) all standard cells have the same height and can therefore be aligned in cell rows, which can then be treated independently. However, this is no longer true for recent technologies, where a substantial number of cells of double- or even arbitrary multiple-row height is to be expected. Due to interdependencies between the cell placements within several rows, the legalization task becomes considerably harder. In this article, we show how to optimize squared cell movement for pairs of adjacent rows comprising cells of single- as well as double-row height with a fixed left-to-right ordering in time 𝒪(n · log n), where n denotes the number of cells involved. In contrast to prior work, we do not artificially bound the maximum cell movement, and we can guarantee to find an optimum solution. Our approach also allows us to include gridding and movebound constraints for the cells. Experimental results show an average percentage decrease of over 26% in the total squared movement when compared to a legalization approach that fixes cells of more than single-row height after Global Placement.
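The objective described above, minimum total squared movement under a fixed left-to-right ordering, can be illustrated on a much simpler case than the one the article solves. The sketch below handles a single row of single-height cells with no row boundaries, gridding, or movebounds. Under those assumptions, subtracting each cell's prefix width turns the non-overlap constraints into a monotonicity constraint, and the problem becomes an isotonic regression that the pool-adjacent-violators method solves. This is not the authors' double-row algorithm, and the function name `legalize_row` is hypothetical.

```python
from typing import List


def legalize_row(targets: List[float], widths: List[float]) -> List[float]:
    """Left edges x_i minimizing sum((x_i - targets[i])**2) subject to
    x_{i+1} >= x_i + widths[i], with cells kept in the given order."""
    # Prefix widths: offsets[i] is the total width of the cells left of cell i.
    offsets, acc = [], 0.0
    for w in widths:
        offsets.append(acc)
        acc += w
    # With y_i = x_i - offsets[i], the constraints become y_1 <= y_2 <= ...
    shifted = [t - o for t, o in zip(targets, offsets)]

    # Pool adjacent violators: each block stores [sum, count].
    blocks: List[List[float]] = []
    for value in shifted:
        blocks.append([value, 1])
        # Merge while the previous block's mean exceeds the last block's mean.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c

    y: List[float] = []
    for s, c in blocks:
        y.extend([s / c] * int(c))
    return [yi + o for yi, o in zip(y, offsets)]


if __name__ == "__main__":
    # Two overlapping cells and one distant cell; all values are made up.
    print(legalize_row(targets=[0.0, 1.0, 10.0], widths=[2.0, 2.0, 2.0]))
```

In the example the first two cells overlap at their target positions, so they are merged into one cluster and shifted symmetrically, while the third cell stays at its target.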
{"title":"A Fast Optimal Double-row Legalization Algorithm","authors":"Stefan Hougardy, Meike Neuwohner, Ulrike Schorr","doi":"10.1145/3579844","DOIUrl":"https://doi.org/10.1145/3579844","url":null,"PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136298928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sample preparation is an inherent procedure in many biochemical applications, and digital microfluidic biochips (DMBs) have proved very effective at performing it. In a single mixing step, conventional DMBs can mix two droplets only in a 1:1 ratio. Due to this limitation, DMBs suffer from heavy fluid wastage and often require many mixing steps. However, next-generation DMBs, i.e., micro-electrode-dot-array (MEDA) biochips, can realize multiple mixing ratios, which in general helps to minimize the number of mixing operations. In this paper, we present a heuristic-based sample preparation algorithm, specifically a mixing algorithm called Division by Factor Method for MEDA, that exploits the mixing models of MEDA biochips. We propose another mixing algorithm for MEDA biochips, called Single Target Waste Minimization (STWM), which minimizes the wastage of fluids and determines an efficient mixing graph. We also propose an advanced methodology for multiple-target reagent mixing problems, called Multi-Target Waste Minimization (MTWM), which determines efficient mixing graphs for different target ratios by maximizing the sharing of fluids and minimizing fluid wastage. Simulation results suggest that the proposed STWM and MTWM methods outperform state-of-the-art methods in minimizing fluid wastage, reducing the total usage of reagent fluids, and minimizing the number of mixing operations.
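The limitation of 1:1 mixing noted above can be seen with a small calculation. The sketch below assumes a two-fluid dilution: a sample at concentration 1 and a buffer at concentration 0, where every step mixes the current droplet 1:1 with one fresh unit droplet of sample or buffer. Scanning the bits of k from least to most significant then reaches a target concentration of k / 2^d in d mixing steps. This is a simple textbook-style scheme for conventional DMBs, shown only for illustration; it is not the Division by Factor, STWM, or MTWM method, and the function name `one_to_one_mix_plan` is hypothetical.

```python
from fractions import Fraction
from typing import List


def one_to_one_mix_plan(k: int, d: int) -> List[Fraction]:
    """Concentration after each 1:1 mix needed to reach k / 2**d.

    Bit i of k (least significant first) selects the fresh droplet for
    step i: 1 means pure sample, 0 means pure buffer.
    """
    if not 0 < k < 2 ** d:
        raise ValueError("target must satisfy 0 < k < 2**d")
    concentration = Fraction(0)
    history = []
    for i in range(d):
        fresh = (k >> i) & 1
        # A 1:1 mix of two unit droplets averages their concentrations.
        concentration = (concentration + fresh) / 2
        history.append(concentration)
    return history


if __name__ == "__main__":
    # Target 5/8: three sequential mixes are needed with a 1:1 mixer.
    print(one_to_one_mix_plan(5, 3))
```

Each additional bit of precision costs another mixing step in this scheme, which is the kind of overhead that the multiple mixing ratios of MEDA biochips are meant to reduce.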
{"title":"Multi-Target Fluid Mixing in MEDA Biochips: Theory and an Attempt towards Waste Minimization","authors":"Debraj Kundu, Sudip Roy","doi":"10.1145/3622785","DOIUrl":"https://doi.org/10.1145/3622785","url":null,"PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2023-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48611534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The discrete Hartley transform is a core component of digital signal processing because of its fast computing speed and low power consumption. Traditional FPGA-based implementations suffer from high latency, which cannot meet the needs of energy-efficient computing in the Internet of Things era. Therefore, a programmable analog in-memory computing circuit is proposed to accelerate fast Hartley transform (FHT) and inverse fast Hartley transform (IFHT) calculations through large-scale one-step matrix computation. By adjusting the memristor weights, FHT calculations of different scales can be achieved. PSPICE simulation results show that the average accuracy of the proposed circuit can reach 99.9% and that its computation time can reach the level of 0.1 μs. The robustness analysis shows that the circuit can tolerate a certain degree of programming error and resistance tolerance. The designed analog circuit is applied to image compression, where the image compression accuracy can reach 99.9%.
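For reference, the discrete Hartley transform of a length-N real sequence is H[k] = Σ x[n]·cas(2πnk/N), where cas(θ) = cos(θ) + sin(θ), and the inverse is the same transform scaled by 1/N. Because the transform is a single real matrix-vector product, it maps naturally onto the one-step matrix computation that the abstract describes. The sketch below is a plain numerical reference in Python, not a model of the memristor circuit; the function names are hypothetical.

```python
import math
from typing import List


def dht_matrix(n: int) -> List[List[float]]:
    """N x N Hartley matrix with entries cas(2*pi*j*k/N) = cos + sin."""
    return [
        [math.cos(2 * math.pi * j * k / n) + math.sin(2 * math.pi * j * k / n)
         for j in range(n)]
        for k in range(n)
    ]


def dht(x: List[float]) -> List[float]:
    """Discrete Hartley transform as one matrix-vector product."""
    matrix = dht_matrix(len(x))
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]


def idht(spectrum: List[float]) -> List[float]:
    """Inverse transform: the DHT is its own inverse up to a factor 1/N."""
    n = len(spectrum)
    return [v / n for v in dht(spectrum)]


if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0, 4.0]  # made-up input
    spectrum = dht(signal)
    print(spectrum)
    print(idht(spectrum))  # recovers the input up to rounding error
```

The self-inverse property means one matrix of weights serves both the forward and the inverse transform, with only the output scaling changed.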
{"title":"Programmable In-memory Computing Circuit of Fast Hartley Transform","authors":"Q. Hong, Richeng Huang, Pin-an Xiao, Jun Yu Li, Jingru Sun, Jiliang Zhang","doi":"10.1145/3618112","DOIUrl":"https://doi.org/10.1145/3618112","url":null,"PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44030534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Embedded system applications often require guarantees regarding non-functional properties when executed on a given MPSoC platform. Examples of such requirements include real-time, energy, or safety properties of the corresponding programs. One option for enforcing such requirements is a reactive control loop, in which an enforcer decides, based on a system response (feedback), how to control the system, e.g., by adapting the number of cores allocated to a program or by scaling the voltage/frequency mode of the involved processors. Typically, a violation of a requirement must either never happen (in the case of strict enforcement) or happen only temporarily (in the case of so-called loose enforcement). However, it is a challenge to design enforcers for which formal guarantees with respect to requirements can be given, especially in the presence of environmental input (workload) that typically varies widely per execution. Technically, an enforcement strategy can be formally modeled by a finite state machine (FSM), and the uncertain environment determining the workload by a discrete-time Markov chain. Previous work has shown that this formalization allows the formal verification of temporal properties (verification goals) regarding the fulfillment of requirements for a given enforcement strategy. In this paper, we consider the so far unsolved problem of design space exploration and automatic synthesis of enforcement automata that maximize a number of deterministic and probabilistic verification goals formulated on a given set of non-functional requirements. For the design space exploration (DSE), an approach based on multi-objective evolutionary algorithms is proposed in which enforcement automata are encoded as genes of states and state transition conditions. For each individual, the verification goals are evaluated using probabilistic model checking. At the end, the DSE returns a set of FSMs that are efficient in terms of their probabilities of meeting the given requirements. As experimental results, we present three use cases considering requirements on latency and energy consumption.
{"title":"Automatic Synthesis of FSMs for Enforcing Non-functional Requirements on MPSoCs using Multi-Objective Evolutionary Algorithms","authors":"Khalil Esper, S. Wildermann, J. Teich","doi":"10.1145/3617832","DOIUrl":"https://doi.org/10.1145/3617832","url":null,"abstract":"Embedded system applications often require guarantees regarding non-functional properties when executed on a given MPSoC platform. Examples of such requirements include real-time, energy or safety properties on corresponding programs. One option to implement the enforcement of such requirements is by a reactive control loop, where an enforcer decides based on a system response (feedback) how to control the system, e.g., by adapting the number of cores allocated to a program or by scaling the voltage/frequency mode of involved processors. Typically, a violation of a requirement must either never happen in case of strict enforcement, or only happen temporally (in case of so-called loose enforcement). However, it is a challenge to design enforcers for which it is possible to give formal guarantees with respect to requirements, especially in the presence of typically largely varying environmental input (workload) per execution. Technically, an enforcement strategy can be formally modeled by a finite state machine (FSM) and the uncertain environment determining the workload by a discrete-time Markov chain. It has been shown in previous work that this formalization allows the formal verification of temporal properties (verification goals) regarding the fulfillment of requirements for a given enforcement strategy. In this paper, we consider the so far unsolved problem of design space exploration and automatic synthesis of enforcement automata that maximize a number of deterministic and probabilistic verification goals formulated on a given set of non-functional requirements. 
For the design space exploration (DSE), an approach based on multi-objective evolutionary algorithms is proposed in which enforcement automata are encoded as genes of states and state transition conditions. For each individual, the verification goals are evaluated using probabilistic model checking. At the end, the DSE returns a set of efficient FSMs in terms of probabilities of meeting given requirements. As experimental results, we present three use cases while considering requirements on latency and energy consumption.","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2023-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47572110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
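The combination of an enforcer FSM with a Markov-chain workload model, as described in the abstract above, can be illustrated with a toy example. The sketch below is an assumption-laden illustration and not the authors' method: the workload levels, transition probabilities, latency bound, and core-scaling rule are all invented, and the hand-written distribution propagation merely stands in for a probabilistic model checker.

```python
# Hypothetical toy model: every name and number here is an illustrative assumption.
WORK = {"low": 2.0, "high": 6.0}  # work units per execution
P = {  # discrete-time Markov chain over workload levels
    "low": {"low": 0.8, "high": 0.2},
    "high": {"low": 0.4, "high": 0.6},
}
LATENCY_BOUND = 4.0  # requirement: latency must not exceed this value

def latency(workload, cores):
    """Idealized latency: work divided evenly across the allocated cores."""
    return WORK[workload] / cores

def enforcer_next(cores, observed_latency):
    """Enforcer FSM transition: state = allocated cores, input = observed latency."""
    if observed_latency > LATENCY_BOUND:
        return 2  # requirement violated: scale up
    if observed_latency < LATENCY_BOUND / 2:
        return 1  # ample slack: scale down
    return cores

def violation_probabilities(steps, start=("low", 1)):
    """Probability that the requirement is violated at step 0..steps."""
    dist = {start: 1.0}  # distribution over (workload, cores) product states
    result = []
    for _ in range(steps + 1):
        result.append(sum(p for (w, c), p in dist.items()
                          if latency(w, c) > LATENCY_BOUND))
        nxt = {}
        for (w, c), p in dist.items():
            c_next = enforcer_next(c, latency(w, c))
            for w_next, q in P[w].items():
                key = (w_next, c_next)
                nxt[key] = nxt.get(key, 0.0) + p * q
        dist = nxt
    return result

probs = violation_probabilities(2)
```

In this toy model, `probs` is approximately [0.0, 0.2, 0.16]: a violation occurs only when a high workload arrives while one core is allocated, and the enforcer reacts by scaling up in the next step. A design space exploration of the kind the abstract describes would vary the enforcer's states and transition conditions and compare such probabilities across candidates.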