Design space exploration (DSE) is a key activity in embedded design processes, where a mapping between applications and platforms that meets the process design requirements must be found. Finding such mappings is very challenging due to the complexity of modern embedded platforms and applications. DSE tools aid in this challenge by potentially covering sections of the design space that could be unintuitive to designers, leading to more optimised designs. Despite this potential benefit, DSE tools remain relatively niche in the embedded industry. A significant obstacle hindering their wider adoption is integrating such tools into embedded design processes.
We present two contributions that address this integration issue. First, we present the design space identification (DSI) approach for systematically constructing DSE solutions that are modular and tuneable. Modularity means that DSE solutions can be reused to construct other DSE solutions, while tuneability means that the most specific DSE solution is chosen for the target DSE problem. Moreover, DSI enables transparent cooperation between exploration algorithms. Second, we present IDeSyDe, an extensible DSE framework for DSE solutions based on DSI. IDeSyDe allows extensions to be developed in different programming languages in a manner compliant with the DSI approach.
We showcase the relevance of these contributions through five different case studies. The case study evaluations showed that the non-exploration DSI procedures create overheads that are marginal compared to those of the exploration algorithms; empirically, they average around 2% of the total DSE request in most evaluations. More importantly, the case studies have shown that IDeSyDe indeed provides a modular and incremental framework for constructing DSE solutions. In particular, the last case study required only minimal extensions over the previous case studies to add support for a new application type to IDeSyDe.
{"title":"IDeSyDe: Systematic Design Space Exploration via Design Space Identification","authors":"Rodolfo Jordão, Matthias Becker, Ingo Sander","doi":"10.1145/3647640","DOIUrl":"https://doi.org/10.1145/3647640","url":null,"abstract":"<p>Design space exploration (DSE) is a key activity in embedded design processes, where a mapping between applications and platforms that meets the process design requirements must be found. Finding such mappings is very challenging due to the complexity of modern embedded platforms and applications. DSE tools aid in this challenge by potentially covering sections of the design space that could be unintuitive to designers, leading to more optimised designs. Despite this potential benefit, DSE tools remain relatively niche in the embedded industry. A significant obstacle hindering their wider adoption is integrating such tools into embedded design processes. </p><p>We present two contributions that address this integration issue. First, we present the design space identification (DSI) approach for systematically constructing DSE solutions that are modular and tuneable. Modularity means that DSE solutions can be reused to construct other DSE solutions, while tuneability means that the most specific DSE solution is chosen for the target DSE problem. Moreover, DSI enables transparent cooperation between exploration algorithms. Second, we present IDeSyDe, an extensible DSE framework for DSE solutions based on DSI. IDeSyDe allows extensions to be developed in different programming languages in a manner compliant with the DSI approach. </p><p>We showcase the relevance of these contributions through five different case studies. The case study evaluations showed that non-exploration DSI procedures create overheads, which are marginal compared to the exploration algorithms. Empirically, most evaluations average 2% of the total DSE request. More importantly, the case studies have shown that IDeSyDe indeed provides a modular and incremental framework for constructing DSE solutions. In particular, the last case study required minimal extensions over the previous case studies so that support for a new application type was added to IDeSyDe.</p>","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":"26 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139768878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this study, we explore the capability of Large Language Models (LLMs) to automate hardware design by automatically completing partial code in Verilog, a common language for designing and modeling digital systems. We fine-tune pre-existing LLMs on Verilog datasets compiled from GitHub and Verilog textbooks. We evaluate the functional correctness of the generated Verilog code using a specially designed test suite, featuring a custom problem set and testing benches. Here, our fine-tuned open-source CodeGen-16B model outperforms the commercial state-of-the-art GPT-3.5-turbo model by 1.1% overall. Upon testing with a more diverse and complex problem set, we find that the fine-tuned model shows competitive performance against the state-of-the-art GPT-3.5-turbo, excelling in certain scenarios. Notably, it demonstrates a 41% improvement in generating syntactically correct Verilog code across various problem categories compared to its pre-trained counterpart, highlighting the potential of smaller, in-house LLMs in hardware design automation.
We release our training/evaluation scripts and LLM checkpoints as open-source contributions.
{"title":"VeriGen: A Large Language Model for Verilog Code Generation","authors":"Shailja Thakur, Baleegh Ahmad, Hammond Pearce, Benjamin Tan, Brendan Dolan-Gavitt, Ramesh Karri, Siddharth Garg","doi":"10.1145/3643681","DOIUrl":"https://doi.org/10.1145/3643681","url":null,"abstract":"<p>In this study, we explore the capability of Large Language Models (LLMs) to automate hardware design by automatically completing partial Verilog code, a common language for designing and modeling digital systems. We fine-tune pre-existing LLMs on Verilog datasets compiled from GitHub and Verilog textbooks. We evaluate the functional correctness of the generated Verilog code using a specially designed test suite, featuring a custom problem set and testing benches. Here, our fine-tuned open-source CodeGen-16B model outperforms the commercial state-of-the-art GPT-3.5-turbo model with a 1.1% overall increase. Upon testing with a more diverse and complex problem set, we find that the fine-tuned model shows competitive performance against state-of-the-art gpt-3.5-turbo, excelling in certain scenarios. Notably, it demonstrates a 41% improvement in generating syntactically correct Verilog code across various problem categories compared to its pre-trained counterpart, highlighting the potential of smaller, in-house LLMs in hardware design automation. </p><p>We release our training/evaluation scripts and LLM checkpoints as open-source contributions.</p>","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":"39 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139768885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Field-Programmable Gate Arrays (FPGAs) employ a large number of SRAM cells to provide a flexible routing architecture, which has a significant impact on the FPGA's area and power consumption. This flexible routing allows for a rather easy realization of the desired functionality, but our evaluations show that the full routing flexibility is not required in many cases. In this work, we focus on what is actually needed and introduce a new switch-box realization, which we call Turn-Restricted Switch-Boxes, that supports only a subset of the possible turns. The proposed method increases the utilization rate of FPGA switch-boxes by eliminating unused resources. Experimental evaluations confirm that area and average power consumption can be reduced by 12.8% and 14.1% on average, respectively, and that the susceptibility of FPGA routing to SEU and MBU can be improved by 18.2% on average, with negligible performance overhead.
{"title":"An Efficient FPGA Architecture with Turn-Restricted Switch Boxes","authors":"Fatemeh Serajeh Hassani, Mohammad Sadrosadati, Nezam Rohbani, Sebastian Pointner, Robert Wille, Hamid Sarbazi-azad","doi":"10.1145/3643809","DOIUrl":"https://doi.org/10.1145/3643809","url":null,"abstract":"<p><i>Abstract. Field-Programmable Gate Arrays</i> (FPGAs) employ a large number of SRAM cells to provide a flexible routing architecture which have a significant impact on the FPGA’s area and power consumption. This flexible routing allows for a rather easy realization of the desired functionality, but our evaluations show that the full routing flexibility is not required in many occasions. In this work, we focus on what is actually needed and introduce a new switch-box realization what we call <i>Turn-Restricted Switch-Boxes</i> which supports only a subset of possible turns. The proposed method increases the utilization rate of FPGA switch-boxes by eliminating the unemployed resources. Experimental evaluations confirm that the area and average power consumption can be reduced by 12.8% and 14.1%, on average, respectively and the FPGA routing susceptibility to SEU and MBU can be improved by 18.2%, on average, by imposing negligible performance.</p>","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":"20 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139678094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Logic built-in self-test (LBIST) approaches use an on-chip logic block for test generation and thus enable in-field testing. Recent reports of silent data corruption underline the importance of in-field testing. In a class of storage-based LBIST approaches, compressed tests are stored on-chip and decompressed by an on-chip decompression logic. The on-chip storage requirements may become a bottleneck when the number of compressed tests is large. In this case, using each compressed test for applying several different tests allows the storage requirements to be reduced. However, producing different tests from each compressed test has a hardware overhead. This article suggests a new on-chip storage scheme for compressed tests that eliminates the additional hardware overhead. Under the new storage scheme, a set of N B-bit compressed tests targeting a set of faults F0 is translated into a sequence S of N · B bits. Every B consecutive bits of S are considered as a compressed test. The sequence S thus yields close to N · B compressed tests, magnifying the test data stored in S almost B times. Taking advantage of the extra tests, the article describes a software procedure that is applied off-line to reduce S without losing fault coverage of F0. Experimental results for benchmark circuits demonstrate significant reductions in the storage requirements of S, and significant increases in the fault coverage of a second set of faults, F1.
{"title":"Reduced On-Chip Storage of Seeds for Built-In Test Generation","authors":"Irith Pomeranz","doi":"10.1145/3643810","DOIUrl":"https://doi.org/10.1145/3643810","url":null,"abstract":"<p>Logic built-in self-test (<i>LBIST</i>) approaches use an on-chip logic block for test generation and thus enable in-field testing. Recent reports of silent data corruption underline the importance of in-field testing. In a class of storage-based <i>LBIST</i> approaches, compressed tests are stored on-chip and decompressed by an on-chip decompression logic. The on-chip storage requirements may become a bottleneck when the number of compressed tests is large. In this case, using each compressed test for applying several different tests allows the storage requirements to be reduced. However, producing different tests from each compressed test has a hardware overhead. This article suggests a new on-chip storage scheme for compressed tests that eliminates the additional hardware overhead. Under the new storage scheme, a set of <i>N B</i>-bit compressed tests targeting a set of faults <i>F</i><sub>0</sub> is translated into a sequence <i>S</i> of <i>N</i> · <i>B</i> bits. Every <i>B</i> consecutive bits of <i>S</i> are considered as a compressed test. The sequence <i>S</i> thus yields close to <i>N</i> · <i>B</i> compressed tests, magnifying the test data stored in <i>S</i> almost <i>B</i> times. Taking advantage of the extra tests, the article describes a software procedure that is applied off-line to reduce <i>S</i> without losing fault coverage of <i>F</i><sub>0</sub>. Experimental results for benchmark circuits demonstrate significant reductions in the storage requirements of <i>S</i>, and significant increases in the fault coverage of a second set of faults, <i>F</i><sub>1</sub>.</p>","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":"29 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139656906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bayesian optimization (BO) is an efficient global optimization method for expensive black-box functions. However, extending it to high-dimensional problems and large sample budgets remains a severe challenge. In order to extend BO to large-scale analog circuit synthesis, a novel computationally efficient parallel BO method, D3PBO, is proposed for high-dimensional problems in this work. We introduce a dynamic domain decomposition method based on the maximum variance between clusters. The search space is decomposed into subdomains progressively to limit the maximal number of observations in each domain. Promising domains are explored by multi-trust-region-based batch BO with local Gaussian process (GP) models. As the domain decomposition progresses, basin-shaped domains are identified using a GP-assisted quadratic regression method and exploited by the local search method BOBYQA to achieve a faster convergence rate. The time complexity of D3PBO is constant for each iteration. Experiments demonstrate that D3PBO obtains better results with significantly less runtime than state-of-the-art methods. For the circuit optimization experiments, D3PBO achieves up to 10× runtime speedup compared to TuRBO, with better solutions.
{"title":"D3PBO: Dynamic Domain Decomposition based Parallel Bayesian Optimization for Large-scale Analog Circuit Sizing","authors":"Aidong Zhao, Tianchen Gu, Zhaori Bi, Fan Yang, Changhao Yan, Xuan Zeng, Zixiao Lin, Wenchuang Hu, Dian Zhou","doi":"10.1145/3643811","DOIUrl":"https://doi.org/10.1145/3643811","url":null,"abstract":"<p>Bayesian optimization (BO) is an efficient global optimization method for expensive black-box functions. Whereas, the expansion for high-dimensional problems and large sample budgets still remains a severe challenge. In order to extend BO for large-scale analog circuit synthesis, a novel computationally efficient parallel BO method, D<sup>3</sup>PBO, is proposed for high-dimensional problems in this work. We introduce the dynamic domain decomposition method based on maximum variance between clusters. The search space is decomposed into subdomains progressively to limit the maximal number of observations in each domain. The promising domain is explored by multi-trust region based batch BO with the local Gaussian process (GP) model. As the domain decomposition progresses, the basin-shaped domain is identified using a GP-assisted quadratic regression method and exploited by the local search method BOBYQA to achieve faster convergence rate. The time complexity of D<sup>3</sup>PBO is constant for each iteration. Experiments demonstrate that D<sup>3</sup>PBO obtains better results with significant less runtime consumption compared to state-of-the-art methods. For the circuit optimization experiments, D<sup>3</sup>PBO achieves up to 10 × runtime speedup compared to TuRBO with better solutions.</p>","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":"17 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139665323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Typical embedded processors, such as Digital Signal Processors (DSPs), usually adopt the Very Long Instruction Word (VLIW) architecture to improve computing efficiency. The performance of VLIW processors heavily relies on Instruction-Level Parallelism (ILP), so it is crucial to develop an efficient instruction scheduling algorithm that exploits more ILP. While heuristic algorithms are widely used in modern compilers due to their simple implementation and low computational cost, they have limitations in providing accurate solutions and are prone to local optima. Exact algorithms, on the other hand, can usually find the optimal solution, but their high time overhead makes them less suitable for large-scale problems. This paper proposes a two-dimensional constrained dynamic programming (TDCDP) approach and a quantitative model for instruction scheduling. The TDCDP approach achieves near-optimal solutions within an acceptable time overhead. Furthermore, we integrate our TDCDP approach into a mainstream compiler architecture, encompassing both Pre- and Post-RA (register allocation) scheduling. We conduct a quantitative evaluation of TDCDP against four heuristic algorithms on a typical VLIW processor. Our approach achieves an efficiency improvement of up to 58.34% in the final solutions compared to the heuristic algorithms. Additionally, Post-RA scheduling improves programs with an average speedup of 14.04% over applying Pre-RA scheduling alone.
{"title":"Optimizing VLIW Instruction Scheduling via a Two-Dimensional Constrained Dynamic Programming","authors":"Can Deng, Zhaoyun Chen, Yang Shi, Yimin Ma, Mei Wen, Lei Luo","doi":"10.1145/3643135","DOIUrl":"https://doi.org/10.1145/3643135","url":null,"abstract":"<p>Typical embedded processors, such as Digital Signal Processors (DSPs), usually adopt Very Long Instruction Word (VLIW) architecture to improve computing efficiency. The performance of VLIW processors heavily relies on Instruction-Level Parallelism (ILP). Therefore, it is crucial to develop an efficient instruction scheduling algorithm to explore more ILP. While heuristic algorithms are widely used in modern compilers due to simple implementation and low computational cost, they have limitations in providing accurate solutions and are prone to local optima. On the other hand, exact algorithms can usually find the optimal solution, but their high time overhead makes them less suitable for large-scale problems. This paper proposes a two-dimensional constrained dynamic programming (TDCDP) approach and a quantitative model for instruction scheduling. The TDCDP approach achieves near-optimal solutions within an acceptable time overhead. Furthermore, we integrate our TDCDP approach into mainstream compiler architecture, encompassing Pre- and Post-RA (register allocation) scheduling. We conduct a quantitative evaluation of TDCDP compared to four heuristic algorithms on a typical VLIW processor. Our approach achieves an efficiency improvement of up to 58.34% in final solutions compared to the heuristic algorithms. Additionally, the Post-RA Scheduling enhances programs with an average speedup of 14.04% than solely applying the Pre-RA Scheduling.</p>","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":"53 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139554221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Artificial neural networks (ANNs) and spiking neural networks (SNNs) are two general approaches to achieving artificial intelligence (AI). The former have been widely used in academia and industry. The latter, SNNs, are more similar to biological neural networks and can realize ultra-low power consumption, and have thus received widespread research attention. However, due to their fundamental differences in computation and information coding, the two approaches often require different and incompatible platforms. Alongside the development of AI, a general platform that can support both ANNs and SNNs is necessary. Moreover, there are some similarities between ANNs and SNNs, which leaves room to deploy different networks on the same architecture. However, there is little related research on this topic. Accordingly, this paper presents an energy-efficient, scalable, and non-Von Neumann architecture (EPHA) for ANNs and SNNs. Our study combines device-, circuit-, architecture-, and algorithm-level innovations to achieve a parallel architecture with ultra-low power consumption. We use the compensated ferrimagnet to act as both synapses and neurons, storing weights and performing dot-product operations, respectively. Moreover, we propose a novel computing flow to reduce the operations across multiple crossbar arrays, which enables our design to conduct large and complex tasks. On a suite of ANN and SNN workloads, EPHA is 1.6× more power efficient than a state-of-the-art design, NEBULA, in the ANN mode. In the SNN mode, our design is four orders of magnitude more power efficient than Loihi.
{"title":"EPHA: An Energy-efficient Parallel Hybrid Architecture for ANNs and SNNs","authors":"Yunping Zhao, Sheng Ma, Hengzhu Liu, Libo Huang, Libo Huang","doi":"10.1145/3643134","DOIUrl":"https://doi.org/10.1145/3643134","url":null,"abstract":"<p>Artificial neural networks (ANNs) and spiking neural networks (SNNs) are two general approaches to achieve artificial intelligence (AI). The former have been widely used in academia and industry fields. The latter, SNNs, are more similar to biological neural networks and can realize ultra-low power consumption, thus have received widespread research attention. However, due to their fundamental differences in computation formula and information coding, the two methods often require different and incompatible platforms. Alongside the development of AI, a general platform that can support both ANNs and SNNs is necessary. Moreover, there are some similarities between ANNs and SNNs, which leaves room to deploy different networks on the same architecture. However, there is little related research on this topic. Accordingly, this paper presents an energy-efficient, scalable, and non-Von Neumann architecture (EPHA) for ANNs and SNNs. Our study combines device-, circuit-, architecture-, and algorithm-level innovations to achieve a parallel architecture with ultra-low power consumption. We use the compensated ferrimagnet to act as both synapses and neurons to store weights and perform dot-product operations, respectively. Moreover, we propose a novel computing flow to reduce the operations across multiple crossbar arrays, which enables our design to conduct large and complex tasks. On a suite of ANN and SNN workloads, the EPHA is 1.6 × more power efficient than a state-of-the-art design, NEBULA, in the ANN mode. In the SNN mode, our design is 4 orders of magnitude more than the Loihi in power efficiency.</p>","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":"28 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139579076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Analog circuit optimization and design present a unique set of challenges in the IC design process. Many applications require the designer to optimize for multiple competing objectives, which poses a crucial challenge. Motivated by these practical aspects, we propose a novel method to tackle multi-objective optimization for analog circuit design in continuous action spaces. In particular, we propose to: (i) extrapolate current techniques in Multi-Objective Reinforcement Learning (MORL) to continuous state and action spaces; and (ii) provide a dynamically tunable trained model that can be queried with user-defined preferences for multi-objective optimization in the analog circuit design context.
{"title":"Pareto Optimization of Analog circuits using Reinforcement Learning","authors":"Karthik Somayaji Ns, Peng Li","doi":"10.1145/3640463","DOIUrl":"https://doi.org/10.1145/3640463","url":null,"abstract":"<p>Analog circuit optimization and design presents a unique set of challenges in the IC design process. Many applications require for the designer to optimize for multiple competing objectives which poses a crucial challenge. Motivated by these practical aspects, we propose a novel method to tackle multi-objective optimization for analog circuit design in continuous action spaces. In particular, we propose to: (i) Extrapolate current techniques in Multi-Objective Reinforcement Learning (MORL) to continuous state and action spaces. (ii) Provide for a dynamically tunable trained model to query user defined preferences in multi-objective optimization in the analog circuit design context.</p>","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":"22 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139482630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we introduce two novel methods to detect adversarial examples utilizing pixel value diversity. First, we propose the concept of pixel value diversity (which reflects the spread of pixel values in an image) and two independent metrics (UPVR and RPVR) to assess it separately. Then we propose two methods to detect adversarial examples, based on a threshold method and a Bayesian method respectively. Experimental results show that, compared to LID, a strong prior method, our proposed methods achieve better performance in detecting adversarial examples. We also show the robustness of our proposed methods against an adaptive attack.
{"title":"Detecting Adversarial Examples Utilizing Pixel Value Diversity","authors":"Jinxin Dong, Pingqiang Zhou","doi":"10.1145/3636460","DOIUrl":"https://doi.org/10.1145/3636460","url":null,"abstract":"<p>In this paper, we introduce two novel methods to detect adversarial examples utilizing pixel value diversity. First, we propose the concept of pixel value diversity (which reflects the spread of pixel values in an image) and two independent metrics (UPVR and RPVR) to assess the pixel value diversity separately. Then we propose two methods to detect adversarial examples based on the threshold method and Bayesian method respectively. Experimental results show that compared to an excellent prior method LID, our proposed methods achieve better performances in detecting adversarial examples. We also show the robustness of our proposed work against an adaptive attack method.</p>","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":"1 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139476939","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Logic camouflage is a widely adopted technique that mitigates the threat of intellectual property (IP) piracy and overproduction in the integrated circuit (IC) supply chain. Camouflaged logic achieves functional obfuscation through physical-level ambiguity and post-manufacturing programmability. However, discussions on programmability are confined to the level of logic cells/gates, limiting the broader-scale application of logic camouflage. In this work, we propose a novel module-level configuration methodology for programmable camouflaged logic that can be implemented without additional hardware ports and with negligible resources. We prove theoretically that the configuration of the programmable camouflaged logic cells can be achieved through the inputs and netlist of the original module. Further, we propose a novel lightweight ferroelectric FET (FeFET)-based reconfigurable logic gate (rGate) family and apply it to the proposed methodology. With the flexible replacement and the proposed configuration-aware conversion algorithm, this work is characterized by an input-only programming scheme as well as the combination of a high output error rate and point-function-like defense. Evaluations show that, on average, more than 95% of locations offer alternative rGate placements for camouflage, which is sufficient for security-aware design. We illustrate the exponential complexity of function state traversal and the enhanced defense capability of the locked black box against SAT attacks compared to key-based methods. We also preserve an evident output Hamming distance and introduce negligible hardware overheads in both gate-level and module-level evaluations under typical benchmarks.
{"title":"A Module-Level Configuration Methodology for Programmable Camouflaged Logic","authors":"Jianfeng Wang, Zhonghao Chen, Jiahao Zhang, Yixin Xu, Tongguang Yu, Ziheng Zheng, Enze Ye, Sumitha George, Huazhong Yang, Yongpan Liu, Kai Ni, Vijaykrishnan Narayanan, Xueqing Li","doi":"10.1145/3640462","DOIUrl":"https://doi.org/10.1145/3640462","url":null,"abstract":"<p>Logic camouflage is a widely adopted technique that mitigates the threat of intellectual property (IP) piracy and overproduction in the integrated circuit (IC) supply chain. Camouflaged logic achieves functional obfuscation through physical-level ambiguity and post-manufacturing programmability. However, discussions on programmability are confined to the level of logic cells/gates, limiting the broader-scale application of logic camouflage. In this work, we propose a novel module-level configuration methodology for programmable camouflaged logic that can be implemented without additional hardware ports and with negligible resources. We prove theoretically that the configuration of the programmable camouflaged logic cells can be achieved through the inputs and netlist of the original module. Further, we propose a novel lightweight ferroelectric FET (FeFET)-based reconfigurable logic gate (rGate) family and apply it to the proposed methodology. With the flexible replacement and the proposed configuration-aware conversion algorithm, this work is characterized by the input-only programming scheme as well as the combination of high output error rate and point-function-like defense. Evaluations show an average of >95% of the alternative rGate location for camouflage, which is sufficient for the security-aware design. We illustrate the exponential complexity in function state traversal and the enhanced defense capability of locked blackbox against SAT attacks compared to key-based methods. We also preserve an evident output Hamming distance and introduce negligible hardware overheads in both gate-level and module-level evaluations under typical benchmarks.</p>","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":"273 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139458921","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}