
ACM Transactions on Design Automation of Electronic Systems: Latest Articles

IDeSyDe: Systematic Design Space Exploration via Design Space Identification
IF 1.4 | Zone 4 (Computer Science) | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-02-10 | DOI: 10.1145/3647640
Rodolfo Jordão, Matthias Becker, Ingo Sander

Design space exploration (DSE) is a key activity in embedded design processes, where a mapping between applications and platforms that meets the process design requirements must be found. Finding such mappings is very challenging due to the complexity of modern embedded platforms and applications. DSE tools aid in this challenge by potentially covering sections of the design space that could be unintuitive to designers, leading to more optimised designs. Despite this potential benefit, DSE tools remain relatively niche in the embedded industry. A significant obstacle hindering their wider adoption is integrating such tools into embedded design processes.

We present two contributions that address this integration issue. First, we present the design space identification (DSI) approach for systematically constructing DSE solutions that are modular and tuneable. Modularity means that DSE solutions can be reused to construct other DSE solutions, while tuneability means that the most specific DSE solution is chosen for the target DSE problem. Moreover, DSI enables transparent cooperation between exploration algorithms. Second, we present IDeSyDe, an extensible DSE framework for DSE solutions based on DSI. IDeSyDe allows extensions to be developed in different programming languages in a manner compliant with the DSI approach.

We showcase the relevance of these contributions through five different case studies. The case study evaluations showed that non-exploration DSI procedures create overheads, but these are marginal compared to the exploration algorithms. Empirically, in most evaluations these overheads average 2% of the total DSE request. More importantly, the case studies have shown that IDeSyDe indeed provides a modular and incremental framework for constructing DSE solutions. In particular, the last case study required only minimal extensions over the previous case studies to add support for a new application type to IDeSyDe.

Citations: 0
VeriGen: A Large Language Model for Verilog Code Generation
IF 1.4 | Zone 4 (Computer Science) | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-02-09 | DOI: 10.1145/3643681
Shailja Thakur, Baleegh Ahmad, Hammond Pearce, Benjamin Tan, Brendan Dolan-Gavitt, Ramesh Karri, Siddharth Garg

In this study, we explore the capability of Large Language Models (LLMs) to automate hardware design by automatically completing partial code in Verilog, a common language for designing and modeling digital systems. We fine-tune pre-existing LLMs on Verilog datasets compiled from GitHub and Verilog textbooks. We evaluate the functional correctness of the generated Verilog code using a specially designed test suite, featuring a custom problem set and test benches. Here, our fine-tuned open-source CodeGen-16B model outperforms the commercial state-of-the-art GPT-3.5-turbo model with a 1.1% overall increase. Upon testing with a more diverse and complex problem set, we find that the fine-tuned model shows competitive performance against the state-of-the-art GPT-3.5-turbo, excelling in certain scenarios. Notably, it demonstrates a 41% improvement in generating syntactically correct Verilog code across various problem categories compared to its pre-trained counterpart, highlighting the potential of smaller, in-house LLMs in hardware design automation.

We release our training/evaluation scripts and LLM checkpoints as open-source contributions.

Citations: 0
An Efficient FPGA Architecture with Turn-Restricted Switch Boxes
IF 1.4 | Zone 4 (Computer Science) | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-02-03 | DOI: 10.1145/3643809
Fatemeh Serajeh Hassani, Mohammad Sadrosadati, Nezam Rohbani, Sebastian Pointner, Robert Wille, Hamid Sarbazi-azad

Field-Programmable Gate Arrays (FPGAs) employ a large number of SRAM cells to provide a flexible routing architecture, which has a significant impact on the FPGA's area and power consumption. This flexible routing allows for a rather easy realization of the desired functionality, but our evaluations show that the full routing flexibility is not required on many occasions. In this work, we focus on what is actually needed and introduce a new switch-box realization, which we call Turn-Restricted Switch-Boxes, that supports only a subset of the possible turns. The proposed method increases the utilization rate of FPGA switch-boxes by eliminating unused resources. Experimental evaluations confirm that area and average power consumption can be reduced by 12.8% and 14.1% on average, respectively, and that the susceptibility of FPGA routing to SEU and MBU can be improved by 18.2% on average, at a negligible performance cost.

Citations: 0
Reduced On-Chip Storage of Seeds for Built-In Test Generation
IF 1.4 | Zone 4 (Computer Science) | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-02-01 | DOI: 10.1145/3643810
Irith Pomeranz

Logic built-in self-test (LBIST) approaches use an on-chip logic block for test generation and thus enable in-field testing. Recent reports of silent data corruption underline the importance of in-field testing. In a class of storage-based LBIST approaches, compressed tests are stored on-chip and decompressed by an on-chip decompression logic. The on-chip storage requirements may become a bottleneck when the number of compressed tests is large. In this case, using each compressed test for applying several different tests allows the storage requirements to be reduced. However, producing different tests from each compressed test has a hardware overhead. This article suggests a new on-chip storage scheme for compressed tests that eliminates the additional hardware overhead. Under the new storage scheme, a set of N B-bit compressed tests targeting a set of faults F0 is translated into a sequence S of N · B bits. Every B consecutive bits of S are considered as a compressed test. The sequence S thus yields close to N · B compressed tests, magnifying the test data stored in S almost B times. Taking advantage of the extra tests, the article describes a software procedure that is applied off-line to reduce S without losing fault coverage of F0. Experimental results for benchmark circuits demonstrate significant reductions in the storage requirements of S, and significant increases in the fault coverage of a second set of faults, F1.
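
The storage idea in this abstract can be illustrated with a minimal Python sketch. It only shows how reading every B consecutive bits of a sequence S yields N · B − B + 1 candidate compressed tests, which is "close to N · B". The function name, the example bit strings, and the assumption that S is formed by plain concatenation of the original tests are all hypothetical; the article may construct S differently, and the off-line reduction procedure is not modelled here.

```python
def compressed_tests_from_sequence(s: str, b: int) -> list[str]:
    """Return every window of b consecutive bits of s, in order of position."""
    if b <= 0 or b > len(s):
        raise ValueError("b must be between 1 and len(s)")
    return [s[i:i + b] for i in range(len(s) - b + 1)]


# Hypothetical example: N = 3 compressed tests of B = 4 bits each.
original_tests = ["1011", "0010", "1100"]

# Assumption: S is the plain concatenation of the original tests.
sequence = "".join(original_tests)

windows = compressed_tests_from_sequence(sequence, 4)

# The original tests are still present at offsets 0, B, 2B, ...
recovered = [windows[i] for i in range(0, len(sequence), 4)]
```

In this toy case the 12 stored bits provide 9 windows instead of 3 tests, so the same storage supplies three times as many compressed tests, approaching the factor B = 4 as N grows.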

Citations: 0
D3PBO: Dynamic Domain Decomposition based Parallel Bayesian Optimization for Large-scale Analog Circuit Sizing
IF 1.4 | Zone 4 (Computer Science) | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-01-31 | DOI: 10.1145/3643811
Aidong Zhao, Tianchen Gu, Zhaori Bi, Fan Yang, Changhao Yan, Xuan Zeng, Zixiao Lin, Wenchuang Hu, Dian Zhou

Bayesian optimization (BO) is an efficient global optimization method for expensive black-box functions. However, extending it to high-dimensional problems and large sample budgets remains a severe challenge. To extend BO to large-scale analog circuit synthesis, this work proposes D3PBO, a novel, computationally efficient parallel BO method for high-dimensional problems. We introduce a dynamic domain decomposition method based on the maximum variance between clusters. The search space is progressively decomposed into subdomains to limit the maximal number of observations in each domain. The promising domain is explored by multi-trust-region-based batch BO with a local Gaussian process (GP) model. As the domain decomposition progresses, the basin-shaped domain is identified using a GP-assisted quadratic regression method and exploited by the local search method BOBYQA to achieve a faster convergence rate. The time complexity of D3PBO is constant for each iteration. Experiments demonstrate that D3PBO obtains better results with significantly less runtime than state-of-the-art methods. In the circuit optimization experiments, D3PBO achieves up to a 10× runtime speedup over TuRBO while finding better solutions.
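
The phrase "maximum variance between clusters" can be made concrete with a small Python sketch. It splits a list of scalar values into two clusters so that the between-cluster variance w0 · w1 · (m0 − m1)² is largest, where w0 and w1 are the cluster weights and m0 and m1 the cluster means. This is only one plausible reading: the abstract does not say which quantity is clustered, in how many dimensions, or into how many clusters, so the function `best_split` and its one-dimensional setting are assumptions for illustration and not the D3PBO procedure itself.

```python
def best_split(values: list[float]) -> tuple[list[float], list[float], float]:
    """Split sorted values into two clusters maximizing between-cluster variance."""
    xs = sorted(values)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values to split")
    best_k, best_score = 1, -1.0
    for k in range(1, n):  # k = size of the left cluster
        left, right = xs[:k], xs[k:]
        mean_left = sum(left) / k
        mean_right = sum(right) / (n - k)
        # Between-cluster variance for a two-cluster partition.
        score = (k / n) * ((n - k) / n) * (mean_left - mean_right) ** 2
        if score > best_score:
            best_k, best_score = k, score
    return xs[:best_k], xs[best_k:], best_score


# Hypothetical observations forming two obvious groups.
low, high, score = best_split([11.0, 1.0, 12.0, 3.0, 2.0, 10.0])
```

Applying such a split repeatedly to the larger cluster would keep the number of observations per cluster bounded, which is the stated purpose of the decomposition in the abstract.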

Citations: 0
Optimizing VLIW Instruction Scheduling via a Two-Dimensional Constrained Dynamic Programming
IF 1.4 | Zone 4 (Computer Science) | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-01-25 | DOI: 10.1145/3643135
Can Deng, Zhaoyun Chen, Yang Shi, Yimin Ma, Mei Wen, Lei Luo

Typical embedded processors, such as Digital Signal Processors (DSPs), usually adopt a Very Long Instruction Word (VLIW) architecture to improve computing efficiency. The performance of VLIW processors relies heavily on Instruction-Level Parallelism (ILP). Therefore, it is crucial to develop an efficient instruction scheduling algorithm that exploits more ILP. While heuristic algorithms are widely used in modern compilers due to their simple implementation and low computational cost, they have limitations in providing accurate solutions and are prone to local optima. On the other hand, exact algorithms can usually find the optimal solution, but their high time overhead makes them less suitable for large-scale problems. This paper proposes a two-dimensional constrained dynamic programming (TDCDP) approach and a quantitative model for instruction scheduling. The TDCDP approach achieves near-optimal solutions within an acceptable time overhead. Furthermore, we integrate our TDCDP approach into a mainstream compiler architecture, encompassing Pre- and Post-RA (register allocation) scheduling. We conduct a quantitative evaluation of TDCDP against four heuristic algorithms on a typical VLIW processor. Our approach achieves an efficiency improvement of up to 58.34% in final solutions compared to the heuristic algorithms. Additionally, Post-RA scheduling gives programs an average speedup of 14.04% over applying Pre-RA scheduling alone.
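
For readers unfamiliar with the scheduling problem, the following Python sketch shows a generic greedy list scheduler of the heuristic kind the abstract contrasts with exact methods. It is not TDCDP. It assumes unit latency for every instruction, a single issue-width limit with no functional-unit constraints, and alphabetical order as a stand-in for a real priority function; the function name `list_schedule` and the example dependency graph are hypothetical.

```python
def list_schedule(deps: dict[str, set[str]], issue_width: int) -> list[list[str]]:
    """Greedily pack ready instructions into bundles, one bundle per cycle.

    deps maps each instruction to the set of instructions it depends on.
    Every instruction is assumed to complete in one cycle.
    """
    if issue_width < 1:
        raise ValueError("issue_width must be at least 1")
    remaining = dict(deps)
    done: set[str] = set()
    bundles: list[list[str]] = []
    while remaining:
        # An instruction is ready once all of its dependencies have completed.
        ready = sorted(i for i, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        bundle = ready[:issue_width]
        bundles.append(bundle)
        done.update(bundle)
        for instr in bundle:
            del remaining[instr]
    return bundles


# Hypothetical dependency graph: c needs a; d needs a and b; e needs c and d.
graph = {"a": set(), "b": set(), "c": {"a"}, "d": {"a", "b"}, "e": {"c", "d"}}
wide = list_schedule(graph, issue_width=2)
narrow = list_schedule(graph, issue_width=1)
```

With an issue width of 2 the five instructions fit in three cycles, while an issue width of 1 needs five. A greedy choice like this is fast but can miss better packings on larger graphs, which is the gap that exact or near-optimal methods aim to close.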

Citations: 0
EPHA: An Energy-efficient Parallel Hybrid Architecture for ANNs and SNNs
IF 1.4 | Zone 4 (Computer Science) | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-01-25 | DOI: 10.1145/3643134
Yunping Zhao, Sheng Ma, Hengzhu Liu, Libo Huang, Libo Huang

Artificial neural networks (ANNs) and spiking neural networks (SNNs) are two general approaches to achieving artificial intelligence (AI). The former have been widely used in academia and industry. The latter, SNNs, are more similar to biological neural networks and can realize ultra-low power consumption, and have thus received widespread research attention. However, due to their fundamental differences in computation formula and information coding, the two methods often require different and incompatible platforms. As AI develops, a general platform that can support both ANNs and SNNs becomes necessary. Moreover, there are some similarities between ANNs and SNNs, which leaves room to deploy different networks on the same architecture. However, there is little related research on this topic. Accordingly, this paper presents an energy-efficient, scalable, non-Von Neumann architecture (EPHA) for ANNs and SNNs. Our study combines device-, circuit-, architecture-, and algorithm-level innovations to achieve a parallel architecture with ultra-low power consumption. We use the compensated ferrimagnet to act as both synapses and neurons, storing weights and performing dot-product operations, respectively. Moreover, we propose a novel computing flow that reduces the operations across multiple crossbar arrays, which enables our design to handle large and complex tasks. On a suite of ANN and SNN workloads, EPHA is 1.6× more power-efficient than a state-of-the-art design, NEBULA, in the ANN mode. In the SNN mode, our design is four orders of magnitude more power-efficient than Loihi.

Citations: 0
Pareto Optimization of Analog circuits using Reinforcement Learning
IF 1.4 | Zone 4 (Computer Science) | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-01-17 | DOI: 10.1145/3640463
Karthik Somayaji Ns, Peng Li

Analog circuit optimization and design present a unique set of challenges in the IC design process. Many applications require the designer to optimize for multiple competing objectives, which poses a crucial challenge. Motivated by these practical aspects, we propose a novel method to tackle multi-objective optimization for analog circuit design in continuous action spaces. In particular, we propose to: (i) extend current techniques in Multi-Objective Reinforcement Learning (MORL) to continuous state and action spaces; (ii) provide a dynamically tunable trained model that can be queried with user-defined preferences in multi-objective optimization in the analog circuit design context.
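
The notions of competing objectives and user-defined preferences can be illustrated without any reinforcement learning. The Python sketch below filters a set of candidate designs down to its Pareto front and then picks one design with a weighted sum of the objectives. It is not the MORL method of the paper: the two objectives (power and delay, both minimized), the candidate values, and the linear weighting are assumptions chosen only for illustration.

```python
Design = tuple[float, ...]


def dominates(a: Design, b: Design) -> bool:
    """True if a is no worse than b in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: list[Design]) -> list[Design]:
    """Keep only the points that no other point dominates (all objectives minimized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def pick_by_preference(points: list[Design], weights: Design) -> Design:
    """Choose the point with the lowest weighted sum of objectives."""
    return min(points, key=lambda p: sum(w * x for w, x in zip(weights, p)))


# Hypothetical candidate designs as (power in mW, delay in ns).
candidates = [(1.0, 9.0), (2.0, 5.0), (3.0, 6.0), (4.0, 2.0)]
front = pareto_front(candidates)

low_power_choice = pick_by_preference(front, (0.9, 0.1))
low_delay_choice = pick_by_preference(front, (0.1, 0.9))
```

Changing the weights moves the selected design along the front, which is the kind of preference query a dynamically tunable model would answer without retraining.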

Citations: 0
Detecting Adversarial Examples Utilizing Pixel Value Diversity
IF 1.4 Zone 4 Computer Science Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2024-01-16 DOI: 10.1145/3636460
Jinxin Dong, Pingqiang Zhou

In this paper, we introduce two novel methods to detect adversarial examples utilizing pixel value diversity. First, we propose the concept of pixel value diversity (which reflects the spread of pixel values in an image) and two independent metrics (UPVR and RPVR) to assess pixel value diversity separately. Then we propose two methods to detect adversarial examples, based on the threshold method and the Bayesian method, respectively. Experimental results show that, compared with the strong prior method LID, our proposed methods achieve better performance in detecting adversarial examples. We also show the robustness of our proposed work against an adaptive attack method.

Citations: 0
A Module-Level Configuration Methodology for Programmable Camouflaged Logic
IF 1.4 Zone 4 Computer Science Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2024-01-12 DOI: 10.1145/3640462
Jianfeng Wang, Zhonghao Chen, Jiahao Zhang, Yixin Xu, Tongguang Yu, Ziheng Zheng, Enze Ye, Sumitha George, Huazhong Yang, Yongpan Liu, Kai Ni, Vijaykrishnan Narayanan, Xueqing Li

Logic camouflage is a widely adopted technique that mitigates the threat of intellectual property (IP) piracy and overproduction in the integrated circuit (IC) supply chain. Camouflaged logic achieves functional obfuscation through physical-level ambiguity and post-manufacturing programmability. However, discussions on programmability are confined to the level of logic cells/gates, limiting the broader-scale application of logic camouflage. In this work, we propose a novel module-level configuration methodology for programmable camouflaged logic that can be implemented without additional hardware ports and with negligible resources. We prove theoretically that the configuration of the programmable camouflaged logic cells can be achieved through the inputs and netlist of the original module. Further, we propose a novel lightweight ferroelectric FET (FeFET)-based reconfigurable logic gate (rGate) family and apply it to the proposed methodology. With flexible replacement and the proposed configuration-aware conversion algorithm, this work is characterized by an input-only programming scheme as well as a combination of high output error rate and point-function-like defense. Evaluations show an average of >95% alternative rGate locations available for camouflage, which is sufficient for security-aware design. We illustrate the exponential complexity of function state traversal and the enhanced defense capability of the locked blackbox against SAT attacks compared to key-based methods. We also preserve an evident output Hamming distance and introduce negligible hardware overheads in both gate-level and module-level evaluations under typical benchmarks.

Citations: 0