2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC): Latest Publications

A High Performance Detailed Router Based on Integer Programming with Adaptive Route Guides
Pub Date : 2024-01-22 DOI: 10.1109/ASP-DAC58780.2024.10473934
Zhongdong Qi, Shizhe Hu, Qi Peng, Hailong You, Chao Han, Zhangming Zhu
Detailed routing is a crucial and time-consuming stage of ASIC design. As the number and complexity of design rules increase, it is challenging to achieve high solution quality and fast speed at the same time in detailed routing. In this work, a high performance detailed routing algorithm named IPAG, based on integer programming (IP), is proposed. The IP formulation uses the selection of candidate routes as decision variables. High quality candidate routes are generated by queue-based rip-up and reroute with adaptive global route guidance. A design rule checking engine that can simultaneously process nets with multiple routes is designed to efficiently construct the penalty parameters in the IP formulation. Experimental results on the ISPD 2018 detailed routing benchmarks show that IPAG achieves better solution quality in shorter or comparable runtime, compared to the state-of-the-art academic detailed router.
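The idea of using candidate-route selection as decision variables can be illustrated with a small sketch. It assumes one binary choice per (net, candidate) pair, exactly one candidate selected per net, and a pairwise penalty for two candidates that would violate a design rule together. The data, the function name `select_routes`, and the use of exhaustive enumeration in place of an IP solver are all assumptions made for illustration; this is not the IPAG formulation itself.

```python
from itertools import product

def select_routes(route_cost, conflict_penalty):
    """Pick one candidate route per net, minimizing cost plus pairwise penalties.

    route_cost[n][r] is the cost of candidate r for net n.
    conflict_penalty maps ((n1, r1), (n2, r2)) with n1 < n2 to the penalty
    charged when both candidates are selected together.
    """
    nets = sorted(route_cost)
    best_choice, best_value = None, float("inf")
    # Exhaustive enumeration stands in for an IP solver on this toy instance.
    for picks in product(*(range(len(route_cost[n])) for n in nets)):
        chosen = list(zip(nets, picks))
        value = sum(route_cost[n][r] for n, r in chosen)
        for i in range(len(chosen)):
            for j in range(i + 1, len(chosen)):
                value += conflict_penalty.get((chosen[i], chosen[j]), 0)
        if value < best_value:
            best_choice, best_value = dict(chosen), value
    return best_choice, best_value

# Hypothetical data: two nets with two candidate routes each.
route_cost = {"netA": [10, 12], "netB": [8, 9]}
# Selecting the cheapest candidate of both nets would cause a violation.
conflict_penalty = {(("netA", 0), ("netB", 0)): 100}

choice, value = select_routes(route_cost, conflict_penalty)
print(choice, value)  # {'netA': 0, 'netB': 1} 19
```

The penalty makes the solver give up the locally cheapest route for netB, which is the kind of trade-off a selection-based formulation is meant to capture.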
In Medio Stat Virtus*: Combining Boolean and Pattern Matching
Pub Date : 2024-01-22 DOI: 10.1109/ASP-DAC58780.2024.10473889
Gianluca Radi, A. Calvino, Giovanni De Micheli
Technology mapping transforms a technology-independent representation into a technology-dependent one given a library of cells. This process is performed by means of local replacements that are extracted by matching sections of the subject graph to library cells. Matching techniques are classified mainly into pattern and Boolean. These two techniques differ in quality and number of generated matches, scalability, and run time. This paper proposes hybrid matching, a new methodology that integrates both techniques in a technology mapping algorithm. In particular, pattern matching is used to speed up the matching phase and support large cells. Boolean matching is used to increase the number of matches and quality. Compared to Boolean matching, we show that hybrid matching yields an average reduction in the area and run time by 6% and 25%, respectively, with similar delay.
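As a rough illustration of the Boolean side of matching, the sketch below compares a cut function with library cells by truth table, treating two functions as equivalent when they differ only by a permutation of inputs. The cell names, the helper functions, and the restriction to input permutations (practical mappers usually also consider input and output negations) are assumptions for this example, not details taken from the paper.

```python
from itertools import permutations

def truth_table(func, num_inputs):
    """Truth table of func as a tuple of 0/1; bit i of the row index is input i."""
    rows = []
    for minterm in range(1 << num_inputs):
        bits = [(minterm >> i) & 1 for i in range(num_inputs)]
        rows.append(int(bool(func(*bits))))
    return tuple(rows)

def canonical(table, num_inputs):
    """Smallest truth table reachable by permuting the inputs."""
    variants = []
    for perm in permutations(range(num_inputs)):
        variant = []
        for minterm in range(1 << num_inputs):
            bits = [(minterm >> i) & 1 for i in range(num_inputs)]
            source = sum(bits[perm[i]] << i for i in range(num_inputs))
            variant.append(table[source])
        variants.append(tuple(variant))
    return min(variants)

def build_library(cells):
    """Index hypothetical library cells by their canonical truth table."""
    return {canonical(truth_table(f, n), n): name for name, (f, n) in cells.items()}

def boolean_match(library, func, num_inputs):
    """Return the name of a matching cell, or None if no cell matches."""
    return library.get(canonical(truth_table(func, num_inputs), num_inputs))

CELLS = {
    "AND2": (lambda a, b: a and b, 2),
    "AOI21": (lambda a, b, c: not ((a and b) or c), 3),
}
library = build_library(CELLS)

# The cut computes the AOI21 function with its inputs in a different order.
cut = lambda x, y, z: not (x or (y and z))
print(boolean_match(library, cut, 3))  # AOI21
```

Enumerating all permutations grows factorially with the number of inputs, which is one reason a purely Boolean approach becomes expensive for large cells.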
TIUP: Effective Processor Verification with Tautology-Induced Universal Properties
Pub Date : 2024-01-22 DOI: 10.1109/ASP-DAC58780.2024.10473912
Yufeng Li, Yiwei Ci, Qiusong Yang
Design verification is a complex and costly task, especially for large and intricate processor projects. Formal verification techniques provide advantages by thoroughly examining design behaviors, but they require extensive labor and expertise in property formulation. Recent research focuses on verifying designs using the self-consistency universal property, reducing verification difficulty as it is design-independent. However, the single self-consistency property faces false positives and scalability issues due to exponential state space growth. To tackle these challenges, this paper introduces TIUP, a technique using tautologies as universal properties. We show how TIUP effectively uses tautologies as abstract specifications, covering processor data and control paths. TIUP simplifies and streamlines verification for engineers, enabling efficient formal processor verification.
Physics-Informed Learning for Versatile RRAM Reset and Retention Simulation
Pub Date : 2024-01-22 DOI: 10.1109/ASP-DAC58780.2024.10473856
Tianshu Hou, Yuan Ren, Wenyong Zhou, Can Li, Zhongrui Wang, Haibao Chen, Ngai Wong
Resistive random-access memory (RRAM) constitutes an emerging and promising platform for compute-in-memory (CIM) edge AI. However, the switching mechanism and controllability of RRAM are still under debate owing to the influence of multiphysics. Although physics-informed neural networks (PINNs) are successful in achieving mesh-free multiphysics solutions in many applications, the resultant accuracy is not satisfactory in RRAM analyses. This work investigates two characteristics of RRAM devices, retention and the reset transition, which are described in terms of the dissolution of a conductive filament (CF) in 3-D axis-symmetric geometry. Specifically, we provide a novel neural network characterization of ion migration, Joule heating, and carrier transport, governed by the solutions of partial differential equations (PDEs). Motivated by physics-informed learning, the separation of variables (SOV) method and the neural tangent kernel (NTK) theory, we propose a customized 3-channel fully-connected network and a modified random Fourier feature (mRFF) embedding strategy to capture multiscale properties and appropriate frequency features of the self-consistent multiphysics solutions. The proposed model eliminates the need for grid meshing and temporal iterations widely used in RRAM analysis. Experiments then confirm its superior accuracy over competing physics-informed methods.
LIPSTICK: Corruptibility-Aware and Explainable Graph Neural Network-based Oracle-Less Attack on Logic Locking
Pub Date : 2024-01-22 DOI: 10.1109/ASP-DAC58780.2024.10473982
Yeganeh Aghamohammadi, Amin Rezaei
In a zero-trust fabless paradigm, designers are increasingly concerned about hardware-based attacks on the semiconductor supply chain. Logic locking is a design-for-trust method that adds extra key-controlled gates in the circuits to prevent hardware intellectual property theft and overproduction. While attackers have traditionally relied on an oracle to attack logic-locked circuits, machine learning attacks have shown the ability to retrieve the secret key even without access to an oracle. In this paper, we first examine the limitations of state-of-the-art machine learning attacks and argue that the use of key Hamming distance as the sole model-guiding structural metric is not always useful. Then, we develop, train, and test a corruptibility-aware graph neural network-based oracle-less attack on logic locking that takes into consideration both the structure and the behavior of the circuits. Our model is explainable in the sense that we analyze what the machine learning model has interpreted in the training process and how it can perform a successful attack. Chip designers may find this information beneficial in securing their designs while avoiding incremental fixes.
SWAT: An Efficient Swin Transformer Accelerator Based on FPGA
Pub Date : 2024-01-22 DOI: 10.1109/ASP-DAC58780.2024.10473931
Qiwei Dong, Xiaoru Xie, Zhongfeng Wang
Swin Transformer achieves greater efficiency than Vision Transformer by utilizing local self-attention and shifted windows. However, existing hardware accelerators designed for Transformer have not been optimized for the unique computation flow and data reuse property in Swin Transformer, resulting in lower hardware utilization and extra memory accesses. To address this issue, we develop SWAT, an efficient Swin Transformer Accelerator based on FPGA. Firstly, to eliminate the redundant computations in shifted windows, a novel tiling strategy is employed, which helps the developed multiplier array to fully utilize the sparsity. Additionally, we deploy a dynamic pipeline interleaving dataflow, which not only reduces the processing latency but also maximizes data reuse, thereby decreasing access to memories. Furthermore, customized quantization strategies and approximate calculations for non-linear calculations are adopted to simplify the hardware complexity with negligible network accuracy loss. We implement SWAT on the Xilinx Alveo U50 platform and evaluate it with Swin-T on the ImageNet dataset. The proposed architecture can achieve improvements of $2.02\times \sim 3.11\times$ in power efficiency compared to existing Transformer accelerators on FPGAs.
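One well-known source of wasted work in shifted-window attention is the attention mask: after the cyclic shift, a window on the border can hold tokens from regions that were not adjacent in the image, and query-key pairs across such regions are masked out. The sketch below counts those masked pairs for a small feature map. The sizes are arbitrary, and whether this is exactly the redundancy that SWAT's tiling strategy removes is an assumption on my part.

```python
def region_ids(size, window, shift):
    """Label each position of a size x size map by its region after the cyclic shift."""
    def segment(i):
        if i < size - window:
            return 0
        if i < size - shift:
            return 1
        return 2
    return [[3 * segment(r) + segment(c) for c in range(size)] for r in range(size)]

def masked_fraction(size=8, window=4, shift=2):
    """Fraction of query-key pairs inside windows that the attention mask discards."""
    ids = region_ids(size, window, shift)
    masked = total = 0
    for top in range(0, size, window):
        for left in range(0, size, window):
            tokens = [ids[r][c]
                      for r in range(top, top + window)
                      for c in range(left, left + window)]
            for q in tokens:
                for k in tokens:
                    total += 1
                    masked += q != k  # pairs from different regions are masked
    return masked / total

print(masked_fraction())  # 0.4375
```

For this 8x8 map with 4x4 windows and a shift of 2, close to half of the score computations in a shifted layer produce values that the mask throws away, which suggests why skipping them in hardware can pay off.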
Collaborative Coalescing of Redundant Memory Access for GPU System
Pub Date : 2024-01-22 DOI: 10.1109/ASP-DAC58780.2024.10473837
Fan Jiang, Chengeng Li, Wei Zhang, Jiang Xu
GPU-based computing serves as the primary solution driving the performance of HPC systems. However, modern GPU systems encounter performance bottlenecks resulting from heavy memory access traffic and insufficient NoC bandwidth. In this work, we propose a collaborative coalescing mechanism aimed at eliminating redundant memory access and boosting GPU system performance. To achieve this, we design a coalescing unit for each memory partition, effectively merging requests from both inter-cluster and intra-cluster SMs. Additionally, we introduce a hierarchical multicast module to replicate and distribute the coalesced reply messages to multiple destination SMs. Experimental results show that our method achieves a 20.6% improvement in performance and a 27.1% reduction in NoC traffic over the baseline.
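The merge-then-multicast idea can be sketched in a few lines. The model below is purely functional: requests from different SMs that fall in the same cache line are merged into one memory read, and the single reply is copied to every waiting SM. The line size, the function names, and the flat (non-hierarchical) multicast are assumptions for illustration and do not reflect the hardware design in the paper.

```python
LINE_BYTES = 128  # assumed cache-line size; not taken from the paper

def coalesce(requests):
    """Merge read requests that target the same cache line.

    requests is a list of (sm_id, byte_address) pairs. The result maps each
    line address to the SMs waiting on it, in first-arrival order.
    """
    pending = {}
    for sm_id, address in requests:
        line = address // LINE_BYTES * LINE_BYTES
        waiters = pending.setdefault(line, [])
        if sm_id not in waiters:
            waiters.append(sm_id)
    return pending

def multicast(pending, memory_read):
    """Issue one memory read per line and copy the reply to every waiting SM."""
    replies = []
    for line, waiters in pending.items():
        data = memory_read(line)
        replies.extend((sm_id, line, data) for sm_id in waiters)
    return replies

# Four requests from three SMs touch only two distinct cache lines.
requests = [(0, 0x1000), (1, 0x1040), (2, 0x2000), (0, 0x1008)]
pending = coalesce(requests)
print(len(requests), "requests ->", len(pending), "memory reads")
```

In this toy trace the memory partition serves two reads instead of four, and the reply for line 0x1000 is delivered to both SM 0 and SM 1.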
ZEBRA: A Zero-Bit Robust-Accumulation Compute-In-Memory Approach for Neural Network Acceleration Utilizing Different Bitwise Patterns
Pub Date : 2024-01-22 DOI: 10.1109/ASP-DAC58780.2024.10473851
Yiming Chen, Guodong Yin, Hongtao Zhong, Ming-En Lee, Huazhong Yang, Sumitha George, Vijaykrishnan Narayanan, Xueqing Li
Deploying a lightweight quantized model in compute-in-memory (CIM) might result in significant accuracy degradation due to a reduced signal-to-noise ratio (SNR). To address this issue, this paper presents ZEBRA, a zero-bit robust-accumulation CIM approach, which utilizes bitwise zero patterns to compress computation with ultra-high resilience against noise caused by circuit non-idealities and similar effects. First, ZEBRA provides a cross-level design that successfully exploits value-adaptive zero-bit patterns to improve the performance in robust 8-bit quantization dramatically. Second, ZEBRA presents a multi-level local computing unit circuit design to implement the bitwise sparsity pattern, which boosts the area/energy efficiency by 2x-4x compared with existing CIM works. Experiments demonstrate that ZEBRA can achieve <1.0% accuracy loss on CIFAR10/100 with typical noise, while conventional CIM works suffer from >10% accuracy loss. Such robustness leads to much more stable accuracy for high-parallelism inference on large models in practice.
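To make the notion of exploiting bitwise zero patterns concrete, here is a generic bit-serial dot product that skips activation bit-planes containing only zeros. It is a software sketch of plain bit-plane skipping with unsigned activations; the value-adaptive patterns and the circuit-level design described in the abstract are not modeled, and the function name and data are hypothetical.

```python
def bit_serial_dot(weights, activations, bits=8):
    """Dot product computed one activation bit-plane at a time.

    Activations are unsigned integers below 2**bits. Bit-planes whose bits
    are all zero contribute nothing, so they are skipped. Returns the result
    and the number of bit-planes that were actually accumulated.
    """
    total, planes_used = 0, 0
    for b in range(bits):
        plane = [(a >> b) & 1 for a in activations]
        if not any(plane):
            continue  # all-zero plane: no accumulation needed
        partial = sum(w for w, bit in zip(weights, plane) if bit)
        total += partial << b
        planes_used += 1
    return total, planes_used

weights = [3, -2, 5, 1]
activations = [4, 12, 0, 8]  # only bits 2 and 3 are ever set
result, planes = bit_serial_dot(weights, activations)
print(result, planes)  # -4 2
```

Only two of the eight bit-planes are accumulated here, and every accumulation that is skipped is also one fewer opportunity for analog noise to enter the sum.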
Microscope: Causality Inference Crossing the Hardware and Software Boundary from Hardware Perspective
Pub Date : 2024-01-22 DOI: 10.1109/ASP-DAC58780.2024.10473793
Zhaoxiang Liu, Kejun Chen, Dean Sullivan, Orlando Arias, R. Dutta, Yier Jin, Xiaolong Guo
The increasing complexity of System-on-Chip (SoC) designs and the rise of third-party vendors in the semiconductor industry have led to unprecedented security concerns. Traditional formal methods struggle to address software-exploited hardware bugs, and existing solutions for hardware-software co-verification often fall short. This paper presents Microscope, a novel framework for inferring software instruction patterns that can trigger hardware vulnerabilities in SoC designs. Microscope enhances the Structural Causal Model (SCM) with hardware features, creating a scalable Hardware Structural Causal Model (HW-SCM). A domain-specific language (DSL) in SMT-LIB represents the HW-SCM and predefined security properties, with incremental SMT solving deducing possible instructions. Microscope identifies causality to determine whether a hardware threat could result from any software events, providing a valuable resource for patching hardware bugs and generating test input. Extensive experimentation demonstrates Microscope’s capability to infer the causality of a wide range of vulnerabilities and bugs located in SoC-level benchmarks.
{"title":"Microscope: Causality Inference Crossing the Hardware and Software Boundary from Hardware Perspective","authors":"Zhaoxiang Liu, Kejun Chen, Dean Sullivan, Orlando Arias, R. Dutta, Yier Jin, Xiaolong Guo","doi":"10.1109/ASP-DAC58780.2024.10473793","DOIUrl":"https://doi.org/10.1109/ASP-DAC58780.2024.10473793","url":null,"abstract":"The increasing complexity of System-on-Chip (SoC) designs and the rise of third-party vendors in the semiconductor industry have led to unprecedented security concerns. Traditional formal methods struggle to address software-exploited hardware bugs, and existing solutions for hardware-software co-verification often fall short. This paper presents Microscope, a novel framework for inferring software instruction patterns that can trigger hardware vulnerabilities in SoC designs. Microscope enhances the Structural Causal Model (SCM) with hardware features, creating a scalable Hardware Structural Causal Model (HW-SCM). A domain-specific language (DSL) in SMT-LIB represents the HW-SCM and predefined security properties, with incremental SMT solving deducing possible instructions. Microscope identifies causality to determine whether a hardware threat could result from any software events, providing a valuable resource for patching hardware bugs and generating test input. 
Extensive experimentation demonstrates Microscope’s capability to infer the causality of a wide range of vulnerabilities and bugs located in SoC-level benchmarks.","PeriodicalId":518586,"journal":{"name":"2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"288 16-17","pages":"933-938"},"PeriodicalIF":0.0,"publicationDate":"2024-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140530972","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Automated synthesis of mixed-signal ML inference hardware under accuracy constraints
Pub Date : 2024-01-22 DOI: 10.1109/ASP-DAC58780.2024.10473942
K. Kunal, Jitesh Poojary, S. Ramprasath, Ramesh Harjani, S. Sapatnekar
Due to the inherent error-tolerance of machine learning (ML) algorithms, many parts of the inference computation can be performed with adequate accuracy and low power under relatively low precision. Early approaches have used digital approximate computing methods to explore this space. Recent approaches using analog-based operations achieve power-efficient computation at moderate precision. This work proposes a mixed-signal optimization (MiSO) approach that optimally blends analog and digital computation for ML inference. Based on accuracy and power models, an integer linear programming formulation is used to optimize design metrics of analog/digital implementations. The efficacy of the method is demonstrated on multiple ML architectures.
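The abstract does not give the ILP itself, so the following is only a toy sketch of the kind of analog/digital selection problem it describes. It assumes one binary choice per layer, an additive accuracy-loss model, and made-up integer power and loss numbers; the names `LAYERS`, `MAX_TOTAL_LOSS`, and `best_assignment` are hypothetical. Exhaustive enumeration stands in for an ILP solver, which is only practical at this tiny size.

```python
from itertools import product

# Hypothetical per-layer models (illustrative integers, not from the paper).
# Fields: name, analog power, digital power, analog loss, digital loss.
# Power and accuracy loss are in arbitrary units and assumed to add up.
LAYERS = [
    ("conv1", 10, 40, 8, 1),
    ("conv2", 20, 60, 5, 1),
    ("fc", 5, 15, 2, 0),
]
MAX_TOTAL_LOSS = 10  # assumed accuracy budget


def best_assignment(layers, max_loss):
    """Return (power, loss, choice) with minimum power, or None if infeasible.

    choice[i] == 1 means layer i is analog, 0 means digital. This mirrors a
    0/1 ILP: minimize total power subject to total loss <= max_loss.
    """
    best = None
    for choice in product((0, 1), repeat=len(layers)):
        power = sum(l[1] if x else l[2] for l, x in zip(layers, choice))
        loss = sum(l[3] if x else l[4] for l, x in zip(layers, choice))
        if loss <= max_loss and (best is None or power < best[0]):
            best = (power, loss, choice)
    return best


if __name__ == "__main__":
    print(best_assignment(LAYERS, MAX_TOTAL_LOSS))
```

With these assumed numbers, the first layer stays digital because its analog version alone would consume most of the accuracy budget, while the other two layers move to analog to save power.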
Citations: 0