Detailed routing is a crucial and time-consuming stage in ASIC design. As the number and complexity of design rules increase, it is challenging to achieve high solution quality and fast runtime at the same time in detailed routing. In this work, a high-performance detailed routing algorithm named IPAG, based on integer programming (IP), is proposed. The IP formulation uses the selection of candidate routes as decision variables. High-quality candidate routes are generated by queue-based rip-up and reroute with adaptive global route guidance. A design rule checking engine that can simultaneously process nets with multiple routes is designed to efficiently construct the penalty parameters in the IP formulation. Experimental results on the ISPD 2018 detailed routing benchmarks show that IPAG achieves better solution quality in shorter or comparable runtime compared with the state-of-the-art academic detailed router.
{"title":"A High Performance Detailed Router Based on Integer Programming with Adaptive Route Guides","authors":"Zhongdong Qi, Shizhe Hu, Qi Peng, Hailong You, Chao Han, Zhangming Zhu","doi":"10.1109/ASP-DAC58780.2024.10473934","DOIUrl":"https://doi.org/10.1109/ASP-DAC58780.2024.10473934","journal":{"name":"2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"58 9-10","pages":"975-980"},"publicationDate":"2024-01-22"}
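The IPAG abstract above describes an IP whose decision variables select one candidate route per net, with penalty parameters derived from design rule checks. A minimal sketch of that selection problem follows. It is an illustration under stated assumptions, not the authors' formulation: the nets, route names, costs, and pairwise penalties are invented, and exhaustive enumeration stands in for an IP solver.

```python
from itertools import product

# Hypothetical data: each net has candidate routes with a base cost
# (for example wirelength plus via count). All names and numbers are made up.
candidates = {
    "net_a": {"a0": 4, "a1": 6},
    "net_b": {"b0": 5, "b1": 5},
    "net_c": {"c0": 3, "c1": 7},
}

# Penalty for selecting two routes together, standing in for the
# design rule violations a DRC engine would report for that pair.
conflict_penalty = {
    frozenset({"a0", "b0"}): 10,
    frozenset({"b1", "c0"}): 10,
    frozenset({"a1", "c1"}): 2,
}


def select_routes(candidates, conflict_penalty):
    """Pick exactly one route per net, minimizing cost plus pair penalties.

    Exhaustive enumeration replaces the IP solver here, so this is only
    practical for tiny instances.
    """
    nets = sorted(candidates)
    best_cost, best_choice = None, None
    for choice in product(*(sorted(candidates[n]) for n in nets)):
        cost = sum(candidates[n][r] for n, r in zip(nets, choice))
        for i in range(len(choice)):
            for j in range(i + 1, len(choice)):
                pair = frozenset({choice[i], choice[j]})
                cost += conflict_penalty.get(pair, 0)
        if best_cost is None or cost < best_cost:
            best_cost, best_choice = cost, dict(zip(nets, choice))
    return best_cost, best_choice


best_cost, best_choice = select_routes(candidates, conflict_penalty)
```

In this toy instance, picking the cheapest route of every net independently runs into a pair penalty, while the joint selection avoids all penalties at a slightly higher base cost. That is the motivation for solving the selection jointly.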
Pub Date: 2024-01-22 DOI: 10.1109/ASP-DAC58780.2024.10473889
Gianluca Radi, A. Calvino, Giovanni De Micheli
Technology mapping transforms a technology-independent representation into a technology-dependent one given a library of cells. This process is performed by means of local replacements that are extracted by matching sections of the subject graph to library cells. Matching techniques are classified mainly into pattern and Boolean. These two techniques differ in quality and number of generated matches, scalability, and run time. This paper proposes hybrid matching, a new methodology that integrates both techniques in a technology mapping algorithm. In particular, pattern matching is used to speed up the matching phase and support large cells. Boolean matching is used to increase the number of matches and quality. Compared to Boolean matching, we show that hybrid matching yields an average reduction in the area and run time by 6% and 25%, respectively, with similar delay.
{"title":"In Medio Stat Virtus*: Combining Boolean and Pattern Matching","authors":"Gianluca Radi, A. Calvino, Giovanni De Micheli","doi":"10.1109/ASP-DAC58780.2024.10473889","DOIUrl":"https://doi.org/10.1109/ASP-DAC58780.2024.10473889","journal":{"name":"2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"259 6","pages":"404-410"},"publicationDate":"2024-01-22"}
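To make the Boolean side of the matching discussion above concrete, the sketch below matches the truth table of a small cut against library cells under input permutation. It is a simplified illustration, not the paper's hybrid algorithm: the cell names and the three-cell library are hypothetical, and input or output negation is not considered.

```python
from itertools import permutations

# Truth tables are integers: bit m holds the output for the input
# assignment m, where bit i of m is the value of input i.


def permute_inputs(tt, n, perm):
    """Truth table of g, where g(x) = f(y) and y[perm[i]] = x[i]."""
    out = 0
    for m in range(1 << n):
        src = 0
        for i in range(n):
            if (m >> i) & 1:
                src |= 1 << perm[i]
        if (tt >> src) & 1:
            out |= 1 << m
    return out


def boolean_match(cut_tt, n, library):
    """Return (cell name, permutation) for each cell that realizes cut_tt."""
    matches = []
    for name, cell_tt in library.items():
        for perm in permutations(range(n)):
            if permute_inputs(cell_tt, n, perm) == cut_tt:
                matches.append((name, perm))
                break  # one permutation per cell is enough
    return matches


# Hypothetical 2-input library.
library = {
    "AND2": 0b1000,   # a & b
    "ANDN2": 0b0010,  # a & ~b  (a is input 0, b is input 1)
    "XOR2": 0b0110,   # a ^ b
}

# Cut function ~x0 & x1 matches ANDN2 once the two inputs are swapped.
matches = boolean_match(0b0100, 2, library)
```

Enumerating permutations grows factorially with the number of inputs, which hints at why the abstract assigns large cells to pattern matching instead.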
Pub Date: 2024-01-22 DOI: 10.1109/ASP-DAC58780.2024.10473912
Yufeng Li, Yiwei Ci, Qiusong Yang
Design verification is a complex and costly task, especially for large and intricate processor projects. Formal verification techniques provide advantages by thoroughly examining design behaviors, but they require extensive labor and expertise in property formulation. Recent research focuses on verifying designs using the self-consistency universal property, reducing verification difficulty as it is design-independent. However, the single self-consistency property faces false positives and scalability issues due to exponential state space growth. To tackle these challenges, this paper introduces TIUP, a technique using tautologies as universal properties. We show how TIUP effectively uses tautologies as abstract specifications, covering processor data and control paths. TIUP simplifies and streamlines verification for engineers, enabling efficient formal processor verification.
{"title":"TIUP: Effective Processor Verification with Tautology-Induced Universal Properties","authors":"Yufeng Li, Yiwei Ci, Qiusong Yang","doi":"10.1109/ASP-DAC58780.2024.10473912","DOIUrl":"https://doi.org/10.1109/ASP-DAC58780.2024.10473912","journal":{"name":"2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"83 3","pages":"269-274"},"publicationDate":"2024-01-22"}
Pub Date: 2024-01-22 DOI: 10.1109/ASP-DAC58780.2024.10473856
Tianshu Hou, Yuan Ren, Wenyong Zhou, Can Li, Zhongrui Wang, Haibao Chen, Ngai Wong
Resistive random-access memory (RRAM) constitutes an emerging and promising platform for compute-in-memory (CIM) edge AI. However, the switching mechanism and controllability of RRAM are still under debate owing to the influence of multiphysics. Although physics-informed neural networks (PINNs) are successful in achieving mesh-free multiphysics solutions in many applications, the resultant accuracy is not satisfactory in RRAM analyses. This work investigates two characteristics of RRAM devices, retention and reset transition, which are described in terms of the dissolution of a conductive filament (CF) in 3-D axis-symmetric geometry. Specifically, we provide a novel neural network characterization of ion migration, Joule heating, and carrier transport, governed by the solutions of partial differential equations (PDEs). Motivated by physics-informed learning, the separation of variables (SOV) method, and the neural tangent kernel (NTK) theory, we propose a customized 3-channel fully-connected network and a modified random Fourier feature (mRFF) embedding strategy to capture multiscale properties and appropriate frequency features of the self-consistent multiphysics solutions. The proposed model eliminates the need for the grid meshing and temporal iterations widely used in RRAM analysis. Experiments then confirm its superior accuracy over competing physics-informed methods.
{"title":"Physics-Informed Learning for Versatile RRAM Reset and Retention Simulation","authors":"Tianshu Hou, Yuan Ren, Wenyong Zhou, Can Li, Zhongrui Wang, Haibao Chen, Ngai Wong","doi":"10.1109/ASP-DAC58780.2024.10473856","DOIUrl":"https://doi.org/10.1109/ASP-DAC58780.2024.10473856","journal":{"name":"2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"381 1","pages":"746-751"},"publicationDate":"2024-01-22"}
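The RRAM abstract above mentions a modified random Fourier feature (mRFF) embedding. The modification itself is not described in the abstract, so the sketch below shows only the standard random Fourier feature mapping that such embeddings build on. The input layout (r, z, t), the feature count, and the bandwidth `sigma` are assumptions chosen for illustration.

```python
import math
import random


def make_rff(in_dim, num_features, sigma, seed=0):
    """Build a standard random Fourier feature embedding.

    Each input x is mapped to [cos(2*pi*Bx), sin(2*pi*Bx)], where the rows
    of B are drawn once from a zero-mean Gaussian with standard deviation
    sigma. Larger sigma emphasizes higher-frequency content.
    """
    rng = random.Random(seed)
    B = [[rng.gauss(0.0, sigma) for _ in range(in_dim)]
         for _ in range(num_features)]

    def embed(x):
        proj = [2.0 * math.pi * sum(b * xi for b, xi in zip(row, x))
                for row in B]
        return [math.cos(p) for p in proj] + [math.sin(p) for p in proj]

    return embed


# Hypothetical input: radial position, axial position, and time.
embed = make_rff(in_dim=3, num_features=8, sigma=1.0)
features = embed([0.1, 0.2, 0.3])
```

The embedded vector would then be fed to the fully-connected network in place of the raw coordinates.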
Pub Date: 2024-01-22 DOI: 10.1109/ASP-DAC58780.2024.10473982
Yeganeh Aghamohammadi, Amin Rezaei
In a zero-trust fabless paradigm, designers are increasingly concerned about hardware-based attacks on the semiconductor supply chain. Logic locking is a design-for-trust method that adds extra key-controlled gates to a circuit to prevent hardware intellectual property theft and overproduction. While attackers have traditionally relied on an oracle to attack logic-locked circuits, machine learning attacks have shown the ability to retrieve the secret key even without access to an oracle. In this paper, we first examine the limitations of state-of-the-art machine learning attacks and argue that the use of key Hamming distance as the sole model-guiding structural metric is not always useful. Then, we develop, train, and test a corruptibility-aware graph neural network-based oracle-less attack on logic locking that takes into consideration both the structure and the behavior of the circuits. Our model is explainable in the sense that we analyze what the machine learning model has interpreted in the training process and how it can perform a successful attack. Chip designers may find this information beneficial in securing their designs while avoiding incremental fixes.
{"title":"LIPSTICK: Corruptibility-Aware and Explainable Graph Neural Network-based Oracle-Less Attack on Logic Locking","authors":"Yeganeh Aghamohammadi, Amin Rezaei","doi":"10.1109/ASP-DAC58780.2024.10473982","DOIUrl":"https://doi.org/10.1109/ASP-DAC58780.2024.10473982","journal":{"name":"2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"372 3","pages":"606-611"},"publicationDate":"2024-01-22"}
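As background for the logic locking abstract above, the following sketch locks a tiny circuit with two key gates and measures output corruptibility for a given key, taken here as the fraction of input patterns whose output differs from the original circuit. The circuit, the gate placement, and this particular definition of corruptibility are illustrative assumptions; the paper's GNN-based attack is not reproduced.

```python
from itertools import product


def original(a, b, c):
    """Unlocked reference circuit: (a AND b) OR c."""
    return (a & b) | c


def locked(a, b, c, k0, k1):
    """Same circuit with two key gates (hypothetical placement).

    An XOR key gate sits on the AND output and an XNOR key gate on the
    final output, so the correct key is (k0, k1) = (0, 1).
    """
    t = (a & b) ^ k0
    return 1 - ((t | c) ^ k1)


def corruptibility(key):
    """Fraction of input patterns where the locked output is wrong."""
    patterns = list(product((0, 1), repeat=3))
    wrong = sum(locked(a, b, c, *key) != original(a, b, c)
                for a, b, c in patterns)
    return wrong / len(patterns)


report = {key: corruptibility(key) for key in product((0, 1), repeat=2)}
```

Different wrong keys corrupt the output to different degrees, which is the kind of behavioral signal that a purely structural metric would not capture.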
Pub Date: 2024-01-22 DOI: 10.1109/ASP-DAC58780.2024.10473931
Qiwei Dong, Xiaoru Xie, Zhongfeng Wang
Swin Transformer achieves greater efficiency than Vision Transformer by utilizing local self-attention and shifted windows. However, existing hardware accelerators designed for Transformer have not been optimized for the unique computation flow and data reuse property in Swin Transformer, resulting in lower hardware utilization and extra memory accesses. To address this issue, we develop SWAT, an efficient Swin Transformer Accelerator based on FPGA. Firstly, to eliminate the redundant computations in shifted windows, a novel tiling strategy is employed, which helps the developed multiplier array to fully utilize the sparsity. Additionally, we deploy a dynamic pipeline interleaving dataflow, which not only reduces the processing latency but also maximizes data reuse, thereby decreasing access to memories. Furthermore, customized quantization strategies and approximate calculations for non-linear calculations are adopted to simplify the hardware complexity with negligible network accuracy loss. We implement SWAT on the Xilinx Alveo U50 platform and evaluate it with Swin-T on the ImageNet dataset. The proposed architecture can achieve improvements of $2.02\times \sim 3.11\times$ in power efficiency compared to existing Transformer accelerators on FPGAs.
{"title":"SWAT: An Efficient Swin Transformer Accelerator Based on FPGA","authors":"Qiwei Dong, Xiaoru Xie, Zhongfeng Wang","doi":"10.1109/ASP-DAC58780.2024.10473931","DOIUrl":"https://doi.org/10.1109/ASP-DAC58780.2024.10473931","journal":{"name":"2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"340 2","pages":"515-520"},"publicationDate":"2024-01-22"}
Pub Date: 2024-01-22 DOI: 10.1109/ASP-DAC58780.2024.10473837
Fan Jiang, Chengeng Li, Wei Zhang, Jiang Xu
GPU-based computing serves as the primary solution driving the performance of HPC systems. However, modern GPU systems encounter performance bottlenecks resulting from heavy memory access traffic and insufficient NoC bandwidth. In this work, we propose a collaborative coalescing mechanism aimed at eliminating redundant memory accesses and boosting GPU system performance. To achieve this, we design a coalescing unit for each memory partition, effectively merging requests from both inter-cluster and intra-cluster SMs. Additionally, we introduce a hierarchical multicast module to replicate and distribute the coalesced reply messages to multiple destination SMs. Experimental results show that our method achieves a 20.6% improvement in performance and a 27.1% reduction in NoC traffic over the baseline.
{"title":"Collaborative Coalescing of Redundant Memory Access for GPU System","authors":"Fan Jiang, Chengeng Li, Wei Zhang, Jiang Xu","doi":"10.1109/ASP-DAC58780.2024.10473837","DOIUrl":"https://doi.org/10.1109/ASP-DAC58780.2024.10473837","journal":{"name":"2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"99 1","pages":"195-200"},"publicationDate":"2024-01-22"}
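The coalescing idea in the GPU abstract above can be sketched in a few lines: requests from different SMs that target the same cache line are merged into one memory access, and the single reply is then multicast to every SM that asked for it. The 128-byte line size, the request format, and the function name are assumptions for illustration, not details taken from the paper.

```python
from collections import defaultdict

LINE_SIZE = 128  # bytes per cache line (assumed value)


def coalesce(requests):
    """Merge (sm_id, address) read requests by cache line.

    Returns a mapping from line base address to the sorted list of SMs
    that should receive the multicast reply for that line.
    """
    merged = defaultdict(set)
    for sm_id, addr in requests:
        line = addr // LINE_SIZE * LINE_SIZE
        merged[line].add(sm_id)
    return {line: sorted(sms) for line, sms in merged.items()}


# Hypothetical trace: five requests that touch only two distinct lines.
requests = [(0, 0x1000), (1, 0x1040), (2, 0x1000), (3, 0x2000), (0, 0x2010)]
coalesced = coalesce(requests)
accesses_saved = len(requests) - len(coalesced)
```

Each entry in `coalesced` corresponds to one memory access plus one multicast reply, so the number of saved accesses is the difference between the request count and the number of distinct lines.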
Deploying a lightweight quantized model in compute-in-memory (CIM) might result in significant accuracy degradation due to a reduced signal-to-noise ratio (SNR). To address this issue, this paper presents ZEBRA, a zero-bit robust-accumulation CIM approach, which utilizes bitwise zero patterns to compress computation with ultra-high resilience against noise caused by circuit non-idealities and similar effects. First, ZEBRA provides a cross-level design that successfully exploits value-adaptive zero-bit patterns to dramatically improve the performance of robust 8-bit quantization. Second, ZEBRA presents a multi-level local computing unit circuit design to implement the bitwise sparsity pattern, which boosts the area/energy efficiency by 2x-4x compared with existing CIM works. Experiments demonstrate that ZEBRA can achieve <1.0% accuracy loss on CIFAR10/100 with typical noise, while conventional CIM works suffer from >10% accuracy loss. Such robustness leads to much more stable accuracy for high-parallelism inference on large models in practice.
{"title":"ZEBRA: A Zero-Bit Robust-Accumulation Compute-In-Memory Approach for Neural Network Acceleration Utilizing Different Bitwise Patterns","authors":"Yiming Chen, Guodong Yin, Hongtao Zhong, Ming-En Lee, Huazhong Yang, Sumitha George, Vijaykrishnan Narayanan, Xueqing Li","doi":"10.1109/ASP-DAC58780.2024.10473851","DOIUrl":"https://doi.org/10.1109/ASP-DAC58780.2024.10473851","journal":{"name":"2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"68 3","pages":"153-158"},"publicationDate":"2024-01-22"}
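One simple way to see how bitwise zero patterns can compress computation, as the ZEBRA abstract above puts it, is a bit-serial dot product that skips weight bit-planes containing only zeros. The sketch below is a software analogy under that assumption. It does not model ZEBRA's value-adaptive patterns, its circuits, or analog noise.

```python
def bit_serial_dot(weights, activations, bits=8):
    """Dot product of unsigned integer weights with activations.

    The weights are processed one bit-plane at a time, and planes in
    which every weight bit is zero are skipped entirely.
    Returns (result, number of planes actually computed).
    """
    total = 0
    planes_computed = 0
    for b in range(bits):
        plane = [(w >> b) & 1 for w in weights]
        if not any(plane):
            continue  # all-zero plane: no accumulation needed
        planes_computed += 1
        partial = sum(a for bit, a in zip(plane, activations) if bit)
        total += partial << b
    return total, planes_computed


# Hypothetical small-valued 8-bit weights: only two planes are non-zero.
weights = [3, 1, 2, 0]
activations = [5, 7, 2, 9]
result, planes = bit_serial_dot(weights, activations)
```

With small-magnitude weights, most high-order planes are empty, so only a fraction of the eight accumulation steps is performed.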
Pub Date: 2024-01-22 DOI: 10.1109/ASP-DAC58780.2024.10473793
Zhaoxiang Liu, Kejun Chen, Dean Sullivan, Orlando Arias, R. Dutta, Yier Jin, Xiaolong Guo
The increasing complexity of System-on-Chip (SoC) designs and the rise of third-party vendors in the semiconductor industry have led to unprecedented security concerns. Traditional formal methods struggle to address software-exploited hardware bugs, and existing solutions for hardware-software co-verification often fall short. This paper presents Microscope, a novel framework for inferring software instruction patterns that can trigger hardware vulnerabilities in SoC designs. Microscope enhances the Structural Causal Model (SCM) with hardware features, creating a scalable Hardware Structural Causal Model (HW-SCM). A domain-specific language (DSL) in SMT-LIB represents the HW-SCM and predefined security properties, with incremental SMT solving deducing possible instructions. Microscope identifies causality to determine whether a hardware threat could result from any software events, providing a valuable resource for patching hardware bugs and generating test input. Extensive experimentation demonstrates Microscope’s capability to infer the causality of a wide range of vulnerabilities and bugs located in SoC-level benchmarks.
{"title":"Microscope: Causality Inference Crossing the Hardware and Software Boundary from Hardware Perspective","authors":"Zhaoxiang Liu, Kejun Chen, Dean Sullivan, Orlando Arias, R. Dutta, Yier Jin, Xiaolong Guo","doi":"10.1109/ASP-DAC58780.2024.10473793","DOIUrl":"https://doi.org/10.1109/ASP-DAC58780.2024.10473793","journal":{"name":"2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"288 16-17","pages":"933-938"},"publicationDate":"2024-01-22"}
Pub Date: 2024-01-22 DOI: 10.1109/ASP-DAC58780.2024.10473942
K. Kunal, Jitesh Poojary, S. Ramprasath, Ramesh Harjani, S. Sapatnekar
Due to the inherent error-tolerance of machine learning (ML) algorithms, many parts of the inference computation can be performed with adequate accuracy and low power under relatively low precision. Early approaches have used digital approximate computing methods to explore this space. Recent approaches using analog-based operations achieve power-efficient computation at moderate precision. This work proposes a mixed-signal optimization (MiSO) approach that optimally blends analog and digital computation for ML inference. Based on accuracy and power models, an integer linear programming formulation is used to optimize design metrics of analog/digital implementations. The efficacy of the method is demonstrated on multiple ML architectures.
{"title":"Automated synthesis of mixed-signal ML inference hardware under accuracy constraints","authors":"K. Kunal, Jitesh Poojary, S. Ramprasath, Ramesh Harjani, S. Sapatnekar","doi":"10.1109/ASP-DAC58780.2024.10473942","DOIUrl":"https://doi.org/10.1109/ASP-DAC58780.2024.10473942","journal":{"name":"2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"186 1","pages":"478-483"},"publicationDate":"2024-01-22"}
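The MiSO abstract above describes an integer linear program that chooses between analog and digital implementations based on accuracy and power models. A toy version of such a choice is sketched below. The layer names, power figures, accuracy-loss figures, the additive loss model, and the loss budget are all invented for illustration, and exhaustive search stands in for an ILP solver.

```python
from itertools import product

# Hypothetical per-layer models: implementation -> (power, accuracy loss).
# Power is in arbitrary units; loss is in tenths of a percent.
layers = {
    "conv1": {"analog": (2, 6), "digital": (8, 1)},
    "conv2": {"analog": (3, 4), "digital": (9, 1)},
    "fc":    {"analog": (1, 5), "digital": (4, 0)},
}
LOSS_BUDGET = 10  # maximum total accuracy loss allowed (assumed additive)


def assign(layers, loss_budget):
    """Minimize total power subject to a total accuracy-loss budget.

    Each layer gets exactly one implementation, mirroring a 0/1 decision
    variable per (layer, implementation) pair in an ILP.
    """
    names = list(layers)
    best = None
    for choice in product(*(sorted(layers[n]) for n in names)):
        power = sum(layers[n][c][0] for n, c in zip(names, choice))
        loss = sum(layers[n][c][1] for n, c in zip(names, choice))
        if loss > loss_budget:
            continue  # violates the accuracy constraint
        if best is None or power < best[0]:
            best = (power, dict(zip(names, choice)))
    return best


best_power, best_assignment = assign(layers, LOSS_BUDGET)
```

The all-analog design has the lowest power but exceeds the loss budget in this example, so the search keeps the most loss-sensitive layer digital and the rest analog.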