DEEP: Developing Extremely Efficient Runtime On-Chip Power Meters
Zhiyao Xie, Shiyu Li, Mingyuan Ma, Chen-Chia Chang, Jingyu Pan, Yiran Chen, Jiangkun Hu
DOI: 10.1145/3508352.3549427
Accurate and efficient on-chip power modeling is crucial to runtime power, energy, and voltage management. Such power monitoring can be achieved by designing and integrating on-chip power meters (OPMs) into the target design. In this work, we propose a new method named DEEP to automatically develop extremely efficient OPM solutions for a given design. DEEP selects OPM inputs from all individual bits in RTL signals. Such bit-level selection provides an unprecedentedly large number of input candidates and supports lower hardware cost compared with the signal-level selection in prior works. In addition, DEEP proposes a powerful two-step OPM input selection method, and it supports reporting both total power and the power of major design components. Experiments on a commercial microprocessor demonstrate that DEEP's OPM solution achieves correlation R > 0.97 in per-cycle power prediction with an unprecedentedly low area overhead on hardware, i.e., < 0.1% of the microprocessor layout. This reduces the OPM hardware cost by 4–6× compared with the state-of-the-art solution.
Aging-Aware Training for Printed Neuromorphic Circuits
Hai-qiang Zhao, Michael Hefenbrock, M. Beigl, M. Tahoori
DOI: 10.1145/3508352.3549411
Printed electronics allow for ultra-low-cost circuit fabrication with unique properties such as flexibility, non-toxicity, and stretchability. Because of these properties, there is growing interest in adapting printed electronics for emerging areas such as fast-moving consumer goods and wearable technologies. In such domains, analog signal processing in or near the sensor is favorable. Printed neuromorphic circuits have recently been proposed as a solution to perform such analog processing natively. Additionally, their learning-based design process enables highly efficient optimization and allows them to mitigate the high process variations associated with low-cost printed processes. In this work, we address the aging of printed components, an effect that can significantly degrade the accuracy of printed neuromorphic circuits over time. To this end, we develop a stochastic aging model to describe the behavior of aged printed resistors and modify the training objective to consider the expected loss over the lifetime of the device. This approach ensures acceptable accuracy over the device lifetime. Our experiments show that the proposed learning approach achieves an overall 35.8% improvement in expected accuracy over the device lifetime.
{"title":"Aging-Aware Training for Printed Neuromorphic Circuits","authors":"Hai-qiang Zhao, Michael Hefenbrock, M. Beigl, M. Tahoori","doi":"10.1145/3508352.3549411","DOIUrl":"https://doi.org/10.1145/3508352.3549411","url":null,"abstract":"Printed electronics allow for ultra-low-cost circuit fabrication with unique properties such as flexibility, non-toxicity, and stretchability. Because of these advanced properties, there is a growing interest in adapting printed electronics for emerging areas such as fast-moving consumer goods and wearable technologies. In such domains, analog signal processing in or near the sensor is favorable. Printed neuromorphic circuits have been recently proposed as a solution to perform such analog processing natively. Additionally, their learning-based design process allows high efficiency of their optimization and enables them to mitigate the high process variations associated with low-cost printed processes. In this work, we address the aging of the printed components. This effect can significantly degrade the accuracy of printed neuromorphic circuits over time. For this, we develop a stochastic aging-model to describe the behavior of aged printed resistors and modify the training objective by considering the expected loss over the lifetime of the device. This approach ensures to provide acceptable accuracy over the device lifetime. Our experiments show that an overall 35.8% improvement in terms of expected accuracy over the device lifetime can be achieved using the proposed learning approach.","PeriodicalId":270592,"journal":{"name":"2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129820862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Mixed Open-Source and Proprietary EDA Commons for Education and Prototyping: Invited Paper
A. Kahng
DOI: 10.1145/3508352.3561378
In recent years, several open-source projects have shown potential to serve as a future technology commons for EDA and design prototyping. This paper examines how open-source and proprietary EDA technologies will inevitably take on complementary roles within a future technology commons. Proprietary EDA technologies offer numerous benefits that will endure, including (i) exceptional technology and engineering; (ii) ever-increasing importance in design-based equivalent scaling and the overall semiconductor value chain; and (iii) well-established commercial and partner relationships. On the other hand, proprietary EDA technologies face challenges that will also endure, including (i) inability to pursue directions such as massive leverage of cloud compute, extreme reduction of turnaround times, or "free tools"; and (ii) difficulty in evolving to address new applications and markets. By contrast, open-source EDA technologies offer benefits that include (i) the capability to serve as a friction-free, democratized platform for education and future workforce development (i.e., as a platform for EDA research, and as a means of teaching and training both designers and EDA developers with public code); and (ii) addressing the needs of underserved, non-enterprise-account markets (e.g., older nodes, research flows, cost-sensitive IoT, new devices and integrations, system-design-technology pathfinding). That said, open-source will always face challenges such as sustainability, governance, and how to achieve critical mass and critical quality. The paper concludes with key directions and synergies for open-source and proprietary EDA within an EDA Commons for education and prototyping.
Sparse-T: Hardware Accelerator Thread for Unstructured Sparse Data Processing
Pranathi Vasireddy, K. Kavi, Gayatri Mehta
DOI: 10.1145/3508352.3549441
Sparse matrix-dense vector (SpMV) multiplication is inherent in most scientific computing, neural network, and machine learning algorithms. To efficiently exploit the sparsity of data in SpMV computations, several compressed data representations have been used. However, compressed representations of sparse data incur the overhead of locating nonzero values, requiring indirect memory accesses that increase instruction counts and memory access delays. We call this translation of compressed representations metadata processing. We propose a memory-side accelerator that performs the metadata (or indexing) computations and supplies only the required nonzero values to the processor, additionally permitting indexing to overlap with core computations on nonzero elements. In this contribution, we target low-end micro-controllers with very limited memory and processing capabilities. We explore two dedicated ASIC designs of the proposed accelerator that handle the indexed memory accesses for the compressed sparse row (CSR) format while working alongside a simple RISC-like programmable core. One version of the accelerator supplies only the vector values corresponding to nonzero matrix values, and the second version supplies both the nonzero matrix values and the matching vector values for SpMV computations. Our experiments show speedups ranging between 1.3× and 2.1× for SpMV at different levels of sparsity. Our accelerator also yields energy savings ranging between 15.8% and 52.7% across different matrix sizes, compared to a baseline system in which the primary RISC-V core performs all computations. We use smaller synthetic matrices with different sparsity levels and larger real-world matrices with higher sparsity (below 1% nonzeros) in our experimental evaluations.
Multi-Package Co-Design for Chiplet Integration
Zhen Zhuang, Bei Yu, Kai-Yuan Chao, Tsung-Yi Ho
DOI: 10.1145/3508352.3549404
Due to the cost and design complexity associated with advanced technology nodes, it is difficult for traditional monolithic System-on-Chip designs to follow Moore's Law, which weakens their economic benefits. The semiconductor industry is therefore looking to advanced packaging to improve economic advantages. Since multi-chiplet architectures supporting heterogeneous integration offer robust re-usability and effective cost reduction, chiplet integration has become the mainstream of advanced packaging. Nowadays, the number of chiplets mounted in a package keeps increasing with the demand for higher system performance. However, the large area caused by the increasing number of chiplets leads to serious reliability issues, including warpage and bump stress, which worsen yield and cost. The multi-package architecture, which distributes chiplets across multiple packages so that each package uses less area, is a popular alternative for enhancing reliability and reducing cost in advanced packaging. The primary challenge of the multi-package architecture, however, lies in the tradeoff between the inter-package costs, i.e., the interconnection among packages, and the intra-package costs, i.e., the reliability issues caused by warpage and bump stress. Therefore, a co-design methodology that optimizes multiple packages simultaneously is indispensable for improving the quality of the whole system. To tackle this challenge, we adopt mathematical programming methods for the multi-package co-design problem, reflecting its nature as a synergistic optimization of multiple packages. To the best of our knowledge, this is the first work to solve the multi-package co-design problem.
HierPINN-EM: Fast Learning-Based Electromigration Analysis for Multi-Segment Interconnects Using Hierarchical Physics-Informed Neural Network
Wentian Jin, Liang Chen, Subed Lamichhane, M. Kavousi, S. Tan
DOI: 10.1145/3508352.3549371
Electromigration (EM) has become a major concern for VLSI circuits as technology advances into the nanometer regime. The crux of the problem is solving the partial differential Korhonen equations, which remains challenging due to increasing integration density. Recently, scientific machine learning has been explored for solving partial differential equations (PDEs) following breakthrough successes of deep neural networks, and existing approaches such as physics-informed neural networks (PINNs) show promising results on some small PDE problems. However, for large engineering problems such as EM analysis of large interconnect trees, the plain PINN has been shown not to work well due to the large number of variables. In this work, we propose a novel hierarchical PINN approach, HierPINN-EM, for fast EM-induced stress analysis of multi-segment interconnects. Instead of solving the interconnect tree as a whole, we first solve the EM problem for a single wire segment under different boundary and geometrical parameters using supervised learning. We then apply the unsupervised PINN concept to solve the whole interconnect by enforcing the physics laws at the boundaries of all wire segments. In this way, HierPINN-EM can significantly reduce the number of variables compared with the plain PINN solver. Numerical results on a number of synthetic interconnect trees show that HierPINN-EM leads to orders-of-magnitude speedup in training and more than 79× better accuracy than the plain PINN method. Furthermore, HierPINN-EM yields 19% better accuracy with a 99% reduction in training cost over the recently proposed graph-neural-network-based EM solver EMGraph.
{"title":"HierPINN-EM: Fast Learning-Based Electromigration Analysis for Multi-Segment Interconnects Using Hierarchical Physics-informed Neural Network","authors":"Wentian Jin, Liang Chen, Subed Lamichhane, M. Kavousi, S. Tan","doi":"10.1145/3508352.3549371","DOIUrl":"https://doi.org/10.1145/3508352.3549371","url":null,"abstract":"Electromigration (EM) becomes a major concern for VLSI circuits as the technology advances in the nanometer regime. The crux of problem is to solve the partial differential Korhonen equations, which remains challenging due to the increasing integrated density. Recently, scientific m achine l earning has been explored to solve partial differential equations ( PDE) due to breakthrough success in deep neural networks and existing approach such as physics-informed neural networks (PINN) shows promising results for some small PDE problems. However, for large engineering problems like EM analysis for large interconnect trees, it was shown that the plain PINN does not work well due the to large number of variables. In this work, we propose a novel hierarchical PINN approach, HierPINN-EM for fast EM induced stress analysis for multi-segment interconnects. Instead of solving the interconnect tree as a whole, we first solve EM problem for one wire segment under different boundary and geometrical parameters using supervised learning. Then we apply unsupervised PINN concept to solve the whole interconnects by enforcing the physics laws in the boundaries for all wire segments. In this way, HierPINN-EM can significantly reduce the number of variables at plain PINN solver. Numerical results on a number of synthetic interconnect trees show that HierPINN-EM can lead to orders of magnitude speedup in training and more than 79× better accuracy over the plain PINN method. Furthermore, HierPINN-EM yields 19% better accuracy with 99% reduction in training cost over recently proposed Graph Neural Network-based EM solver, EMGraph.","PeriodicalId":270592,"journal":{"name":"2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128860760","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hardware Computation Graph for DNN Accelerator Design Automation without Inter-PU Templates
Jun Yu Li, Wei Wang, Wufeng Li
DOI: 10.1145/3508352.3549342
Existing deep neural network (DNN) accelerator design automation (ADA) methods adopt architecture templates to predetermine part of the design choices and then explore the remaining design choices beyond the templates. These templates can be classified into intra-PU templates and inter-PU templates according to the architecture hierarchy. Since templates limit the flexibility of ADA, designing effective ADA methods without templates has become an important research topic. Although some works have enhanced the flexibility of ADA by removing intra-PU templates, to the best of our knowledge no existing work has studied ADA methods without inter-PU templates. ADA with predetermined inter-PU templates is typically inefficient in terms of resource utilization, especially for DNNs with complex topologies. In this paper, we propose a novel method, called hardware computation graph (HCG), for ADA without inter-PU templates. Experiments show that the HCG method achieves competitive latency while using 1.4×–5× less on-chip memory compared with existing state-of-the-art ADA methods.
Polynomial Formal Verification: Ensuring Correctness under Resource Constraints (Invited Paper)
R. Drechsler, Alireza Mahzoon
DOI: 10.1145/3508352.3561104
Recently, a lot of effort has been put into developing formal verification approaches in both academic and industrial research. In practice, these techniques often give satisfying results for some types of circuits, while they fail for others. A major challenge in this domain is that the verification techniques suffer from unpredictable performance. The only way to overcome this challenge is to calculate bounds for the space and time complexities. If a verification method has polynomial space and time complexities, scalability can be guaranteed. In this tutorial paper, we review recent developments in formal verification techniques and give a comprehensive overview of Polynomial Formal Verification (PFV). In PFV, polynomial upper bounds hold for the run-time and memory needed during the entire verification task. Thus, correctness under resource constraints can be ensured. We discuss the importance and advantages of PFV in the design flow. Formal methods on the bit-level and the word-level, and their complexities when used to verify different types of circuits, such as adders, multipliers, or ALUs, are presented. The current status of this new research field and directions for future work are discussed.
Reconfigurable Logic for Hardware IP Protection: Opportunities and Challenges (Invited Paper)
L. Collini, Benjamin Tan, C. Pilato, R. Karri
DOI: 10.1145/3508352.3561117
Protecting the intellectual property (IP) of integrated circuit (IC) designs is becoming a significant concern for fab-less semiconductor design houses. Malicious actors can access the chip design at any stage, reverse engineer the functionality, and create illegal copies. On the one hand, defenders are crafting more and more solutions to hide the critical portions of the circuit. On the other hand, attackers are designing more and more powerful tools to extract useful information from the design and reverse engineer the functionality, especially when they can get access to working chips. In this context, the use of custom reconfigurable fabrics has recently been investigated for hardware IP protection. This paper discusses recent trends in hardware obfuscation with embedded FPGAs, focusing also on the open challenges that must be addressed to make this solution viable.
RT-NeRF: Real-Time On-Device Neural Radiance Fields Towards Immersive AR/VR Rendering
Chaojian Li, Sixu Li, Yang Zhao, Wenbo Zhu, Yingyan Lin
DOI: 10.1145/3508352.3549380
Neural Radiance Field (NeRF) based rendering has attracted growing attention thanks to its state-of-the-art (SOTA) rendering quality and wide applications in Augmented and Virtual Reality (AR/VR). However, interactions enabled by immersive, real-time (>30 FPS) NeRF-based rendering are still limited due to the low throughput achievable on AR/VR devices. To this end, we first profile SOTA efficient NeRF algorithms on commercial devices and identify two primary causes of the aforementioned inefficiency: (1) uniform point sampling and (2) the dense accesses and computations of the required embeddings in NeRF. We then propose RT-NeRF, which to the best of our knowledge is the first algorithm-hardware co-design for accelerating NeRF. Specifically, on the algorithm level, RT-NeRF integrates an efficient rendering pipeline that largely alleviates the inefficiency of the commonly adopted uniform point sampling in NeRF by directly computing the geometry of pre-existing points. Additionally, RT-NeRF leverages a coarse-grained, view-dependent computation-ordering scheme to eliminate the (unnecessary) processing of invisible points. On the hardware level, our proposed RT-NeRF accelerator (1) adopts a hybrid encoding scheme that adaptively switches between a bitmap-based and a coordinate-based sparsity encoding format for NeRF's sparse embeddings, aiming to maximize storage savings and thus reduce the required DRAM accesses while supporting efficient NeRF decoding; and (2) integrates both a high-density sparse search unit and a dual-purpose bi-directional adder & search tree to coordinate the two aforementioned encoding formats. Extensive experiments on eight datasets consistently validate the effectiveness of RT-NeRF, achieving a large throughput improvement (e.g., 9.7×–3,201×) while maintaining rendering quality comparable to SOTA efficient NeRF solutions.
{"title":"RT-NeRF: Real-Time On-Device Neural Radiance Fields Towards Immersive AR/VR Rendering","authors":"Chaojian Li, Sixu Li, Yang Zhao, Wenbo Zhu, Yingyan Lin","doi":"10.1145/3508352.3549380","DOIUrl":"https://doi.org/10.1145/3508352.3549380","url":null,"abstract":"Neural Radiance Field (NeRF) based rendering has attracted growing attention thanks to its state-of-the-art (SOTA) rendering quality and wide applications in Augmented and Virtual Reality (AR/VR). However, immersive real-time (> 30 FPS) NeRF based rendering enabled interactions are still limited due to the low achievable throughput on AR/VR devices. To this end, we first profile SOTA efficient NeRF algorithms on commercial devices and identify two primary causes of the aforementioned inefficiency: (1) the uniform point sampling and (2) the dense accesses and computations of the required embeddings in NeRF. Furthermore, we propose RT-NeRF, which to the best of our knowledge is the first algorithm-hardware co-design acceleration of NeRF. Specifically, on the algorithm level, RT-NeRF integrates an efficient rendering pipeline for largely alleviating the inefficiency due to the commonly adopted uniform point sampling method in NeRF by directly computing the geometry of pre-existing points. Additionally, RT-NeRF leverages a coarse-grained view-dependent computing ordering scheme for eliminating the (unnecessary) processing of invisible points. On the hardware level, our proposed RT-NeRF accelerator (1) adopts a hybrid encoding scheme to adaptively switch between a bitmap- or coordinate-based sparsity encoding format for NeRF’s sparse embeddings, aiming to maximize the storage savings and thus reduce the required DRAM accesses while supporting efficient NeRF decoding; and (2) integrates both a high-density sparse search unit and a dual-purpose bi-direction adder & search tree to coordinate the two aforementioned encoding formats. Extensive experiments on eight datasets consistently validate the effectiveness of RT-NeRF, achieving a large throughput improvement (e.g., 9.7×∼3,201×) while maintaining the rendering quality as compared with SOTA efficient NeRF solutions.","PeriodicalId":270592,"journal":{"name":"2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114191296","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}