Pub Date: 2021-11-01 | DOI: 10.1109/ICCAD51958.2021.9643545
Imam Al Razi, Quang Le, H. Mantooth, Yarui Peng
Multi-chip power module (MCPM) layout design automation has become an emerging research field in the power electronics community. MCPM physical design is currently a trial-and-error procedure that relies heavily on the designers' experience to produce a reliable solution. To push the boundary of energy efficiency and power density, novel packaging technologies are emerging with increasing design complexity. As this manual design process becomes the bottleneck in design productivity, the power electronics industry is calling for more intelligence in CAD tools, especially for advanced packaging solutions with stacked substrates. This paper presents a physical design, synthesis, and optimization framework for 2D, 2.5D, and 3D power modules. Generic, scalable, and efficient physical design algorithms are implemented with optimization metaheuristics to solve the hierarchical layout synthesis problem. The corner-stitching data structure and hierarchical constraint-graph evaluation are customized to better align with power electronics design considerations. A complete layout synthesis process is demonstrated for both 2D and 3D power module examples. Further, electro-thermal design optimization is carried out on a sample 3D MCPM layout using both exhaustive and evolutionary search methods. Our algorithm can generate 937 3D layouts in 56 s, resulting in 10 layouts on the Pareto front. In addition, our optimized 3D layouts can achieve 1.3 nH loop inductance with a 38 °C temperature rise and an 836 mm² footprint area, compared to 2D layouts with 8.5 nH, 99 °C, and 2000 mm².
Title: Hierarchical Layout Synthesis and Optimization Framework for High-Density Power Module Design Automation
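The Pareto front mentioned in the abstract is the standard multi-objective notion: a layout survives only if no other layout is at least as good in every objective. A minimal sketch of that filter, using made-up metric tuples (a generic illustration, not the paper's implementation):

```python
def pareto_front(points):
    """Keep the points not dominated by any other point: q dominates p if
    q is <= p in every objective (all objectives minimized) and q != p."""
    front = []
    for p in points:
        dominated = any(
            q != p and all(qi <= pi for qi, pi in zip(q, p))
            for q in points
        )
        if not dominated:
            front.append(p)
    return front

# Hypothetical (loop inductance nH, temperature rise degC, footprint mm^2) tuples:
layouts = [(1.3, 38.0, 836.0), (2.0, 35.0, 900.0),
           (2.5, 40.0, 950.0), (1.5, 45.0, 820.0)]
print(pareto_front(layouts))  # drops (2.5, 40.0, 950.0), dominated by (2.0, 35.0, 900.0)
```

With all objectives minimized, the survivors are exactly the trade-off candidates a designer would compare.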
Pub Date: 2021-11-01 | DOI: 10.1109/ICCAD51958.2021.9643517
Fangzhou Wang, Lixin Liu, Jingsong Chen, Jinwei Liu, Xinshi Zang, Martin D. F. Wong
Placement and routing (P&R) are two important stages in the physical design flow. After circuit components are assigned locations by a placer, routing will take place to make the connections. Defined as two separate problems, placement and routing aim to optimize different objectives. For instance, placement usually focuses on optimizing the half-perimeter wire length (HPWL) and estimated congestion while routing will try to minimize the routed wire length and the number of overflows. The misalignment between the objectives will inevitably lead to a significant degradation in solution quality. Therefore, in this paper, we present Starfish, an efficient P&R co-optimization engine that bridges the gap between placement and routing. To incrementally optimize the routed wire length, Starfish conducts cell movements and reconnects broken nets by A*-based partial rerouting. Experimental results on the ICCAD 2020 contest benchmark suites [1] show that our co-optimizer outperforms all the contestants with better solution quality and much shorter runtime.
Title: Starfish: An Efficient P&R Co-Optimization Engine with A*-based Partial Rerouting
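The HPWL objective named in the abstract has a standard definition: half the perimeter of the bounding box enclosing a net's pins. A minimal sketch (generic; the pin coordinates are made up):

```python
def hpwl(pins):
    """Half-perimeter wire length of one net: half the perimeter of the
    bounding box enclosing all pin (x, y) coordinates."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

net = [(0, 0), (3, 4), (1, 2)]  # pin locations of a hypothetical 3-pin net
print(hpwl(net))  # bounding box is 3 wide x 4 tall -> HPWL = 7
```

A placer typically sums this quantity over all nets as a cheap proxy for routed wire length, which is exactly the gap between objectives the paper targets.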
Pub Date: 2021-11-01 | DOI: 10.1109/ICCAD51958.2021.9643469
Marco Pistoia, Syed Farhan Ahmad, Akshay Ajagekar, Alexander Buts, Shouvanik Chakrabarti, Dylan Herman, Shaohan Hu, Andrew Jena, Pierre Minssen, Pradeep Niroula, Arthur G. Rattew, Yue Sun, Romina Yalovetzky
Quantum computers are expected to surpass the computational capabilities of classical computers during this decade and to have a disruptive impact on numerous industry sectors, particularly finance. In fact, finance is estimated to be the first industry sector to benefit from quantum computing, not only in the medium and long terms but even in the short term. This review paper presents the state of the art of quantum algorithms for financial applications, with a particular focus on those use cases that can be solved via machine learning.
Title: Quantum Machine Learning for Finance ICCAD Special Session Paper
Pub Date: 2021-11-01 | DOI: 10.1109/ICCAD51958.2021.9643510
Boyang Zhang, Yang Sui, Lingyi Huang, Siyu Liao, Chunhua Deng, Bo Yuan
The channel decoder is a key component in many communication systems. Recently, neural network-based channel decoders have been actively investigated because of the great potential of their data-driven decoding procedure. However, sitting at the intersection of machine learning, information theory, and hardware design, the efficient algorithm and hardware co-design of deep learning-powered channel decoders has not been well studied. This paper is a first step towards exploring efficient DNN-enabled channel decoders from a joint perspective of algorithm and hardware. We first revisit our recently proposed doubly residual neural (DRN) decoder. By introducing this advanced architectural topology into the decoder design, the overall error-correcting performance can be significantly improved. Based on this algorithm, we further develop the corresponding systolic array-based hardware architecture for the DRN decoder, along with its FPGA implementation for a short LDPC code.
Title: Algorithm and Hardware Co-design for Deep Learning-powered Channel Decoder: A Case Study
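For context on what an LDPC channel decoder checks, decoding revolves around parity checks of the form H·c = 0 over GF(2): an all-zero syndrome means every check is satisfied. A minimal syndrome-computation sketch with a toy parity-check matrix (the matrix H is hypothetical and unrelated to the paper's DRN decoder):

```python
def syndrome(H, codeword):
    """Compute the GF(2) syndrome s = H * c^T; an all-zero vector means
    every parity check is satisfied."""
    return [sum(h * c for h, c in zip(row, codeword)) % 2 for row in H]

# Toy parity-check matrix for a length-6 code (hypothetical example):
H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]
valid = [1, 1, 0, 0, 1, 1]
print(syndrome(H, valid))       # [0, 0, 0] -> valid codeword

corrupted = valid[:]
corrupted[0] ^= 1               # flip one bit
print(syndrome(H, corrupted))   # [1, 0, 1] -> checks touching bit 0 fire
```

Iterative decoders (and their learned counterparts) work by repeatedly driving this syndrome toward zero.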
Pub Date: 2021-11-01 | DOI: 10.1109/ICCAD51958.2021.9643484
Wei Zeng, A. Davoodi, R. Topaloglu
Recent years have seen promising studies on machine learning (ML) techniques applied to approximate logic synthesis (ALS), especially those based on logic reconstruction from samples of input-output pairs. This “sampling-based ALS” supports integration with conventional logic synthesis and optimization techniques, as well as synthesis for a constrained input space (e.g., when primary input values are restricted using Boolean relations). To achieve effective sampling-based ALS, this paper proposes, for the first time, the use of adaptive decision trees (ADTs), and in particular variations guided by explainable ML. We adopt SHAP importance, a feature importance metric derived from a recent advance in explainable ML, to guide the training of ADTs. We also include approximation techniques for ADTs specifically designed for ALS, including don't-care bit assertion and instantiation. Comprehensive experiments show that we can achieve 39%-42% area reduction with a 0.20%-0.22% error rate on average, based on 15 logic functions in the IWLS'20 benchmark suite.
Title: Sampling-Based Approximate Logic Synthesis: An Explainable Machine Learning Approach
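The core idea of importance-guided tree construction from input-output samples can be illustrated with a toy stand-in: rank input bits by a crude agreement-based importance score (a simplification standing in for SHAP importance, not the paper's method) and split once on the best bit. A sketch under those assumptions:

```python
import random

def sample_truth_pairs(f, n_vars, n_samples, seed=0):
    """Draw random input vectors and record the Boolean function's output."""
    rng = random.Random(seed)
    xs = [[rng.randint(0, 1) for _ in range(n_vars)] for _ in range(n_samples)]
    return [(x, f(x)) for x in xs]

def importance(pairs, i):
    """Crude importance proxy: how far bit i's agreement with the output
    deviates from the 0.5 expected of an irrelevant bit."""
    agree = sum(1 for x, y in pairs if x[i] == y)
    return abs(agree / len(pairs) - 0.5)

def fit_stump(pairs, n_vars):
    """One-split 'tree': split on the most important bit and predict the
    majority output on each side."""
    best = max(range(n_vars), key=lambda i: importance(pairs, i))
    def majority(side):
        ys = [y for x, y in pairs if x[best] == side]
        return int(sum(ys) * 2 >= len(ys)) if ys else 0
    leaf = {0: majority(0), 1: majority(1)}
    return lambda x: leaf[x[best]]

# Target function: f = x0 AND (x1 OR x2); x0 should rank as most important.
f = lambda x: x[0] & (x[1] | x[2])
pairs = sample_truth_pairs(f, 3, 200)
approx = fit_stump(pairs, 3)
errors = sum(approx(x) != y for x, y in pairs)
print(f"error rate: {errors / len(pairs):.2f}")
```

A real ADT recurses on each leaf and applies the don't-care techniques the abstract mentions; the stump only shows the importance-guided split selection.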
Pub Date: 2021-11-01 | DOI: 10.1109/ICCAD51958.2021.9643439
Andreas Krinke, Shubham Rai, Akash Kumar, J. Lienig
Recently proposed ambipolar nanotechnologies allow the development of reconfigurable circuits with low area and power overheads compared to conventional CMOS technology. However, using a conventional physical synthesis flow for circuits that include gates based on reconfigurable FETs (RFETs) leads to sub-optimal results, because the flow has to cater to the additional gate terminal per RFET transistor. In the present work, we explore three important directions that lead to an optimized physical synthesis flow for RFET-based circuits with circuit-level reconfigurability: (1) designing optimized layouts of reconfigurable gates, (2) utilizing special driver cells to drive the reconfigurable portions of a circuit, and (3) placing these reconfigurable parts in separate power domains. Experimental evaluations over the EPFL benchmarks using our proposed approach show a reduction in chip area of up to 17.5% compared to conventional flows.
Title: Exploring Physical Synthesis for Circuits based on Emerging Reconfigurable Nanotechnologies
Pub Date: 2021-11-01 | DOI: 10.1109/ICCAD51958.2021.9643450
Sina Asadi, M. Najafi, M. Imani
Stochastic computing (SC) is a re-emerging computing paradigm providing low-cost and noise-tolerant designs for a wide range of arithmetic operations. SC circuits operate on uniform bit-streams whose value is determined by the probability of observing 1s in the bit-stream. The accuracy of SC operations depends highly on the correlation between input bit-streams. While some operations, such as minimum and maximum value functions, require highly correlated inputs, others, such as multiplication, need uncorrelated or independent inputs for accurate computation. Developing low-cost and accurate correlation manipulation circuits is an important research direction in SC, as these circuits can manage correlation between bit-streams without expensive bit-stream regeneration. This work proposes a novel in-stream correlator and decorrelator circuit that manages 1) the correlation between stochastic bit-streams, and 2) the distribution of 1s in the output bit-streams. Compared to state-of-the-art solutions, our designs achieve lower hardware cost and higher accuracy. The output bit-streams enjoy a low-discrepancy distribution of bits, which leads to a higher quality of results. The effectiveness of the proposed circuits is shown with two case studies: SC designs of sorting and median filtering.
Title: CORLD: In-Stream Correlation Manipulation for Low-Discrepancy Stochastic Computing
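The correlation sensitivity described in the abstract is easy to demonstrate in software: a bitwise AND of two unipolar streams approximates the product of their values when the streams are uncorrelated, but computes their minimum when they are maximally correlated. A small sketch (a generic SC illustration, not the CORLD circuit):

```python
import random

def to_stream(p, n, rng):
    """Unipolar SC encoding: each bit is 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def value(stream):
    """Decode: a stream's value is its fraction of 1s."""
    return sum(stream) / len(stream)

n, p1, p2 = 20000, 0.8, 0.5
rng = random.Random(42)

# Uncorrelated streams: bitwise AND approximates the product p1 * p2 = 0.40.
a = to_stream(p1, n, rng)
b = to_stream(p2, n, rng)
and_uncorr = value([x & y for x, y in zip(a, b)])

# Maximally correlated streams (built from the SAME random draws):
# bitwise AND now computes min(p1, p2) = 0.50 instead.
draws = [rng.random() for _ in range(n)]
c = [1 if d < p1 else 0 for d in draws]
d_ = [1 if d < p2 else 0 for d in draws]
and_corr = value([x & y for x, y in zip(c, d_)])

print(and_uncorr, and_corr)  # ~0.40 vs ~0.50 from the same AND gate
```

The same AND gate thus implements two different functions depending only on input correlation, which is why correlation manipulation circuits matter.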
Pub Date: 2021-11-01 | DOI: 10.1109/ICCAD51958.2021.9643533
W. R. Davis, P. Franzon, Luis Francisco, Bill Huggins, Rajeev Jain
The power, performance, and area (PPA) of digital blocks can vary 10:1 based on their synthesis, place, and route tool recipes. With the rapid increase in the number of PVT corners and the complexity of logic functions approaching 10M gates, industry has an acute need to minimize the human resources, compute servers, and EDA licenses needed to achieve a Pareto-optimal recipe. We first present models for fast, accurate PPA prediction that can reduce the manual optimization iterations with EDA tools. Second, we investigate techniques to automate the PPA optimization using evolutionary algorithms. For PPA prediction, a baseline model is trained on a known design using Latin hypercube sample runs of the EDA tool, and transfer learning is then used to train the model for an unseen design. For a known design, the baseline needed 150 training runs to achieve 95% accuracy. With transfer learning, the same accuracy was achieved on a different (unseen) design in only 15 runs, indicating the viability of transfer learning to generalize PPA models. The PPA optimization technique, based on evolutionary algorithms, effectively combines PPA modeling and optimization. Our approach reached the same PPA solution as human designers in the same or fewer runs for a Cortex-M0 system design. This shows potential for automating recipe optimization without needing more runs than a human designer would.
Title: Fast and Accurate PPA Modeling with Transfer Learning
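Latin hypercube sampling, which the abstract uses to generate the baseline training runs, stratifies each parameter's range into as many equal strata as there are samples and hits every stratum exactly once per dimension. A minimal pure-Python sketch (generic; the knob names in the comment are hypothetical):

```python
import random

def latin_hypercube(n_samples, n_dims, seed=0):
    """Latin hypercube sample in [0, 1)^n_dims: each dimension is divided
    into n_samples equal strata, and each stratum is hit exactly once."""
    rng = random.Random(seed)
    samples = [[0.0] * n_dims for _ in range(n_samples)]
    for d in range(n_dims):
        strata = list(range(n_samples))
        rng.shuffle(strata)  # random assignment of strata to sample indices
        for i, s in enumerate(strata):
            # Place the point uniformly at random within its stratum.
            samples[i][d] = (s + rng.random()) / n_samples
    return samples

# e.g. 5 tool recipes over 2 hypothetical knobs (clock uncertainty, max fanout),
# to be rescaled from [0, 1) into each knob's real range before running the tool.
for p in latin_hypercube(5, 2):
    print(p)
```

Compared with plain random sampling, this guarantees coverage of every parameter's full range even with few, expensive EDA runs.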
Pub Date: 2021-11-01 | DOI: 10.1109/ICCAD51958.2021.9643590
Binwu Zhu, Ran Chen, Xinyun Zhang, Fan Yang, Xuan Zeng, Bei Yu, Martin D. F. Wong
With the rapid development of semiconductors and the continuous scaling-down of circuit feature size, hotspot detection has become much more challenging and is a crucial step in the physical verification flow. In recent years, advanced deep learning techniques have spawned many frameworks for hotspot detection. However, most existing hotspot detectors can only detect defects arising in the central region of small clips, making the whole detection process time-consuming on large layouts. Some advanced hotspot detectors can detect multiple hotspots in a large area, but they need to propose potential defect regions, and a refinement step is required to locate each hotspot precisely. To simplify the procedure of multi-stage detectors, an end-to-end single-stage hotspot detector is proposed to identify hotspots at large scale without refining potential regions. In addition, multiple tasks are developed to learn various pattern topological features, and a feature aggregation module based on a Transformer encoder is designed to globally capture the relationships between different features, further enhancing the feature representation ability. Experimental results show that our proposed framework achieves higher accuracy than prior methods with faster inference speed.
Title: Hotspot Detection via Multi-task Learning and Transformer Encoder
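The global feature relationships a Transformer encoder captures come from scaled dot-product self-attention, in which every feature vector attends over all the others. A minimal sketch of that mechanism with toy vectors (a generic illustration, not the paper's aggregation module):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: each query scores all keys, and the
    softmax weights mix the value vectors -- so every position can draw
    information from every other position in one step."""
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, V)) for j in range(len(V[0]))])
    return out

# Self-attention over three hypothetical 2-dim feature vectors (Q = K = V = X):
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = attention(X, X, X)
for row in Y:
    print(row)
```

Each output row is a convex combination of all input rows, which is the "global" relationship modeling the abstract contrasts with local convolutions.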
Pub Date: 2021-11-01 | DOI: 10.1109/ICCAD51958.2021.9643465
Nagadastagiri Challapalle, Karthik Swaminathan, Nandhini Chandramoorthy, V. Narayanan
Graph data structures are central to many applications such as social networks, citation networks, molecular interactions, and navigation systems. Graph Convolutional Networks (GCNs) are used to process and learn insights from graph data for tasks such as link prediction, node classification, and learning node embeddings. The compute and memory access characteristics of GCNs differ both from conventional graph analytics algorithms and from convolutional neural networks, rendering existing accelerators for graph analytics as well as deep learning inefficient. In this work, we propose PIM-GCN, a crossbar-based processing-in-memory (PIM) accelerator architecture for GCNs. PIM-GCN incorporates a node-stationary dataflow with support for both Compressed Sparse Row (CSR) and Compressed Sparse Column (CSC) graph data representations. We propose techniques for graph traversal in the compressed sparse domain, feature aggregation, and feature transformation operations in GCNs mapped to in-situ analog compute functions of crossbar memory, and present the trade-offs in performance, energy, and scalability of the PIM-GCN architecture for CSR and CSC graph data representations. PIM-GCN shows an average speedup of 3-16× and an average energy reduction of 4-12× compared to existing accelerator architectures.
Title: Crossbar based Processing in Memory Accelerator Architecture for Graph Convolutional Networks
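The CSR representation named in the abstract stores, per row, only the nonzero columns of the adjacency matrix, and GCN neighborhood aggregation then reduces to a sparse matrix-vector product over those arrays. A minimal sketch with a toy graph and scalar node features (a generic CSR illustration, not PIM-GCN's crossbar mapping):

```python
def to_csr(dense):
    """Compress a dense adjacency matrix into CSR arrays:
    row_ptr[i]..row_ptr[i+1] index the nonzeros of row i in values/col_idx."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

def csr_aggregate(values, col_idx, row_ptr, features):
    """Sum-aggregate neighbor features: out[i] = sum_j A[i][j] * x[j]."""
    n = len(row_ptr) - 1
    out = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            out[i] += values[k] * features[col_idx[k]]
    return out

# Toy 4-node graph adjacency and scalar node features:
A = [[0, 1, 0, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 0],
     [1, 0, 0, 0]]
x = [1.0, 2.0, 3.0, 4.0]
vals, cols, ptr = to_csr(A)
print(csr_aggregate(vals, cols, ptr, x))  # [6.0, 4.0, 2.0, 1.0]
```

In a real GCN layer, features are vectors and the aggregation is followed by a dense transformation; the CSR traversal pattern shown here is what such accelerators map onto memory.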