Pub Date: 2021-11-01 | DOI: 10.1109/ICCAD51958.2021.9643545
Imam Al Razi, Quang Le, H. Mantooth, Yarui Peng
Multi-chip power module (MCPM) layout design automation has become an emerging research field in the power electronics community. MCPM physical design is currently a trial-and-error procedure that relies heavily on the designers' experience to produce a reliable solution. To push the boundaries of energy efficiency and power density, novel packaging technologies are emerging with increasing design complexity. As this manual design process becomes the bottleneck in design productivity, the power electronics industry is calling for more intelligent CAD tools, especially for advanced packaging solutions with stacked substrates. This paper presents a physical design, synthesis, and optimization framework for 2D, 2.5D, and 3D power modules. Generic, scalable, and efficient physical design algorithms are combined with optimization metaheuristics to solve the hierarchical layout synthesis problem. The corner stitching data structure and hierarchical constraint graph evaluation have been customized to better align with power electronics design considerations. A complete layout synthesis process is demonstrated for both 2D and 3D power module examples. Further, electro-thermal design optimization is carried out on a sample 3D MCPM layout using both exhaustive and evolutionary search methods. Our algorithm can generate 937 3D layouts in 56 s, yielding 10 layouts on the Pareto front. In addition, our optimized 3D layouts can achieve 1.3 nH loop inductance with a 38 °C temperature rise and an 836 mm² footprint area, compared to 8.5 nH, 99 °C, and 2000 mm² for 2D layouts.
Title: Hierarchical Layout Synthesis and Optimization Framework for High-Density Power Module Design Automation
Venue: 2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD) | https://doi.org/10.1109/ICCAD51958.2021.9643545
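The power-module abstract reports 10 layouts on the Pareto front for three competing objectives (loop inductance, temperature rise, footprint area). The sketch below shows what "on the Pareto front" means: it keeps only non-dominated candidates, assuming all three metrics are minimized. It is a plain O(n²) filter, not the paper's exhaustive or evolutionary search; the 3D and 2D figures from the abstract are used only as sample input.

```python
from typing import List, Tuple

# Each candidate layout is scored as (loop inductance in nH,
# temperature rise in degC, footprint area in mm^2); lower is better for all.
Metrics = Tuple[float, float, float]


def dominates(a: Metrics, b: Metrics) -> bool:
    """True if a is no worse than b in every metric and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(layouts: List[Metrics]) -> List[Metrics]:
    """Keep only layouts that no other layout dominates (pairwise O(n^2) check)."""
    # dominates(c, c) is always False, so comparing a layout to itself is harmless.
    return [cand for cand in layouts if not any(dominates(o, cand) for o in layouts)]


if __name__ == "__main__":
    # The abstract's optimized 3D layout dominates its 2D baseline on all three metrics.
    print(pareto_front([(1.3, 38.0, 836.0), (8.5, 99.0, 2000.0)]))
```

Layouts that trade one metric against another (for example, lower inductance but higher temperature) both survive the filter, which is why a search can end with several Pareto-optimal layouts rather than a single winner.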
Pub Date: 2021-11-01 | DOI: 10.1109/ICCAD51958.2021.9643517
Fangzhou Wang, Lixin Liu, Jingsong Chen, Jinwei Liu, Xinshi Zang, Martin D. F. Wong
Placement and routing (P&R) are two important stages in the physical design flow. After circuit components are assigned locations by a placer, routing will take place to make the connections. Defined as two separate problems, placement and routing aim to optimize different objectives. For instance, placement usually focuses on optimizing the half-perimeter wire length (HPWL) and estimated congestion while routing will try to minimize the routed wire length and the number of overflows. The misalignment between the objectives will inevitably lead to a significant degradation in solution quality. Therefore, in this paper, we present Starfish, an efficient P&R co-optimization engine that bridges the gap between placement and routing. To incrementally optimize the routed wire length, Starfish conducts cell movements and reconnects broken nets by A*-based partial rerouting. Experimental results on the ICCAD 2020 contest benchmark suites [1] show that our co-optimizer outperforms all the contestants with better solution quality and much shorter runtime.
Title: Starfish: An Efficient P&R Co-Optimization Engine with A*-based Partial Rerouting
Venue: 2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD) | https://doi.org/10.1109/ICCAD51958.2021.9643517
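Starfish reconnects nets broken by cell movement through A*-based partial rerouting. As a rough illustration of the search primitive only, here is plain A* on a single-layer, unit-cost routing grid with a Manhattan-distance heuristic. The grid model, the `astar_route` name, and its parameters are hypothetical; a real global router would also model layers, vias, and congestion costs, which this sketch omits.

```python
import heapq
from typing import Dict, List, Optional, Set, Tuple

Cell = Tuple[int, int]


def astar_route(width: int, height: int, blocked: Set[Cell],
                src: Cell, dst: Cell) -> Optional[List[Cell]]:
    """Shortest 4-connected path from src to dst that avoids blocked grid cells."""
    def h(c: Cell) -> int:
        # Manhattan distance is admissible and consistent for unit-cost 4-way moves.
        return abs(c[0] - dst[0]) + abs(c[1] - dst[1])

    open_heap: List[Tuple[int, int, Cell]] = [(h(src), 0, src)]  # (f = g + h, g, cell)
    g: Dict[Cell, int] = {src: 0}
    parent: Dict[Cell, Cell] = {}
    while open_heap:
        _, cost, cur = heapq.heappop(open_heap)
        if cur == dst:
            # Walk parent links back to the source to recover the route.
            path = [cur]
            while cur in parent:
                cur = parent[cur]
                path.append(cur)
            return path[::-1]
        if cost > g[cur]:
            continue  # stale heap entry: a cheaper route to cur was already found
        x, y = cur
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            if not (0 <= nx < width and 0 <= ny < height) or nxt in blocked:
                continue
            new_cost = cost + 1
            if new_cost < g.get(nxt, new_cost + 1):
                g[nxt] = new_cost
                parent[nxt] = cur
                heapq.heappush(open_heap, (new_cost + h(nxt), new_cost, nxt))
    return None  # no route exists


if __name__ == "__main__":
    wall = {(2, y) for y in range(4)}  # obstacle forcing a detour through (2, 4)
    print(astar_route(5, 5, wall, (0, 0), (4, 0)))
```

Because the heuristic never overestimates the remaining distance, the first time the target is popped its path is shortest, which is what makes A* attractive for repeatedly patching small pieces of a routing solution.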
Pub Date: 2021-11-01 | DOI: 10.1109/ICCAD51958.2021.9643469
Marco Pistoia, Syed Farhan Ahmad, Akshay Ajagekar, Alexander Buts, Shouvanik Chakrabarti, Dylan Herman, Shaohan Hu, Andrew Jena, Pierre Minssen, Pradeep Niroula, Arthur G. Rattew, Yue Sun, Romina Yalovetzky
Quantum computers are expected to surpass the computational capabilities of classical computers during this decade and to have a disruptive impact on numerous industry sectors, particularly finance. In fact, finance is estimated to be the first industry sector to benefit from Quantum Computing, not only in the medium and long terms but even in the short term. This review paper presents the state of the art of quantum algorithms for financial applications, with particular focus on those use cases that can be solved via Machine Learning.
Title: Quantum Machine Learning for Finance (ICCAD Special Session Paper)
Venue: 2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD) | https://doi.org/10.1109/ICCAD51958.2021.9643469
Pub Date: 2021-11-01 | DOI: 10.1109/ICCAD51958.2021.9643577
M. Ziegler, Jihye Kwon, Hung-Yi Liu, L. Carloni
Modern logic and physical synthesis tools provide numerous options and parameters that can drastically affect design quality; however, the large number of options creates a complex design space that is difficult for human designers to navigate. Fortunately, machine learning approaches and cloud computing environments are well suited to complex parameter-tuning problems like those seen in VLSI design flows. This paper proposes a holistic approach in which online and offline machine learning approaches work together to tune industrial design flows. We describe a system called SynTunSys (STS) that has been used to optimize multiple industrial high-performance processors. STS consists of an online system that optimizes designs and generates data for a recommender system that performs offline training and recommendation. Experimental results show that the collaboration between the STS online and offline machine learning systems, together with insight from human designers, provides best-of-breed results. Finally, we discuss potential new directions for research on design flow tuning.
Title: Online and Offline Machine Learning for Industrial Design Flow Tuning (Invited, ICCAD Special Session Paper)
Venue: 2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD) | https://doi.org/10.1109/ICCAD51958.2021.9643577
Pub Date: 2021-11-01 | DOI: 10.1109/ICCAD51958.2021.9643559
Biresh Kumar Joardar, Aqeeb Iqbal Arka, J. Doppa, P. Pande, Hai Helen Li, K. Chakrabarty
Resistive random-access memory (ReRAM)-based processing-in-memory (PIM) architectures have recently become a popular architectural choice for deep-learning applications. ReRAM-based architectures can accelerate the inference and training of deep learning algorithms and are more energy efficient than traditional GPUs. However, these architectures have various limitations that affect model accuracy and performance. Moreover, the choice of deep-learning application imposes new design challenges that must be addressed to achieve high performance. In this paper, we present the advantages and challenges associated with ReRAM-based PIM architectures by considering Convolutional Neural Networks (CNNs) and Graph Neural Networks (GNNs) as important application domains. We also outline methods that can be used to address these challenges.
Title: Heterogeneous Manycore Architectures Enabled by Processing-in-Memory for Deep Learning: From CNNs to GNNs (ICCAD Special Session Paper)
Venue: 2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD) | https://doi.org/10.1109/ICCAD51958.2021.9643559
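ReRAM crossbars perform matrix-vector multiplication in the analog domain: input voltages drive the rows, each cell's conductance acts as a weight (Ohm's law), and the currents summed on each column give a dot product (Kirchhoff's current law). The idealized sketch below models that read operation plus a hypothetical quantization of weights onto a limited number of conductance levels, as one illustrative source of the accuracy loss the abstract mentions. It ignores wire resistance, device variation, and ADC precision, and the function names and parameters are assumptions, not the paper's model.

```python
from typing import List


def crossbar_mvm(voltages: List[float], conductances: List[List[float]]) -> List[float]:
    """Ideal crossbar read: column current I_j = sum_i V_i * G_ij."""
    n_cols = len(conductances[0])
    return [sum(v * row[j] for v, row in zip(voltages, conductances))
            for j in range(n_cols)]


def quantize(weight: float, levels: int, g_max: float) -> float:
    """Snap a weight to one of `levels` evenly spaced conductances in [0, g_max]."""
    weight = min(max(weight, 0.0), g_max)  # a cell cannot exceed its conductance range
    step = g_max / (levels - 1)
    return round(weight / step) * step


if __name__ == "__main__":
    g = [[1.0, 0.5], [0.25, 1.0]]
    print(crossbar_mvm([1.0, 2.0], g))              # ideal column currents
    g_q = [[quantize(w, 3, 1.0) for w in row] for row in g]
    print(crossbar_mvm([1.0, 2.0], g_q))            # result after coarse quantization
```

Comparing the two printed vectors shows how a small number of programmable levels perturbs the analog result even in an otherwise ideal array.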
Pub Date: 2021-11-01 | DOI: 10.1109/ICCAD51958.2021.9643450
Sina Asadi, M. Najafi, M. Imani
Stochastic computing (SC) is a re-emerging computing paradigm that provides low-cost and noise-tolerant designs for a wide range of arithmetic operations. SC circuits operate on uniform bit-streams whose value is determined by the probability of observing 1's in the bit-stream. The accuracy of SC operations depends highly on the correlation between input bit-streams. While some operations, such as the minimum and maximum functions, require highly correlated inputs, others, such as multiplication, need uncorrelated or independent inputs for accurate computation. Developing low-cost and accurate correlation manipulation circuits is an important research direction in SC, as these circuits can manage the correlation between bit-streams without expensive bit-stream regeneration. This work proposes a novel in-stream correlator and decorrelator circuit that manages 1) the correlation between stochastic bit-streams and 2) the distribution of 1's in the output bit-streams. Compared to state-of-the-art solutions, our designs achieve lower hardware cost and higher accuracy. The output bit-streams enjoy a low-discrepancy distribution of bits, which leads to higher quality of results. The effectiveness of the proposed circuits is shown with two case studies: SC design of sorting and median filtering.
Title: CORLD: In-Stream Correlation Manipulation for Low-Discrepancy Stochastic Computing
Venue: 2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD) | https://doi.org/10.1109/ICCAD51958.2021.9643450
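The CORLD abstract notes that minimum/maximum need correlated inputs while multiplication needs uncorrelated ones. The sketch below does not reproduce the CORLD correlator or decorrelator; it only shows that a single AND gate computes either min(pa, pb) or pa·pb depending on input correlation. The deterministic number sources (a shared ramp for correlated streams, and a pair of nested counters for uncorrelated ones) are an assumption chosen so the results are exact and reproducible; hardware stochastic number generators commonly use LFSRs or low-discrepancy sequences instead.

```python
from typing import List


def to_stream(p: float, rand_seq: List[float]) -> List[int]:
    """Comparator-based stochastic number generator: emit 1 when r < p."""
    return [1 if r < p else 0 for r in rand_seq]


def value(stream: List[int]) -> float:
    """Unipolar SC value: fraction of 1s in the bit-stream."""
    return sum(stream) / len(stream)


def and_gate(a: List[int], b: List[int]) -> List[int]:
    return [x & y for x, y in zip(a, b)]


N_SIDE = 16
N = N_SIDE * N_SIDE  # 256-bit streams

# One shared ramp source for both inputs -> maximally correlated streams.
ramp = [k / N for k in range(N)]
# Two nested counters that jointly enumerate every (i, j) pair -> uncorrelated streams.
fast = [(k % N_SIDE) / N_SIDE for k in range(N)]
slow = [(k // N_SIDE) / N_SIDE for k in range(N)]

pa, pb = 0.5, 0.75
corr_and = value(and_gate(to_stream(pa, ramp), to_stream(pb, ramp)))
indep_and = value(and_gate(to_stream(pa, fast), to_stream(pb, slow)))
print(corr_and, indep_and)  # 0.5 (= min) and 0.375 (= product)
```

The same gate gives two different answers purely because of how the input streams were generated, which is why SC designs need circuits that can adjust correlation cheaply in-stream.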
Pub Date: 2021-11-01 | DOI: 10.1109/ICCAD51958.2021.9643574
Tingyuan Liang, Gengjie Chen, Jieru Zhao, Sharad Sinha, Wei Zhang
To enable performance optimization of application mapping on modern field-programmable gate arrays (FPGAs), certain critical-path portions of a design may be prearranged into many multi-cell macros during synthesis. These movable macros, with their shape and resource constraints, make mixed-size placement for FPGA designs challenging, and previous analytical placers cannot address it. In this work, we propose AMF-Placer, an open-source Analytical Mixed-size FPGA placer that supports mixed-size placement on FPGAs and provides an interface to Xilinx Vivado. To speed up convergence and improve placement quality, AMF-Placer is equipped with a series of new techniques for wirelength optimization, cell spreading, packing, and legalization. On a set of the latest large open-source benchmarks from various domains targeting Xilinx UltraScale FPGAs, experimental results indicate that AMF-Placer improves HPWL by 20.4%-89.3% and reduces runtime by 8.0%-84.2% compared to the baseline. Furthermore, by exploiting the parallelism of the proposed algorithms, the placement procedure can be accelerated by 2.41x on average with 8 threads.
Title: AMF-Placer: High-Performance Analytical Mixed-size Placer for FPGA
Venue: 2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD) | https://doi.org/10.1109/ICCAD51958.2021.9643574
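AMF-Placer's quality results are reported as half-perimeter wirelength (HPWL), the same metric the Starfish abstract cites for placement. HPWL for one net is the width plus the height of the bounding box around its pins, summed over all nets. The sketch below evaluates that exact metric, assuming each pin sits at its cell's placed location; the data layout and names are illustrative. Analytical placers typically optimize a smoother surrogate of HPWL during global placement and report the exact value afterward.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def net_hpwl(pins: List[Point]) -> float:
    """Half-perimeter of the bounding box enclosing a net's pins."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_hpwl(nets: Dict[str, List[str]], pos: Dict[str, Point]) -> float:
    """Sum HPWL over all nets, using each connected cell's placed location as its pin."""
    return sum(net_hpwl([pos[c] for c in cells]) for cells in nets.values())


if __name__ == "__main__":
    pos = {"a": (0, 0), "b": (3, 4), "c": (1, 1), "d": (5, 2)}
    nets = {"n1": ["a", "b", "c"], "n2": ["c", "d"]}
    print(total_hpwl(nets, pos))  # n1: 3 + 4, n2: 4 + 1
```

For two-pin nets HPWL equals the Manhattan distance, and for multi-pin nets it is a lower bound on any rectilinear Steiner tree, which is why it is cheap and widely used during placement.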
Pub Date: 2021-11-01 | DOI: 10.1109/ICCAD51958.2021.9643439
Andreas Krinke, Shubham Rai, Akash Kumar, J. Lienig
Recently proposed ambipolar nanotechnologies enable reconfigurable circuits with lower area and power overheads than conventional CMOS technology. However, using a conventional physical synthesis flow for circuits that include gates based on reconfigurable FETs (RFETs) leads to sub-optimal results, because the physical synthesis flow for RFET-based circuits has to cater to the additional gate terminal of each RFET. In the present work, we explore three important directions that lead to an optimized physical synthesis flow for RFET-based circuits with circuit-level reconfigurability: (1) designing optimized layouts of reconfigurable gates, (2) utilizing special driver cells to drive the reconfigurable portions of a circuit, and (3) optimizing the placement of these reconfigurable parts in separate power domains. Experimental evaluations on the EPFL benchmarks show that our proposed approach reduces chip area by up to 17.5% compared to conventional flows.
Title: Exploring Physical Synthesis for Circuits based on Emerging Reconfigurable Nanotechnologies
Venue: 2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD) | https://doi.org/10.1109/ICCAD51958.2021.9643439
Pub Date: 2021-11-01 | DOI: 10.1109/ICCAD51958.2021.9643484
Wei Zeng, A. Davoodi, R. Topaloglu
Recent years have seen promising studies on machine learning (ML) techniques applied to approximate logic synthesis (ALS), especially those based on reconstructing logic from samples of input-output pairs. This "sampling-based ALS" supports integration with conventional logic synthesis and optimization techniques, as well as synthesis for a constrained input space (e.g., when primary input values are restricted using Boolean relations). To achieve effective sampling-based ALS, this paper proposes, for the first time, the use of adaptive decision trees (ADTs), and in particular variations guided by explainable ML. We adopt SHAP importance, a feature-importance metric derived from a recent advance in explainable ML, to guide the training of ADTs. We also include approximation techniques for ADTs that are specifically designed for ALS, including don't-care bit assertion and instantiation. Comprehensive experiments on 15 logic functions from the IWLS'20 benchmark suite show that we achieve 39%-42% area reduction with a 0.20%-0.22% error rate on average.
Title: Sampling-Based Approximate Logic Synthesis: An Explainable Machine Learning Approach
Venue: 2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD) | https://doi.org/10.1109/ICCAD51958.2021.9643484
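SHAP importance is built on Shapley values, which split a prediction among input features by averaging each feature's marginal contribution over all coalitions of the other features. The sketch below computes exact Shapley values for a small Boolean function by brute-force enumeration. The value of a coalition is defined here as the expected output when the coalition's inputs are fixed to `x` and the remaining inputs are uniformly random; that choice, and the function names, are assumptions for illustration. Enumeration is exponential in the input count, so practical tools use faster algorithms (such as TreeSHAP for tree models), and this is not the paper's ADT training procedure.

```python
from fractions import Fraction
from itertools import combinations, product
from math import factorial
from typing import Callable, List, Sequence, Tuple

BoolFn = Callable[[Tuple[int, ...]], int]


def coalition_value(f: BoolFn, x: Sequence[int], subset: Tuple[int, ...], n: int) -> Fraction:
    """E[f] with inputs in `subset` fixed to x and all other inputs uniform over {0, 1}."""
    free = [i for i in range(n) if i not in subset]
    total = 0
    for bits in product((0, 1), repeat=len(free)):
        z = list(x)
        for i, b in zip(free, bits):
            z[i] = b
        total += f(tuple(z))
    return Fraction(total, 2 ** len(free))


def shapley(f: BoolFn, x: Sequence[int], n: int) -> List[Fraction]:
    """Exact Shapley value of each input bit, enumerating every coalition."""
    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = Fraction(0)
        for k in range(len(others) + 1):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of size k.
            weight = Fraction(factorial(k) * factorial(n - k - 1), factorial(n))
            for s in combinations(others, k):
                with_i = tuple(sorted(s + (i,)))
                phi += weight * (coalition_value(f, x, with_i, n) - coalition_value(f, x, s, n))
        phis.append(phi)
    return phis


if __name__ == "__main__":
    # f depends only on inputs 0 and 1, so input 2 should receive zero importance.
    print(shapley(lambda z: z[0] & z[1], (1, 1, 0), 3))
```

The values sum to f(x) minus the average output (here 1 - 1/4), and an input the function ignores gets exactly zero, which is the property that makes such scores useful for deciding which input bits a synthesized approximation can drop.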
Pub Date: 2021-11-01 | DOI: 10.1109/ICCAD51958.2021.9643465
Nagadastagiri Challapalle, Karthik Swaminathan, Nandhini Chandramoorthy, V. Narayanan
Graph data structures are central to many applications such as social networks, citation networks, molecular interactions, and navigation systems. Graph Convolutional Networks (GCNs) are used to process and learn insights from graph data for tasks such as link prediction, node classification, and learning node embeddings. The compute and memory access characteristics of GCNs differ from those of both conventional graph analytics algorithms and convolutional neural networks, rendering existing accelerators for graph analytics and for deep learning inefficient. In this work, we propose PIM-GCN, a crossbar-based processing-in-memory (PIM) accelerator architecture for GCNs. PIM-GCN incorporates a node-stationary dataflow with support for both Compressed Sparse Row (CSR) and Compressed Sparse Column (CSC) graph data representations. We propose techniques for graph traversal in the compressed sparse domain, as well as for feature aggregation and feature transformation operations in GCNs, mapped to in-situ analog compute functions of crossbar memory, and we present the trade-offs in performance, energy, and scalability of the PIM-GCN architecture for the CSR and CSC graph data representations. PIM-GCN shows an average speedup of over 3-16× and an average energy reduction of 4-12× compared to existing accelerator architectures.
Title: Crossbar based Processing in Memory Accelerator Architecture for Graph Convolutional Networks
Venue: 2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD) | https://doi.org/10.1109/ICCAD51958.2021.9643465
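The PIM-GCN abstract splits a GCN layer into graph traversal with feature aggregation over a compressed sparse graph, followed by feature transformation. A software-only sketch of those two steps with a CSR adjacency is below: `indptr[v]:indptr[v + 1]` slices out node v's neighbor list in `indices`. Plain unnormalized sum aggregation without self-loops is an assumption for brevity (the standard GCN formulation uses a normalized adjacency with self-loops), and nothing here models PIM-GCN's node-stationary dataflow or crossbar mapping.

```python
from typing import List


def csr_aggregate(indptr: List[int], indices: List[int],
                  features: List[List[float]]) -> List[List[float]]:
    """Sum the feature vectors of each node's neighbors using a CSR adjacency."""
    dim = len(features[0])
    out = []
    for v in range(len(indptr) - 1):
        acc = [0.0] * dim
        for u in indices[indptr[v]:indptr[v + 1]]:  # neighbors of node v
            for d in range(dim):
                acc[d] += features[u][d]
        out.append(acc)
    return out


def transform(h: List[List[float]], w: List[List[float]]) -> List[List[float]]:
    """Dense feature transformation H' = H W, the matrix-multiply step."""
    return [[sum(row[k] * w[k][j] for k in range(len(w))) for j in range(len(w[0]))]
            for row in h]


if __name__ == "__main__":
    # Path graph 0 - 1 - 2 stored in CSR form (each undirected edge listed both ways).
    indptr, indices = [0, 1, 3, 4], [1, 0, 2, 1]
    feats = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
    agg = csr_aggregate(indptr, indices, feats)
    print(agg)
    print(transform(agg, [[1.0, 0.0], [0.0, 2.0]]))
```

Aggregation is irregular and memory-bound (its access pattern follows `indices`), while transformation is a dense multiply, which illustrates why the abstract argues that neither graph-analytics nor CNN accelerators alone fit GCNs well.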