Pub Date: 2020-07-01 | DOI: 10.1109/DAC18072.2020.9218581
Title: FTDL: A Tailored FPGA-Overlay for Deep Learning with High Scalability
Runbin Shi, Yuhao Ding, Xuechao Wei, He Li, Hang Liu, Hayden Kwok-Hay So, Caiwen Ding
Fast inference is of paramount value to a wide range of deep learning applications. This work presents FTDL, a highly scalable FPGA overlay framework for deep learning applications that addresses the architecture-hardware mismatch faced by traditional efforts. The FTDL overlay is specifically optimized for the tiled structure of FPGAs, achieving post-place-and-route operating frequencies exceeding 88% of the theoretical maximum across different devices and design scales. A flexible compilation framework efficiently schedules the matrix-multiply and convolution operations of large neural network inference onto the overlay and achieves over 80% hardware efficiency on average. Taking advantage of both the high operating frequency and the high hardware efficiency, FTDL achieves 402.6 and 151.2 FPS on ImageNet with GoogLeNet and ResNet-50, respectively, while operating at a power efficiency of 27.6 GOPS/W, making it up to 7.7× faster and 1.9× more power-efficient than the state of the art.
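To make the mapping idea concrete, the sketch below shows how a large matrix multiply can be broken into tile-sized jobs, which is the kind of work unit an overlay scheduler dispatches to the FPGA's hardware tiles. This is a minimal illustration, not code from FTDL; the tile size, loop order, and function name are assumptions.

```python
import numpy as np

def tiled_matmul(A, B, tile=32):
    """Toy tiled GEMM: each (tile x tile) block job is what an overlay
    scheduler would dispatch to one hardware tile. Shapes are assumed to
    be multiples of the tile size for brevity."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2 and M % tile == 0 and N % tile == 0 and K % tile == 0
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(0, M, tile):          # output row blocks
        for j in range(0, N, tile):      # output column blocks
            for k in range(0, K, tile):  # reduction blocks
                # one "tile job": a small GEMM that fits the PE array
                C[i:i+tile, j:j+tile] += A[i:i+tile, k:k+tile] @ B[k:k+tile, j:j+tile]
    return C

A = np.random.rand(64, 64)
B = np.random.rand(64, 64)
assert np.allclose(tiled_matmul(A, B), A @ B)
```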
Pub Date: 2020-07-01 | DOI: 10.1109/DAC18072.2020.9218517
Title: ReTriple: Reduction of Redundant Rendering on Android Devices for Performance and Energy Optimizations
Xianfeng Li, Gengchao Li, Xiaole Cui
Graphics rendering is compute-intensive and a major source of energy consumption on battery-powered mobile devices. Unlike existing works that degrade the user experience or reuse rendering results only coarsely, we propose ReTriple, a fine-grained scheme that reduces rendering workload by reusing past rendering results at the UI-element level. This fine-grained reuse mechanism exposes more opportunities to reduce the workload of the rendering process and save energy. Experiments with popular apps show that ReTriple achieves an average speedup of 2.6x and a per-frame energy saving of 32.3% for the rendering process while improving the user experience.
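The element-level reuse idea can be pictured as memoizing rendered output keyed by each UI element's drawing state. The sketch below is a generic illustration of that caching pattern, not ReTriple's implementation inside the Android rendering pipeline; the class and function names are invented for the example.

```python
import hashlib

class ElementRenderCache:
    """Toy illustration of element-level render reuse: a rendered result is
    reused whenever the element's drawing state hashes to a previously seen
    key, so only changed elements are re-rendered."""
    def __init__(self):
        self._cache = {}

    @staticmethod
    def _key(element_state: dict) -> str:
        return hashlib.sha1(repr(sorted(element_state.items())).encode()).hexdigest()

    def render(self, element_state: dict, draw_fn):
        key = self._key(element_state)
        if key not in self._cache:          # cache miss: do the expensive draw
            self._cache[key] = draw_fn(element_state)
        return self._cache[key]             # cache hit: reuse the past result

# Usage: the second call with identical state skips the expensive draw.
cache = ElementRenderCache()
expensive_draw = lambda s: f"bitmap({s['text']},{s['size']})"
frame1 = cache.render({"text": "OK", "size": 14}, expensive_draw)
frame2 = cache.render({"text": "OK", "size": 14}, expensive_draw)
assert frame1 is frame2
```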
Pub Date: 2020-07-01 | DOI: 10.1109/DAC18072.2020.9218580
Title: Wafer Map Defect Patterns Classification using Deep Selective Learning
M. Alawieh, D. Boning, D. Pan
With the continuous drive toward integrated-circuit scaling, efficient yield analysis is becoming more crucial yet more challenging. In this paper, we propose a novel methodology for wafer map defect pattern classification using deep selective learning. Our approach features an integrated reject option whereby the model abstains from predicting a class label when the misclassification risk is high, providing a trade-off between prediction coverage and misclassification risk. This selective learning scheme enables new defect class detection, concept-shift detection, and resource allocation. In addition, to address the class imbalance problem in wafer map classification, we propose a data augmentation framework built around a convolutional auto-encoder for synthetic sample generation. The efficacy of our approach is demonstrated on the WM-811k industrial dataset, where it achieves 94% accuracy under full coverage and 99% with selective learning, while successfully detecting new defect types.
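The reject option described above can be illustrated with a simple confidence-threshold rule: predict only when the model is confident enough, otherwise abstain. The paper learns its selection mechanism jointly with the classifier, so the thresholded sketch below only shows the coverage/risk trade-off; the threshold value and toy probabilities are assumptions.

```python
import numpy as np

def selective_predict(probs, threshold=0.9):
    """Toy reject option: predict the arg-max class only when the model's
    confidence clears a threshold, otherwise abstain (label -1).
    Lowering the threshold raises coverage but also misclassification risk."""
    conf = probs.max(axis=1)
    preds = probs.argmax(axis=1)
    preds[conf < threshold] = -1          # abstain on low-confidence samples
    coverage = float((preds != -1).mean())
    return preds, coverage

# Example: softmax outputs for 3 wafer maps over 4 defect classes.
probs = np.array([[0.97, 0.01, 0.01, 0.01],   # confident  -> predicted
                  [0.40, 0.35, 0.15, 0.10],   # ambiguous  -> abstain
                  [0.05, 0.92, 0.02, 0.01]])  # confident  -> predicted
preds, coverage = selective_predict(probs, threshold=0.9)
print(preds, coverage)                        # [ 0 -1  1] 0.666...
```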
Pub Date: 2020-07-01 | DOI: 10.1109/DAC18072.2020.9218634
Title: Late Breaking Results: Pole-aware Analog Placement Considering Monotonic Current Flow and Crossing-Wire Minimization
Abhishek Patyal, Hung-Ming Chen, Mark Po-Hung Lin
This paper presents a new paradigm for analog placement that incorporates pole awareness in addition to symmetry-island and monotonic-current-flow considerations while minimizing wire crossings. The nodes along the signal path of an analog circuit contribute to its poles, and the parasitics on the dominant poles can significantly limit circuit performance. Although the monotonic placements introduced in previous works can generate simpler routing topologies, ignoring the poles, in particular the dominant pole and the first non-dominant pole, as well as wire crossings among critical nets, may increase the wire load and degrade performance. Experimental results show that the proposed pole-aware analog placement method, which considers symmetry islands, monotonic current flow, and crossing-wire minimization, achieves much better solution quality in terms of circuit performance.
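As background for why wire parasitics on a dominant-pole node hurt performance, a first-order, textbook estimate of the pole at a node with output resistance R_out, device load C_L, and wire parasitic C_wire (an illustration, not the paper's model) is:

```latex
f_{p} \approx \frac{1}{2\pi\, R_{\mathrm{out}} \left( C_{L} + C_{\mathrm{wire}} \right)}
```

A placement that shortens the wiring attached to that node lowers C_wire and pushes the pole, and hence the achievable bandwidth, higher, which is why the placer treats pole nodes as first-class objectives.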
Pub Date: 2020-07-01 | DOI: 10.1109/DAC18072.2020.9218649
Title: Towards Purposeful Design Space Exploration of Heterogeneous CGRAs: Clock Frequency Estimation
D. Wolf, Christoph Spang, C. Hochberger
Coarse-Grained Reconfigurable Arrays (CGRAs) are becoming increasingly popular. Besides research on scheduling algorithms and microarchitecture concepts, the use of heterogeneous structures can be a key approach to exploiting their full potential. Unfortunately, purposeful design space exploration of CGRAs is not trivial, since one needs to know the clock frequency of the resulting hardware implementation. This paper discusses the challenges and presents a statistical approach to estimating the maximum clock frequency of heterogeneous CGRAs with an irregular interconnect on FPGAs. The presented approach achieves a maximum estimation error of 8.8% to 17.4% and a mean error of only 1.9% to 4.6%.
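The abstract does not detail the statistical model, but the flavor of such an estimator can be sketched as a regression from CGRA configuration features to post-place-and-route Fmax samples. Everything below, including the feature choice and the synthetic data, is assumed for illustration only.

```python
import numpy as np

# Toy statistical Fmax estimator: fit a linear model from simple CGRA
# configuration features (array size, heterogeneous-PE count, interconnect
# hops) to observed Fmax samples, then predict unseen configurations.
rng = np.random.default_rng(0)
features = rng.uniform([4, 0, 1], [16, 8, 4], size=(40, 3))   # [rows, special PEs, max hops]
true_w = np.array([-3.0, -1.5, -12.0])
fmax = 250.0 + features @ true_w + rng.normal(0, 4.0, 40)     # MHz, synthetic samples

X = np.hstack([features, np.ones((40, 1))])                   # add an intercept column
w, *_ = np.linalg.lstsq(X, fmax, rcond=None)                  # ordinary least squares

new_cfg = np.array([8, 2, 2, 1.0])                            # candidate CGRA configuration
print(f"estimated Fmax: {new_cfg @ w:.1f} MHz")
```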
Pub Date: 2020-07-01 | DOI: 10.1109/DAC18072.2020.9218705
Title: Deep Learning Multi-Channel Fusion Attack Against Side-Channel Protected Hardware
Benjamin Hettwer, Daniel Fennes, S. Leger, Jan Richter-Brockmann, Stefan Gehrer, T. Güneysu
State-of-the-art hardware masking approaches such as threshold implementations and domain-oriented masking provide a guaranteed level of security even in the presence of glitches. Although provably secure in theory, recent work has shown that the effective security order of a masked hardware implementation can be lowered by applying a multi-probe attack or by exploiting externally amplified coupling effects. However, the proposed attacks are based on an unrealistic adversary model (i.e., knowledge of mask values during profiling) or require complex manipulations of the measurement setup. In this work, we propose a novel attack vector that exploits location-dependent leakage from several decoupling capacitors of a modern System-on-Chip (SoC) fabricated in a 16 nm technology. We combine the leakage from the different sources using a deep-learning-based information fusion approach. The results show a remarkable reduction in the number of traces required for a successful key recovery compared to state-of-the-art profiled side-channel attacks. All evaluations are performed under realistic conditions, resulting in a real-world attack scenario that is not limited to academic environments.
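The fusion step can be pictured as a network with one encoder branch per measurement channel whose features are concatenated before classifying the key-byte hypothesis. The PyTorch sketch below is a generic illustration under that assumption; the layer sizes, trace length, and channel count are not taken from the paper.

```python
import torch
import torch.nn as nn

class FusionNet(nn.Module):
    """Toy information-fusion model: one 1-D CNN branch per measurement
    channel (e.g. per decoupling capacitor), features concatenated and
    classified into 256 key-byte hypotheses. All sizes are illustrative."""
    def __init__(self, n_channels=4, trace_len=1000, n_classes=256):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(nn.Conv1d(1, 8, 11, padding=5), nn.ReLU(),
                          nn.AdaptiveAvgPool1d(32), nn.Flatten())
            for _ in range(n_channels)
        ])
        self.head = nn.Sequential(nn.Linear(n_channels * 8 * 32, 128),
                                  nn.ReLU(), nn.Linear(128, n_classes))

    def forward(self, traces):            # traces: (batch, n_channels, trace_len)
        feats = [b(traces[:, i:i + 1, :]) for i, b in enumerate(self.branches)]
        return self.head(torch.cat(feats, dim=1))

model = FusionNet()
logits = model(torch.randn(2, 4, 1000))   # two measurements, four probe channels
print(logits.shape)                       # torch.Size([2, 256])
```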
Pub Date: 2020-07-01 | DOI: 10.1109/DAC18072.2020.9218538
Title: FlexReduce: Flexible All-reduce for Distributed Deep Learning on Asymmetric Network Topology
Jinho Lee, Inseok Hwang, Soham Shah, Minsik Cho
We propose FlexReduce, an efficient and flexible all-reduce algorithm for distributed deep learning under irregular network hierarchies. With ever-growing deep neural networks, distributed learning over multiple nodes is becoming imperative for expedited training. Several existing approaches leverage a symmetric network structure to optimize performance across the different hierarchy levels of the network. However, the assumption of a symmetric network does not always hold, especially in shared cloud environments. By allocating an uneven portion of the gradients to each learner (GPU), FlexReduce outperforms conventional algorithms on asymmetric network structures and performs as well as or better than them on symmetric networks.
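The core idea of uneven allocation can be sketched as splitting the gradient proportionally to each learner's effective link bandwidth, so slower links carry fewer bytes during the reduce-scatter and all-gather phases. The proportional rule below is an assumption used for illustration; FlexReduce's actual partitioning and communication schedule are more involved.

```python
import numpy as np

def uneven_partition(grad_size, bandwidths):
    """Toy bandwidth-proportional split of a flat gradient across learners.
    Returns an (offset, length) slice per GPU."""
    bw = np.asarray(bandwidths, dtype=float)
    shares = np.floor(grad_size * bw / bw.sum()).astype(int)
    shares[-1] += grad_size - shares.sum()        # hand the rounding remainder to one learner
    offsets = np.concatenate([[0], np.cumsum(shares)[:-1]])
    return list(zip(offsets.tolist(), shares.tolist()))

# Four GPUs behind links of 10, 10, 25 and 5 Gb/s sharing a 1M-element gradient.
print(uneven_partition(1_000_000, [10, 10, 25, 5]))
# [(0, 200000), (200000, 200000), (400000, 500000), (900000, 100000)]
```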
Pub Date: 2020-07-01 | DOI: 10.1109/DAC18072.2020.9218636
Title: KFR: Optimal Cache Management with K-Framed Reclamation for Drive-Managed SMR Disks
Chenlin Ma, Yi Wang, Zhaoyan Shen, Z. Shao
Shingled Magnetic Recording (SMR) disks have been proposed as a promising solution to meet the increasing capacity demands of the big data era. Drive-Managed SMR (DM-SMR) disks, which act as traditional block devices, are favored for their high compatibility. However, DM-SMR disks suffer from a long performance recovery time (PRT) due to the "SMR space reclamation" issue. This paper proposes an optimal cache management scheme named K-Framed Reclamation (KFR) to minimize PRT within the DM-SMR disk. The effectiveness of the proposed design was evaluated with realistic and intensive I/O workloads, and the results are encouraging.
Pub Date: 2020-07-01 | DOI: 10.1109/DAC18072.2020.9218677
Title: Hardware-assisted Service Live Migration in Resource-limited Edge Computing Systems
Zhe Zhou, Xintong Li, Xiaoyang Wang, Zheng Liang, Guangyu Sun, Guojie Luo
Service live migration moves running services from one machine to another with negligible service downtime and has been considered a powerful mechanism for service management. However, conventional live migration methods come with a high data-transmission cost and therefore can hardly be applied directly to real-world edge computing systems, where network bandwidth is limited. To tackle this problem, several recent works present techniques to reduce the amount of data transmitted. These techniques, however, introduce extra computational costs, which strongly affect the quality of service (QoS), especially in edge systems containing many nodes with insufficient computational resources. To alleviate this issue, we propose offloading the data-reduction computations to a dedicated hardware accelerator, reducing the burden on the CPU cores. To this end, we present a novel hardware accelerator design that speeds up the data-transmission-reduction computations and thereby accelerates service live migration. For evaluation, we implement a prototype on an FPGA platform. Compared to CPU-based approaches, our specialized accelerator is 3.1× faster and 2.9× more energy-efficient, and it reduces total migration time by 29%∼47% and service downtime by 24%∼40% in our cases. Furthermore, our architecture scales well and is easily configurable to balance cost and performance.
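The abstract does not name the specific data-reduction technique, but a common one in pre-copy live migration is dirty-page detection plus compression, which is the kind of per-page work the paper offloads to its accelerator. The CPU-only sketch below is an assumed illustration of that step; the page size, function name, and zlib choice are not from the paper.

```python
import zlib

PAGE = 4096

def dirty_page_deltas(old_mem: bytes, new_mem: bytes):
    """Toy data-reduction step for pre-copy live migration: only pages that
    changed since the last round are sent, and each dirty page is compressed
    before hitting the narrow edge-network link. Offloading exactly this kind
    of per-page work to an accelerator is the idea described above."""
    payload = []
    for off in range(0, len(new_mem), PAGE):
        old_page = old_mem[off:off + PAGE]
        new_page = new_mem[off:off + PAGE]
        if new_page != old_page:                      # dirty-page detection
            payload.append((off, zlib.compress(new_page)))
    return payload

old = bytes(PAGE * 4)                                 # 4 zeroed pages of guest memory
new = bytearray(old)
new[PAGE:PAGE + 8] = b"modified"                      # dirty exactly one page
pages = dirty_page_deltas(bytes(old), bytes(new))
print(len(pages), len(pages[0][1]))                   # 1 dirty page, its compressed size
```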
Pub Date: 2020-07-01 | DOI: 10.1109/DAC18072.2020.9218556
Title: A Robust Exponential Integrator Method for Generic Nonlinear Circuit Simulation
Quan Chen
In this paper, we address two long-standing issues in large-scale transient circuit simulation using the exponential integrator (EI) method. The first is the numerical instability caused by the singularity of the differential-algebraic equation system. Our proposed solution is a systematic, algebraic, and sparsity-preserving regularization technique that eliminates the unstable modes in the system to be solved. The second is nonlinearity handling: we devise a generic scheme to apply Newton-Raphson iterations within the EI framework. With these two techniques, we aim to improve the robustness and performance of EI and make it a competitive alternative to existing SPICE-type simulators in practical use.
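For readers unfamiliar with exponential integrators, the sketch below shows a single exponential Rosenbrock-Euler step for a plain ODE, evaluating phi1(hJ)v through a linear solve. It only illustrates the EI family; it does not implement the paper's regularization for the DAE singularity or its Newton-Raphson scheme, and the nonsingular-Jacobian assumption is mine.

```python
import numpy as np
from scipy.linalg import expm

def exp_rosenbrock_euler_step(y, h, f, jac):
    """One exponential Rosenbrock-Euler step for y' = f(y):
        y_{n+1} = y_n + h * phi1(h*J) * f(y_n),   phi1(z) = (e^z - 1)/z,
    with J the Jacobian at y_n. phi1(hJ)v is computed via a linear solve,
    which assumes J is nonsingular (a plain-ODE simplification)."""
    J = jac(y)
    hJ = h * J
    v = f(y)
    phi1_v = np.linalg.solve(hJ, (expm(hJ) - np.eye(len(y))) @ v)
    return y + h * phi1_v

# Toy nonlinear system: y' = -y + 0.1*y**2, applied element-wise.
f = lambda y: -y + 0.1 * y**2
jac = lambda y: np.diag(-1.0 + 0.2 * y)
y = np.array([1.0, 0.5])
for _ in range(10):                       # ten steps of size h = 0.1
    y = exp_rosenbrock_euler_step(y, 0.1, f, jac)
print(y)
```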