Advances in semiconductor technology have enabled unprecedented growth in safety-critical applications. However, due to unabated scaling, the underlying hardware is becoming less reliable. For many applications, merely recovering from errors is not enough: the latency from the occurrence of a fault to its detection and recovery, i.e., in-time error resilience, is of vital importance. This is especially true for real-time applications, where the timing of application events is a crucial part of the correctness of the application. Software techniques for resilience are highly desirable because they can be applied flexibly, but achieving reliable, in-time software resilience is still an elusive goal. A new class of recent techniques has started to tackle this problem. This paper presents a succinct overview of existing software resilience techniques from the point of view of in-time resilience, and points out future challenges.
{"title":"Software Approaches for In-time Resilience","authors":"Aviral Shrivastava, Moslem Didehban","doi":"10.1145/3316781.3323487","DOIUrl":"https://doi.org/10.1145/3316781.3323487","url":null,"PeriodicalId":391209,"journal":{"name":"Proceedings of the 56th Annual Design Automation Conference 2019","volume":"126 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122251621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
K. Kunal, Meghna Madhusudan, Arvind Sharma, Wenbin Xu, S. Burns, R. Harjani, Jiang Hu, D. Kirkpatrick, Sachin S. Sapatnekar
This paper presents analog layout automation efforts under the ALIGN (“Analog Layout, Intelligently Generated from Netlists”) project for fast layout generation using a modular approach based on a mix of algorithmic and machine-learning-based tools. The road to rapid turnaround is based on an approach that detects structure and hierarchy in the input netlist and uses a grid-based philosophy for layout. The paper provides a view of the current status of the project, challenges in developing open-source code with an academic/industry team, and nuts-and-bolts issues such as working with abstracted PDKs, navigating the “wall” between secured IP and open-source software, and securing access to example designs.
{"title":"ALIGN","authors":"K. Kunal, Meghna Madhusudan, Arvind Sharma, Wenbin Xu, S. Burns, R. Harjani, Jiang Hu, D. Kirkpatrick, Sachin S. Sapatnekar","doi":"10.1145/3316781.3323471","DOIUrl":"https://doi.org/10.1145/3316781.3323471","url":null,"PeriodicalId":391209,"journal":{"name":"Proceedings of the 56th Annual Design Automation Conference 2019","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130328906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Concerted efforts by academia and industry (e.g., IBM, Google, and Intel) have brought us to the era of Noisy Intermediate-Scale Quantum (NISQ) computers. Qubits, the basic elements of a quantum computer, have proven extremely susceptible to various sources of noise. Recent experiments have exhibited spatial variations among the qubits in NISQ hardware. Therefore, conventional qubit mapping done without quality awareness results in a significant loss of fidelity for a given workload. In this paper, we analyze the effects of various noise sources on the overall fidelity of a given workload on real NISQ hardware. We also present a novel optimization technique, Qubit Re-allocation (QURE), to maximize the sequence fidelity of a given workload. QURE is scalable and can be applied to future large-scale quantum computers. QURE can improve the fidelity of a quantum workload by up to 1.54X (1.39X on average) in simulation and by up to 1.7X on a real device compared to variation-oblivious qubit allocation, without incurring any physical overhead. CCS CONCEPTS • Hardware → Quantum error correction and fault tolerance;
{"title":"QURE","authors":"Abdullah Ash-Saki, M. Alam, Swaroop Ghosh","doi":"10.1145/3316781.3317888","DOIUrl":"https://doi.org/10.1145/3316781.3317888","url":null,"PeriodicalId":391209,"journal":{"name":"Proceedings of the 56th Annual Design Automation Conference 2019","volume":"130 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116034989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
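The idea of variation-aware qubit allocation in the QURE abstract can be illustrated with a toy sketch. It is not the QURE algorithm: it assumes sequence fidelity is the product of per-gate success probabilities, uses made-up error rates in a hypothetical `CNOT_ERROR` table, and searches all mappings by brute force, which only works for very small devices.

```python
from itertools import permutations

# Hypothetical two-qubit gate error rates for coupled physical qubits (made-up numbers).
CNOT_ERROR = {(0, 1): 0.020, (1, 2): 0.045, (2, 3): 0.015, (3, 4): 0.030}


def gate_error(p, q):
    """Return the error rate of a gate on physical qubits p and q, or None if uncoupled."""
    return CNOT_ERROR.get((p, q), CNOT_ERROR.get((q, p)))


def sequence_fidelity(gates, mapping):
    """Assumed model: product of (1 - error) over all two-qubit gates in the workload."""
    fidelity = 1.0
    for a, b in gates:
        err = gate_error(mapping[a], mapping[b])
        if err is None:
            return 0.0  # toy model: a gate on uncoupled qubits cannot run
        fidelity *= 1.0 - err
    return fidelity


def best_allocation(gates, n_logical, n_physical):
    """Brute-force search over logical-to-physical mappings (illustration only)."""
    candidates = permutations(range(n_physical), n_logical)
    best = max(candidates, key=lambda m: sequence_fidelity(gates, m))
    return best, sequence_fidelity(gates, best)


# Workload: gates between logical qubits (0,1), (1,2), (0,1).
workload = [(0, 1), (1, 2), (0, 1)]
oblivious = sequence_fidelity(workload, (0, 1, 2))  # variation-oblivious default mapping
mapping, aware = best_allocation(workload, 3, 5)
print(mapping, aware / oblivious)
```

In this toy setting the search moves the workload onto the physical qubits with the lowest gate error rates, and the printed ratio shows the fidelity gain over the default mapping.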
M. Imani, Alice Sokolova, Ricardo Garcia, Andrew Huang, Fan Wu, Baris Aksanli, Tajana Rosing
In a data-hungry world, approximate computing has emerged as one of the solutions for building faster and more energy-efficient systems while providing application-tailored quality. In this paper, we propose ApproxLP, an Approximate Multiplier based on Linear Planes. We introduce an iterative method for approximating the product of two operands using fitted linear functions with two inputs, referred to as linear planes. The linearization of multiplication allows multiplication operations to be completely replaced with weighted addition. The proposed technique is used to find the significand of the product of two floating-point numbers, decreasing the high energy cost of floating-point arithmetic. Our method fully exploits the trade-off between accuracy and energy consumption by offering various degrees of approximation at different energy costs. As the level of approximation increases, the approximated product asymptotically approaches the exact product in an iterative manner. The performance of ApproxLP is evaluated over a range of multimedia and machine learning applications. A GPU enhanced by ApproxLP yields significant energy-delay product (EDP) improvement. For multimedia, neural network, and hyperdimensional computing applications, ApproxLP offers on average 2.4×, 2.7×, and 4.3× EDP improvement, respectively, with sufficient computational quality for the application. ApproxLP also provides up to 4.5× EDP improvement and has 2.3× lower chip area than other state-of-the-art approximate multipliers. CCS CONCEPTS • Hardware → Integrated circuits; • Computer systems organization → Architectures;
{"title":"ApproxLP","authors":"M. Imani, Alice Sokolova, Ricardo Garcia, Andrew Huang, Fan Wu, Baris Aksanli, Tajana Rosing","doi":"10.1145/3316781.3317774","DOIUrl":"https://doi.org/10.1145/3316781.3317774","url":null,"PeriodicalId":391209,"journal":{"name":"Proceedings of the 56th Annual Design Automation Conference 2019","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130476940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
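The claim in the ApproxLP abstract that a product can be replaced by a weighted sum of its operands can be illustrated with a minimal sketch. This is not the paper's fitting or iteration scheme: it assumes significands in [1, 2), uses the tangent plane of x·y at the centre of each cell of an n-by-n grid, and the names `plane_product` and `max_abs_error` are hypothetical.

```python
def plane_product(x, y, n=1):
    """Approximate x*y for x, y in [1, 2) with one tangent plane per cell of an n-by-n grid."""
    step = 1.0 / n
    # Centre of the grid cell that contains (x, y).
    x0 = 1.0 + (int((x - 1.0) / step) + 0.5) * step
    y0 = 1.0 + (int((y - 1.0) / step) + 0.5) * step
    # Tangent plane of x*y at (x0, y0): a weighted sum of x and y plus a constant,
    # where the weights y0 and x0 are fixed per cell.
    return y0 * x + x0 * y - x0 * y0


def max_abs_error(n, samples=200):
    """Largest absolute error of the plane approximation on a sample grid over [1, 2)."""
    pts = [1.0 + i / samples for i in range(samples)]
    return max(abs(plane_product(x, y, n) - x * y) for x in pts for y in pts)


coarse = max_abs_error(1)  # a single plane for the whole domain
fine = max_abs_error(4)    # 16 planes, one per grid cell
print(coarse, fine)
```

The error of each plane is (x - x0)(y - y0), so it is bounded by (1/(2n))² and shrinks as more planes are used, which mirrors the accuracy versus cost trade-off the abstract describes.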
Lithography simulation is one of the most fundamental steps in process modeling and physical verification. Conventional simulation methods incur a tremendous computational cost to achieve high accuracy. Recently, machine learning was introduced to trade off accuracy against runtime by speeding up the resist modeling stage of the simulation flow. In this work, we propose LithoGAN, an end-to-end lithography modeling framework based on a generative adversarial network (GAN), to map the input mask patterns directly to the output resist patterns. Our experimental results show that LithoGAN can predict resist patterns with high accuracy while achieving orders-of-magnitude speedup compared to conventional lithography simulation and the previous machine-learning-based approach.
{"title":"LithoGAN","authors":"Wei Ye, M. Alawieh, Yibo Lin, D. Pan","doi":"10.1145/3316781.3317852","DOIUrl":"https://doi.org/10.1145/3316781.3317852","url":null,"PeriodicalId":391209,"journal":{"name":"Proceedings of the 56th Annual Design Automation Conference 2019","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127647568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Designers wait several hours to get synthesis, placement, and routing results even for small changes. Commercial FPGA flows allow for resynthesis after code changes; however, they target large code changes, and their incremental flows are not very effective. We propose SMatch, a flow for FPGAs with novel incremental elaboration and novel incremental FPGA placement and routing that improves on the state of the art by reducing the amount of placement and routing work needed. We evaluate our approach against commercial FPGA flows. Our method finishes synthesis, placement, and routing in under 30s for most changes to publicly available benchmarks with negligible QoR impact, and is over 20× faster than existing incremental FPGA flows. CCS CONCEPTS • Hardware → Methodologies for EDA; Logic synthesis.
{"title":"SMatch","authors":"R. T. Possignolo, Josep Renau","doi":"10.1145/3316781.3317912","DOIUrl":"https://doi.org/10.1145/3316781.3317912","url":null,"PeriodicalId":391209,"journal":{"name":"Proceedings of the 56th Annual Design Automation Conference 2019","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121570675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Brendan L. West, Jian Zhou, R. Dreslinski, J. Fowlkes, O. Kripfgans, C. Chakrabarti, T. Wenisch
High volume acquisition rates are imperative for medical ultrasound imaging applications, such as 3D elastography and 3D vector flow imaging. Unfortunately, despite recent algorithmic improvements, high-volume-rate imaging remains computationally infeasible on known platforms. In this paper, we propose TETRIS, a novel hardware accelerator for ultrasound beamforming that enables volume acquisition rates up to the physics limits of acoustic propagation delay. Through algorithmic and hardware optimizations, we enable a streaming system design that outclasses previously proposed accelerators in performance while lowering hardware complexity and storage requirements. For a representative imaging task, our proposed system generates a physics-limited 13,020 volumes per second within a 2.5 W power budget. CCS CONCEPTS • Hardware → Emerging architectures; 3D integrated circuits.
{"title":"Tetris","authors":"Brendan L. West, Jian Zhou, R. Dreslinski, J. Fowlkes, O. Kripfgans, C. Chakrabarti, T. Wenisch","doi":"10.1145/3316781.3317921","DOIUrl":"https://doi.org/10.1145/3316781.3317921","url":null,"PeriodicalId":391209,"journal":{"name":"Proceedings of the 56th Annual Design Automation Conference 2019","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116877201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Mahmoodi, H. Nili, S. Larimian, Xinjie Guo, Dmitri B. Strukov
We exploit randomness in static I-V characteristics and the reconfigurability of embedded flash memories to design a very efficient physically unclonable function. Leakage current and subthreshold slope variations, nonlinearity, nondeterministic tuning error, and sneak path current in the redesigned commercial flash memory arrays are exploited to create a unique digital fingerprint. A time-multiplexed architecture is designed to enhance security and expand the challenge-response pair space to 10^211. Experimental results demonstrate 50.3% average uniformity, 49.99% average diffuseness, and a native bit error rate below 5%. The analysis of the measured data also shows strong resilience against machine learning attacks and the possibility of extremely energy-efficient operation at 0.56 pJ/b.
{"title":"ChipSecure","authors":"M. Mahmoodi, H. Nili, S. Larimian, Xinjie Guo, Dmitri B. Strukov","doi":"10.1145/3316781.3324890","DOIUrl":"https://doi.org/10.1145/3316781.3324890","url":null,"PeriodicalId":391209,"journal":{"name":"Proceedings of the 56th Annual Design Automation Conference 2019","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115011681","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tomás Picornell, J. Flich, Carles Hernández, J. Duato
The adoption of many-cores in safety-critical systems requires real-time capable networks on chip (NoCs). In this paper, we propose a new time-predictable NoC design paradigm in which contention within the network is eliminated. This new paradigm builds on the Channel Dependency Graph (CDG) and guarantees the absence of contention by design. Our delayed conflict-free NoC (DCFNoC) is able to naturally inject messages using a TDM period equal to the optimal theoretical bound, without requiring a computationally demanding offline process. Results show that DCFNoC guarantees time predictability with very low implementation cost.
{"title":"DCFNoC","authors":"Tomás Picornell, J. Flich, Carles Hernández, J. Duato","doi":"10.1145/3316781.3317794","DOIUrl":"https://doi.org/10.1145/3316781.3317794","url":null,"PeriodicalId":391209,"journal":{"name":"Proceedings of the 56th Annual Design Automation Conference 2019","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127999921","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
To satisfy increasing computing demands, heterogeneous computing platforms are gaining attention, especially CPU-FPGA platforms. Recently, tightly coupled CPU-FPGA platforms with shared coherent caches (such as the Intel HARP and IBM POWER with CAPI) have been proposed to facilitate data communication and simplify the programming model. In this work, we propose LAMA, a framework that combines static analysis and dynamic control for memory access management on such platforms, to further improve memory access efficiency and maintain data consistency. Based on implementation results on the real Intel HARP2 platform, LAMA is shown to improve performance by 34% on average with low overhead.
{"title":"LAMA","authors":"Liang Feng, Jieru Zhao, Tingyuan Liang, Sharad Sinha, Wei Zhang","doi":"10.1145/3316781.3317846","DOIUrl":"https://doi.org/10.1145/3316781.3317846","url":null,"PeriodicalId":391209,"journal":{"name":"Proceedings of the 56th Annual Design Automation Conference 2019","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128821713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}