Pub Date : 2020-07-01DOI: 10.1109/DAC18072.2020.9218673
Hüsrev Cılasun, Salonik Resch, Z. Chowdhury, Erin Olson, Masoud Zabihi, Zhengyang Zhao, Thomas J. Peterson, Jianping Wang, S. Sapatnekar, Ulya R. Karpuzcu
High resolution Fast Fourier Transform (FFT) is important for various applications while increased memory access and parallelism requirement limits the traditional hardware. In this work, we explore acceleration opportunities for high resolution FFTs in spintronic computational RAM (CRAM) which supports true in-memory processing semantics. We experiment with Spin-Torque-Transfer (STT) and Spin-Hall-Effect (SHE) based CRAMs in implementing CRAFFT, a high resolution FFT accelerator in memory. For one million point fixed-point FFT, we demonstrate that CRAFFT can provide up to 2.57× speedup and 673× energy reduction. We also provide a proof-of-concept extension to floating-point FFT.
{"title":"CRAFFT: High Resolution FFT Accelerator In Spintronic Computational RAM","authors":"Hüsrev Cılasun, Salonik Resch, Z. Chowdhury, Erin Olson, Masoud Zabihi, Zhengyang Zhao, Thomas J. Peterson, Jianping Wang, S. Sapatnekar, Ulya R. Karpuzcu","doi":"10.1109/DAC18072.2020.9218673","DOIUrl":"https://doi.org/10.1109/DAC18072.2020.9218673","url":null,"abstract":"High resolution Fast Fourier Transform (FFT) is important for various applications while increased memory access and parallelism requirement limits the traditional hardware. In this work, we explore acceleration opportunities for high resolution FFTs in spintronic computational RAM (CRAM) which supports true in-memory processing semantics. We experiment with Spin-Torque-Transfer (STT) and Spin-Hall-Effect (SHE) based CRAMs in implementing CRAFFT, a high resolution FFT accelerator in memory. For one million point fixed-point FFT, we demonstrate that CRAFFT can provide up to 2.57× speedup and 673× energy reduction. We also provide a proof-of-concept extension to floating-point FFT.","PeriodicalId":428807,"journal":{"name":"2020 57th ACM/IEEE Design Automation Conference (DAC)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130993576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2020-07-01DOI: 10.1109/DAC18072.2020.9218615
Aein Rezaei Shahmirzadi, Shahram Rasoolzadeh, A. Moradi
Protection against active physical attacks is of serious concerns of cryptographic hardware designers. Introduction of SIFA invalidating several previously-thought-effective counter-measures, made this challenge even harder. Here in this work we deal with error correction, and introduce a methodology which shows, depending on the selected adversary model, how to correctly embed error-correcting codes in a cryptographic implementation. Our construction guarantees the correction of faults, in any location of the circuit and at any clock cycle, as long as they fit into the underlying adversary model. Based on case studies evaluated by open-source fault diagnostic tools, we claim protection against SIFA.
{"title":"Impeccable Circuits II","authors":"Aein Rezaei Shahmirzadi, Shahram Rasoolzadeh, A. Moradi","doi":"10.1109/DAC18072.2020.9218615","DOIUrl":"https://doi.org/10.1109/DAC18072.2020.9218615","url":null,"abstract":"Protection against active physical attacks is of serious concerns of cryptographic hardware designers. Introduction of SIFA invalidating several previously-thought-effective counter-measures, made this challenge even harder. Here in this work we deal with error correction, and introduce a methodology which shows, depending on the selected adversary model, how to correctly embed error-correcting codes in a cryptographic implementation. Our construction guarantees the correction of faults, in any location of the circuit and at any clock cycle, as long as they fit into the underlying adversary model. Based on case studies evaluated by open-source fault diagnostic tools, we claim protection against SIFA.","PeriodicalId":428807,"journal":{"name":"2020 57th ACM/IEEE Design Automation Conference (DAC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131252374","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2020-07-01DOI: 10.1109/DAC18072.2020.9218663
Panagiota Kiourti, Kacper Wardega, Susmit Jha, Wenchao Li
We present TrojDRL, a tool for exploring and evaluating backdoor attacks on deep reinforcement learning agents. TrojDRL exploits the sequential nature of deep reinforcement learning (DRL) and considers different gradations of threat models. We show that untargeted attacks on state-of-the-art actor-critic algorithms can circumvent existing defenses built on the assumption of backdoors being targeted. We evaluated TrojDRL on a broad set of DRL benchmarks and showed that the attacks require only poisoning as little as 0.025% of the training data. Compared with existing works of backdoor attacks on classification models, TrojDRL provides a first step towards understanding the vulnerability of DRL agents.
{"title":"TrojDRL: Evaluation of Backdoor Attacks on Deep Reinforcement Learning","authors":"Panagiota Kiourti, Kacper Wardega, Susmit Jha, Wenchao Li","doi":"10.1109/DAC18072.2020.9218663","DOIUrl":"https://doi.org/10.1109/DAC18072.2020.9218663","url":null,"abstract":"We present TrojDRL, a tool for exploring and evaluating backdoor attacks on deep reinforcement learning agents. TrojDRL exploits the sequential nature of deep reinforcement learning (DRL) and considers different gradations of threat models. We show that untargeted attacks on state-of-the-art actor-critic algorithms can circumvent existing defenses built on the assumption of backdoors being targeted. We evaluated TrojDRL on a broad set of DRL benchmarks and showed that the attacks require only poisoning as little as 0.025% of the training data. Compared with existing works of backdoor attacks on classification models, TrojDRL provides a first step towards understanding the vulnerability of DRL agents.","PeriodicalId":428807,"journal":{"name":"2020 57th ACM/IEEE Design Automation Conference (DAC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132350308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2020-07-01DOI: 10.1109/DAC18072.2020.9218697
N. Khoshavi, A. Roohi, Connor Broyles, S. Sargolzaei, Yu Bi, D. Pan
We propose SHIELDeNN, an end-to-end inference accelerator frame-work that synergizes the mitigation approach and computational resources to realize a low-overhead error-resilient Neural Network (NN) overlay. We develop a rigorous fault assessment paradigm to delineate a ground-truth fault-skeleton map for revealing the most vulnerable parameters in NN. The error-susceptible parameters and resource constraints are given to a function to find superior design. The error-resiliency magnitude offered by SHIELDeNN can be adjusted based on the given boundaries. SHIELDeNN methodology improves the error-resiliency magnitude of cnvW1A1 by 17.19% and 96.15% for 100 MBUs that target weight and activation layers, respectively.
{"title":"SHIELDeNN: Online Accelerated Framework for Fault-Tolerant Deep Neural Network Architectures","authors":"N. Khoshavi, A. Roohi, Connor Broyles, S. Sargolzaei, Yu Bi, D. Pan","doi":"10.1109/DAC18072.2020.9218697","DOIUrl":"https://doi.org/10.1109/DAC18072.2020.9218697","url":null,"abstract":"We propose SHIELDeNN, an end-to-end inference accelerator frame-work that synergizes the mitigation approach and computational resources to realize a low-overhead error-resilient Neural Network (NN) overlay. We develop a rigorous fault assessment paradigm to delineate a ground-truth fault-skeleton map for revealing the most vulnerable parameters in NN. The error-susceptible parameters and resource constraints are given to a function to find superior design. The error-resiliency magnitude offered by SHIELDeNN can be adjusted based on the given boundaries. SHIELDeNN methodology improves the error-resiliency magnitude of cnvW1A1 by 17.19% and 96.15% for 100 MBUs that target weight and activation layers, respectively.","PeriodicalId":428807,"journal":{"name":"2020 57th ACM/IEEE Design Automation Conference (DAC)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132083869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2020-07-01DOI: 10.1109/DAC18072.2020.9218584
Maolin Yang, Zewei Chen, Xu Jiang, Nan Guan, Hang Lei
Real-time scheduling and locking protocols are fundamental facilities to construct time-critical systems. For parallel real-time tasks, predictable locking protocols are required when concurrent sub-jobs mutually exclusive access to shared resources. This paper for the first time studies the distributed synchronization framework of parallel real-time tasks, where both tasks and global resources are partitioned to designated processors, and requests to each global resource are conducted on the processor on which the resource is partitioned. We extend the Distributed Priority Ceiling Protocol (DPCP) for parallel tasks under federated scheduling, with which we proved that a request can be blocked by at most one lower-priority request. We develop task and resource partitioning heuristics and propose analysis techniques to safely bound the task response times. Numerical evaluation (with heavy tasks on 8-, 16-, and 32-core processors) indicates that the proposed methods improve the schedulability significantly compared to the state-of-the-art locking protocols under federated scheduling.
{"title":"DPCP-p: A Distributed Locking Protocol for Parallel Real-Time Tasks","authors":"Maolin Yang, Zewei Chen, Xu Jiang, Nan Guan, Hang Lei","doi":"10.1109/DAC18072.2020.9218584","DOIUrl":"https://doi.org/10.1109/DAC18072.2020.9218584","url":null,"abstract":"Real-time scheduling and locking protocols are fundamental facilities to construct time-critical systems. For parallel real-time tasks, predictable locking protocols are required when concurrent sub-jobs mutually exclusive access to shared resources. This paper for the first time studies the distributed synchronization framework of parallel real-time tasks, where both tasks and global resources are partitioned to designated processors, and requests to each global resource are conducted on the processor on which the resource is partitioned. We extend the Distributed Priority Ceiling Protocol (DPCP) for parallel tasks under federated scheduling, with which we proved that a request can be blocked by at most one lower-priority request. We develop task and resource partitioning heuristics and propose analysis techniques to safely bound the task response times. Numerical evaluation (with heavy tasks on 8-, 16-, and 32-core processors) indicates that the proposed methods improve the schedulability significantly compared to the state-of-the-art locking protocols under federated scheduling.","PeriodicalId":428807,"journal":{"name":"2020 57th ACM/IEEE Design Automation Conference (DAC)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114943458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2020-07-01DOI: 10.1109/DAC18072.2020.9218691
L. Amarù, F. Marranghello, Eleonora Testa, Christopher Casares, V. Possani, Jiong Luo, P. Vuillod, A. Mishchenko, G. Micheli
SAT-sweeping is a powerful method for simplifying logic networks. It consists of merging gates that are proven equivalent (up to complementation) by running simulation and SAT solving in synergy. SAT-sweeping is used in both verification and synthesis applications within EDA. In this paper, we focus on the development of a highly efficient, synthesis-oriented, SAT-sweeping engine. We introduce a new algorithm to guide initial simulation, which strongly reduces the number of false candidates for merge, thus increasing the computational efficiency of the sweeper. We revisit the SAT-sweeping flow in light of practical considerations for synthesis, with the aim of proving all valid merges and ensuring fast execution. Experimental results confirm remarkable speedup deriving from our methodology, up to 10× for large combinational networks, and better QoR as compared to previous SAT-sweeping implementation. Embedded in a commercial synthesis flow, our proposes SAT-sweeper enables area and power savings of 1.98% and 1.81%, respectively, with neutral timing at negligible runtime overhead, over 36 testcases.
{"title":"SAT-Sweeping Enhanced for Logic Synthesis","authors":"L. Amarù, F. Marranghello, Eleonora Testa, Christopher Casares, V. Possani, Jiong Luo, P. Vuillod, A. Mishchenko, G. Micheli","doi":"10.1109/DAC18072.2020.9218691","DOIUrl":"https://doi.org/10.1109/DAC18072.2020.9218691","url":null,"abstract":"SAT-sweeping is a powerful method for simplifying logic networks. It consists of merging gates that are proven equivalent (up to complementation) by running simulation and SAT solving in synergy. SAT-sweeping is used in both verification and synthesis applications within EDA. In this paper, we focus on the development of a highly efficient, synthesis-oriented, SAT-sweeping engine. We introduce a new algorithm to guide initial simulation, which strongly reduces the number of false candidates for merge, thus increasing the computational efficiency of the sweeper. We revisit the SAT-sweeping flow in light of practical considerations for synthesis, with the aim of proving all valid merges and ensuring fast execution. Experimental results confirm remarkable speedup deriving from our methodology, up to 10× for large combinational networks, and better QoR as compared to previous SAT-sweeping implementation. Embedded in a commercial synthesis flow, our proposes SAT-sweeper enables area and power savings of 1.98% and 1.81%, respectively, with neutral timing at negligible runtime overhead, over 36 testcases.","PeriodicalId":428807,"journal":{"name":"2020 57th ACM/IEEE Design Automation Conference (DAC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116176072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2020-07-01DOI: 10.1109/DAC18072.2020.9218715
Eshan Singh, Florian Lonsing, Saranyu Chattopadhyay, Maxwell Strange, Peng Wei, Xiaofan Zhang, Yuan Zhou, Deming Chen, J. Cong, Priyanka Raina, Zhiru Zhang, Clark W. Barrett, S. Mitra
We present A-QED (Accelerator-Quick Error Detection), a new approach for pre-silicon formal verification of stand-alone hardware accelerators. A-QED relies on bounded model checking -- however, it does not require extensive design-specific properties or a full formal design specification. While A- QED is effective for both RTL and high-level synthesis (HLS) design flows, it integrates seamlessly with HLS flows. Our A-QED results on several hardware accelerator designs demonstrate its practicality and effectiveness: 1. A-QED detected all bugs detected by conventional verification flow. 2. A-QED detected bugs that escaped conventional verification flow. 3. A-QED improved verification productivity dramatically, by 30X, in one of our case studies (1 person-day using A-QED vs. 30 person-days using conventional verification flow). 4. A-QED produced short counterexamples for easy debug (37X shorter on average vs. conventional verification flow).
{"title":"A-QED Verification of Hardware Accelerators","authors":"Eshan Singh, Florian Lonsing, Saranyu Chattopadhyay, Maxwell Strange, Peng Wei, Xiaofan Zhang, Yuan Zhou, Deming Chen, J. Cong, Priyanka Raina, Zhiru Zhang, Clark W. Barrett, S. Mitra","doi":"10.1109/DAC18072.2020.9218715","DOIUrl":"https://doi.org/10.1109/DAC18072.2020.9218715","url":null,"abstract":"We present A-QED (Accelerator-Quick Error Detection), a new approach for pre-silicon formal verification of stand-alone hardware accelerators. A-QED relies on bounded model checking -- however, it does not require extensive design-specific properties or a full formal design specification. While A- QED is effective for both RTL and high-level synthesis (HLS) design flows, it integrates seamlessly with HLS flows. Our A-QED results on several hardware accelerator designs demonstrate its practicality and effectiveness: 1. A-QED detected all bugs detected by conventional verification flow. 2. A-QED detected bugs that escaped conventional verification flow. 3. A-QED improved verification productivity dramatically, by 30X, in one of our case studies (1 person-day using A-QED vs. 30 person-days using conventional verification flow). 4. A-QED produced short counterexamples for easy debug (37X shorter on average vs. conventional verification flow).","PeriodicalId":428807,"journal":{"name":"2020 57th ACM/IEEE Design Automation Conference (DAC)","volume":"98 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116345811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2020-07-01DOI: 10.1109/DAC18072.2020.9218551
Ji Zhang, Yuanzhang Wang, Yangtao Wang, Ke Zhou, Sebastian Schelter, Ping Huang, Bin Cheng, Yongguang Ji
Sector errors are a common type of error in modern disks. A sector error that occurs during I/O operations might cause inaccessibility of an application. Even worse, it could result in permanent data loss if the data is being reconstructed, and thereby severely affects the reliability of a storage system. Many disk scrubbing schemes have been proposed to solve this problem. However, existing approaches have several limitations. First, schemes use machine learning (ML) to predict latent sector errors (LSEs), but only leverage a single snapshot of training data to make a prediction, and thereby ignore sequential dependencies between different statuses of a hard disk over time. Second, they accelerate the scrubbing at a fixed rate based on the results of a binary classification model, which may result in unnecessary increases in scrubbing cost. Third, they naively accelerate the scrubbing of the full disk which has LSEs based on the predictive results, but neglect partial high-risk areas (the areas that have a higher probability of encountering LSEs). Lastly, they do not employ strategies to scrub these high-risk areas in advance based on I/O accesses patterns, in order to further increase the efficiency of scrubbing.We address these challenges by designing a Tier-Scrubbing (TS) scheme that combines a Long Short-Term Memory (LSTM) based Adaptive Scrubbing Rate Controller (ASRC), a module focusing on sector error locality to locate high-risk areas in a disk, and a piggyback scrubbing strategy to improve the reliability of a storage system. Our evaluation results on realistic datasets and workloads from two real world data centers demonstrate that TS can simultaneously decrease the Mean-Time-To-Detection (MTTD) by about 80% and the scrubbing cost by 20%, compared to a state-of-the-art scrubbing scheme.
{"title":"Tier-Scrubbing: An Adaptive and Tiered Disk Scrubbing Scheme with Improved MTTD and Reduced Cost","authors":"Ji Zhang, Yuanzhang Wang, Yangtao Wang, Ke Zhou, Sebastian Schelter, Ping Huang, Bin Cheng, Yongguang Ji","doi":"10.1109/DAC18072.2020.9218551","DOIUrl":"https://doi.org/10.1109/DAC18072.2020.9218551","url":null,"abstract":"Sector errors are a common type of error in modern disks. A sector error that occurs during I/O operations might cause inaccessibility of an application. Even worse, it could result in permanent data loss if the data is being reconstructed, and thereby severely affects the reliability of a storage system. Many disk scrubbing schemes have been proposed to solve this problem. However, existing approaches have several limitations. First, schemes use machine learning (ML) to predict latent sector errors (LSEs), but only leverage a single snapshot of training data to make a prediction, and thereby ignore sequential dependencies between different statuses of a hard disk over time. Second, they accelerate the scrubbing at a fixed rate based on the results of a binary classification model, which may result in unnecessary increases in scrubbing cost. Third, they naively accelerate the scrubbing of the full disk which has LSEs based on the predictive results, but neglect partial high-risk areas (the areas that have a higher probability of encountering LSEs). Lastly, they do not employ strategies to scrub these high-risk areas in advance based on I/O accesses patterns, in order to further increase the efficiency of scrubbing.We address these challenges by designing a Tier-Scrubbing (TS) scheme that combines a Long Short-Term Memory (LSTM) based Adaptive Scrubbing Rate Controller (ASRC), a module focusing on sector error locality to locate high-risk areas in a disk, and a piggyback scrubbing strategy to improve the reliability of a storage system. Our evaluation results on realistic datasets and workloads from two real world data centers demonstrate that TS can simultaneously decrease the Mean-Time-To-Detection (MTTD) by about 80% and the scrubbing cost by 20%, compared to a state-of-the-art scrubbing scheme.","PeriodicalId":428807,"journal":{"name":"2020 57th ACM/IEEE Design Automation Conference (DAC)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131911869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2020-07-01DOI: 10.1109/DAC18072.2020.9218508
Huili Chen, Rosario Cammarota, Felipe Valencia, F. Regazzoni, F. Koushanfar
Privacy-preserving machine learning (PPML) is driven by the emerging adoption of Machine Learning as a Service (MLaaS). In a typical MLaaS system, the end-user sends his personal data to the service provider and receives the corresponding prediction output. However, such interaction raises severe privacy concerns about both the user’s proprietary data and the server’s ML model. PPML integrates cryptographic primitives such as Multi-Party Computation (MPC) and/or Homomorphic Encryption (HE) into ML services to resolve the privacy issue. However, existing PPML solutions have not been widely deployed in practice since: (i) Privacy protection comes at the cost of additional computation and/or communication overhead; (ii) Adapting PPML to different front-end frameworks and back-end hardware incurs prohibitive engineering cost.We propose AHEC, the first automated, end-to-end HE compiler for efficient PPML inference. Leveraging the capability of Domain Specific Languages (DSLs), AHEC enables automated generation and optimization of HE kernels across diverse types of hardware platforms and ML frameworks. We perform extensive experiments to investigate the performance of AHEC from different abstraction levels: HE operations, HE-based ML kernels, and neural network layers. Empirical results corroborate that AHEC achieves superior runtime reduction compared to the state-of-the-art solutions built from static HE libraries.
{"title":"AHEC: End-to-end Compiler Framework for Privacy-preserving Machine Learning Acceleration","authors":"Huili Chen, Rosario Cammarota, Felipe Valencia, F. Regazzoni, F. Koushanfar","doi":"10.1109/DAC18072.2020.9218508","DOIUrl":"https://doi.org/10.1109/DAC18072.2020.9218508","url":null,"abstract":"Privacy-preserving machine learning (PPML) is driven by the emerging adoption of Machine Learning as a Service (MLaaS). In a typical MLaaS system, the end-user sends his personal data to the service provider and receives the corresponding prediction output. However, such interaction raises severe privacy concerns about both the user’s proprietary data and the server’s ML model. PPML integrates cryptographic primitives such as Multi-Party Computation (MPC) and/or Homomorphic Encryption (HE) into ML services to resolve the privacy issue. However, existing PPML solutions have not been widely deployed in practice since: (i) Privacy protection comes at the cost of additional computation and/or communication overhead; (ii) Adapting PPML to different front-end frameworks and back-end hardware incurs prohibitive engineering cost.We propose AHEC, the first automated, end-to-end HE compiler for efficient PPML inference. Leveraging the capability of Domain Specific Languages (DSLs), AHEC enables automated generation and optimization of HE kernels across diverse types of hardware platforms and ML frameworks. We perform extensive experiments to investigate the performance of AHEC from different abstraction levels: HE operations, HE-based ML kernels, and neural network layers. Empirical results corroborate that AHEC achieves superior runtime reduction compared to the state-of-the-art solutions built from static HE libraries.","PeriodicalId":428807,"journal":{"name":"2020 57th ACM/IEEE Design Automation Conference (DAC)","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133487350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2020-07-01DOI: 10.1109/DAC18072.2020.9218500
He-Teng Zhang, J. H. Jiang
Fanouts are an essential element for signal cloning to achieve logic sharing, but can be a very limited resource in certain emerging technologies, such as quantum circuits, superconducting electronic circuits, photonic integrated circuits, and biological circuits. Although fanout synthesis has been intensively studied for high performance circuit synthesis, prior methods often treat fanout as a soft constraint for critical path optimization or target on specific high-fanout nets such as clock and reset signals. They are not particularly suited for circuit synthesis of these emerging technologies. By treating fanouts as first class citizens, the problem of fanout-bounded logic synthesis was posed as a challenge in the 2019 IWLS Programming Contest. In this paper, we present our winning method, which achieved the overall best quality in the competition, based on fanout load redistribution among existing or expanded equivalent signals.
{"title":"SFO: A Scalable Approach to Fanout-Bounded Logic Synthesis for Emerging Technologies","authors":"He-Teng Zhang, J. H. Jiang","doi":"10.1109/DAC18072.2020.9218500","DOIUrl":"https://doi.org/10.1109/DAC18072.2020.9218500","url":null,"abstract":"Fanouts are an essential element for signal cloning to achieve logic sharing, but can be a very limited resource in certain emerging technologies, such as quantum circuits, superconducting electronic circuits, photonic integrated circuits, and biological circuits. Although fanout synthesis has been intensively studied for high performance circuit synthesis, prior methods often treat fanout as a soft constraint for critical path optimization or target on specific high-fanout nets such as clock and reset signals. They are not particularly suited for circuit synthesis of these emerging technologies. By treating fanouts as first class citizens, the problem of fanout-bounded logic synthesis was posed as a challenge in the 2019 IWLS Programming Contest. In this paper, we present our winning method, which achieved the overall best quality in the competition, based on fanout load redistribution among existing or expanded equivalent signals.","PeriodicalId":428807,"journal":{"name":"2020 57th ACM/IEEE Design Automation Conference (DAC)","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130580507","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}