Production Recipe Validation through Formalization and Digital Twin Generation
Pub Date: 2020-03-01 | DOI: 10.23919/DATE48585.2020.9116343
Stefano Spellini, Roberta Chirico, M. Panato, M. Lora, F. Fummi
The advent of Industry 4.0 is making production processes increasingly complex. As such, early process validation is becoming crucial to avoid production errors and thus reduce costs. In this paper, we present an approach to validate production recipes. Initially, the recipe is specified according to the ISA-95 standard, while the production plant is described using AutomationML. These specifications are formalized into a hierarchy of assume-guarantee contracts. Each contract specifies a set of temporal behaviors characterizing the different machines composing the production line, their actions, and their interactions. Then, the formal specifications provided by the contracts are systematically synthesized to automatically generate a digital twin for the production line. Finally, the digital twin is used to evaluate and validate both the functional and the extra-functional characteristics of the system. The methodology has been applied to validate the production of a product requiring additive manufacturing, robotic assembly, and transportation.
{"title":"Production Recipe Validation through Formalization and Digital Twin Generation","authors":"Stefano Spellini, Roberta Chirico, M. Panato, M. Lora, F. Fummi","doi":"10.23919/DATE48585.2020.9116343","DOIUrl":"https://doi.org/10.23919/DATE48585.2020.9116343","url":null,"abstract":"The advent of Industry 4.0 is making production processes every day more complicated. As such, early process validation is becoming crucial to avoid production errors thus decreasing costs. In this paper, we present an approach to validate production recipes. Initially, the recipe is specified according to the ISA-95 standard, while the production plant is described using AutomationML. These specifications are formalized into a hierarchy of assume-guarantee contracts. Each contract specifies a set of temporal behaviors, characterizing the different machines composing the production line, their actions and interaction. Then, the formal specifications provided by the contracts are systematically synthesized to automatically generate a digital twin for the production line. Finally, the digital twin is used to evaluate, and validate, both the functional and the extra-functional characteristics of the system.The methodology has been applied to validate the production of a product requiring additive manufacturing, robotic assembling and transportation.","PeriodicalId":289525,"journal":{"name":"2020 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116967895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
High-Density, Low-Power Voltage-Control Spin Orbit Torque Memory with Synchronous Two-Step Write and Symmetric Read Techniques
Pub Date: 2020-03-01 | DOI: 10.23919/DATE48585.2020.9116576
Haotian Wang, W. Kang, Liuyang Zhang, He Zhang, B. Kaushik, Weisheng Zhao
Voltage-controlled spin-orbit torque (VC-SOT) magnetic tunnel junctions (MTJs) have the potential to enable high-speed, low-power spintronic memory, owing to the adaptive, voltage-modulated energy barrier of the MTJ. However, the three-terminal device structure needs two access transistors (one for the write operation and one for the read operation) and thus occupies a larger bit-cell area than two-terminal MTJs. A feasible way to reduce this area overhead is to stack multiple VC-SOT MTJs on a common antiferromagnetic strip so that they share the write access transistors. This structure achieves high density; however, write and read operations become problematic, and the design space for a given strip length is unclear. In this paper, we propose a synchronous two-step multi-bit write and symmetric read method that exploits the selective VC-SOT-driven MTJ switching mechanism. Hybrid circuits are then designed and evaluated using a physics-based VC-SOT MTJ model and a 40 nm CMOS design kit to show the feasibility and performance of our method. Our work enables high-density, low-power, high-speed voltage-controlled SOT memory.
{"title":"High-Density, Low-Power Voltage-Control Spin Orbit Torque Memory with Synchronous Two-Step Write and Symmetric Read Techniques","authors":"Haotian Wang, W. Kang, Liuyang Zhang, He Zhang, B. Kaushik, Weisheng Zhao","doi":"10.23919/DATE48585.2020.9116576","DOIUrl":"https://doi.org/10.23919/DATE48585.2020.9116576","url":null,"abstract":"Voltage-control spin orbit torque (VC-SOT) magnetic tunnel junction (MTJ) has the potential to achieve high-speed and low-power spintronic memory, owing to the adaptive voltage modulated energy barrier of the MTJ. However, the three-terminal device structure needs two access transistors (one for write operation and the other one for read operation) and thus occupies larger bit-cell area compared to two terminal MTJs. A feasible method to reduce area overhead is to stack multiple VC-SOT MTJs on a common antiferromagnetic strip to share the write access transistors. In this structure, high density can be achieved. However, write and read operations face problems and the design space is not sure given a strip length. In this paper, we propose a synchronous two-step multi-bit write and symmetric read method by exploiting the selective VC-SOT driven MTJ switching mechanism. Then hybrid circuits are designed and evaluated based a physics-based VC-SOT MTJ model and a 40nm CMOS design-kit to show the feasibility and performance of our method. Our work enables high-density, low-power, high-speed voltage-control SOT memory.","PeriodicalId":289525,"journal":{"name":"2020 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"84 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117088753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wafer-Level Test Path Pattern Recognition and Test Characteristics for Test-Induced Defect Diagnosis
Pub Date: 2020-03-01 | DOI: 10.23919/DATE48585.2020.9116546
Ken Chau-Cheung Cheng, Katherine Shu-Min Li, Andrew Yi-Ann Huang, Ji-Wei Li, L. Chen, Nova Cheng-Yen Tsai, Sying-Jyan Wang, Chen-Shiun Lee, Leon Chou, Peter Yi-Yu Liao, Hsing-Chung Liang, Jwu E. Chen
Wafer defect maps provide valuable information about fabrication- and test-process defects, so they can be used to improve both fabrication and test yield. This paper applies artificial-intelligence-based pattern recognition techniques to distinguish fab-induced defects from test-induced ones, so that test quality, reliability, and yield can be improved accordingly. Wafer test data contain site-dependent information about the test configurations of the automatic test equipment, including effective load push force, the gap between probe and load board, probe tip size, probe-cleaning stress, etc. Our method analyzes both the test paths and the site-dependent test characteristics to identify test-induced defects. Experiments on six NXP products achieve 96.83% prediction accuracy, showing that our method is both effective and efficient.
{"title":"Wafer-Level Test Path Pattern Recognition and Test Characteristics for Test-Induced Defect Diagnosis","authors":"Ken Chau-Cheung Cheng, Katherine Shu-Min Li, Andrew Yi-Ann Huang, Ji-Wei Li, L. Chen, Nova Cheng-Yen Tsai, Sying-Jyan Wang, Chen-Shiun Lee, Leon Chou, Peter Yi-Yu Liao, Hsing-Chung Liang, Jwu E. Chen","doi":"10.23919/DATE48585.2020.9116546","DOIUrl":"https://doi.org/10.23919/DATE48585.2020.9116546","url":null,"abstract":"Wafer defect maps provide precious information of fabrication and test process defects, so they can be used as valuable sources to improve fabrication and test yield. This paper applies artificial intelligence based pattern recognition techniques to distinguish fab-induced defects from test-induced ones. As a result, test quality, reliability and yield could be improved accordingly. Wafer test data contain site-dependent information regarding test configurations in automatic test equipment, including effective load push force, gap between probe and load-board, probe tip size, probe-cleaning stress, etc. Our method analyzes both the test paths and site-dependent test characteristics to identify test-induced defects. Experimental results achieve 96.83% prediction accuracy of six NXP products, which show that our methods are both effective and efficient.","PeriodicalId":289525,"journal":{"name":"2020 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117252292","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pitfalls in Machine Learning-based Adversary Modeling for Hardware Systems
Pub Date: 2020-03-01 | DOI: 10.23919/DATE48585.2020.9116316
F. Ganji, Sarah Amir, Shahin Tajik, Domenic Forte, Jean-Pierre Seifert
The concept of the adversary model has been widely applied in the context of cryptography. When designing a cryptographic scheme or protocol, the adversary model plays a crucial role in formalizing the capabilities and limitations of potential attackers. These models further enable the designer to verify the security of the scheme or protocol under investigation. Although well established for conventional cryptanalysis attacks, adversary models for attackers enjoying the advantages of machine learning techniques have not yet been thoroughly developed. In particular, when it comes to composed hardware, which is often security-critical, the lack of such models has become increasingly noticeable in the face of advanced, machine-learning-enabled attacks. This paper explores adversary models from the machine learning perspective. In this regard, we provide examples of machine-learning-based attacks against hardware primitives, e.g., obfuscation schemes and hardware roots-of-trust, that had been claimed to be infeasible. We demonstrate that this assumption becomes invalid because inaccurate adversary models have been considered in the literature.
{"title":"Pitfalls in Machine Learning-based Adversary Modeling for Hardware Systems","authors":"F. Ganji, Sarah Amir, Shahin Tajik, Domenic Forte, Jean-Pierre Seifert","doi":"10.23919/DATE48585.2020.9116316","DOIUrl":"https://doi.org/10.23919/DATE48585.2020.9116316","url":null,"abstract":"The concept of the adversary model has been widely applied in the context of cryptography. When designing a cryptographic scheme or protocol, the adversary model plays a crucial role in the formalization of the capabilities and limitations of potential attackers. These models further enable the designer to verify the security of the scheme or protocol under investigation. Although being well established for conventional cryptanalysis attacks, adversary models associated with attackers enjoying the advantages of machine learning techniques have not yet been developed thoroughly. In particular, when it comes to composed hardware, often being security-critical, the lack of such models has become increasingly noticeable in the face of advanced, machine learning-enabled attacks. This paper aims at exploring the adversary models from the machine learning perspective. In this regard, we provide examples of machine learning-based attacks against hardware primitives, e.g., obfuscation schemes and hardware root-of-trust, claimed to be infeasible. We demonstrate that this assumption becomes however invalid as inaccurate adversary models have been considered in the literature.","PeriodicalId":289525,"journal":{"name":"2020 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116743367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
LightBulb: A Photonic-Nonvolatile-Memory-based Accelerator for Binarized Convolutional Neural Networks
Pub Date: 2020-03-01 | DOI: 10.23919/DATE48585.2020.9116494
Farzaneh Zokaee, Qian Lou, N. Youngblood, Weichen Liu, Yiyuan Xie, Lei Jiang
Although Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art inference accuracy in various intelligent applications, each CNN inference involves millions of expensive floating-point multiply-accumulate (MAC) operations. To process CNN inferences energy-efficiently, prior work proposes an electro-optical accelerator that processes power-of-2-quantized CNNs using electro-optical ripple-carry adders and optical binary shifters, and that uses SRAM registers to store intermediate data. However, the electro-optical ripple-carry adders and SRAMs seriously limit the accelerator's operating frequency and inference throughput, due to the long critical path of the adders and the long access latency of the SRAMs. In this paper, we propose a photonic nonvolatile-memory (NVM)-based accelerator, LightBulb, to process binarized CNNs with high-frequency photonic XNOR gates and popcount units. LightBulb also adopts photonic racetrack memory as input/output registers to achieve a high operating frequency. Compared to prior electro-optical accelerators, on average, LightBulb improves CNN inference throughput by 17×~173× and inference throughput per watt by 17.5×~660×.
{"title":"LightBulb: A Photonic-Nonvolatile-Memory-based Accelerator for Binarized Convolutional Neural Networks","authors":"Farzaneh Zokaee, Qian Lou, N. Youngblood, Weichen Liu, Yiyuan Xie, Lei Jiang","doi":"10.23919/DATE48585.2020.9116494","DOIUrl":"https://doi.org/10.23919/DATE48585.2020.9116494","url":null,"abstract":"Although Convolutional Neural Networks (CNNs) have demonstrated the state-of-the-art inference accuracy in various intelligent applications, each CNN inference involves millions of expensive floating point multiply-accumulate (MAC) operations. To energy-efficiently process CNN inferences, prior work proposes an electro-optical accelerator to process power-of-2 quantized CNNs by electro-optical ripple-carry adders and optical binary shifters. The electro-optical accelerator also uses SRAM registers to store intermediate data. However, electro-optical ripple-carry adders and SRAMs seriously limit the operating frequency and inference throughput of the electro-optical accelerator, due to the long critical path of the adder and the long access latency of SRAMs. In this paper, we propose a photonic nonvolatile memory (NVM)-based accelerator, Light-Bulb, to process binarized CNNs by high frequency photonic XNOR gates and popcount units. LightBulb also adopts photonic racetrack memory to serve as input/output registers to achieve high operating frequency. Compared to prior electro-optical accelerators, on average, LightBulb improves the CNN inference throughput by 17× ~ 173× and the inference throughput per Watt by 17.5 × ~ 660×.","PeriodicalId":289525,"journal":{"name":"2020 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"133 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127367156","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
System-level Evaluation of Chip-Scale Silicon Photonic Networks for Emerging Data-Intensive Applications
Pub Date: 2020-03-01 | DOI: 10.23919/DATE48585.2020.9116496
A. Narayan, Y. Thonnart, P. Vivet, A. Joshi, A. Coskun
Emerging data-driven applications, such as graph processing, are characterized by their excessive memory footprints and abundant parallelism, resulting in high memory bandwidth demand. As dataset sizes reach the order of terabytes, performance limitations due to bandwidth demands are a major concern. Traditional on-chip electrical networks fail to meet such high bandwidth demands due to increased energy-per-bit or physical limitations on pin counts. Silicon photonic networks have emerged as a promising alternative to electrical interconnects, owing to their high bandwidth density and low energy-per-bit communication with negligible data-dependent power. Wide-scale adoption of silicon photonics at the chip level, however, is hampered by its high sensitivity to process and thermal variations, the high laser power needed to overcome losses along the network, and the power consumption of electrical-optical conversion. Device-level technological innovations to mitigate these issues are promising, yet they do not consider the system-level implications of the applications running on manycore systems with photonic networks. This work bridges the gap between the system-level attributes of applications and the underlying architectural and device-level characteristics of silicon photonic networks to achieve energy-efficient computing. We particularly focus on graph applications, whose unstructured yet abundant parallel memory accesses stress the on-chip communication networks, and develop a cross-layer framework to evaluate 2.5D systems with silicon photonic networks. We demonstrate 38% power savings through system-level management using wavelength selection policies with only a 1% loss in system performance, and we further evaluate architectural design choices for 2.5D systems with photonic networks.
{"title":"System-level Evaluation of Chip-Scale Silicon Photonic Networks for Emerging Data-Intensive Applications","authors":"A. Narayan, Y. Thonnart, P. Vivet, A. Joshi, A. Coskun","doi":"10.23919/DATE48585.2020.9116496","DOIUrl":"https://doi.org/10.23919/DATE48585.2020.9116496","url":null,"abstract":"Emerging data-driven applications such as graph processing applications are characterized by their excessive memory footprint and abundant parallelism, resulting in high memory bandwidth demand. As the scale of datasets for applications is reaching orders of TBs, performance limitation due to bandwidth demands is a major concern. Traditional on-chip electrical networks fail to meet such high bandwidth demands due to increased energy-per-bit or physical limitations with pin counts. Silicon photonic networks have emerged as a promising alternative to electrical interconnects, owing to their high bandwidth density and low energy-per-bit communication with negligible data-dependent power. Wide-scale adoption of silicon photonics at chip level, however, is hampered by their high sensitivity to process and thermal variations, high laser power due to losses along the network, and power consumption of the electrical-optical conversion. Device-level technological innovations to mitigate these issues are promising, yet they do not consider the system-level implications of the applications running on manycore systems with photonic networks. This work aims to bridge the gap between the system-level attributes of applications with the underlying architectural and device-level characteristics of silicon photonic networks to achieve energy-efficient computing. We particularly focus on graph applications, which involve unstructured yet abundant parallel memory accesses that stress the on-chip communication networks, and develop a cross-layer framework to evaluate 2.5D systems with silicon photonic networks. We demonstrate 38% power savings through system-level management using wavelength selection policies with only 1% loss in system performance and further evaluate architectural design choices on 2.5D systems with photonic networks.","PeriodicalId":289525,"journal":{"name":"2020 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125712857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Approximation Trade Offs in an Image-Based Control System
Pub Date: 2020-03-01 | DOI: 10.23919/DATE48585.2020.9116552
S. De, S. Mohamed, Konstantinos Bimpisidis, Dip Goswami, T. Basten, H. Corporaal
Image-based control (IBC) systems use camera sensors to perceive the environment. The inherently compute-heavy nature of image processing causes long processing delays that negatively influence the performance of IBC systems. Our idea is to reduce this delay through coarse-grained approximation of the image signal processing pipeline without affecting the functionality and performance of the IBC system. The question is: how does the degree of approximation relate to the closed-loop quality-of-control (QoC), memory utilization, and energy consumption? We present a software-in-the-loop (SiL) evaluation framework for this approximation-in-the-loop system. We identify the error-resilient stages and the corresponding coarse-grained approximation settings for the IBC system, and we analyze the trade-offs between QoC, memory utilization, and energy consumption for varying degrees of coarse-grained approximation. We demonstrate the effectiveness of our approach on a concrete case study of a lane keeping assist system (LKAS), obtaining energy and memory reductions of up to 84% and 29%, respectively, with a 28% QoC improvement.
{"title":"Approximation Trade Offs in an Image-Based Control System","authors":"S. De, S. Mohamed, Konstantinos Bimpisidis, Dip Goswami, T. Basten, H. Corporaal","doi":"10.23919/DATE48585.2020.9116552","DOIUrl":"https://doi.org/10.23919/DATE48585.2020.9116552","url":null,"abstract":"Image-based control (IBC) systems use camera sensor(s) to perceive the environment. The inherent compute-heavy nature of image processing causes long processing delay that negatively influences the performance of the IBC systems. Our idea is to reduce the long delay using coarse-grained approximation of the image signal processing pipeline without affecting the functionality and performance of the IBC system. The question is: how is the degree of approximation related to the closed-loop quality-of-control (QoC), memory utilization and energy consumption? We present a software-in-the-loop (SiL) evaluation framework for the above approximation-in-the-loop system. We identify the error resilient stages and the corresponding coarse-grained approximation settings for the IBC system. We perform trade off analysis between the QoC, memory utilisation and energy consumption for varying degrees of coarse-grained approximation. We demonstrate the effectiveness of our approach using a concrete case study of a lane keeping assist system (LKAS). We obtain energy and memory reduction of upto 84% and 29% respectively, for 28% QoC improvements.","PeriodicalId":289525,"journal":{"name":"2020 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128048969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lightweight Anonymous Routing in NoC based SoCs
Pub Date: 2020-03-01 | DOI: 10.23919/DATE48585.2020.9116572
Subodha Charles, Megan Logan, P. Mishra
The System-on-Chip (SoC) supply chain is widely acknowledged as a major source of security vulnerabilities. Potentially malicious third-party IPs integrated on the same Network-on-Chip (NoC) as trusted components can lead to security and trust concerns. While secure communication is a well-studied problem in the computer networks domain, it is not feasible to implement those solutions on resource-constrained SoCs. In this paper, we present a lightweight anonymous routing protocol for communication between IP cores in NoC-based SoCs. Our method eliminates the major overhead associated with traditional anonymous routing protocols while ensuring that the desired security goals are met. Experimental results demonstrate that existing NoC security solutions can introduce significant (1.5×) performance degradation, whereas our approach provides the same security features with only a minor (4%) impact on performance.
{"title":"Lightweight Anonymous Routing in NoC based SoCs","authors":"Subodha Charles, Megan Logan, P. Mishra","doi":"10.23919/DATE48585.2020.9116572","DOIUrl":"https://doi.org/10.23919/DATE48585.2020.9116572","url":null,"abstract":"System-on-Chip (SoC) supply chain is widely acknowledged as a major source of security vulnerabilities. Potentially malicious third-party IPs integrated on the same Network-on-Chip (NoC) with the trusted components can lead to security and trust concerns. While secure communication is a well studied problem in computer networks domain, it is not feasible to implement those solutions on resource-constrained SoCs. In this paper, we present a lightweight anonymous routing protocol for communication between IP cores in NoC based SoCs. Our method eliminates the major overhead associated with traditional anonymous routing protocols while ensuring that the desired security goals are met. Experimental results demonstrate that existing security solutions on NoC can introduce significant (1.5X) performance degradation, whereas our approach provides the same security features with minor (4%) impact on performance.","PeriodicalId":289525,"journal":{"name":"2020 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"118 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124153878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimising Resource Management for Embedded Machine Learning
Pub Date: 2020-03-01 | DOI: 10.23919/DATE48585.2020.9116235
Lei Xun, Long Tran-Thanh, B. Al-Hashimi, G. Merrett
Machine learning inference is increasingly executed locally on mobile and embedded platforms, due to its clear advantages in latency, privacy, and connectivity. In this paper, we present approaches for online resource management in heterogeneous multi-core systems and show how they can be applied to optimise the performance of machine learning workloads. Performance can be defined using platform-dependent (e.g. speed, energy) and platform-independent (accuracy, confidence) metrics. In particular, we show how a Deep Neural Network (DNN) can be dynamically scaled to trade off these performance metrics. Achieving consistent performance when executing on different platforms is necessary yet challenging, due to the differing resources provided, their capabilities, and their time-varying availability when executing alongside other workloads. Managing the interface between the available hardware resources (often numerous and heterogeneous in nature), software requirements, and user experience is increasingly complex.
{"title":"Optimising Resource Management for Embedded Machine Learning","authors":"Lei Xun, Long Tran-Thanh, B. Al-Hashimi, G. Merrett","doi":"10.23919/DATE48585.2020.9116235","DOIUrl":"https://doi.org/10.23919/DATE48585.2020.9116235","url":null,"abstract":"Machine learning inference is increasingly being executed locally on mobile and embedded platforms, due to the clear advantages in latency, privacy and connectivity. In this paper, we present approaches for online resource management in heterogeneous multi-core systems and show how they can be applied to optimise the performance of machine learning work-loads. Performance can be defined using platform-dependent (e.g. speed, energy) and platform-independent (accuracy, confidence) metrics. In particular, we show how a Deep Neural Network (DNN) can be dynamically scalable to trade-off these various performance metrics. Achieving consistent performance when executing on different platforms is necessary yet challenging, due to the different resources provided and their capability, and their time-varying availability when executing alongside other workloads. Managing the interface between available hardware resources (often numerous and heterogeneous in nature), software requirements, and user experience is increasingly complex.","PeriodicalId":289525,"journal":{"name":"2020 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"154 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122498125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An Efficient SRAM yield Analysis Using Scaled-Sigma Adaptive Importance Sampling
Pub Date: 2020-03-01 | DOI: 10.23919/DATE48585.2020.9116233
L. Pang, Mengyun Yao, Yifan Chai
Statistical SRAM yield analysis has become a growing concern, driven by the requirements of high integration density and reliability for SRAM under process variations. Estimating the SRAM failure probability efficiently and accurately is challenging because circuit failure is a "rare event". Existing methods are still not efficient enough to solve the problem, especially in high dimensions. In this paper, we develop scaled-sigma adaptive importance sampling (SSAIS), an extension of adaptive importance sampling. This method iteratively searches the failure region, updating not only the location parameters but also the shape parameters of the sampling distribution. An experiment on a 40 nm SRAM cell validates that our method outperforms the Monte Carlo method by 1500× and is 2.3×~5.2× faster than state-of-the-art methods with reasonable accuracy. Another experiment, on a sense amplifier, shows that our method achieves an 1811× speedup over the Monte Carlo method and a 2×~11× speedup over the other methods.
{"title":"An Efficient SRAM yield Analysis Using Scaled-Sigma Adaptive Importance Sampling","authors":"L. Pang, Mengyun Yao, Yifan Chai","doi":"10.23919/DATE48585.2020.9116233","DOIUrl":"https://doi.org/10.23919/DATE48585.2020.9116233","url":null,"abstract":"Statistical SRAM yield analysis has become a growing concern for the requirement of high integration density and reliability of SRAM under process variations. It is a challenge to estimate the SRAM failure probability efficiently and accurately because the circuit failure is a \"rare-event\". Existing methods are still not efficient enough to solve the problem, especially in high dimensions. In this paper, we develop a scaled-sigma adaptive importance sampling (SSAIS) which is an extension of the adaptive importance sampling. This method changes not only the location parameters but the shape parameters by searching the failure region iteratively. The 40nm SRAM cell experiment validated that our method outperforms Monte Carlo method by 1500x and is 2.3x~5.2x faster than the state-of-art methods with reasonable accuracy. Another experiment on sense amplifier shows our method achieves 1811x speedup over the Monte Carlo method and 2x~11x speedup over the other methods.","PeriodicalId":289525,"journal":{"name":"2020 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122524119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}