Increasing rates of transient hardware faults pose a problem for computing applications. Current and future trends are likely to exacerbate this problem. When a transient fault occurs during program execution, data in the output can become corrupted. The severity of output corruptions depends on the application domain. Hence, different applications require different levels of fault tolerance. We present an LLVM-based AN encoder that can equip programs with an error detection mechanism at configurable levels of rigor. Based on our AN encoder, the trade-off between fault tolerance and runtime overhead is analyzed. It is found that, by suitably configuring our AN encoder, the runtime overhead can be reduced from 9.9x to 2.1x. At the same time, however, the probability that a hardware fault in the CPU will result in silent data corruption rises from 0.007 to over 0.022. The same probability for memory faults increases from 0.009 to over 0.032. It is further demonstrated, by applying different configurations of our AN encoder to the components of an arithmetic expression interpreter, that having fine-grained control over levels of fault tolerance can be beneficial.
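As a toy illustration of the underlying AN-code idea (not the paper's actual LLVM encoder), a value n is stored as A*n for a fixed constant A; any fault that leaves the stored word indivisible by A is detected. The constant A and the arithmetic below are illustrative only:

```python
A = 58659  # hypothetical constant; real AN encoders choose A carefully

def encode(n: int) -> int:
    # Every valid encoded value is a multiple of A.
    return n * A

def decode(code: int) -> int:
    # A transient fault that alters the encoded value is detected when
    # the result is no longer divisible by A.
    if code % A != 0:
        raise ValueError("fault detected: value is not a multiple of A")
    return code // A

x, y = encode(21), encode(2)
assert decode(x + y) == 23          # addition stays within the code
assert decode(x) * decode(y) == 42  # multiplication needs re-encoding

# A single bit flip in an encoded value is caught with high probability:
corrupted = x ^ (1 << 7)
try:
    decode(corrupted)
    detected = False
except ValueError:
    detected = True
```

The example only shows detection; the configurable rigor the paper analyzes comes from choosing which program values and operations are encoded at all.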
Norman A. Rink and J. Castrillón. "Trading Fault Tolerance for Performance in AN Encoding." Proceedings of the Computing Frontiers Conference, May 15, 2017. DOI: 10.1145/3075564.3075565.
As computing becomes increasingly pervasive, different heterogeneous networks are connected and integrated. This is especially true in Internet of Things (IoT) and Wireless Sensor Network (WSN) settings. However, as networks managed by different parties and with different security requirements are integrated, security becomes a primary concern. WSN nodes, in particular, are often deployed "in the open", where a potential attacker can gain physical access to the device. As nodes can be deployed in hostile or difficult scenarios, such as military battlefields or disaster recovery settings, it is crucial to prevent a successful attack on a single node from escalating to the whole network, and from there to other connected networks. It is therefore essential to secure communication within the WSN and, in particular, to keep context information private, such as the network topology and the location and identity of base stations (which collect the data gathered by the sensors). In this paper, we propose a protocol that achieves anonymous routing between different interconnected IoT or WSN networks, based on the Spatial Bloom Filter (SBF) data structure. The protocol enables communication between nodes through anonymous identifiers, thus hiding the location and identity of the nodes within the network. The proposed routing strategy preserves context privacy and prevents adversaries from learning the network structure and topology, as routing information is encrypted using a homomorphic encryption scheme and computed only in the encrypted domain. Preserving context privacy is crucial in preventing adversaries from gaining valuable network information from a successful attack on a single node, and reduces the potential for attack escalation.
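As a rough sketch of the SBF idea (simplified from the actual construction, and without the homomorphic-encryption layer the paper operates under), each element is hashed into k cells that store numeric area labels; querying an element returns the probable label of the area it belongs to. The filter size, hash scheme, and labels below are illustrative:

```python
import hashlib

M, K = 64, 3  # illustrative filter size and hash count

def _hashes(item: str):
    # K independent hash positions derived from SHA-256.
    for i in range(K):
        h = hashlib.sha256(f"{i}:{item}".encode()).digest()
        yield int.from_bytes(h[:4], "big") % M

class SBF:
    """Toy Spatial Bloom Filter: cells store area labels, not just bits."""

    def __init__(self):
        self.cells = [0] * M

    def insert(self, item: str, label: int):
        # Higher labels overwrite lower ones, as areas are inserted
        # in increasing label order in the SBF construction.
        for idx in _hashes(item):
            self.cells[idx] = max(self.cells[idx], label)

    def query(self, item: str) -> int:
        # 0 means "not present"; otherwise the minimum cell value is
        # the probable label of the item's area (false positives are
        # possible, as in an ordinary Bloom filter).
        return min(self.cells[idx] for idx in _hashes(item))

sbf = SBF()
sbf.insert("node-17", label=1)  # a node in area 1 (hypothetical names)
sbf.insert("node-42", label=2)  # a node in area 2
assert sbf.query("node-42") == 2
assert sbf.query("node-17") >= 1  # exact label, barring hash collisions
```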
P. Palmieri, L. Calderoni, and D. Maio. "Private inter-network routing for Wireless Sensor Networks and the Internet of Things." Proceedings of the Computing Frontiers Conference, May 15, 2017. DOI: 10.1145/3075564.3079068.
This paper presents a brain-inspired von Neumann memory architecture for sparse, nonlocal, and unstructured workloads. Memory at each node contains selectable windows for optimistic shared access. A low-latency multiple access control for various policies is provided inside the local memory controller, using conditional deferred queuing with shared address list entries and associated lock bits. When combined with a memory-side cache, the proposed architecture is expected to transparently accelerate and flexibly scale the performance of sparse, nonlocal, and unstructured workloads by better regulating the data-access pipelining across local and remote memory requests.
Y. Katayama. "Brain-Inspired Memory Architecture for Sparse Nonlocal and Unstructured Workloads." Proceedings of the Computing Frontiers Conference, May 15, 2017. DOI: 10.1145/3075564.3075597.
S. Picek, L. Mariot, Bohan Yang, D. Jakobović, N. Mentens
The aim of this paper is to find cellular automata (CA) rules that can be used to describe S-boxes with good cryptographic properties and low implementation cost. Up to now, CA rules have been used in several ciphers to define an S-box, but all of those ciphers use the same CA rule, best known as the one defining the Keccak χ transformation. Since there exists no straightforward method for constructing CA rules that define S-boxes with good cryptographic/implementation properties, we use a heuristic method for this task -- Genetic Programming (GP). Although it is not possible to theoretically prove the efficiency of such a method, our experimental results show that GP is able to find a large number of CA rules that define good S-boxes in a relatively easy way. We focus on the 4×4 and 5×5 sizes, and we implement the S-boxes in hardware to examine implementation properties such as latency, area, and power. Particularly interesting is the internal encoding of the solutions in the considered heuristics using combinatorial circuits; this makes it easy to approximate S-box implementation properties like latency and area a priori.
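The one CA rule the abstract singles out, Keccak's χ, is concrete enough to sketch: applying the local rule x_i ⊕ (¬x_{i+1} ∧ x_{i+2}) to every bit of a 5-bit word yields the 5×5 S-box. A minimal Python rendering (variable names are ours):

```python
N = 5  # 5-bit neighborhood, giving a 5x5 S-box

def chi(x: int) -> int:
    # Keccak chi: each output bit is a_i XOR (NOT a_{i+1} AND a_{i+2}),
    # with indices taken cyclically.
    bits = [(x >> i) & 1 for i in range(N)]
    out = [bits[i] ^ ((bits[(i + 1) % N] ^ 1) & bits[(i + 2) % N])
           for i in range(N)]
    return sum(b << i for i, b in enumerate(out))

sbox = [chi(x) for x in range(2 ** N)]
# chi is invertible for odd N, so the table is a permutation of {0..31}:
assert sorted(sbox) == list(range(2 ** N))
```

The GP search in the paper explores the space of such local rules; this block only shows the single known-good rule used as the baseline.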
S. Picek, L. Mariot, Bohan Yang, D. Jakobović, and N. Mentens. "Design of S-boxes Defined with Cellular Automata Rules." Proceedings of the Computing Frontiers Conference, May 15, 2017. DOI: 10.1145/3075564.3079069.
Out-of-order execution is essential for high-performance, general-purpose computation, as it can find and execute useful work instead of stalling. However, it is limited by the requirement of visibly sequential, atomic instruction execution --- in other words, in-order instruction commit. While in-order commit has its advantages, such as providing precise interrupts and avoiding complications with the memory consistency model, it requires the core to hold on to resources (reorder buffer entries, load/store queue entries, registers) until they are released in program order. In contrast, out-of-order commit releases resources much earlier, yielding improved performance with fewer traditional hardware resources. However, out-of-order commit is limited in terms of correctness by the conditions described in the work of Bell and Lipasti. In this paper we revisit out-of-order commit from a different perspective: not by proposing another hardware technique, but by examining these conditions one by one, and in combination, with respect to their potential performance benefit for both non-speculative and speculative out-of-order commit. While correctly handling recovery for all out-of-order commit conditions currently requires complex tracking and expensive checkpointing, this work aims to demonstrate the potential for selective, speculative out-of-order commit using an oracle implementation without speculative rollback costs.
We learn that: a) there is significant untapped potential for aggressive variants of out-of-order commit; b) it is important to optimize the commit depth, i.e., the search distance for out-of-order commit, for a balanced design: smaller cores can benefit from shorter depths, while larger cores continue to benefit from aggressive parameters; c) focusing on a subset of out-of-order commit conditions could lead to efficient implementations; and d) the benefits of out-of-order commit increase with higher memory latency, and out-of-order commit works well in conjunction with prefetching to further improve performance.
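The resource-holding argument can be illustrated with a toy model (made-up completion times; a real design is constrained by the Bell/Lipasti conditions the paper examines):

```python
# Cycle at which each in-flight instruction completes (illustrative).
completion = [3, 1, 9, 2, 4]

# In-order commit: an entry's resources are freed only once every older
# instruction has also completed, so a slow instruction holds up all
# younger ones.
in_order = []
latest = 0
for c in completion:
    latest = max(latest, c)
    in_order.append(latest)

# Idealized (oracle) out-of-order commit: resources are freed at
# completion time, ignoring the safety conditions.
out_of_order = completion

assert in_order == [3, 3, 9, 9, 9]
# Resources are held strictly longer under in-order commit:
assert sum(in_order) > sum(out_of_order)
```

The gap between the two release schedules is exactly the headroom the paper's oracle study quantifies.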
M. Alipour, Trevor E. Carlson, and S. Kaxiras. "Exploring the Performance Limits of Out-of-order Commit." Proceedings of the Computing Frontiers Conference, May 15, 2017. DOI: 10.1145/3075564.3075581.
Process technology improvements have historically allowed an effortless expansion of the capacity and capabilities of computers and the cloud with few changes to the underlying software or programming model. However, the end of Dennard scaling means that performance and efficiency gains will rely on the customization of hardware for each application. Yet customizing hardware for each application runs contrary to the trend of moving more and more applications to a common hardware infrastructure: the cloud. Microsoft's Catapult project has brought the power and performance of FPGA-based reconfigurable computing to hyperscale datacenters, accelerating major production cloud applications such as Bing web search and Microsoft Azure, and enabling a new generation of machine learning and artificial intelligence applications. These diverse workloads are accelerated on the same underlying hardware by using highly programmable silicon. The presence of ubiquitous and programmable silicon in the datacenter enables a new era of hardware/software co-design, opening up affordable and efficient performance across an enormous set of workloads. Catapult is now deployed in nearly every new server across the more than a million machines that make up the Microsoft hyperscale cloud. In this talk, I will describe the next generation of the Catapult configurable cloud architecture, and the tools and techniques that have made Catapult successful to date. I will discuss areas where traditional hardware and software development flows fall short, the domains where this programmable hardware holds the most potential, and how this technology can enable new computing frontiers.
Andrew Putnam. "Designing and Programming the Configurable Cloud." Proceedings of the Computing Frontiers Conference, May 15, 2017. DOI: 10.1145/3075564.3095083.
Ronny Bazan Antequera, P. Calyam, A. Chandrashekara, Shivoam Malhotra
Emerging interdisciplinary data-intensive applications in science and engineering fields (e.g., bioinformatics, cybermanufacturing) demand the use of high-performance computing resources. However, the local resources available to data-intensive applications usually present limited capacity and availability due to sizable upfront costs. The applications' requirements warrant intelligent resource 'abstractions' coupled with 'reusable' approaches to save time and effort in deploying cyberinfrastructure (CI). In this paper, we present a novel 'custom templates' management middleware that overcomes this scarcity of resources by using advanced CI management technologies/protocols to deploy data-intensive applications on demand across distributed/federated cloud resources. Our middleware comprises a novel resource recommendation scheme that abstracts the user requirements of data-intensive applications and matches them with federated cloud resources using custom templates in a catalog. We evaluate the accuracy of our recommendation scheme in two experiment scenarios. The experiments involve simulating a series of user interactions with diverse application requirements, and also feature a real-world data-intensive application case study. Our experimental results show that our scheme improves resource recommendation accuracy by up to 21% compared to existing schemes.
Ronny Bazan Antequera, P. Calyam, A. Chandrashekara, and Shivoam Malhotra. "Recommending Resources to Cloud Applications based on Custom Templates Composition." Proceedings of the Computing Frontiers Conference, May 15, 2017. DOI: 10.1145/3075564.3075582.
Daniel Oliveira, Vinicius Fratin, P. Navaux, I. Koren, P. Rech
Transient faults are a major problem for large-scale HPC systems, and the mitigation of adverse fault effects needs to be highly efficient as we approach exascale. We developed a fault injection tool (CAROL-FI) to identify the potential sources of adverse fault effects. With a deeper understanding of such effects, we provide useful insights for designing efficient mitigation techniques, such as selective hardening of critical portions of the code. We performed a fault injection campaign, injecting more than 67,000 faults into an Intel Xeon Phi executing six representative HPC programs. We show that selective hardening can be successfully applied to DGEMM and Hotspot, while LavaMD and NW may require complete code hardening.
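Fault injection of this kind can be sketched as flipping a bit in a live value and classifying the outcome; the example below is a toy stand-in, not CAROL-FI's actual mechanism (which interrupts a running process), and the workload and flipped bit are chosen arbitrarily:

```python
import struct

def flip_bit(x: float, bit: int) -> float:
    # Reinterpret the float's IEEE-754 bits, flip one, reinterpret back.
    (u,) = struct.unpack("<Q", struct.pack("<d", x))
    (y,) = struct.unpack("<d", struct.pack("<Q", u ^ (1 << bit)))
    return y

# "Golden" (fault-free) run of a trivial workload: summing an array.
data = [i * 0.5 for i in range(100)]
golden = sum(data)

# Inject a fault: flip an exponent bit of one element, then rerun.
data[10] = flip_bit(data[10], 52)
faulty = sum(data)

# Classify the outcome as CAROL-FI-style campaigns do: a wrong result
# with no crash is a silent data corruption (SDC).
outcome = "SDC" if faulty != golden else "masked"
```

A real campaign repeats this thousands of times over randomized injection sites and also records crashes and hangs, which this toy omits.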
Daniel Oliveira, Vinicius Fratin, P. Navaux, I. Koren, and P. Rech. "CAROL-FI: an Efficient Fault-Injection Tool for Vulnerability Evaluation of Modern HPC Parallel Accelerators." Proceedings of the Computing Frontiers Conference, May 15, 2017. DOI: 10.1145/3075564.3075598.
D. Gkorou, A. Ypma, G. Tsirogiannis, Manuel Giollo, Dag Sonntag, Geert Vinken, Richard van Haren, Robert Jan van Wijk, Jelle Nije, Tomoko Hoogenboom
In semiconductor manufacturing, continuous on-line monitoring prevents production stops and yield loss. The challenges toward this accomplishment are: 1) the complexity of lithography machines, which are composed of hundreds of mechanical and optical components; 2) the high rate and volume of data acquisition from different lithography and metrology machines; and 3) the scarcity of performance measurements due to their cost. This paper addresses these challenges by 1) visualizing and ranking the factors most relevant to a performance metric, 2) efficiently organizing Big Data from different sources, and 3) predicting performance with machine learning when measurements are lacking. Even though this project targets semiconductor manufacturing, its methodology is applicable to any case of monitoring complex systems with many potentially interesting features and imbalanced datasets.
D. Gkorou, A. Ypma, G. Tsirogiannis, Manuel Giollo, Dag Sonntag, Geert Vinken, Richard van Haren, Robert Jan van Wijk, Jelle Nije, and Tomoko Hoogenboom. "Towards Big Data Visualization for Monitoring and Diagnostics of High Volume Semiconductor Manufacturing." Proceedings of the Computing Frontiers Conference, May 15, 2017. DOI: 10.1145/3075564.3078883.
Erwan Nogues, D. Ménard, Alexandre Mercat, M. Pelcat
The energy efficiency of a Multiprocessor SoC (MPSoC) is enhanced by complex hardware features such as Dynamic Voltage and Frequency Scaling (DVFS) and Dynamic Power Management (DPM). This paper proposes a methodology to learn an energy model from real power measurements. From this energy model, a convex optimization framework can determine the optimal energy-efficient operating point in terms of frequency and number of active cores in an MPSoC. Experimental data are reported using a Samsung Exynos 5410 MPSoC. They show that a precise yet relatively simple model can be derived.
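The methodology can be caricatured as follows: assume a power model, then search the frequency/core grid for the energy-minimal operating point. The coefficients and the ideal-speedup assumption below are invented for illustration; the paper fits its model to real measurements and uses convex optimization rather than grid search:

```python
import itertools

def power(f_ghz: float, n_cores: int, a=0.8, b=0.4, c=0.5) -> float:
    # Illustrative power model: dynamic cubic term + linear term per
    # core, plus a shared static term. Coefficients are made up.
    return a * n_cores * f_ghz ** 3 + b * n_cores * f_ghz + c

def energy(f_ghz: float, n_cores: int, work=100.0) -> float:
    # Energy = power * execution time, assuming ideal parallel speedup
    # (a simplification; real workloads scale sub-linearly).
    time = work / (f_ghz * n_cores)
    return power(f_ghz, n_cores) * time

# Exhaustive search over a small DVFS grid stands in for the paper's
# convex optimization over the learned model.
operating_points = itertools.product([0.25, 0.5, 1.0, 1.6], [1, 2, 4])
best = min(operating_points, key=lambda p: energy(*p))
```

Under this particular model, neither the lowest nor the highest frequency minimizes energy: static power penalizes running slowly, while dynamic power penalizes running fast, which is precisely why an optimization step is needed.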
Erwan Nogues, D. Ménard, Alexandre Mercat, and M. Pelcat. "On Learning the Energy Model of an MPSoC for Convex Optimization." Proceedings of the Computing Frontiers Conference, May 15, 2017. DOI: 10.1145/3075564.3078893.