The safety of real-time embedded systems relies on both functional and timing correctness. On the timing side, real-time constraints are set on task executions, and missing them may lead to system failure. On the functional side, soft errors have become a major concern. Various soft error tolerance strategies have been proposed for soft error detection and recovery; however, they may introduce significant computation overhead and cause timing violations. In this work, we address the two aspects in an integrated framework, and propose a set of formulations to quantitatively model the impact of soft error detection and recovery mechanisms on real-time constraints. The formulations help designers analyze system feasibility under fault tolerance requirements and compare various architecture platforms. They may also help select the appropriate error tolerance mechanisms for software tasks, together with exploring task scheduling and allocation on representative single-core, multicore and distributed platforms, to maximize error coverage while meeting real-time constraints. Experiments on an industrial case study and synthetic examples demonstrate the effectiveness of our approach.
Title: Analysis and optimization of soft error tolerance strategies for real-time systems
Authors: Bowen Zheng, Yue Gao, Qi Zhu, S. Gupta
DOI: 10.5555/2830840.2830847
Venue: 2015 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), publication date 2015-10-04
Pub Date: 2015-10-04, DOI: 10.1109/CODESISSS.2015.7331382
Alessandro Danese, Luca Piccolboni, G. Pravadelli
A relevant aspect in design analysis and verification is monitoring how logic relations among different variables change at run time. Current static approaches suffer from scalability problems that prevent their adoption on large designs. In contrast, dynamic techniques scale better from the memory-consumption point of view. However, to achieve high accuracy, they require analysing a huge number of (long) execution traces, which results in time-consuming phases. In this paper, we present a new efficient approach to automatically infer logic relations among the variables of a design implementation. Both a sequential and a GPU-oriented parallel implementation are proposed to dynamically extract likely invariants from execution traces on different time windows. Execution traces composed of millions of simulation instants can be efficiently analysed.
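The idea of extracting likely invariants per time window can be illustrated with a small sequential sketch. It is not the authors' algorithm: the three relation templates, the non-overlapping windows, and the names `TEMPLATES`, `mine_window` and `mine_trace` are assumptions made for illustration. Each window is processed independently, which is the property a parallel implementation could exploit.

```python
from itertools import combinations
from typing import Callable, Dict, List, Set, Tuple

Instant = Dict[str, bool]            # variable name -> value at one simulation instant
Invariant = Tuple[str, str, str]     # (template name, variable a, variable b)

# Hypothetical relation templates over two Boolean variables (an assumption).
TEMPLATES: Dict[str, Callable[[bool, bool], bool]] = {
    "a == b": lambda a, b: a == b,
    "a != b": lambda a, b: a != b,
    "a -> b": lambda a, b: (not a) or b,
}


def mine_window(window: List[Instant]) -> Set[Invariant]:
    """Return the template instances that hold at every instant of the window."""
    variables = sorted(window[0])
    found: Set[Invariant] = set()
    # Pairs are taken in sorted order only, so "a -> b" is checked one way.
    for v1, v2 in combinations(variables, 2):
        for name, holds in TEMPLATES.items():
            if all(holds(t[v1], t[v2]) for t in window):
                found.add((name, v1, v2))
    return found


def mine_trace(trace: List[Instant], size: int) -> List[Set[Invariant]]:
    """Mine each non-overlapping window of `size` instants independently."""
    return [mine_window(trace[i:i + size]) for i in range(0, len(trace), size)]
```

A relation reported this way is only "likely": it held on the observed instants and may be refuted by a longer or different trace.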
Title: A parallelizable approach for mining likely invariants
Pub Date: 2015-10-04, DOI: 10.1109/CODESISSS.2015.7331381
Ashraf El Antably, O. Gruber, F. Rousseau, Nicolas Fournel
Fully distributed memory multi-processors (MPSoC) implemented in multi-tiled architectures are promising solutions to support modern sophisticated applications; however, the reliability of such systems is always an issue. As a result, system-level solutions such as task migration remain important. Transferring the execution of a task from one tile to another helps maintain acceptable reliability in such systems. A tile contains at least one processor and associated peripherals, with a communication device responsible for inter-tile communications. We propose in this work a task migration technique that targets data-flow applications running on multi-tiled architectures. This technique uses a middleware layer that makes it transparent to application programmers and eases its portability across different multi-tiled architectures. It can be deployed on small operating systems that support neither an MMU nor dynamic loading of task code. We show that this technique is operational on a real x86-based hardware platform. Experimental results show low overhead in both memory and performance, without much variance.
Title: Transparent and portable agent based task migration for data-flow applications on multi-tiled architectures
Pub Date: 2015-10-04, DOI: 10.1109/CODESISSS.2015.7331367
Pirmin Vogel, A. Marongiu, L. Benini
While high-end heterogeneous systems are increasingly supporting heterogeneous uniform memory access (hUMA) as envisioned by the Heterogeneous System Architecture (HSA) foundation, their low-power counterparts targeting the embedded domain still lack basic features like virtual memory support for accelerators. Instead of simply passing virtual address pointers, explicit data management involving copies is needed to share data between host processor and accelerators, which hampers programmability and performance. In this work, we present a mixed hardware/software solution to enable lightweight virtual memory support for many-core accelerators in heterogeneous embedded systems-on-chip (SoCs). Based on an input/output translation lookaside buffer (IOTLB), efficiently managed by a kernel-level driver module running on the host, our solution features a considerably lower design complexity compared to conventional input/output memory management units. Using our evaluation platform based on the Xilinx Zynq-7000 SoC with a many-core accelerator implemented in the programmable logic, we demonstrate the effectiveness of our solution and the benefits of virtual memory support for embedded heterogeneous SoCs.
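The division of labour described above, a small translation buffer on the accelerator side whose misses are resolved by host software, can be modelled with a toy sketch. Everything specific here is an assumption for illustration rather than the authors' design: the 4 KiB page size, the FIFO replacement policy, the slot count, and the class and method names.

```python
from collections import OrderedDict
from typing import Dict

PAGE_SIZE = 4096  # assumed page size in bytes


class SoftwareManagedIOTLB:
    """Toy IOTLB: hits translate locally, misses are refilled by a 'driver'."""

    def __init__(self, page_table: Dict[int, int], entries: int = 4):
        self.page_table = page_table   # host page table: virtual page -> physical frame
        self.entries = entries         # number of IOTLB slots (assumed)
        self.tlb: "OrderedDict[int, int]" = OrderedDict()
        self.misses = 0

    def _driver_handle_miss(self, vpn: int) -> int:
        """Stand-in for the host driver: look up the mapping and refill a slot."""
        self.misses += 1
        pfn = self.page_table[vpn]         # KeyError models an unmapped address
        if len(self.tlb) >= self.entries:
            self.tlb.popitem(last=False)   # evict the oldest entry (FIFO, assumed)
        self.tlb[vpn] = pfn
        return pfn

    def translate(self, vaddr: int) -> int:
        """Translate an accelerator-side virtual address to a physical address."""
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        pfn = self.tlb.get(vpn)
        if pfn is None:
            pfn = self._driver_handle_miss(vpn)
        return pfn * PAGE_SIZE + offset
```

With such a scheme the accelerator can dereference the same pointers as the host, and copies are replaced by occasional miss handling.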
Title: Lightweight virtual memory support for many-core accelerators in heterogeneous embedded SoCs
Pub Date: 2015-10-04, DOI: 10.1109/CODESISSS.2015.7331375
Eunhyeok Park, Dongyoung Kim, Soobeom Kim, Yong-Deok Kim, Gunhee Kim, Sungroh Yoon, S. Yoo
Deep neural networks (DNNs) have recently proved their effectiveness in complex data analyses such as object/speech recognition. As their applications are being expanded to mobile devices, their energy efficiency is becoming critical. In this paper, we propose a novel concept called big/LITTLE DNN (BL-DNN) which significantly reduces the energy consumption required for DNN execution at a negligible loss of inference accuracy. The BL-DNN consists of a little DNN (consuming low energy) and a full-fledged big DNN. In order to reduce energy consumption, the BL-DNN aims at avoiding the execution of the big DNN whenever possible. The key idea for this goal is to execute the little DNN first for inference (without big DNN execution) and simply use its result as the final inference result as long as the result is estimated to be accurate. On the other hand, if the result from the little DNN is not considered to be accurate, the big DNN is executed to give the final inference result. This approach reduces the total energy consumption by obtaining the inference result only with the little, energy-efficient DNN in most cases, while maintaining a similar level of inference accuracy by selectively executing the big DNN. We present design-time and runtime methods to control the execution of the big DNN under a trade-off between energy consumption and inference accuracy. Experiments with state-of-the-art DNNs for ImageNet and MNIST show that our proposed BL-DNN can offer up to 53.7% (ImageNet) and 94.1% (MNIST) reductions in energy consumption at a loss of 0.90% (ImageNet) and 0.12% (MNIST) in inference accuracy, respectively.
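The little-first control flow can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the two models are stand-in callables that return class probabilities, and the accuracy estimate (top probability compared with a hypothetical `threshold`) is an assumption, since the abstract does not say how the little DNN's result is judged.

```python
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[float]], List[float]]  # input -> class probabilities


def bl_dnn_infer(x: Sequence[float], little: Model, big: Model,
                 threshold: float = 0.9) -> Tuple[int, str]:
    """Run the little model first; fall back to the big model when unsure.

    `threshold` is a hypothetical confidence cut-off (an assumption).
    Returns (predicted class index, name of the model that produced it).
    """
    probs = little(x)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] >= threshold:
        return best, "little"          # big model is never executed
    probs = big(x)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, "big"


# Toy stand-in models, for illustration only.
def toy_little(x: Sequence[float]) -> List[float]:
    # Confident only when the first feature is clearly positive.
    return [0.95, 0.05] if x[0] > 1.0 else [0.55, 0.45]


def toy_big(x: Sequence[float]) -> List[float]:
    return [0.99, 0.01] if x[0] > 1.0 else [0.2, 0.8]
```

Raising the threshold sends more inputs to the big model, which costs energy but protects accuracy; lowering it does the opposite. This is the trade-off the design-time and runtime methods are said to control.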
Title: Big/little deep neural network for ultra low power inference
Pub Date: 2015-10-04, DOI: 10.1109/CODESISSS.2015.7331372
M. Santambrogio, J. Ayala, Simone Campanoni, Riccardo Cattaneo, Gianluca Durelli, M. Ferroni, A. A. Nacci, Josué Pagán, Marina Zapater, Mónica Vallejo
Resources such as quantities of transistors and memory, the level of integration and the speed of components have increased dramatically over the years. Even though the technologies have improved, we continue to apply outdated approaches to our use of these resources. Key computer science abstractions have not changed since the 1960s. Therefore, this is the time for a fresh approach to the way systems are designed and used.
Title: Power-awareness and smart-resource management in embedded computing systems
Pub Date: 2015-10-04, DOI: 10.1109/CODESISSS.2015.7331364
Mungyu Son, Junwhan Ahn, S. Yoo
Mobile storage writes are often dominated by writes to SQLite database files. Our characterization shows that they mostly consist of frequent overwrites with small new data (which we call small writes) and relatively infrequent writes with large data updates. In order to reduce writes to the Flash memory on smartphones, we propose exploiting these characteristics and present a low-cost nonvolatile write buffer for write coalescing. The key challenge is that the stringent resource constraints of mobile devices force the write buffer size to be minimized down to a single Flash page in order to reduce the overhead of the SRAM buffer on the controller chip and of the backing capacitor that maintains non-volatility of the buffer on power failure. As a solution to this problem, we propose three optimizations that make the best use of this small single-page nonvolatile write buffer. First, we propose managing only the difference between old and new data (i.e., differential logs) in the write buffer, based on the observation that small writes are frequent. Second, we develop a dynamic bypass scheme which judiciously bypasses overwrite-unfriendly pages from the write buffer. Third, we devise an incremental flush policy which controls the number of write buffer entries to be flushed according to the size of the newly written data. According to our experiments using four representative mobile applications on a real storage platform, OpenSSD, the proposed method gives average reductions of 69.5% and 64.5% in Flash memory writes in single- and multi-application runs, respectively. In addition, our scheme introduces a very small cost into existing systems, including 8-18.5 KB of SRAM on the controller chip and a tiny capacitor occupying only 1.7% of eMMC package volume.
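How differential logging lets a tiny buffer coalesce repeated small overwrites can be seen in a toy model. It is a simplified sketch under stated assumptions, not the authors' scheme: capacity is counted in diff entries rather than bytes, the bypass test uses diff size as a stand-in for "overwrite-unfriendly", and the flush policy evicts the page with the largest log instead of the incremental policy described above. All class and parameter names are hypothetical.

```python
from typing import Dict, List, Tuple

Diff = List[Tuple[int, int]]  # (byte offset, new byte value)


def page_diff(old: bytes, new: bytes) -> Diff:
    """Byte-level difference between two equally sized page images."""
    return [(i, b) for i, (a, b) in enumerate(zip(old, new)) if a != b]


class DiffWriteBuffer:
    """Toy write buffer that holds differential logs instead of full pages."""

    def __init__(self, flash: Dict[int, bytes], capacity: int, bypass_limit: int):
        self.flash = flash                # page number -> page contents
        self.capacity = capacity          # max buffered diff entries (assumed unit)
        self.bypass_limit = bypass_limit  # larger diffs skip the buffer (assumed rule)
        self.logs: Dict[int, Dict[int, int]] = {}
        self.flash_writes = 0

    def _used(self) -> int:
        return sum(len(d) for d in self.logs.values())

    def read(self, page: int) -> bytes:
        """Flash contents with any buffered differences applied on top."""
        data = bytearray(self.flash[page])
        for off, val in self.logs.get(page, {}).items():
            data[off] = val
        return bytes(data)

    def flush(self, page: int) -> None:
        """Merge the page's log into Flash; costs one Flash page write."""
        self.flash[page] = self.read(page)
        self.logs.pop(page, None)
        self.flash_writes += 1

    def write(self, page: int, new: bytes) -> None:
        diff = page_diff(self.read(page), new)
        if len(diff) > self.bypass_limit:       # bypass: write the page directly
            self.logs.pop(page, None)
            self.flash[page] = new
            self.flash_writes += 1
            return
        self.logs.setdefault(page, {}).update(diff)
        while self._used() > self.capacity:     # make room, one page at a time
            self.flush(max(self.logs, key=lambda p: len(self.logs[p])))
```

In this model, two successive small overwrites of the same page cost no Flash write until the log is flushed, whereas a write-through design would pay one page write each time.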
Title: A tiny-capacitor-backed non-volatile buffer to reduce storage writes in smartphones
Pub Date: 2015-10-04, DOI: 10.1109/CODESISSS.2015.7331380
T. Mück, S. Sarma, N. Dutt
In this paper we propose Run-DMC, an accurate runtime performance and power estimation scheme for dynamic workloads executing on heterogeneous multicore systems. In contrast to previous works, Run-DMC uses fine-grained per-thread metrics that model the Thread Load Contribution (TLC) induced by the native OS scheduling policy to accurately predict performance and power for any possible thread-to-core mapping. This allows the operating system to opportunistically exploit the heterogeneous multicore architecture by dynamically mapping workloads to the most appropriate core type. We have integrated our models into the Linux kernel running on top of a heterogeneous multicore system with 4 different core types. Our experimental results show that Run-DMC models yield up to 97% better energy efficiency compared to vanilla Linux. When compared to the approach employed by state-of-the-art energy-aware schedulers, Run-DMC yields up to 44% better energy efficiency.
Title: Run-DMC: Runtime dynamic heterogeneous multicore performance and power estimation for energy efficiency
Pub Date: 2015-10-04, DOI: 10.1109/CODESISSS.2015.7331370
M. Salehi, M. Shafique, F. Kriebel, Semeen Rehman, Mohammad Khavari Tavana, A. Ejlali, J. Henkel
Due to the tight power envelope, it is envisaged that in future technology nodes not all cores in a many-core chip can be simultaneously powered on (at full performance level). The power-gated cores are referred to as Dark Silicon. At the same time, growing reliability issues due to process variations and soft errors challenge the cost-effective deployment of future technology nodes. This paper presents a reliability management system for Dark Silicon chips (dsReliM) that optimizes for reliability of on-chip systems while jointly accounting for soft errors, process variations and the thermal design power (TDP) constraint. Towards the TDP-constrained reliability optimization, dsReliM leverages multiple reliable application versions that can potentially execute on different cores with frequency variations and supporting different voltage-frequency levels, thus facilitating distinct power, reliability and performance tradeoffs at run time. Experiments show that our dsReliM system provides up to 20% reliability improvements under different TDP constraints when compared to a state-of-the-art technique. Also, compared to an ideal-case optimal solution, dsReliM deviates by up to 2.5% in terms of reliability efficiency, but speeds up the reliability management decision time by a factor of up to 3100.
Title: dsReliM: Power-constrained reliability management in Dark-Silicon many-core chips under process variations
Pub Date: 2015-10-04, DOI: 10.1109/CODESISSS.2015.7331365
M. A. Faruque, F. Regazzoni, M. Pajic
Cyber-Physical Systems (CPS) are in most cases safety- and mission-critical. Standard design techniques used for securing embedded systems are not suitable for CPS due to the restricted computation and communication budget available in the latter. In addition, the sensitivity of sensed data and the presence of actuation components further increase the security requirements of CPS. To address these issues, it is necessary to provide new design methods in which security is considered from the beginning of the whole design flow and addressed in a holistic way. In this paper, we focus on the design of secure CPS as part of the complete CPS design process, and provide insights into new requirements on platform-aware design of control components, design methodologies and architectures posed by CPS design. We start by discussing methods for the multi-disciplinary modeling, simulation, tools, and software synthesis challenges for CPS. We also present a framework for design of secure control systems for CPS, while taking into account properties of the underlying computation and communication platforms. Finally, we describe the security challenges in the computing hardware that is used in CPS.
Title: Design methodologies for securing cyber-physical systems