Pub Date : 2015-10-04 DOI: 10.1109/CODESISSS.2015.7331384
Mojtaba Ebrahimi, Nour Sayed, Maryam Rashvand, M. Tahoori
Radiation-induced soft errors are a major reliability concern in advanced technology nodes. The de facto approach to evaluating soft error vulnerability is a costly fault injection campaign. Because some errors reside in the system state for a long time, an injected error may have to be traced for millions of cycles. However, only a very small portion of injected errors lead to a failure, so many simulation cycles are wasted on injections that are masked and never cause a failure. In this paper, we present an importance sampling technique based on Architecturally Correct Execution (ACE) analysis that identifies the non-vulnerable time intervals in memory arrays and avoids unnecessary fault injections, speeding up soft error vulnerability evaluation without sacrificing accuracy. Our analysis shows that this approach speeds up our architecture-level fault injection technique by 13X on average.
Title: Fault injection acceleration by architectural importance sampling. Published in: 2015 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS).
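The idea in the abstract above, skipping injections into time intervals where a memory error cannot matter, can be illustrated with a minimal Python sketch. It is not the authors' tool: the access-trace format, the function names, and the rule "a bit flip is vulnerable only if the next access to the word is a read" are simplifying assumptions made for illustration.

```python
import random

def ace_intervals(accesses):
    """Return merged half-open [start, end) cycle intervals in which a bit
    flip in one memory word would be consumed by a later read.

    `accesses` is a cycle-sorted list of (cycle, kind) pairs with kind "R"
    or "W" (hypothetical trace format). A flip is assumed to land after any
    access in the same cycle, so it matters only if the next access is a read;
    a write would overwrite it first.
    """
    intervals = []
    previous = 0  # cycle of the previous access (0 = start of simulation)
    for cycle, kind in accesses:
        if kind == "R" and cycle > previous:
            if intervals and intervals[-1][1] == previous:
                # Back-to-back reads: extend the current vulnerable interval.
                intervals[-1] = (intervals[-1][0], cycle)
            else:
                intervals.append((previous, cycle))
        previous = cycle
    return intervals

def ace_fraction(intervals, total_cycles):
    """Share of all simulated cycles that fall inside vulnerable intervals."""
    return sum(end - start for start, end in intervals) / total_cycles

def sample_injection_cycles(intervals, count, rng):
    """Draw injection cycles uniformly from the vulnerable cycles only."""
    lengths = [end - start for start, end in intervals]
    chosen = rng.choices(intervals, weights=lengths, k=count)
    return [rng.randrange(start, end) for start, end in chosen]

trace = [(10, "W"), (20, "R"), (30, "R"), (50, "W"), (80, "R")]
intervals = ace_intervals(trace)
fraction = ace_fraction(intervals, total_cycles=100)
cycles = sample_injection_cycles(intervals, count=5, rng=random.Random(0))
print(intervals, fraction)  # [(10, 30), (50, 80)] 0.5
```

Under these assumptions, an overall failure probability would be estimated as `fraction` times the failure rate observed among the sampled injections, since injections outside the listed intervals are overwritten or never read and therefore cannot fail.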
Pub Date : 2015-10-04 DOI: 10.1109/CODESISSS.2015.7331383
Sunha Ahn, S. Malik, Aarti Gupta
An emerging trend in system design is to implement complex system management functions in firmware (FW). This changing design scenario requires support for verifying FW in the context of its hardware (HW) environment. As shown in previous work, there is value in a unified HW-FW model for driving the verification tasks. This model can help identify specific commonly-occurring interaction patterns between the HW and FW. These patterns enable pruning the verification search space as demonstrated in previous work in automating FW test generation using concolic testing. In this work, we introduce a bounded model checking (BMC)-based methodology for FW verification. Although BMC is effective for finding bugs by unrolling the underlying transition system up to some bound, it requires a completeness threshold on the bound to ensure complete verification. We show how commonly occurring FW code patterns can be exploited, using inexpensive static analysis techniques, to determine this completeness bound. Further, we show how this bound analysis, combined with the interaction patterns in the unified HW-FW model, is used to sequentialize the concurrent FW and HW code, i.e., to derive a sequential program that represents the parallel interaction of the FW and HW. This enables the direct application of standard software model checkers such as CBMC on this sequentialized program. We have automated this process by implementing: (i) a static completeness bound analyzer on top of the tool Frama-C, and (ii) a sequentializer to generate code for verification by the CBMC model checker. We evaluate the resulting tool using three real FW benchmarks, each consisting of a Linux device driver and its interacting QEMU-emulated HW code with multiple correctness properties. We successfully computed the BMC completeness bounds for 41 out of 46 properties and completed model checking for 12 out of 16 FW transactions.
Title: Completeness bounds and sequentialization for model checking of interacting firmware and hardware. Published in: 2015 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS).
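The role of a completeness bound in bounded model checking, as described in the abstract above, can be shown on a toy explicit-state system. The sketch below is an assumption-laden illustration in Python, not the paper's static analysis or its CBMC flow: the six-state counter, the function names, and the use of breadth-first search to obtain the bound are all hypothetical. For invariant properties, the largest shortest-path distance from the initial states is a valid completeness threshold.

```python
from collections import deque

def reachability_diameter(initial, successors):
    """Largest shortest-path distance from the initial states to any
    reachable state, found by breadth-first search."""
    depth = {state: 0 for state in initial}
    queue = deque(initial)
    while queue:
        state = queue.popleft()
        for nxt in successors(state):
            if nxt not in depth:
                depth[nxt] = depth[state] + 1
                queue.append(nxt)
    return max(depth.values())

def bmc_violation(initial, successors, is_bad, bound):
    """Unroll the transition relation up to `bound` steps. Return the first
    step at which a bad state is reached, or None if none is found."""
    frontier = set(initial)
    for step in range(bound + 1):
        if any(is_bad(state) for state in frontier):
            return step
        frontier = {nxt for state in frontier for nxt in successors(state)}
    return None

# Hypothetical system: a counter that wraps around after 5.
def successors(state):
    return [(state + 1) % 6]

bound = reachability_diameter([0], successors)
print(bound)                                                    # 5
print(bmc_violation([0], successors, lambda s: s == 4, 3))      # None (bound too small)
print(bmc_violation([0], successors, lambda s: s == 4, bound))  # 4
print(bmc_violation([0], successors, lambda s: s == 7, bound))  # None (property holds)
```

With a bound of 3 the violation at step 4 is missed, which is why a result of "no bug found" only counts as complete verification once the unrolling depth reaches the completeness bound.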
Pub Date : 2015-10-04 DOI: 10.1109/CODESISSS.2015.7331363
Chun-Ta Lin, Yuan-Hao Chang, Tei-Wei Kuo, Hung-Sheng Chang, Hsiang-Pang Li
In recent years there has been a growing demand to add more intelligence to storage devices, especially as hardware computing power increases rapidly. This paper targets essential design issues in space utilization for dedup-based non-volatile phase-change memory (PCM). We explore the adoption of data deduplication techniques to reduce potential data duplicates on PCM storage devices and thereby provide more storage space than the physical storage space. Among various data deduplication techniques, variable-sized chunking is considered for the less cost-effective PCM-based storage devices because it has better data deduplication capability than fixed-sized chunking. However, in a typical system architecture, data are written or updated in fixed management units (e.g., LBAs). Thus, to improve the space utilization of PCM-based storage devices, the technical problem is (1) how to map fixed-sized LBAs to variable-sized chunks and (2) how to efficiently manage (i.e., allocate and deallocate) free PCM storage space for variable-sized chunks. In this work, we propose a free space manager, called the container-based space manager, to resolve these two issues by exploiting the facts that (1) a storage system initially has more free space, which relaxes the complexity of space management, and (2) the space optimization of a storage system can grow over time as it contains more and more data. The proposed design is evaluated over popular benchmarks, with very encouraging results.
Title: How to improve the space utilization of dedup-based PCM storage devices? Published in: 2015 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS).
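The claim in the abstract above that variable-sized chunking deduplicates better than fixed-sized chunking can be made concrete with a small Python sketch. This is not the paper's container-based space manager: the content-defined boundary rule (a CRC32 over a sliding window), the `WINDOW` and `DIVISOR` values, and the helper names are assumptions chosen for illustration.

```python
import random
import zlib

WINDOW = 16   # bytes hashed to decide a chunk boundary (assumed value)
DIVISOR = 64  # about one boundary per 64 bytes on random data (assumed value)

def fixed_chunks(data, size=64):
    """Cut the data every `size` bytes, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def variable_chunks(data):
    """Cut after every position whose trailing window hashes to a chosen
    value, so boundaries depend on content rather than on byte offsets."""
    chunks, start = [], 0
    for i in range(WINDOW - 1, len(data)):
        window = data[i - WINDOW + 1:i + 1]
        if zlib.crc32(window) % DIVISOR == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks

def shared(a, b):
    """Number of distinct chunks that appear in both chunk lists."""
    return len(set(a) & set(b))

rng = random.Random(1)
original = bytes(rng.getrandbits(8) for _ in range(4096))
shifted = b"\x00" + original  # the same data with one byte inserted in front

shared_fixed = shared(fixed_chunks(original), fixed_chunks(shifted))
shared_variable = shared(variable_chunks(original), variable_chunks(shifted))
print("fixed-size chunks still shared:", shared_fixed)
print("variable-size chunks still shared:", shared_variable)
```

A one-byte insertion shifts every fixed-size boundary, so the fixed-size chunks no longer match, while the content-defined boundaries move with the data and every chunk after the first is found again. The price is that chunks no longer align with fixed-size LBAs, which is the mapping and free-space problem the abstract describes.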
Pub Date : 2015-10-04 DOI: 10.1109/CODESISSS.2015.7331369
J. Spasić, Di Liu, E. Cannella, T. Stefanov
Recently, it has been shown that hard real-time scheduling theory can be applied to streaming applications modeled as acyclic Cyclo-Static Dataflow (CSDF) graphs. However, that approach is not efficient in terms of throughput and processor utilization. Therefore, in this paper, we propose an improved hard real-time scheduling approach to schedule streaming applications modeled as acyclic CSDF graphs on a Multi-Processor System-on-Chip (MPSoC) platform. The proposed approach converts each actor in a CSDF graph to a set of real-time periodic tasks. This conversion enables the application of many hard real-time scheduling algorithms, which offer fast calculation of the number of processors required to schedule the tasks. We evaluate the performance and time complexity of our approach in comparison to several existing scheduling approaches. Experiments on a set of real-life streaming applications demonstrate that our approach: 1) results in systems with higher throughput and better processor utilization than the existing hard real-time scheduling approach for CSDF graphs, while requiring comparable time for the system derivation; 2) gives the same throughput as the existing periodic scheduling approach for CSDF graphs but requires much less time to derive the task schedule and the tasks' parameters (periods, start times, etc.); and 3) gives a throughput that is equal or very close to the maximum achievable throughput of an application obtained via self-timed scheduling, but requires much less time to derive the schedule. The total time needed for the proposed conversion, the calculation of the minimum number of processors needed to schedule the tasks, and the calculation of the sizes of the communication buffers between tasks is in the range of seconds.
Title: Improved hard real-time scheduling of CSDF-modeled streaming applications. Published in: 2015 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS).
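Once actors are expressed as periodic tasks, as the abstract above describes, the number of processors can be computed quickly with standard real-time results. The Python sketch below illustrates that step only; it does not reproduce the authors' CSDF-to-task conversion. The task set is hypothetical, and the sketch assumes implicit-deadline periodic tasks, partitioned EDF (a processor is schedulable when its total utilization is at most 1), and first-fit decreasing as the allocation heuristic.

```python
from fractions import Fraction
import math

def utilization(task):
    """Utilization of a task given as (worst-case execution time, period)."""
    wcet, period = task
    return Fraction(wcet, period)

def processor_lower_bound(tasks):
    """No scheduler can use fewer processors than the total utilization,
    rounded up."""
    return math.ceil(sum(utilization(task) for task in tasks))

def first_fit_decreasing(tasks):
    """Assign tasks, largest utilization first, to the first processor whose
    load stays at or below 1. Returns the per-processor loads."""
    loads = []
    for task in sorted(tasks, key=utilization, reverse=True):
        u = utilization(task)
        for index, load in enumerate(loads):
            if load + u <= 1:
                loads[index] = load + u
                break
        else:
            loads.append(u)  # no existing processor fits: open a new one
    return loads

# Hypothetical task set: (execution time, period) in the same time unit.
tasks = [(1, 2), (1, 3), (2, 5), (3, 10)]
loads = first_fit_decreasing(tasks)
print(processor_lower_bound(tasks), len(loads))  # 2 2
```

Exact fractions avoid rounding errors in the `load + u <= 1` test. For this task set the heuristic reaches the lower bound of two processors; in general first-fit decreasing may need more than the bound.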