Analytic reliability evaluation for fault-tolerant circuit structures on FPGAs
Pub Date: 2014-11-24 | DOI: 10.1109/DFT.2014.6962108
J. Anwer, M. Platzner
With the increasing error-proneness of nano-scale circuits, a number of fault-tolerance approaches have been presented in the literature to enhance circuit reliability. Evaluating the effectiveness of fault-tolerant circuit structures remains a challenge, however. An analytical model is required to provide exact reliability figures for a circuit design, to locate its error-sensitive parts, and to compare different fault-tolerance approaches. At the logic layer, probabilistic approaches exist that provide such measures of circuit reliability, but they do not consider the reliability-enhancement effect of fault-tolerant circuit structures. Furthermore, these approaches are often not applicable to large circuits, and their complexity hinders the development of generic simulation tools. In this paper we combine the Boolean difference error calculator (BDEC), a previous probabilistic reliability model for hardware designs, with a reliability model for fault-tolerant circuit structures. As a result, we are able to study the reliability of fault-tolerant circuit structures at the logic layer. We focus on fault-tolerant circuits to be implemented in FPGAs and show how to extend our combined model from combinational to sequential circuits. To analyze larger circuits, we develop a MATLAB-based tool utilizing BDEC. With this tool, we perform a variability analysis of different input parameters, such as logic component, input, and voter error probabilities, to observe their individual and joint effects on circuit reliability. Our analyses show that circuit reliability depends strongly on the circuit structure, due to the error-masking effects of components on each other. Moreover, the benefit of redundancy is obtained only up to a certain threshold of component, input, and voter error probabilities, with voter reliability having the strongest impact on overall circuit reliability.
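To make this kind of logic-layer analysis concrete, here is a minimal Python sketch of probabilistic gate-level error propagation in the spirit of BDEC; it is not the authors' model. The fanout-free NAND-tree circuit, the von Neumann gate-error model, and all probability values are assumptions chosen for illustration.

```python
from itertools import product

def nand(x, y):
    return 1 - (x & y)

def gate_error_prob(g, vals, in_eps, gate_eps):
    """Exact error probability of a gate output, given the correct input
    values, independent input error probabilities, and the gate's own
    flip probability (von Neumann error model)."""
    correct = g(*vals)
    p_wrong_logic = 0.0
    for flips in product([0, 1], repeat=len(vals)):
        p = 1.0
        for f, e in zip(flips, in_eps):
            p *= e if f else (1.0 - e)
        actual = g(*(v ^ f for v, f in zip(vals, flips)))
        if actual != correct:
            p_wrong_logic += p
    # The gate itself flips its (possibly already wrong) output with prob gate_eps.
    return p_wrong_logic * (1 - gate_eps) + (1 - p_wrong_logic) * gate_eps

# out = NAND(NAND(a, b), NAND(c, d)): a fanout-free tree, so signals stay independent.
def circuit_error_prob(inputs, in_eps, gate_eps):
    a, b, c, d = inputs
    p1 = gate_error_prob(nand, (a, b), in_eps[:2], gate_eps)
    p2 = gate_error_prob(nand, (c, d), in_eps[2:], gate_eps)
    v1, v2 = nand(a, b), nand(c, d)
    return gate_error_prob(nand, (v1, v2), (p1, p2), gate_eps)

# Average over all input vectors (uniform inputs): reliability = 1 - mean error.
eps_in, eps_gate = 0.01, 0.005
errs = [circuit_error_prob(v, [eps_in] * 4, eps_gate)
        for v in product([0, 1], repeat=4)]
print("circuit reliability:", 1 - sum(errs) / len(errs))
```

Because the example circuit is a tree, the signal error probabilities remain independent and the propagation is exact; reconvergent fanout breaks that independence, which is where analytical models such as BDEC become necessary for general circuits.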
{"title":"Analytic reliability evaluation for fault-tolerant circuit structures on FPGAs","authors":"J. Anwer, M. Platzner","doi":"10.1109/DFT.2014.6962108","DOIUrl":"https://doi.org/10.1109/DFT.2014.6962108","url":null,"abstract":"With increasing error-proneness of nano-circuits, a number of fault-tolerance approaches are presented in the literature to enhance circuit reliability. The evaluation of the effectiveness of fault-tolerant circuit structures remains a challenge. An analytical model is required to provide exact figures of reliability of a circuit design, to be able to locate error-sensitive parts of the circuit as well as to compare different fault-tolerance approaches. At the logic layer, probabilistic approaches exist that provide such measures of circuit reliability, but they do not consider the reliability-enhancement effect of fault-tolerant circuit structures. Furthermore, these approaches are often not applicable for large circuits and their complexity hinders the development of generic simulation tools. In this paper we combine the Boolean difference error calculator (BDEC), a previous probabilistic reliability model for hardware designs, with a reliability model for fault-tolerant circuit structures. As a result we are able to study the reliability of fault-tolerant circuit structures at the logic layer. We focus on fault-tolerant circuits to be implemented in FPGAs and show how to extend our combined model from combinational to sequential circuits. For analyzing larger circuits, we develop a MATLAB-based tool utilizing BDEC. With this tool, we perform a variability analysis of different input parameters, such as logic component, input and voter error probabilities, to observe their single and joint effect on the circuit reliability. Our analyses show that circuit reliability depends strongly on the circuit structure due to error-masking effects of components on each other. Moreover, the benefit of redundancy can be obtained up to a certain threshold of component, input and voter error probabilities though the voter reliability has the strongest impact on overall circuit reliability.","PeriodicalId":414665,"journal":{"name":"2014 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129394523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Improved correction for hot pixels in digital imagers
Pub Date: 2014-11-24 | DOI: 10.1109/DFT.2014.6962103
G. Chapman, Rohit Thomas, Rahul Thomas, I. Koren, Z. Koren
From an extensive study of digital imager defects, we found that “hot pixels” are the main digital camera defects, and that they accumulate at a nearly constant temporal rate over the camera's lifetime. Previously, we characterized hot pixels by a linear function of exposure time in response to a dark-frame setting. Using a camera with 55 known hot pixels, we compared our hot pixel correction algorithm to a conventional 4-nearest-neighbor interpolation technique. We developed a new “moving camera” method to exactly obtain both the actual hot pixel contribution and the true undamaged pixel value at a defect. Using these calibrated results, we find that the correction method should be based on the hot pixel severity, the illumination intensity at the pixel, camera parameters such as ISO and exposure time, and the neighboring pixels' variability.
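As a rough illustration of a severity-aware correction of the kind described, here is a small Python sketch; it is not the authors' algorithm, and the blending rule, the parameter names, and the linear dark-current model are assumptions.

```python
import numpy as np

def correct_hot_pixel(img, y, x, dark_offset_rate, exposure_s, severity, iso_gain=1.0):
    """Blend a calibrated dark-current correction with 4-nearest-neighbour
    interpolation; the blend weight follows the pixel's severity.
    dark_offset_rate (DN/s) and severity in [0, 1] are assumed to come from
    prior dark-frame calibration. img must be an integer-typed array."""
    # Assumed model: measured = true + iso_gain * dark_offset_rate * exposure_s
    estimate = img[y, x] - iso_gain * dark_offset_rate * exposure_s
    neighbours = [img[y - 1, x], img[y + 1, x], img[y, x - 1], img[y, x + 1]]
    interp = np.mean(neighbours)
    # A severe hot pixel carries little usable signal: lean on interpolation.
    corrected = (1 - severity) * estimate + severity * interp
    return float(np.clip(corrected, 0, np.iinfo(img.dtype).max))

img = np.full((5, 5), 100, dtype=np.uint16)
img[2, 2] = 900   # hot pixel reading after a 2 s exposure
print(correct_hot_pixel(img, 2, 2, dark_offset_rate=400.0, exposure_s=2.0, severity=0.3))
```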
{"title":"Improved correction for hot pixels in digital imagers","authors":"G. Chapman, Rohit Thomas, Rahul Thomas, I. Koren, Z. Koren","doi":"10.1109/DFT.2014.6962103","DOIUrl":"https://doi.org/10.1109/DFT.2014.6962103","url":null,"abstract":"From extensive study of digital imager defects, we found that “Hot Pixels” are the main digital camera defects, and that they increase at a nearly constant temporal rate over the camera's lifetime. Previously we characterized the hot pixels by a linear function of the exposure time in response to a dark frame setting. Using a camera with 55 known hot pixels, we compared our hot pixel correction algorithm to a conventional 4-nearest neighbor interpolation techniques. We developed a new “moving camera” method to exactly obtain both the actual hot pixel contribution and the true undamaged pixel value at a defect. Using these calibrated results we find that the correction method should be based on the hot pixel severity, the illumination intensity at the pixel, camera parameters such as ISO and exposure time, and on the neighboring pixels' variability.","PeriodicalId":414665,"journal":{"name":"2014 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129548426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Energy-efficient concurrent testing approach for many-core systems in the dark silicon age
Pub Date: 2014-11-24 | DOI: 10.1109/DFT.2014.6962075
M. Haghbayan, A. Rahmani, P. Liljeberg, J. Plosila, H. Tenhunen
The dark silicon problem means that the fraction of a silicon chip able to switch at full frequency is dropping, and designers will soon face a growing underutilization inherent in future technology scaling. At the same time, shrinking transistor sizes increase susceptibility to internal defects, and a wide range of defects, such as aging-induced and transient faults, will show up more frequently. In this paper, we propose an online concurrent test scheduling approach for the fraction of the chip that cannot be utilized due to the utilization wall. Dynamic voltage and frequency scaling (DVFS), including near-threshold operation, is used to maximize the concurrency of the online testing process under a constant power budget. As the dark area of the system is dynamic and reshapes at runtime, our approach dynamically tests unused cores at runtime to provide tested cores for upcoming applications and hence enhance system reliability. Empirical results show that our proposed concurrent testing approach using DVFS improves the overall test throughput by over 250% compared to state-of-the-art dark-silicon-aware online testing approaches under the same power budget.
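A toy sketch of power-budgeted concurrent test scheduling under DVFS follows; the V/f levels, power numbers, and the greedy policy are illustrative assumptions, not the paper's scheduler.

```python
# Each (voltage, frequency) level collapsed to (test_time_s, power_w).
# Lower V/f, down to near-threshold, is slower but cheaper in power.
VF_LEVELS = [(1.0, 2.0), (1.6, 1.2), (2.8, 0.6)]  # illustrative numbers

def schedule_tests(dark_cores, power_budget_w):
    """Greedily assign the fastest V/f level that still fits the remaining
    power budget, maximizing test concurrency under constant power.
    Cores left unassigned wait for the next scheduling round."""
    remaining = power_budget_w
    plan = {}
    for core in dark_cores:
        for t, p in VF_LEVELS:            # fastest (highest power) first
            if p <= remaining:
                plan[core] = {"test_time_s": t, "power_w": p}
                remaining -= p
                break
    return plan

print(schedule_tests(["c0", "c1", "c2", "c3"], power_budget_w=4.0))
```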
{"title":"Energy-efficient concurrent testing approach for many-core systems in the dark silicon age","authors":"M. Haghbayan, A. Rahmani, P. Liljeberg, J. Plosila, H. Tenhunen","doi":"10.1109/DFT.2014.6962075","DOIUrl":"https://doi.org/10.1109/DFT.2014.6962075","url":null,"abstract":"Dark Silicon issue stresses that a fraction of silicon chip being able to switch in a full frequency is dropping and designers will soon face a growing underutilization inherent in future technology scaling. On the other hand, by reducing the transistor sizes, susceptibility to internal defects increases and large range of defects such as aging or transient faults will be shown up more frequently. In this paper, we propose an online concurrent test scheduling approach for the fraction of chip that cannot be utilized due to the restricted utilization wall. Dynamic voltage and frequency scaling including near-threshold operation is utilized in order to maximize the concurrency of the online testing process under the constant power. As the dark area of the system is dynamic and reshapes at a runtime, our approach dynamically tests unused cores in a runtime to provided tested cores for upcoming application and hence enhance system reliability. Empirical results show that our proposed concurrent testing approach using dynamic voltage and frequency scaling (DVFS) improves the overall test throughput by over 250% compared to the state-of-the-art dark silicon aware online testing approaches under the same power budget.","PeriodicalId":414665,"journal":{"name":"2014 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT)","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121583805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Aging analysis for recycled FPGA detection
Pub Date: 2014-11-24 | DOI: 10.1109/DFT.2014.6962099
H. Dogan, Domenic Forte, M. Tehranipoor
The counterfeit electronic component industry continues to threaten the security and reliability of systems by infiltrating recycled components into the supply chain. With the increased use of FPGAs in critical systems, recycled FPGAs cause significant concern for government and industry. In this paper, we propose a two-phase detection approach to differentiate recycled (used) FPGAs from new ones. Both phases rely on machine learning via support vector machines (SVMs) for classification. The first phase examines suspect FPGAs “as is,” while the second phase requires some accelerated aging. More specifically, Phase I detects recycled FPGAs by comparing the frequencies of ring oscillators (ROs) distributed across the FPGA against a golden model. Experimental results on Xilinx FPGAs show that Phase I can correctly classify 8 out of 20 FPGAs under test. However, Phase I fails to detect FPGAs at fast process corners and with less prior usage. Phase II is then used to complement Phase I and overcome its limitations: it performs a short aging step on the suspect FPGAs and exploits the reduction in aging speed (due to prior usage) to cover the cases missed by Phase I. In our silicon results, Phase II detects all the fresh and recycled FPGAs correctly.
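A minimal sketch of Phase I-style classification, an SVM over RO frequency vectors, is shown below using scikit-learn and synthetic data; the frequency distributions, feature layout, and sample counts are assumptions, not the paper's measurements.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# One feature vector per FPGA: frequencies (MHz) of 16 ROs placed across the die.
fresh = rng.normal(200.0, 2.0, size=(40, 16))
# Aging (e.g. NBTI/HCI) slows ROs; recycled parts sit a few MHz lower, with more spread.
recycled = rng.normal(195.0, 3.0, size=(40, 16))

X = np.vstack([fresh, recycled])
y = np.array([0] * 40 + [1] * 40)          # 0 = fresh, 1 = recycled

clf = SVC(kernel="rbf", gamma="scale").fit(X, y)   # golden model stands in as training data
suspect = rng.normal(196.0, 3.0, size=(1, 16))
print("recycled" if clf.predict(suspect)[0] else "fresh")
```

The hard cases the abstract mentions, fast process corners and lightly used parts, are exactly the ones whose RO frequencies overlap both clusters, which is why a second, aging-based phase is needed.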
{"title":"Aging analysis for recycled FPGA detection","authors":"H. Dogan, Domenic Forte, M. Tehranipoor","doi":"10.1109/DFT.2014.6962099","DOIUrl":"https://doi.org/10.1109/DFT.2014.6962099","url":null,"abstract":"The counterfeit electronic component industry continues to threaten the security and reliability of systems by infiltrating recycled components into the supply chain. With the increased use of FPGAs in critical systems, recycled FPGAs cause significant concerns for government and industry. In this paper, we propose a two phase detection approach to differentiate recycled (used) FPGAs from new ones. Both approaches rely on machine learning via support vector machines (SVM) for classification. The first phase examines suspect FPGAs “as is” while the second phase requires some accelerated aging. To be more specific, Phase I detects recycled FPGAs by comparing the frequencies of ring oscillators (ROs) distributed on the FPGAs against a golden model. Experimental results on Xilinx FPGAs show that Phase I can correctly classify 8 out of 20 FPGAs under test. However, Phase I fails to detect FPGAs at fast corners and with lesser prior usage. Phase II is then used to compliment Phase I and overcome its limitations. The second phase performs a short aging step on the suspect FPGAs and exploits the aging speed reduction (due to prior usage) to cover the cases missed by Phase I. In our silicon results, Phase II detects all the fresh and recycled FPGAs correctly.","PeriodicalId":414665,"journal":{"name":"2014 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT)","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133387206","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A probabilistic analysis of resilient reconfigurable designs
Pub Date: 2014-11-24 | DOI: 10.1109/DFT.2014.6962074
A. Malek, S. Tzilis, D. Khan, I. Sourdis, Georgios Smaragdos, C. Strydis
Reconfigurable hardware can be employed to tolerate permanent faults. The hardware components comprising a system-on-chip can be partitioned into a handful of substitutable units, interconnected with reconfigurable wires to allow isolation and replacement of faulty parts. This paper offers a probabilistic analysis of reconfigurable designs, estimating, for different fault densities, the average number of fault-free components that can be constructed, as well as the probability of guaranteeing a particular availability of components. Taking into account the area overheads of reconfigurability, we evaluate the resilience of various reconfigurable designs with different granularities. Based on this analysis, we conduct a comprehensive design-space exploration to identify the granularity mixes that maximize the fault tolerance of a system. Our findings reveal that mixing fine-grain logic with a coarse-grain sparing approach tolerates up to 3× more permanent faults than component redundancy and 2× more than any other purely coarse-grain solution. Component redundancy is preferable at low fault densities, while coarse-grain and mixed-grain reconfigurability maximize availability at medium and high fault densities, respectively.
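The core probabilistic quantities can be sketched directly. Assuming Poisson-distributed faults (an assumption; the paper's exact model may differ), a unit of area a is fault-free with probability e^(−d·a) at fault density d, and the availability of k out of n substitutable units follows a binomial sum:

```python
from math import comb, exp

def p_unit_ok(fault_density, unit_area):
    # Poisson-distributed faults: a unit survives iff it receives zero faults.
    return exp(-fault_density * unit_area)

def availability(n_units, k_needed, fault_density, unit_area):
    """P(at least k of n independent substitutable units are fault-free),
    plus the expected number of usable units."""
    p = p_unit_ok(fault_density, unit_area)
    at_least_k = sum(comb(n_units, i) * p**i * (1 - p)**(n_units - i)
                     for i in range(k_needed, n_units + 1))
    return at_least_k, n_units * p

# Finer granularity: more, smaller units, each carrying some assumed area
# overhead for the reconfigurable interconnect (0.55 vs. the ideal 0.5).
print(availability(n_units=8,  k_needed=4, fault_density=0.1, unit_area=1.0))
print(availability(n_units=16, k_needed=8, fault_density=0.1, unit_area=0.55))
```

Sweeping fault_density in such a model reproduces the qualitative crossover the paper reports: coarse units win when faults are rare, finer (or mixed) granularity wins as density grows, despite its overhead.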
{"title":"A probabilistic analysis of resilient reconfigurable designs","authors":"A. Malek, S. Tzilis, D. Khan, I. Sourdis, Georgios Smaragdos, C. Strydis","doi":"10.1109/DFT.2014.6962074","DOIUrl":"https://doi.org/10.1109/DFT.2014.6962074","url":null,"abstract":"Reconfigurable hardware can be employed to tolerate permanent faults. Hardware components comprising a System-on-Chip can be partitioned into a handful of substitutable units interconnected with reconfigurable wires to allow isolation and replacement of faulty parts. This paper offers a probabilistic analysis of reconfigurable designs estimating for different fault densities the average number of fault-free components that can be constructed as well as the probability to guarantee a particular availability of components. Considering the area overheads of reconfigurability, we evaluate the resilience of various reconfigurable designs with different granularities. Based on this analysis, we conduct a comprehensive design-space exploration to identify the granularity mixes that maximize the fault-tolerance of a system. Our findings reveal that mixing fine-grain logic with a coarse-grain sparing approach tolerates up to 3× more permanent faults than component redundancy and 2× more than any other purely coarse-grain solution. Component redundancy is preferable at low fault densities, while coarse-grain and mixed-grain reconfigurability maximize availability at medium and high fault densities, respectively.","PeriodicalId":414665,"journal":{"name":"2014 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT)","volume":"172 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133280308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A heuristic path selection method for small delay defects test
Pub Date: 2014-11-24 | DOI: 10.1109/DFT.2014.6962082
Paniz Foroutan, M. Kamal, Z. Navabi
With the increasing impact of process variation on gate-delay uncertainty, and the consequent need for more test paths, delay testing has become an essential part of chip testing. In this paper, a heuristic test path selection method is proposed that combines non-optimal and optimal selection methods. In the first step, the search space is reduced by considering correlations between paths. Next, using an ILP formulation, the best paths are selected from the reduced search space. For the ILP formulation, we propose an objective function that considers both the correlation and the criticality of the paths. The results show that the delay failure capturing probability (DFCP) of the proposed path selection method for the eight largest ITC'99 benchmarks is, on average, only about 3% lower than that of the Monte Carlo method, while its runtime is about 1340 times shorter.
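A minimal sketch of the second (ILP) step using the PuLP solver is given below; the criticality scores, correlation penalties, AND-linearization, and budget constraint are illustrative assumptions rather than the paper's formulation.

```python
import pulp

# Hypothetical per-path criticality scores and pairwise correlations in [0, 1].
paths = ["p0", "p1", "p2", "p3"]
crit = {"p0": 0.9, "p1": 0.8, "p2": 0.75, "p3": 0.6}
corr = {("p0", "p1"): 0.7, ("p0", "p2"): 0.1, ("p1", "p3"): 0.5}
budget = 2  # number of paths the tester can afford

prob = pulp.LpProblem("path_selection", pulp.LpMaximize)
x = pulp.LpVariable.dicts("x", paths, cat="Binary")
# Penalize selecting highly correlated pairs: y_ij = x_i AND x_j, linearized.
y = pulp.LpVariable.dicts("y", corr.keys(), cat="Binary")
for (i, j) in corr:
    prob += y[(i, j)] >= x[i] + x[j] - 1

# Objective: reward criticality, penalize redundant (correlated) coverage.
prob += (pulp.lpSum(crit[p] * x[p] for p in paths)
         - pulp.lpSum(c * y[k] for k, c in corr.items()))
prob += pulp.lpSum(x[p] for p in paths) <= budget

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([p for p in paths if x[p].value() == 1])
```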
{"title":"A heuristic path selection method for small delay defects test","authors":"Paniz Foroutan, M. Kamal, Z. Navabi","doi":"10.1109/DFT.2014.6962082","DOIUrl":"https://doi.org/10.1109/DFT.2014.6962082","url":null,"abstract":"By increasing the impact of process variation on the uncertainty of the delay of the gates, and also the need for increasing the number of test paths, delay test has become an essential part of the chip testing. In this paper, a heuristic test path selection method is proposed that is a combination of the non-optimal and optimal selection methods. In the first step of the proposed selection method, the search space is reduced by considering correlations between the paths. Next, by using ILP formulation, best paths from the reduced search space are selected. For the ILP formulation, we have proposed an objective function which considers correlation and the criticality of the paths. The results show that the delay failure capturing probability (DFCP) of the proposed path selection method for eight largest ITC'99 benchmarks, on average, is only about 3% smaller than the Monte Carlo method, while its runtime is about 1340 times smaller than the Monte Carlo approach.","PeriodicalId":414665,"journal":{"name":"2014 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129401641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Oxide based resistive RAM: ON/OFF resistance analysis versus circuit variability
Pub Date: 2014-11-24 | DOI: 10.1109/DFT.2014.6962107
H. Aziza, H. Ayari, S. Onkaraiah, J. Portal, M. Moreau, M. Bocquet
A deeper understanding of the impact of variability on oxide-based resistive random access memory (OxRRAM) is needed to propose variability-tolerant designs that ensure the robustness of the technology. Although research has taken steps to resolve this issue, variability remains a defining characteristic of OxRRAM. In this paper, the impact of variability on OxRRAM circuit performance is analyzed quantitatively at the circuit level through electrical simulations. Variability is introduced at the memory cell level as well as in the peripheral circuitry. The aim of this study is to determine the contribution of each component of an OxRRAM circuit to the ON/OFF resistance ratio.
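A small Monte Carlo sketch of how cell-level and peripheral variability compress the ON/OFF window follows; the distribution shapes and all numerical values are assumptions for illustration, not the paper's simulation data.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 10_000
# Log-normal spreads are a common fit for resistive-state variability (assumed here).
r_on  = rng.lognormal(mean=np.log(10e3), sigma=0.25, size=N)   # LRS, ohms
r_off = rng.lognormal(mean=np.log(1e6),  sigma=0.45, size=N)   # HRS, ohms
# Series resistance of the select device / peripheral circuitry narrows the window.
r_series = rng.normal(2e3, 200, size=N)

ratio = (r_off + r_series) / (r_on + r_series)
print(f"median ON/OFF ratio: {np.median(ratio):.1f}")
print(f"1st percentile:      {np.percentile(ratio, 1):.1f}  (worst-case read margin)")
```

Rerunning with one variability source at a time (e.g. sigma=0 for the others) is the sketch-level analogue of the paper's per-component contribution analysis.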
{"title":"Oxide based resistive RAM: ON/OFF resistance analysis versus circuit variability","authors":"H. Aziza, H. Ayari, S. Onkaraiah, J. Portal, M. Moreau, M. Bocquet","doi":"10.1109/DFT.2014.6962107","DOIUrl":"https://doi.org/10.1109/DFT.2014.6962107","url":null,"abstract":"A deeper understanding of the impact of variability on Oxide-based Resistive Random Access Memory (so-called OxRRAM) is needed to propose variability tolerant designs to ensure the robustness of the technology. Although research has taken steps to resolve this issue, variability remains an important characteristic for OxRRAMs. In this paper, impact of variability on OxRRAM circuit performances is analysed quantitatively at a circuit level through electrical simulations. Variability is introduced at the memory cell level but also at the peripheral circuitry level. The aim of this study is to determine the contribution of each component of an OxRRAM circuit on the ON/OFF resistance ratio.","PeriodicalId":414665,"journal":{"name":"2014 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128138391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Performance sensor for tolerance and predictive detection of delay-faults
Pub Date: 2014-11-24 | DOI: 10.1109/DFT.2014.6962092
J. Semião, David Saraiva, C. Leong, A. Romão, Marcelino B. Santos, I. Teixeira, João Paulo Teixeira
This paper presents the Scout Flip-Flop, a new performance Sensor for toleranCe and predictive detectiOn of delay-faUlTs in synchronous digital circuits. The sensor is based on a new master-slave flip-flop (FF), the Scout FF, with built-in functionality to locally (inside the FF) create two distinct guard-band windows: (1) a tolerance window, to increase tolerance to late transitions, making the Scout's master latch transparent during an additional predefined period after the clock trigger; and (2) a detection window, which starts before the clock edge and persists through the tolerance window, to signal that performance and circuit functionality are at risk. When a PVTA (process, power-supply voltage, temperature, and aging) variation occurs, circuit performance is affected and a delay-fault may occur. The tolerance window therefore introduces extra time slack by borrowing time from subsequent clock cycles. Moreover, because the predictive-error detection window starts prior to the clock edge, it provides an additional safety margin and may be used to trigger corrective actions, such as clock frequency reduction, before a real error occurs. Both the tolerance and detection windows are defined by design and are sensitive to performance errors, increasing in size under worst-case PVTA conditions. Extensive SPICE simulations allowed us to characterize the new flip-flop; simulation results are presented for a 65nm CMOS technology, using Berkeley Predictive Technology Models (PTM), showing the Scout FF's effectiveness in tolerance and predictive error detection.
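The two windows can be summarized behaviorally. Below is a toy classifier of data-arrival times against the clock edge; it is a behavioral abstraction with made-up window sizes, not the transistor-level sensor.

```python
def classify_arrival(t_arrival, t_clk_edge, det_window, tol_window):
    """Classify a data transition against the Scout FF's two windows:
    the detection window opens det_window before the clock edge and
    persists through the tolerance window, which keeps the master latch
    transparent for tol_window after the edge. Times in ns (arbitrary)."""
    if t_arrival < t_clk_edge - det_window:
        return "ok"                      # safely early, no flag raised
    if t_arrival < t_clk_edge:
        return "predictive warning"      # captured correctly, but margin is eroding
    if t_arrival < t_clk_edge + tol_window:
        return "late but tolerated"      # borrows slack from the next cycle, flagged
    return "timing error"                # beyond both windows: a real delay-fault

for t in (4.0, 4.8, 5.3, 6.1):
    print(t, "->", classify_arrival(t, t_clk_edge=5.0, det_window=0.4, tol_window=0.5))
```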
{"title":"Performance sensor for tolerance and predictive detection of delay-faults","authors":"J. Semião, David Saraiva, C. Leong, A. Romão, Marcelino B. Santos, I. Teixeira, João Paulo Teixeira","doi":"10.1109/DFT.2014.6962092","DOIUrl":"https://doi.org/10.1109/DFT.2014.6962092","url":null,"abstract":"This paper presents the Scout Flip-Flop, a new performance Sensor for toleranCe and predictive detectiOn of delay-faUlTs in synchronous digital circuits. The sensor is based on a new master-slave Flip-Flop (FF), the Scout FF, with built-in functionality to locally (inside the FF) create two distinct guard-band windows: (1) a tolerance window, to increase tolerance to late transitions, making the Scout's master latch transparent during an additional predefined period after the clock trigger; and (2) a detection window, which starts before the clock edge trigger and persists during the tolerance window, to inform that performance and circuit functionality is at risk. When a PVTA (Process, power-supply Voltage, Temperature and Aging) variation occurs, circuit performance is affected and a delay-fault may occur. Hence, the existence of a tolerance window, introduces an extra time-slack by borrowing time from subsequent clock cycles. Moreover, as the predictive-error detection window starts prior to the clock edge trigger, it provides an additional safety margin and may be used to trigger corrective actions before real error occurrence, such as clock frequency reduction. Both tolerance and detection windows are defined by design and are sensitive to performance errors, increasing its size in worst PVTA conditions. Extensive SPICE simulations allowed characterizing the new flip-flop and simulation results are presented for 65nm CMOS technology, using Berkeley Predictive Technology Models (PTM), showing Scout's effectiveness on tolerance and predictive error detection.","PeriodicalId":414665,"journal":{"name":"2014 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT)","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125655225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fault injection in the process descriptor of a Unix-based operating system
Pub Date: 2014-11-24 | DOI: 10.1109/DFT.2014.6962080
B. Montrucchio, M. Rebaudengo, A. Velasco
Transient faults, originating from several sources such as high-energy particles, are a major issue in computer-based systems for which high availability is a strict requirement. Fault injection is a commonly used method to evaluate the sensitivity of such systems. This paper presents an evaluation of the effects of faults in the memory containing the process descriptor of a Unix-based operating system. In particular, the state field is taken as the main target: the current state value is changed into another one that may be either valid or invalid. An experimental analysis has been conducted on a large set of different tasks belonging to the operating system itself. The test results show that the state field in the process descriptor is a critical variable as far as dependability is concerned.
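A toy model of the experiment's idea follows: single bit-flips in a task-state word, classified by whether they land on another valid state (silent corruption) or an invalid one (detectable). The state encodings are hypothetical, not the actual Unix kernel values.

```python
import random

# Illustrative subset of task states; the encodings are hypothetical.
VALID_STATES = {0x0: "RUNNING", 0x1: "INTERRUPTIBLE", 0x2: "UNINTERRUPTIBLE",
                0x4: "STOPPED", 0x10: "ZOMBIE"}

def inject(state, n_trials=10_000, width=32, seed=7):
    """Flip one random bit of the state word per trial and report how often
    the result is another valid state vs. an invalid (illegal) value."""
    rng = random.Random(seed)
    valid_hits = invalid_hits = 0
    for _ in range(n_trials):
        faulty = state ^ (1 << rng.randrange(width))
        if faulty in VALID_STATES:
            valid_hits += 1      # task silently migrates to a legal state
        else:
            invalid_hits += 1    # detectable as an illegal state value
    return valid_hits / n_trials, invalid_hits / n_trials

print(inject(0x1))  # fault injection on an INTERRUPTIBLE task
```

The distinction matters: flips into valid states change scheduler behavior without tripping any sanity check, which is what makes the state field so dependability-critical.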
{"title":"Fault injection in the process descriptor of a Unix-based operating system","authors":"B. Montrucchio, M. Rebaudengo, A. Velasco","doi":"10.1109/DFT.2014.6962080","DOIUrl":"https://doi.org/10.1109/DFT.2014.6962080","url":null,"abstract":"Transient faults in computer-based systems for which high availability is a strict requirement, originated from several sources, like high energy particles, are a major issue. Fault injection is a commonly used method to evaluate the sensitivity of such systems. The paper presents an evaluation of the effects of faults in the memory containing the process descriptor of a Unix-based Operating System. In particular the state field has been taken into consideration as the main target, changing the current state value into another one that could be valid or invalid. An experimental analysis has been conducted on a large set of different tasks, belonging to the operating system itself. Results of tests show that the state field in the process descriptor represents a critical variable as far as dependability is considered.","PeriodicalId":414665,"journal":{"name":"2014 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT)","volume":"302 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123082823","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Artificial intelligence based task mapping and pipelined scheduling for checkpointing on real time systems with imperfect fault detection
Pub Date: 2014-11-24 | DOI: 10.1109/DFT.2014.6962066
Anup Das, Akash Kumar, B. Veeravalli
Fault tolerance is emerging as one of the important optimization objectives for designs in deep-submicron technology nodes. This paper proposes a technique for application mapping and scheduling with checkpointing on a multiprocessor system, with the goal of maximizing reliability in the presence of transient faults. The proposed model incorporates checkpoints with imperfect fault detection probability, as well as the pipelined execution and cyclic dependencies associated with multimedia applications. The problem is solved using an artificial intelligence technique known as particle swarm optimization, which determines, for every task of the application, the number of checkpoints that maximizes the confidence in the output. The proposed approach is validated experimentally with synthetic and real-life application graphs. Results demonstrate that the proposed technique improves the probability of a correct result by 15% on average under imperfect fault detection. Additionally, even with 100% fault detection, the proposed technique achieves better results (25% higher confidence) than existing fault-tolerant techniques.
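A compact sketch of particle swarm optimization over per-task checkpoint counts is shown below, under a toy confidence model; the objective, the detection-coverage model, and all constants are assumptions, not the paper's formulation.

```python
import numpy as np

rng = np.random.default_rng(3)
N_TASKS, SWARM, ITERS, C_MAX = 4, 20, 60, 8
FAULT_P, DET_COV, CKPT_COST = 0.05, 0.9, 0.004   # illustrative parameters

def objective(c):
    """Toy confidence model: a transient fault in a task escapes each of its
    c checkpoints with prob (1 - DET_COV); detected faults are rolled back.
    Checkpoint overhead is modelled as a linear time penalty."""
    conf = np.prod(1 - FAULT_P * (1 - DET_COV) ** c)
    return conf - CKPT_COST * c.sum()

# Standard PSO over a continuous relaxation; positions are rounded to integers.
pos = rng.uniform(0, C_MAX, (SWARM, N_TASKS))
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_val = np.array([objective(np.rint(p)) for p in pos])
gbest = pbest[pbest_val.argmax()].copy()

for _ in range(ITERS):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, 0, C_MAX)
    vals = np.array([objective(np.rint(p)) for p in pos])
    improved = vals > pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmax()].copy()

print("checkpoints per task:", np.rint(gbest).astype(int))
```

The swarm balances the two opposing terms automatically: more checkpoints raise the chance that an (imperfectly detected) fault is caught, while the linear cost term penalizes checkpointing overhead, mirroring the trade-off the paper optimizes.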
{"title":"Artificial intelligence based task mapping and pipelined scheduling for checkpointing on real time systems with imperfect fault detection","authors":"Anup Das, Akash Kumar, B. Veeravalli","doi":"10.1109/DFT.2014.6962066","DOIUrl":"https://doi.org/10.1109/DFT.2014.6962066","url":null,"abstract":"Fault-tolerance is emerging as one of the important optimization objectives for designs in deep submicron technology nodes. This paper proposes a technique of application mapping and scheduling with checkpointing on a multiprocessor system to maximize the reliability considering transient faults. The proposed model incorporates checkpoints with imperfect fault detection probability, and pipelined execution and cyclic dependency associated with multimedia applications. This is solved using an Artificial Intelligence technique known as Particle Swarm Optimization to determine the number of checkpoints of every task of the application that maximizes the confidence of the output. The proposed approach is validated experimentally with synthetic and real-life application graphs. Results demonstrate the proposed technique improves the probability of correct result by an average 15% with imperfect fault detection. Additionally, even with 100% fault detection, the proposed technique is able to achieve better results (25% higher confidence) as compared to the existing fault-tolerant techniques.","PeriodicalId":414665,"journal":{"name":"2014 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114418471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}