Analytic reliability evaluation for fault-tolerant circuit structures on FPGAs
Pub Date: 2014-11-24 | DOI: 10.1109/DFT.2014.6962108
J. Anwer, M. Platzner
With the increasing error-proneness of nano-circuits, a number of fault-tolerance approaches have been presented in the literature to enhance circuit reliability. Evaluating the effectiveness of fault-tolerant circuit structures remains a challenge. An analytical model is required to provide exact reliability figures for a circuit design, to locate error-sensitive parts of the circuit, and to compare different fault-tolerance approaches. At the logic layer, probabilistic approaches exist that provide such measures of circuit reliability, but they do not consider the reliability-enhancement effect of fault-tolerant circuit structures. Furthermore, these approaches are often not applicable to large circuits, and their complexity hinders the development of generic simulation tools. In this paper we combine the Boolean difference error calculator (BDEC), a previous probabilistic reliability model for hardware designs, with a reliability model for fault-tolerant circuit structures. As a result, we are able to study the reliability of fault-tolerant circuit structures at the logic layer. We focus on fault-tolerant circuits to be implemented in FPGAs and show how to extend our combined model from combinational to sequential circuits. For analyzing larger circuits, we develop a MATLAB-based tool utilizing BDEC. With this tool, we perform a variability analysis of different input parameters, such as logic-component, input, and voter error probabilities, to observe their individual and joint effects on circuit reliability. Our analyses show that circuit reliability depends strongly on the circuit structure due to the error-masking effects of components on each other. Moreover, the benefit of redundancy is obtained only up to a certain threshold of component, input, and voter error probabilities, with voter reliability having the strongest impact on overall circuit reliability.
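For intuition on the redundancy threshold mentioned above, the textbook triple modular redundancy (TMR) relation with an imperfect voter (the standard formula such logic-layer models build on, not the authors' BDEC-based derivation) is:

$$R_{\mathrm{TMR}} = R_v \left(3R_m^{2} - 2R_m^{3}\right),$$

where $R_m$ is the reliability of one module copy and $R_v$ that of the voter. With a perfect voter ($R_v = 1$), $R_{\mathrm{TMR}} > R_m$ only when $R_m > 0.5$, which illustrates why redundancy pays off only below a certain component error probability and why voter reliability, as a common multiplying factor, has the strongest impact on the overall result.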
{"title":"Analytic reliability evaluation for fault-tolerant circuit structures on FPGAs","authors":"J. Anwer, M. Platzner","doi":"10.1109/DFT.2014.6962108","DOIUrl":"https://doi.org/10.1109/DFT.2014.6962108","url":null,"abstract":"With increasing error-proneness of nano-circuits, a number of fault-tolerance approaches are presented in the literature to enhance circuit reliability. The evaluation of the effectiveness of fault-tolerant circuit structures remains a challenge. An analytical model is required to provide exact figures of reliability of a circuit design, to be able to locate error-sensitive parts of the circuit as well as to compare different fault-tolerance approaches. At the logic layer, probabilistic approaches exist that provide such measures of circuit reliability, but they do not consider the reliability-enhancement effect of fault-tolerant circuit structures. Furthermore, these approaches are often not applicable for large circuits and their complexity hinders the development of generic simulation tools. In this paper we combine the Boolean difference error calculator (BDEC), a previous probabilistic reliability model for hardware designs, with a reliability model for fault-tolerant circuit structures. As a result we are able to study the reliability of fault-tolerant circuit structures at the logic layer. We focus on fault-tolerant circuits to be implemented in FPGAs and show how to extend our combined model from combinational to sequential circuits. For analyzing larger circuits, we develop a MATLAB-based tool utilizing BDEC. With this tool, we perform a variability analysis of different input parameters, such as logic component, input and voter error probabilities, to observe their single and joint effect on the circuit reliability. Our analyses show that circuit reliability depends strongly on the circuit structure due to error-masking effects of components on each other. Moreover, the benefit of redundancy can be obtained up to a certain threshold of component, input and voter error probabilities though the voter reliability has the strongest impact on overall circuit reliability.","PeriodicalId":414665,"journal":{"name":"2014 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129394523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Improved correction for hot pixels in digital imagers
Pub Date: 2014-11-24 | DOI: 10.1109/DFT.2014.6962103
G. Chapman, Rohit Thomas, Rahul Thomas, I. Koren, Z. Koren
From an extensive study of digital imager defects, we found that “hot pixels” are the main digital camera defects and that they accumulate at a nearly constant temporal rate over the camera's lifetime. Previously, we characterized hot pixels by a linear function of exposure time measured under dark-frame conditions. Using a camera with 55 known hot pixels, we compared our hot-pixel correction algorithm to a conventional 4-nearest-neighbor interpolation technique. We developed a new “moving camera” method to obtain exactly both the actual hot-pixel contribution and the true undamaged pixel value at a defect. Using these calibrated results, we find that the correction method should be based on the hot-pixel severity, the illumination intensity at the pixel, camera parameters such as ISO and exposure time, and the neighboring pixels' variability.
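As a rough illustration of the correction strategy this abstract suggests, the sketch below blends dark-offset subtraction (the linear exposure-time model) with 4-nearest-neighbor interpolation, weighting by neighbor variability. The function and parameter names (correct_hot_pixel, dark_offset, dark_rate, iso_gain) and the blending rule are assumptions for illustration, not the authors' algorithm.

```python
import numpy as np

def correct_hot_pixel(img, y, x, dark_offset, dark_rate, t_exp, iso_gain=1.0):
    """Illustrative hot-pixel correction (not the authors' exact method).

    dark_offset, dark_rate: per-pixel dark-response calibration, assumed to follow
    the linear model I_dark = dark_offset + dark_rate * t_exp from the abstract.
    """
    # Estimate the hot-pixel contribution for this exposure and subtract it.
    hot_estimate = iso_gain * (dark_offset + dark_rate * t_exp)
    corrected = img[y, x] - hot_estimate

    # Conventional baseline: 4-nearest-neighbor interpolation.
    neighbors = np.array([img[y - 1, x], img[y + 1, x], img[y, x - 1], img[y, x + 1]])
    interpolated = neighbors.mean()

    # Blend the two estimates: weight interpolation more when neighbors agree
    # (low variability), and the dark-subtracted value more in busy regions.
    w = 1.0 / (1.0 + neighbors.std())
    return w * interpolated + (1.0 - w) * corrected
```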
{"title":"Improved correction for hot pixels in digital imagers","authors":"G. Chapman, Rohit Thomas, Rahul Thomas, I. Koren, Z. Koren","doi":"10.1109/DFT.2014.6962103","DOIUrl":"https://doi.org/10.1109/DFT.2014.6962103","url":null,"abstract":"From extensive study of digital imager defects, we found that “Hot Pixels” are the main digital camera defects, and that they increase at a nearly constant temporal rate over the camera's lifetime. Previously we characterized the hot pixels by a linear function of the exposure time in response to a dark frame setting. Using a camera with 55 known hot pixels, we compared our hot pixel correction algorithm to a conventional 4-nearest neighbor interpolation techniques. We developed a new “moving camera” method to exactly obtain both the actual hot pixel contribution and the true undamaged pixel value at a defect. Using these calibrated results we find that the correction method should be based on the hot pixel severity, the illumination intensity at the pixel, camera parameters such as ISO and exposure time, and on the neighboring pixels' variability.","PeriodicalId":414665,"journal":{"name":"2014 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129548426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Energy-efficient concurrent testing approach for many-core systems in the dark silicon age
Pub Date: 2014-11-24 | DOI: 10.1109/DFT.2014.6962075
M. Haghbayan, A. Rahmani, P. Liljeberg, J. Plosila, H. Tenhunen
The dark silicon issue means that the fraction of a silicon chip able to switch at full frequency is dropping, and designers will soon face a growing underutilization inherent in future technology scaling. On the other hand, as transistor sizes shrink, susceptibility to internal defects increases, and a wide range of defects, such as aging or transient faults, will show up more frequently. In this paper, we propose an online concurrent test scheduling approach for the fraction of the chip that cannot be utilized due to the utilization wall. Dynamic voltage and frequency scaling, including near-threshold operation, is utilized to maximize the concurrency of the online testing process under a constant power budget. As the dark area of the system is dynamic and reshapes at runtime, our approach dynamically tests unused cores at runtime to provide tested cores for upcoming applications and hence enhances system reliability. Empirical results show that our proposed concurrent testing approach using dynamic voltage and frequency scaling (DVFS) improves the overall test throughput by over 250% compared to state-of-the-art dark-silicon-aware online testing approaches under the same power budget.
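A minimal sketch of the scheduling idea, assuming a simple dynamic power model (P proportional to C·V²·f) and made-up voltage/frequency levels; it is not the authors' scheduler, only an illustration of why near-threshold operation lets more dark cores be tested concurrently under a fixed power budget.

```python
# Illustrative DVFS-aware concurrent test scheduling under a power budget.
# V/f pairs and the power model are assumptions, not the paper's calibrated values.
VF_LEVELS = [(0.5, 200e6), (0.7, 500e6), (0.9, 1000e6)]  # (volts, Hz), incl. near-threshold

def schedule_tests(dark_cores, power_budget, c_eff=1e-9, test_cycles=1e6):
    """Greedily assign the lowest-power V/f level to as many dark cores as possible."""
    plan, used_power = [], 0.0
    v, f = VF_LEVELS[0]                       # near-threshold level maximizes concurrency
    core_power = c_eff * v * v * f            # dynamic power of one core under test
    for core in dark_cores:
        if used_power + core_power > power_budget:
            break
        plan.append((core, v, f, test_cycles / f))  # core, voltage, freq, test time [s]
        used_power += core_power
    return plan

print(schedule_tests(dark_cores=list(range(16)), power_budget=1.0))
```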
{"title":"Energy-efficient concurrent testing approach for many-core systems in the dark silicon age","authors":"M. Haghbayan, A. Rahmani, P. Liljeberg, J. Plosila, H. Tenhunen","doi":"10.1109/DFT.2014.6962075","DOIUrl":"https://doi.org/10.1109/DFT.2014.6962075","url":null,"abstract":"Dark Silicon issue stresses that a fraction of silicon chip being able to switch in a full frequency is dropping and designers will soon face a growing underutilization inherent in future technology scaling. On the other hand, by reducing the transistor sizes, susceptibility to internal defects increases and large range of defects such as aging or transient faults will be shown up more frequently. In this paper, we propose an online concurrent test scheduling approach for the fraction of chip that cannot be utilized due to the restricted utilization wall. Dynamic voltage and frequency scaling including near-threshold operation is utilized in order to maximize the concurrency of the online testing process under the constant power. As the dark area of the system is dynamic and reshapes at a runtime, our approach dynamically tests unused cores in a runtime to provided tested cores for upcoming application and hence enhance system reliability. Empirical results show that our proposed concurrent testing approach using dynamic voltage and frequency scaling (DVFS) improves the overall test throughput by over 250% compared to the state-of-the-art dark silicon aware online testing approaches under the same power budget.","PeriodicalId":414665,"journal":{"name":"2014 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT)","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121583805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A heuristic path selection method for small delay defects test
Pub Date: 2014-11-24 | DOI: 10.1109/DFT.2014.6962082
Paniz Foroutan, M. Kamal, Z. Navabi
With the increasing impact of process variation on the uncertainty of gate delays, and the resulting need for a larger number of test paths, delay testing has become an essential part of chip testing. In this paper, a heuristic test path selection method is proposed that combines non-optimal and optimal selection methods. In the first step of the proposed method, the search space is reduced by considering correlations between paths. Next, using an ILP formulation, the best paths are selected from the reduced search space. For the ILP formulation, we propose an objective function that considers both the correlation and the criticality of the paths. The results show that the delay failure capturing probability (DFCP) of the proposed path selection method for the eight largest ITC'99 benchmarks is, on average, only about 3% lower than that of the Monte Carlo method, while its runtime is about 1340 times shorter than the Monte Carlo approach.
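A minimal sketch of an ILP-style path selection in the spirit described above, using the open-source PuLP solver; the path set, criticality weights, correlation pairs, penalty, and budget K are invented illustrative values, not the paper's formulation or data.

```python
# Select test paths that are critical but weakly correlated, as a small ILP.
import pulp

criticality = {"p1": 0.9, "p2": 0.8, "p3": 0.75, "p4": 0.6}   # per-path criticality
correlation = {("p1", "p2"): 0.7, ("p3", "p4"): 0.5}          # pairwise overlap
K, PENALTY = 2, 1.0                                            # paths to pick, penalty weight

prob = pulp.LpProblem("path_selection", pulp.LpMaximize)
x = {p: pulp.LpVariable(f"x_{p}", cat="Binary") for p in criticality}
y = {pair: pulp.LpVariable(f"y_{pair[0]}_{pair[1]}", cat="Binary") for pair in correlation}

# y[i,j] must be 1 whenever both paths of a correlated pair are selected (linearization).
for (i, j), var in y.items():
    prob += var >= x[i] + x[j] - 1

prob += pulp.lpSum(x.values()) <= K
prob += (pulp.lpSum(criticality[p] * x[p] for p in x)
         - PENALTY * pulp.lpSum(correlation[pair] * y[pair] for pair in y))
prob.solve()
print([p for p in x if x[p].value() > 0.5])   # e.g. picks two critical, uncorrelated paths
```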
{"title":"A heuristic path selection method for small delay defects test","authors":"Paniz Foroutan, M. Kamal, Z. Navabi","doi":"10.1109/DFT.2014.6962082","DOIUrl":"https://doi.org/10.1109/DFT.2014.6962082","url":null,"abstract":"By increasing the impact of process variation on the uncertainty of the delay of the gates, and also the need for increasing the number of test paths, delay test has become an essential part of the chip testing. In this paper, a heuristic test path selection method is proposed that is a combination of the non-optimal and optimal selection methods. In the first step of the proposed selection method, the search space is reduced by considering correlations between the paths. Next, by using ILP formulation, best paths from the reduced search space are selected. For the ILP formulation, we have proposed an objective function which considers correlation and the criticality of the paths. The results show that the delay failure capturing probability (DFCP) of the proposed path selection method for eight largest ITC'99 benchmarks, on average, is only about 3% smaller than the Monte Carlo method, while its runtime is about 1340 times smaller than the Monte Carlo approach.","PeriodicalId":414665,"journal":{"name":"2014 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129401641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Aging analysis for recycled FPGA detection
Pub Date: 2014-11-24 | DOI: 10.1109/DFT.2014.6962099
H. Dogan, Domenic Forte, M. Tehranipoor
The counterfeit electronic component industry continues to threaten the security and reliability of systems by infiltrating recycled components into the supply chain. With the increased use of FPGAs in critical systems, recycled FPGAs cause significant concerns for government and industry. In this paper, we propose a two-phase detection approach to differentiate recycled (used) FPGAs from new ones. Both phases rely on machine learning via support vector machines (SVMs) for classification. The first phase examines suspect FPGAs “as is”, while the second phase requires some accelerated aging. More specifically, Phase I detects recycled FPGAs by comparing the frequencies of ring oscillators (ROs) distributed on the FPGAs against a golden model. Experimental results on Xilinx FPGAs show that Phase I can correctly classify 8 out of 20 FPGAs under test. However, Phase I fails to detect FPGAs at fast corners and with less prior usage. Phase II is then used to complement Phase I and overcome its limitations. The second phase performs a short aging step on the suspect FPGAs and exploits the aging-speed reduction (due to prior usage) to cover the cases missed by Phase I. In our silicon results, Phase II detects all the fresh and recycled FPGAs correctly.
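A minimal sketch of Phase I-style classification, training an SVM on ring-oscillator frequency signatures; the frequency data here is synthetic and the feature/label setup is an assumption, whereas the paper uses measurements from real Xilinx FPGAs compared against a golden model.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_ros = 32
fresh = rng.normal(loc=200.0, scale=2.0, size=(40, n_ros))      # MHz, new devices
recycled = rng.normal(loc=196.0, scale=2.5, size=(40, n_ros))   # aged ROs oscillate slower

X = np.vstack([fresh, recycled])
y = np.array([0] * len(fresh) + [1] * len(recycled))             # 0 = new, 1 = recycled

clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)          # train on RO signatures

suspect = rng.normal(loc=197.0, scale=2.5, size=(1, n_ros))      # device under test
print("recycled" if clf.predict(suspect)[0] == 1 else "new")
```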
{"title":"Aging analysis for recycled FPGA detection","authors":"H. Dogan, Domenic Forte, M. Tehranipoor","doi":"10.1109/DFT.2014.6962099","DOIUrl":"https://doi.org/10.1109/DFT.2014.6962099","url":null,"abstract":"The counterfeit electronic component industry continues to threaten the security and reliability of systems by infiltrating recycled components into the supply chain. With the increased use of FPGAs in critical systems, recycled FPGAs cause significant concerns for government and industry. In this paper, we propose a two phase detection approach to differentiate recycled (used) FPGAs from new ones. Both approaches rely on machine learning via support vector machines (SVM) for classification. The first phase examines suspect FPGAs “as is” while the second phase requires some accelerated aging. To be more specific, Phase I detects recycled FPGAs by comparing the frequencies of ring oscillators (ROs) distributed on the FPGAs against a golden model. Experimental results on Xilinx FPGAs show that Phase I can correctly classify 8 out of 20 FPGAs under test. However, Phase I fails to detect FPGAs at fast corners and with lesser prior usage. Phase II is then used to compliment Phase I and overcome its limitations. The second phase performs a short aging step on the suspect FPGAs and exploits the aging speed reduction (due to prior usage) to cover the cases missed by Phase I. In our silicon results, Phase II detects all the fresh and recycled FPGAs correctly.","PeriodicalId":414665,"journal":{"name":"2014 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT)","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133387206","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A probabilistic analysis of resilient reconfigurable designs
Pub Date: 2014-11-24 | DOI: 10.1109/DFT.2014.6962074
A. Malek, S. Tzilis, D. Khan, I. Sourdis, Georgios Smaragdos, C. Strydis
Reconfigurable hardware can be employed to tolerate permanent faults. The hardware components comprising a System-on-Chip can be partitioned into a handful of substitutable units interconnected with reconfigurable wires, allowing isolation and replacement of faulty parts. This paper offers a probabilistic analysis of reconfigurable designs, estimating, for different fault densities, the average number of fault-free components that can be constructed as well as the probability of guaranteeing a particular availability of components. Taking the area overheads of reconfigurability into account, we evaluate the resilience of various reconfigurable designs with different granularities. Based on this analysis, we conduct a comprehensive design-space exploration to identify the granularity mixes that maximize the fault tolerance of a system. Our findings reveal that mixing fine-grain logic with a coarse-grain sparing approach tolerates up to 3× more permanent faults than component redundancy and 2× more than any other purely coarse-grain solution. Component redundancy is preferable at low fault densities, while coarse-grain and mixed-grain reconfigurability maximize availability at medium and high fault densities, respectively.
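A back-of-the-envelope sketch of this kind of analysis, assuming a Poisson defect model (a unit of effective area A is fault-free with probability e^(-dA)) and an invented wiring-overhead factor for finer granularity; the numbers are illustrative, not the paper's model.

```python
import math

def unit_ok(d, area):
    """Probability that a unit of (effective) area is fault-free under a Poisson defect model."""
    return math.exp(-d * area)

def expected_good_units(d, n_units, unit_area, overhead=1.0):
    # overhead > 1 inflates effective area to account for reconfiguration wiring
    return n_units * unit_ok(d, unit_area * overhead)

def prob_at_least(k, n, p):
    """P[at least k of n units are fault-free] (binomial tail)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Coarse grain: 4 big units; fine grain: 16 small units with 20% extra wiring overhead.
d = 0.05                                         # faults per unit area (illustrative)
print(expected_good_units(d, 4, 4.0), expected_good_units(d, 16, 1.0, overhead=1.2))
print(prob_at_least(3, 4, unit_ok(d, 4.0)), prob_at_least(12, 16, unit_ok(d, 1.2)))
```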
{"title":"A probabilistic analysis of resilient reconfigurable designs","authors":"A. Malek, S. Tzilis, D. Khan, I. Sourdis, Georgios Smaragdos, C. Strydis","doi":"10.1109/DFT.2014.6962074","DOIUrl":"https://doi.org/10.1109/DFT.2014.6962074","url":null,"abstract":"Reconfigurable hardware can be employed to tolerate permanent faults. Hardware components comprising a System-on-Chip can be partitioned into a handful of substitutable units interconnected with reconfigurable wires to allow isolation and replacement of faulty parts. This paper offers a probabilistic analysis of reconfigurable designs estimating for different fault densities the average number of fault-free components that can be constructed as well as the probability to guarantee a particular availability of components. Considering the area overheads of reconfigurability, we evaluate the resilience of various reconfigurable designs with different granularities. Based on this analysis, we conduct a comprehensive design-space exploration to identify the granularity mixes that maximize the fault-tolerance of a system. Our findings reveal that mixing fine-grain logic with a coarse-grain sparing approach tolerates up to 3× more permanent faults than component redundancy and 2× more than any other purely coarse-grain solution. Component redundancy is preferable at low fault densities, while coarse-grain and mixed-grain reconfigurability maximize availability at medium and high fault densities, respectively.","PeriodicalId":414665,"journal":{"name":"2014 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT)","volume":"172 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133280308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Oxide based resistive RAM: ON/OFF resistance analysis versus circuit variability
Pub Date: 2014-11-24 | DOI: 10.1109/DFT.2014.6962107
H. Aziza, H. Ayari, S. Onkaraiah, J. Portal, M. Moreau, M. Bocquet
A deeper understanding of the impact of variability on Oxide-based Resistive Random Access Memory (so-called OxRRAM) is needed to propose variability-tolerant designs and ensure the robustness of the technology. Although research has taken steps to resolve this issue, variability remains a major concern for OxRRAMs. In this paper, the impact of variability on OxRRAM circuit performance is analysed quantitatively at the circuit level through electrical simulations. Variability is introduced both at the memory-cell level and at the peripheral-circuitry level. The aim of this study is to determine the contribution of each component of an OxRRAM circuit to the ON/OFF resistance ratio.
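A Monte Carlo sketch of how cell-level variability narrows the ON/OFF window; the lognormal spreads and nominal LRS/HRS values are illustrative assumptions, not the device parameters or electrical-simulation setup used in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
r_on = rng.lognormal(mean=np.log(10e3), sigma=0.3, size=n)    # low-resistance state  [ohm]
r_off = rng.lognormal(mean=np.log(1e6), sigma=0.5, size=n)    # high-resistance state [ohm]

ratio = r_off / r_on                                          # per-cell ON/OFF ratio
print(f"median ON/OFF ratio: {np.median(ratio):.0f}")
print(f"fraction of cells with ratio < 10: {np.mean(ratio < 10):.2e}")
```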
{"title":"Oxide based resistive RAM: ON/OFF resistance analysis versus circuit variability","authors":"H. Aziza, H. Ayari, S. Onkaraiah, J. Portal, M. Moreau, M. Bocquet","doi":"10.1109/DFT.2014.6962107","DOIUrl":"https://doi.org/10.1109/DFT.2014.6962107","url":null,"abstract":"A deeper understanding of the impact of variability on Oxide-based Resistive Random Access Memory (so-called OxRRAM) is needed to propose variability tolerant designs to ensure the robustness of the technology. Although research has taken steps to resolve this issue, variability remains an important characteristic for OxRRAMs. In this paper, impact of variability on OxRRAM circuit performances is analysed quantitatively at a circuit level through electrical simulations. Variability is introduced at the memory cell level but also at the peripheral circuitry level. The aim of this study is to determine the contribution of each component of an OxRRAM circuit on the ON/OFF resistance ratio.","PeriodicalId":414665,"journal":{"name":"2014 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128138391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Exploiting Intel TSX for fault-tolerant execution in safety-critical systems
Pub Date: 2014-11-24 | DOI: 10.1109/DFT.2014.6962083
Florian Haas, Sebastian Weis, Stefan Metzlaff, T. Ungerer
Safety-critical systems demand increasing computational power, which calls for high-performance embedded systems. While commercial off-the-shelf (COTS) processors offer high computational performance at a low price, they do not provide hardware support for fault-tolerant execution. However, purely software-based fault-tolerance methods entail high design complexity and runtime overhead. In this paper, we present an efficient software/hardware-based redundant execution scheme for a COTS x86 processor, which exploits the Transactional Synchronization Extensions (TSX) introduced with the Intel Haswell microarchitecture. Our approach extends a static binary instrumentation tool to insert fault-tolerant transactions and fault-detection instructions at function granularity. TSX hardware support is used for error containment and recovery. The average runtime overhead for selected SPEC2006 benchmarks was only 49% compared to a non-fault-tolerant execution.
{"title":"Exploiting Intel TSX for fault-tolerant execution in safety-critical systems","authors":"Florian Haas, Sebastian Weis, Stefan Metzlaff, T. Ungerer","doi":"10.1109/DFT.2014.6962083","DOIUrl":"https://doi.org/10.1109/DFT.2014.6962083","url":null,"abstract":"Safety-critical systems demand increasing computational power, which requests high-performance embedded systems. While commercial-of-the-shelf (COTS) processors offer high computational performance for a low price, they do not provide hardware support for fault-tolerant execution. However, pure software-based fault-tolerance methods entail high design complexity and runtime overhead. In this paper, we present an efficient software/hardware-based redundant execution scheme for a COTS ×86 processor, which exploits the Transactional Synchronization Extensions (TSX) introduced with the Intel Haswell microarchitecture. Our approach extends a static binary instrumentation tool to insert fault-tolerant transactions and fault-detection instructions at function granularity. TSX hardware support is used for error containment and recovery. The average runtime overhead for selected SPEC2006 benchmarks was only 49% compared to a non-fault-tolerant execution.","PeriodicalId":414665,"journal":{"name":"2014 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126911592","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fault injection in the process descriptor of a Unix-based operating system
Pub Date: 2014-11-24 | DOI: 10.1109/DFT.2014.6962080
B. Montrucchio, M. Rebaudengo, A. Velasco
Transient faults, originating from several sources such as high-energy particles, are a major issue in computer-based systems for which high availability is a strict requirement. Fault injection is a commonly used method to evaluate the sensitivity of such systems. This paper presents an evaluation of the effects of faults in the memory containing the process descriptor of a Unix-based operating system. In particular, the state field has been taken as the main target, changing the current state value into another one that may be either valid or invalid. An experimental analysis has been conducted on a large set of different tasks belonging to the operating system itself. Test results show that the state field of the process descriptor is a critical variable as far as dependability is concerned.
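A minimal sketch of the fault model implied above: replacing the process-descriptor state value with either another valid state code or an invalid one and recording what was injected. The state codes follow common Linux task_struct conventions; the actual kernel-memory injection mechanism and outcome classification of the paper are not reproduced here.

```python
import random

VALID_STATES = {0: "TASK_RUNNING", 1: "TASK_INTERRUPTIBLE", 2: "TASK_UNINTERRUPTIBLE",
                4: "__TASK_STOPPED", 8: "__TASK_TRACED"}

def inject(current_state, rng):
    """Pick a replacement for the state field: valid-but-wrong or invalid."""
    if rng.random() < 0.5:                                    # valid but different state
        new_state = rng.choice([s for s in VALID_STATES if s != current_state])
    else:                                                     # invalid state code
        new_state = rng.choice([3, 5, 7, 0xFF])
    return new_state, new_state in VALID_STATES

rng = random.Random(42)
campaign = [inject(current_state=0, rng=rng) for _ in range(10)]
print(campaign)   # (injected value, is_valid) pairs to correlate with observed effects
```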
{"title":"Fault injection in the process descriptor of a Unix-based operating system","authors":"B. Montrucchio, M. Rebaudengo, A. Velasco","doi":"10.1109/DFT.2014.6962080","DOIUrl":"https://doi.org/10.1109/DFT.2014.6962080","url":null,"abstract":"Transient faults in computer-based systems for which high availability is a strict requirement, originated from several sources, like high energy particles, are a major issue. Fault injection is a commonly used method to evaluate the sensitivity of such systems. The paper presents an evaluation of the effects of faults in the memory containing the process descriptor of a Unix-based Operating System. In particular the state field has been taken into consideration as the main target, changing the current state value into another one that could be valid or invalid. An experimental analysis has been conducted on a large set of different tasks, belonging to the operating system itself. Results of tests show that the state field in the process descriptor represents a critical variable as far as dependability is considered.","PeriodicalId":414665,"journal":{"name":"2014 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT)","volume":"302 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123082823","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Artificial intelligence based task mapping and pipelined scheduling for checkpointing on real time systems with imperfect fault detection
Pub Date: 2014-11-24 | DOI: 10.1109/DFT.2014.6962066
Anup Das, Akash Kumar, B. Veeravalli
Fault tolerance is emerging as one of the important optimization objectives for designs in deep-submicron technology nodes. This paper proposes an application mapping and scheduling technique with checkpointing on a multiprocessor system to maximize reliability in the presence of transient faults. The proposed model incorporates checkpoints with imperfect fault-detection probability, as well as the pipelined execution and cyclic dependencies associated with multimedia applications. The problem is solved using an artificial intelligence technique known as Particle Swarm Optimization to determine the number of checkpoints for every task of the application that maximizes the confidence in the output. The proposed approach is validated experimentally with synthetic and real-life application graphs. Results demonstrate that the proposed technique improves the probability of a correct result by an average of 15% with imperfect fault detection. Additionally, even with 100% fault detection, the proposed technique achieves better results (25% higher confidence) than existing fault-tolerant techniques.
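A minimal Particle Swarm Optimization sketch for choosing per-task checkpoint counts; the fitness model (an undetected fault must slip past every checkpoint, subject to a budget on checkpoint overhead) and all numbers are assumed stand-ins for the paper's confidence model, not its formulation.

```python
import random

P_FAULT = [0.05, 0.10, 0.08]      # per-task fault probability (assumed)
DETECT = 0.9                      # per-checkpoint detection probability (imperfect)
OVERHEAD, BUDGET = 1.0, 6.0       # time cost per checkpoint, total slack available
MAX_CKPT = 5

def fitness(ckpts):
    if sum(ckpts) * OVERHEAD > BUDGET:
        return 0.0                                          # violates the timing slack
    conf = 1.0
    for p, c in zip(P_FAULT, ckpts):
        conf *= 1.0 - p * (1.0 - DETECT) ** c               # fault slips past all c checkpoints
    return conf

rng = random.Random(7)
n_tasks, n_particles = len(P_FAULT), 12
pos = [[rng.uniform(0, MAX_CKPT) for _ in range(n_tasks)] for _ in range(n_particles)]
vel = [[0.0] * n_tasks for _ in range(n_particles)]
pbest = [p[:] for p in pos]
gbest = max(pbest, key=lambda p: fitness([round(v) for v in p]))[:]

for _ in range(100):                                        # standard PSO velocity/position update
    for i in range(n_particles):
        for d in range(n_tasks):
            vel[i][d] = (0.7 * vel[i][d]
                         + 1.5 * rng.random() * (pbest[i][d] - pos[i][d])
                         + 1.5 * rng.random() * (gbest[d] - pos[i][d]))
            pos[i][d] = min(max(pos[i][d] + vel[i][d], 0), MAX_CKPT)
        if fitness([round(v) for v in pos[i]]) > fitness([round(v) for v in pbest[i]]):
            pbest[i] = pos[i][:]
        if fitness([round(v) for v in pbest[i]]) > fitness([round(v) for v in gbest]):
            gbest = pbest[i][:]

print("checkpoints per task:", [round(v) for v in gbest],
      "confidence:", fitness([round(v) for v in gbest]))
```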
{"title":"Artificial intelligence based task mapping and pipelined scheduling for checkpointing on real time systems with imperfect fault detection","authors":"Anup Das, Akash Kumar, B. Veeravalli","doi":"10.1109/DFT.2014.6962066","DOIUrl":"https://doi.org/10.1109/DFT.2014.6962066","url":null,"abstract":"Fault-tolerance is emerging as one of the important optimization objectives for designs in deep submicron technology nodes. This paper proposes a technique of application mapping and scheduling with checkpointing on a multiprocessor system to maximize the reliability considering transient faults. The proposed model incorporates checkpoints with imperfect fault detection probability, and pipelined execution and cyclic dependency associated with multimedia applications. This is solved using an Artificial Intelligence technique known as Particle Swarm Optimization to determine the number of checkpoints of every task of the application that maximizes the confidence of the output. The proposed approach is validated experimentally with synthetic and real-life application graphs. Results demonstrate the proposed technique improves the probability of correct result by an average 15% with imperfect fault detection. Additionally, even with 100% fault detection, the proposed technique is able to achieve better results (25% higher confidence) as compared to the existing fault-tolerant techniques.","PeriodicalId":414665,"journal":{"name":"2014 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114418471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}