2013 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN): Latest Papers

Modeling and analysing operation processes for dependability

2013 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)

Pub Date : 2013-06-24 DOI: 10.1109/DSN.2013.6575337

Xiwei Xu, Liming Zhu, J. Li, L. Bass, Q. Lu, Min Fu

Application dependability increasingly depends on sophisticated operation-time activities such as deployment, upgrade, scaling out/in, and reactions to various failures. Traditional approaches to improving application dependability focus on artifact-oriented troubleshooting and improvements. In this paper, we present an approach that uses process models to represent and analyze operations, taking exception handling and fault-proneness into account. Our goal is to reduce diagnosis and repair time for application failures that occur during operation activities such as deployment and upgrade.

Citations: 3

An adaptive approach to dependable circuits for a digital power control

2013 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)

Pub Date : 2013-06-24 DOI: 10.1109/DSN.2013.6575333

Aromhack Saysanasongkham, K. Imai, M. Arai, S. Fukumoto, K. Wada

Recently, microcomputers and FPGAs have increasingly been used to control power conversion circuits, because they simplify parameter resetting and offer the flexibility of software-based programming. At the same time, the control circuits are getting extremely close to the high-current main circuit, so electromagnetic radiation generated near high-current pulses may induce transient faults in the control circuit. In this study, we focus on transient noise caused by the switching activity of a DC-DC converter and propose a dependable FPGA-based digital power control circuit. The basic idea is to keep the sampling times as far away from the switching times as possible to avoid the effects of transient noise. We design a control circuit that applies the proposed method and demonstrate its effectiveness through simulation.
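
The core idea of keeping sampling instants away from switching instants can be illustrated with a small scheduling helper. This is a minimal sketch, not the authors' FPGA design: it assumes the switching edges within one PWM period are known in advance (for example, turn-on at 0 and turn-off at D*T for duty cycle D), and the function name `safest_sampling_time` is hypothetical. It picks the midpoint of the widest gap between consecutive edges, treating the period as circular.

```python
def safest_sampling_time(period: float, edges: list[float]) -> float:
    """Return the instant in [0, period) that is farthest, measured around the
    periodic cycle, from every switching edge (hypothetical helper)."""
    if period <= 0 or not edges:
        raise ValueError("need a positive period and at least one switching edge")
    pts = sorted(e % period for e in edges)
    best_gap, best_start = -1.0, 0.0
    for i, start in enumerate(pts):
        # The gap after the last edge wraps around to the first edge of the next period.
        end = pts[i + 1] if i + 1 < len(pts) else pts[0] + period
        gap = end - start
        if gap > best_gap:
            best_gap, best_start = gap, start
    # The midpoint of the widest edge-free gap is equally far from both neighbouring edges.
    return (best_start + best_gap / 2) % period
```

For instance, with a 10 µs period and edges at 0 and 3 µs (30% duty cycle), the helper returns 6.5 µs, which is 3.5 µs from both neighbouring edges. A real controller would also have to account for ringing that persists for some time after each edge, which this sketch ignores.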

Citations: 0

Manipulating semantic values in kernel data structures: Attack assessments and implications

2013 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)

Pub Date : 2013-06-24 DOI: 10.1109/DSN.2013.6575344

Aravind Prakash, Eknath Venkataramani, Heng Yin, Zhiqiang Lin

Semantic values in kernel data structures are critical to many security applications, such as virtual machine introspection, malware analysis, and memory forensics. However, malware, or more specifically a kernel rootkit, can often tamper directly with raw kernel data structures in so-called DKOM (Direct Kernel Object Manipulation) attacks, thereby significantly thwarting security analysis. In addition to manipulating pointer fields to hide certain kernel objects, DKOM attacks may also mutate semantic values, which are data values with important semantic meanings. Prior research has sought to defeat pointer manipulation attacks and thus identify hidden kernel objects. However, the space and severity of Semantic Value Manipulation (SVM) attacks are not yet well understood. In this paper, we take a first step toward systematically assessing this attack space. To this end, we devise a new fuzz testing technique, duplicate-value directed semantic field fuzzing, and implement a prototype called MOSS. Using MOSS, we evaluate two widely used operating systems: Windows XP and Ubuntu 10.04. Our experimental results show that the space of SVM attacks is vast for both OSes. Our proof-of-concept kernel rootkit further demonstrates that it can successfully evade all the security tools tested in our experiments, including recently proposed robust signature schemes. Moreover, our duplicate value analysis highlights the challenges in defeating SVM attacks: for example, an intuitive approach that cross-checks duplicate values provides only marginal improvement in detection. Our study motivates revisiting existing security solutions and calls for more effective defenses against kernel threats.
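
To make the duplicate-value point concrete, here is a toy model of the intuitive cross-checking defense mentioned above. It is a hypothetical sketch, not MOSS: the `ProcessView` type and the field names are invented for illustration, and real kernel structures hold many more fields. The check passes when every copy of a semantic value agrees, so it catches a rootkit that edits only one copy but not one that rewrites all copies consistently.

```python
from dataclasses import dataclass


@dataclass
class ProcessView:
    """Hypothetical snapshot of one kernel object: the same semantic value
    (here, a process ID) as read from several different fields."""
    name: str
    copies: dict[str, int]  # field path -> value observed in that field


def cross_check(view: ProcessView) -> bool:
    """Return True if every duplicate copy holds the same value.

    A mismatch suggests that some copy was tampered with; agreement proves
    nothing if the attacker rewrote all copies consistently."""
    return len(set(view.copies.values())) <= 1
```

For example, a view with copies `{"primary.pid": 1337, "mirror.pid": 1337}` passes; changing only `primary.pid` fails the check; changing both to the same new value passes again, which is why such cross-checking offers little protection against an attacker who can reach every copy.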

Citations: 26

Security implications of memory deduplication in a virtualized environment

2013 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)

Pub Date : 2013-06-24 DOI: 10.1109/DSN.2013.6575349

Jidong Xiao, Zhang Xu, Hai Huang, Haining Wang

Memory deduplication has been widely used in various commodity hypervisors. By merging identical memory contents, it allows more virtual machines to run concurrently on top of a hypervisor. However, while this technique improves memory efficiency, it has a large impact on system security. In particular, memory deduplication is usually implemented using a variant of copy-on-write, under which writing to a shared page incurs a longer access time than writing to a non-shared page. In this paper, we investigate the security implications of memory deduplication from the perspectives of both attackers and defenders. On the one hand, exploiting this timing artifact, we demonstrate two new attacks that create a covert channel and detect virtualization, respectively. On the other hand, we also show that memory deduplication can be leveraged to safeguard Linux kernel integrity.
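
The covert channel rests on the copy-on-write timing difference described above. The following is a hedged sketch of only the receiver's decoding step, under assumed conditions: the sender signals a 1 by loading a page identical to one the receiver holds (so the hypervisor merges them) and a 0 by not doing so, and the receiver then times a write to its copy. Timing collection itself is omitted, the function names are hypothetical, and the latency values in the usage note are illustrative, not measurements.

```python
from statistics import mean


def calibrate_threshold(merged_writes: list[float], private_writes: list[float]) -> float:
    """Midpoint between the mean write latency to a merged (copy-on-write) page
    and the mean write latency to an ordinary private page."""
    return (mean(merged_writes) + mean(private_writes)) / 2


def decode_bits(latencies: list[float], threshold: float) -> list[int]:
    """A slow write implies a copy-on-write fault, i.e. the receiver's page had been
    merged with the sender's identical page: read that as a 1 bit, otherwise 0."""
    return [1 if t > threshold else 0 for t in latencies]
```

With calibration samples of 30 and 34 (merged) versus 2 and 4 (private) in arbitrary time units, `calibrate_threshold` returns 17.5, and latencies `[31.0, 3.0, 29.5, 2.5]` decode to `[1, 0, 1, 0]`.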

Citations: 64

Improving SSD reliability with RAID via Elastic Striping and Anywhere Parity

2013 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)

Pub Date : 2013-06-24 DOI: 10.1109/DSN.2013.6575359

Jaeho Kim, Jongmin Lee, Jongmoo Choi, Donghee Lee, S. Noh

While the move from SLC to MLC/TLC flash memory technology is increasing SSD capacity at lower cost, it comes at the expense of reliability. One way to remedy this loss is to apply a RAID architecture across the flash chips that make up an SSD. However, the traditional RAID approach can be counterproductive: parity updates increase the total number of writes, which leads to more P/E cycles and higher bit error rates. Using a technique that we call Elastic Striping and Anywhere Parity (eSAP), we develop eSAP-RAID, a RAID scheme that significantly reduces parity writes while providing better reliability than RAID-5. We derive performance and lifetime models of SSDs employing RAID-5 and eSAP-RAID that show the benefits of eSAP-RAID. We also implement these schemes in SSDs using DiskSim with the SSD Extension and validate the models using realistic workloads. Our results show that eSAP-RAID improves reliability considerably while limiting wear. Specifically, the expected lifetime of SSDs employing eSAP-RAID may be as long as that of current ECC-based SSDs, while their reliability can be maintained throughout the entire lifetime at the level of current ECC-based SSDs in their early stages.
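
The parity-write overhead that motivates eSAP-RAID can be seen with simple page counting. This sketch is a simplified model under stated assumptions, not the authors' performance or lifetime model: in-place RAID-5 rewrites one parity page per small data write, whereas a log-structured, full-stripe layout (roughly the idea of forming stripes from incoming writes) writes one parity page per stripe of `stripe_width - 1` data pages. Reads for read-modify-write, garbage collection, and eSAP's actual parity placement are ignored, and the function names are hypothetical.

```python
def raid5_small_write_pages(data_writes: int) -> int:
    """In-place RAID-5: every single-page data write also rewrites its stripe's parity page."""
    return 2 * data_writes


def full_stripe_pages(data_writes: int, stripe_width: int) -> int:
    """Log-style striping: one parity page per stripe of (stripe_width - 1) data pages.
    A trailing partial stripe still needs its own parity page."""
    if stripe_width < 2:
        raise ValueError("a stripe needs at least one data chip and one parity chip")
    data_per_stripe = stripe_width - 1
    stripes = -(-data_writes // data_per_stripe)  # ceiling division
    return data_writes + stripes
```

For 100 single-page writes on a 5-chip stripe, the in-place scheme writes 200 pages while the full-stripe scheme writes 125, i.e. a write amplification from parity alone of 2.0 versus 1.25.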

Citations: 58

Fault detection and localization in distributed systems using invariant relationships

2013 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)

Pub Date : 2013-06-24 DOI: 10.1109/DSN.2013.6575304

Abhishek B. Sharma, Haifeng Chen, Min Ding, K. Yoshihira, Guofei Jiang

Recent advances in sensing and communication technologies enable us to collect round-the-clock monitoring data from a wide array of distributed systems, including data centers, manufacturing plants, transportation networks, and automobiles. This data often takes the form of time series collected from multiple sensors, both hardware- and software-based. Previously, we developed an approach based on time-invariant relationships that uses Auto-Regressive models with eXogenous input (ARX) to model this data. A tool based on this approach has been effective for fault detection and capacity planning in distributed systems. In this paper, we first describe our experience in applying this tool in real-world settings. We then discuss the fault localization challenges we face when using the tool, and present two approaches we developed to address them: a spatial approach based on invariant graphs and a temporal approach based on expected broken-invariant patterns.
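
An ARX invariant between two monitored time series can be illustrated with a minimal example. This sketch assumes the simplest useful model order, y[t] = a*y[t-1] + b*x[t] + c, fits it by ordinary least squares in pure Python, and flags time steps whose prediction error exceeds a threshold as broken-invariant points. The function names, the model order, and the threshold are assumptions for illustration; the authors' tool may use other orders and its own criteria for selecting invariants.

```python
def fit_arx(y: list[float], x: list[float]) -> tuple[float, float, float]:
    """Least-squares fit of y[t] = a*y[t-1] + b*x[t] + c; returns (a, b, c)."""
    rows = [(y[t - 1], x[t], 1.0) for t in range(1, len(y))]
    targets = [y[t] for t in range(1, len(y))]
    n = 3
    # Augmented normal equations [A^T A | A^T y], solved by Gauss-Jordan elimination.
    m = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * v for r, v in zip(rows, targets))]
         for i in range(n)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        piv = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[piv] = m[piv], m[col]
        for k in range(n):
            if k != col:
                f = m[k][col] / m[col][col]
                m[k] = [u - f * w for u, w in zip(m[k], m[col])]
    return m[0][n] / m[0][0], m[1][n] / m[1][1], m[2][n] / m[2][2]


def broken_points(y: list[float], x: list[float],
                  params: tuple[float, float, float], threshold: float) -> list[int]:
    """Indices t where the observed y[t] deviates from the ARX prediction by more than threshold."""
    a, b, c = params
    return [t for t in range(1, len(y))
            if abs(y[t] - (a * y[t - 1] + b * x[t] + c)) > threshold]
```

For a series generated exactly by a = 0.5, b = 2, c = 1, `fit_arx` recovers those coefficients; adding a spike of 10 to one output sample then flags that step and the next one, because the corrupted sample also feeds the following prediction through the y[t-1] term.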

Citations: 49

State-of-the-practice in data center virtualization: Toward a better understanding of VM usage

2013 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)

Pub Date : 2013-06-24 DOI: 10.1109/DSN.2013.6575350

R. Birke, Andrej Podzimek, L. Chen, E. Smirni

Hardware virtualization is the prevalent way to share data centers among different tenants. In this paper, we present a large-scale workload characterization study that aims at a better understanding of the state of the practice, i.e., how customers use data centers in the private cloud, how physical resources are shared among different tenants through virtualization, and how virtualization technologies are actually employed. Our study covers all corporate data centers of a major infrastructure provider, geographically dispersed across the globe, and reports on their observed usage over a 19-day period. We focus especially on how virtual machines are deployed across physical resources, with an emphasis on processors and memory, examining resource sharing and usage, virtual machine life cycles, and migration patterns and frequencies. Our study shows a strong tendency to over-provision resources while remaining conservative about the possibilities opened up by virtualization (e.g., migration and co-location), which suggests tremendous potential for policies aimed at reducing data center operational costs.

Citations: 57

Evaluating Xilinx SEU Controller Macro for fault injection

2013 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)

Pub Date : 2013-06-24 DOI: 10.1109/DSN.2013.6575336

J. Nunes, J. Cunha, R. Barbosa, M. Z. Rela

This paper presents a preliminary evaluation of the SEU Controller Macro, a VHDL component developed by Xilinx for the detection and recovery of single event upsets (SEUs), as a building block for an FPGA fault injector. We found that the SEU Controller Macro is extremely effective for injecting single and double bit-flips into the FPGA configuration memory, with precise location, virtually no intrusiveness, and coarse timing accuracy. We offer some clues on how to extend its functionality to build a fully fledged FPGA fault injector.
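
Fault injection into configuration memory, as described here, amounts to flipping one bit, or two adjacent bits, at a chosen location. The sketch below models that on a simulated frame held as a list of 32-bit words; it does not drive the actual SEU Controller Macro, whose command interface the abstract does not describe. The frame layout, the adjacency of double flips, and the function names are hypothetical.

```python
def inject(frame: list[int], word: int, bit: int, double: bool = False) -> list[int]:
    """Return a copy of a simulated configuration frame (32-bit words) with one bit
    flipped, or two adjacent bits in the same word when double=True."""
    if not 0 <= word < len(frame):
        raise IndexError("word index out of range")
    if not 0 <= bit < (31 if double else 32):
        raise ValueError("bit index out of range")
    mask = (0b11 if double else 0b1) << bit
    out = list(frame)  # leave the golden frame untouched
    out[word] ^= mask
    return out


def flipped_bits(golden: list[int], observed: list[int]) -> int:
    """Count bit positions that differ between a golden frame and an observed one."""
    return sum(bin(g ^ o).count("1") for g, o in zip(golden, observed))
```

Comparing an injected frame against a golden copy with `flipped_bits` confirms that exactly one or two bits changed at the intended location, the kind of precision check a fault injector needs before running a campaign.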

Citations: 5

Stress balancing to mitigate NBTI effects in register files

2013 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)

Pub Date : 2013-06-24 DOI: 10.1109/DSN.2013.6575315

H. Amrouch, T. Ebi, J. Henkel

Negative Bias Temperature Instability (NBTI) is considered one of the major transistor reliability concerns in current and upcoming technology nodes and a main cause of diminished transistor lifetime. We propose a new means to mitigate the effects of NBTI on SRAM-based register files, which are particularly vulnerable because of their small structure size and because they are under continuous voltage stress for prolonged intervals. Results from our technology simulator demonstrate the severity of NBTI effects on SRAM cells, especially when process variation is taken into account. Based on this analysis, we show that NBTI stress in different registers needs to be tackled with different strategies corresponding to their access patterns. To this end, we propose to selectively increase the resilience of individual registers against NBTI. Our technique balances the gate voltage stress of the two PMOS transistors of an SRAM cell so that both are under stress for approximately the same amount of time during operation, thereby minimizing the deleterious effects of NBTI. We present mitigation implementations in both hardware and software, along with the incurred overhead. Across a wide range of applications, we show that our technique reduces NBTI-induced reliability degradation by 35% on average, which is 22% better than the current state of the art.
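
The balancing goal can be illustrated with duty-cycle arithmetic. In a 6T SRAM cell, each pull-up PMOS is under NBTI stress while its gate node is low, so one PMOS is stressed while the cell holds 1 and the other while it holds 0; a register bit that mostly holds one value therefore ages one transistor much faster. This sketch computes both stress fractions from a time-weighted trace and applies one possible balancing policy, periodically storing the complement alongside a separate flag. That policy is an assumption for illustration, not necessarily the paper's mechanism, and the function names are hypothetical.

```python
def stress_fractions(trace: list[tuple[int, float]]) -> tuple[float, float]:
    """trace: (stored_bit, duration) pairs for one SRAM cell.

    Returns (fraction of time holding 1, fraction of time holding 0); each of the
    two pull-up PMOS transistors is stressed during one of these fractions."""
    total = sum(d for _, d in trace)
    ones = sum(d for b, d in trace if b == 1)
    return ones / total, (total - ones) / total


def with_periodic_inversion(trace: list[tuple[int, float]]) -> list[tuple[int, float]]:
    """Hypothetical balancing policy: in every other interval the cell physically stores
    the complement (a separate flag restores the logical value on read).
    Assumes equal-length intervals, e.g. fixed epochs."""
    return [(b ^ 1 if i % 2 else b, d) for i, (b, d) in enumerate(trace)]
```

A bit that holds 0 for four equal epochs has stress fractions (0.0, 1.0); with inversion in alternate epochs it becomes (0.5, 0.5), the balanced case the technique aims for.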

Citations: 35

Error detector placement for soft computation

2013 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)

Pub Date : 2013-06-24 DOI: 10.1109/DSN.2013.6575353

Anna Thomas, K. Pattabiraman

The scaling of silicon devices has exacerbated the unreliability of modern computer systems, and power constraints have necessitated the involvement of software in hardware error detection. At the same time, emerging workloads in the form of soft computing applications (e.g., multimedia applications) can tolerate most hardware errors as long as the erroneous outputs do not deviate significantly from error-free outcomes. We term outcomes that deviate significantly from the error-free outcomes Egregious Data Corruptions (EDCs). In this study, we propose a technique for placing detectors that selectively detect EDC-causing errors in an application. We performed an initial study to formulate heuristics that identify EDC-causing data. Based on these heuristics, we developed an algorithm that uses static analysis to identify program locations for placing high-coverage EDC detectors. We evaluate our technique on six benchmarks to measure the EDC coverage under given performance overhead bounds. Our technique achieves an average EDC coverage of 82% under a performance overhead of 10%, while detecting 10% of the non-EDC and benign faults.
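
Placing detectors under a performance overhead bound resembles a budgeted selection problem. The sketch below uses a greedy knapsack heuristic that ranks candidate locations by estimated EDC coverage per unit of overhead; it is a stand-in for illustration, not the authors' heuristic-plus-static-analysis algorithm, and the `Candidate` fields, location names, and estimates are hypothetical. Coverage estimates are treated as independent, which real detectors need not be.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    location: str        # hypothetical program point
    edc_coverage: float  # estimated share of EDC-causing errors this detector would catch
    overhead: float      # estimated runtime overhead as a fraction of baseline (must be > 0)


def place_detectors(cands: list[Candidate], budget: float) -> list[str]:
    """Greedy heuristic: take detectors in order of coverage per unit overhead
    while the total overhead stays within the budget."""
    chosen, used = [], 0.0
    for c in sorted(cands, key=lambda c: c.edc_coverage / c.overhead, reverse=True):
        if used + c.overhead <= budget:
            chosen.append(c.location)
            used += c.overhead
    return chosen
```

With candidates of (coverage, overhead) = A (0.4, 0.05), B (0.3, 0.03), and C (0.2, 0.04) and a 10% budget, the greedy pass picks B, then A, and skips C, which would push total overhead to 12%.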

Citations: 60
