2009 IEEE/IFIP International Conference on Dependable Systems & Networks: Latest Papers

A self-diagnosis technique using Reed-Solomon codes for self-repairing chips
Pub Date : 2009-09-29 DOI: 10.1109/DSN.2009.5270327
Xiangyu Tang, Seongmoon Wang
A self-diagnosis circuit that can be used for built-in self-repair is proposed. The circuit under diagnosis (CUD) is assumed to be composed of a large number of field repairable units (FRUs), which can be replaced with spares when they are found to be defective. Since the proposed self-diagnosis circuit is implemented on the chip, responses that are scanned out of scan chains are compressed first by the space compression circuit and then by the time compression circuit to reduce the volume of test response data. Both the space and the time compression circuits implement a Reed-Solomon code. Unlike prior work, the proposed technique observes the responses of all FRUs at the same time to reduce diagnosis time. The proposed diagnosis circuit can locate up to l defective FRUs. We propose a novel space-compression circuit that reduces hardware overhead by exploiting the frequency difference between the scan shift clock and the system clock. When the size of the constituent multiple-input signature register (MISR) is m, the total storage required for the fault-free signatures is 2lmB bits, where 1 ≤ B ≤ m. The experimental results show that a diagnosis circuit that can locate up to 4 defective FRUs in the same test session can be implemented with less than 1% hardware overhead for a large industrial design. Hardware overhead for the diagnosis circuit is lower for large CUDs.
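As a quick check of the storage figure quoted in the abstract, the sketch below evaluates 2lmB bits for a few parameter choices. Only l = 4 comes from the abstract; the MISR size m = 16 and the two values of B are assumptions picked for illustration, and the function name is hypothetical.

```python
def signature_storage_bits(l: int, m: int, B: int) -> int:
    """Bits needed to store the fault-free signatures: 2 * l * m * B.

    l: maximum number of defective FRUs that can be located
    m: size of each constituent MISR, in bits
    B: parameter from the abstract, constrained to 1 <= B <= m
    """
    if l < 1 or m < 1:
        raise ValueError("l and m must be positive")
    if not 1 <= B <= m:
        raise ValueError("B must satisfy 1 <= B <= m")
    return 2 * l * m * B


# l = 4 is the case reported in the abstract; m = 16 is an assumed MISR size.
smallest = signature_storage_bits(l=4, m=16, B=1)   # 128 bits
largest = signature_storage_bits(l=4, m=16, B=16)   # 2048 bits
print(smallest, largest)
```

Under these assumed values the reference storage ranges from 128 to 2048 bits, which gives a feel for how B scales the on-chip memory needed.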
Citations: 1
Decoupling Dynamic Information Flow Tracking with a dedicated coprocessor
Pub Date : 2009-09-29 DOI: 10.1109/DSN.2009.5270347
Hari Kannan, Michael Dalton, C. Kozyrakis
Dynamic Information Flow Tracking (DIFT) is a promising security technique. With hardware support, DIFT prevents a wide range of attacks on vulnerable software with minimal performance impact. DIFT architectures, however, require significant changes in the processor pipeline that increase design and verification complexity and may affect clock frequency. These complications deter hardware vendors from supporting DIFT. This paper makes hardware support for DIFT cost-effective by decoupling DIFT functionality onto a simple, separate coprocessor. Decoupling is possible because DIFT operations and regular computation need only synchronize on system calls. The coprocessor is a small hardware engine that performs logical operations and caches 4-bit tags. It introduces no changes to the design or layout of the main processor's logic, pipeline, or caches, and can be combined with various processors. Using a full-system hardware prototype and realistic Linux workloads, we show that the DIFT coprocessor provides the same security guarantees as current DIFT architectures with low runtime overheads.
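The decoupling idea in this abstract can be pictured with a small software model. The sketch below is an illustration only, not the authors' hardware design: the class name, the operation names ("input", "alu", "jump"), the meaning of the tag bits, and the OR-based propagation rule are all assumptions. What it does take from the abstract is that tags are 4 bits wide, that the coprocessor works separately from the main core, and that the two only need to synchronize on system calls.

```python
from collections import deque

TAG_MASK = 0xF   # 4-bit tags, as stated in the abstract
TAINTED = 0x1    # assumed meaning of the lowest tag bit


class DiftCoprocessor:
    """Toy model of a tag-tracking engine that runs behind the main core."""

    def __init__(self):
        self.tags = {}          # register name -> 4-bit tag
        self.pending = deque()  # retired instructions not yet checked
        self.violation = False

    def retire(self, op, dst=None, srcs=()):
        """Called by the main core; it only enqueues and never waits."""
        self.pending.append((op, dst, tuple(srcs)))

    def _process(self, op, dst, srcs):
        merged = 0
        for src in srcs:
            merged |= self.tags.get(src, 0)   # assumed rule: OR the source tags
        if op == "input":                     # data arriving from outside
            self.tags[dst] = TAINTED
        elif op == "alu":
            self.tags[dst] = merged & TAG_MASK
        elif op == "jump" and merged & TAINTED:
            self.violation = True             # tainted jump target

    def sync_on_syscall(self):
        """Drain the queue; return True if the system call may proceed."""
        while self.pending:
            self._process(*self.pending.popleft())
        return not self.violation


cp = DiftCoprocessor()
cp.retire("input", dst="r1")
cp.retire("alu", dst="r2", srcs=["r1", "r3"])
cp.retire("jump", srcs=["r2"])
print(cp.sync_on_syscall())
```

The main core never blocks in `retire`; all checking is deferred until `sync_on_syscall`, which mirrors the claim that synchronization is needed only at system calls.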
Citations: 82
Power supply induced common cause faults-experimental assessment of potential countermeasures
Pub Date : 2009-09-29 DOI: 10.1109/DSN.2009.5270308
Peter Tummeltshammer, A. Steininger
Fault-tolerant architectures based on physical replication of components are vulnerable to faults that cause the same effect in all replicas. Short outages in a power supply shared by all replicas are a prominent example of such common cause faults. For systems in which providing a replicated power supply would be prohibitively costly, identifying reliable countermeasures against these effects is vital to maintaining the required dependability level. In this paper we propose several such countermeasures, namely parity protection, voltage monitoring, and time diversity of the replicas. We perform extensive fault injection experiments on three fault-tolerant dual-core processor designs, one FPGA-based and two commercial ASICs. These experiments provide evidence for the vulnerability of a completely unprotected dual-core solution, while time diversity and voltage monitoring in combination with increased timing margins turn out to be particularly effective at eliminating common cause effects.
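Of the countermeasures named in this abstract, parity protection is the easiest to show in a few lines. The sketch below is a generic even-parity check over a data word, not the specific scheme evaluated in the paper; the function names and the 8-bit example value are assumptions.

```python
def parity(word: int) -> int:
    """Even-parity bit: 1 if the word has an odd number of set bits."""
    return bin(word).count("1") & 1


def protect(word: int) -> tuple[int, int]:
    """Store a word together with its parity bit."""
    return word, parity(word)


def is_intact(stored: tuple[int, int]) -> bool:
    """Recompute parity and compare it with the stored bit."""
    word, stored_parity = stored
    return parity(word) == stored_parity


stored = protect(0b1011_0010)
corrupted = (stored[0] ^ (1 << 3), stored[1])   # flip a single data bit
print(is_intact(stored), is_intact(corrupted))  # True False
```

A single parity bit detects any odd number of flipped bits but misses an even number, which is one reason it would be combined with other measures such as voltage monitoring.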
Citations: 12
Maximizing system lifetime by battery scheduling
Pub Date : 2009-09-29 DOI: 10.1109/DSN.2009.5270351
M. Jongerden, B. Haverkort, H. Bohnenkamp, J. Katoen
The use of mobile devices is limited by the battery lifetime. Some devices have the option to connect an extra battery, or to use smart battery-packs with multiple cells to extend the lifetime. In these cases, scheduling the batteries over the load to exploit recovery properties usually extends the system lifetime. Straightforward scheduling schemes, like round robin or choosing the best battery available, already provide a big improvement compared to a sequential discharge of the batteries. In this paper we compare these scheduling schemes with the optimal scheduling scheme produced with a priced-timed automaton battery model (implemented and evaluated in Uppaal Cora). We see that in some cases the results of the simple scheduling schemes are close to optimal. However, the optimal schedules also clearly show that there is still room for improving the battery lifetimes.
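The three simple schedulers mentioned in this abstract (sequential discharge, round robin, and choosing the best battery available) can be sketched against a toy battery with a recovery effect. Everything about the battery here is an assumption made for illustration: the two-pool charge model, the recovery rate, the capacities, and the constant load are not taken from the paper, and the optimal priced-timed-automaton schedule is not reproduced.

```python
from dataclasses import dataclass


@dataclass
class Battery:
    available: float  # charge that can be drawn right now
    bound: float      # charge that becomes available only over time

    def can_serve(self, load: float) -> bool:
        return self.available >= load

    def step(self, load: float, rate: float = 0.05) -> None:
        """Draw `load` (0 when idle), then let some bound charge recover."""
        self.available -= load
        flow = rate * max(self.bound - self.available, 0.0)
        self.bound -= flow
        self.available += flow


def sequential(cells, usable, state):
    # Drain one battery, then move on to the next, and never switch back.
    for i in range(max(state["last"], 0), len(cells)):
        if i in usable:
            return i
    return None


def round_robin(cells, usable, state):
    # Rotate to the next usable battery after the one used last.
    for offset in range(1, len(cells) + 1):
        i = (state["last"] + offset) % len(cells)
        if i in usable:
            return i
    return None


def best_of(cells, usable, state):
    # Pick the usable battery with the most available charge.
    return max(usable, key=lambda i: cells[i].available, default=None)


def lifetime(pick, n_batteries=2, load=1.0, max_steps=10_000):
    """Number of time steps served before no battery can carry the load."""
    cells = [Battery(available=10.0, bound=30.0) for _ in range(n_batteries)]
    state = {"last": -1}
    for t in range(max_steps):
        usable = [i for i, c in enumerate(cells) if c.can_serve(load)]
        choice = pick(cells, usable, state)
        if choice is None:
            return t
        state["last"] = choice
        for i, c in enumerate(cells):
            c.step(load if i == choice else 0.0)
    return max_steps


for scheduler in (sequential, round_robin, best_of):
    print(scheduler.__name__, lifetime(scheduler))
```

Idle batteries keep recovering in `step`, which is the property that lets a scheduler that alternates batteries draw more of the stored charge than one that drains them one after another.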
Citations: 40
An energy efficient circuit level technique to protect register file from MBUs and SETs in embedded processors
Pub Date : 2009-09-29 DOI: 10.1109/DSN.2009.5270337
M. Fazeli, Alireza Namazi, S. Miremadi
This paper presents a circuit-level soft-error-tolerant technique, called RRC (Robust Register Caching), for the register file of embedded processors. The basic idea behind RRC is to cache the most vulnerable registers in a small, highly robust register cache built from circuit-level SEU- and SET-protected memory cells. To decide which cache entry should be replaced, the average number of read operations during a register's ACE time is used as the criterion: the victim cache entry is the one with the maximum read count. To minimize the power overhead of RRC, clock gating is applied to the main register file, which significantly lowers power consumption. RRC protects the register file not only against Single Bit Upsets (SBUs) but also against Multiple Bit Upsets (MBUs) and Single Event Transients (SETs). RRC is experimentally evaluated using the LEON processor. The experimental results show that, if the cache size is selected properly, the Architectural Vulnerability Factor (AVF) of the register file becomes about 1% while RRC imposes low power, area, and performance overheads on the processor.
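The replacement rule stated in this abstract, evicting the entry with the maximum read count, can be sketched as a small software model. This is a simplification under stated assumptions: the class and method names are hypothetical, a plain per-entry read counter stands in for the paper's average number of reads per ACE interval, and resetting the counter on each write is an assumed detail.

```python
class RobustRegisterCache:
    """Toy model of a small register cache with max-read-count eviction."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = {}  # register name -> [value, read_count]

    def read(self, reg):
        """Return the cached value, or None on a miss."""
        entry = self.entries.get(reg)
        if entry is None:
            return None
        entry[1] += 1
        return entry[0]

    def write(self, reg, value):
        """Cache a register value; return the evicted (reg, value) if any."""
        evicted = None
        if reg not in self.entries and len(self.entries) >= self.capacity:
            # Victim selection: the entry that has been read most often.
            victim = max(self.entries, key=lambda r: self.entries[r][1])
            evicted = (victim, self.entries.pop(victim)[0])
        self.entries[reg] = [value, 0]  # assumed: a write restarts the count
        return evicted


cache = RobustRegisterCache(capacity=2)
cache.write("r1", 10)
cache.write("r2", 20)
cache.read("r1")
cache.read("r1")
cache.read("r2")
print(cache.write("r3", 30))  # ('r1', 10)
```

The evicted pair would be written back to the main register file in a real design; here it is simply returned to the caller.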
Citations: 21
On the effectiveness of structural detection and defense against P2P-based botnets
Pub Date : 2009-09-29 DOI: 10.1109/DSN.2009.5270322
Duc T. Ha, Guanhua Yan, S. Eidenbenz, H. Ngo
Recently, peer-to-peer (P2P) networks have emerged as a covert communication platform for malicious programs known as bots. As popular distributed systems, they allow bots to communicate easily while protecting the botmaster from being discovered. Existing work on P2P-based botnets mainly focuses on measurement-based studies of botnet behaviors. In this work, through simulation, we study extensively the structure of P2P networks running Kademlia, one of a few P2P protocols widely used in practice. Our simulation testbed not only incorporates the actual code of a real Kademlia client to achieve high realism, but also applies distributed event-driven simulation techniques to achieve high scalability. Using this testbed, we analyze the scaling, clustering, reachability, and various centrality properties of P2P-based botnets from a graph-theoretical perspective. We further demonstrate experimentally and theoretically that monitoring bot activities in a P2P network is difficult, suggesting that the P2P mechanism indeed helps botnets hide their communication effectively. Finally, we evaluate the effectiveness of some potential mitigation techniques, such as content poisoning and Sybil-based and eclipse-based mitigation. Conclusions drawn from this work shed light on the structure of P2P botnets, how to monitor bot activities in P2P networks, and how to mitigate botnet operations effectively.
Citations: 50
Dynamic content web applications: Crash, failover, and recovery analysis
Pub Date : 2009-09-29 DOI: 10.1109/DSN.2009.5270331
L. E. Buzato, G. M. D. Vieira, W. Zwaenepoel
This work assesses how crashes and recoveries affect the performance of a replicated dynamic content web application. RobustStore is the result of retrofitting TPC-W's on-line bookstore with Treplica, a middleware for building dependable applications. Implementations of Paxos and Fast Paxos are at the core of Treplica's efficient and programmer-friendly support for replication and recovery. The TPC-W benchmark, augmented with faultloads and dependability measures, is used to evaluate the behaviour of RobustStore. Experiments apply faultloads that cause sequential and concurrent replica crashes. RobustStore's performance drops by less than 13% during the recovery from two simultaneous replica crashes. When subject to an identical faultload and a shopping workload, a five-replica RobustStore maintains an accuracy of 99.999%. Our results not only display good performance, total autonomy, and uninterrupted availability, but also show that it is simple to develop efficient recovery-oriented applications using Treplica.
Citations: 10
LFI: A practical and general library-level fault injector
Pub Date : 2009-09-29 DOI: 10.1109/DSN.2009.5270313
P. Marinescu, George Candea
Fault injection, a critical aspect of testing robust systems, is often overlooked in the development of general-purpose software. We believe this is due to the absence of easy-to-use tools and to the extensive manual labor required to perform fault injection tests. This paper introduces LFI (Library Fault Injector), a tool that automates the preparation of fault scenarios and their injection at the boundary between shared libraries and applications. LFI extends prior work by automatically profiling fault behaviors of libraries via static analysis of their binaries, thus reducing the dependence on human labor and perfect documentation. We present techniques for automatically generating injection scenarios and we describe a simple language for expressing such scenarios. LFI does not require access to libraries' source code and works for Linux, Windows, and Solaris on x86 and SPARC platforms.
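The general idea of injecting faults at the boundary between a library and an application can be illustrated with a simple wrapper. This sketch is an analogy in Python, not LFI itself: LFI works on shared-library binaries and has its own scenario language, neither of which is modeled here. The helper names, the fake library function, and the "fail on the Nth call" scenario are all assumptions.

```python
import errno


def inject_fault(func, fail_on_call: int, error: Exception):
    """Wrap a library function so that one chosen call raises `error`."""
    calls = {"count": 0}

    def wrapper(*args, **kwargs):
        calls["count"] += 1
        if calls["count"] == fail_on_call:
            raise error                  # the injected fault
        return func(*args, **kwargs)     # all other calls pass through

    return wrapper


def fake_library_read(path: str) -> str:
    """Stand-in for a library call that normally succeeds."""
    return f"contents of {path}"


def load_settings(read, path: str) -> str:
    """Application code under test: it should survive a failing read."""
    try:
        return read(path)
    except OSError:
        return "defaults"


faulty_read = inject_fault(
    fake_library_read,
    fail_on_call=2,
    error=OSError(errno.ENOMEM, "injected fault"),
)
for _ in range(3):
    print(load_settings(faulty_read, "app.conf"))
```

Only the second call fails, so the output shows normal contents, then the fallback, then normal contents again, which is the kind of recovery-path check that fault injection is meant to exercise.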
Citations: 90
An end-to-end approach for the automatic derivation of application-aware error detectors
Pub Date : 2009-09-29 DOI: 10.1109/DSN.2009.5270291
Galen Lyle, Shelley Cheny, K. Pattabiraman, Z. Kalbarczyk, R. Iyer
Critical Variable Recomputation (CVR)-based error detection provides high coverage for data critical to an application while reducing the performance overhead associated with detecting benign errors. However, when implemented exclusively in software, the performance penalty associated with CVR-based detection is too high to be practical. This paper addresses this limitation by providing a hybrid hardware/software tool chain that allows efficient error detectors to be designed while minimizing additional hardware. Detection mechanisms are automatically derived during compilation and mapped onto hardware, where they are executed in parallel with the original task at runtime. When tested using an FPGA platform, results show that our approach incurs an area overhead of 53% while increasing execution time by 27% on average.
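The core check behind recomputation-based detection can be sketched in software: recompute a critical variable from the inputs it depends on and compare the result with the value the program is about to use. This is only a hand-written, sequential illustration. In the paper the detectors are derived automatically by the compiler and run in hardware alongside the task; the function names, the exception type, and the example computation below are assumptions.

```python
class ErrorDetected(Exception):
    """Raised when a critical variable disagrees with its recomputation."""


def order_total(prices, quantities):
    """Original computation that produces the critical variable."""
    return sum(p * q for p, q in zip(prices, quantities))


def check_critical(value, recompute, *inputs):
    """Recompute the critical variable and compare before it is used."""
    if recompute(*inputs) != value:
        raise ErrorDetected("critical variable does not match recomputation")
    return value


prices, quantities = [3, 5, 2], [2, 1, 4]
total = order_total(prices, quantities)                        # 19
print(check_critical(total, order_total, prices, quantities))  # 19

corrupted = total ^ 0b100   # model a bit flip in the stored value
try:
    check_critical(corrupted, order_total, prices, quantities)
except ErrorDetected as exc:
    print("detected:", exc)
```

Because only selected critical variables are checked, errors in data that never reaches them go unexamined, which is how the approach avoids paying for benign errors.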
Citations: 34
A low-tech solution to avoid the severe impact of transient errors on the IP interconnect
Pub Date : 2009-09-29 DOI: 10.1109/DSN.2009.5270301
D. Graham, P. Strid, Scott Roy, Fernando Rodriguez
There are many sources of failure within a System-on-Chip (SoC), so it is important to look beyond the processor core at other components that affect the reliable operation of the SoC, such as the fabric included in every one that connects the IP together. We use ARM's AMBA 3 AXI bus matrix to demonstrate that the impact of errors on the IP interconnect can be severe: possibly causing deadlock or memory corruption. We consider the detection of 1-bit transient faults without changing the IP that connects to the bus matrix or the AMBA 3 standard and without adding extra latency while keeping the performance and area overhead low. We explore what can be done under these constraints and propose a combination of techniques for a low-tech solution to detect these rare events.
Citations: 5