Today's computers have gigabytes of main memory due to improved DRAM density. As density increases, smaller bit cells become more susceptible to errors. With an increase in error susceptibility, the need for memory resiliency also increases. Self-testing of memory health can proactively check for errors to improve resiliency. This paper describes a software-only self-test that continuously tests memory. We present the challenges and design of an approach, called Continuous Online Memory Testing (COMeT), that targets chip multiprocessors. COMeT tests memory health simultaneously with application execution in anticipation of allocation requests. The approach guarantees that memory is tested within a fixed time interval to limit exposure to lurking errors. We developed and evaluated an implementation of COMeT. On the SPEC CPU2006 benchmarks, COMeT has a low 4% average performance overhead. When emulated errors were injected into physical memory, applications executed 1.13x to 4.41x longer with COMeT than without it.
{"title":"COMeT: Continuous Online Memory Test","authors":"Musfiq Rahman, B. Childers, Sangyeun Cho","doi":"10.1109/PRDC.2011.22","DOIUrl":"https://doi.org/10.1109/PRDC.2011.22","PeriodicalId":254760,"journal":{"name":"2011 IEEE 17th Pacific Rim International Symposium on Dependable Computing","volume":"9 1","pages":"0"},"publicationDate":"2011-12-12","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"121073179"}
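The abstract above does not say which test algorithm COMeT runs, so the following is only a minimal sketch of what a software memory self-test can look like, assuming a simplified march-style pattern (write zeros, read and write ones ascending, read and write zeros descending, final read). The `EmulatedMemory` class, its `stuck_at_zero` parameter, and the `march_test` function are hypothetical names introduced for illustration; they emulate stuck-at-0 bits in a byte buffer rather than touching physical memory.

```python
class EmulatedMemory:
    """Byte buffer with optional stuck-at-0 bits (hypothetical fault emulation)."""

    def __init__(self, size, stuck_at_zero=None):
        self._cells = bytearray(size)
        # Maps offset -> bitmask of bits that always read back as 0.
        self._stuck = dict(stuck_at_zero or {})

    def __len__(self):
        return len(self._cells)

    def __getitem__(self, offset):
        return self._cells[offset] & ~self._stuck.get(offset, 0) & 0xFF

    def __setitem__(self, offset, value):
        self._cells[offset] = value


def march_test(mem):
    """Simplified march-style test; returns the sorted offsets that misbehaved."""
    faulty = set()
    size = len(mem)

    # Element 1: write 0x00 to every cell.
    for i in range(size):
        mem[i] = 0x00

    # Element 2 (ascending): expect 0x00, then write 0xFF.
    for i in range(size):
        if mem[i] != 0x00:
            faulty.add(i)
        mem[i] = 0xFF

    # Element 3 (descending): expect 0xFF, then write 0x00.
    for i in reversed(range(size)):
        if mem[i] != 0xFF:
            faulty.add(i)
        mem[i] = 0x00

    # Element 4: final read, expect 0x00.
    for i in range(size):
        if mem[i] != 0x00:
            faulty.add(i)

    return sorted(faulty)


# Usage: two emulated stuck-at-0 bits in a 64-byte region.
region = EmulatedMemory(64, stuck_at_zero={5: 0x01, 40: 0x80})
bad_offsets = march_test(region)
```

In a setting like the one the abstract describes, a test of this kind would presumably run over free regions ahead of allocation requests; that scheduling is not modeled here.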
Cyber security has gained national and international attention as a result of near continuous headlines from financial institutions, retail stores, government offices and universities reporting compromised systems and stolen data. Concerns continue to rise as threats of service interruption, unauthorized access, stealing and altering of information, and spreading of viruses become ever more prevalent and serious. Controlling access to application layer resources is a critical component in a layered security solution that includes encryption, firewalls, virtual private networks, antivirus, and intrusion detection. In this paper we discuss the development of an application-level access control solution, based on an open-source access manager augmented with custom software components, to provide protection to both Web-based and Java-based client and server applications.
{"title":"Access Control of Web and Java Based Applications","authors":"Kam S. Tso, Michael J. Pajevski, Bryan Johnson","doi":"10.1109/PRDC.2011.54","DOIUrl":"https://doi.org/10.1109/PRDC.2011.54","PeriodicalId":254760,"journal":{"name":"2011 IEEE 17th Pacific Rim International Symposium on Dependable Computing","volume":"119 1","pages":"0"},"publicationDate":"2011-12-12","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"115837360"}
For most purposes, it is sufficient for a low-power test set to ensure that the power dissipation during test application will not exceed that possible during functional operation. This is guaranteed for the fast functional capture cycles of functional broadside tests. This paper describes a procedure that generates broadside test sets with bounded switching activity during fast functional capture cycles based on the maximum switching activity of a functional broadside test set, targeting transition faults in full-scan circuits. The procedure first generates a compact functional broadside test set. It then extends the test set in steps in order to increase its fault coverage to that of an arbitrary broadside test set (a test set that includes non-functional broadside tests). During these steps, the maximum switching activity of the functional broadside test set is used for bounding the switching activity.
{"title":"Augmenting Functional Broadside Tests for Transition Fault Coverage with Bounded Switching Activity","authors":"I. Pomeranz","doi":"10.1109/PRDC.2011.14","DOIUrl":"https://doi.org/10.1109/PRDC.2011.14","PeriodicalId":254760,"journal":{"name":"2011 IEEE 17th Pacific Rim International Symposium on Dependable Computing","volume":"08 1","pages":"0"},"publicationDate":"2011-12-12","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"122406062"}
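The bounding idea in the abstract above can be illustrated with a toy selection loop. This is a sketch under strong assumptions, not the paper's procedure: tests are reduced to a switching-activity number and a set of detected transition faults, and augmentation is modeled as a greedy pick among precomputed candidate tests. The names `BroadsideTest` and `augment`, and all numbers, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BroadsideTest:
    name: str
    switching: int       # switching activity in the fast capture cycles (toy units)
    detects: frozenset   # transition faults this test detects


def augment(functional, candidates):
    """Extend a functional test set without exceeding its maximum switching activity."""
    # The bound is the maximum switching activity seen in the functional test set.
    bound = max(t.switching for t in functional)
    selected = list(functional)
    covered = set().union(*(t.detects for t in functional))

    # Only candidates that respect the bound are eligible.
    remaining = [t for t in candidates if t.switching <= bound]
    while remaining:
        best = max(remaining, key=lambda t: len(t.detects - covered))
        if not best.detects - covered:
            break  # no eligible candidate detects a new fault
        selected.append(best)
        covered |= best.detects
        remaining.remove(best)
    return selected, covered, bound


# Usage with made-up tests: n2 detects the most faults but exceeds the bound.
functional = [
    BroadsideTest("f1", 10, frozenset({"a", "b"})),
    BroadsideTest("f2", 14, frozenset({"c"})),
]
candidates = [
    BroadsideTest("n1", 12, frozenset({"d", "e"})),
    BroadsideTest("n2", 20, frozenset({"f", "g", "h"})),
    BroadsideTest("n3", 14, frozenset({"e"})),
]
selected, covered, bound = augment(functional, candidates)
```

Faults that only high-activity tests detect stay uncovered in this sketch; how the actual procedure closes that gap is not stated in the abstract.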
Fabian Oboril, M. Tahoori, V. Heuveline, D. Lukarski, Jan-Philipp Weiss
As hardware devices like processor cores and memory sub-systems based on nano-scale technology nodes become more unreliable, the need for fault-tolerant numerical computing engines, as used in many critical applications with long computation/mission times, is becoming pronounced. In this paper, we present an Algorithm-based Fault Tolerance (ABFT) scheme for an iterative linear solver engine based on the Conjugate Gradient (CG) method that takes advantage of numerical defect correction. This method is "pay as you go", meaning that there is practically only a runtime overhead if errors occur and a correction is performed. Our experimental comparison with software-based Triple Modular Redundancy (TMR) clearly shows the runtime benefit of the proposed approach, good fault tolerance and no occurrence of silent data corruption.
{"title":"Numerical Defect Correction as an Algorithm-Based Fault Tolerance Technique for Iterative Solvers","authors":"Fabian Oboril, M. Tahoori, V. Heuveline, D. Lukarski, Jan-Philipp Weiss","doi":"10.1109/PRDC.2011.26","DOIUrl":"https://doi.org/10.1109/PRDC.2011.26","PeriodicalId":254760,"journal":{"name":"2011 IEEE 17th Pacific Rim International Symposium on Dependable Computing","volume":"14 1","pages":"0"},"publicationDate":"2011-07-01","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"127505033"}
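To make the "pay as you go" idea concrete, here is a minimal pure-Python sketch of defect correction wrapped around a conjugate gradient solve. It is an illustration under assumptions, not the authors' scheme: the check is a single recomputation of the true residual d = b - Ax after the solve, and a correction step (solve Ae = d, then x = x + e) runs only if that residual is too large. The function names, tolerances, and the `fault` hook that emulates a corrupted solution vector are all hypothetical.

```python
import math


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cg(A, b, rel_tol=1e-10, max_iter=200, fault=None):
    """Plain conjugate gradient for a small dense SPD system, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for x = 0
    p = list(r)
    rs = dot(r, r)
    stop = rel_tol * math.sqrt(rs)
    for k in range(max_iter):
        if math.sqrt(rs) <= stop:
            break
        Ap = matvec(A, p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        if fault is not None:
            fault(k, x)              # emulated error: may corrupt x in place
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


def solve_checked(A, b, check_tol=1e-8, max_corrections=5, fault=None):
    """Solve A x = b, then apply defect correction only if the true residual is large."""
    x = cg(A, b, fault=fault)
    b_norm = math.sqrt(dot(b, b))
    corrections = 0
    while corrections < max_corrections:
        d = [bi - axi for bi, axi in zip(b, matvec(A, x))]   # true residual (defect)
        if math.sqrt(dot(d, d)) <= check_tol * b_norm:
            break                                            # no error detected
        e = cg(A, d)                                         # correction solve A e = d
        x = [xi + ei for xi, ei in zip(x, e)]
        corrections += 1
    return x, corrections


# Usage: corrupt one entry of x during the second CG iteration.
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]


def corrupt(k, x):
    if k == 1:
        x[0] += 1e3


x_clean, n_clean = solve_checked(A, b)
x_fixed, n_fixed = solve_checked(A, b, fault=corrupt)
```

The corrupted entry goes unnoticed by CG's own recurrence residual, which is why the sketch recomputes the residual explicitly. In the fault-free run the only extra cost is that one residual evaluation.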
This paper presents a self-stabilizing distributed clock synchronization protocol in the absence of faults in the system. It focuses on the distributed clock synchronization of an arbitrary, non-partitioned digraph ranging from fully connected to 1-connected networks of nodes while allowing for differences in the network elements. This protocol does not rely on assumptions about the initial state of the system, other than the presence of at least one node, and uses no central clock or centrally generated signal, pulse, or message. Nodes are anonymous, i.e., they do not have unique identities. There is no theoretical limit on the maximum number of participating nodes. The only constraint on the behavior of a node is that its interactions with other nodes are restricted to defined links and interfaces. This protocol deterministically converges within a time bound that is a linear function of the self-stabilization period. We present an outline of a deductive proof of the correctness of the protocol. A bounded model of the protocol was mechanically verified for a variety of topologies. Results of the mechanical proof of the correctness of the protocol are provided. The model checking results have verified the correctness of the protocol for networks with unidirectional and bidirectional links. In addition, the results confirm the claims of determinism and linear convergence. As a result, we conjecture that the protocol solves the general case of this problem. We also present several variations of the protocol and argue that this synchronization protocol is indeed an emergent system.
{"title":"A Self-Stabilizing Synchronization Protocol for Arbitrary Digraphs: A Self-Stabilizing Distributed Clock Synchronization Protocol For Arbitrary Digraphs","authors":"M. Malekpour","doi":"10.1109/PRDC.2011.37","DOIUrl":"https://doi.org/10.1109/PRDC.2011.37","PeriodicalId":254760,"journal":{"name":"2011 IEEE 17th Pacific Rim International Symposium on Dependable Computing","volume":"40 1","pages":"0"},"publicationDate":"2011-02-01","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"127379955"}