To estimate reliability and locate the weak points of circuits at the design phase, several high-level evaluation methods have been proposed recently. However, most of these methods apply only to combinational circuits. In this paper, we propose a reliability evaluation method based on probabilistic transfer matrices to accurately estimate the reliability of a flip-flop circuit. The proposed method is compared with the method in [7] on the D-type flip-flop. Experimental results confirm that our method is accurate.
Chengtian Ouyang, Jianhui Jiang, and Jie Xiao, "Reliability Evaluation of Flip-Flops Based on Probabilistic Transfer Matrices," in Proc. 2010 IEEE 16th Pacific Rim International Symposium on Dependable Computing (PRDC), Dec. 2010. doi:10.1109/PRDC.2010.22
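As a rough illustration of the probabilistic-transfer-matrix (PTM) machinery the abstract refers to (the function names and the single-bit-flip fault model below are ours, not the paper's): each gate is a matrix whose rows index input patterns and columns index output patterns, serial composition is matrix multiplication, parallel composition is a Kronecker product, and reliability is the probability that the faulty circuit agrees with the ideal one, averaged over uniform inputs.

```python
import numpy as np

def gate_ptm(itm, eps):
    """PTM of a gate whose single-bit output flips with probability eps.
    itm: ideal transfer matrix (rows = input patterns, cols = outputs)."""
    flip = np.array([[1 - eps, eps], [eps, 1 - eps]])
    return itm @ flip

# Ideal transfer matrices (one output bit): rows are input patterns.
NOT_ITM = np.array([[0.0, 1.0], [1.0, 0.0]])                      # inverter
NAND_ITM = np.array([[0, 1], [0, 1], [0, 1], [1, 0]], dtype=float)

def serial(p1, p2):
    """Cascade two stages: matrix product of their PTMs."""
    return p1 @ p2

def parallel(p1, p2):
    """Place two stages side by side: Kronecker product of their PTMs."""
    return np.kron(p1, p2)

def fidelity(ptm, itm):
    """Probability the faulty circuit matches the ideal circuit,
    averaged over uniformly distributed inputs."""
    return float(np.mean(np.sum(ptm * itm, axis=1)))
```

For example, two faulty inverters in series (eps = 0.01) form a noisy buffer whose fidelity is 0.01² + 0.99² = 0.9802, exactly the closed-form probability that an even number of flips occurred.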
A software system developed for a specific user under contract undergoes a period of testing by the user before acceptance. This is known as user acceptance testing and is useful for debugging the software in the user's operational environment. In this paper we first present a simple non-homogeneous Poisson process (NHPP)-based software reliability model to quantitatively assess software reliability under the user acceptance test, where the idea of an accelerated life testing model is introduced to represent the user's operational phase and to investigate the impact of the user acceptance test. This idea is then applied to the reliability assessment of web applications in a different testing environment, where two stress tests, one under a normal and one under a higher workload, are executed in parallel. Through numerical examples with real software fault data observed in actual user acceptance and stress tests, we show the applicability of the software accelerated life testing model to two different software testing schemes.
Toshiya Fujii, T. Dohi, H. Okamura, and T. Fujiwara, "A Software Accelerated Life Testing Model," in Proc. 2010 IEEE 16th Pacific Rim International Symposium on Dependable Computing (PRDC), Dec. 2010. doi:10.1109/PRDC.2010.50
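The combination of an NHPP growth model with an acceleration factor can be sketched as follows. This is a minimal illustration using the well-known Goel-Okumoto mean value function and a simple time-scaling factor for the post-hand-over phase; the paper's actual model and parameterization may differ.

```python
import math

def go_mean_value(t, a, b):
    """Goel-Okumoto NHPP mean value function: expected number of
    faults detected by time t, with a total faults and detection rate b."""
    return a * (1.0 - math.exp(-b * t))

def accelerated_mvf(t, a, b, t0, alpha):
    """Expected cumulative faults when, after hand-over time t0, the
    user's environment exercises the software alpha times as intensively
    (alpha > 1 accelerates fault exposure, as in accelerated life testing)."""
    if t <= t0:
        return go_mean_value(t, a, b)
    # after t0, calendar time advances alpha times faster in "test time"
    return go_mean_value(t0 + alpha * (t - t0), a, b)
```

The function is continuous at t0 by construction, and setting alpha = 1 recovers the plain Goel-Okumoto curve, so the acceleration factor can be estimated from the two phases' fault data.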
Large-scale chip multiprocessors (CMPs) generally employ a network-on-chip (NoC) to connect the last-level cache (LLC), which is typically organized as distributed NUCA (non-uniform cache access) arrays for scalability and efficiency. At the same time, aggressive technology scaling induces severe reliability problems, causing on-chip components (e.g., cores, cache banks, routers) to fail due to manufacturing defects or on-line hardware faults. A degradable CMP should be able to work around defects by disabling faulty components. For a static NUCA architecture, however, when the cache banks attached to a computing node are disabled, certain physical address ranges are no longer accessible. Prior approaches, such as the set reduction introduced in the Intel Xeon processor 7100 series, turn off cache banks by masking certain set-index bits in the physical address, which wastes a great deal of cache capacity. In this paper, we tackle the problem at a finer granularity to limit the capacity loss in the NUCA cache. Cache accesses to isolated nodes are redirected by a utility-driven address remapping scheme that reduces data-block conflicts in the fault-tolerant shared LLC. We evaluate our technique using the GEMS simulator. Experimental results show that address remapping achieves a significant improvement over the conventional cache sizing scheme.
Ying Wang, Lei Zhang, Yinhe Han, Huawei Li, and Xiaowei Li, "Address Remapping for Static NUCA in NoC-Based Degradable Chip-Multiprocessors," in Proc. 2010 IEEE 16th Pacific Rim International Symposium on Dependable Computing (PRDC), Dec. 2010. doi:10.1109/PRDC.2010.33
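The contrast with set masking can be sketched in a few lines. The modulo home mapping and deterministic redirect below are a deliberate simplification of ours; the paper's utility-driven remapping chooses targets to minimize block conflicts, which this sketch does not attempt.

```python
def home_bank(addr, n_banks, disabled, block_bits=6):
    """Map a physical address to its home LLC bank, redirecting blocks
    whose natural home bank is disabled onto a surviving bank instead of
    masking set bits (which would discard capacity across ALL banks)."""
    alive = [b for b in range(n_banks) if b not in disabled]
    if not alive:
        raise ValueError("no usable banks")
    block = addr >> block_bits          # drop the block-offset bits
    bank = block % n_banks              # natural static-NUCA home bank
    if bank in disabled:
        # hypothetical redirect: spread orphaned blocks over live banks
        bank = alive[block % len(alive)]
    return bank
```

Every address remains accessible after a bank failure, at the cost of extra conflict pressure on the surviving banks rather than a wholesale halving of sets.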
Computers are now present almost everywhere and are connected into ever more complex networks. This means not only that embedded systems are more complicated, but also that communication among the diverse stakeholders of a system is much harder than before. This paper introduces the D-Case approach to the systematic explanation of embedded-system dependability. A D-Case is a structured document that argues for the dependability of a system, supported by evidence. It extends the notion of safety cases [BB98] commonly used in (European) safety-critical sectors. The goal is to develop the D-Case language for communicating system dependability among stakeholders. The paper reports our experience in constructing a D-Case for a remote test surveillance system developed to demonstrate certain dependability components. D-Case construction is shown to be an effective way of explaining how each system component contributes to the overall dependability of the system. Another experiment shows how the D-Case approach can promote dependability through the life cycle of a larger system. Finally, the paper offers some comments on the difficulties encountered and insights for future work.
Y. Matsuno, J. Nakazawa, M. Takeyama, Midori Sugaya, and Y. Ishikawa, "Towards a Language for Communication among Stakeholders," in Proc. 2010 IEEE 16th Pacific Rim International Symposium on Dependable Computing (PRDC), Dec. 2010. doi:10.1109/PRDC.2010.47
Although either the temporal-ordering or the frequency-distribution information embedded in process traces can profile normal process behavior, no previously published scheme uses both to detect system call anomalies. This paper claims that combining these two kinds of information can improve detection performance, and it is the first to propose the sequential frequency vector (SFV), which exploits both temporal ordering and frequency information for system call anomaly detection. Extensive experiments on the DARPA-1998 and UNM datasets substantiate this claim. The results show that the SFV carries richer information and significantly outperforms other techniques, achieving lower false positive rates at a 100% detection rate.
Ying Wu, Jianhui Jiang, and L. Kong, "Sequential Frequency Vector Based System Call Anomaly Detection," in Proc. 2010 IEEE 16th Pacific Rim International Symposium on Dependable Computing (PRDC), Dec. 2010. doi:10.1109/PRDC.2010.26
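One plausible reading of a vector that keeps both kinds of information is sketched below: for each system call in a trace window, pair its relative frequency with the normalized mean position at which it occurs. This construction is our illustration only; the paper's actual SFV definition may differ.

```python
def sfv(window, alphabet):
    """Build a sequential frequency vector for one trace window: for each
    system call in `alphabet`, emit (relative frequency, normalized mean
    position), so both ordering and frequency information survive."""
    n = len(window)
    positions = {s: [] for s in alphabet}
    for i, s in enumerate(window):
        positions[s].append(i)
    vec = []
    for s in alphabet:
        idx = positions[s]
        freq = len(idx) / n
        # mean index scaled into [0, 1]; 0.0 for calls absent from the window
        mean_pos = (sum(idx) / len(idx)) / (n - 1) if idx else 0.0
        vec.extend((freq, mean_pos))
    return vec
```

Two windows with identical call frequencies but different orderings now map to different vectors, which is exactly the distinction a pure frequency histogram loses.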
Server consolidation using virtual machines (VMs) makes it difficult to execute processes as administrators intend. The process scheduler in each VM is unaware of the other VMs and independently schedules only the processes in its own VM. To solve this problem, process scheduling across VMs is necessary. However, such system-wide scheduling is vulnerable to denial-of-service (DoS) attacks launched from a compromised VM against the other VMs. In this paper, we propose the Monarch scheduler, a secure system-wide process scheduler running in the virtual machine monitor (VMM). The Monarch scheduler monitors the execution of processes and changes the scheduling behavior in all VMs. To change process scheduling from the VMM, it manipulates run queues and process states consistently without modifying the guest operating systems. Its hybrid scheduling mitigates DoS attacks by leveraging performance isolation among VMs. We confirmed that the Monarch scheduler achieves useful scheduling with small overhead.
H. Tadokoro, Kenichi Kourai, and S. Chiba, "A Secure System-Wide Process Scheduler across Virtual Machines," in Proc. 2010 IEEE 16th Pacific Rim International Symposium on Dependable Computing (PRDC), Dec. 2010. doi:10.1109/PRDC.2010.34
We propose F-FTCS, a mechanism that makes the FTCS single-IP-address cluster for TCP applications fault tolerant. The FTCS mechanism performs fine-grained load balancing by handling all incoming TCP connection requests at a master node. Three fail-over algorithms are designed and implemented to realize the fault-tolerant FTCS mechanism. The Discarding and Gathering Algorithms, respectively, discard and gather TCP connections in the SYN-RECEIVED state at the time of failure. The Scattering Algorithm synchronizes this information between nodes during the failure-free phase. The three algorithms are evaluated on Core 2 Duo machines. The Discarding Algorithm recovers from a failure 440 to 950 msec earlier than the Gathering Algorithm, but it requires reprocessing the discarded TCP connection requests. The Scattering Algorithm adds 120 to 160 usec of overhead to the processing of a TCP connection request compared with the original FTCS mechanism.
Jun Kato, H. Fujita, and Y. Ishikawa, "Design and Implementation of a Fault Tolerant Single IP Address Cluster," in Proc. 2010 IEEE 16th Pacific Rim International Symposium on Dependable Computing (PRDC), Dec. 2010. doi:10.1109/PRDC.2010.39
Triple modular redundancy (TMR) is a well-known technique for building fault-tolerant systems. In TMR, a module unit is triplicated, and the outputs of these three units are compared by a voter. In this paper we consider systems that consist of multiple TMR units in series. Only recently has it been found that even such simple systems can be configured into various structures. We propose (i) a method of calculating the reliability of cascaded TMR systems and (ii) an algorithm for finding a structure that maximizes reliability. The algorithm uses the branch and bound search algorithm, where candidate solutions are evaluated by means of the proposed reliability calculation method. We also show that some new structures have optimal reliability within some ranges of voter and module reliability.
Masashi Hamamatsu, Tatsuhiro Tsuchiya, and T. Kikuno, "On the Reliability of Cascaded TMR Systems," in Proc. 2010 IEEE 16th Pacific Rim International Symposium on Dependable Computing (PRDC), Dec. 2010. doi:10.1109/PRDC.2010.45
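The baseline structure the paper generalizes is the textbook series of independent TMR stages, whose reliability follows directly from the majority-vote formula. The sketch below covers only that baseline, not the alternative cascade structures or the branch-and-bound search the paper proposes.

```python
def tmr_stage(r_module, r_voter):
    """Reliability of one TMR stage: at least two of the three module
    replicas produce correct output, and the voter itself works."""
    majority = 3 * r_module**2 - 2 * r_module**3
    return r_voter * majority

def cascaded_tmr(r_module, r_voter, n_stages):
    """Baseline cascade: n independent TMR stages in series, so the
    system works only if every stage works."""
    return tmr_stage(r_module, r_voter) ** n_stages
```

With a perfect voter, a stage of 0.9-reliable modules reaches 0.972; note that TMR helps only while r_module > 0.5, since below that the majority vote amplifies errors.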
The aim of this research is to use a data analysis/mining approach to extract information from a large number of equipment failure notifications, to form a fuzzy system capable of learning and optimizing knowledge from historical evidence, and subsequently to use it as a guiding tool in decision-making processes.
L. Sztandera, "Optimal Inventory of Computer Repair Parts: A Fuzzy Systems Approach," in Proc. 2010 IEEE 16th Pacific Rim International Symposium on Dependable Computing (PRDC), Dec. 2010. doi:10.1109/PRDC.2010.38
Some cloud storage services have recently introduced file versioning features by which more than one version of a file can be maintained. To provide file versioning with limited storage resources, it is essential to divide those resources among versions in accordance with the varied needs of numerous users. In this paper, we focus on applications in which newer versions of a file are more likely to be requested, which may be true of many subscription services. We propose a new distributed data replication protocol supporting the file versioning feature. We also construct an analytical model that derives an optimal allocation of the resources when the total number of replica nodes in the system and the distribution of read-request frequencies over the versions are given. In addition, we present numerical examples obtained by simulation, assuming realistic parameters, to show the good scalability and dependability of our system.
Takahiko Ikeda, Mamoru Ohara, S. Fukumoto, M. Arai, and K. Iwasaki, "A Distributed Data Replication Protocol for File Versioning with Optimal Node Assignments," in Proc. 2010 IEEE 16th Pacific Rim International Symposium on Dependable Computing (PRDC), Dec. 2010. doi:10.1109/PRDC.2010.40
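The allocation problem in the abstract can be illustrated with a simple proportional split: give each version a share of the replica nodes proportional to its read frequency, rounding with the largest-remainder rule. This is a hypothetical stand-in for the paper's analytical optimum, which is derived from the actual request distribution rather than by rounding.

```python
def allocate_replicas(read_freqs, n_nodes):
    """Split n_nodes replica nodes across file versions in proportion to
    each version's read frequency, using largest-remainder rounding so
    the allocations always sum to exactly n_nodes."""
    total = sum(read_freqs)
    quotas = [f / total * n_nodes for f in read_freqs]
    alloc = [int(q) for q in quotas]          # floor of each ideal share
    # hand the leftover nodes to the largest fractional remainders
    leftovers = n_nodes - sum(alloc)
    order = sorted(range(len(quotas)),
                   key=lambda i: quotas[i] - alloc[i], reverse=True)
    for i in order[:leftovers]:
        alloc[i] += 1
    return alloc
```

With a skew toward new versions, e.g. frequencies (0.5, 0.3, 0.2) over seven nodes, the newest version receives the bulk of the replicas, matching the subscription-service access pattern the paper targets.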