Richard S. Wilkins, Xing Du, Robert A. Cochran, M. Popp
Disaster tolerant Wolfpack geo-clusters
Proceedings. IEEE International Conference on Cluster Computing, 2002-09-23
DOI: 10.1109/CLUSTR.2002.1137750 (https://doi.org/10.1109/CLUSTR.2002.1137750)
Citations: 2
Abstract
Clustering computer systems to increase application availability has become common industry practice. Although clustering makes applications and their data more available to users, it does not protect against a disaster (flood, tornado, earthquake, terrorism, civil unrest, etc.) that makes the entire cluster, along with the applications and data it serves, unavailable. Distance mirroring of an application's data store allows recovery from a disaster but may still result in long periods of unacceptable downtime. This paper describes a method for geographically stretching a standard Wolfpack (Microsoft™ Cluster Service, MSCS) cluster of Intel architecture servers for disaster tolerance. Server nodes and their storage may be placed at two (or more) distant sites to prevent a single disaster from taking down the entire cluster. Standard cluster semantics and ease of use are maintained using the remote mirroring capabilities of Hewlett-Packard's high-end storage arrays. The paper describes the design of additional software that controls data mirroring behavior when applications are moved or failed over between server nodes. It also describes software that "stretches" the cluster quorum disk between sites in a manner transparent to the cluster software, and software for an external arbitrator node that provides rapid recovery from total loss of inter-site communications. The array's firmware mirroring options (i.e., synchronous or asynchronous I/O mirroring) provide the flexibility to make optimum use of inter-site link bandwidth based on the data safety requirements of individual applications.
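The abstract gives no detail on how the external arbitrator node works. As a rough illustration only, the sketch below shows one common tie-breaking pattern for a total loss of inter-site communications: the first site to reach the arbitrator is granted the quorum, and any other site is refused until the grant is released. The class `Arbitrator`, its methods, and the site names are hypothetical and are not taken from the paper or from MSCS.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Arbitrator:
    """Hypothetical tie-breaker node located away from both cluster sites."""

    owner: Optional[str] = None  # site currently granted the quorum, if any

    def request_quorum(self, site: str) -> bool:
        """Grant the quorum to the first site that asks; refuse any other site."""
        if self.owner is None:
            self.owner = site
        return self.owner == site

    def release(self, site: str) -> None:
        """Clear the grant when the current holder gives it up."""
        if self.owner == site:
            self.owner = None


def surviving_site(arbitrator: Arbitrator, requests: Iterable[str]) -> Optional[str]:
    """Return the site that wins arbitration, given requests in arrival order."""
    winner = None
    for site in requests:
        if arbitrator.request_quorum(site):
            winner = site
    return winner


if __name__ == "__main__":
    arb = Arbitrator()
    # Both sites lose contact with each other and race to the arbitrator.
    print(surviving_site(arb, ["site_a", "site_b"]))
```

In this sketch only one site can continue to own the quorum after the split, which is the property a tie-breaker needs in order to avoid both sites serving the same data independently. How the paper's arbitrator actually achieves this is not stated in the abstract.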