The RAS Implications of DIMM Connector Failure Rates in Large, Highly Available Server Systems
T. J. Dell
Electrical Contacts - 2007 Proceedings of the 53rd IEEE Holm Conference on Electrical Contacts
Published: 2007-09-24
DOI: 10.1109/HOLM.2007.4318226 (https://doi.org/10.1109/HOLM.2007.4318226)
Citations: 1
Abstract
Using low-cost dual inline memory module (DIMM) connectors in highly reliable servers creates a difficult reliability, availability, and serviceability (RAS) conundrum: the connector cost must be low enough to allow hundreds of sockets per system, while the system-level reliability must remain high enough to prevent connector-related memory failures. This paper explores some of the modeling techniques that can guide system-level fault tolerance decisions given the propensity of card-edge connectors to experience corrosion-induced failures, and it explains why understanding the probability density function (PDF) of the connector failure rate is crucial in establishing the system RAS strategy for DIMM connectors. The effects of both a "low" and a "high" contact failure rate are analyzed under two different PDFs, and the resulting system implications are discussed.
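To make the scaling argument concrete, the following is a minimal sketch of how a per-contact failure model might be rolled up to a system-level probability. It is not the paper's model. The abstract does not say which two PDFs are used, so the constant-hazard (exponential) and increasing-hazard (Weibull) forms below are assumptions, as are all function names, the independence of contacts, and the rule that any single contact failure counts as a system-level memory failure.

```python
import math


def contact_survival_exponential(t_hours, fit):
    """Survival of one contact under a constant hazard (exponential PDF).

    `fit` is the assumed contact failure rate in FIT
    (failures per 1e9 device-hours).
    """
    return math.exp(-fit * 1e-9 * t_hours)


def contact_survival_weibull(t_hours, fit, shape, t_ref_hours):
    """Survival of one contact under a Weibull PDF (assumed wear-out model).

    The scale is calibrated so that the cumulative hazard at `t_ref_hours`
    equals that of the exponential model with the same `fit`. With
    shape > 1 the hazard grows over time, so failures cluster late in life.
    """
    h_ref = fit * 1e-9 * t_ref_hours
    return math.exp(-h_ref * (t_hours / t_ref_hours) ** shape)


def system_failure_probability(contact_survival, contacts_per_dimm, sockets):
    """Probability that at least one contact in the system has failed.

    Assumes independent, identical contacts and no fault tolerance.
    """
    n_contacts = contacts_per_dimm * sockets
    return 1.0 - contact_survival ** n_contacts


if __name__ == "__main__":
    # All values below are illustrative assumptions, not data from the paper.
    service_life = 10 * 8760  # ten years, in hours
    contacts, sockets = 240, 256
    for label, fit in (("low", 0.001), ("high", 0.1)):
        for t in (service_life / 2, service_life):
            p_exp = system_failure_probability(
                contact_survival_exponential(t, fit), contacts, sockets)
            p_wei = system_failure_probability(
                contact_survival_weibull(t, fit, 2.0, service_life),
                contacts, sockets)
            print(f"{label} rate, t={t:.0f} h: "
                  f"exponential={p_exp:.4f}, weibull={p_wei:.4f}")
```

Under these assumptions the two PDFs agree at the reference time but differ before it: the Weibull case pushes failures toward the end of service life, which is one way the shape of the PDF, and not only the average failure rate, could change which fault tolerance strategy looks adequate.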