Effectiveness of machine checks for error diagnostics
Nikhil Pandit, Z. Kalbarczyk, R. Iyer
2009 IEEE/IFIP International Conference on Dependable Systems & Networks, 2009-09-29
DOI: 10.1109/DSN.2009.5270290 (https://doi.org/10.1109/DSN.2009.5270290)
Machine Check Architecture (MCA) is a processor-internal subsystem that detects and logs correctable and uncorrectable errors in the data or control paths of each CPU core and the Northbridge. These errors include parity errors associated with caches and TLBs, ECC errors associated with caches and DRAM, and system bus errors. This paper reports an experimental study that (i) monitors a computing cluster for machine checks and uses the collected data to identify patterns that can be employed for error diagnostics, and (ii) introduces faults into the machine to understand the resulting machine checks and correlate them with relevant performance metrics.
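To make the monitoring step concrete, the following is a minimal sketch of how logged machine checks might be classified and counted per node and bank. It is not the authors' tooling: the `MachineCheck` record layout, the node names, and the sample status values are hypothetical, and only the flag positions (VAL at bit 63, UC at bit 61, EN at bit 60) follow the architectural MCi_STATUS register layout.

```python
from collections import Counter
from typing import Iterable, NamedTuple

# Architectural flag positions in an MCi_STATUS register value.
VAL = 63  # the register holds a valid error record
UC = 61   # the error was not corrected by hardware
EN = 60   # reporting of this error was enabled


class MachineCheck(NamedTuple):
    """Hypothetical log record: one machine check seen on one cluster node."""
    node: str
    bank: int
    status: int


def bit(value: int, pos: int) -> bool:
    """Return True if bit `pos` of `value` is set."""
    return bool((value >> pos) & 1)


def classify(status: int) -> str:
    """Label a status value as 'invalid', 'corrected', or 'uncorrected'."""
    if not bit(status, VAL):
        return "invalid"
    return "uncorrected" if bit(status, UC) else "corrected"


def summarize(records: Iterable[MachineCheck]) -> Counter:
    """Count valid machine checks per (node, bank, kind)."""
    counts: Counter = Counter()
    for rec in records:
        kind = classify(rec.status)
        if kind != "invalid":
            counts[(rec.node, rec.bank, kind)] += 1
    return counts


# Made-up sample data, for illustration only.
sample = [
    MachineCheck("node07", 4, (1 << VAL) | (1 << EN)),
    MachineCheck("node07", 4, (1 << VAL) | (1 << EN)),
    MachineCheck("node12", 0, (1 << VAL) | (1 << UC) | (1 << EN)),
    MachineCheck("node12", 2, 0),  # VAL clear, so the record is ignored
]
summary = summarize(sample)
```

A node and bank pair that accumulates many corrected errors in such a summary is the kind of recurring pattern the study looks for; deciding what the pattern means still requires the bank-specific error codes, which this sketch does not decode.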