Michael D. Powell, Arijit Biswas, S. Gupta, Shubhendu S. Mukherjee
{"title":"多核处理器中的体系结构核心回收,以实现硬错误容错性","authors":"Michael D. Powell, Arijit Biswas, S. Gupta, Shubhendu S. Mukherjee","doi":"10.1145/1555754.1555769","DOIUrl":null,"url":null,"abstract":"The incidence of hard errors in CPUs is a challenge for future multicore designs due to increasing total core area. Even if the location and nature of hard errors are known a priori, either at manufacture-time or in the field, cores with such errors must be disabled in the absence of hard-error tolerance. While caches, with their regular and repetitive structures, are easily covered against hard errors by providing spare arrays or spare lines, structures within a core are neither as regular nor as repetitive. Previous work has proposed microarchitectural core salvaging to exploit structural redundancy within a core and maintain functionality in the presence of hard errors. Unfortunately microarchitectural salvaging introduces complexity and may provide only limited coverage of core area against hard errors due to a lack of natural redundancy in the core.\n This paper makes a case for architectural core salvaging. We observe that even if some individual cores cannot execute certain operations, a CPU die can be instruction-set-architecture (ISA) compliant, that is execute all of the instructions required by its ISA, by exploiting natural cross-core redundancy. We propose using hardware to migrate offending threads to another core that can execute the operation. Architectural core salvaging can cover a large core area against faults, and be implemented by leveraging known techniques that minimize changes to the microarchitecture. We show it is possible to optimize architectural core salvaging such that the performance on a faulty die approaches that of a fault-free die--assuring significantly better performance than core disabling for many workloads and no worse performance than core disabling for the remainder.","PeriodicalId":91388,"journal":{"name":"Proceedings. International Symposium on Computer Architecture","volume":"126 1","pages":"93-104"},"PeriodicalIF":0.0000,"publicationDate":"2009-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"137","resultStr":"{\"title\":\"Architectural core salvaging in a multi-core processor for hard-error tolerance\",\"authors\":\"Michael D. Powell, Arijit Biswas, S. Gupta, Shubhendu S. Mukherjee\",\"doi\":\"10.1145/1555754.1555769\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The incidence of hard errors in CPUs is a challenge for future multicore designs due to increasing total core area. Even if the location and nature of hard errors are known a priori, either at manufacture-time or in the field, cores with such errors must be disabled in the absence of hard-error tolerance. While caches, with their regular and repetitive structures, are easily covered against hard errors by providing spare arrays or spare lines, structures within a core are neither as regular nor as repetitive. Previous work has proposed microarchitectural core salvaging to exploit structural redundancy within a core and maintain functionality in the presence of hard errors. Unfortunately microarchitectural salvaging introduces complexity and may provide only limited coverage of core area against hard errors due to a lack of natural redundancy in the core.\\n This paper makes a case for architectural core salvaging. We observe that even if some individual cores cannot execute certain operations, a CPU die can be instruction-set-architecture (ISA) compliant, that is execute all of the instructions required by its ISA, by exploiting natural cross-core redundancy. We propose using hardware to migrate offending threads to another core that can execute the operation. Architectural core salvaging can cover a large core area against faults, and be implemented by leveraging known techniques that minimize changes to the microarchitecture. We show it is possible to optimize architectural core salvaging such that the performance on a faulty die approaches that of a fault-free die--assuring significantly better performance than core disabling for many workloads and no worse performance than core disabling for the remainder.\",\"PeriodicalId\":91388,\"journal\":{\"name\":\"Proceedings. International Symposium on Computer Architecture\",\"volume\":\"126 1\",\"pages\":\"93-104\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2009-06-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"137\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings. International Symposium on Computer Architecture\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/1555754.1555769\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. International Symposium on Computer Architecture","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1555754.1555769","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Architectural core salvaging in a multi-core processor for hard-error tolerance
The incidence of hard errors in CPUs is a challenge for future multicore designs due to increasing total core area. Even if the location and nature of hard errors are known a priori, either at manufacture-time or in the field, cores with such errors must be disabled in the absence of hard-error tolerance. While caches, with their regular and repetitive structures, are easily covered against hard errors by providing spare arrays or spare lines, structures within a core are neither as regular nor as repetitive. Previous work has proposed microarchitectural core salvaging to exploit structural redundancy within a core and maintain functionality in the presence of hard errors. Unfortunately microarchitectural salvaging introduces complexity and may provide only limited coverage of core area against hard errors due to a lack of natural redundancy in the core.
This paper makes a case for architectural core salvaging. We observe that even if some individual cores cannot execute certain operations, a CPU die can be instruction-set-architecture (ISA) compliant, that is execute all of the instructions required by its ISA, by exploiting natural cross-core redundancy. We propose using hardware to migrate offending threads to another core that can execute the operation. Architectural core salvaging can cover a large core area against faults, and be implemented by leveraging known techniques that minimize changes to the microarchitecture. We show it is possible to optimize architectural core salvaging such that the performance on a faulty die approaches that of a fault-free die--assuring significantly better performance than core disabling for many workloads and no worse performance than core disabling for the remainder.