M. Salehi, M. Shafique, F. Kriebel, Semeen Rehman, Mohammad Khavari Tavana, A. Ejlali, J. Henkel
{"title":"dsReliM:工艺变化下暗硅多核芯片的功率约束可靠性管理","authors":"M. Salehi, M. Shafique, F. Kriebel, Semeen Rehman, Mohammad Khavari Tavana, A. Ejlali, J. Henkel","doi":"10.1109/CODESISSS.2015.7331370","DOIUrl":null,"url":null,"abstract":"Due to the tight power envelope, in the future technology nodes it is envisaged that not all cores in a many-core chip can be simultaneously powered-on (at full performance level). The power-gated cores are referred to as Dark Silicon. At the same time, growing reliability issues due to process variations and soft errors challenge the cost-effective deployment of future technology nodes. This paper presents a reliability management system for Dark Silicon chips (dsReliM) that optimizes for reliability of on-chip systems while jointly accounting for soft errors, process variations and the thermal design power (TDP) constraint. Towards the TDP-constrained reliability optimization, dsReliM leverages multiple reliable application versions that can potentially execute on different cores with frequency variations and supporting differenst voltage-frequency levels, thus facilitating distinct power, reliability and performance tradeoffs at run time. Experiments show that our dsReliM system provides up to 20% reliability improvements under different TDP constraints when compared to a state-of-the-art technique. Also, compared to an ideal-case optimal solution, dsReliM deviates up to 2.5% in terms of reliability efficiency, but speeds up the reliability management decision time by a factor of up to 3100.","PeriodicalId":281383,"journal":{"name":"2015 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"20","resultStr":"{\"title\":\"dsReliM: Power-constrained reliability management in Dark-Silicon many-core chips under process variations\",\"authors\":\"M. Salehi, M. Shafique, F. Kriebel, Semeen Rehman, Mohammad Khavari Tavana, A. Ejlali, J. Henkel\",\"doi\":\"10.1109/CODESISSS.2015.7331370\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Due to the tight power envelope, in the future technology nodes it is envisaged that not all cores in a many-core chip can be simultaneously powered-on (at full performance level). The power-gated cores are referred to as Dark Silicon. At the same time, growing reliability issues due to process variations and soft errors challenge the cost-effective deployment of future technology nodes. This paper presents a reliability management system for Dark Silicon chips (dsReliM) that optimizes for reliability of on-chip systems while jointly accounting for soft errors, process variations and the thermal design power (TDP) constraint. Towards the TDP-constrained reliability optimization, dsReliM leverages multiple reliable application versions that can potentially execute on different cores with frequency variations and supporting differenst voltage-frequency levels, thus facilitating distinct power, reliability and performance tradeoffs at run time. Experiments show that our dsReliM system provides up to 20% reliability improvements under different TDP constraints when compared to a state-of-the-art technique. Also, compared to an ideal-case optimal solution, dsReliM deviates up to 2.5% in terms of reliability efficiency, but speeds up the reliability management decision time by a factor of up to 3100.\",\"PeriodicalId\":281383,\"journal\":{\"name\":\"2015 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)\",\"volume\":\"24 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-10-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"20\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2015 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CODESISSS.2015.7331370\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CODESISSS.2015.7331370","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
dsReliM: Power-constrained reliability management in Dark-Silicon many-core chips under process variations
Due to the tight power envelope, in the future technology nodes it is envisaged that not all cores in a many-core chip can be simultaneously powered-on (at full performance level). The power-gated cores are referred to as Dark Silicon. At the same time, growing reliability issues due to process variations and soft errors challenge the cost-effective deployment of future technology nodes. This paper presents a reliability management system for Dark Silicon chips (dsReliM) that optimizes for reliability of on-chip systems while jointly accounting for soft errors, process variations and the thermal design power (TDP) constraint. Towards the TDP-constrained reliability optimization, dsReliM leverages multiple reliable application versions that can potentially execute on different cores with frequency variations and supporting differenst voltage-frequency levels, thus facilitating distinct power, reliability and performance tradeoffs at run time. Experiments show that our dsReliM system provides up to 20% reliability improvements under different TDP constraints when compared to a state-of-the-art technique. Also, compared to an ideal-case optimal solution, dsReliM deviates up to 2.5% in terms of reliability efficiency, but speeds up the reliability management decision time by a factor of up to 3100.