{"title":"Robust prediction of critical temperatures in multi-core chips with limited sensory data","authors":"S. Ankireddi","doi":"10.1109/STHERM.2011.5767203","journal":{"name":"2011 27th Annual IEEE Semiconductor Thermal Measurement and Management Symposium","volume":"150 1","pages":"0"},"publicationDate":"2011-03-20","publicationTypes":"Journal Article","isOpenAccess":false,"citationCount":"1","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.1109/STHERM.2011.5767203"}
Robust prediction of critical temperatures in multi-core chips with limited sensory data
Current generations of high-performance microprocessors feature multiple cores and micro-cores, each supporting multiple threads implemented in hardware. Such designs routinely feature billions of transistors, and chip layout teams are frequently hard pressed to place and route all the functional blocks and sub-blocks that go into the design. An additional complexity arises because system engineers would like each micro-core's temperature to be monitored for silicon reliability and system performance reasons, which translates into a requirement that each core preferably be outfitted with a thermal sensor that is routed out to the external world. Since die real estate is already at a premium and sensor macros can often be large, CPU design teams frequently shy away from placing and routing one sensor per micro-core. The practical implication is that there is no means to monitor how hot any given micro-core is getting during field operation, which can compound risk significantly from the standpoints of silicon reliability (GoX, TDDB), chip electrical performance (timing, clock skew, jitter) and system performance (real-time benchmarks, field performance, data coherency, etc.). In this study, a multi-core processor chip with a wide range of core-to-core power variability is considered. A finite number of sensor locations, which are known to be thermally sub-optimal, are assumed to be available for placement and routing. Using sensory data from these “poor” locations and an offline training algorithm, temperatures of all key core locations are determined on a causal, linear least-squares error basis. The resulting formulation is tested for prediction integrity using a large-sample Monte Carlo analysis, and the temperature predictions are found to be robust. The technique developed is general enough to be applied across any microprocessor product family. 
The study concludes with suggested techniques to maintain prediction robustness in the presence of measurement errors, diode part-to-part variation and other inaccuracies. The approach proposed here can circumvent the limitations on placing and routing multiple diodes in real-estate-constrained multi-core microprocessor and ASIC applications.
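
The abstract does not give the formulation itself, so the following is only a minimal sketch of the general idea: fit a linear least-squares map from a few sensor readings to all core temperatures offline, then check it on a large set of randomly drawn power scenarios. Everything specific here is an assumption made for illustration, not taken from the paper: the toy steady-state thermal model, the matrices `G_core` and `G_sensor`, the power range, the ambient temperature, the noise level, and the function names `fit_predictor`, `predict` and `simulate`. The predictor uses only the current sensor readings, which is one simple way of being causal; the paper's predictor may differ.

```python
import numpy as np


def fit_predictor(sensor_temps, core_temps):
    """Offline training: least-squares fit of core temperatures as an
    affine function of sensor temperatures.

    sensor_temps: (n_samples, n_sensors), core_temps: (n_samples, n_cores).
    Returns W with shape (n_sensors + 1, n_cores); the last row is the intercept.
    """
    A = np.hstack([sensor_temps, np.ones((sensor_temps.shape[0], 1))])
    W, *_ = np.linalg.lstsq(A, core_temps, rcond=None)
    return W


def predict(W, sensor_temps):
    """Apply the trained map to current sensor readings only."""
    A = np.hstack([sensor_temps, np.ones((sensor_temps.shape[0], 1))])
    return A @ W


def simulate(rng, n_samples, G_core, G_sensor, t_ambient=45.0, noise_std=0.0):
    """Toy steady-state model (assumption): temperature rise is linear in
    per-core power. Returns (sensor_temps, core_temps) in degrees C."""
    n_cores = G_core.shape[1]
    # Hypothetical per-core power range in watts, drawn independently per core.
    power = rng.uniform(1.0, 10.0, size=(n_samples, n_cores))
    core = t_ambient + power @ G_core.T
    sensor = t_ambient + power @ G_sensor.T
    if noise_std > 0:
        # Additive Gaussian measurement error on the sensor readings only.
        sensor = sensor + rng.normal(0.0, noise_std, size=sensor.shape)
    return sensor, core


rng = np.random.default_rng(0)
n_cores, n_sensors = 8, 3

# Hypothetical thermal-resistance matrices (K/W): strong self-heating on the
# diagonal, weaker core-to-core and core-to-sensor coupling elsewhere.
G_core = 0.5 * np.eye(n_cores) + 0.2 * rng.random((n_cores, n_cores))
G_sensor = 0.3 * rng.random((n_sensors, n_cores))

# Offline training on noise-free samples.
s_train, c_train = simulate(rng, 2000, G_core, G_sensor)
W = fit_predictor(s_train, c_train)

# Monte Carlo check on fresh power scenarios with noisy sensor readings.
s_mc, c_mc = simulate(rng, 10000, G_core, G_sensor, noise_std=0.5)
err = predict(W, s_mc) - c_mc
rms_error = np.sqrt(np.mean(err ** 2))
max_abs_error = np.max(np.abs(err))
print(f"Monte Carlo RMS error: {rms_error:.2f} C, worst case: {max_abs_error:.2f} C")
```

With fewer sensors than independently varying cores, the fit cannot be exact, so the Monte Carlo error statistics are what indicate whether a given set of sensor locations is adequate. Rerunning the check with different `noise_std` values gives a rough feel for sensitivity to measurement error in this toy setting.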