Pub Date : 2010-03-22DOI: 10.1109/ISQED.2010.5450491
Abhishek A. Sinkar, N. Kim
Power-gating (PG) techniques have been widely used in modern digital ICs to reduce their standby leakage power during idle periods. Meanwhile, virtual supply voltage (VVDD) of a power-gated IC is a function of strength of a PG device and total current flowing through it. Thus, the VVDD level becomes susceptible to 1) negative bias temperature instability (NBTI) degradation that weakens the PG device over time and 2) temporal temperature variation that affects active leakage current (thus total current) of the IC. To account for the NBTI degradation, the PG device must be upsized such that it guarantees a minimum VVDD level that prevents any timing failure over chip lifetime. Moreover, the PG device is also sized for the worst-case voltage drop partly resulted by a large amount of active leakage current at high temperature. However, increasing the size of the PG device to consider both effects leads to higher VVDD (thus active leakage power) than necessary at low temperature and/or in early chip lifetime. To minimize active leakage power increase due to these effects, we propose two techniques that adjust strength of a PG device based on its usage and IC's temperature at runtime. Both techniques are applied to an experimental setup modeling total current consumption of an IC in 32nm technology and their efficacy is demonstrated in the presence of within-die spatial process and temperature variations. On average of 100 die samples, they can reduce active leakage power by up to 10% in early chip lifetime.
{"title":"Analyzing and minimizing effects of temperature variation and NBTI on active leakage power of power-gated circuits","authors":"Abhishek A. Sinkar, N. Kim","doi":"10.1109/ISQED.2010.5450491","DOIUrl":"https://doi.org/10.1109/ISQED.2010.5450491","url":null,"abstract":"Power-gating (PG) techniques have been widely used in modern digital ICs to reduce their standby leakage power during idle periods. Meanwhile, virtual supply voltage (VVDD) of a power-gated IC is a function of strength of a PG device and total current flowing through it. Thus, the VVDD level becomes susceptible to 1) negative bias temperature instability (NBTI) degradation that weakens the PG device over time and 2) temporal temperature variation that affects active leakage current (thus total current) of the IC. To account for the NBTI degradation, the PG device must be upsized such that it guarantees a minimum VVDD level that prevents any timing failure over chip lifetime. Moreover, the PG device is also sized for the worst-case voltage drop partly resulted by a large amount of active leakage current at high temperature. However, increasing the size of the PG device to consider both effects leads to higher VVDD (thus active leakage power) than necessary at low temperature and/or in early chip lifetime. To minimize active leakage power increase due to these effects, we propose two techniques that adjust strength of a PG device based on its usage and IC's temperature at runtime. Both techniques are applied to an experimental setup modeling total current consumption of an IC in 32nm technology and their efficacy is demonstrated in the presence of within-die spatial process and temperature variations. On average of 100 die samples, they can reduce active leakage power by up to 10% in early chip lifetime.","PeriodicalId":369046,"journal":{"name":"2010 11th International Symposium on Quality Electronic Design (ISQED)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115681100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2010-03-22DOI: 10.1109/ISQED.2010.5450506
Yu-Shen Yang, Brian Keng, N. Nicolici, A. Veneris, Sean Safarpour
Silicon debug poses a unique challenge to the engineer because of the limited access to internal signals of the chip. Embedded hardware such as trace buffers helps overcome this challenge by acquiring data in real time. However, trace buffers only provide access to a limited subset of pre-selected signals. In order to effectively debug, it is essential to configure the trace-buffer to trace the relevant signals selected from the pre-defined set. This can be a labor-intensive and time-consuming process. This paper introduces a set of techniques to automate the configuring process for trace buffer-based hardware. First, the proposed approach utilizes UNSAT cores to identify signals that can provide valuable information for localizing the error. Next, it finds alternatives for signals not part of the traceable set so that it can imply the corresponding values. Integrating the proposed techniques with a debugging methodology, experiments show that the methodology can reduce 30% of potential suspects with as low as 8% of registers traced, demonstrating the effectiveness of the proposed procedures.
{"title":"Automated silicon debug data analysis techniques for a hardware data acquisition environment","authors":"Yu-Shen Yang, Brian Keng, N. Nicolici, A. Veneris, Sean Safarpour","doi":"10.1109/ISQED.2010.5450506","DOIUrl":"https://doi.org/10.1109/ISQED.2010.5450506","url":null,"abstract":"Silicon debug poses a unique challenge to the engineer because of the limited access to internal signals of the chip. Embedded hardware such as trace buffers helps overcome this challenge by acquiring data in real time. However, trace buffers only provide access to a limited subset of pre-selected signals. In order to effectively debug, it is essential to configure the trace-buffer to trace the relevant signals selected from the pre-defined set. This can be a labor-intensive and time-consuming process. This paper introduces a set of techniques to automate the configuring process for trace buffer-based hardware. First, the proposed approach utilizes UNSAT cores to identify signals that can provide valuable information for localizing the error. Next, it finds alternatives for signals not part of the traceable set so that it can imply the corresponding values. Integrating the proposed techniques with a debugging methodology, experiments show that the methodology can reduce 30% of potential suspects with as low as 8% of registers traced, demonstrating the effectiveness of the proposed procedures.","PeriodicalId":369046,"journal":{"name":"2010 11th International Symposium on Quality Electronic Design (ISQED)","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124428216","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2010-03-22DOI: 10.1109/ISQED.2010.5450412
V. Nandakumar, D. Newmark, Yaping Zhan, M. Marek-Sadowska
Process variations are of great concern in modern technologies. Early prediction of their effects on the circuit performance and parametric yield is extremely useful. In today's microprocessors, custom designed transistor level macros and memory array macros, like caches, occupy a significant fraction of the total core area. While block-based statistical static timing analysis (SSTA) techniques are fast and can be used for analyzing cell based designs, they cannot be used for transistor level macros. Currently, such macros are either abstracted with statistical timing models which are less accurate or are analyzed using statistical Monte-Carlo circuit simulations which are time consuming. In this paper, we develop a fast and accurate flow that can be used to perform SSTA on large transistor and memory array macros. The delay distributions of paths obtained using our flow for a large, industrial, 45 nm, transistor level macro have error of less than 6% compared to those obtained after rigorous Monte-Carlo SPICE simulations. The resulting flow enables full-chip SSTA, provides visibility into the macro even at the chip level, and eliminates the need to abstract the macros with statistical timing models.
{"title":"Statistical static timing analysis flow for transistor level macros in a microprocessor","authors":"V. Nandakumar, D. Newmark, Yaping Zhan, M. Marek-Sadowska","doi":"10.1109/ISQED.2010.5450412","DOIUrl":"https://doi.org/10.1109/ISQED.2010.5450412","url":null,"abstract":"Process variations are of great concern in modern technologies. Early prediction of their effects on the circuit performance and parametric yield is extremely useful. In today's microprocessors, custom designed transistor level macros and memory array macros, like caches, occupy a significant fraction of the total core area. While block-based statistical static timing analysis (SSTA) techniques are fast and can be used for analyzing cell based designs, they cannot be used for transistor level macros. Currently, such macros are either abstracted with statistical timing models which are less accurate or are analyzed using statistical Monte-Carlo circuit simulations which are time consuming. In this paper, we develop a fast and accurate flow that can be used to perform SSTA on large transistor and memory array macros. The delay distributions of paths obtained using our flow for a large, industrial, 45 nm, transistor level macro have error of less than 6% compared to those obtained after rigorous Monte-Carlo SPICE simulations. The resulting flow enables full-chip SSTA, provides visibility into the macro even at the chip level, and eliminates the need to abstract the macros with statistical timing models.","PeriodicalId":369046,"journal":{"name":"2010 11th International Symposium on Quality Electronic Design (ISQED)","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121919815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2010-03-22DOI: 10.1109/ISQED.2010.5450397
Jawar Singh, Dilip S. Aswar, S. Mohanty, D. Pradhan
Low power, minimum transistor count and fast access static random access memory (SRAM) is essential for embedded multimedia and communication applications realized using system on a chip (SoC) technology. Hence, simultaneous or parallel read/write (R/W) access multi-port SRAM bitcells are widely employed in such embedded systems. In this paper, we present a 2-port 6T SRAM bitcell with multi-port capabilities and a reduced area overhead compared to existing 2-port 7-transistor (7T) and 8T SRAM bitcells. The proposed 2-port bitcell has six transistors (6T) and single-ended read and write bitlines (RBL/WBL). We compare the stability, simultaneous read/write disturbance, SNM sensitivity and misread current from the read bitline with the 7T and 8T bitcells. The static noise margin (SNM) of the 6T bitcells around the write disturbed bitcell is 53% to 61% higher than that of the 7T bitcell. The average active power dissipation under the different read/write operations of the 6T bitcells is 28% lower than the 8T and equal to 7T bitcell. Hence, the proposed 2-port 6T-SRAM is a potential candidate in terms of process variability, stability, area, and power dissipation.
{"title":"A 2-port 6T SRAM bitcell design with multi-port capabilities at reduced area overhead","authors":"Jawar Singh, Dilip S. Aswar, S. Mohanty, D. Pradhan","doi":"10.1109/ISQED.2010.5450397","DOIUrl":"https://doi.org/10.1109/ISQED.2010.5450397","url":null,"abstract":"Low power, minimum transistor count and fast access static random access memory (SRAM) is essential for embedded multimedia and communication applications realized using system on a chip (SoC) technology. Hence, simultaneous or parallel read/write (R/W) access multi-port SRAM bitcells are widely employed in such embedded systems. In this paper, we present a 2-port 6T SRAM bitcell with multi-port capabilities and a reduced area overhead compared to existing 2-port 7-transistor (7T) and 8T SRAM bitcells. The proposed 2-port bitcell has six transistors (6T) and single-ended read and write bitlines (RBL/WBL). We compare the stability, simultaneous read/write disturbance, SNM sensitivity and misread current from the read bitline with the 7T and 8T bitcells. The static noise margin (SNM) of the 6T bitcells around the write disturbed bitcell is 53% to 61% higher than that of the 7T bitcell. The average active power dissipation under the different read/write operations of the 6T bitcells is 28% lower than the 8T and equal to 7T bitcell. Hence, the proposed 2-port 6T-SRAM is a potential candidate in terms of process variability, stability, area, and power dissipation.","PeriodicalId":369046,"journal":{"name":"2010 11th International Symposium on Quality Electronic Design (ISQED)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124061694","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2010-03-22DOI: 10.1109/ISQED.2010.5450514
Q. Ma, Tan Yan, Martin D. F. Wong
The negotiated congestion based routing scheme finds success in FPGA routing and IC global routing. However, its application in simultaneous escape routing, a key problem in PCB design, has never been reported in previous literature. In this paper, we investigate how well the negotiated congestion based router performs on escape routing problems. We propose an underlying routing graph which correctly models the routing resources of the pin grids on board. We then build a Negotiated Congestion based Escape Router (NCER) by applying the negotiated congestion routing scheme on the constructed routing graph. We compare the performance of NCER with that of Cadence PCB router Allegro on 14 industrial test cases, and experimental results show that the two routers have comparable routability: each of them completely routes 7 test cases. Moreover, we observe that NCER and Allegro exhibit complementary behaviors: each is able to solve most of the test cases that the other cannot solve. Together, they completely route 11 test cases. Therefore, by using NCER as a supplement to Allegro, we can solve a broader range of escape routing problems.
{"title":"A negotiated congestion based router for simultaneous escape routing","authors":"Q. Ma, Tan Yan, Martin D. F. Wong","doi":"10.1109/ISQED.2010.5450514","DOIUrl":"https://doi.org/10.1109/ISQED.2010.5450514","url":null,"abstract":"The negotiated congestion based routing scheme finds success in FPGA routing and IC global routing. However, its application in simultaneous escape routing, a key problem in PCB design, has never been reported in previous literature. In this paper, we investigate how well the negotiated congestion based router performs on escape routing problems. We propose an underlying routing graph which correctly models the routing resources of the pin grids on board. We then build a Negotiated Congestion based Escape Router (NCER) by applying the negotiated congestion routing scheme on the constructed routing graph. We compare the performance of NCER with that of Cadence PCB router Allegro on 14 industrial test cases, and experimental results show that the two routers have comparable routability: each of them completely routes 7 test cases. Moreover, we observe that NCER and Allegro exhibit complementary behaviors: each is able to solve most of the test cases that the other cannot solve. Together, they completely route 11 test cases. Therefore, by using NCER as a supplement to Allegro, we can solve a broader range of escape routing problems.","PeriodicalId":369046,"journal":{"name":"2010 11th International Symposium on Quality Electronic Design (ISQED)","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124635065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2010-03-22DOI: 10.1109/ISQED.2010.5450478
Dongkeun Oh, C. C. Chen, N. Kim, Y. Hu
Integrating multiple cores into a processor increases the heat density significantly, which often constrains the maximum performance of such a processor. There have been many techniques using dynamic voltage and frequency scaling (DVFS) and thread migration to manipulate heat dissipation in thermally-constrained multi-core processors. However, most of them were analyzed and applied individually for optimizing the performance of the multi-core processors while their computational cost for the optimization was not studied well. In this paper, we argue that a coherent organization of two techniques can maximize the performance of the multi-core processors with the least performance overheads associated with the thermal management techniques. Furthermore, we also propose an efficient method to optimize the performance of thermal-constrained multi-core processors. According to our experiment, we achieved 5% throughput improvement with negligible computation cost.
{"title":"The compatibility analysis of thread migration and DVFS in multi-core processor","authors":"Dongkeun Oh, C. C. Chen, N. Kim, Y. Hu","doi":"10.1109/ISQED.2010.5450478","DOIUrl":"https://doi.org/10.1109/ISQED.2010.5450478","url":null,"abstract":"Integrating multiple cores into a processor increases the heat density significantly, which often constrains the maximum performance of such a processor. There have been many techniques using dynamic voltage and frequency scaling (DVFS) and thread migration to manipulate heat dissipation in thermally-constrained multi-core processors. However, most of them were analyzed and applied individually for optimizing the performance of the multi-core processors while their computational cost for the optimization was not studied well. In this paper, we argue that a coherent organization of two techniques can maximize the performance of the multi-core processors with the least performance overheads associated with the thermal management techniques. Furthermore, we also propose an efficient method to optimize the performance of thermal-constrained multi-core processors. According to our experiment, we achieved 5% throughput improvement with negligible computation cost.","PeriodicalId":369046,"journal":{"name":"2010 11th International Symposium on Quality Electronic Design (ISQED)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129721349","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2010-03-22DOI: 10.1109/ISQED.2010.5450451
R. Ubar, Dmitri Mironov, J. Raik, A. Jutman
The paper presents a new structural fault-independent fault collapsing method based on the topology analysis of the circuit, which has linear complexity. The minimal necessary set of faults as the target objective for test generation is found. The main idea is to produce fault collapsing concurrently with the construction of structurally synthesized binary decision diagrams (SSBDD) used for test generation, as a side effect. To improve the fault collapsing, a new class of BDDs in a form of SSBDDs with multiple inputs (SSMIBDD) is proposed, which allows a significant reduction of the model complexity for test generation purposes, and produces collapsed fault sets with less sizes than the SSBDDs provide. Experimental data show that the fault collapsing by the proposed method is considerably more efficient than other structural fault collapsing methods with comparative time cost. The method is especially efficient for circuits with high rate of internal fanouts.
{"title":"Structural fault collapsing by superposition of BDDs for test generation in digital circuits","authors":"R. Ubar, Dmitri Mironov, J. Raik, A. Jutman","doi":"10.1109/ISQED.2010.5450451","DOIUrl":"https://doi.org/10.1109/ISQED.2010.5450451","url":null,"abstract":"The paper presents a new structural fault-independent fault collapsing method based on the topology analysis of the circuit, which has linear complexity. The minimal necessary set of faults as the target objective for test generation is found. The main idea is to produce fault collapsing concurrently with the construction of structurally synthesized binary decision diagrams (SSBDD) used for test generation, as a side effect. To improve the fault collapsing, a new class of BDDs in a form of SSBDDs with multiple inputs (SSMIBDD) is proposed, which allows a significant reduction of the model complexity for test generation purposes, and produces collapsed fault sets with less sizes than the SSBDDs provide. Experimental data show that the fault collapsing by the proposed method is considerably more efficient than other structural fault collapsing methods with comparative time cost. The method is especially efficient for circuits with high rate of internal fanouts.","PeriodicalId":369046,"journal":{"name":"2010 11th International Symposium on Quality Electronic Design (ISQED)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129589672","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2010-03-22DOI: 10.1109/ISQED.2010.5450546
L. Józwiak, Y. Jan
This paper focuses on mastering the architecture development of hardware accelerators for demanding applications. It presents the results of our analysis of the main problems that have to be solved when designing accelerators for modern demanding applications, and illustrates the problems with an example of accelerator design for LDPC code decoders for the newest communication system standards. Based on the results of our analysis, we formulate the main requirements that have to be satisfied by an adequate methodology for demanding accelerator design, and propose an architecture design methodology which satisfies the requirements.
{"title":"Quality-driven methodology for demanding accelerator design","authors":"L. Józwiak, Y. Jan","doi":"10.1109/ISQED.2010.5450546","DOIUrl":"https://doi.org/10.1109/ISQED.2010.5450546","url":null,"abstract":"This paper focuses on mastering the architecture development of hardware accelerators for demanding applications. It presents the results of our analysis of the main problems that have to be solved when designing accelerators for modern demanding applications, and illustrates the problems with an example of accelerator design for LDPC code decoders for the newest communication system standards. Based on the results of our analysis, we formulate the main requirements that have to be satisfied by an adequate methodology for demanding accelerator design, and propose an architecture design methodology which satisfies the requirements.","PeriodicalId":369046,"journal":{"name":"2010 11th International Symposium on Quality Electronic Design (ISQED)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127340919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2010-03-22DOI: 10.1109/ISQED.2010.5450433
D. Drmanac, B. Bolin, Li-C. Wang
This work proposes a non-parametric methodology for quick and effective behavioral macromodeling of complex digital and analog devices. Gaussian Process Regression (GPR) learning algorithms are used to generate simple, robust, and widely applicable time-domain models without specifying device equations or parameters. SPICE simulations expose device dynamics to train behavioral models while exhaustive validation ensures accurate and efficient models are generated. Average speedups of 97X are observed over SPICE simulation maintaining accurate outputs within 95% confidence intervals.
{"title":"A non-parametric approach to behavioral device modeling","authors":"D. Drmanac, B. Bolin, Li-C. Wang","doi":"10.1109/ISQED.2010.5450433","DOIUrl":"https://doi.org/10.1109/ISQED.2010.5450433","url":null,"abstract":"This work proposes a non-parametric methodology for quick and effective behavioral macromodeling of complex digital and analog devices. Gaussian Process Regression (GPR) learning algorithms are used to generate simple, robust, and widely applicable time-domain models without specifying device equations or parameters. SPICE simulations expose device dynamics to train behavioral models while exhaustive validation ensures accurate and efficient models are generated. Average speedups of 97X are observed over SPICE simulation maintaining accurate outputs within 95% confidence intervals.","PeriodicalId":369046,"journal":{"name":"2010 11th International Symposium on Quality Electronic Design (ISQED)","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129948974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2010-03-22DOI: 10.1109/ISQED.2010.5450550
M. Ghasemazar, E. Pakbaznia, Massoud Pedram
In a multi-core system, power and performance may be dynamically traded off by utilizing power management (PM). This paper addresses the problem of minimizing the total power consumption of a Chip Multiprocessor (CMP) while maintaining a target average throughput. The proposed solution relies on a hierarchical framework, which employs core consolidation, coarse-grain dynamic voltage and frequency scaling (DVFS), and task assignment at the CMP level and fine-grain DVFS based on closed-loop feedback control at the individual core level. Our experimental results are very favorable showing noticeable average power saving compared to a baseline technique, and demonstrate the high efficacy of the proposed hierarchical PM framework.
{"title":"Minimizing the power consumption of a Chip Multiprocessor under an average throughput constraint","authors":"M. Ghasemazar, E. Pakbaznia, Massoud Pedram","doi":"10.1109/ISQED.2010.5450550","DOIUrl":"https://doi.org/10.1109/ISQED.2010.5450550","url":null,"abstract":"In a multi-core system, power and performance may be dynamically traded off by utilizing power management (PM). This paper addresses the problem of minimizing the total power consumption of a Chip Multiprocessor (CMP) while maintaining a target average throughput. The proposed solution relies on a hierarchical framework, which employs core consolidation, coarse-grain dynamic voltage and frequency scaling (DVFS), and task assignment at the CMP level and fine-grain DVFS based on closed-loop feedback control at the individual core level. Our experimental results are very favorable showing noticeable average power saving compared to a baseline technique, and demonstrate the high efficacy of the proposed hierarchical PM framework.","PeriodicalId":369046,"journal":{"name":"2010 11th International Symposium on Quality Electronic Design (ISQED)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131044998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}