On the selection of adder unit in energy efficient vector processing
Pub Date: 2013-03-04 | DOI: 10.1109/ISQED.2013.6523602
Ivan Ratković, Oscar Palomar, Milan Stanic, O. Unsal, A. Cristal, M. Valero
Vector processors are a very promising solution for mobile devices and servers due to their inherently energy-efficient way of exploiting data-level parallelism. Previous research on vector architectures predominantly focused on performance, so vector processors require a new design space exploration to achieve low power. In this paper, we present a design space exploration of the vector adder unit (VA), as it is one of the crucial components of the core design with a non-negligible impact on overall performance and power. For this interrelated circuit-architecture exploration, we developed a novel framework combining architectural- and circuit-level tools. Our framework covers both design parameters (e.g., adder family) and vector-architecture parameters (e.g., vector length). Finally, we present guidelines on selecting the most appropriate VA for different types of vector processors according to different sets of metrics of interest. For example, we found that 2-lane configurations are more EDP (Energy×Delay product)-efficient than single-lane configurations for low-end mobile processors.
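As a rough illustration of the EDP metric used above, the following sketch compares hypothetical adder configurations by energy-delay product; the energy and delay numbers are invented placeholders, not values from the paper.

```python
# Minimal sketch (not the paper's framework): comparing vector-adder
# configurations by energy-delay product (EDP = energy x delay).
# The per-configuration numbers below are hypothetical placeholders.

def edp(energy_nj: float, delay_ns: float) -> float:
    """Energy-delay product; lower is better."""
    return energy_nj * delay_ns

# Hypothetical per-kernel measurements for two configurations.
configs = {
    "1-lane": {"energy_nj": 12.0, "delay_ns": 8.0},
    "2-lane": {"energy_nj": 14.0, "delay_ns": 4.5},
}

best = min(configs, key=lambda c: edp(**configs[c]))
for name, m in configs.items():
    print(f"{name}: EDP = {edp(**m):.1f} nJ*ns")
print("lowest EDP:", best)
```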
{"title":"On the selection of adder unit in energy efficient vector processing","authors":"Ivan Ratković, Oscar Palomar, Milan Stanic, O. Unsal, A. Cristal, M. Valero","doi":"10.1109/ISQED.2013.6523602","DOIUrl":"https://doi.org/10.1109/ISQED.2013.6523602","url":null,"abstract":"Vector processors are a very promising solution for mobile devices and servers due to their inherently energy-efficient way of exploiting data-level parallelism. Previous research on vector architectures predominantly focused on performance, so vector processors require a new design space exploration to achieve low power. In this paper, we present a design space exploration of adder unit for vector processors (VA), as it is one of the crucial components in the core design with a non-negligible impact in overall performance and power. For this interrelated circuit-architecture exploration, we developed a novel framework with both architectural- and circuit-level tools. Our framework includes both design- (e.g. adder's family type) and vector architecture-related parameters (e.g. vector length). Finally, we present guidelines on the selection of the most appropriate VA for different types of vector processors according to different sets of metrics of interest. For example, we found that 2-lane configurations are more EDP (Energy×Delay)-efficient than single lane configurations for low-end mobile processors.","PeriodicalId":127115,"journal":{"name":"International Symposium on Quality Electronic Design (ISQED)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132502680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A system-level solution for managing spatial temperature gradients in thinned 3D ICs
Pub Date: 2013-03-04 | DOI: 10.1109/ISQED.2013.6523595
A. Annamalai, Raghavan Kumar, Arunkumar Vijayakumar, S. Kundu
As conventional CMOS technology approaches its scaling limits, the trend towards stacked 3D integrated circuits (3D ICs) is gaining importance. 3D ICs offer reduced power dissipation, higher integration density, heterogeneous stacking, and reduced interconnect delays. In a 3D IC stack, all but the bottom tier are thinned down to enable through-silicon vias (TSVs). However, thinning the substrate increases the lateral thermal resistance, resulting in higher intra-layer temperature gradients that can lead to performance degradation and even functional errors. In this work, we study the effect of substrate thinning on the temperature profile of the various tiers in 3D ICs. Our simulation results show that the intra-layer temperature gradient can be as high as 57°C. Conventional static solutions often lead to highly inefficient designs. To this end, we present a system-level, situation-aware integrated scheme that performs opportunistic thread migration and dynamic voltage and frequency scaling (DVFS) to effectively manage thermal violations while increasing system throughput relative to stand-alone schemes.
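The sketch below shows, in very simplified form, the kind of integrated policy the abstract describes: prefer migrating a hot thread, and fall back to DVFS otherwise. All thresholds, temperatures, and the decision rule are hypothetical; the paper's actual scheme is not specified here.

```python
# Illustrative sketch only: a simplified control loop in the spirit of the
# described integrated scheme. Thresholds and temperatures are hypothetical.

GRADIENT_LIMIT_C = 20.0   # assumed intra-layer gradient budget
HOTSPOT_LIMIT_C = 85.0    # assumed absolute temperature limit

def manage_tier(core_temps_c, freqs_ghz):
    """Return (migrations, new_freqs) for one thinned tier."""
    hottest = max(range(len(core_temps_c)), key=lambda i: core_temps_c[i])
    coolest = min(range(len(core_temps_c)), key=lambda i: core_temps_c[i])
    migrations, new_freqs = [], list(freqs_ghz)

    gradient = core_temps_c[hottest] - core_temps_c[coolest]
    if gradient > GRADIENT_LIMIT_C:
        # Prefer migration first: move the hot thread to the coolest core.
        migrations.append((hottest, coolest))
    elif core_temps_c[hottest] > HOTSPOT_LIMIT_C:
        # Fall back to DVFS on the offending core.
        new_freqs[hottest] = max(0.8, freqs_ghz[hottest] - 0.4)
    return migrations, new_freqs

print(manage_tier([92.0, 60.0, 71.0, 66.0], [2.0, 2.0, 2.0, 2.0]))
```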
{"title":"A system-level solution for managing spatial temperature gradients in thinned 3D ICs","authors":"A. Annamalai, Raghavan Kumar, Arunkumar Vijayakumar, S. Kundu","doi":"10.1109/ISQED.2013.6523595","DOIUrl":"https://doi.org/10.1109/ISQED.2013.6523595","url":null,"abstract":"As conventional CMOS technology is approaching scaling limits, the shift in trend towards stacked 3D Integrated Circuits (3D IC) is gaining more importance. 3D ICs offer reduced power dissipation, higher integration density, heterogeneous stacking and reduced interconnect delays. In a 3D IC stack, all but the bottom tier are thinned down to enable through-silicon vias (TSV). However, the thinning of the substrate increases the lateral thermal resistance resulting in higher intra-layer temperature gradients potentially leading to performance degradation and even functional errors. In this work, we study the effect of thinning the substrate on temperature profile of various tiers in 3D ICs. Our simulation results show that the intra-layer temperature gradient can be as high as 57°C. Often, the conventional static solutions lead to highly inefficient design. To this end, we present a system-level situation-aware integrated scheme that performs opportunistic thread migration and dynamic voltage and frequency scaling (DVFS) to effectively manage thermal violations while increasing the system throughput relative to stand-alone schemes.","PeriodicalId":127115,"journal":{"name":"International Symposium on Quality Electronic Design (ISQED)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133499365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On the interactions between real-time scheduling and inter-thread cached interferences for multicore processors
Pub Date: 2013-03-04 | DOI: 10.1109/ISQED.2013.6523676
Yiqiang Ding, Wei Zhang
In a multicore platform, inter-thread cache interferences can significantly affect the worst-case execution time (WCET) of each real-time task, which is crucial for schedulability analysis. At the same time, the worst-case cache interferences depend on how tasks are scheduled to run on different cores, creating a circular dependence. In this paper, we present an offline real-time scheduling approach for multicore processors that accounts for the worst-case inter-thread interferences on shared L2 caches. Our scheduling approach uses a greedy heuristic to generate safe schedules while minimizing the worst-case inter-thread shared L2 cache interferences and WCET. The experimental results demonstrate that the proposed approach reduces the utilization of the resulting schedule by about 12% on average compared to cyclic multicore scheduling approaches in our theoretical model.
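A minimal sketch of a greedy, interference-aware assignment in the spirit of the heuristic described above (not the authors' algorithm); the interference model and task data are hypothetical placeholders.

```python
# Hedged sketch: greedily assign tasks to cores while keeping the worst-case
# load, inflated by a toy shared-L2 interference model, low.

def inflated_wcet(task, co_runners):
    # Hypothetical model: each co-runner on another core inflates the task's
    # WCET in proportion to its shared-L2 footprint.
    return task["wcet"] + sum(0.1 * other["l2_lines"] for other in co_runners)

def greedy_schedule(tasks, num_cores):
    """Greedily assign tasks to cores, heaviest first."""
    cores = [[] for _ in range(num_cores)]

    def cost(task, c):
        others = [t for i, core in enumerate(cores) if i != c for t in core]
        load = sum(t["wcet"] for t in cores[c])
        return load + inflated_wcet(task, others)

    for task in sorted(tasks, key=lambda t: t["wcet"], reverse=True):
        best = min(range(num_cores), key=lambda c: cost(task, c))
        cores[best].append(task)
    return cores

tasks = [{"name": "t1", "wcet": 5.0, "l2_lines": 40},
         {"name": "t2", "wcet": 3.0, "l2_lines": 10},
         {"name": "t3", "wcet": 4.0, "l2_lines": 25}]
print(greedy_schedule(tasks, 2))
```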
{"title":"On the interactions between real-time scheduling and inter-thread cached interferences for multicore processors","authors":"Yiqiang Ding, Wei Zhang","doi":"10.1109/ISQED.2013.6523676","DOIUrl":"https://doi.org/10.1109/ISQED.2013.6523676","url":null,"abstract":"In a multicore platform, the inter-thread cache interferences can significantly affect the worst-case execution time (WCET) of each real-time task, which is crucial for schedulability analysis. At the same time, the worst-case cache interferences are dependent on how tasks are scheduled to run on different cores, thus creating a circular dependence. In this paper, we present an offline real-time scheduling approach on multicore processors by considering the worst-case inter-thread interferences on shared L2 caches. Our scheduling approach uses a greedy heuristic to generate safe schedules while minimizing the worst-case inter-thread shared L2 cache interferences and WCET. The experimental results demonstrate that the proposed approach can reduce the utilization of the resulting schedule by about 12% on average compared to the cyclic multicore scheduling approaches in our theoretical model.","PeriodicalId":127115,"journal":{"name":"International Symposium on Quality Electronic Design (ISQED)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133706965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enabling sizing for enhancing the static noise margins
Pub Date: 2013-03-04 | DOI: 10.1109/ISQED.2013.6523623
Valeriu Beiu, A. Beg, W. Ibrahim, F. Kharbash, M. Alioto
This paper proposes a transistor sizing method for classical CMOS gates implemented in advanced technology nodes and operating at low voltages. The method relies on uniformly upsizing the transistor length (L) and balancing the voltage transfer curves (VTCs) to maximize the static noise margins (SNMs). We use the most common CMOS gates (INV, NAND-2, NOR-2) to introduce the sizing method, as well as to validate the concept and evaluate its performance. The results show that sizing has not entirely exhausted its potential, allowing designs to go beyond the well-established delay-power tradeoff, as sizing can increase SNMs by: (i) adjusting the threshold voltages (VTH) and their variations (σVTH); and (ii) balancing the VTCs. Simulation results show that this sizing method enables more reliable (i.e., noise-robust and variation-tolerant) CMOS gates that operate correctly at very low supply voltages, hence enabling ultra-low voltage/power circuits.
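To make the SNM notion concrete, the sketch below extracts noise margins from a sampled VTC using the classical unity-gain-point definition; the VTC here is an idealized analytic curve, not a simulated advanced-node gate.

```python
# Hedged sketch: static noise margins from a sampled VTC via the unity-gain
# points (NM_L = V_IL - V_OL, NM_H = V_OH - V_IH). The curve is a placeholder.
import numpy as np

def noise_margins(vin, vout):
    slope = np.gradient(vout, vin)
    # Unity-gain points: where the VTC slope crosses -1.
    idx = np.where(np.diff(np.sign(slope + 1.0)))[0]
    v_il, v_ih = vin[idx[0]], vin[idx[-1]]
    v_oh, v_ol = vout[idx[0]], vout[idx[-1]]
    return v_il - v_ol, v_oh - v_ih   # (NM_L, NM_H)

vdd = 0.4                                             # low-voltage operation
vin = np.linspace(0.0, vdd, 401)
vout = vdd / (1.0 + np.exp(40.0 * (vin - vdd / 2)))   # idealized inverter VTC
nml, nmh = noise_margins(vin, vout)
print(f"NM_L = {nml*1e3:.1f} mV, NM_H = {nmh*1e3:.1f} mV")
```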
{"title":"Enabling sizing for enhancing the static noise margins","authors":"Valeriu Beiu, A. Beg, W. Ibrahim, F. Kharbash, M. Alioto","doi":"10.1109/ISQED.2013.6523623","DOIUrl":"https://doi.org/10.1109/ISQED.2013.6523623","url":null,"abstract":"This paper suggests a transistor sizing method for classical CMOS gates implemented in advanced technology nodes and operating at low voltages. The method relies on upsizing the length (L) of all transistors uniformly, and balancing the voltage transfer curves (VTCs) for maximizing the static noise margins (SNMs). We use the most well-known CMOS gates (INV, NAND-2, NOR-2) for introducing the novel sizing method, as well as for validating the concept and evaluating its performances. The results show that sizing has not entirely exhausted its potential, allowing to go beyond the well established delay-power tradeoff, as sizing can increase SNMs by: (i) adjusting the threshold voltages (VTH) and their variations (σVTH); and (ii) balancing the VTCs. Simulation results show that this sizing method enables more reliable (i.e., noise-robust and variation-tolerant) CMOS gates, which could operate correctly at very low supply voltages, hence leading to ultra-low voltage/power circuits.","PeriodicalId":127115,"journal":{"name":"International Symposium on Quality Electronic Design (ISQED)","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133825702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fast reliability exploration for embedded processors via high-level fault injection
Pub Date: 2013-03-04 | DOI: 10.1109/ISQED.2013.6523621
Z. Wang, Chao Chen, A. Chattopadhyay
The downscaling of technology features has brought an important design criterion, reliability, into prime consideration for system developers. Due to effects such as external radiation and temperature gradients, CMOS devices are no longer guaranteed to function flawlessly. Allowing some errors to occur can also be beneficial, as it increases the available power budget. The power-reliability trade-off compounds the system design challenge by adding another metric, for which an efficient design exploration framework is needed. In this work, we present a high-level design framework extended with fault injection capability, an important ingredient of reliability-driven design. Compared to traditional HDL-based fault injection, the proposed fault injection during instruction-set simulation is significantly faster without any notable loss of accuracy. The fault injection framework also allows quick exploration of fault prevention measures using both software and hardware techniques. We demonstrate the efficacy of our approach with a case study on a RISC processor customized for a cryptographic application, where fault protection plays a major role. We also benchmark our framework against a state-of-the-art HDL-based fault injection framework.
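The toy sketch below shows the basic mechanism of bit-flip injection during instruction-set simulation; the mini-ISA, fault timing, and register file are hypothetical stand-ins for the paper's processor model.

```python
# Illustrative only: single-bit-flip injection into architectural state during
# a toy instruction-set simulation loop.

class ToyISS:
    def __init__(self):
        self.regs = [0] * 8       # architectural register file
        self.cycle = 0

    def step(self, instr):
        op, dst, a, b = instr
        if op == "add":
            self.regs[dst] = (self.regs[a] + self.regs[b]) & 0xFFFFFFFF
        elif op == "li":
            self.regs[dst] = a
        self.cycle += 1

    def inject_bit_flip(self, reg, bit):
        self.regs[reg] ^= (1 << bit)

def run_with_fault(program, fault_cycle, reg, bit):
    iss = ToyISS()
    for instr in program:
        if iss.cycle == fault_cycle:
            iss.inject_bit_flip(reg, bit)
        iss.step(instr)
    return iss.regs

prog = [("li", 1, 5, 0), ("li", 2, 7, 0), ("add", 3, 1, 2)]
golden = run_with_fault(prog, fault_cycle=-1, reg=0, bit=0)   # no fault
faulty = run_with_fault(prog, fault_cycle=2, reg=1, bit=3)    # flip bit 3 of r1
print("golden r3:", golden[3], "faulty r3:", faulty[3])
```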
{"title":"Fast reliability exploration for embedded processors via high-level fault injection","authors":"Z. Wang, Chao Chen, A. Chattopadhyay","doi":"10.1109/ISQED.2013.6523621","DOIUrl":"https://doi.org/10.1109/ISQED.2013.6523621","url":null,"abstract":"The downscaling of technology features has brought the system developers an important design criteria, reliability, into prime consideration. Due to effects like external radiation and temperature gradients, the CMOS device is not guaranteed anymore to function flawlessly. Admission for errors to occur is also helpful as that increases the power budget. The power-reliability trade-off compounds the system design challenge by adding another metric, for which efficient design exploration framework is needed. In this work, we present a high-level design framework extended with the capability of fault injection, an important ingredient of reliability-driven design. Compared to the traditional HDL-based fault injection, the proposed fault injection during instruction-set simulation is significantly faster without any notable loss of accuracy. The fault injection framework also allows quick exploration of fault prevention measure both by the aid of software and hardware techniques. We demonstrate the efficacy of our approach by a case study with a RISC processor customized for cryptographic application, where fault protection plays a major role. We also benchmark our framework with a state-of-the-art HDL-based fault injection framework.","PeriodicalId":127115,"journal":{"name":"International Symposium on Quality Electronic Design (ISQED)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130107177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Clustering techniques and statistical fault injection for selective mitigation of SEUs in flip-flops
Pub Date: 2013-03-04 | DOI: 10.1109/ISQED.2013.6523691
A. Evans, M. Nicolaidis, Shi-Jie Wen, Thiago Asis
In large SoCs, managing the effects of soft errors in flip-flops is essential; however, selective mitigation is necessary to minimize the area and power costs. Identifying the optimal set of flip-flops to protect typically requires compute-intensive fault-injection campaigns. We present new techniques that group similar flip-flops into clusters to significantly reduce the number of fault injections. The number of required fault injections can be significantly lower than the total number of flip-flops: in one industrial design with over 100,000 flip-flops, by simulating only 2,100 fault injections, the technique identified a set of 4.1% of the flip-flops which, when protected, reduced the critical failure rate by a factor of 7x.
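A minimal sketch of the clustering idea: group flip-flops by simple structural features and inject faults only into one representative per cluster. The features and data are hypothetical; the paper's similarity criteria are richer.

```python
# Hedged sketch: representative-based statistical fault injection.
import random
from collections import defaultdict

def cluster_key(ff):
    # Hypothetical features: enclosing module and a fan-out bucket.
    return (ff["module"], ff["fanout"] // 4)

def representatives(flip_flops):
    clusters = defaultdict(list)
    for ff in flip_flops:
        clusters[cluster_key(ff)].append(ff)
    # One randomly chosen representative per cluster is fault-injected;
    # its observed failure rate is then attributed to the whole cluster.
    return {key: random.choice(members) for key, members in clusters.items()}

flip_flops = [{"name": f"u_ctrl.ff{i}", "module": "ctrl", "fanout": i % 10}
              for i in range(100)]
reps = representatives(flip_flops)
print(f"{len(flip_flops)} flip-flops -> {len(reps)} injection targets")
```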
{"title":"Clustering techniques and statistical fault injection for selective mitigation of SEUs in flip-flops","authors":"A. Evans, M. Nicolaidis, Shi-Jie Wen, Thiago Asis","doi":"10.1109/ISQED.2013.6523691","DOIUrl":"https://doi.org/10.1109/ISQED.2013.6523691","url":null,"abstract":"In large SoCs, managing the effects of soft-errors in flip-flops is essential, however, selective mitigation is necessary to minimize the area and power costs. The identification of the optimal set of flip-flops to protect typically requires compute-intensive fault-injection campaigns. We present new techniques which group similar flip-flops into clusters to significantly reduce the number of fault injections. The number of required fault injections can be significantly lower than the total number of flip-flops and in one industrial design with over 100,000 flip-flops, by simulating only 2,100 fault injections, the technique identified a set of 4.1% of the flip-flops, which when protected, reduced the critical failure rate by a factor of 7x.","PeriodicalId":127115,"journal":{"name":"International Symposium on Quality Electronic Design (ISQED)","volume":"92 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131837088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Application-driven power efficient ALU design methodology for modern microprocessors
Pub Date: 2013-03-04 | DOI: 10.1109/ISQED.2013.6523608
Na Gong, Jinhui Wang, R. Sridhar
In this paper, we propose an application-driven ALU design methodology to achieve a high level of power efficiency in modern microprocessors. We introduce a PN selection algorithm (PNSA) that enables designers to select power-efficient dynamic modules for different applications, based on a detailed analysis of dynamic circuits. Experimental results on ISCAS85 and 74X-series benchmark circuits show that the power consumption of an 8-bit ALU based on this approach can be reduced by 54%-60% across different frequency levels compared to a conventional dynamic ALU design, demonstrating the effectiveness of the proposed method for application-driven custom ALU design.
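The abstract does not detail PNSA, so the sketch below only illustrates the general idea of an application-driven selection between dynamic-logic variants, using a deliberately crude, hypothetical cost model rather than the paper's analysis.

```python
# Hedged sketch (not the paper's PNSA): pick a dynamic-logic variant per ALU
# module from profiled switching statistics. The cost model is a placeholder.

def eval_energy(p_discharge, style):
    # Toy model: an N-type dynamic stage spends energy when its precharged node
    # discharges; a P-type stage is treated as the dual. Illustrative only.
    return p_discharge if style == "N" else 1.0 - p_discharge

def select_styles(profile):
    """profile maps module name -> profiled probability its node discharges."""
    return {m: min(("N", "P"), key=lambda s: eval_energy(p, s))
            for m, p in profile.items()}

app_profile = {"adder": 0.7, "shifter": 0.2, "logic_unit": 0.55}  # hypothetical
print(select_styles(app_profile))
```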
{"title":"Application-driven power efficient ALU design methodology for modern microprocessors","authors":"Na Gong, Jinhui Wang, R. Sridhar","doi":"10.1109/ISQED.2013.6523608","DOIUrl":"https://doi.org/10.1109/ISQED.2013.6523608","url":null,"abstract":"In this paper, we propose an application-driven ALU design methodology to achieve high level of power efficiency for modern microprocessors. We introduce a PN selection algorithm (PNSA) which enables designers to select power efficient dynamic modules for different applications, based on the detailed analysis of dynamic circuits. Experimental results on ISCAS85 and 74X-Series benchmark circuits show that the power consumption of 8-bit ALU based on this approach can be reduced by 54%-60% for different frequency levels as compared to the conventional dynamic ALU design, demonstrating the effectiveness of the proposed method on application-driven custom ALU design.","PeriodicalId":127115,"journal":{"name":"International Symposium on Quality Electronic Design (ISQED)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129658143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Energy-aware coarse-grained reconfigurable architectures using dynamically reconfigurable isolation cells
Pub Date: 2013-03-04 | DOI: 10.1109/ISQED.2013.6523597
Syed M. A. H. Jafri, Ozan Bag, A. Hemani, Nasim Farahini, K. Paul, J. Plosila, H. Tenhunen
This paper presents a self-adaptive architecture to enhance the energy efficiency of coarse-grained reconfigurable architectures (CGRAs). Today, platforms host multiple applications with arbitrary inter-application communication and concurrency patterns. Each application itself can have multiple versions (implementations with different degrees of parallelism), and the optimal version can only be determined at runtime. For such scenarios, traditional worst-case designs and compile-time mapping decisions are neither optimal nor desirable. Existing solutions to this problem employ costly dedicated hardware to configure the operating point at runtime (using DVFS). As an alternative to dedicated hardware, we propose exploiting the reconfiguration features of modern CGRAs. Our solution relies on dynamically reconfigurable isolation cells (DRICs) and an autonomous parallelism, voltage, and frequency selection algorithm (APVFS). The DRICs reduce the overheads of DVFS circuitry by configuring existing resources as isolation cells. APVFS ensures high efficiency by dynamically selecting the parallelism, voltage, and frequency trio that consumes minimum power while meeting the deadlines on the available resources. Simulation results using representative applications (matrix multiplication, FIR, and FFT) showed up to 23% and 51% reductions in power and energy, respectively, compared to traditional DVFS designs. Synthesis results confirmed a significant reduction in area overheads compared to state-of-the-art DVFS methods.
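A minimal sketch of the selection step APVFS is described as performing: choose the (parallelism, voltage, frequency) trio with the lowest estimated power that still meets the deadline on the resources currently available. The power model and operating points below are assumed placeholders, not the paper's.

```python
# Hedged sketch of parallelism/voltage/frequency trio selection.
# Power is approximated as P ~ lanes * V^2 * f with unit capacitance.

def meets_deadline(cycles, parallelism, freq_mhz, deadline_us):
    exec_time_us = cycles / parallelism / freq_mhz   # cycles / (f in MHz) -> us
    return exec_time_us <= deadline_us

def select_trio(cycles, deadline_us, free_cells, operating_points):
    feasible = [(p, v, f) for (p, v, f) in operating_points
                if p <= free_cells and meets_deadline(cycles, p, f, deadline_us)]
    return min(feasible, key=lambda pvf: pvf[0] * pvf[1] ** 2 * pvf[2],
               default=None)

points = [(1, 1.1, 400), (2, 0.9, 250), (4, 0.8, 150)]   # (lanes, V, MHz), assumed
print(select_trio(cycles=200_000, deadline_us=900.0, free_cells=4,
                  operating_points=points))
```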
{"title":"Energy-aware coarse-grained reconfigurable architectures using dynamically reconfigurable isolation cells","authors":"Syed M. A. H. Jafri, Ozan Bag, A. Hemani, Nasim Farahini, K. Paul, J. Plosila, H. Tenhunen","doi":"10.1109/ISQED.2013.6523597","DOIUrl":"https://doi.org/10.1109/ISQED.2013.6523597","url":null,"abstract":"This paper presents a self adaptive architecture to enhance the energy efficiency of coarse-grained reconfigurable architectures (CGRAs). Today, platforms host multiple applications, with arbitrary inter-application communication and concurrency patterns. Each application itself can have multiple versions (implementations with different degree of parallelism) and the optimal version can only be determined at runtime. For such scenarios, traditional worst case designs and compile time mapping decisions are neither optimal nor desirable. Existing solutions to this problem employ costly dedicated hardware to configure the operating point at runtime (using DVFS). As an alternative to dedicated hardware, we propose exploiting the reconfiguration features of modern CGRAs. Our solution relies on dynamically reconfigurable isolation cells (DRICs) and autonomous parallelism, voltage, and frequency selection algorithm (APVFS). The DRICs reduce the overheads of DVFS circuitry by configuring the existing resources as isolation cells. APVFS ensures high efficiency by dynamically selecting the parallelism, voltage and frequency trio, which consumes minimum power to meet the deadlines on available resources. Simulation results using representative applications (Matrix multiplication, FIR, and FFT) showed up to 23% and 51% reduction in power and energy, respectively, compared to traditional DVFS designs. Synthesis results have confirmed significant reduction in area overheads compared to state of the art DVFS methods.","PeriodicalId":127115,"journal":{"name":"International Symposium on Quality Electronic Design (ISQED)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117310185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CMOS inverter delay model based on DC transfer curve for slow input
Pub Date: 2013-03-04 | DOI: 10.1109/ISQED.2013.6523679
F. Marranghello, A. Reis, R. Ribas
This work presents a novel approach to estimating CMOS inverter delay. The proposed delay model uses the DC transfer curve to predict the inverter behavior for slow input transitions, rather than estimating the discharging time. Moreover, the only required empirical parameters are those used to calibrate the transistor model. Results are in very good agreement with HSPICE simulations based on the BSIM4 transistor model over a wide range of input slopes and output loads. Comparisons with previous works show that the new delay model offers improved modeling with a good trade-off between simplicity and accuracy. The average error is close to 3%, and the worst-case error is below 10%.
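The quasi-static intuition behind such a model can be illustrated numerically: for a sufficiently slow input ramp, the output approximately tracks the static VTC, so a delay estimate can be read off that mapping. The curve and numbers below are idealized placeholders, not the paper's BSIM4-calibrated model.

```python
# Hedged numerical sketch: vout(t) ~ VTC(vin(t)) for a slow ramp, delay taken
# between the 50% crossings of input and output.
import numpy as np

vdd, t_ramp = 1.0, 10e-9                   # supply and (slow) input rise time
t = np.linspace(0.0, t_ramp, 2001)
vin = vdd * t / t_ramp                     # rising input ramp

def vtc(v):                                # idealized, slightly skewed VTC
    return vdd / (1.0 + np.exp(25.0 * (v - 0.55 * vdd)))

vout = vtc(vin)                            # quasi-static output approximation

t_in_50 = np.interp(0.5 * vdd, vin, t)
# vout is falling, so flip it for np.interp (which needs increasing x).
t_out_50 = np.interp(0.5 * vdd, vout[::-1], t[::-1])
print(f"estimated high-to-low delay: {(t_out_50 - t_in_50) * 1e12:.1f} ps")
```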
{"title":"CMOS inverter delay model based on DC transfer curve for slow input","authors":"F. Marranghello, A. Reis, R. Ribas","doi":"10.1109/ISQED.2013.6523679","DOIUrl":"https://doi.org/10.1109/ISQED.2013.6523679","url":null,"abstract":"This work presents a novel approach to estimate the CMOS inverter delay. The proposed delay model uses the DC transfer curve in order to predict the inverter behavior for slow input transitions rather than estimating the discharging time. Moreover, the only required empirical parameters are those used to calibrate the transistor model. Results are on very good agreement with HSPICE simulations based on BSIM4 transistor model, over a wide range of input slopes and output loads. Comparisons to previously works show that such new delay model offers improved modeling with good trade-off between simplicity and accuracy. The average error is near to 3%, and the worst case error is smaller than 10%.","PeriodicalId":127115,"journal":{"name":"International Symposium on Quality Electronic Design (ISQED)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134099282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Effective thermal control techniques for liquid-cooled 3D multi-core processors
Pub Date: 2013-03-04 | DOI: 10.1109/ISQED.2013.6523583
Yue Hu, Shaoming Chen, Lu Peng, Edward Song, Jin-Woo Choi
Microchannel liquid cooling shows great potential for cooling 3D processors. However, the cooling of 3D processors is limited by design-time and run-time challenges. Moreover, with new technologies, processor power density is continually increasing, which will pose even more serious challenges to liquid cooling. In this paper, we propose two thermal control techniques: 1) a Core Vertically Placed (CVP) technique: according to the architecture of a processor core, two schemes are given for placing a core vertically across multiple layers. A 3D processor with the CVP technique can be cooled better since its separated hotspot blocks have a larger total contact area with the cooler surroundings. 2) A thermoelectric cooling (TEC) technique: we propose incorporating TEC into the liquid-cooled 3D processor to enhance the cooling of hotspots. Our experiments show that the CVP technique reduces the maximum temperature by up to 29.58°C, and by 16.64°C on average, compared with the baseline design. Moreover, the TEC technique effectively cools down a hotspot from 96.86°C to 78.60°C.
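A back-of-the-envelope sketch of why the larger contact area obtained by CVP helps: with a lumped thermal resistance inversely proportional to contact area, spreading a block's power lowers its steady-state temperature rise. The heat-transfer coefficient, areas, and power below are hypothetical, not the paper's measurements.

```python
# Hedged sketch (not the paper's thermal model): lumped R_th ~ 1/(h * A).

def hotspot_temp(power_w, area_mm2, t_coolant_c=40.0, h_w_per_mm2_k=0.02):
    r_th = 1.0 / (h_w_per_mm2_k * area_mm2)     # thermal resistance in K/W
    return t_coolant_c + power_w * r_th

single_layer = hotspot_temp(power_w=3.0, area_mm2=2.0)   # block on one layer
vertical = hotspot_temp(power_w=3.0, area_mm2=5.0)       # same block spread by CVP
print(f"single-layer hotspot: {single_layer:.1f} C, CVP hotspot: {vertical:.1f} C")
```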
{"title":"Effective thermal control techniques for liquid-cooled 3D multi-core processors","authors":"Yue Hu, Shaoming Chen, Lu Peng, Edward Song, Jin-Woo Choi","doi":"10.1109/ISQED.2013.6523583","DOIUrl":"https://doi.org/10.1109/ISQED.2013.6523583","url":null,"abstract":"Microchannel liquid cooling shows great potential in cooling 3D processors. However, the cooling of 3D processors is limited due to design-time and run-time challenges. Moreover, in new technologies, the processor power density is continually increasing and this will bring more serious challenges to liquid cooling. In this paper, we propose two thermal control techniques: 1) Core Vertically Placed (CVP) technique. According to the architecture of a processor core, two schemes are given for placing a core vertically onto multilayers. The 3D processor with the CVP technique can be better cooled since its separate hotspot blocks have a larger total contact area with the cooler surroundings. 2) Thermoelectric cooling (TEC) technique. We propose to incorporate the TEC technique into the liquid-cooled 3D processor to enhance the cooling of hotspots. Our experiments show the CVP technique reduces the maximum temperature up to 29.58 °C, and 16.64 °C on average compared with the baseline design. Moreover, the TEC technique effectively cools down a hotspot from 96.86 °C to 78.60 °C.","PeriodicalId":127115,"journal":{"name":"International Symposium on Quality Electronic Design (ISQED)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114146658","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}