Pub Date : 2011-10-09 DOI: 10.1109/ICCD.2011.6081370
A. More, B. Taskin
This work studies the feasibility of dynamically reconfiguring the network layer of a hybrid wireless network-on-chip (NoC) that uses on-chip antennas for the wireless network layer and metal interconnects for the wired network layer. The reconfigurability of the NoC is analyzed using circuit co-simulation together with a 3D finite element method (FEM) based full-wave electromagnetic analysis of the antennas. The die and the circuits are modeled according to a typical complementary metal oxide semiconductor (CMOS) technology. It is shown that it is possible to have 1) at least two different frequency domains for the signal sources and 2) dynamic switching of the signal sinks between the two frequency domains, with minimal design and area overhead. When implemented, the proposed reconfigurable hybrid network architecture can reduce latency and increase network throughput.
Title: EM and circuit co-simulation of a reconfigurable hybrid wireless NoC on 2D ICs. Venue: 2011 IEEE 29th International Conference on Computer Design (ICCD).
Pub Date : 2011-10-09 DOI: 10.1109/ICCD.2011.6081439
V. G. Rao, H. Mahmoodi
The effect of aging has become an important reliability concern in modern CMOS technology. NBTI and PBTI are known to increase the threshold voltage of PMOS and NMOS transistors, respectively. This paper studies the effect of NBTI and PBTI on different flip-flop circuits in terms of key parameters such as setup time, hold time, clock-to-output delay, and data-to-output delay. Results in a predictive 32 nm technology show an increase of 0.43 to 1.23 picoseconds in data-to-output delay, depending on the flip-flop type. Moreover, we propose a dual threshold voltage (dual-Vth) assignment method to mitigate the effect of transistor aging on pulse-triggered flip-flops. With the proposed method, the results show lower delay as well as a 30% reduction in delay aging.
Title: Analysis of reliability of flip-flops under transistor aging effects in nano-scale CMOS technology. Venue: 2011 IEEE 29th International Conference on Computer Design (ICCD).
Pub Date : 2011-10-09 DOI: 10.1109/ICCD.2011.6081426
M. Arjomand, A. Jadidi, Ali Shafiee, H. Sarbazi-Azad
Phase Change Memory (PCM) is emerging as a high-density and power-efficient choice for future main memory systems. While PCM cell size is approaching the minimum achievable feature size, recent prototypes improve device scalability by storing multiple bits per cell. Unfortunately, Multi-Level Cell (MLC) PCM devices have higher access time and energy than their Single-Level Cell (SLC) counterparts, making it difficult to incorporate MLC in main memory. To address this challenge, we propose Zero-value-based Morphable PCM (ZM-PCM for short), a novel MLC-PCM main memory architecture that aims to combine the benefits of both MLC and SLC devices within the same structure. ZM-PCM relies on the observation that zero values at various granularities occur frequently in main memory transactions when running PARSEC-2 programs. Motivated by this observation, ZM-PCM encodes redundant zero MLC cells into a limited number of bits that can be stored in SLC form (or, alternatively, in devices with fewer bits per cell), improving latency, energy, and lifetime with no reduction in available main memory capacity. We evaluate the microarchitecture design of the morphable PCM cell, the coding and decoding algorithms, and the details of the related circuits. We also introduce a simple area-efficient caching mechanism for fast, cost-efficient access to coding metadata. Our evaluation on a quad-core CMP with 4GB 8-bit MLC PCM main memory shows that ZM-PCM morphs up to 93% (50% on average) of all memory cells to lower densities, which directly translates into performance, power, and lifetime improvements.
Title: A morphable phase change memory architecture considering frequent zero values. Venue: 2011 IEEE 29th International Conference on Computer Design (ICCD).
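The observation behind ZM-PCM, that zero values are frequent at several granularities, can be illustrated with a small measurement sketch. This is not the paper's coding scheme: the function name `zero_fraction` and the 16-byte example line are assumptions made purely for illustration.

```python
def zero_fraction(line: bytes, granularity: int) -> float:
    """Fraction of aligned `granularity`-byte chunks in `line` that are all zero."""
    if granularity <= 0 or len(line) == 0 or len(line) % granularity != 0:
        raise ValueError("line length must be a positive multiple of granularity")
    # Split the line into aligned chunks of the requested size.
    chunks = [line[i:i + granularity] for i in range(0, len(line), granularity)]
    # A chunk counts as zero only if every byte in it is zero.
    zero_chunks = sum(1 for chunk in chunks if not any(chunk))
    return zero_chunks / len(chunks)


# Hypothetical 16-byte memory line with a single non-zero byte at offset 8.
example_line = bytes([0] * 8 + [1] + [0] * 7)

for g in (1, 4, 8):
    print(g, zero_fraction(example_line, g))
```

For this example line the zero fraction drops from 0.9375 at byte granularity to 0.75 at 4 bytes and 0.5 at 8 bytes, which shows why the granularity at which zeros are detected matters.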
Pub Date : 2011-10-09 DOI: 10.1109/ICCD.2011.6081420
Shrikanth Ganapathy, R. Canal, Antonio González, A. Rubio
In this paper, we propose a dynamically tunable fine-grain body biasing mechanism to reduce standby leakage power in first-level data caches under process variations. Accessed physical arrays are forward body biased (FBB) to improve latency, while idle (unaccessed) arrays are reverse body biased (RBB) to reduce standby leakage power. The bias voltage to be applied is computed at design time and updated at run time to counter the negative effects of process variations. This ensures that, under all scenarios, the cache consumes the lowest leakage power for the target access latency computed at design time. A sensor-like hardware mechanism measures the variation in latency and leakage at run time, and this measurement is used to update the bias voltage. The backbone of the measurement hardware is a three-transistor one-diode (3T1D) DRAM cell embedded into a regular cache array. By measuring the access and retention time of the 3T1D cell, we show that it is possible to classify cache arrays based on run-time latency/leakage profiles. Our technique reduces the leakage energy consumption and access latency of the cache by 20% and 18% on average, respectively. Finally, we show that our technique improves parametric yield by a maximum of 38% in the worst-case scenario.
Title: Dynamic fine-grain body biasing of caches with latency and leakage 3T1D-based monitors. Venue: 2011 IEEE 29th International Conference on Computer Design (ICCD).
Pub Date : 2011-10-09 DOI: 10.1109/ICCD.2011.6081446
Kai Du, P. Varman, K. Mohanram
Speculative adders have attracted strong interest for achieving sublogarithmic delays by exploiting the tradeoffs between correctness and performance. Speculative adders also find use in the design of error-free variable-latency adders, which combine speculation with error correction to achieve high performance at low area overhead over traditional adders. This paper describes static window addition (SWA), a novel function speculation technique for the design of low-overhead, high-performance variable-latency adders. Analytical models for the error rate of SWA-based speculative adders are developed to facilitate both design exploration and convergence. We show that, on average, variable-latency addition using SWA-based speculative adders is 10% faster than the fastest DesignWare adder, with area requirements of -5 to 40% for different adder widths.
Title: Static window addition: A new paradigm for the design of variable latency adders. Venue: 2011 IEEE 29th International Conference on Computer Design (ICCD).
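The general idea of combining speculation with error correction can be sketched in software. The code below is a generic carry-window speculation model, not the paper's SWA scheme: each sum bit sees carries from at most `window` lower bit positions, and a mismatch against the exact sum triggers a correction. The names `speculative_add` and `variable_latency_add`, the default window of 4, and the 1-cycle/2-cycle latencies are all assumptions for illustration.

```python
def speculative_add(a: int, b: int, width: int = 16, window: int = 4):
    """Return (speculative_sum, exact_sum, mispredicted) for unsigned operands."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    speculative = 0
    for i in range(width):
        # Only bits [lo, i] contribute; carries generated below lo are ignored.
        lo = max(0, i - window)
        seg_mask = (1 << (i - lo + 1)) - 1
        seg_sum = ((a >> lo) & seg_mask) + ((b >> lo) & seg_mask)
        speculative |= ((seg_sum >> (i - lo)) & 1) << i
    exact = (a + b) & mask
    return speculative, exact, speculative != exact


def variable_latency_add(a: int, b: int, width: int = 16, window: int = 4):
    """Return (sum, cycles): 1 cycle if speculation held, 2 if it was corrected."""
    speculative, exact, mispredicted = speculative_add(a, b, width, window)
    return (exact, 2) if mispredicted else (speculative, 1)


print(variable_latency_add(3, 5))        # short carry chain, speculation holds
print(variable_latency_add(0xFF, 0x01))  # long carry chain, needs correction
```

In this model the result is always correct, but operands with a carry chain longer than the window pay the extra cycle. In hardware the misprediction would be flagged by dedicated detection logic rather than by comparing against an exact sum, which is used here only to keep the sketch short.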
Pub Date : 2011-10-09 DOI: 10.1109/ICCD.2011.6081432
Pramod Subramanyan, Virendra Singh, K. Saluja, E. Larsson
Relentless scaling of CMOS fabrication technology has made contemporary integrated circuits increasingly susceptible to transient faults, wearout-related permanent faults, intermittent faults and process variations. Therefore, mechanisms to mitigate the effects of decreased reliability are expected to become essential components of future general-purpose microprocessors.
Title: Adaptive execution assistance for multiplexed fault-tolerant chip multiprocessors. Venue: 2011 IEEE 29th International Conference on Computer Design (ICCD).
Pub Date : 2011-10-09 DOI: 10.1109/ICCD.2011.6081435
Gulay Yalcin, O. Unsal, A. Cristal, M. Valero
Fault injection is a widely used approach for experiment-based dependability evaluation. Injecting faults into microarchitectural simulators is particularly appealing for researchers, since it can be applied at the early design stage of the processor. As such, it enables a preliminary analysis of the correlation between the criticality of processor-structure-level faults and their impact on applications. In this study, we present FIMSIM, a compact fault injection infrastructure for microarchitectural simulators that is capable of injecting transient, permanent, intermittent, and multi-bit faults. FIMSIM provides the opportunity to comprehensively evaluate the vulnerability of different microarchitectural structures against different fault models.
Title: FIMSIM: A fault injection infrastructure for microarchitectural simulators. Venue: 2011 IEEE 29th International Conference on Computer Design (ICCD).
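The four fault models named in the abstract can be illustrated on a single simulated register value. The sketch below is not FIMSIM's interface: the function names, the stuck-at interpretation of permanent and intermittent faults, and the `period`/`active` burst parameters are assumptions chosen for illustration.

```python
def flip_bits(value: int, positions) -> int:
    """Transient fault: flip each listed bit once (several positions = multi-bit)."""
    for pos in positions:
        value ^= 1 << pos
    return value


def stuck_at(value: int, position: int, level: int) -> int:
    """Permanent fault: force one bit to `level` (0 or 1) on every read."""
    if level:
        return value | (1 << position)
    return value & ~(1 << position)


def intermittent(value: int, position: int, level: int, cycle: int,
                 period: int = 10, active: int = 2) -> int:
    """Intermittent fault: the stuck-at fault is present only during the
    first `active` cycles of every `period`-cycle interval."""
    if cycle % period < active:
        return stuck_at(value, position, level)
    return value


reg = 0b1010
print(bin(flip_bits(reg, [0])))                # single-bit transient
print(bin(flip_bits(reg, [1, 3])))             # multi-bit transient
print(bin(stuck_at(reg, 0, 1)))                # permanent stuck-at-1
print(bin(intermittent(reg, 0, 1, cycle=5)))   # outside the burst, unchanged
```

A simulator hook would typically apply one of these functions when a targeted structure is read or written; where that hook lives depends on the simulator and is outside the scope of this sketch.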
Pub Date : 2011-10-09 DOI: 10.1109/ICCD.2011.6081441
Da Cheng, S. Gupta
Traditional approaches for improving yield are based on hardware redundancy (HR), and their benefits are limited at high defect densities due to increasing layout complexity and diminishing returns. This research is based on the observation that completely correct operation of user programs can be guaranteed on chips with one or more unrepairable memory modules if software-level techniques satisfy two conditions: (1) defects affect only a few memory cells rather than causing the entire memory module to malfunction, and (2) either we do not use any part of the memory affected by the unrepaired defect, or we do use the affected part, but only in a manner that does not excite the unrepaired defect to cause errors. This paper proposes a software-based defect-tolerance (SBDT) approach in combination with HR to utilize defective memory chips for application-specific systems.
Title: A novel software-based defect-tolerance approach for application-specific embedded systems. Venue: 2011 IEEE 29th International Conference on Computer Design (ICCD).
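Condition (2) above can be made concrete with a toy placement model. The sketch below is not the paper's SBDT method: it assumes defects are stuck-at bits, represents them in a hypothetical `defect_map` (address to {bit position: stuck level}), and uses a simple greedy policy. A value may occupy a defective word only if every stuck bit already matches the value, so the defect is never excited.

```python
def excites_defect(value: int, stuck_bits: dict) -> bool:
    """True if storing `value` would disagree with any stuck bit of the word."""
    return any(((value >> pos) & 1) != level for pos, level in stuck_bits.items())


def place(values, defect_map: dict, size: int):
    """Greedily assign each value to the lowest free address that it can
    occupy without exciting a defect; None means no safe address was found."""
    free = set(range(size))
    placement = []
    for value in values:
        chosen = None
        for addr in sorted(free):
            if not excites_defect(value, defect_map.get(addr, {})):
                chosen = addr
                break
        if chosen is not None:
            free.remove(chosen)
        placement.append(chosen)
    return placement


# Hypothetical 2-word memory: bit 0 of address 0 is stuck at 1.
defects = {0: {0: 1}}
print(place([0b0100, 0b0101], defects, size=2))
```

Here the first value has bit 0 clear, so it skips the defective word, while the second value has bit 0 set and can safely use it. This only works when the stored values are known ahead of time, which is consistent with the abstract's focus on application-specific systems.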
Pub Date : 2011-10-09 DOI: 10.1109/ICCD.2011.6081421
Domenic Forte, Ankur Srivastava
There is a growing number of visual tracking applications for mobile devices. However, the computer vision algorithms that process real-time video to track moving targets are demanding. Since a single mobile device has too little computational capability, energy, and other resources to fully support target tracking, some works have investigated architectures that migrate a portion of the tracking duties to another device at the cost of transmission bandwidth and energy. In this paper, we investigate the resource utilization in such architectures and present an adaptable architecture that balances the tracking workload among the participating devices based on current resource availability (energy, temperature, bandwidth). Results show that the proposed solution requires low additional overhead, can improve tracking system lifetime by reducing energy consumption, and is more effective at maintaining safe operating temperatures within participants than previously investigated architectures.
Title: Adaptable architectures for distributed visual target tracking. Venue: 2011 IEEE 29th International Conference on Computer Design (ICCD).
Pub Date : 2011-10-09 DOI: 10.1109/ICCD.2011.6081374
Changshu Zhang, A. Ravindran, Kushal Datta, A. Mukherjee, B. Joshi
Exploring the vast microarchitectural design space of chip multiprocessors (CMPs) through the traditional approach of exhaustive simulation is impractical due to the long simulation times and their super-linear increase with core scaling. Kernel-based statistical machine learning algorithms can potentially help predict multiple performance metrics that depend non-linearly on the CMP design parameters. In this paper, we describe and evaluate a machine learning framework that uses Kernel Canonical Correlation Analysis (KCCA) to predict the power dissipation and performance of CMPs. Specifically, we focus on modeling the microarchitecture of a highly multithreaded CMP targeted at packet processing. We use a cycle-accurate CMP simulator to generate the training samples required to build the model. Despite sampling only 0.016% of the design space, we observe a median error of 6–10% in the KCCA-predicted processor power dissipation and performance.
Title: A machine learning approach to modeling power and performance of chip multiprocessors. Venue: 2011 IEEE 29th International Conference on Computer Design (ICCD).