Pub Date: 2007-10-01 DOI: 10.1109/ICCD.2007.4601883
Jeffrey Fan, N. Mi, S. Tan
In this paper, we propose a novel voltage drop reduction technique for on-chip power delivery networks of VLSI systems in the presence of variational leakage current sources. The new method inserts decoupling capacitors (decaps) into the power grid networks to reduce voltage fluctuation. The optimization is based on a sensitivity-based conjugate gradient method and a sequence-of-linear-programming approach. Unlike existing power grid noise reduction methods, the new approach is the first to consider, during decap optimization, the impact of inter-die and intra-die leakage current variations caused by unavoidable process variability. Leakage currents, although typically static in nature, still add to the total voltage drop, so dynamic voltage drop reduction must account for leakage-induced voltage variations. The proposed algorithm exploits the relatively constant variations across different decap configurations of power grid circuits to speed up the statistical optimization process. Decaps can be inserted in such a way that the resulting circuits have a much higher probability of meeting the voltage drop constraints in the presence of leakage current variations. Experimental results demonstrate the effectiveness of the proposed approach and show that the new method achieves a 100X to 1,000X speedup over Monte Carlo based statistical decap optimization.
Title: Voltage drop reduction for on-chip power delivery considering leakage current variations. In: 2007 25th International Conference on Computer Design, pp. 78-83.
Pub Date: 2007-10-01 DOI: 10.1109/ICCD.2007.4601956
R. Senguttuvan, Shreyas Sen, A. Chatterjee
Modern wireless transceiver systems are often overdesigned to meet low bit error rate requirements at high data rates under worst-case channel operating conditions (interference, noise, multi-path effects). This results in circuits being designed with "sufficient" margins, leading to lower efficiency and high power consumption. In this paper, we develop an adaptive power management strategy for RF systems that optimally trades off power against performance in the RF front end to maintain operation at or below a specified maximum bit error rate (BER) across temporally changing operating conditions. As the communication channel degrades, more power is consumed by the RF front end, and vice versa. Since the maximum bit error rate specification is not violated, a minimum voice or video quality through the wireless channel is always guaranteed.
Title: VIZOR: Virtually zero margin adaptive RF for ultra low power wireless communication. In: 2007 25th International Conference on Computer Design, pp. 580-586.
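The adaptation loop described in the abstract above can be pictured with a minimal Python sketch. It is not the authors' controller: the function name `next_power_level`, the discrete power levels, and the `guard` fraction are all assumptions made for illustration. It only shows the stated trade-off, where power rises as the channel degrades and falls when there is margin to spare.

```python
def next_power_level(level, measured_ber, max_ber, num_levels, guard=0.5):
    """Pick the next RF front-end power level (0 = lowest power).

    Hypothetical policy, assumed for illustration only:
    - raise power by one step when the measured BER exceeds the limit;
    - lower power by one step when the BER is comfortably below the limit
      (below guard * max_ber);
    - otherwise hold the current level.
    """
    if measured_ber > max_ber:
        # Channel degraded: spend more power, saturating at the top level.
        return min(level + 1, num_levels - 1)
    if measured_ber < guard * max_ber:
        # Excess margin: save power, saturating at the bottom level.
        return max(level - 1, 0)
    return level


if __name__ == "__main__":
    level = 2
    for ber in (1e-4, 8e-4, 5e-3, 5e-3):
        level = next_power_level(level, ber, max_ber=1e-3, num_levels=5)
        print(f"BER={ber:.0e} -> power level {level}")
```

A real design would have to act before the BER limit is crossed, as the abstract requires; the reactive step here is a simplification.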
Pub Date: 2007-10-01 DOI: 10.1109/ICCD.2007.4601932
James Tuck, Wei Liu, J. Torrellas
While speculative multithreading (SM) on a chip multiprocessor (CMP) can speed up hard-to-parallelize applications, the power inefficiency of aggressive speculation is a concern. To improve SM's power efficiency, we note that not all the tasks running in an SM environment are equally critical. To leverage this insight, this paper develops a novel, widely applicable task-criticality model for SM. It also proposes CAP, a novel architecture that builds a task-criticality graph dynamically and uses it to make scheduling decisions in an SM CMP. Experiments with SPECint, SPECfp, and Olden applications show that, in a CMP with one fast core and three slow ones, the ED² (energy times delay squared) with CAP is, on average, 91-95% of that without. Moreover, it is only 77-91% of the ED² of a CMP with four fast cores and no CAP. Overall, we argue that scheduling for task criticality is beneficial.
Title: CAP: Criticality analysis for power-efficient speculative multithreading. In: 2007 25th International Conference on Computer Design, pp. 409-416.
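The percentages quoted in the abstract above are ratios of an energy-delay-squared metric. As a small worked illustration of how such a ratio is formed, the Python sketch below computes it for made-up inputs. The function names and the numbers are assumptions chosen for the example, not measurements from the paper.

```python
def ed2(energy, delay):
    """Energy-delay-squared metric: E * D^2 (lower is better)."""
    return energy * delay ** 2


def relative_ed2(energy, delay, base_energy, base_delay):
    """ED^2 of a configuration as a fraction of a baseline's ED^2."""
    return ed2(energy, delay) / ed2(base_energy, base_delay)


if __name__ == "__main__":
    # Hypothetical configuration: 20% less energy, 5% longer runtime
    # than the baseline (both normalized to 1.0).
    ratio = relative_ed2(energy=0.8, delay=1.05,
                         base_energy=1.0, base_delay=1.0)
    print(f"relative ED^2 = {ratio:.3f}")
```

Because delay enters squared, a small slowdown consumes a noticeable share of any energy saving, which is why the metric favors schedulers that keep critical tasks on the fast core.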
Pub Date: 2007-10-01 DOI: 10.1109/ICCD.2007.4601933
M. Modarressi, H. Sarbazi-Azad
A core mapping method for reconfigurable network-on-chip (NoC) architectures is presented in this paper. In most existing methods, mapping is carried out based on the traffic characteristics of a single application. However, modern complex systems-on-chip implement and integrate several different applications, and mapping methods should take this into account. In the proposed method, reconfiguration (achieved by embedding programmable switches between the routers of a mesh-based NoC) allows us to dynamically change the network topology in order to adapt it to the running application and optimize the power and performance metrics. The presented network architecture can be configured as an application-specific topology while retaining the benefits of regular NoC topologies, such as modularity and predictable electrical properties. The experimental results show that this method can effectively adapt the NoC to the running application and improve the power consumption and performance of the system.
Title: Power-aware mapping for reconfigurable NoC architectures. In: 2007 25th International Conference on Computer Design, pp. 417-422.
Pub Date: 2007-10-01 DOI: 10.1109/ICCD.2007.4601955
Shu Li, Tong Zhang
Hybrid nanoelectronics are emerging as one viable option to sustain Moore's Law after the CMOS scaling limit is reached. One main design challenge in hybrid nanoelectronics is the interface (called the demux) between the highly dense nanowires in nanodevice crossbars and the relatively coarse microwires in the CMOS domain. Prior work on demux design uses a single type of device to realize the demultiplexing function, but hardly provides a satisfactory solution. This work proposes combining resistors with FETs to implement the demux, leading to the so-called hybrid resistor/FET-logic demux. Such a hybrid demux architecture lets the two types of devices complement each other to improve the overall effectiveness of the demux design. Furthermore, the effects of resistor conductance variability are analyzed and evaluated based on computer simulations.
Title: Hybrid resistor/FET-logic demultiplexer architecture design for hybrid CMOS/nanodevice circuits. In: 2007 25th International Conference on Computer Design, pp. 574-579.
Pub Date: 2007-10-01 DOI: 10.1109/ICCD.2007.4601879
Subramanian Ramaswamy, S. Yalamanchili
In this paper we propose techniques to dynamically downsize or upsize a cache, accompanied by cache set/line shutdown, to produce efficient caches. Unlike previous approaches, resizing is accompanied by a non-uniform remapping of memory into the resized cache, thus avoiding misses to sets/lines that are shut off. The paper first analyzes the causes of energy inefficiency, revealing a simple model for improving efficiency. Based on this model we propose the concept of "folding": memory regions mapping to disjoint cache resources are combined to share cache sets, producing a new placement function. Folding enables powering down cache sets at the expense of possibly increasing conflict misses. Effective folding heuristics can substantially increase energy efficiency at the expense of an acceptable increase in execution time. We target the L2 cache because of its larger size and greater energy consumption. Our techniques increase cache energy efficiency by 20% and reduce the EDP (energy-delay product) by up to 45% with an IPC degradation of less than 4%. The results also indicate an opportunity to improve cache efficiency further via cooperative compiler interactions.
Title: Improving cache efficiency via resizing + remapping. In: 2007 25th International Conference on Computer Design, pp. 47-54.
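The "folding" idea in the abstract above, where regions that used to map to disjoint sets are made to share sets so the vacated ones can be powered down, can be sketched as a change to the placement function. The Python below is only an illustration under assumed parameters: `LINE_SIZE`, `NUM_SETS`, the `fold_map` dictionary, and both function names are hypothetical and not taken from the paper.

```python
LINE_SIZE = 64   # bytes per cache line (assumed)
NUM_SETS = 8     # sets in the full-size cache (assumed, kept tiny for clarity)


def base_set(addr):
    """Conventional placement: set index from the line address."""
    return (addr // LINE_SIZE) % NUM_SETS


def folded_set(addr, fold_map):
    """Placement after folding.

    fold_map maps a powered-down set index to the active set that now
    hosts its lines; sets not listed keep their conventional index.
    """
    s = base_set(addr)
    return fold_map.get(s, s)


if __name__ == "__main__":
    # Fold the upper half of the sets (4..7) onto the lower half (0..3),
    # so sets 4..7 can be shut off.
    fold_map = {s: s - NUM_SETS // 2 for s in range(NUM_SETS // 2, NUM_SETS)}
    for addr in (1 * LINE_SIZE, 5 * LINE_SIZE):
        print(hex(addr), "->", base_set(addr), "->", folded_set(addr, fold_map))
```

In this example the two addresses land in set 1 after folding, which is the extra conflict pressure the abstract mentions. A hardware version would also need enough tag bits to tell apart lines that previously differed only in their index.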
Pub Date: 2007-10-01 DOI: 10.1109/ICCD.2007.4601954
A. Namazi, M. Nourani
In this paper, we propose a distributed voting strategy to design a robust N-modular redundancy (NMR) system. We show that, using inexpensive current-based drivers and buffers, we can completely eliminate the centralized voter unit and perform majority voting among N modules in a distributed fashion. Our strategy achieves the high reliability that is vital for future nano systems, in which a high defect rate is expected. Experimental results are also reported to verify the concept, clarify the design procedure, and measure the system's reliability.
Title: Distributed voting for fault-tolerant nanoscale systems. In: 2007 25th International Conference on Computer Design, pp. 568-573.
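For readers unfamiliar with NMR, the logical function that the voting computes is a simple majority over the N module outputs. The Python sketch below shows only that function. It says nothing about the current-based, distributed circuit the abstract describes, and the name `majority_vote` is an assumption for illustration.

```python
def majority_vote(bits):
    """Return the majority value (0 or 1) among N one-bit module outputs.

    N is assumed to be odd, as is usual for N-modular redundancy, so a
    strict majority always exists.
    """
    ones = sum(bits)
    return 1 if 2 * ones > len(bits) else 0


if __name__ == "__main__":
    # Triple modular redundancy with one faulty module.
    print(majority_vote([1, 1, 0]))
    # Five modules, two of them faulty.
    print(majority_vote([1, 0, 1, 1, 0]))
```

With N modules, the voted output stays correct as long as fewer than half of the modules fail, which is the property the distributed scheme has to preserve without a central voter.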
Pub Date: 2007-10-01 DOI: 10.1109/ICCD.2007.4601907
H. Homayoun, A. Veidenbaum
Leakage power has grown significantly and is a major challenge in microprocessor design. Leakage is the dominant power component in second-level (L2) caches. This paper presents two architectural techniques for applying leakage reduction circuits in L2 caches. They primarily target leakage in the peripheral circuitry of an L2 cache and, as such, must be able to cope with longer delays. The first technique exploits the fact that processor activity decreases significantly after an L2 cache miss occurs and saves power during the L2 miss service time. Two algorithms, a static one and an adaptive one, are proposed for deciding when to apply this leakage reduction technique. The second technique attempts to keep the peripheral circuits in a lower-power state most of the time. The results for SPEC2K benchmarks show that the first technique can achieve an 18 to 22% reduction in L2 power consumption on average (and up to 63%), depending on the decision algorithm. The second technique can save 25% on average (and up to 80%). This comes with a negligible 1 to 2% performance impact on average, depending on the technique used.
Title: Reducing leakage power in peripheral circuits of L2 caches. In: 2007 25th International Conference on Computer Design, pp. 230-237.
Pub Date: 2007-10-01 DOI: 10.1109/ICCD.2007.4601895
A. H. Gholamipour, E. Bozorgzadeh, Sudarshan Banerjee
In this paper, we present the co-processor selection problem for minimum energy consumption in HW/SW co-design on FPGAs with dual power modes. We provide a theoretical analysis of the problem under no constraint, under a resource constraint, and under a timing constraint. We prove that the problem is NP-hard in each case and provide a generalized ILP formulation. We compare the energy achieved by our approach with that of other approaches that do not consider both static and dynamic power during optimization, and show that we can reduce energy by 63% in some cases.
Title: Energy-aware co-processor selection for embedded processors on FPGAs. In: 2007 25th International Conference on Computer Design, pp. 158-163.
Pub Date: 2007-10-01 DOI: 10.1109/ICCD.2007.4601908
Fei Gao, Hanyu Cui, S. Sair
Data prefetching has been shown to be an effective tool for hiding part of the latency associated with cache misses in modern processors. Traditionally, data prefetchers fetch data either into a small prefetch buffer near the L1 for low latency, or into the L2 cache for greater coverage and less cache pollution. However, with the L1-L2 cache speed gap growing, significant performance gains can be obtained if the data prefetcher can operate as aggressively as an L2-level prefetcher but with the fast hit times of an L1-level prefetcher. In this paper, we propose a prefetching framework in which an L1-level prefetcher and an L2-level prefetcher work cooperatively to reduce the average access time more than either one alone can. We evaluate several design alternatives suited to perform synergistically under different workloads. From the insight gathered in this analysis, we propose a confidence-based adaptive prefetcher that can improve prefetch efficiency significantly with judicious use of the available bus bandwidth. Our results show that, for certain prefetcher combinations, two-level prefetching can achieve the combined speedup of the individual prefetchers used alone. Furthermore, when compared to other two-level prefetching models, the adaptive design provides similar speedups with appreciably less bus traffic.
Title: Two-level data prefetching. In: 2007 25th International Conference on Computer Design, pp. 238-244.
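The abstract above mentions a confidence-based adaptive prefetcher but does not say how confidence is tracked. One common way to do this in hardware is a small saturating counter, and the Python sketch below illustrates that general mechanism only. It is an assumed stand-in, not the authors' design; the class name `ConfidenceCounter`, the 2-bit width, and the threshold are all hypothetical.

```python
class ConfidenceCounter:
    """Saturating counter that gates prefetch issue (illustrative only)."""

    def __init__(self, bits=2, threshold=2):
        self.max_value = (1 << bits) - 1  # e.g. 3 for a 2-bit counter
        self.threshold = threshold
        self.value = 0

    def update(self, prefetch_was_useful):
        """Raise confidence after a useful prefetch, lower it otherwise."""
        if prefetch_was_useful:
            self.value = min(self.value + 1, self.max_value)
        else:
            self.value = max(self.value - 1, 0)

    def should_prefetch(self):
        """Issue prefetches only when confidence meets the threshold."""
        return self.value >= self.threshold


if __name__ == "__main__":
    counter = ConfidenceCounter()
    for useful in (True, True, False, False, True):
        counter.update(useful)
        print(useful, counter.value, counter.should_prefetch())
```

Suppressing low-confidence prefetches is one way such a scheme could trade a little coverage for less bus traffic, which matches the bandwidth goal stated in the abstract.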