Many-core key-value store
Pub Date: 2011-07-25 | DOI: 10.1109/IGCC.2011.6008565
Mateusz Berezecki, E. Frachtenberg, Mike Paleczny, K. Steele
Scaling data centers to handle task-parallel workloads requires balancing the cost of hardware, operations, and power. Low-power, low-core-count servers reduce costs in one of these dimensions, but may require additional nodes to provide the required quality of service, or may increase costs by under-utilizing memory and other resources. We show that a high-core-count processor operating at a low clock rate and very low power can compete in throughput, response time, and power consumption with a platform using faster but fewer commodity cores. Specific measurements are made for a key-value store, Memcached, using a variety of systems based on three different processors: the 4-core Intel Xeon L5520, the 8-core AMD Opteron 6128 HE, and the 64-core Tilera TILEPro64.
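A minimal sketch of how such a measurement could be taken, assuming a memcached server on localhost:11211 and the pymemcache client; the loop, key name, value size, and placeholder power figure are illustrative and this is not the benchmark harness used in the paper. It reports throughput, mean response time, and requests per second per watt, the three quantities compared across platforms.

```python
# Minimal sketch, not the paper's harness: measure memcached GET throughput and
# mean response time from a single synchronous client, then report requests per
# second per watt against a separately measured wall-power figure.
# Assumes a memcached server on localhost:11211 and the pymemcache package.
import time
from pymemcache.client.base import Client

def measure(client, n_requests=100_000, key=b"bench-key"):
    client.set(key, b"x" * 100)                 # 100-byte value; size is illustrative
    start = time.perf_counter()
    for _ in range(n_requests):
        client.get(key)
    elapsed = time.perf_counter() - start
    return n_requests / elapsed, elapsed / n_requests   # (req/s, seconds per request)

if __name__ == "__main__":
    client = Client(("localhost", 11211))
    throughput, latency = measure(client)
    measured_watts = 100.0                      # placeholder: wall power is measured externally
    print(f"throughput:   {throughput:,.0f} req/s")
    print(f"mean latency: {latency * 1e6:.1f} us")
    print(f"efficiency:   {throughput / measured_watts:,.0f} req/s per watt")
```

A single synchronous loop mainly exercises per-request latency; throughput comparisons of the kind reported in the paper imply many concurrent connections, which this sketch does not attempt.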
{"title":"Many-core key-value store","authors":"Mateusz Berezecki, E. Frachtenberg, Mike Paleczny, K. Steele","doi":"10.1109/IGCC.2011.6008565","DOIUrl":"https://doi.org/10.1109/IGCC.2011.6008565","url":null,"abstract":"Scaling data centers to handle task-parallel work-loads requires balancing the cost of hardware, operations, and power. Low-power, low-core-count servers reduce costs in one of these dimensions, but may require additional nodes to provide the required quality of service or increase costs by under-utilizing memory and other resources. We show that the throughput, response time, and power consumption of a high-core-count processor operating at a low clock rate and very low power consumption can perform well when compared to a platform using faster but fewer commodity cores. Specific measurements are made for a key-value store, Memcached, using a variety of systems based on three different processors: the 4-core Intel Xeon L5520, 8-core AMD Opteron 6128 HE, and 64-core Tilera TILEPro64.","PeriodicalId":306876,"journal":{"name":"2011 International Green Computing Conference and Workshops","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130106603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Synergistic integration of dynamic cache reconfiguration and code compression in embedded systems
Pub Date: 2011-07-25 | DOI: 10.1109/IGCC.2011.6008580
Hadi Hajimiri, Kamran Rahmani, P. Mishra
Optimization techniques are widely used in embedded systems design to improve overall area, performance, and energy requirements. Dynamic cache reconfiguration is very effective at reducing the energy consumption of the cache subsystem, which accounts for about half of the total energy consumption in embedded systems. Various studies have shown that code compression can significantly reduce memory requirements and may improve performance in many scenarios. In this paper, we study the challenges and associated opportunities in integrating dynamic cache reconfiguration with code compression to retain the advantages of both approaches. Experimental results demonstrate that the synergistic combination of cache reconfiguration and code compression can significantly reduce both energy consumption (65% on average) and memory requirements while drastically improving overall performance (up to 75%) compared to dynamic cache reconfiguration alone.
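As a rough illustration of how the two techniques interact, the sketch below uses a simplified cache-energy model (dynamic energy per access plus leakage over execution time); the candidate configurations, energy numbers, and the single compression_miss_factor parameter are hypothetical and not the authors' methodology.

```python
# Illustrative model only, not the authors' methodology: estimate cache energy as
# per-access dynamic energy plus leakage over execution time, let dynamic cache
# reconfiguration pick the cheapest configuration, and model code compression
# simply as a reduction in instruction-fetch misses (compression_miss_factor).
from dataclasses import dataclass

@dataclass
class CacheConfig:
    name: str
    dyn_nj_per_access: float   # assumed dynamic energy per access (nJ)
    static_mw: float           # assumed leakage power (mW)
    miss_rate: float           # assumed miss rate for the profiled workload

def cache_energy_mj(cfg, accesses, cycle_ns=2.0, miss_penalty_ns=40.0,
                    compression_miss_factor=1.0):
    misses = accesses * cfg.miss_rate * compression_miss_factor
    exec_time_s = (accesses * cycle_ns + misses * miss_penalty_ns) * 1e-9
    dynamic_mj = accesses * cfg.dyn_nj_per_access * 1e-6   # nJ -> mJ
    leakage_mj = cfg.static_mw * exec_time_s               # mW * s = mJ
    return dynamic_mj + leakage_mj

# Hypothetical candidate configurations and workload profile.
configs = [
    CacheConfig("4KB-2way", dyn_nj_per_access=0.20, static_mw=1.0, miss_rate=0.08),
    CacheConfig("8KB-4way", dyn_nj_per_access=0.35, static_mw=2.0, miss_rate=0.03),
]
accesses = 10_000_000
best = min(configs, key=lambda c: cache_energy_mj(c, accesses, compression_miss_factor=0.8))
print("lowest-energy configuration with compression:", best.name)
```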
{"title":"Synergistic integration of dynamic cache reconfiguration and code compression in embedded systems","authors":"Hadi Hajimiri, Kamran Rahmani, P. Mishra","doi":"10.1109/IGCC.2011.6008580","DOIUrl":"https://doi.org/10.1109/IGCC.2011.6008580","url":null,"abstract":"Optimization techniques are widely used in embedded systems design to improve overall area, performance and energy requirements. Dynamic cache reconfiguration is very effective to reduce energy consumption of cache subsystems which accounts for about half of the total energy consumption in embedded systems. Various studies have shown that code compression can significantly reduce memory requirements, and may improve performance in many scenarios. In this paper, we study the challenges and associated opportunities in integrating dynamic cache reconfiguration with code compression to retain the advantages of both approaches. Experimental results demonstrate that synergistic combination of cache reconfiguration and code compression can significantly reduce both energy consumption (65% on average) and memory requirements while drastically improve the overall performance (up to 75%) compared to dynamic cache reconfiguration alone.","PeriodicalId":306876,"journal":{"name":"2011 International Green Computing Conference and Workshops","volume":"101 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124821489","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Saving energy in LAN switches: New methods of packet coalescing for Energy Efficient Ethernet
Pub Date: 2011-07-25 | DOI: 10.1109/IGCC.2011.6008547
M. Mostowfi, Ken Christensen
Small or home office (SOHO) Ethernet LAN switches consume about 8 TWh per year in the U.S. alone. Despite normally low traffic load and numerous idle periods, these switches typically stay fully powered on at all times. With the standardization of Energy Efficient Ethernet (EEE), Ethernet interfaces can be put into a Low Power Idle (LPI) mode when there are no packets to transmit. This paper proposes and evaluates a new EEE policy of synchronous coalescing of packets in network hosts and edge routers. This policy provides extended idle periods across all ports of a LAN switch and thus enables energy savings deeper than in the Ethernet PHY alone. We evaluate our method using an ns-2 simulation model of a LAN switch and show that it can reduce the overall energy use of a LAN switch by about 40%, while introducing only limited and controlled effects on typical Internet traffic and TCP.
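A minimal sketch of the coalescing idea, assuming hosts share a common time reference so their release instants line up; the class and parameter names are illustrative, and this is not the ns-2 model evaluated in the paper.

```python
# Minimal sketch of synchronous packet coalescing (an illustration, not the
# paper's ns-2 model): outgoing packets are buffered and released only at
# globally aligned interval boundaries, so all switch ports share idle periods
# and the whole switch can sleep between bursts.
import time
from collections import deque

class SynchronousCoalescer:
    def __init__(self, send_fn, interval_s=0.010, epoch_s=0.0):
        self.send_fn = send_fn          # callable that actually transmits a packet
        self.interval_s = interval_s    # coalescing interval shared by all hosts
        self.epoch_s = epoch_s          # common time reference (e.g. from NTP)
        self.queue = deque()

    def enqueue(self, packet):
        self.queue.append(packet)

    def next_release_time(self, now):
        # Align releases to multiples of the interval from the shared epoch.
        elapsed = now - self.epoch_s
        k = int(elapsed // self.interval_s) + 1
        return self.epoch_s + k * self.interval_s

    def run_once(self):
        # Sleep until the next synchronized boundary, then drain the buffer.
        now = time.time()
        time.sleep(max(0.0, self.next_release_time(now) - now))
        while self.queue:
            self.send_fn(self.queue.popleft())

if __name__ == "__main__":
    coalescer = SynchronousCoalescer(send_fn=lambda p: print("tx", p))
    for i in range(3):
        coalescer.enqueue(f"packet-{i}")
    coalescer.run_once()
```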
{"title":"Saving energy in LAN switches: New methods of packet coalescing for Energy Efficient Ethernet","authors":"M. Mostowfi, Ken Christensen","doi":"10.1109/IGCC.2011.6008547","DOIUrl":"https://doi.org/10.1109/IGCC.2011.6008547","url":null,"abstract":"Small or home office (SOHO) Ethernet LAN switches consume about 8 TWh per year in the U.S. alone. Despite normally low traffic load and numerous periods of idleness, these switches typically stay fully powered-on at all times. With the standardization of Energy Efficient Ethernet (EEE), Ethernet interfaces can be put into a Low Power Idle (LPI) mode during idle periods when there are no packets to transmit. This paper proposes and evaluates a new EEE policy of synchronous coalescing of packets in network hosts and edge routers. This policy provides extended idle periods for all ports of a LAN switch and thus enables energy savings deeper than in the Ethernet PHY only. We evaluate our method using an ns-2 simulation model of a LAN switch. We show that our method can reduce the overall energy use of a LAN switch by about 40%, while introducing limited and controlled effects on typical Internet traffic and TCP.","PeriodicalId":306876,"journal":{"name":"2011 International Green Computing Conference and Workshops","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121961371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Energy-aware video storage and retrieval in server environments
Pub Date: 2011-07-25 | DOI: 10.1109/IGCC.2011.6008589
Domenic Forte, Ankur Srivastava
As the popularity of video streaming and sharing over the Internet grows, energy consumption in video server environments increases as well. This paper discusses how energy consumption is a critical concern for video servers and can limit their throughput. We investigate the energy consumption of storage components (disks) in video servers and propose two ways to reduce it for better throughput. First, we organize data on disks in a way that allows easier access by exploiting the inherent properties of video access patterns and giving priority to more important video data. Second, the prioritized ordering of video data allows the server to retrieve only the data required to meet quality of service agreements in an energy-efficient way. Therefore, the video quality delivered to clients may be scaled in order to service more concurrent requests and/or reduce energy consumption. Results show that our strategies can increase the number of clients served by as much as 114% when compared to conventional approaches while also meeting constraints on video quality and energy consumption.
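A small sketch of prefix-ordered retrieval under the assumption that video data is stored as a base layer followed by enhancement layers in decreasing importance; the function names and layering scheme are illustrative, not the authors' storage format.

```python
# Hypothetical sketch, not the authors' storage format: each video is laid out
# with its most important data first (base layer, then enhancement layers), so a
# request at a given quality level reads only a prefix of the stored object.
def bytes_to_read(layer_sizes, quality_level):
    """layer_sizes[0] is the base layer; each higher index adds quality."""
    return sum(layer_sizes[: quality_level + 1])

def serve_request(stream, layer_sizes, quality_level):
    # Retrieve only the prioritized prefix required for the requested quality,
    # leaving the rest of the object (and the disk) untouched.
    stream.seek(0)
    return stream.read(bytes_to_read(layer_sizes, quality_level))

# Example: a video stored as a 2 MB base layer plus two 1 MB enhancement layers.
layer_sizes = [2_000_000, 1_000_000, 1_000_000]
print(bytes_to_read(layer_sizes, quality_level=1))   # 3000000 bytes for mid quality
```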
{"title":"Energy-aware video storage and retrieval in server environments","authors":"Domenic Forte, Ankur Srivastava","doi":"10.1109/IGCC.2011.6008589","DOIUrl":"https://doi.org/10.1109/IGCC.2011.6008589","url":null,"abstract":"As the popularity of video streaming and sharing over the Internet grows, energy consumption in video server environments increases as well. This paper discusses how energy consumption is a critical concern for video servers and can limit their throughput. We investigate the energy consumption of storage components (disks) in video servers and propose two ways to reduce it for better throughput. First, we organize data on disks in a way that allows easier access by exploiting the inherent properties of video access patterns and giving priority to more important video data. Second, the prioritized ordering of video data allows the server to retrieve only the data required to meet quality of service agreements in an energy efficient way. Therefore, the video quality delivered to clients may be scaled in order to service more concurrent requests and/or reduce energy consumption. Results show that our strategies can increase the number of clients served by as much as 114% when compared to conventional approaches while also meeting constraints on video quality and energy consumption.","PeriodicalId":306876,"journal":{"name":"2011 International Green Computing Conference and Workshops","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116934271","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cooling mechanisms in 3D ICs: Thermo-mechanical perspective
Pub Date: 2011-07-25 | DOI: 10.1109/IGCC.2011.6008573
S. Kandlikar, D. Kudithipudi, C. Rubio-Jimenez
Three-dimensional (3D) integrated circuits (ICs) pose several challenges for thermal management. Stacking vertical layers significantly increases the heat dissipation per unit volume and the thermal footprint per unit area. The internal layers of the stack are susceptible to high thermal gradients due to low-thermal-conductivity interfaces and their distance from the heat sink. Several factors affect the thermal behavior of a 3D IC, including through-silicon vias, bonding, and the cooling mechanism. In this paper, we provide a detailed review of existing cooling mechanisms and their applicability to 3D ICs. We also propose two parameters that capture the thermal interactions among the devices and stack layers for incorporation in 3D IC cooling system design: the Thermal Intensification Factor (TIF) accounts for the increased heat flux due to multiple ICs along the heat transfer path, and the Thermal Derating Factor (TDF) accounts for the increased thermal resistance introduced by the multiple layers. We also present a new flow passage design with variable fin density to reduce surface temperature non-uniformity along the coolant flow length.
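The abstract does not give closed-form definitions of TIF and TDF; one plausible reading, stated here purely as an assumption, is a ratio of stacked-path heat flux to single-die heat flux, and of stack thermal resistance to single-layer thermal resistance:

```latex
% Assumed forms only; the abstract does not state the exact definitions.
% TIF: heat flux accumulated along a stacked heat-transfer path of N dies,
%      relative to the heat flux of a single die.
% TDF: thermal resistance of the full stack relative to a single layer.
\[
  \mathrm{TIF} \;=\; \frac{\sum_{i=1}^{N} q''_i}{q''_{\mathrm{single}}},
  \qquad
  \mathrm{TDF} \;=\; \frac{R_{\mathrm{th,\,stack}}}{R_{\mathrm{th,\,single}}}
\]
```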
{"title":"Cooling mechanisms in 3D ICs: Thermo-mechanical perspective","authors":"S. Kandlikar, D. Kudithipudi, C. Rubio-Jimenez","doi":"10.1109/IGCC.2011.6008573","DOIUrl":"https://doi.org/10.1109/IGCC.2011.6008573","url":null,"abstract":"Three-dimensional (3D) integrated circuits (IC) impose several challenges in thermal management. Stacking vertical layers significantly increases the heat dissipation per unit volume and the thermal footprint per unit area. The internal layers in the stacks are susceptible to high thermal gradients due to the low thermal conductivity interfaces and the distance from the heat sink. Several factors affect the thermal behavior of the 3D IC, including the through silicon vias, bonding, and cooling mechanisms. In this paper, we provide a detailed review of existing cooling mechanisms and their applicability to 3D ICs. We also propose two parameters to account for the thermal interactions among the devices and stack layers for incorporation in the 3D IC cooling system design: Thermal Intensification Factor (TIF) accounts for the increased heat flux due to multiple ICs along the heat transfer path, and Thermal Derating Factor (TDF) accounts for the increased thermal resistance introduced by the multiple layers. Also, a new flow passage design with variable fin density is presented to reduce the surface temperature non-uniformity along the coolant flow length.","PeriodicalId":306876,"journal":{"name":"2011 International Green Computing Conference and Workshops","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127760485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Analysis and optimization of power consumption in the iterative solution of sparse linear systems on multi-core and many-core platforms
Pub Date: 2011-07-01 | DOI: 10.1109/IGCC.2011.6008594
H. Anzt, V. Heuveline, J. Aliaga, María Isabel Castillo, J. C. Fernández, R. Mayo, E. S. Quintana‐Ortí
Energy efficiency is a major concern in modern high-performance computing. Still, few studies provide deep insight into the power consumption of scientific applications. Especially for algorithms running on hybrid platforms equipped with hardware accelerators such as graphics processors, a detailed energy analysis is essential to identify the most costly parts and to evaluate possible improvement strategies. In this paper we analyze the computational and power performance of iterative linear solvers applied to sparse systems arising in several scientific applications. We also study the gains yielded by dynamic voltage/frequency scaling (DVFS), and show that this technique alone cannot reduce the energy cost of iterative linear solvers by a considerable amount. We then apply techniques that set the (multi-core processor in the) host system to a low-power state while the GPU is executing. Our experiments reveal that the combination of these two techniques delivers a notable reduction in energy consumption without a noticeable impact on computational performance.
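A rough sketch of the host-side technique, assuming a Linux system where the cpufreq scaling_governor files are writable (root required); the context manager and the wait_for_gpu_kernel placeholder are illustrative, not the authors' implementation.

```python
# Rough illustration, not the authors' implementation: while the GPU is busy,
# switch the host cores to the Linux "powersave" cpufreq governor so they idle
# at a low frequency, then restore the previous governor afterwards.
# Linux-specific and requires root; the paths below are the standard cpufreq
# sysfs files.
import contextlib
import glob

GOV_FILES = glob.glob("/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor")

def read_governors():
    return {path: open(path).read().strip() for path in GOV_FILES}

def write_governors(value_by_path):
    for path, value in value_by_path.items():
        with open(path, "w") as f:
            f.write(value)

@contextlib.contextmanager
def host_low_power(governor="powersave"):
    saved = read_governors()
    write_governors({path: governor for path in saved})
    try:
        yield
    finally:
        write_governors(saved)

# Usage: wrap the blocking wait on the GPU portion of the solver, e.g.
#   with host_low_power():
#       wait_for_gpu_kernel()   # hypothetical blocking call on the GPU sparse kernel
```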
{"title":"Analysis and optimization of power consumption in the iterative solution of sparse linear systems on multi-core and many-core platforms","authors":"H. Anzt, V. Heuveline, J. Aliaga, María Isabel Castillo, J. C. Fernández, R. Mayo, E. S. Quintana‐Ortí","doi":"10.1109/IGCC.2011.6008594","DOIUrl":"https://doi.org/10.1109/IGCC.2011.6008594","url":null,"abstract":"Energy efficiency is a major concern in modern high-performance-computing. Still, few studies provide a deep insight into the power consumption of scientific applications. Especially for algorithms running on hybrid platforms equipped with hardware accelerators, like graphics processors, a detailed energy analysis is essential to identify the most costly parts, and to evaluate possible improvement strategies. In this paper we analyze the computational and power performance of iterative linear solvers applied to sparse systems arising in several scientific applications. We also study the gains yield by dynamic voltage/frequency scaling (DVFS), and illustrate that this technique alone cannot to reduce the energy cost to a considerable amount for iterative linear solvers. We then apply techniques that set the (multi-core processor in the) host system to a low-consuming state for the time that the GPU is executing. Our experiments conclusively reveal how the combination of these two techniques deliver a notable reduction of energy consumption without a noticeable impact on computational performance.","PeriodicalId":306876,"journal":{"name":"2011 International Green Computing Conference and Workshops","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121423305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}