
2013 IEEE International Symposium on Workload Characterization (IISWC): Latest Publications

Characterizing multi-threaded applications for designing sharing-aware last-level cache replacement policies
Pub Date : 2013-09-01 DOI: 10.1109/IISWC.2013.6704665
R. Natarajan, Mainak Chaudhuri
Recent years have seen a large volume of proposals on managing the shared last-level cache (LLC) of chip-multiprocessors (CMPs). However, most of these proposals primarily focus on reducing the amount of destructive interference between competing independent threads of multi-programmed workloads. While very few of these studies evaluate the proposed policies on shared memory multi-threaded applications, they do not improve constructive cross-thread sharing of data in the LLC. In this paper, we characterize a set of multi-threaded applications drawn from the PARSEC, SPEC OMP, and SPLASH-2 suites with the goal of introducing sharing-awareness in LLC replacement policies. We motivate our characterization study by quantifying the potential contributions of the shared and the private blocks toward the overall volume of the LLC hits in these applications and show that the shared blocks are more important than the private blocks. Next, we characterize the amount of sharing-awareness enjoyed by recent proposals compared to the optimal policy. We design and evaluate a generic oracle that can be used in conjunction with any existing policy to quantify the potential improvement that can come from introducing sharing-awareness. The oracle analysis shows that introducing sharing-awareness reduces the number of LLC misses incurred by the least-recently-used (LRU) policy by 6% and 10% on average for a 4MB and 8MB LLC respectively. A realistic implementation of this oracle requires the LLC controller to have the capability to accurately predict, at the time a block is filled into the LLC, whether the block will be shared during its residency in the LLC. We explore the feasibility of designing such a predictor based on the address of the fill and the program counter of the instruction that triggers the fill. Our sharing behavior predictability study of two history-based fill-time predictors that use block addresses and program counters concludes that achieving acceptable levels of accuracy with such predictors will require other architectural and/or high-level program semantic features that have strong correlations with active sharing phases of the LLC blocks.
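To make the replacement-policy idea concrete, below is a minimal sketch (not the authors' simulator) of how a fill-time sharing prediction could bias victim selection in one LLC set: blocks predicted to be shared are protected, and the victim is the least-recently-used block among the remaining private ones, falling back to plain LRU. The class name, the trace format, and the one-bit shared/private flag are illustrative assumptions.

```python
from collections import OrderedDict

class SharingAwareSet:
    """One LLC set: LRU order kept by an OrderedDict, victims chosen
    preferentially among blocks the oracle marked as private."""
    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()   # tag -> will_be_shared flag

    def access(self, tag, will_be_shared):
        if tag in self.blocks:                 # hit: move to MRU position
            self.blocks.move_to_end(tag)
            return True
        if len(self.blocks) >= self.ways:      # miss: need a victim
            # prefer the LRU private block; otherwise fall back to plain LRU
            victim = next((t for t, s in self.blocks.items() if not s),
                          next(iter(self.blocks)))
            del self.blocks[victim]
        self.blocks[tag] = will_be_shared      # fill the new block
        return False

# toy trace: (block tag, oracle's shared/private prediction at fill time)
trace = [(1, True), (2, False), (3, False), (1, True), (4, False), (1, True)]
s = SharingAwareSet(ways=2)
hits = sum(s.access(tag, shared) for tag, shared in trace)
print(f"hits = {hits} / {len(trace)}")
```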
Citations: 18
ACE: Abstracting, characterizing and exploiting datacenter power demands
Pub Date : 2013-09-01 DOI: 10.1109/IISWC.2013.6704669
Di Wang, Chuangang Ren, Sriram Govindan, A. Sivasubramaniam, B. Urgaonkar, A. Kansal, Kushagra Vaid
Peak power management of datacenters has tremendous cost implications. While numerous mechanisms have been proposed to cap power consumption, real datacenter power consumption data is scarce. Prior studies have either used a small set of applications and/or servers, or presented data that is at an aggregate scale from which it is difficult to design and evaluate new and existing optimizations. To address this gap, we collect power measurement data at multiple spatial and fine-grained temporal resolutions from several geo-distributed datacenters of Microsoft corporation over 6 months. We conduct aggregate analysis of this data to study its statistical properties. We find evidence of self-similarity in power demands, statistical multiplexing effects, and correlations with the cooling power that caters to the IT equipment. With workload characterization a key ingredient for systems design and evaluation, we note the importance of better abstractions for capturing power demands, in the form of peaks and valleys. We identify attributes for peaks and valleys, and important correlations across these attributes that can influence the choice and effectiveness of different power capping techniques. We characterize these attributes and their correlations, showing the burstiness of small duration peaks, and the importance of not ignoring the rare but more stringent or long peaks. The correlations between peaks and valleys suggest the need for techniques to aggregate and collectively handle them. With the wide scope of exploitability of such characteristics for power provisioning and optimizations, we illustrate its benefits with two specific case studies. The first shows how peaks can be differentially handled based on our peak and valley characterization using existing approaches, rather than a one-size-fits-all solution. The second illustrates a simple capacity provisioning strategy for energy storage using the peak and valley characteristics.
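As a rough illustration of the peak-and-valley abstraction, the sketch below segments a sampled power trace into runs above and below a threshold and reports the duration and mean height of each run. The 90th-percentile threshold and the synthetic trace are assumptions; the paper's own definitions of peak and valley attributes may differ.

```python
import numpy as np

def peaks_and_valleys(power, threshold):
    """Split a power trace into contiguous runs above (peaks) and
    below (valleys) a threshold; return (kind, start, length, mean) per run."""
    above = power > threshold
    runs, start = [], 0
    for i in range(1, len(power) + 1):
        if i == len(power) or above[i] != above[start]:
            seg = power[start:i]
            runs.append(("peak" if above[start] else "valley",
                         start, i - start, float(seg.mean())))
            start = i
    return runs

rng = np.random.default_rng(0)
trace = 300 + 40 * rng.random(120) + 80 * (rng.random(120) > 0.9)  # synthetic kW samples
thr = np.percentile(trace, 90)                                     # assumed threshold
for kind, start, length, mean in peaks_and_valleys(trace, thr):
    if kind == "peak":
        print(f"peak at sample {start}: {length} samples, mean {mean:.1f} kW")
```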
Citations: 17
Platform-independent analysis of function-level communication in workloads
Pub Date : 2013-09-01 DOI: 10.1109/IISWC.2013.6704685
Siddharth Nilakantan, Mark Hempstead
The emergence of many-core and heterogeneous multicore processors has meant that data communication patterns increasingly determine application performance. Microprocessor designers need tools that can extract and represent these producer-consumer relationships for a workload to aid them in a wide range of tasks including hardware-software co-design, software partitioning, and application performance optimization. This paper presents Sigil, a profiling tool that can extract communication patterns within a workload independent of hardware characteristics. We show how our methodology can extract the true costs of communication within a workload by distinguishing between unique, local, and total communication. We describe the implementation and performance of Sigil as well as the results of several case studies.
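The sketch below shows, in highly simplified form, the kind of bookkeeping a communication profiler like Sigil performs: remember which function last wrote each address and, on a read by a different function, charge the access to a producer-consumer edge while separating unique from total communication. The trace format and the one-byte-per-access assumption are invented for illustration; local communication (reuse within the producer) is omitted.

```python
from collections import defaultdict

def communication_profile(trace):
    """trace: iterable of (function, 'R'/'W', address).
    Returns {(producer, consumer): [unique_bytes, total_bytes]},
    assuming one byte per access as an illustrative simplification."""
    last_writer = {}                       # address -> producing function
    seen = set()                           # (producer, consumer, address) triples
    edges = defaultdict(lambda: [0, 0])    # edge -> [unique, total]
    for func, op, addr in trace:
        if op == 'W':
            last_writer[addr] = func
        elif addr in last_writer and last_writer[addr] != func:
            edge = (last_writer[addr], func)
            edges[edge][1] += 1            # total communication
            if edge + (addr,) not in seen:
                seen.add(edge + (addr,))
                edges[edge][0] += 1        # unique communication
    return dict(edges)

trace = [("produce", 'W', 0x10), ("produce", 'W', 0x11),
         ("consume", 'R', 0x10), ("consume", 'R', 0x10), ("consume", 'R', 0x11)]
print(communication_profile(trace))      # {('produce', 'consume'): [2, 3]}
```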
Citations: 7
iBench: Quantifying interference for datacenter applications
Pub Date : 2013-09-01 DOI: 10.1109/IISWC.2013.6704667
Christina Delimitrou, C. Kozyrakis
Interference between co-scheduled applications is one of the major reasons that cause modern datacenters (DCs) to operate at low utilization. DC operators traditionally side-step interference either by disallowing colocation altogether and providing isolated server instances, or by requiring the users to express resource reservations, which are often exaggerated to counter-balance the unpredictability in the quality of allocated resources. Understanding, reducing and managing interference can significantly impact the manner in which these large-scale systems operate. We present iBench, a novel workload suite that helps quantify the pressure different applications put on various shared resources, and similarly the pressure they can tolerate in these resources. iBench consists of a set of carefully-crafted benchmarks that induce interference of increasing intensity in resources that span the CPU, cache hierarchy, memory, storage and networking subsystems. We first validate the effect that iBench workloads have on performance against a wide spectrum of DC applications. Then, we use iBench to demonstrate the importance of considering interference in a set of challenging problems that range from DC scheduling and server provisioning, to resource-efficient application development and scheduling for heterogeneous CMPs. In all cases quantifying interference with iBench results in significant performance and/or efficiency improvements. We plan to release iBench under a free software license.
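A toy example of the idea behind such interference sources is sketched below: a memory-streaming loop whose intensity knob controls the fraction of each period spent generating traffic. The duty-cycle design, the array size, and the parameters are assumptions for illustration only, not iBench's actual implementation.

```python
import time
import numpy as np

def memory_pressure(intensity, duration_s=2.0, size_mb=64):
    """Illustrative interference source: stream over a large array, with
    'intensity' in [0, 1] controlling the fraction of each period spent
    streaming versus sleeping (a duty-cycle knob, not iBench's design)."""
    a = np.zeros(size_mb * 1024 * 1024 // 8)       # ~size_mb MB of doubles
    period, end = 0.1, time.time() + duration_s
    while time.time() < end:
        busy_until = time.time() + intensity * period
        while time.time() < busy_until:
            a += 1.0                               # streaming read-modify-write
        time.sleep(max(0.0, (1.0 - intensity) * period))

for level in (0.25, 0.5, 1.0):
    t0 = time.time()
    memory_pressure(level, duration_s=1.0)
    print(f"intensity {level}: ran for {time.time() - t0:.2f}s")
```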
Citations: 106
Quantifying the energy cost of data movement in scientific applications
Pub Date : 2013-09-01 DOI: 10.1109/IISWC.2013.6704670
Gokcen Kestor, R. Gioiosa, D. Kerbyson, A. Hoisie
In the exascale era, the energy cost of moving data across the memory hierarchy is expected to be two orders of magnitude higher than the cost of performing a double-precision floating point operation. Despite its importance, the energy cost of data movement in scientific applications has not been quantitatively evaluated even for current systems.
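A back-of-envelope calculation, using assumed per-operation energies chosen only to reflect the two-orders-of-magnitude gap quoted above, shows why data movement can dominate the energy budget of a bandwidth-bound kernel:

```python
# Assumed (not measured) energy costs, purely illustrative.
PJ_PER_FLOP = 20.0            # double-precision FLOP, assumed
PJ_PER_BYTE_DRAM = 250.0      # off-chip DRAM traffic per byte, assumed

def movement_share(flops, bytes_moved):
    """Fraction of total energy spent on data movement."""
    e_compute = flops * PJ_PER_FLOP
    e_movement = bytes_moved * PJ_PER_BYTE_DRAM
    return e_movement / (e_compute + e_movement)

# A stream-like kernel moving roughly 24 bytes of DRAM traffic per FLOP.
print(f"data movement share: {movement_share(1e9, 24e9):.1%}")   # ~99.7%
```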
Citations: 110
On the performance and energy-efficiency of multi-core SIMD CPUs and CUDA-enabled GPUs
Pub Date : 2013-09-01 DOI: 10.1109/IISWC.2013.6704683
Ronald Duarte, Resit Sendag, F. J. Vetter
This paper explores the performance and energy efficiency of CUDA-enabled GPUs and multi-core SIMD CPUs using a set of kernels and full applications. Our implementations efficiently exploit both SIMD and thread-level parallelism on multi-core CPUs and the computational capabilities of CUDA-enabled GPUs. We discuss general optimization techniques for our CPU-only and CPU-GPU platforms. To fairly study performance and energy-efficiency, we also used two applications which utilize several kernels. Finally, we present an evaluation of the implementation effort required to efficiently utilize multi-core SIMD CPUs and CUDA-enabled GPUs for the benchmarks studied. Our results show that kernel-only performance and energy-efficiency could be misleading when evaluating parallel hardware; therefore, true results must be obtained using full applications. We show that, after all respective optimizations have been made, the best performing and energy-efficient platform varies for different benchmarks. Finally, our results show that PPEH (Performance gain Per Effort Hours), our newly introduced metric, can effectively be used to quantify efficiency of implementation effort across different benchmarks and platforms.
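The PPEH computation can be sketched as follows; the exact definition in the paper may differ, so this simply follows the metric's name literally (speedup achieved divided by implementation effort in hours), and the numbers are hypothetical:

```python
def ppeh(baseline_time_s, optimized_time_s, effort_hours):
    """Performance gain Per Effort Hours: speedup divided by the effort
    it took to achieve it (interpretation inferred from the name)."""
    speedup = baseline_time_s / optimized_time_s
    return speedup / effort_hours

# Hypothetical numbers: a kernel tuned for a SIMD CPU vs ported to CUDA.
print(f"SIMD CPU : {ppeh(120.0, 30.0, 8):.2f} per hour")   # 4x speedup, 8 hours
print(f"CUDA GPU : {ppeh(120.0, 12.0, 40):.2f} per hour")  # 10x speedup, 40 hours
```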
Citations: 7
WiBench: An open source kernel suite for benchmarking wireless systems
Pub Date : 2013-09-01 DOI: 10.1109/IISWC.2013.6704678
Qi Zheng, Yajing Chen, R. Dreslinski, C. Chakrabarti, A. Anastasopoulos, S. Mahlke, T. Mudge
The rapid growth in the number of mobile devices and the higher data rate requirements of mobile subscribers have made wireless signal processing a key driving application of mobile computing technology. To design better mobile platforms and the supporting wireless infrastructure, it is very important for computer architects and system designers to understand and characterize the performance of existing and upcoming wireless protocols. In this paper, we present a newly developed open-source benchmark suite called WiBench. It consists of a wide range of signal processing kernels used in many mainstream standards such as 802.11, WCDMA and LTE. The kernels include FFT/IFFT, MIMO, channel estimation, channel coding, constellation mapping, etc. Each kernel is a self-contained configurable block which can be tuned to meet the different system requirements. Several standard channel models have also been included to study system performance, such as the bit error rate. The suite also contains an LTE uplink system as a representative example of a wireless system that can be built using these kernels. WiBench is provided in C++ to make it easier for computer architects to profile and analyze the system. We characterize the performance of WiBench to illustrate how it can be used to guide hardware system design. Architectural analyses on each individual kernel and on the entire LTE uplink are performed, indicating the hotspots, available parallelism, and runtime performance. Finally, a MATLAB version is also included for debugging purposes.
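As a flavor of the kind of kernel plus channel model the suite contains, the sketch below simulates QPSK constellation mapping over an AWGN channel and measures the bit error rate. It is written in Python for brevity and is not code from WiBench, which is provided in C++.

```python
import numpy as np

def qpsk_ber(snr_db, nbits=200_000, seed=0):
    """Simulate QPSK over an AWGN channel and return the bit error rate,
    in the spirit of the constellation-mapping and channel-model kernels
    described for WiBench (this is not code from the suite)."""
    rng = np.random.default_rng(seed)
    bits = rng.integers(0, 2, nbits)
    symbols = (1 - 2 * bits[0::2]) + 1j * (1 - 2 * bits[1::2])   # Gray-mapped QPSK
    symbols /= np.sqrt(2)                                        # unit symbol energy
    noise_std = np.sqrt(10 ** (-snr_db / 10) / 2)                # per-dimension std
    rx = symbols + noise_std * (rng.standard_normal(symbols.size)
                                + 1j * rng.standard_normal(symbols.size))
    detected = np.empty(nbits, dtype=int)
    detected[0::2] = (rx.real < 0).astype(int)                   # hard decisions
    detected[1::2] = (rx.imag < 0).astype(int)
    return np.mean(detected != bits)

for snr in (0, 4, 8):
    print(f"SNR {snr} dB: BER = {qpsk_ber(snr):.4f}")
```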
Citations: 33
Performance implications of System Management Mode
Pub Date : 2013-09-01 DOI: 10.1109/IISWC.2013.6704682
Brian Delgado, K. Karavanic
System Management Mode (SMM) is a special x86 processor mode that privileged software such as kernels or hypervisors cannot access or interrupt. Previously, it has been assumed that time spent in SMM would be relatively small and therefore its side effects on privileged software were unimportant; recently, researchers have proposed uses, such as security-related checks, that would greatly increase the amount of runtime spent in this mode. We present the results of a detailed performance study to characterize the performance impacts of SMM, using measurement infrastructure we have developed. Our study includes impact to application, system, and hypervisor. We show there can be clear negative effects from prolonged preemptions. However, if SMM duration is kept within certain ranges, perturbation caused by SMIs may be kept to a minimum.
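A crude way to observe such invisible preemptions from software is to spin on a high-resolution clock and flag iterations whose time delta jumps far above the norm, as sketched below. This toy detector cannot distinguish SMIs from OS or hypervisor preemption; the authors' measurement infrastructure is more precise and is only loosely approximated here, and the 50-microsecond cutoff is an arbitrary assumption.

```python
import time

def find_gaps(duration_s=1.0, gap_us=50.0):
    """Spin on a high-resolution clock and record iterations whose delta
    jumps well above the typical loop time - a rough detector for
    preemptions that are invisible to the preempted software."""
    gaps = []
    end = time.perf_counter() + duration_s
    prev = time.perf_counter()
    while prev < end:
        now = time.perf_counter()
        delta_us = (now - prev) * 1e6
        if delta_us > gap_us:
            gaps.append(delta_us)
        prev = now
    return gaps

gaps = find_gaps()
print(f"{len(gaps)} gaps > 50us; longest = {max(gaps, default=0):.0f}us")
```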
Citations: 21
Power and performance of GPU-accelerated systems: A closer look
Pub Date : 2013-09-01 DOI: 10.1109/IISWC.2013.6704675
Yukitaka Abe, Hiroshi Sasaki, S. Kato, Koji Inoue, M. Edahiro, M. Peres
In this paper, we have presented a characterization of power and performance for GPU-accelerated systems. We selected four different NVIDIA GPUs from three generations of the GPU architecture in order to demonstrate generality of our contribution. One of our findings is that the power efficiency characteristics differ such that the best configuration is not identical between the GPUs. This evidence encourages future work on the management of power and performance for GPU-accelerated systems to benefit from dynamic voltage and frequency scaling. In future work, we plan to develop a dynamic voltage and frequency scaling algorithm for GPU-accelerated systems.
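The sketch below illustrates why the most energy-efficient configuration can differ between GPUs: given (runtime, average power) measurements per core clock, energy per run is minimized at different frequencies for different devices. All device names and numbers are hypothetical.

```python
def energy_per_run(runtime_s, power_w):
    """Energy consumed by one run of a kernel at a given configuration."""
    return runtime_s * power_w

# Hypothetical measurements for one kernel on two GPUs at three core clocks:
# frequency (MHz) -> (runtime in seconds, average power in watts).
gpus = {
    "GPU-A": {600: (10.0, 90.0), 800: (7.0, 120.0), 1000: (6.2, 170.0)},
    "GPU-B": {600: (12.0, 60.0), 800: (9.5, 100.0), 1000: (7.5, 150.0)},
}
for name, cfgs in gpus.items():
    for f, (t, p) in sorted(cfgs.items()):
        print(f"{name} @ {f} MHz: {energy_per_run(t, p):7.0f} J")
    best = min(cfgs, key=lambda f: energy_per_run(*cfgs[f]))
    print(f"  -> most energy-efficient clock for {name}: {best} MHz")
```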
Citations: 7
Characterizing data analysis workloads in data centers
Pub Date : 2013-07-30 DOI: 10.1109/IISWC.2013.6704671
Zhen Jia, Lei Wang, Jianfeng Zhan, Lixin Zhang, Chunjie Luo
As the amount of data explodes rapidly, more and more corporations are using data centers to make effective decisions and gain a competitive edge. Data analysis applications play a significant role in data centers, and hence it has become increasingly important to understand their behaviors in order to further improve the performance of data center computer systems. In this paper, after investigating three most important application domains in terms of page views and daily visitors, we choose eleven representative data analysis workloads and characterize their micro-architectural characteristics by using hardware performance counters, in order to understand the impacts and implications of data analysis workloads on the systems equipped with modern superscalar out-of-order processors. Our study on the workloads reveals that data analysis applications share many inherent characteristics, which place them in a different class from desktop (SPEC CPU2006), HPC (HPCC), and service workloads, including traditional server workloads (SPECweb2005) and scale-out service workloads (four among six benchmarks in CloudSuite), and accordingly we give several recommendations for architecture and system optimizations. On the basis of our workload characterization work, we released a benchmark suite named DCBench for typical datacenter workloads, including data analysis and service workloads, with an open-source license on our project home page at http://prof.ict.ac.cn/DCBench. We hope that DCBench is helpful for performing architecture and small-to-medium scale system research for datacenter computing.
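The micro-architectural characterization described above rests on metrics derived from hardware performance counters; a minimal sketch of that derivation (IPC and misses per kilo-instruction) is shown below. The counter names follow Linux perf event conventions, and the totals are made-up values for illustration.

```python
def derived_metrics(counters):
    """Compute IPC and misses-per-kilo-instruction (MPKI) from raw counts,
    the kind of metrics used to characterize data analysis workloads."""
    instr = counters["instructions"]
    return {
        "IPC": instr / counters["cycles"],
        "L1D MPKI": 1000.0 * counters["L1-dcache-load-misses"] / instr,
        "LLC MPKI": 1000.0 * counters["LLC-load-misses"] / instr,
    }

# Example counter totals as reported by a tool like `perf stat` (made-up values).
sample = {
    "cycles": 8.4e9,
    "instructions": 6.1e9,
    "L1-dcache-load-misses": 9.2e7,
    "LLC-load-misses": 2.3e7,
}
for name, value in derived_metrics(sample).items():
    print(f"{name:9s} = {value:.2f}")
```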
Citations: 124