
2015 IEEE International Symposium on Workload Characterization: Latest Publications

Quantifying the Performance Impact of Memory Latency and Bandwidth for Big Data Workloads
Pub Date : 2015-10-04 DOI: 10.1109/IISWC.2015.32
R. Clapp, Martin Dimitrov, Karthik Kumar, Vish Viswanathan, Thomas Willhalm
In recent years, DRAM technology improvements have scaled at a much slower pace than processors. While server processor core counts grow by 33% to 50% on a yearly cadence, DDR3/4 memory channel bandwidth has grown at a slower rate, and memory latency has remained relatively flat for some time. Combined with new computing paradigms such as big data analytics, which involves analyzing massive volumes of data in real time, this trend puts increasing pressure on the memory subsystem. It is therefore important for computer architects to understand how sensitive the performance of big data workloads is to memory bandwidth and latency, and how these workloads compare to more conventional workloads. To address this, we present straightforward analytic equations that quantify the impact of memory bandwidth and latency on workload performance, leveraging data measured from performance counters on real systems. We demonstrate how the values of the components of these equations can be used to classify workloads according to their inherent bandwidth requirement and latency sensitivity. Using this performance model, we show the relative sensitivities of big data, high-performance computing, and enterprise workload classes to changes in memory bandwidth and latency.
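The paper's actual equations are not reproduced in this abstract; the sketch below is only a rough illustration of this style of counter-driven analytic model. The functional form, coefficients, and parameter names (`effective_latency`, `estimated_cpi`, `overlap_factor`) are assumptions for illustration, not the model from Clapp et al.

```python
# Hedged sketch of a simple counter-driven memory performance model.
# The functional form and all constants are illustrative assumptions,
# not the equations from the paper.

def effective_latency(idle_latency_ns, bw_demand_gbs, bw_peak_gbs):
    """Idle latency inflated by a queuing-like factor as bandwidth
    utilization approaches saturation (illustrative relationship)."""
    utilization = min(bw_demand_gbs / bw_peak_gbs, 0.99)
    return idle_latency_ns / (1.0 - utilization)

def estimated_cpi(cpi_core, misses_per_instr, latency_ns, freq_ghz,
                  overlap_factor=0.7):
    """CPI = core CPI + exposed memory stall cycles per instruction.
    overlap_factor models latency hidden by out-of-order execution."""
    stall_cycles = misses_per_instr * latency_ns * freq_ghz
    return cpi_core + (1.0 - overlap_factor) * stall_cycles

if __name__ == "__main__":
    # Hypothetical numbers of the kind measured from performance counters.
    lat = effective_latency(idle_latency_ns=80.0,
                            bw_demand_gbs=40.0, bw_peak_gbs=60.0)
    cpi = estimated_cpi(cpi_core=0.6, misses_per_instr=0.002,
                        latency_ns=lat, freq_ghz=2.5)
    print(f"effective latency: {lat:.1f} ns, estimated CPI: {cpi:.2f}")
```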
Citations: 30
How Good Are Low-Power 64-Bit SoCs for Server-Class Workloads?
Pub Date : 2015-10-04 DOI: 10.1109/IISWC.2015.21
R. Azimi, Xin Zhan, S. Reda
Emerging system-on-a-chip (SoC)-based microservers promise higher energy efficiency by drastically reducing power consumption, albeit at the expense of performance. In this paper we thoroughly evaluate the performance and energy efficiency of two 64-bit, eight-core ARM and x86 SoCs on a number of parallel scale-out benchmarks and high-performance computing benchmarks. We characterize the workloads on these servers and elaborate on the impact of the SoC architecture, memory hierarchy, and system design on the performance and energy-efficiency outcomes. We also contrast the results against those of standard x86 servers.
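As a minimal illustration of the kind of energy-efficiency comparison such an evaluation performs, the sketch below computes operations per joule for two hypothetical servers. The numbers are placeholders, not measurements from the paper.

```python
# Minimal sketch: comparing energy efficiency (work per joule) of two
# servers from benchmark measurements. All values are placeholders.

def energy_efficiency(ops_completed, avg_power_watts, runtime_s):
    """Operations per joule for one benchmark run."""
    return ops_completed / (avg_power_watts * runtime_s)

arm_soc = energy_efficiency(ops_completed=1.0e9, avg_power_watts=20.0,
                            runtime_s=300.0)
x86_soc = energy_efficiency(ops_completed=1.8e9, avg_power_watts=45.0,
                            runtime_s=300.0)
print(f"ARM SoC : {arm_soc:.0f} ops/J")
print(f"x86 SoC : {x86_soc:.0f} ops/J")
```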
Citations: 8
I/O Characteristics of Smartphone Applications and Their Implications for eMMC Design
Pub Date : 2015-10-04 DOI: 10.1109/IISWC.2015.8
Deng Zhou, Wen Pan, Wei Wang, T. Xie
A vast majority of smartphones use eMMC (embedded multimedia card) devices as their storage subsystems. Recent studies reveal that the storage subsystem is a significant contributor to the performance of smartphone applications. Nevertheless, smartphone applications' block-level I/O characteristics and their implications for eMMC design are still poorly understood. In this research, we collect and analyze block-level I/O traces from 18 common applications (e.g., Email and Twitter) on a Nexus 5 smartphone. We observe several I/O characteristics from which we derive implications for eMMC design. For example, we find that in 15 of the 18 traces the majority of requests (44.9%-57.4%) are small, single-page (4KB) requests. The implication is that small requests should be served rapidly so that the overall performance of an eMMC device can be boosted. Next, we conduct a case study to demonstrate how to apply these implications to optimize eMMC design. Inspired by two of the implications, we propose a hybrid-page-size (HPS) eMMC. Experimental results show that the HPS scheme can reduce mean response time by up to 86% while improving space utilization by up to 24.2%.
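The kind of statistic quoted above (the share of single-page requests) can be computed directly from a block-level trace. The sketch below assumes a simple CSV trace with a `size_bytes` column; the trace format and column name are assumptions, not the authors' tooling.

```python
# Illustrative sketch: fraction of small single-page (4 KB) requests in a
# block-level I/O trace stored as CSV. Column names are assumptions.
import csv

PAGE_SIZE = 4096

def small_request_fraction(trace_path):
    total = small = 0
    with open(trace_path, newline="") as f:
        for row in csv.DictReader(f):
            size = int(row["size_bytes"])  # assumed column name
            total += 1
            if size <= PAGE_SIZE:
                small += 1
    return small / total if total else 0.0

# Example usage with a hypothetical trace file:
# print(f"{small_request_fraction('twitter_blk_trace.csv'):.1%} small requests")
```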
Citations: 31
Characterizing Disk Failures with Quantified Disk Degradation Signatures: An Early Experience
Pub Date : 2015-10-04 DOI: 10.1109/IISWC.2015.26
Song Huang, Song Fu, Quan Zhang, Weisong Shi
With the advent of cloud computing and online services, large enterprises rely heavily on their data centers to serve end users. Among server components, hard disk drives are known to contribute significantly to server failures. Disk failures, along with their impact on storage system performance and operating costs, are becoming an increasingly important concern for data center designers and operators. However, there is very little understanding of the characteristics of disk failures in data centers. Effective disk failure management and data recovery also require a deep understanding of the nature of disk failures. In this paper, we present a systematic approach that provides a holistic and insightful view of disk failures. We study a large-scale storage system from a production data center. We categorize disk failures based on their distinctive manifestations and properties. We then characterize the degradation from disk errors to failures by deriving degradation signatures for each failure category. The influence of disk health attributes on failure degradation is also quantified. We discuss leveraging the derived degradation signatures to forecast disk failures even in their early stages. To the best of our knowledge, this is the first work that shows how to discover categories of disk failures and characterize their degradation processes in a production data center.
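One plausible reading of a "degradation signature" is the average rate of change of each disk health attribute (e.g., SMART counters) in the window preceding failure, computed per failure category. The sketch below implements that reading; it is an assumption about the idea, not the paper's actual procedure, and the data layout is hypothetical.

```python
# Hedged sketch: per-category degradation signatures as the mean slope of
# each health attribute's time series before failure. Illustrative only.
from collections import defaultdict
import statistics

def attribute_slope(samples):
    """Least-squares slope of an attribute's time series [(t, value), ...]."""
    ts = [t for t, _ in samples]
    vs = [v for _, v in samples]
    t_bar, v_bar = statistics.fmean(ts), statistics.fmean(vs)
    denom = sum((t - t_bar) ** 2 for t in ts) or 1.0
    return sum((t - t_bar) * (v - v_bar) for t, v in zip(ts, vs)) / denom

def degradation_signature(disks):
    """disks: list of (failure_category, {attribute: [(t, value), ...]})."""
    per_category = defaultdict(lambda: defaultdict(list))
    for category, attrs in disks:
        for attr, series in attrs.items():
            per_category[category][attr].append(attribute_slope(series))
    # Average the slopes within each category to obtain its signature.
    return {cat: {attr: statistics.fmean(slopes)
                  for attr, slopes in attr_slopes.items()}
            for cat, attr_slopes in per_category.items()}
```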
Citations: 29
Full Speed Ahead: Detailed Architectural Simulation at Near-Native Speed
Pub Date : 2015-10-04 DOI: 10.1109/IISWC.2015.29
Andreas Sandberg, Nikos Nikoleris, Trevor E. Carlson, Erik Hagersten, S. Kaxiras, D. Black-Schaffer
Cycle-level microarchitectural simulation is the de facto standard for estimating the performance of next-generation platforms. Unfortunately, the level of detail needed for accurate simulation requires complex, and therefore slow, simulation models that run thousands of times slower than native execution. With the introduction of sampled simulation, it has become possible to simulate only the key, representative portions of a workload in a reasonable amount of time and reliably estimate its overall performance. These sampling methodologies provide the ability to identify regions for detailed execution, and through microarchitectural state checkpointing, one can quickly and easily determine the performance characteristics of a workload for a variety of microarchitectural changes. While this strategy of sampling simulations to generate checkpoints performs well for static applications, more complex scenarios involving hardware-software co-design (such as co-optimizing both a Java virtual machine and the microarchitecture it is running on) cause this methodology to break down, as new microarchitectural checkpoints are needed for each memory-hierarchy configuration and software version. Solutions are therefore needed that enable fast and accurate simulation while also addressing the needs of hardware-software co-design and exploration. In this work we present a methodology to enhance checkpoint-based sampled simulation. Our solution integrates hardware virtualization to provide near-native speed, virtualized fast-forwarding to regions of interest, and parallel detailed simulation. However, since we cannot warm the simulated caches during virtualized fast-forwarding, we develop a novel approach to estimate the error introduced by limited cache warming, through the use of optimistic and pessimistic warming simulations. Using virtualized fast-forwarding (which operates at 90% of native speed on average), we demonstrate a parallel sampling simulator that can be used to accurately estimate the IPC of standard workloads with an average error of 2.2% while still reaching an execution rate of 2.0 GIPS (63% of native) on average. Additionally, we demonstrate that our parallelization strategy scales almost linearly and simulates one core at up to 93% of its native execution rate, 19,000x faster than detailed simulation, while using 8 cores.
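The optimistic/pessimistic warming idea described above lends itself to a small sketch: run the same sample twice, once treating accesses to unwarmed cache lines as hits and once as misses, and report the resulting IPC band. The `run_detailed` hook below is a hypothetical simulator interface, not a real gem5 or FSA API.

```python
# Minimal sketch of bounding cache-warming error with optimistic and
# pessimistic detailed runs of the same sample. run_detailed() is a
# hypothetical callable into a simulator, supplied by the caller.

def ipc_with_warming_bounds(sample, run_detailed):
    """Return (IPC estimate, error bound) from two detailed runs:
    optimistic (unwarmed lines hit) and pessimistic (unwarmed lines miss)."""
    ipc_opt = run_detailed(sample, unwarmed_lines_hit=True)
    ipc_pes = run_detailed(sample, unwarmed_lines_hit=False)
    estimate = (ipc_opt + ipc_pes) / 2.0
    error_bound = (ipc_opt - ipc_pes) / 2.0
    return estimate, error_bound

# Usage with a stub simulator:
# est, err = ipc_with_warming_bounds(sample_42, run_detailed=my_simulator)
# print(f"IPC = {est:.2f} +/- {err:.2f}")
```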
Citations: 44
Evaluating the Combined Impact of Node Architecture and Cloud Workload Characteristics on Network Traffic and Performance/Cost
Pub Date : 2015-10-01 DOI: 10.1109/IISWC.2015.31
D. Z. Tootaghaj, F. Farhat, M. Arjomand, P. Faraboschi, M. Kandemir, A. Sivasubramaniam, C. Das
The combined impact of node architecture and workload characteristics on off-chip network traffic, together with a performance/cost analysis, has not previously been investigated in the context of emerging cloud applications. Motivated by this observation, this paper performs a thorough characterization of twelve cloud workloads using a full-system datacenter simulation infrastructure. We first study the inherent network characteristics of emerging cloud applications, including message inter-arrival times, packet sizes, inter-node communication overhead, self-similarity, and traffic volume. Then, we study the effect of hardware architectural metrics on network traffic. Our experimental analysis reveals that (1) the message arrival times and packet-size distributions vary across different cloud applications, (2) the inter-arrival times imply a large amount of self-similarity as the number of nodes increases, (3) the node architecture can play a significant role in shaping the overall network traffic, and (4) the applications we study can be broadly divided into those that perform better in a scale-out or a scale-up configuration at the node level, and into two categories, namely those with long-duration, low-burst flows and those with short-duration, high-burst flows. Using the results of (3) and (4), the paper discusses the performance/cost trade-offs of scale-out and scale-up approaches and proposes an analytical model that can be used to predict the communication and computation demand for different configurations. We show that the difference in performance per dollar between two node architectures (with the same total number of cores system-wide) can be as high as 154 percent, which underscores the need to accurately characterize cloud applications before precious cloud resources are wasted on the wrong architecture. The results of this study can be used for system modeling, capacity planning, and managing heterogeneous resources for large-scale system designs.
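The performance-per-dollar comparison quoted above can be illustrated with a small sketch: given measured throughput and node prices for a scale-up and a scale-out configuration with the same total core count, compute performance per dollar and the relative gap. All numbers below are placeholders, not data from the paper.

```python
# Illustrative performance-per-cost comparison for two node configurations.
# Throughput and price figures are placeholders for the sake of example.

def perf_per_dollar(throughput_ops_s, node_cost_usd, node_count):
    """Sustained throughput per dollar of hardware cost."""
    return throughput_ops_s / (node_cost_usd * node_count)

scale_up  = perf_per_dollar(throughput_ops_s=9.0e5, node_cost_usd=12000, node_count=4)
scale_out = perf_per_dollar(throughput_ops_s=7.5e5, node_cost_usd=2500,  node_count=16)

gap = abs(scale_up - scale_out) / min(scale_up, scale_out) * 100
print(f"scale-up : {scale_up:.2f} ops/s per $")
print(f"scale-out: {scale_out:.2f} ops/s per $")
print(f"relative difference: {gap:.0f}%")
```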
Citations: 16