
Latest publications: 2013 IEEE 21st International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems

An Offline Demand Estimation Method for Multi-threaded Applications
Juan F. Pérez, Sergio Pacheco-Sanchez, G. Casale
Parameterizing performance models for multi-threaded enterprise applications requires finding the service rates offered by worker threads to incoming requests. Statistical inference on monitoring data helps reduce the overheads of application profiling and recover missing information. While linear regression of utilization data is often used to estimate service rates, it suffers from erratic performance and ignores a large part of the application monitoring data, e.g., response times. Yet inference from other metrics, such as response times or queue-length samples, is complicated by their dependence on scheduling policies. To address these issues, we propose novel scheduling-aware estimation approaches for multi-threaded applications based on linear regression and maximum likelihood estimators. The proposed methods estimate demands from samples of the number of requests in execution in the worker threads at the admission instant of a new request. Validation results are presented on simulated and real application datasets for systems with multi-class requests, class switching, and admission control.
DOI: 10.1109/MASCOTS.2013.10 (published 2013-08-14)
Citations: 24
PAB: Parallelism-Aware Buffer Management Scheme for Nand-Based SSDs
Xufeng Guo, Jianfeng Tan, Yuping Wang
Internal buffer modules and multi-level parallel components have become standard elements of SSDs. The internal buffer module typically serves as a write cache, reducing erasures and thus improving overall performance. The multi-level parallelism is exploited to service requests in a concurrent or interleaved manner, which raises system throughput. Both aspects have been extensively discussed in the literature. However, current buffer algorithms cannot take full advantage of the parallelism inside SSDs. In this paper, we propose a novel write buffer management scheme called Parallelism-Aware Buffer (PAB). In this scheme, the buffer is divided into two parts, named the Work-Zone and the Para-Zone. Conventional buffer algorithms are employed in the Work-Zone, while the Para-Zone is responsible for reorganizing the requests evicted from the Work-Zone according to the underlying parallelism. Simulation results show that with only a small Para-Zone, PAB achieves 19.2% ~ 68.1% better performance than LRU based on a page-mapping FTL, and 5.6% ~ 35.6% better performance than BPLRU based on the state-of-the-art block-mapping FTL known as FAST.
DOI: 10.1109/MASCOTS.2013.18 (published 2013-08-14)
Citations: 7
Improving the Revenue, Efficiency and Reliability in Data Center Spot Market: A Truthful Mechanism
Kai Song, Y. Yao, L. Golubchik
Data centers are typically over-provisioned in order to meet service level agreements (SLAs) under worst-case scenarios (e.g., peak loads). Selling unused instances at discounted prices is thus a reasonable approach for data center providers to offset maintenance and operation costs. Spot market models are widely used for pricing and allocating unused instances. In this paper, we focus on mechanism design for a data center spot market (DCSM). In particular, we propose a mechanism based on a repeated uniform price auction, and prove its truthfulness. In the mechanism, to achieve better quality of service, the flexibility of adjusting bids during job execution is provided, and a bidding adjustment model is also discussed. Four metrics are used to evaluate the mechanism: in addition to the commonly used metrics in auction theory, namely revenue and efficiency, slowdown and waste are defined to capture the Quality of Service (QoS) provided by DCSMs. We prove that a uniform price auction achieves optimal efficiency among all single-price auctions in DCSMs. We also conduct comprehensive simulations to explore the performance of the resulting DCSM. The results show that (1) the bidding adjustment model helps increase revenue by an average of 5%, and decrease slowdown and waste by an average of 5% and 6%, respectively, and (2) our model with a repeated uniform price auction outperforms the current Amazon Spot Market by an average of 14% in revenue, 24% in efficiency, 13% in slowdown, and 14% in waste. Parameter tuning studies are also performed to refine the performance of our mechanism.
DOI: 10.1109/MASCOTS.2013.30 (published 2013-08-14)
Citations: 9
A Versatile Performance and Energy Simulation Tool for Composite GPU Global Memory
Bin Wang, Yizheng Jiao, Weikuan Yu, Xipeng Shen, Dong Li, J. Vetter
As a cost-effective compute device, the Graphics Processing Unit (GPU) has been widely embraced in the field of high performance computing. GPUs are characterized by massive thread-level parallelism and high memory bandwidth. Although GPUs have exhibited tremendous potential, recent GPU architecture research mainly focuses on GPU compute units; full-system exploration is rare due to the lack of accurate simulators that can reveal the hardware organization of both the GPU compute units and the memory system. To fill this void, we build a GPU simulator called VxGPUSim that supports simulation with detailed performance, timing, and power consumption statistics. Our experimental evaluation demonstrates that VxGPUSim faithfully reveals the internal execution details of GPU global memory under various memory configurations. It enables further research on the design of GPU global memory for performance and energy tradeoffs.
DOI: 10.1109/MASCOTS.2013.39 (published 2013-08-14)
Citations: 2
A VoD System for Massively Scaled, Heterogeneous Environments: Design and Implementation
Kangwook Lee, Lisa Yan, Abhay K. Parekh, K. Ramchandran
We propose, analyze and implement a general architecture for massively parallel VoD content distribution. We allow for devices that have a wide range of reliability, storage and bandwidth constraints. Each device can act as a cache for other devices and can also communicate with a central server. Some devices may be dedicated caches with no co-located users. Our goal is to allow each user device to be able to stream any movie from a large catalog, while minimizing the load of the central server. First, we architect and formulate a static optimization problem that accounts for various network bandwidth and storage capacity constraints, as well as the maximum number of network connections for each device. Not surprisingly this formulation is NP-hard. We then use a Markov approximation technique in a primal-dual framework to devise a highly distributed algorithm which is provably close to the optimal. Next we test the practical effectiveness of the distributed algorithm in several ways. We demonstrate remarkable robustness to system scale and changes in demand, user churn, network failure and node failures via a packet level simulation of the system. Finally, we describe our results from numerous experiments on a full implementation of the system with 60 caches and 120 users on 20 Amazon EC2 instances. In addition to corroborating our analytical and simulation-based findings, the implementation allows us to examine various system-level tradeoffs. Examples of this include: (i) the split between server to cache and cache to device traffic, (ii) the tradeoff between cache update intervals and the time taken for the system to adjust to changes in demand, and (iii) the tradeoff between the rate of virtual topology updates and convergence. These insights give us the confidence to claim that a much larger system on the scale of hundreds of thousands of highly heterogeneous nodes would perform as well as our current implementation.
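The system's objective, minimizing central-server load by serving streams from peer caches where possible, can be sketched with a greedy assignment. This greedy stand-in ignores the paper's optimization formulation and Markov-approximation algorithm entirely; cache contents, slot counts, and names are illustrative.

```python
# Greedy sketch of the VoD objective: serve each request from some
# cache holding the movie with a free connection slot, else fall
# back to the central server. Stand-in for the paper's algorithm.

def assign_requests(requests, caches):
    """requests: list of movie ids; caches: {name: {"movies": set,
    "slots": int}}. Returns the number of requests hitting the server."""
    server_load = 0
    for movie in requests:
        for cache in caches.values():
            if movie in cache["movies"] and cache["slots"] > 0:
                cache["slots"] -= 1  # one connection consumed
                break
        else:
            server_load += 1         # no cache could serve it
    return server_load

caches = {"c1": {"movies": {"m1", "m2"}, "slots": 1},
          "c2": {"movies": {"m2"}, "slots": 2}}
print(assign_requests(["m1", "m2", "m2", "m3"], caches))  # → 1
```

Here only the request for "m3", which no cache stores, reaches the server; the paper's contribution is doing this placement and assignment near-optimally under bandwidth, storage, and connection constraints.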
DOI: 10.1109/MASCOTS.2013.8 (published 2013-08-14)
Citations: 8
Self-Tuning Batching with DVFS for Improving Performance and Energy Efficiency in Servers
Dazhao Cheng, Yanfei Guo, Xiaobo Zhou
Performance improvement and energy efficiency are two important goals in provisioning Internet services in data center servers. In this paper, we propose and develop a self-tuning request batching mechanism to achieve the two correlated goals simultaneously. The batching mechanism increases the cache hit rate at the front-tier Web server, which provides the opportunity to improve the application's performance and the energy efficiency of the server system. The core of the batching mechanism is a novel and practical two-layer control system that adaptively adjusts the batching interval and the frequency states of the CPUs according to the service level agreement and the workload characteristics. The batching control adopts a self-tuning fuzzy model predictive control approach for application performance improvement. The power control dynamically adjusts CPU frequency with DVFS in response to workload fluctuations for energy efficiency. A coordinator between the two control loops achieves the desired performance and energy efficiency. We implement the mechanism in a test bed, and experimental results demonstrate that the new approach significantly improves application performance in terms of system throughput and average response time. The results also show that it reduces the energy consumption of the server system by 13% at the same time.
DOI: 10.1109/MASCOTS.2013.12 (published 2013-08-14)
Citations: 12
Transforming System Load to Throughput for Consolidated Applications
Andrej Podzimek, L. Chen
Today's computing systems monitor and collect a large number of system load statistics, e.g., time series of CPU utilization, but utilization traces do not directly reflect application performance metrics such as response time and throughput. Indeed, resource utilization is the output of conventional performance evaluation approaches, such as queueing models and benchmarking, and often for a single application. In this paper, we address the following research question: how can utilization traces from consolidated applications be turned into estimates of application performance metrics? To this end, we developed "Showstopper", a novel and lightweight benchmarking methodology and tool that orchestrates the execution of multi-threaded benchmarks on a multi-core system in parallel, so that the CPU load follows the utilization traces and application performance metrics can be estimated efficiently. To generate the desired loads, Showstopper alternates the stopped and runnable states of multiple benchmarks in a distributed fashion, dynamically adjusting their duty cycles using feedback control mechanisms. Our preliminary evaluation shows that Showstopper can sustain the target loads within 5% error and obtain reliable throughput estimates for DaCapo benchmarks executed on Linux/x86-64 platforms.
DOI: 10.1109/MASCOTS.2013.37 (published 2013-08-14)
Citations: 5
Towards Machine Learning-Based Auto-tuning of MapReduce
N. Yigitbasi, Theodore L. Willke, Guangdeng Liao, D. Epema
MapReduce, the de facto programming model for large-scale distributed data processing, and its most popular implementation, Hadoop, have enjoyed widespread adoption in industry during the past few years. Unfortunately, from a performance point of view, getting the most out of Hadoop is still a big challenge due to its large number of configuration parameters. Currently these parameters are tuned manually by trial and error, which is ineffective due to the large parameter space and the complex interactions among the parameters. Even worse, the parameters have to be re-tuned for different MapReduce applications and clusters. To make the parameter tuning process more effective, in this paper we explore machine learning-based performance models that we use to auto-tune the configuration parameters. To this end, we first evaluate several machine learning models with diverse MapReduce applications and cluster configurations, and we show that the support vector regression (SVR) model has good accuracy and is also computationally efficient. We further assess our auto-tuning approach, which uses the SVR performance model, against the Starfish auto-tuner, which uses a cost-based performance model. Our findings reveal that our auto-tuning approach can provide comparable, or in some cases better, performance improvements than Starfish with a smaller number of parameters. Finally, we propose and discuss a complete and practical end-to-end auto-tuning flow that combines our machine learning-based performance models with smart search algorithms for the effective training of the models and the effective exploration of the parameter space.
DOI: 10.1109/MASCOTS.2013.9 (published 2013-08-14)
引用次数: 103
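The abstract above describes training a surrogate performance model on profiled configurations and then searching the parameter space on the model instead of on real job runs. A minimal sketch of that end-to-end loop, with a quadratic least-squares fit standing in for the paper's SVR model (the runtime function and the two parameters, reducer count and sort-buffer size, are invented for illustration):

```python
import random

random.seed(7)

def true_runtime(reducers, buffer_mb):
    # Hypothetical job runtime, unknown to the tuner; optimum at (12, 200).
    return 0.5 * (reducers - 12) ** 2 + 0.001 * (buffer_mb - 200) ** 2 + 30.0

# 1. Profile a handful of random configurations to get training data.
train = [(random.uniform(1, 40), random.uniform(50, 400)) for _ in range(80)]
y = [true_runtime(r, b) for r, b in train]

# 2. Fit the surrogate performance model on scaled quadratic features.
def phi(r, b):
    rn, bn = r / 40.0, b / 400.0
    return [1.0, rn, bn, rn * rn, bn * bn, rn * bn]

X = [phi(r, b) for r, b in train]
k = len(X[0])
A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
rhs = [sum(X[n][i] * y[n] for n in range(len(X))) for i in range(k)]

def gauss_solve(A, b):
    # Plain Gaussian elimination with partial pivoting for the normal equations.
    n = len(A)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

w = gauss_solve(A, rhs)

def predict(r, b):
    return sum(wi * fi for wi, fi in zip(w, phi(r, b)))

# 3. Auto-tune: search the parameter space on the cheap surrogate
#    instead of running a real MapReduce job per candidate configuration.
candidates = [(random.uniform(1, 40), random.uniform(50, 400)) for _ in range(5000)]
best = min(candidates, key=lambda c: predict(*c))
print("best config (reducers, buffer_mb):", best)
```

The search step here is plain random sampling; the paper's proposed flow pairs the learned model with smarter search algorithms, but the division of labor — expensive profiling only for training, cheap model evaluations for exploration — is the same.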
Capacity of Simple Multiple-Input-Single-Output Wireless Networks over Uniform or Fractal Maps 均匀或分形地图上简单多输入-单输出无线网络的容量
P. Jacquet
We want to estimate the average capacity of MISO networks when several simultaneous emitters and a single access point are randomly distributed in an infinite fractal map embedded in a space of dimension D. We first show that the average capacity is a constant when the nodes are uniformly distributed in the space. This constant is a function of the space dimension and of the signal attenuation factor, and it holds even in the presence of non-i.i.d. fading effects. Second, we extend the analysis to fractal maps with a non-integer dimension. In this case the constant still holds, with the fractal dimension replacing D, but the capacity shows small periodic oscillations around this constant as the node density varies. The practical consequence of this result is that the capacity increases significantly when the network map has a small fractal dimension.
我们希望估算 MISO 网络的平均容量,即当多个同时发射器和一个接入点随机分布在嵌入维数为 D 的空间的无限分形图中时的平均容量。这个常数是空间维度和信号衰减系数的函数,即使存在非 i.i.d.衰减效应也成立。其次,我们将分析扩展到非整数维度的分形图。在这种情况下,用分形维数代替 D,常数仍然成立,但当节点密度变化时,容量会在该常数附近出现小的周期性振荡。这一结果的实际结果是,当网络图的分形维数较小时,容量会显著增加。
{"title":"Capacity of Simple Multiple-Input-Single-Output Wireless Networks over Uniform or Fractal Maps","authors":"P. Jacquet","doi":"10.1109/MASCOTS.2013.66","DOIUrl":"https://doi.org/10.1109/MASCOTS.2013.66","url":null,"abstract":"We want to estimate the average capacity of MISO networks when several simultaneous emitters and a single access point are randomly distributed in an infinite fractal map embedded in a space of dimension D. We first show that the average capacity is a constant when the nodes are uniformly distributed in the space. This constant is function of the space dimension and of the signal attenuation factor, it holds even in presence of non i.i.d. fading effects. We second extend the analysis to fractal maps with a non integer dimension. In this case the constant still holds with the fractal dimension replacing D but the capacity shows small periodic oscillation around this constant when the node density varies. The practical consequence of this result is that the capacity increases significantly when the network map has a small fractal dimension.","PeriodicalId":385538,"journal":{"name":"2013 IEEE 21st International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130771548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 9
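The quantity being averaged is a Shannon capacity over random node placements, which lends itself to Monte Carlo estimation. A sketch under simplifying assumptions of ours (not the paper's exact model or its constancy proof): emitters uniform in a disc (D = 2), power-law attenuation r**-alpha, a single access point at the origin, with the closest emitter treated as the useful signal and all others as interference:

```python
import math
import random

random.seed(1)

def sample_capacity(n_emitters=20, radius=100.0, alpha=3.0):
    # r = R * sqrt(u), u uniform in (0, 1], gives a uniform areal density
    # in the disc; 1 - random.random() avoids a zero distance.
    dists = [radius * math.sqrt(1.0 - random.random())
             for _ in range(n_emitters)]
    powers = sorted(d ** -alpha for d in dists)  # received powers, ascending
    signal, interference = powers[-1], sum(powers[:-1])
    return math.log2(1.0 + signal / interference)

# Average Shannon capacity E[log2(1 + SIR)] over random placements.
trials = 20000
avg = sum(sample_capacity() for _ in range(trials)) / trials
print(f"estimated average capacity: {avg:.3f} bit/s/Hz")
```

Repeating the experiment at different node densities (same setup, different radius at fixed emitter count per unit area) is how one would probe the paper's claim that the average capacity stays constant in the uniform case.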
Configuring Cloud Admission Policies under Dynamic Demand 配置动态需求下云接入策略
Merve Unuvar, Y. Doganata, A. Tantawi
We consider the problem of admitting sets of possibly heterogeneous virtual machines (VMs) with stochastic resource demands onto physical machines (PMs) in a Cloud environment. The objective is to achieve a specified quality of service, related to the probability of resource over-utilization under uncertain load, while minimizing the rejection probability of VM requests. We introduce a method that relies on approximating the probability distribution of the total resource demand on PMs and estimating the probability of over-utilization. We compare our method to two simple admission policies: admission based on maximum demand and admission based on average demand. We investigate the efficiency of our method on a simulated Cloud environment, where we analyze the effects of various parameters (commitment factor, coefficient of variation, etc.) on the solution for highly variable demands.
我们考虑了在云环境中允许具有随机资源需求的虚拟机(vm)集合到物理机(pm)上的问题。目标是在不确定的负载条件下实现与资源过度利用概率相关的指定服务质量,同时最小化VM请求的拒绝概率。提出了一种基于估算项目总资源需求概率分布和过度利用概率的方法。我们将我们的方法与两种简单的录取政策进行比较:基于最大需求的录取和基于平均需求的录取。我们研究了在模拟云环境中使用我们的方法的结果的效率,我们分析了各种参数(承诺因子,变异系数等)对高变量需求解决方案的影响。
{"title":"Configuring Cloud Admission Policies under Dynamic Demand","authors":"Merve Unuvar, Y. Doganata, A. Tantawi","doi":"10.1109/MASCOTS.2013.42","DOIUrl":"https://doi.org/10.1109/MASCOTS.2013.42","url":null,"abstract":"We consider the problem of admitting sets of, possibly heterogenous, virtual machines (VMs) with stochastic resource demands onto physical machines (PMs) in a Cloud environment. The objective is to achieve a specified quality-of-service related to the probability of resource over-utilization in an uncertain loading condition, while minimizing the rejection probability of VM requests. We introduce a method which relies on approximating the probability distribution of the total resource demand on PMs and estimating the probability of over-utilization. We compare our method to two simple admission policies: admission based on maximum demand and admission based on average demand. We investigate the efficiency of the results of using our method on a simulated Cloud environment where we analyze the effects of various parameters (commitment factor, coefficient of variation etc.) on the solution for highly variate demands.","PeriodicalId":385538,"journal":{"name":"2013 IEEE 21st International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems","volume":"202 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132501722","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 5
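The core admission test described above — approximate the distribution of the PM's total demand, then admit only if the estimated over-utilization probability stays below a QoS target — can be sketched with a Gaussian approximation (our assumption for illustration; the paper's approximation and parameters may differ). Each VM carries a known demand mean and variance, and demands are assumed independent:

```python
import math

def normal_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def overflow_probability(vms, capacity):
    # vms: list of (mean_demand, variance) pairs; the total demand is
    # approximated as normal with summed mean and variance.
    mu = sum(m for m, _ in vms)
    var = sum(v for _, v in vms)
    if var == 0.0:
        return 0.0 if mu <= capacity else 1.0
    return 1.0 - normal_cdf((capacity - mu) / math.sqrt(var))

def admit(placed_vms, new_vm, capacity, qos=0.05):
    # Admit iff P(total demand > capacity) stays within the QoS target.
    return overflow_probability(placed_vms + [new_vm], capacity) <= qos

pm = [(10.0, 4.0)] * 6  # six VMs already placed: mean 10, variance 4 each
print(admit(pm, (10.0, 4.0), capacity=80.0))  # → True  (P ≈ 0.03)
print(admit(pm, (25.0, 9.0), capacity=80.0))  # → False (P ≈ 0.81)
```

The two baseline policies from the abstract fall out as special cases of the same check: admission on maximum demand compares sum-of-peaks against capacity, and admission on average demand compares only the means, ignoring the variance term entirely.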