Parameterizing performance models for multi-threaded enterprise applications requires finding the service rates offered by worker threads to incoming requests. Statistical inference on monitoring data helps reduce the overheads of application profiling and infer missing information. While linear regression of utilization data is often used to estimate service rates, it suffers from erratic performance and ignores a large part of the available application monitoring data, e.g., response times. Yet inference from other metrics, such as response times or queue-length samples, is complicated by the dependence on scheduling policies. To address these issues, we propose novel scheduling-aware estimation approaches for multi-threaded applications based on linear regression and maximum likelihood estimators. The proposed methods estimate demands from samples of the number of requests in execution in the worker threads at the admission instant of a new request. Validation results are presented on simulated and real application datasets for systems with multi-class requests, class switching, and admission control.
{"title":"An Offline Demand Estimation Method for Multi-threaded Applications","authors":"Juan F. Pérez, Sergio Pacheco-Sanchez, G. Casale","doi":"10.1109/MASCOTS.2013.10","DOIUrl":"https://doi.org/10.1109/MASCOTS.2013.10","url":null,"abstract":"Parameterizing performance models for multi-threaded enterprise applications requires finding the service rates offered by worker threads to the incoming requests. Statistical inference on monitoring data is here helpful to reduce the overheads of application profiling and to infer missing information. While linear regression of utilization data is often used to estimate service rates, it suffers erratic performance and also ignores a large part of application monitoring data, e.g., response times. Yet inference from other metrics, such as response times or queue-length samples, is complicated by the dependence on scheduling policies. To address these issues, we propose novel scheduling-aware estimation approaches for multi-threaded applications based on linear regression and maximum likelihood estimators. The proposed methods estimate demands from samples of the number of requests in execution in the worker threads at the admission instant of a new request. Validation results are presented on simulated and real application datasets for systems with multi-class requests, class switching, and admission control.","PeriodicalId":385538,"journal":{"name":"2013 IEEE 21st International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127863412","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Internal buffer modules and multi-level parallel components have become standard elements of SSDs. The internal buffer is typically used as a write cache, reducing erasures and thus improving overall performance, while multi-level parallelism is exploited to service requests in a concurrent or interleaved manner, increasing system throughput. Both aspects have been extensively discussed in the literature; however, current buffer algorithms cannot take full advantage of the parallelism inside SSDs. In this paper, we propose a novel write buffer management scheme called Parallelism-Aware Buffer (PAB). In this scheme, the buffer is divided into two parts, the Work-Zone and the Para-Zone. Conventional buffer algorithms are employed in the Work-Zone, while the Para-Zone reorganizes the requests evicted from the Work-Zone according to the underlying parallelism. Simulation results show that with only a small Para-Zone, PAB achieves 19.2%-68.1% better performance than LRU on a page-mapping FTL, and 5.6%-35.6% better performance than BPLRU on the state-of-the-art block-mapping FTL known as FAST.
{"title":"PAB: Parallelism-Aware Buffer Management Scheme for Nand-Based SSDs","authors":"Xufeng Guo, Jianfeng Tan, Yuping Wang","doi":"10.1109/MASCOTS.2013.18","DOIUrl":"https://doi.org/10.1109/MASCOTS.2013.18","url":null,"abstract":"Recently, internal buffer module and multi-level parallel components have already become the standard elements of SSDs. The internal buffer module is always used as a write cache, reducing the erasures and thus improving overall performance. The multi-level parallelism is exploited to service requests in a concurrent or interleaving manner, which promotes the system throughput. These two aspects have been extensively discussed in the literature. However, current buffer algorithms cannot take full advantage of parallelism inside SSDs. In this paper, we propose a novel write buffer management scheme called Parallelism-Aware Buffer (PAB). In this scheme, the buffer is divided into two parts named as Work-Zone and Para-Zone respectively. Conventional buffer algorithms are employed in the Work-Zone, while the Para-Zone is responsible for reorganizing the requests evicted from Work-Zone according to the underlying parallelism. Simulation results show that with only a small size of Para-Zone, PAB can achieve 19.2% ~ 68.1% enhanced performance compared with LRU based on a page-mapping FTL, while this improvement scope becomes 5.6% ~ 35.6% compared with BPLRU based on the state-of-the-art block-mapping FTL known as FAST.","PeriodicalId":385538,"journal":{"name":"2013 IEEE 21st International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems","volume":"80 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125497350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data centers are typically over-provisioned in order to meet service level agreements (SLAs) under worst-case scenarios (e.g., peak loads). Selling unused instances at discounted prices is thus a reasonable way for data center providers to offset maintenance and operation costs. Spot market models are widely used for pricing and allocating unused instances. In this paper, we focus on mechanism design for a data center spot market (DCSM). In particular, we propose a mechanism based on a repeated uniform price auction and prove its truthfulness. To achieve better quality of service, the mechanism allows bids to be adjusted during job execution, and a bidding adjustment model is also discussed. Four metrics are used to evaluate the mechanism: in addition to revenue and efficiency, which are commonly used in auction theory, slowdown and waste are defined to capture the quality of service (QoS) provided by DCSMs. We prove that a uniform price auction achieves optimal efficiency among all single-price auctions in DCSMs. We also conduct comprehensive simulations to explore the performance of the resulting DCSM. The results show that (1) the bidding adjustment model helps increase revenue by an average of 5% and decrease slowdown and waste by an average of 5% and 6%, respectively, and (2) our model with the repeated uniform price auction outperforms the current Amazon Spot Market by an average of 14% in revenue, 24% in efficiency, 13% in slowdown, and 14% in waste. Parameter tuning studies are also performed to refine the performance of our mechanism.
{"title":"Improving the Revenue, Efficiency and Reliability in Data Center Spot Market: A Truthful Mechanism","authors":"Kai Song, Y. Yao, L. Golubchik","doi":"10.1109/MASCOTS.2013.30","DOIUrl":"https://doi.org/10.1109/MASCOTS.2013.30","url":null,"abstract":"Data centers are typically over-provisioned, in order to meet certain service level agreements (SLAs) under worst-case scenarios (e.g., peak loads). Selling unused instances at discounted prices thus is a reasonable approach for data center providers to off-set the maintenance and operation costs. Spot market models are widely used for pricing and allocating unused instances. In this paper, we focus on mechanism design for a data center spot market (DCSM). Particularly, we propose a mechanism based on a repeated uniform price auction, and prove its truthfulness. In the mechanism, to achieve better quality of service, the flexibility of adjusting bids during job execution is provided, and a bidding adjustment model is also discussed. Four metrics are used to evaluate the mechanism: in addition to the commonly used metrics in auction theory, namely, revenue, efficiency, slowdown and waste are defined to capture the Quality of Service (QoS) provided by DCSMs. We prove that a uniform price action achieves optimal efficiency among all single-price auctions in DCSMs. We also conduct comprehensive simulations to explore the performance of the resulting DCSM. The result show that (1) the bidding adjustment model helps increase the revenue by an average of 5%, and decrease the slowdown and waste by average of 5% and 6%, respectively, (2) our model with repeated uniform price auction outperforms the current Amazon Spot Market by an average of 14% in revenue, 24% in efficiency, 13% in slowdown, and by 14% in waste. Parameter tuning studies are also performed to refine the performance of our mechanism.","PeriodicalId":385538,"journal":{"name":"2013 IEEE 21st International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126055384","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
As a cost-effective compute device, the Graphics Processing Unit (GPU) has been widely embraced in the field of high performance computing. GPUs are characterized by massive thread-level parallelism and high memory bandwidth. Although GPUs have exhibited tremendous potential, recent GPU architecture research mainly focuses on the compute units, and full-system exploration is rare due to the lack of accurate simulators that can reveal the hardware organization of both the GPU compute units and its memory system. To fill this void, we build a GPU simulator called VxGPUSim that supports simulation with detailed performance, timing, and power consumption statistics. Our experimental evaluation demonstrates that VxGPUSim can faithfully reveal the internal execution details of GPU global memory under various memory configurations, enabling further research on the design of GPU global memory for performance and energy tradeoffs.
{"title":"A Versatile Performance and Energy Simulation Tool for Composite GPU Global Memory","authors":"Bin Wang, Yizheng Jiao, Weikuan Yu, Xipeng Shen, Dong Li, J. Vetter","doi":"10.1109/MASCOTS.2013.39","DOIUrl":"https://doi.org/10.1109/MASCOTS.2013.39","url":null,"abstract":"As a cost-effective compute device, Graphic Processing Unit (GPU) has been widely embraced in the field of high performance computing. GPU is characterized by its massive thread-level parallelism and high memory bandwidth. Although GPU has exhibited tremendous potential, recent GPU architecture researches mainly focus on GPU compute units and full system exploration is rare due to the lack of accurate simulators that can reveal hardware organization of both GPU compute units and its memory system. In order to fill this void, we build a GPU simulator called VxGPUSim that can support the simulation with detailed performance, timing and power consumption statistics. Our experimental evaluation demonstrates that VxGPUSim can faithfully reveal the internal execution details of GPU global memory of various memory configurations. It can enable further research on the design of GPU global memory for performance and energy tradeoffs.","PeriodicalId":385538,"journal":{"name":"2013 IEEE 21st International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems","volume":"111 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133102052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We propose, analyze, and implement a general architecture for massively parallel VoD content distribution. We allow for devices with a wide range of reliability, storage, and bandwidth constraints. Each device can act as a cache for other devices and can also communicate with a central server; some devices may be dedicated caches with no co-located users. Our goal is to allow each user device to stream any movie from a large catalog while minimizing the load on the central server. First, we architect and formulate a static optimization problem that accounts for various network bandwidth and storage capacity constraints, as well as the maximum number of network connections for each device. Not surprisingly, this formulation is NP-hard. We then use a Markov approximation technique in a primal-dual framework to devise a highly distributed algorithm that is provably close to optimal. Next, we test the practical effectiveness of the distributed algorithm in several ways. We demonstrate remarkable robustness to system scale, changes in demand, user churn, and network and node failures via a packet-level simulation of the system. Finally, we describe our results from numerous experiments on a full implementation of the system with 60 caches and 120 users on 20 Amazon EC2 instances. In addition to corroborating our analytical and simulation-based findings, the implementation allows us to examine various system-level tradeoffs, including: (i) the split between server-to-cache and cache-to-device traffic, (ii) the tradeoff between cache update intervals and the time taken for the system to adjust to changes in demand, and (iii) the tradeoff between the rate of virtual topology updates and convergence. These insights give us confidence that a much larger system, on the scale of hundreds of thousands of highly heterogeneous nodes, would perform as well as our current implementation.
{"title":"A VoD System for Massively Scaled, Heterogeneous Environments: Design and Implementation","authors":"Kangwook Lee, Lisa Yan, Abhay K. Parekh, K. Ramchandran","doi":"10.1109/MASCOTS.2013.8","DOIUrl":"https://doi.org/10.1109/MASCOTS.2013.8","url":null,"abstract":"We propose, analyze and implement a general architecture for massively parallel VoD content distribution. We allow for devices that have a wide range of reliability, storage and bandwidth constraints. Each device can act as a cache for other devices and can also communicate with a central server. Some devices may be dedicated caches with no co-located users. Our goal is to allow each user device to be able to stream any movie from a large catalog, while minimizing the load of the central server. First, we architect and formulate a static optimization problem that accounts for various network bandwidth and storage capacity constraints, as well as the maximum number of network connections for each device. Not surprisingly this formulation is NP-hard. We then use a Markov approximation technique in a primal-dual framework to devise a highly distributed algorithm which is provably close to the optimal. Next we test the practical effectiveness of the distributed algorithm in several ways. We demonstrate remarkable robustness to system scale and changes in demand, user churn, network failure and node failures via a packet level simulation of the system. Finally, we describe our results from numerous experiments on a full implementation of the system with 60 caches and 120 users on 20 Amazon EC2 instances. In addition to corroborating our analytical and simulation-based findings, the implementation allows us to examine various system-level tradeoffs. Examples of this include: (i) the split between server to cache and cache to device traffic, (ii) the tradeoff between cache update intervals and the time taken for the system to adjust to changes in demand, and (iii) the tradeoff between the rate of virtual topology updates and convergence. These insights give us the confidence to claim that a much larger system on the scale of hundreds of thousands of highly heterogeneous nodes would perform as well as our current implementation.","PeriodicalId":385538,"journal":{"name":"2013 IEEE 21st International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems","volume":"348 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133102765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Performance improvement and energy efficiency are two important goals in provisioning Internet services in data center servers. In this paper, we propose and develop a self-tuning request batching mechanism that achieves both correlated goals simultaneously. The batching mechanism increases the cache hit rate at the front-tier Web server, which provides the opportunity to improve application performance and the energy efficiency of the server system. The core of the mechanism is a novel and practical two-layer control system that adaptively adjusts the batching interval and the frequency states of the CPUs according to the service level agreement and the workload characteristics. The batching control adopts a self-tuning fuzzy model predictive control approach for application performance improvement. The power control dynamically adjusts the frequency of the CPUs with DVFS in response to workload fluctuations for energy efficiency. A coordinator between the two control loops achieves the desired performance and energy efficiency. We implement the mechanism in a testbed, and experimental results demonstrate that the new approach significantly improves application performance in terms of system throughput and average response time. The results also show that it reduces the energy consumption of the server system by 13% at the same time.
{"title":"Self-Tuning Batching with DVFS for Improving Performance and Energy Efficiency in Servers","authors":"Dazhao Cheng, Yanfei Guo, Xiaobo Zhou","doi":"10.1109/MASCOTS.2013.12","DOIUrl":"https://doi.org/10.1109/MASCOTS.2013.12","url":null,"abstract":"Performance improvement and energy efficiency are two important goals in provisioning Internet services in data center servers. In this paper, we propose and develop a self-tuning request batching mechanism to simultaneously achieve the two correlated goals. The batching mechanism increases the cache hit rate at the front-tier Web server, which provides the opportunity to improve application's performance and energy efficiency of the server system. The core of the batching mechanism is a novel and practical two-layer control system that adaptively adjusts the batching interval and frequency states of CPUs according to the service level agreement and the workload characteristics. The batching control adopts a self-tuning fuzzy model predictive control approach for application performance improvement. The power control dynamically adjusts the frequency of CPUs with DVFS in response to workload fluctuations for energy efficiency. A coordinator between the two control loops achieves the desired performance and energy efficiency. We implement the mechanism in a test bed and experimental results demonstrate that the new approach significantly improves the application's performance in terms of the system throughput and average response time. The results also illustrate it can reduce the energy consumption of the server system by 13% at the same time.","PeriodicalId":385538,"journal":{"name":"2013 IEEE 21st International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129369779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Today's computing systems monitor and collect a large number of system load statistics, e.g., time series of CPU utilization, but utilization traces do not directly reflect application performance, e.g., response time and throughput. Indeed, resource utilization is the output of conventional performance evaluation approaches, such as queueing models and benchmarking, and often for a single application. In this paper, we address the following research question: how can utilization traces from consolidated applications be turned into estimates of application performance metrics? To this end, we developed "Showstopper", a novel and lightweight benchmarking methodology and tool that orchestrates the parallel execution of multi-threaded benchmarks on a multi-core system so that the CPU load follows given utilization traces and application performance metrics can thus be estimated efficiently. To generate the desired loads, Showstopper alternates stopped and runnable states of multiple benchmarks in a distributed fashion, dynamically adjusting their duty cycles using feedback control mechanisms. Our preliminary evaluation results show that Showstopper can sustain the target loads within 5% error and obtain reliable throughput estimates for DaCapo benchmarks executed on Linux/x86-64 platforms.
{"title":"Transforming System Load to Throughput for Consolidated Applications","authors":"Andrej Podzimek, L. Chen","doi":"10.1109/MASCOTS.2013.37","DOIUrl":"https://doi.org/10.1109/MASCOTS.2013.37","url":null,"abstract":"Today's computing systems monitor and collect a large number of system load statistics, e.g., time series of CPU utilization, but utilization traces do not directly reflect application performance, e.g., response time and throughput. Indeed, resource utilization is the output of conventional performance evaluation approaches, such as queueing models and benchmarking, and often for a single application. In this paper, we address the following research question: How to turn utilization traces from consolidated applications into estimates of application performance metrics? To such an end, we developed \"Showstopper\", a novel and light-weight benchmarking methodology and tool which orchestrates execution of multi-threaded benchmarks on a multi-core system in parallel, so that the CPU load follows utilization traces and application performance metrics can thus be estimated efficiently. To generate the desired loads, Showstopper alternates stopped and runnable states of multiple benchmarks in a distributed fashion, dynamically adjusting their duty cycles using feedback control mechanisms. Our preliminary evaluation results show that Showstopper can sustain the target loads within 5% of error and obtain reliable throughput estimates for DaCapo benchmarks executed on Linux/x86-64 platforms.","PeriodicalId":385538,"journal":{"name":"2013 IEEE 21st International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130853699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MapReduce, the de facto programming model for large-scale distributed data processing, and its most popular implementation, Hadoop, have enjoyed widespread adoption in industry during the past few years. Unfortunately, from a performance point of view, getting the most out of Hadoop is still a big challenge due to the large number of configuration parameters. Currently these parameters are tuned manually by trial and error, which is ineffective due to the large parameter space and the complex interactions among the parameters. Even worse, the parameters have to be re-tuned for different MapReduce applications and clusters. To make the parameter tuning process more effective, in this paper we explore machine learning-based performance models that we use to auto-tune the configuration parameters. To this end, we first evaluate several machine learning models with diverse MapReduce applications and cluster configurations, and we show that the support vector regression (SVR) model has good accuracy and is also computationally efficient. We further assess our auto-tuning approach, which uses the SVR performance model, against the Starfish auto-tuner, which uses a cost-based performance model. Our findings reveal that our auto-tuning approach can provide comparable, or in some cases better, performance improvements than Starfish with a smaller number of parameters. Finally, we propose and discuss a complete and practical end-to-end auto-tuning flow that combines our machine learning-based performance models with smart search algorithms for effective training of the models and effective exploration of the parameter space.
{"title":"Towards Machine Learning-Based Auto-tuning of MapReduce","authors":"N. Yigitbasi, Theodore L. Willke, Guangdeng Liao, D. Epema","doi":"10.1109/MASCOTS.2013.9","DOIUrl":"https://doi.org/10.1109/MASCOTS.2013.9","url":null,"abstract":"MapReduce, which is the de facto programming model for large-scale distributed data processing, and its most popular implementation Hadoop have enjoyed widespread adoption in industry during the past few years. Unfortunately, from a performance point of view getting the most out of Hadoop is still a big challenge due to the large number of configuration parameters. Currently these parameters are tuned manually by trial and error, which is ineffective due to the large parameter space and the complex interactions among the parameters. Even worse, the parameters have to be re-tuned for different MapReduce applications and clusters. To make the parameter tuning process more effective, in this paper we explore machine learning-based performance models that we use to auto-tune the configuration parameters. To this end, we first evaluate several machine learning models with diverse MapReduce applications and cluster configurations, and we show that support vector regression model (SVR) has good accuracy and is also computationally efficient. We further assess our auto-tuning approach, which uses the SVR performance model, against the Starfish auto tuner, which uses a cost-based performance model. Our findings reveal that our auto-tuning approach can provide comparable or in some cases better performance improvements than Starfish with a smaller number of parameters. Finally, we propose and discuss a complete and practical end-to-end auto-tuning flow that combines our machine learning-based performance models with smart search algorithms for the effective training of the models and the effective exploration of the parameter space.","PeriodicalId":385538,"journal":{"name":"2013 IEEE 21st International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems","volume":"229 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116386652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We want to estimate the average capacity of MISO networks in which several simultaneous emitters and a single access point are randomly distributed in an infinite fractal map embedded in a space of dimension D. We first show that the average capacity is a constant when the nodes are uniformly distributed in the space. This constant is a function of the space dimension and of the signal attenuation factor, and it holds even in the presence of non-i.i.d. fading effects. We then extend the analysis to fractal maps with a non-integer dimension. In this case the constant still holds, with the fractal dimension replacing D, but the capacity shows small periodic oscillations around this constant as the node density varies. The practical consequence of this result is that the capacity increases significantly when the network map has a small fractal dimension.
{"title":"Capacity of Simple Multiple-Input-Single-Output Wireless Networks over Uniform or Fractal Maps","authors":"P. Jacquet","doi":"10.1109/MASCOTS.2013.66","DOIUrl":"https://doi.org/10.1109/MASCOTS.2013.66","url":null,"abstract":"We want to estimate the average capacity of MISO networks when several simultaneous emitters and a single access point are randomly distributed in an infinite fractal map embedded in a space of dimension D. We first show that the average capacity is a constant when the nodes are uniformly distributed in the space. This constant is function of the space dimension and of the signal attenuation factor, it holds even in presence of non i.i.d. fading effects. We second extend the analysis to fractal maps with a non integer dimension. In this case the constant still holds with the fractal dimension replacing D but the capacity shows small periodic oscillation around this constant when the node density varies. The practical consequence of this result is that the capacity increases significantly when the network map has a small fractal dimension.","PeriodicalId":385538,"journal":{"name":"2013 IEEE 21st International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130771548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We consider the problem of admitting sets of possibly heterogeneous virtual machines (VMs) with stochastic resource demands onto physical machines (PMs) in a Cloud environment. The objective is to achieve a specified quality of service, related to the probability of resource over-utilization under uncertain loading conditions, while minimizing the rejection probability of VM requests. We introduce a method that approximates the probability distribution of the total resource demand on a PM and estimates the probability of over-utilization. We compare our method to two simple admission policies: admission based on maximum demand and admission based on average demand. We investigate the efficiency of our method on a simulated Cloud environment, where we analyze the effects of various parameters (commitment factor, coefficient of variation, etc.) on the solution for highly variable demands.
{"title":"Configuring Cloud Admission Policies under Dynamic Demand","authors":"Merve Unuvar, Y. Doganata, A. Tantawi","doi":"10.1109/MASCOTS.2013.42","DOIUrl":"https://doi.org/10.1109/MASCOTS.2013.42","url":null,"abstract":"We consider the problem of admitting sets of, possibly heterogenous, virtual machines (VMs) with stochastic resource demands onto physical machines (PMs) in a Cloud environment. The objective is to achieve a specified quality-of-service related to the probability of resource over-utilization in an uncertain loading condition, while minimizing the rejection probability of VM requests. We introduce a method which relies on approximating the probability distribution of the total resource demand on PMs and estimating the probability of over-utilization. We compare our method to two simple admission policies: admission based on maximum demand and admission based on average demand. We investigate the efficiency of the results of using our method on a simulated Cloud environment where we analyze the effects of various parameters (commitment factor, coefficient of variation etc.) on the solution for highly variate demands.","PeriodicalId":385538,"journal":{"name":"2013 IEEE 21st International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems","volume":"202 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132501722","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}