Tahir Diop, Steven Gurfinkel, J. Anderson, Natalie D. Enright Jerger
GPUs are used to speed up many scientific computations; however, to use several networked GPUs concurrently, the programmer must explicitly partition work and transmit data between devices. We propose DistCL, a novel framework that distributes the execution of OpenCL kernels across a GPU cluster. DistCL makes multiple distributed compute devices appear to be a single compute device. DistCL abstracts and manages many of the challenges associated with distributing a kernel across multiple devices, including: (1) partitioning work into smaller parts, (2) scheduling these parts across the network, (3) partitioning memory so that each part of memory is written to by at most one device, and (4) tracking and transferring these parts of memory. Converting an OpenCL application to DistCL is straightforward and requires little programmer effort, which makes it a powerful and valuable tool for exploring the distributed execution of OpenCL kernels. We compare DistCL to SnuCL, which also facilitates the distribution of OpenCL kernels. We also offer some insights: distributed execution favours compute-bound problems and large contiguous memory accesses. DistCL achieves a maximum speedup of 29.1 and an average speedup of 7.3 when distributing kernels among 32 peers over an InfiniBand cluster.
{"title":"DistCL: A Framework for the Distributed Execution of OpenCL Kernels","authors":"Tahir Diop, Steven Gurfinkel, J. Anderson, Natalie D. Enright Jerger","doi":"10.1109/MASCOTS.2013.77","DOIUrl":"https://doi.org/10.1109/MASCOTS.2013.77","url":null,"abstract":"GPUs are used to speed up many scientific computations, however, to use several networked GPUs concurrently, the programmer must explicitly partition work and transmit data between devices. We propose DistCL, a novel framework that distributes the execution of penCL kernels across a GPU cluster. DistCL makes multiple distributed compute devices appear to be a single compute device. DistCL abstracts and manages many of the challenges associated with distributing a kernel across multiple devices including: (1) partitioning work into smaller parts, (2) scheduling these parts across the network, (3) partitioning memory so that each part of memory is written to by at most one device, and (4) tracking and transferring these parts of memory. Converting an OpenCL application to DistCL is straightforward and requires little programmer effort. This makes it a powerful and valuable tool for exploring the distributed execution of OpenCL kernels. We compare DistCL to SnuCL, which also facilitates the distribution of OpenCL kernels. We also give some insights: distributed tasks favor more compute bound problems and favour large contiguous memory accesses. DistCL achieves a maximum speedup of 29.1 and average speedups of 7.3 when distributing kernels among 32 peers over an Infiniband cluster.","PeriodicalId":385538,"journal":{"name":"2013 IEEE 21st International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116025896","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reliability is an important factor to consider when designing and deploying SSDs in storage systems. Both the endurance and the retention time of flash memory are affected by the history of low-level stress and recovery patterns in flash cells, which are determined by the workload characteristics, the time over which the workload utilizes the SSD, and the FTL algorithms. Accurately assessing SSD reliability requires simulating several years of workload behavior, which is time-consuming. This paper presents a methodology that uses snapshot-based sampling and clustering techniques to reduce the simulation time while maintaining high accuracy. The methodology leverages the key insight that most of the large changes in retention time occur early in the lifetime of the SSD, whereas most of the simulation time is spent in its later stages. This allows simulation acceleration to focus on the later stages without significant loss of accuracy. We show that our approach provides an average speed-up of 12X relative to detailed simulation, with an error of 3.21% in the estimated mean and 6.42% in the estimated standard deviation of the retention times of the blocks in the SSD.
{"title":"A Novel Simulation Methodology for Accelerating Reliability Assessment of SSDs","authors":"Luyao Jiang, S. Gurumurthi","doi":"10.1109/MASCOTS.2013.46","DOIUrl":"https://doi.org/10.1109/MASCOTS.2013.46","url":null,"abstract":"Reliability is an important factor to consider when designing and deploying SSDs in storage systems. Both the endurance and the retention time of flash memory are affected by the history of low-level stress and recovery patterns in flash cells, which are determined by the workload characteristics, the time during which the workload utilizes the SSD, and the FTL algorithms. Accurately assessing SSD reliability requires simulating several years' of workload behavior, which is time consuming. This paper presents a methodology that uses snapshot-based sampling and clustering techniques to help reduce the simulation time while maintaining high accuracy. The methodology leverages the key insight that most of the large changes in retention time occur early in the lifetime of the SSD, whereas most of the simulation time is spent in its later stages. This allows simulation acceleration to focus on the later stages without significant loss of accuracy. We show that our approach provides an average speed-up of 12X relative to detailed simulation with an error of 3.21% in the estimated mean and 6.42% in the estimated standard deviation of the retention times of the blocks in the SSD.","PeriodicalId":385538,"journal":{"name":"2013 IEEE 21st International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122413355","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We study the steady-state distribution of networks of order-independent queues with negative signals that delete customers. An order-independent queue is defined by a service rate that does not depend on the order of the customers in the queue. Such an abstract discipline may be used to model complex blocking mechanisms (for instance, the Multiserver Station with Concurrent Classes of Customers). Order-independent queues are, in general, neither symmetric nor reversible. We prove that, under the usual assumptions on the arrivals, the services, and the routing of customers, such a network of queues with signals has a product-form steady-state distribution. The proof is based on the quasi-reversibility of the queues. We also present some example applications of this new analytical result.
{"title":"Networks of Order Independent Queues with Signals","authors":"Thu-Ha Dao-Thi, J. Fourneau, Minh-Anh Tran","doi":"10.1109/MASCOTS.2013.21","DOIUrl":"https://doi.org/10.1109/MASCOTS.2013.21","url":null,"abstract":"We study the steady-state distribution of networks of order independent queues with negative signals which delete customers. An Order Independent queue is defined by a service rate which is independent on the order of the customers in the queue. Such an abstract discipline may be used to model complex blocking mechanism (for instance the Multiserver Station with Concurrent Classes of Customers). Order independent queues are in general neither symmetric nor reversible. We prove that, under usual assumptions on the arrivals, the services and the routing of customers, such a network of queues with signals has a steady-state distribution with product form solution. The proof is based on the quasi-reversibility of the queues. We also present some examples of application for this new analytical result.","PeriodicalId":385538,"journal":{"name":"2013 IEEE 21st International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128520171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Network virtualization can potentially overcome Internet ossification. This technology lets multiple virtual networks run on a shared physical infrastructure. A key step lies in mapping a virtual network request to a resource allocation in the network substrate. Previous approaches to this network embedding problem assumed that a request asks for specific resources, such as network capacity or computing power. However, the end user is more interested in performance. This paper therefore considers a different request format: a request asks for a certain quality of service (QoS). The infrastructure provider must then determine the resource allocation necessary for this QoS. In particular, the provider must take into account user reaction to perceived performance and adjust the allocation dynamically. To this end, we propose an estimation mechanism based on analyzing the interaction between user behavior and network performance. This approach can dynamically adjust resource estimates when QoS requirements change. Our simulation-based experiments demonstrate that the proposed approach can satisfy user performance requirements through appropriate resource estimation. Moreover, our approach can adjust resource estimates efficiently and accurately.
{"title":"Resource Estimation for Network Virtualization through Users and Network Interaction Analysis","authors":"Bo-Chun Wang, Y. Tay, L. Golubchik","doi":"10.1109/MASCOTS.2013.65","DOIUrl":"https://doi.org/10.1109/MASCOTS.2013.65","url":null,"abstract":"Network virtualization can potentially overcome Internet ossification. This technology lets multiple virtual networks run on a shared physical infrastructure. A key step lies in mapping a virtual network request to a resource allocation in the network substrate. Previous approaches to this network embedding problem assumed the request will ask for specific resources, such as network capacity or computing power. However, the end-user is more interested in performance. This paper therefore considers a different request format, namely a request will ask for a certain quality of service (QoS). The infrastructure provider must then determine the resource allocation necessary for this QoS. In particular, the provider must take into account user reaction to perceived performance and adjust the allocation dynamically. To this end, we propose an estimation mechanism that is based on analyzing the interaction between user behavior and network performance. This approach can dynamically adjust resource estimations when QoS requirements change. Our simulation-based experiments demonstrate that the proposed approach can satisfy user performance requirements through appropriate resource estimation. Moreover, our approach can adjust resource estimations efficiently and accurately.","PeriodicalId":385538,"journal":{"name":"2013 IEEE 21st International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123965515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fault-tolerant disk arrays rely on replication or erasure coding to reconstruct lost data after a disk failure. As disk capacity increases, so does the risk of encountering irrecoverable read errors that would prevent the full recovery of the lost data. We propose a three-dimensional erasure-coding technique that reduces this risk by guaranteeing full recovery in the presence of all triple and nearly all quadruple disk failures. Our solution performs better than existing approaches, such as sets of disk arrays that use Reed-Solomon codes to protect against triple failures within each individual array. Given its very high reliability, it is especially suited to the needs of very large data sets that must be preserved over long periods of time.
{"title":"Three-Dimensional Redundancy Codes for Archival Storage","authors":"Jehan-Francois Pâris, D. Long, W. Litwin","doi":"10.1109/MASCOTS.2013.45","DOIUrl":"https://doi.org/10.1109/MASCOTS.2013.45","url":null,"abstract":"Fault-tolerant disk arrays rely on replication or erasure-coding to reconstruct lost data after a disk failure. As disk capacity increases, so does the risk of encountering irrecoverable read errors that would prevent the full recovery of the lost data. We propose a three-dimensional erasure-coding technique that reduces that risk by guaranteeing full recovery in the presence of all triple and nearly all quadruple disk failures. Our solution performs better than existing solutions, such as sets of disk arrays using Reed-Solomon codes against triple failures in each individual array. Given its very high reliability, it is especially suited to the needs of very large data sets that must be preserved over long periods of time.","PeriodicalId":385538,"journal":{"name":"2013 IEEE 21st International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems","volume":"146 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123315778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Among other statistical features, the analysis of fine-grained GPS traces from different outdoor scenarios has shown that human mobility statistically resembles Lévy walks, and this finding led to the design of the Self-similar Least-Action Walk (SLAW) mobility model. It was concluded that human mobility is scale-free and that this feature is invariant irrespective of any geographic constraints. These constraints were considered too scenario-specific and were omitted in SLAW. However, we argue that geographic constraints should not be considered an unnecessary detail, but rather an important feature of a realistic mobility model for the simulation-based performance evaluation of mobile networks. Therefore, we introduce geographic restrictions to SLAW in the form of maps. Our evaluation of the extended model (called MSLAW) shows that the introduced restrictions have a significant impact on several performance metrics relevant for opportunistic networks.
{"title":"Introducing Geographic Restrictions to the SLAW Human Mobility Model","authors":"Matthias Schwamborn, N. Aschenbruck","doi":"10.1109/MASCOTS.2013.34","DOIUrl":"https://doi.org/10.1109/MASCOTS.2013.34","url":null,"abstract":"Among other statistical features, the analysis of fine-grained GPS traces from different outdoor scenarios has shown that human mobility statistically resembles Lévy Walks and led to the design of the Self-similar Least-Action Walk (SLAW) mobility model. It was concluded that human mobility is scale-free and that this feature is invariant irrespective of any geographic constraints. These constraints were considered too scenario-specific and were omitted in SLAW. However, we argue that geographic constraints should not be considered as an unnecessary detail, but as an important feature of a realistic mobility model for the simulative performance evaluation of mobile networks. Therefore, we introduce geographic restrictions to SLAW in the form of maps. Our evaluation of the extended model (called MSLAW) shows that the introduced restrictions have a significant impact on several performance metrics relevant for opportunistic networks.","PeriodicalId":385538,"journal":{"name":"2013 IEEE 21st International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems","volume":"115 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121779691","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
I. Adams, M. Storer, Avani Wildani, E. L. Miller, B. A. Madden
A large body of work, such as system administration and intrusion detection, relies upon storage system logs and snapshots. These solutions rely on accurate system records; however, little effort has been made to verify the correctness of logging instrumentation and log reliability. We present a solution, called ExDiff, that uses expectation differencing to validate storage system logs. Our solution can identify development errors, such as the omission of a logging point, and runtime errors, such as log crashes. ExDiff uses metadata snapshots and activity logs to predict the expected state of the system and compares that with the system's actual state. Mismatches between the expected and actual metadata states can then be used to highlight gaps in log coverage, as well as to aid in identifying specific types of missing entries. We show that ExDiff provides valuable insight to system designers, administrators, and researchers by accurately identifying gaps in log coverage, providing clues useful in isolating specific types of missing log entries, and highlighting potential misunderstandings of logged actions.
{"title":"Validating Storage System Instrumentation","authors":"I. Adams, M. Storer, Avani Wildani, E. L. Miller, B. A. Madden","doi":"10.1109/MASCOTS.2013.73","DOIUrl":"https://doi.org/10.1109/MASCOTS.2013.73","url":null,"abstract":"There is a large body of work-such as system administration and intrusion detection-that relies upon storage system logs and snapshots. These solutions rely on accurate system records, however, little effort has been made to verify the correctness of logging instrumentation and log reliability. We present a solution, called ExDiff, that uses expectation differencing to validate storage system logs. Our solution can identify development errors such as the omission of a logging point and runtime errors such as log crashes. ExDiff uses metadata snapshots and activity logs to predict the expected state of the system and compares that with the system's actual state. Mismatches between the expected and actual metadata states can then be used to highlight gaps in log coverage, as well as aid in identifying specific types of missing entries. We show that ExDiff provides valuable insight to system designers, administrators and researchers by accurately identifying gaps in log coverage, providing clues useful in isolating specific types of missing log entries, and highlighting potential misunderstandings in logged action.","PeriodicalId":385538,"journal":{"name":"2013 IEEE 21st International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134550575","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Comprehensive analyses that aim to better understand the topology of real-world networks have been an important research challenge. Internet topology measurement studies provide samples of the underlying network at various levels. Although router-level Internet topology measurement systems target low-level Internet infrastructure, they primarily focus on Layer-3 connectivity and ignore the underlying multi-access links. In this paper, in addition to the thoroughly studied degree distribution, we analyze the subnet and interface distributions of major Internet topology datasets. We also investigate the impact of higher-granularity modeling at the link level versus router-level modeling. Our analysis establishes a foundation for Layer-2 Internet topology generation and introduces link-layer characteristics into network modeling.
{"title":"Impact of Multi-access Links on the Internet Topology Modeling","authors":"M. Akgun, M. H. Gunes","doi":"10.1109/MASCOTS.2013.60","DOIUrl":"https://doi.org/10.1109/MASCOTS.2013.60","url":null,"abstract":"Comprehensive analyses that aim to better understand the topology of real world networks have been an important research challenge. Internet topology measurement studies provide samples of the underlying network at various levels. Although router-level Internet topology measurement systems target low level Internet infrastructure, they primarily focus on the Layer-3 connectivity and ignore the underlying multi-access links. In this paper, in addition to the thoroughly studied degree distribution, we analyze the subnet and interface distributions of major Internet topology datasets. We also investigate the impact of the higher granularity modeling at link level versus router level modeling. Our analysis establishes a foundation for the Layer-2 Internet topology generation and introduces the link layer characteristics into the network modeling.","PeriodicalId":385538,"journal":{"name":"2013 IEEE 21st International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131029895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
To efficiently manage resources and provide guaranteed services, today's computing systems monitor and collect a large number of resource usage statistics, for example the average and time series of CPU utilization. However, little is known about the analytical distribution of resource usage, which is a crucial input for inferring performance metrics defined in service level agreements (SLAs), such as response times and throughputs. In this paper, we aim to characterize the entire distribution of CPU utilization via stochastic reward models. In particular, we first study and derive the probability density function of the utilization of widely known and applied queueing systems, namely Poisson processes, Markov-modulated Poisson processes, and time-varying Poisson processes. Second, we apply the proposed analysis to characterize the CPU usage of live production systems and simulated queueing systems. Evaluation results show that analytical characterization of the selected queueing models captures the utilization distribution of a wide range of real-life systems well, and we argue for the robustness of our methodology in further inferring system performance metrics.
{"title":"Characterization Analysis of Resource Utilization Distribution","authors":"R. Birke, L. Chen, M. Gribaudo, P. Piazzolla","doi":"10.1109/MASCOTS.2013.54","DOIUrl":"https://doi.org/10.1109/MASCOTS.2013.54","url":null,"abstract":"To efficiently manage resources and provide guaranteed services, today's computing systems monitor and collect a large number of resource usages, for example the average and time series of CPU utilization. However, little is known about the analytical distribution of resource usages, which are the crucial parameters to infer performance metrics defined in service level agreements (SLAs), such as response times and throughputs. In this paper, we aim to characterize the entire distribution of CPU utilization via stochastic reward models. In particular, we first study and derive the probability density function of the utilization of widely known and applied queuing systems, namely Poisson processes, Markov modulated Poisson processes and time-varying Poisson processes. Secondly, we apply our proposed analysis on characterizing the CPU usage of live production systems, and simulated queuing systems. Evaluation results show that analytical characterization of the selected queueing models can capture the utilization distribution of a wide range of real-life systems well, and we argue the robustness of our methodology to further infer system performance metrics.","PeriodicalId":385538,"journal":{"name":"2013 IEEE 21st International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133091804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
As businesses move their critical IT operations to multi-tenant cloud data centers, it is becoming increasingly important to provide network performance guarantees to individual tenants. Due to the impact of network congestion on the performance of many common cloud applications, recent work has focused on enabling network reservations for individual tenants. Current network reservation methods, however, do not degrade gracefully in the presence of the network oversubscription that may frequently occur in a cloud environment. In this context, for a shared data center network, we introduce the Network Satisfaction Ratio (NSR) as a measure of the satisfaction derived by a tenant from a given network reservation. NSR is defined as the ratio of the actual reserved bandwidth to the desired bandwidth of the tenant. Based on NSR, we present a novel network reservation mechanism that can admit time-varying tenant requests and can fairly distribute any degradation in NSR among the tenants in the presence of network oversubscription. We evaluate the proposed method using both a synthetic network traffic trace and a representative data center traffic trace generated by running a reduced data center job trace on a small test bed. The evaluation shows that our method adapts to changes in network reservations and provides significant and fair improvement in NSR when the data center network is oversubscribed.
{"title":"Managing Network Reservation for Tenants in Oversubscribed Clouds","authors":"Mayank Mishra, P. Dutta, Praveen Kumar, V. Mann","doi":"10.1109/MASCOTS.2013.13","DOIUrl":"https://doi.org/10.1109/MASCOTS.2013.13","url":null,"abstract":"As businesses move their critical IT operations to multi-tenant cloud data centers, it is becoming increasingly important to provide network performance guarantees to individual tenants. Due to the impact of network congestion on the performance of many common cloud applications, recent work has focused on enabling network reservation for individual tenants. Current network reservation methods, however, do not gracefully degrade in the presence of network over subscriptions that may frequently occur in a cloud environment. In this context, for a shared data center network, we introduce Network Satisfaction Ratio (NSR) as a measure of the satisfaction derived by a tenant from a given network reservation. NSR is defined as the ratio of the actual reserved bandwidth to the desired bandwidth of the tenant. Based on NSR, we present a novel network reservation mechanism that can admit time-varying tenant requests and can fairly distribute any degradation in the NSR among the tenants in presence of network over subscription. We evaluate the proposed method using both synthetic network traffic trace and representative data center traffic trace generated by running a reduced data center job trace in a small test bed. The evaluation shows that our method adapts to changes in network reservations, and it provides significant and fair improvement in NSR when the data center network is oversubscribed.","PeriodicalId":385538,"journal":{"name":"2013 IEEE 21st International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129744149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}