Pub Date: 2016-06-20 | DOI: 10.1109/IWQoS.2016.7590441
Choquet integral based QoS-to-QoE mapping for mobile VoD applications
Yanwei Liu, Jinxia Liu, Zhen Xu, S. Ci
Today, accurately predicting the quality of experience (QoE) of a networking service is an important issue for network operators seeking to optimize that service. However, due to the complex multi-dimensional characteristics of QoE, QoE estimation is extremely challenging. Exploiting the strengths of quality of service (QoS) in evaluating networking performance, we use the QoS/QoE correlation to predict QoE by building a QoS-to-QoE mapping. To fully account for the inter-dependency among the QoS parameters that shape QoE, a Choquet integral based fuzzy measure method is used to map QoS to QoE. Extensive experiments on mobile VoD applications verify the advancement and effectiveness of the proposed method.
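Concretely, the discrete Choquet integral aggregates QoS scores with respect to a fuzzy measure defined on subsets of parameters, which is what lets it capture inter-dependency (e.g., delay and loss jointly hurting QoE more than their individual weights suggest). A minimal sketch, assuming hypothetical parameters, scores, and measure values rather than anything learned in the paper:

```python
# Illustrative discrete Choquet integral for QoS-to-QoE mapping.
# The parameter names, scores, and fuzzy-measure values are hypothetical.

def choquet_integral(scores, mu):
    """Aggregate criterion scores in [0, 1] with a fuzzy measure mu.

    scores: dict criterion -> normalized value in [0, 1]
    mu:     function from a frozenset of criteria to its measure in [0, 1],
            with mu(all criteria) == 1 and mu(empty set) == 0.
    """
    # Sort criteria by ascending score: x_(1) <= x_(2) <= ...
    items = sorted(scores.items(), key=lambda kv: kv[1])
    total, prev = 0.0, 0.0
    for i, (name, value) in enumerate(items):
        # A_i: the set of criteria whose score is at least x_(i)
        coalition = frozenset(n for n, _ in items[i:])
        total += (value - prev) * mu(coalition)
        prev = value
    return total

# Hypothetical fuzzy measure expressing that delay and loss interact:
weights = {
    frozenset(): 0.0,
    frozenset({"delay"}): 0.4,
    frozenset({"loss"}): 0.35,
    frozenset({"jitter"}): 0.2,
    frozenset({"delay", "loss"}): 0.9,      # super-additive joint impact
    frozenset({"delay", "jitter"}): 0.55,
    frozenset({"loss", "jitter"}): 0.5,
    frozenset({"delay", "loss", "jitter"}): 1.0,
}

qoe = choquet_integral({"delay": 0.7, "loss": 0.5, "jitter": 0.9},
                       lambda s: weights[s])
print(f"predicted QoE score: {qoe:.3f}")  # 0.650 for these toy inputs
```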
{"title":"Choquet integral based QoS-to-QoE mapping for mobile VoD applications","authors":"Yanwei Liu, Jinxia Liu, Zhen Xu, S. Ci","doi":"10.1109/IWQoS.2016.7590441","DOIUrl":"https://doi.org/10.1109/IWQoS.2016.7590441","url":null,"abstract":"Today, how to accurately predict the quality of experience (QoE) of the networking service is a very important issue for the network operator to optimize the service. However, due to the complex multi-dimensional characteristics of QoE, QoE estimation is extremely challenging. With utilizing the advantages of quality of service (QoS) in evaluating the networking performance, we exploit QoS/QoE correlation to predict QoE by building a QoSto-QoE mapping relationship. To fully consider the inter-dependency among QoS parameters towards forming the QoE, a Choquet integral based fuzzy measurement method is used to map QoS to QoE. Via extensive experiments in mobile VoD applications, the advancement and effectiveness of the proposed method are verified.","PeriodicalId":304978,"journal":{"name":"2016 IEEE/ACM 24th International Symposium on Quality of Service (IWQoS)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129578756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2016-06-20 | DOI: 10.1109/IWQoS.2016.7590386
Bandwidth-aware delayed repair in distributed storage systems
Jiajie Shen, Jiazhen Gu, Yangfan Zhou, Xin Wang
In data storage systems, data are typically stored on redundant storage nodes to ensure reliability. When storage nodes fail, the lost data can be restored on new storage nodes with the help of the redundant nodes. Such a regeneration process may itself be aborted, since storage nodes can fail while it runs. Reducing the duration of the regeneration process is therefore a well-known challenge in improving the reliability of storage systems. Delayed repair is a typical repair scheme in real-world storage systems: it reduces the overhead of the regeneration process by recovering multiple node failures simultaneously. How to reduce the regeneration time under delayed repair, however, has yet to be well addressed. Since the available bandwidth in storage systems fluctuates and strongly affects the regeneration time, we find that the key to this problem is determining when to start the regeneration process. Modeling this problem within the Lyapunov optimization framework, we propose the OMFR scheme to reduce the regeneration time. Experimental results show that OMFR can reduce cumulative regeneration time by up to 78% compared with traditional delayed repair schemes.
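The abstract does not detail OMFR, but the flavor of a Lyapunov drift-plus-penalty start-time rule can be sketched: each slot, weigh the backlog of failed nodes against the time the repair would take at the currently available bandwidth. The queue dynamics, the weight V, and the bandwidth trace below are illustrative assumptions, not the paper's formulation.

```python
# Hedged sketch of a drift-plus-penalty rule for when to start regeneration
# under fluctuating bandwidth. All constants and the trace are assumptions.

V = 0.5                # weight trading repair urgency against repair time
DATA_PER_NODE = 10.0   # units of data to regenerate per failed node

def decide_start(queue_len, bandwidth):
    """Start repairing this slot iff backlog pressure outweighs the penalty.

    queue_len: failed nodes awaiting repair (the backlog Q(t))
    bandwidth: currently available bandwidth (service rate if we repair now)
    A large backlog pushes toward repairing now; low bandwidth (a long
    repair, i.e. a high penalty) pushes toward waiting for a better slot.
    """
    repair_time = DATA_PER_NODE * queue_len / max(bandwidth, 1e-9)
    return queue_len ** 2 > V * repair_time

queue = 0
for t, bw in enumerate([2.0, 1.5, 8.0, 9.0, 3.0]):  # hypothetical trace
    queue += 1  # assume one node failure per slot for illustration
    if decide_start(queue, bw):
        print(f"slot {t}: start regenerating {queue} node(s) at bw={bw}")
        queue = 0
```

With this trace the rule waits through the low-bandwidth slots and batches the repair when bandwidth is high, which is the intuition behind delaying the start time.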
{"title":"Bandwidth-aware delayed repair in distributed storage systems","authors":"Jiajie Shen, Jiazhen Gu, Yangfan Zhou, Xin Wang","doi":"10.1109/IWQoS.2016.7590386","DOIUrl":"https://doi.org/10.1109/IWQoS.2016.7590386","url":null,"abstract":"In data storage systems, data are typically stored in redundant storage nodes to ensure storage reliability. When storage nodes fail, with the help of the redundant nodes, the lost data can be restored in new storage nodes. Such a regeneration process may be aborted, since storage nodes may fail during the process. Therefore, reducing the time of regeneration process is a well-known challenge to improve the reliability of storage systems. Delayed repair is a typical repair scheme in real-world storage systems. It reduces the overhead of the regeneration process by recovering multiple node failures simultaneously. How to reduce the regeneration time of delayed repair is yet to be well addressed. Since available bandwidth is flowing in storage systems and the regeneration time is seriously affected by the available bandwidth, we find the key to solve this problem is determining the start time of the regeneration process. Via modeling this problem with Lyaponuv optimization framework, we propose an OMFR scheme to reduce the regeneration time. The experimental results show that OMFR scheme can reduce cumulative regeneration time by up to 78% compared with traditional delayed repair schemes.","PeriodicalId":304978,"journal":{"name":"2016 IEEE/ACM 24th International Symposium on Quality of Service (IWQoS)","volume":"101 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121494810","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2016-06-20 | DOI: 10.1109/IWQoS.2016.7590439
Smartphone-assisted smooth live video broadcast on wearable cameras
Jiwei Li, Zhe Peng, Bin Xiao
Wearable cameras must connect to cellular-capable devices (e.g., smartphones) to provide live broadcast services for worldwide users when Wi-Fi is unavailable. However, constantly changing cellular network conditions may substantially slow down the upload of recorded videos. In this paper, we consider the scenario where wearable cameras upload live videos to remote distribution servers over cellular networks, aiming to maximize the quality of the uploaded videos while meeting delay requirements. To attain this goal, we propose a dynamic video coding approach that combines dynamic recording-resolution adjustment on wearable cameras with Lyapunov-based video preprocessing on smartphones. Our resolution adjustment algorithm adapts to network condition changes and reduces the overhead of video preprocessing. Owing to properties of the Lyapunov optimization framework, our video preprocessing algorithm delivers near-optimal video quality while meeting the upload delay requirements. Evaluation results show that our approach achieves up to a 50% reduction in smartphone power consumption and up to a 60% reduction in average delay, at the cost of slightly compromised video quality.
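As an illustration of resolution adjustment driven by network conditions, the sketch below picks the highest resolution whose encoded bitrate the estimated uplink can sustain while still draining any upload backlog within the delay budget. The bitrate ladder and numbers are hypothetical; the paper's actual algorithm also coordinates with the smartphone-side preprocessing.

```python
# Illustrative resolution-selection rule; all values are assumed.

# (resolution, approximate encoded bitrate in kbps) -- hypothetical ladder
LADDER = [("1080p", 4500), ("720p", 2500), ("480p", 1200), ("360p", 700)]

def choose_resolution(est_uplink_kbps, backlog_kbits, delay_budget_s):
    """Return the highest resolution that keeps upload delay within budget."""
    for res, bitrate in LADDER:  # highest resolution first
        if bitrate >= est_uplink_kbps:
            continue  # the stream alone would exceed the estimated link
        # time to drain the existing backlog with the leftover capacity
        drain_time_s = backlog_kbits / (est_uplink_kbps - bitrate)
        if drain_time_s <= delay_budget_s:
            return res
    return LADDER[-1][0]  # fall back to the lowest resolution

print(choose_resolution(est_uplink_kbps=3000,
                        backlog_kbits=1500,
                        delay_budget_s=3.0))  # -> 720p
```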
{"title":"Smartphone-assisted smooth live video broadcast on wearable cameras","authors":"Jiwei Li, Zhe Peng, Bin Xiao","doi":"10.1109/IWQoS.2016.7590439","DOIUrl":"https://doi.org/10.1109/IWQoS.2016.7590439","url":null,"abstract":"Wearable cameras require connecting to cellular-capable devices (e.g., smartphones) so as to provide live broadcast services for worldwide users when Wi-Fi is unavailable. However, the constantly changing cellular network conditions may substantially slow down the upload of recorded videos. In this paper, we consider the scenario where wearable cameras upload live videos to remote distribution servers under cellular networks, aiming at maximizing the quality of uploaded videos while meeting the delay requirements. To attain the goal, we propose a dynamic video coding approach that utilizes dynamic video recording resolution adjustment on wearable cameras and Lyapunov based video preprocessing on smartphones. Our proposed resolution adjustment algorithm adapts to network condition changes, and reduces the overheads of video preprocessing. Due to the property of Lyapunov optimization framework, our proposed video preprocessing algorithm delivers near-optimal video quality while meeting the upload delay requirements. Our evaluation results show that our approach achieves up to 50% reduction in power consumption on smartphones and up to 60% reduction in average delay, at the cost of slightly compromised video quality.","PeriodicalId":304978,"journal":{"name":"2016 IEEE/ACM 24th International Symposium on Quality of Service (IWQoS)","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127380773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2016-06-20 | DOI: 10.1109/IWQoS.2016.7590410
Reliability in future radio access networks: From linguistic to quantitative definitions
V. Suryaprakash, Ilaria Malanchini
For the first time since the advent of mobile networks, the idea of advancing their pervasiveness by co-opting them into most aspects of daily life has taken hold, and this idea is intended to be a mainstay of future networks (5G and beyond). As a result, a term frequently encountered in recent literature on radio access networks is reliability. It is, however, fairly evident that the term is mostly used in a colloquial sense or, in some cases, synonymously with availability. This work is, to the best of our knowledge, the first to provide a quantitative definition of reliability that stems from its dictionary characterization and is built on quantifiable definitions of resilience, availability, and other parameters important to radio access networks. The utility of this definition is demonstrated by developing a reliability-aware scheduler that takes channel quality predictions into account. The scheduler is compared with the classical proportional fair scheduler in use today. This comparison not only highlights the practicality of the proposed definition, but also shows that the anticipatory, reliability-aware scheduler improves reliability by about 35-50% compared to a proportional fair scheduler in contemporary use.
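To make the contrast with proportional fair concrete, here is a toy version of a reliability-aware metric: the PF ratio is discounted by a predicted probability that the user's channel will sustain its service's required rate. The prediction input and the multiplicative weighting are assumptions for illustration, not the paper's definition of reliability.

```python
# Toy comparison of PF and a reliability-aware scheduling metric.
# The user data and the weighting scheme are illustrative assumptions.

def pf_metric(inst_rate, avg_rate):
    """Classical proportional-fair metric: instantaneous over average rate."""
    return inst_rate / max(avg_rate, 1e-9)

def reliability_aware_metric(inst_rate, avg_rate, p_meet_requirement):
    # p_meet_requirement: predicted probability that the user's channel
    # sustains the service's required rate over the scheduling horizon
    return pf_metric(inst_rate, avg_rate) * p_meet_requirement

users = [  # (name, instantaneous rate, long-term average rate, prediction)
    ("A", 12.0, 10.0, 0.95),
    ("B", 20.0, 15.0, 0.40),   # fast now, but predicted to degrade
]
best = max(users, key=lambda u: reliability_aware_metric(u[1], u[2], u[3]))
print("schedule user:", best[0])  # A: PF alone would have picked B
```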
{"title":"Reliability in future radio access networks: From linguistic to quantitative definitions","authors":"V. Suryaprakash, Ilaria Malanchini","doi":"10.1109/IWQoS.2016.7590410","DOIUrl":"https://doi.org/10.1109/IWQoS.2016.7590410","url":null,"abstract":"For the first time since the advent of mobile networks, the idea of advancing their pervasiveness by co-opting them into most aspects of daily life has taken hold and this idea is, henceforth, intended to be a mainstay of future networks (5G and beyond). As a result, a term one frequently encounters in the latest literature pertinent to radio access networks is reliability. It is, however, fairly evident that it is mostly used in a colloquial linguistic sense or that, in some cases, it is used synonymously with availability. This work is, to the best of our knowledge, the first to provide a quantitative definition of reliability which stems from its characterization in the dictionary and is based on quantifiable definitions of resilience, availability, and other parameters important to radio access networks. The utility of this quantitative definition is demonstrated by developing a reliability-aware scheduler which takes predictions of the channel quality into account. The scheduler developed here is also compared with the classical proportional fair scheduler in use today. This comparison not only succeeds in highlighting the practicality of the definition provided, but it also shows that the anticipatory reliability-aware scheduler is able to provide an improvement of about 35 - 50% in reliability when compared to a proportional fair scheduler which is common in contemporary use.","PeriodicalId":304978,"journal":{"name":"2016 IEEE/ACM 24th International Symposium on Quality of Service (IWQoS)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127508642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2016-06-20 | DOI: 10.1109/IWQoS.2016.7590395
Tetris: Optimizing cloud resource usage unbalance with elastic VM
Xiao Ling, Yi Yuan, Dan Wang, Jiahai Yang
Cloud systems face an increasing number of big data applications, and it has become important for cloud providers to allocate resources so as to accommodate as many of these applications as possible. In current cloud services, e.g., Amazon EMR, a job runs on a fixed cluster, meaning that a fixed amount of resources (e.g., CPU, memory) is allocated for the life cycle of the job. We observe that resources are used inefficiently in such services because of resource usage imbalance. We therefore propose a runtime elastic VM approach in which the cloud system can increase or decrease the number of CPUs assigned to a job over time. Little change is required to services such as Amazon EMR, yet the cloud system can accommodate many more jobs. In this paper, we first present a measurement study showing the feasibility and quantitative impact of adjusting VM configurations dynamically. We then model the task and job completion times of big data applications, which drive the elastic VM adjustment decisions, and validate these models through experiments. We present Tetris, an elastic VM strategy that better optimizes cloud resource utilization to support big data applications. We further implement a Tetris prototype and comprehensively evaluate it on a real private cloud platform using a Facebook trace and a Wikipedia dataset. With Tetris, the cloud system can accommodate 31.3% more jobs.
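A minimal sketch of the runtime-elastic idea follows: rather than pinning a fixed cluster to each job, vCPUs are re-planned at phase boundaries so that CPUs idled by one job's less CPU-hungry phase can serve another. The job phases, demands, and greedy policy are hypothetical, not Tetris's model-driven decisions.

```python
# Hedged sketch: re-size per-job vCPU grants at phase boundaries.
# Phase demands and the greedy policy are illustrative assumptions.

def plan_vcpus(phase_demands, total_vcpus):
    """Greedily grant each job its current-phase demand, capped by the pool.

    phase_demands: dict job -> vCPUs wanted in its current phase.
    Smaller requests are served first so that CPUs freed by I/O-bound
    phases are not starved by one large job.
    """
    allocation, free = {}, total_vcpus
    for job, want in sorted(phase_demands.items(), key=lambda kv: kv[1]):
        grant = min(want, free)
        allocation[job] = grant
        free -= grant
    return allocation

# Map phases are CPU-hungry, reduce phases less so (assumed numbers):
print(plan_vcpus({"jobA-map": 8, "jobB-reduce": 2, "jobC-map": 6},
                 total_vcpus=12))
# {'jobB-reduce': 2, 'jobC-map': 6, 'jobA-map': 4}
```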
{"title":"Tetris: Optimizing cloud resource usage unbalance with elastic VM","authors":"Xiao Ling, Yi Yuan, Dan Wang, Jiahai Yang","doi":"10.1109/IWQoS.2016.7590395","DOIUrl":"https://doi.org/10.1109/IWQoS.2016.7590395","url":null,"abstract":"Recently, the cloud systems face an increasing number of big data applications. It becomes an important issue for the cloud providers to allocate resources so as to accommodate as many of these big data applications as possible. In current cloud service, e.g., Amazon EMR, a job runs on a fixed cluster. This means that a fixed amount of resources (e.g. CPU, memory) is allocated to the life cycle of this job. We observe that the resources are inefficiently used in such services because of resources usage unbalance. Therefore, we propose a runtime elastic VM approach where the cloud system can increase or decrease the number of CPUs at different time periods for the jobs. There is little change to such services as Amazon EMR, yet the cloud system can accommodate many more jobs. In this paper, we first present a measurement study to show the feasibility and the quantitative impact of adjusting VM configurations dynamically. We then model the task and job completion time of big data applications, which are used for elastic VM adjustment decisions. We validate our models through experiments. We present Tetris, an elastic VM strategy based on cloud system that can better optimize resource utilization to support big data applications. We further implement a Tetris prototype and comprehensively evaluate Tetris on a real private cloud platform using Facebook trace and Wikipedia dataset. We observe that with Tetris, the cloud system can accommodate 31.3% more jobs.","PeriodicalId":304978,"journal":{"name":"2016 IEEE/ACM 24th International Symposium on Quality of Service (IWQoS)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126789310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2016-06-20 | DOI: 10.1109/IWQoS.2016.7590405
FTDC: A fault-tolerant server-centric data center network
Ye Yu, Chen Qian
Server-centric data center networks enable several important features of modern data center applications, such as cloud storage and big data processing. However, network failures are ubiquitous and significantly affect network performance, including routing correctness and available bandwidth. Existing server-centric data centers do not provide specific fault-tolerance mechanisms to recover the network from failures and to keep performance from degrading. In this work, we design FTDC, a fault-tolerant network and its routing protocols. FTDC provides high bandwidth and flexibility to data center applications and achieves fault tolerance in a self-fixing manner: upon failures, the servers automatically explore valid paths to deliver packets to the destination by exchanging control messages. Experimental results show that FTDC achieves high performance with very little extra overhead during network failures.
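The self-fixing behavior can be illustrated with a toy fallback: when nodes fail, search the surviving topology for an alternative valid path. FTDC does this distributedly by exchanging control messages among servers; the sketch below assumes a global view and a toy four-server topology purely for illustration.

```python
# Toy path re-exploration around failed nodes (global-view BFS, not
# FTDC's distributed control-message protocol).

from collections import deque

def find_path(adj, failed, src, dst):
    """BFS shortest path in the topology graph, skipping failed nodes."""
    if src in failed or dst in failed:
        return None
    parent, frontier = {src: None}, deque([src])
    while frontier:
        u = frontier.popleft()
        if u == dst:
            path = []
            while u is not None:   # walk parents back to the source
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in adj[u]:
            if v not in parent and v not in failed:
                parent[v] = u
                frontier.append(v)
    return None  # no valid path survives the failures

adj = {  # hypothetical small server-centric mesh
    "s0": ["s1", "s2"], "s1": ["s0", "s3"],
    "s2": ["s0", "s3"], "s3": ["s1", "s2"],
}
print(find_path(adj, failed={"s1"}, src="s0", dst="s3"))  # routes via s2
```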
{"title":"FTDC: A fault-tolerant server-centric data center network","authors":"Ye Yu, Chen Qian","doi":"10.1109/IWQoS.2016.7590405","DOIUrl":"https://doi.org/10.1109/IWQoS.2016.7590405","url":null,"abstract":"Server-centric data center networks enable several important features of modern data center applications, such as cloud storage and big data processing. However, network failures are ubiquitous and significantly affect network performance, such as routing correctness and network bandwidth. Existing server-centric data centers do not provide specific fault-tolerance mechanisms to recover the network from failures and to protect network performance from downgrading. In this work, we design FTDC, a fault-tolerant network and its routing protocols. FTDC is developed to provide high-bandwidth and flexibility to data center applications and achieve fault tolerance in a self-fixing manner. Upon failures, the servers automatically explore valid paths to deliver packets to the destination by exchanging control messages among servers. Experimental results show that FTDC demonstrate high performance with very little extra overhead during network failures.","PeriodicalId":304978,"journal":{"name":"2016 IEEE/ACM 24th International Symposium on Quality of Service (IWQoS)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126514183","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2016-06-20 | DOI: 10.1109/IWQoS.2016.7590435
Using recurrent neural networks toward black-box system anomaly prediction
Shaohan Huang, Carol J. Fung, Kui Wang, Polo Pei, Zhongzhi Luan, D. Qian
Component-based enterprise systems are becoming extremely complex, and their availability and usability are strongly affected by system anomalies. Anomaly prediction, which aims to prevent anomalies through pre-failure warnings, is highly important for ensuring a system's stability. However, due to the system's complexity and the noise in monitoring data, capturing pre-failure symptoms is challenging. In this paper, we present sequential and averaged recurrent neural network (RNN) models for distributed systems and component-based systems. Specifically, we use a cycle representation to capture cyclical system behaviors, which improves prediction accuracy. The anomaly data used in the experiments are collected from RUBiS, IBM System S, and the component-based system of enterprise T. The experimental results show that our methods achieve high prediction accuracy with satisfactory lead time. Our recurrent neural network model is also time-efficient enough for monitoring large-scale systems.
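A hedged sketch of the sequential-model idea: feed windows of monitoring metrics through an RNN and emit the probability that an anomaly occurs within the warning lead time. The architecture, window length, and feature count below are assumptions; the cycle representation and the averaged variant are not reproduced here.

```python
# Minimal RNN-based pre-failure classifier over metric windows (PyTorch).
# Hyperparameters are illustrative, not the paper's configuration.

import torch
import torch.nn as nn

class AnomalyRNN(nn.Module):
    def __init__(self, n_features=8, hidden=32):
        super().__init__()
        self.rnn = nn.RNN(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)   # P(anomaly within lead time)

    def forward(self, x):          # x: (batch, window, n_features)
        _, h = self.rnn(x)         # h: (1, batch, hidden) final state
        return torch.sigmoid(self.head(h[-1]))

model = AnomalyRNN()
window = torch.randn(4, 30, 8)     # 4 windows of 30 monitoring samples each
print(model(window).shape)         # torch.Size([4, 1])
```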
Pub Date: 2016-06-20 | DOI: 10.1109/IWQoS.2016.7590434
LCC-Graph: A high-performance graph-processing framework with low communication costs
Yongli Cheng, F. Wang, Hong Jiang, Yu Hua, D. Feng, XiuNeng Wang
With the rapid growth of data, communication overhead has become an important concern in data center and cloud computing applications. Existing distributed graph-processing frameworks routinely suffer from high communication costs, leading to very long waits for graph-computing results. To address this problem, we propose a new computation model with low communication costs, called LCC-BSP, and use it to design and implement a high-performance distributed graph-processing framework called LCC-Graph. This framework eliminates the high communication costs of existing distributed graph-processing frameworks. Moreover, LCC-Graph minimizes the computation workload of each vertex, significantly reducing the computation time of each superstep. Evaluation of LCC-Graph on a 32-node cluster, driven by real-world graph datasets, shows that it significantly outperforms existing distributed graph-processing frameworks in runtime, particularly when the system is supported by a high-bandwidth network. For example, LCC-Graph achieves an order-of-magnitude performance improvement over GPS and GraphLab.
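While the abstract does not define LCC-BSP, one standard way a BSP model cuts communication is sender-side combining: merging all messages bound for the same remote vertex before they cross the network. The sketch below shows one superstep of this kind; the sum combiner, toy graph, and partitioning are illustrative assumptions, not LCC-BSP itself.

```python
# One BSP superstep with sender-side message combining (illustrative).

from collections import defaultdict

def superstep(local_vertices, values, edges, owner, my_rank):
    """Compute outgoing messages, combining per destination vertex."""
    outbox = defaultdict(float)            # (dest_rank, dest_vertex) -> msg
    for v in local_vertices:
        for u in edges[v]:
            # combine with '+' locally instead of one message per edge
            outbox[(owner[u], u)] += values[v] / len(edges[v])
    # only the combined messages to other ranks cross the network
    return {k: m for k, m in outbox.items() if k[0] != my_rank}

owner = {"a": 0, "b": 0, "c": 1}           # toy 2-rank partitioning
edges = {"a": ["b", "c"], "b": ["c"]}
print(superstep(["a", "b"], {"a": 1.0, "b": 0.5}, edges, owner, my_rank=0))
# {(1, 'c'): 1.0} -- two messages to 'c' merged into a single transfer
```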
{"title":"LCC-Graph: A high-performance graph-processing framework with low communication costs","authors":"Yongli Cheng, F. Wang, Hong Jiang, Yu Hua, D. Feng, XiuNeng Wang","doi":"10.1109/IWQoS.2016.7590434","DOIUrl":"https://doi.org/10.1109/IWQoS.2016.7590434","url":null,"abstract":"With the rapid growth of data, communication overhead has become an important concern in applications of data centers and cloud computing. However, existing distributed graph-processing frameworks routinely suffer from high communication costs, leading to very long waiting times experienced by users for the graph-computing results. In order to address this problem, we propose a new computation model with low communication costs, called LCC-BSP. We use this model to design and implement a high-performance distributed graph-processing framework called LCC-Graph. This framework eliminates the high communication costs in existing distributed graph-processing frameworks. Moreover, LCC-Graph also minimizes the computation workload of each vertex, significantly reducing the computation time for each superstep. Evaluation of LCC-Graph on a 32-node cluster, driven by real-world graph datasets, shows that it significantly outperforms existing distributed graph-processing frameworks in terms of runtime, particularly when the system is supported by a high-bandwidth network. For example, LCC-Graph achieves an order of magnitude performance improvement over GPS and GraphLab.","PeriodicalId":304978,"journal":{"name":"2016 IEEE/ACM 24th International Symposium on Quality of Service (IWQoS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133003116","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2016-06-20 | DOI: 10.1109/IWQoS.2016.7590423
Multi-Resource Partial-Ordered Task Scheduling in cloud computing
Chaokun Zhang, Yong Cui, Rong Zheng, E. Jinlong, Jianping Wu
In this paper, we investigate scheduling with multi-resource allocation in cloud computing environments. In contrast to existing work on flow-level scheduling, which treats flows in isolation, we consider dependency among the subtasks of applications, which imposes a partial-order relationship on execution. We formulate the problem of Multi-Resource Partial-Ordered Task Scheduling (MR-POTS) to minimize the makespan. In the first stage, the proposed Dominant Resource Priority (DRP) algorithm decides the collection of subtasks for resource allocation by taking into account the partial-order relationship and the characteristics of the subtasks. In the second stage, the proposed Maximum Utilization Allocation (MUA) algorithm partitions multiple resources among the selected subtasks with the objective of maximizing overall utilization. Both theoretical analysis and experimental evaluation demonstrate that the proposed algorithms approximately achieve the minimal makespan with high resource utilization; specifically, a 50% reduction in makespan can be achieved compared with existing scheduling schemes.
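The two-stage structure can be sketched as follows: stage one selects subtasks whose predecessors have finished and orders them by dominant-resource share (a DRP-like priority), and stage two packs the selected subtasks while capacity remains (an MUA-like allocation). The task set, capacities, and tie-breaking below are illustrative assumptions, not the paper's algorithms.

```python
# Two-stage scheduling sketch in the spirit of DRP + MUA (illustrative).

CAPACITY = {"cpu": 8, "mem": 16}

tasks = {  # name -> (resource demand, set of predecessor tasks)
    "t1": ({"cpu": 4, "mem": 2}, set()),
    "t2": ({"cpu": 2, "mem": 8}, set()),
    "t3": ({"cpu": 2, "mem": 4}, {"t1"}),   # partial order: t1 before t3
}

def dominant_share(demand):
    """Largest per-resource fraction of capacity this demand consumes."""
    return max(demand[r] / CAPACITY[r] for r in CAPACITY)

def schedule_round(done):
    # Stage 1 (DRP-like): runnable subtasks, by dominant resource share
    ready = [t for t, (_, preds) in tasks.items()
             if t not in done and preds <= done]
    ready.sort(key=lambda t: dominant_share(tasks[t][0]), reverse=True)
    # Stage 2 (MUA-like): pack subtasks while capacity remains
    free, chosen = dict(CAPACITY), []
    for t in ready:
        demand = tasks[t][0]
        if all(demand[r] <= free[r] for r in free):
            chosen.append(t)
            for r in free:
                free[r] -= demand[r]
    return chosen

print(schedule_round(done=set()))   # ['t1', 't2']; t3 waits for t1
```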
{"title":"Multi-Resource Partial-Ordered Task Scheduling in cloud computing","authors":"Chaokun Zhang, Yong Cui, Rong Zheng, E. Jinlong, Jianping Wu","doi":"10.1109/IWQoS.2016.7590423","DOIUrl":"https://doi.org/10.1109/IWQoS.2016.7590423","url":null,"abstract":"In this paper, we investigate the scheduling problem with multi-resource allocation in cloud computing environments. In contrast to existing work that focuses on flow-level scheduling, which treats flows in isolation, we consider dependency among subtasks of applications that imposes a partial order relationship in execution. We formulate the problem of Multi-Resource Partial-Ordered Task Scheduling (MR-POTS) to minimize the makespan. In the first stage, the proposed Dominant Resource Priority (DRP) algorithm decides the collection of subtasks for resource allocation by taking into account the partial order relationship and characteristics of subtasks. In the second stage, the proposed Maximum Utilization Allocation (MUA) algorithm partitions multiple resources among selected subtasks with the objective to maximize the overall utilization. Both theoretical analysis and experimental evaluation demonstrate the proposed algorithms can approximately achieve the minimal makespan with high resource utilization. Specifically, a reduction of 50% in makespan can be achieved compared with existing scheduling schemes.","PeriodicalId":304978,"journal":{"name":"2016 IEEE/ACM 24th International Symposium on Quality of Service (IWQoS)","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125865230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2016-06-20 | DOI: 10.1109/IWQoS.2016.7590409
FDALB: Flow distribution aware load balancing for datacenter networks
Shuo Wang, Jiao Zhang, Tao Huang, Tian Pan, Jiang Liu, Yun-jie Liu
We present FDALB, a flow-distribution-aware load balancing mechanism aimed at reducing flow collisions and achieving high scalability. Like most centralized methods, FDALB uses a centralized controller to obtain a view of the network and its congestion information. However, FDALB classifies flows into short flows and long flows: the paths of short flows are controlled by distributed switches, while those of long flows are controlled by the centralized controller. The controller thus handles only a small fraction of flows, which yields high scalability. To further reduce the controller's overhead, FDALB leverages end hosts to tag long flows, so switches can identify long flows simply by inspecting the tag. In addition, FDALB adaptively adjusts the tagging threshold at each end host to keep up with flow distribution dynamics.
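The end-host tagging idea can be sketched as a per-host threshold that marks a flow as long once its byte count crosses it, with the threshold nudged up or down so that roughly a target fraction of flows gets tagged. The adaptation step, target fraction, and flow sizes below are illustrative assumptions, not FDALB's exact rule.

```python
# Adaptive long-flow tagging sketch for an end host (illustrative).

class LongFlowTagger:
    def __init__(self, threshold=100_000, target_frac=0.1, step=1.1):
        self.threshold = threshold       # bytes before a flow counts as long
        self.target = target_frac        # desired fraction of tagged flows
        self.step = step                 # multiplicative adaptation factor
        self.flows, self.tagged = 0, 0

    def on_flow_end(self, bytes_sent):
        """Return True if this flow should have carried the long-flow tag."""
        self.flows += 1
        long_flow = bytes_sent >= self.threshold
        self.tagged += long_flow
        # adapt: raise the threshold if too many flows get tagged, else lower
        if self.tagged / self.flows > self.target:
            self.threshold *= self.step
        else:
            self.threshold /= self.step
        return long_flow

tagger = LongFlowTagger()
for size in [4_000, 250_000, 9_000, 1_200_000, 30_000]:  # assumed sizes
    print(size, tagger.on_flow_end(size))
```

In a real deployment the tag would be set in-band once a live flow's byte count crosses the threshold; the end-of-flow accounting here only keeps the sketch short.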
{"title":"FDALB: Flow distribution aware load balancing for datacenter networks","authors":"Shuo Wang, Jiao Zhang, Tao Huang, Tian Pan, Jiang Liu, Yun-jie Liu","doi":"10.1109/IWQoS.2016.7590409","DOIUrl":"https://doi.org/10.1109/IWQoS.2016.7590409","url":null,"abstract":"We present FDALB, a flow distribution aware load balancing mechanism aimed at reducing flow collisions and achieving high scalability. FDALB, like the most of centralized methods, uses a centralized controller to get the view of networks and congestion information. However, FDALB classifies flows into short flows and long flows. The paths of short flows and long flows are controlled by distributed switches and the centralized controller respectively. Thus, the controller handles only a small part of flows to achieve high scalability. To further reduce the controller's overhead, FDALB leverages end-hosts to tag long flows, thus switches can easily determine long flows by inspecting the tag. Besides, FDALB can adaptively adjust the threshold at each end-host to keep up with the flow distribution dynamics.","PeriodicalId":304978,"journal":{"name":"2016 IEEE/ACM 24th International Symposium on Quality of Service (IWQoS)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125960547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}