Pub Date : 2019-11-01 DOI: 10.1109/CloudNet47604.2019.9064110
Amina Mseddi, Wael Jaafar, H. Elbiaze, W. Ajib
Fog computing emerged as a new paradigm that pushes cloud applications to the network edge. The fog infrastructure mainly contains distributed and heterogeneous fog nodes that are characterized by their complex distribution, high mobility and sporadic resource availability. This dynamic fog node behavior triggers new challenges in the resource management process, such as resource coordination for continuous quality-of-service satisfaction. In this paper, we propose a smart online resource allocation approach adapted for dynamic fog computing environments, aiming at maximizing the number of satisfied user requests within a predefined delay threshold. We model the fog computing environment as a Markov discrete process, where dynamic fog node behavior/mobility and resource availability are considered. Then, we present our smart deep-reinforcement-learning resource allocation algorithm. Considering real-world mobility data sets, the near-optimal performance of the proposed solution is illustrated through simulations, and its superiority over heuristic state-of-the-art approaches is demonstrated.
Title: Intelligent Resource Allocation in Dynamic Fog Computing Environments
Venue: 2019 IEEE 8th International Conference on Cloud Networking (CloudNet)
Pub Date : 2019-11-01 DOI: 10.1109/CloudNet47604.2019.9064134
Pavol Mulinka, P. Casas, J. Vanerio
Continuous and adaptive learning is an effective learning approach when dealing with highly dynamic and changing scenarios, where concept drift often happens. In a continuous, stream or adaptive learning setup, new measurements arrive continuously and there are no boundaries for learning, meaning that the learning model has to decide how and when to (re)learn from these new data constantly. We address the problem of adaptive and continual learning for network security, building dynamic models to detect network attacks in real network traffic. The combination of fast and big network measurement data with the re-training paradigm of adaptive learning imposes complex challenges in terms of data processing speed, which we tackle by relying on big data platforms for parallel stream processing. We build and benchmark different adaptive learning models on top of a novel big data analytics platform for network traffic monitoring and analysis tasks, and show that high computation speed-ups (as high as ×6) can be achieved by parallelizing off-the-shelf stream learning approaches.
Title: Continuous and Adaptive Learning over Big Streaming Data for Network Security
Pub Date : 2019-11-01 DOI: 10.1109/CloudNet47604.2019.9064132
Karyna Gogunska, C. Barakat, G. Urvoy-Keller
With the increasing popularity of cloud networking and the widespread usage of virtualization, it becomes more and more complex to monitor this new virtual environment. Yet, monitoring remains crucial for network troubleshooting and analysis. Controlling the measurement footprint in the virtual network is one of the main priorities in the process of monitoring as resources are shared between the compute nodes of tenants and the measurement process itself. In this paper, first, we assess the capability of machine learning to predict measurement impact on the ongoing traffic between virtual machines; second, we propose a data-driven solution that is able to provide optimal monitoring parameters for virtual network measurement with minimum traffic interference.
Title: Tuning optimal traffic measurement parameters in virtual networks with machine learning
Pub Date : 2019-11-01 DOI: 10.1109/CloudNet47604.2019.9064127
Houssam ElBouanani, C. Barakat, G. Urvoy-Keller, Dino Lopez Pacheco
Data center network monitoring can be carried out at hardware networking equipment (e.g., physical routers) and/or software networking equipment (e.g., virtual switches). While software switches offer high flexibility to deploy various monitoring tools, they have to utilize server resources, especially CPU and memory, that can no longer be reserved fully to service users' traffic. In this paper we closely examine the costs of (i) sampling packets on a virtual switch for monitoring purposes; (ii) sending them to a user-space program for measurement; and (iii) forwarding them to a remote server where they will be processed in case of lack of resources locally. Starting from empirical observations, we derive an analytical model to accurately predict (R² = 99.5%) the three aforementioned costs, as a function of the sampling rates, and pave the way for a collaborative monitoring approach where servers delegate monitoring tasks to each other via port mirroring in case they lack resources.
Title: Collaborative Traffic Measurement in Virtualized Data Center Networks
Pub Date : 2019-11-01 DOI: 10.1109/CloudNet47604.2019.9064147
Flávio Meneses, M. Fernandes, T. Vieira, Daniel Corujo, A. Neto, R. Aguiar
This paper proposes a framework where Customer Premises Equipments (CPEs) are dynamically instantiated, by leveraging Software Defined Networking (SDN) and Network Function Virtualisation (NFV), in the cloud as a chain of containerised virtual network functions (VNFs). Resulting virtual CPE instances (i.e., vCPEs) are organised in clusters and a Management and Orchestrator (MANO) entity is used to monitor the cluster and to migrate vCPEs among the nodes composing the cluster as required for ensuring the load balancing of the resources of the cluster. During the vCPE migration process, the data-path is dynamically updated via SDN mechanisms. A proof-of-concept prototype of the framework was developed and evaluated in an experimental testbed, showcasing its feasibility and a near-zero downtime while migration is taking place.
Title: Dynamic Modular vCPE Orchestration in Platform as a Service Architectures
Pub Date : 2019-11-01 DOI: 10.1109/CloudNet47604.2019.9064131
László Toka, Dávid Haja, Attila Korösi, Balázs Sonkoly
Edge and fog computing are emerging concepts extending traditional cloud computing by deploying compute resources closer to the users. This approach, closely integrated with carrier-networks, enables several future services, such as tactile internet, 5G and beyond telco services, and extended reality applications. The emphasis is on integration: the rigorous delay constraints, ensuring reliability on the distributed remote nodes, and the sheer scale altogether call for a powerful provisioning platform that offers the applications the best out of the underlying infrastructure. In this paper we investigate the resource provisioning problem in the edge infrastructure with the consideration of probable failures. Our goal is to support high reliability of services with the minimum amount of edge resources reserved to provide the necessary redundancy in the system. We design a resource provisioning algorithm, which takes into account network latency when pinpointing backup placeholders for virtual functions of edge applications. We implement the proposed solution in a simulation environment and show the efficient resource utilization results achieved by our fast heuristic algorithm.
Title: Resource provisioning for highly reliable and ultra-responsive edge applications
Pub Date : 2019-11-01 DOI: 10.1109/CloudNet47604.2019.9064143
T. Ayar, D. Altilar, L. Budzisz, B. Rathke
The use of multiple paths in core networks for TCP traffic sounds promising as it suggests bandwidth aggregation, fault tolerance through redundancy, high resource utilization efficiency, reduced congestion, and an increase in TCP throughput. In order to benefit from all these features, load balancing approaches at different granularities (per-flow, per-destination, and per-packet) have to be applied. In order to promote the use of per-packet load balancing in core networks, we previously proposed a transparent TCP proxy called ORTA (Out-of-Order Robustness for TCP with Transparent Acknowledgment Intervention). ORTA was introduced along with simulation results, which were all promising and competitive with the nontransparent approaches in the literature. However, network simulations may not reflect real system performance because they lack a precise and accurate model of the real systems. In this paper, ORTA is implemented as a netfilter module and emulation test results are presented. The results indicate that ORTA prevents the TCP performance degradation caused by TCP packet reordering. Moreover, ORTA has no degrading impact on TCP performance when there is no packet reordering.
Title: Emulation and Performance Evaluation of a Transparent Reordering Robust TCP Proxy
Pub Date : 2019-11-01 DOI: 10.1109/CloudNet47604.2019.9064116
X. Masip-Bruin, S. Sánchez-López, A. Jurnet, E. Marín-Tordera, A. Jukan, G. Ren
The capacity to efficiently manage the whole set of resources from the edge up to the cloud paves the way to a new landscape of innovative opportunities for all involved actors, be it on the research or industrial sides. Fog-to-Cloud (F2C) has been recently proposed as a management solution particularly tailored to manage the stack of resources from the edge up to the cloud in a coordinated way. However, beyond the benefits brought by considering the whole spectrum of resources to run a service, resilience, as a concept, must be reflected in the F2C design. In this paper, we address a particular scenario where a specific node failure in the F2C architecture will substantially impact the performance of the whole system, and analyse three tentative strategies to efficiently manage such a scenario.
Title: Towards a Resilient Control Architecture for Combined Fog-to-Cloud Systems
Pub Date : 2019-11-01 DOI: 10.1109/CloudNet47604.2019.9064115
Adrien Gausseran, Andrea Tomassilli, F. Giroire, J. Moulierac
Software Defined Networking (SDN) and Network Function Virtualization (NFV) are complementary and core components of modernized networks. In this paper, we consider the problem of reconfiguring Service Function Chains (SFC) with the goal of bringing the network from a sub-optimal to an optimal operational state. We propose optimization models based on the make-before-break mechanism, in which a new path is set up before the old one is torn down. Our method takes into consideration the chaining requirements of the flows and scales well with the number of nodes in the network. We show that, with our approach, the network operational cost defined in terms of both bandwidth and installed network function costs can be reduced and a higher acceptance rate can be achieved, while not interrupting the flows.
Title: No Interruption When Reconfiguring my SFCs
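The make-before-break mechanism named in this abstract can be illustrated with a small sketch. It is not the optimization model of the paper: the function name `reroute_make_before_break`, the link labels, and the rule that links shared by both paths are not reserved twice are assumptions made for illustration only.

```python
def reroute_make_before_break(free, old_path, new_path, demand):
    """Move a flow of size `demand` from old_path to new_path without interruption.

    free: dict mapping link name -> spare capacity (mutated in place).
    Paths are lists of link names. Returns True if the flow was moved.
    """
    # Assumption: links on both paths already carry the flow, so only the
    # links that are new to the flow need additional capacity.
    extra = [link for link in new_path if link not in old_path]
    if any(free[link] < demand for link in extra):
        return False  # the new path cannot be set up, so the old one stays
    for link in extra:  # make: reserve capacity on the new path first
        free[link] -= demand
    for link in old_path:  # break: release the old path only afterwards
        if link not in new_path:
            free[link] += demand
    return True


# Hypothetical topology: the flow (demand 3) currently uses a-b-d.
free = {"a-b": 0, "b-d": 0, "a-c": 5, "c-d": 5}
moved = reroute_make_before_break(free, ["a-b", "b-d"], ["a-c", "c-d"], 3)
print(moved, free)
```

The point of the ordering is that the new path must fit while the old one is still reserved, which is why a reconfiguration can be rejected even when the final state would be feasible.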
Pub Date : 2019-10-08 DOI: 10.1109/CloudNet47604.2019.9064146
C. H. Benet, A. Kassler
Data center networks offer multiple disjoint paths between Top-of-Rack (ToR) switches to connect server racks, providing large bisection bandwidth. An effective load-balancing mechanism is required in order to fully utilize the available capacity of the multiple paths. While packet-based load-balancing can achieve high utilization, it suffers from reordering. Flow-based load-balancing such as equal-cost multipath routing (ECMP) spreads traffic uniformly across multiple paths, leading to frequent hash collisions and suboptimal performance. Finally, flowlet-based load-balancing such as CONGA or HULA splits flows into smaller units, which are sent on different paths. Most flowlet-based load-balancing schemes depend on a proper static setting of the flowlet gap, which decides when new flowlets are detected. While a gap that is too small may lead to reordering, a gap that is too large results in missed load-balancing opportunities. In this paper, we propose FlowDyn, which dynamically adapts the flowlet gap to increase the efficiency of the load-balancing schemes while avoiding the reordering problem. Using programmable data planes, FlowDyn uses active probes together with telemetry information to track path latency between different ToR switches. FlowDyn dynamically calculates a suitable flowlet gap that can be used by flowlet-based load-balancing mechanisms. We evaluate FlowDyn extensively in simulation, showing that it achieves 3.19 times smaller flow completion time at 10% load and 1.16x at 90% load.
Title: FlowDyn: Towards a Dynamic Flowlet Gap Detection using Programmable Data Planes
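The role of the flowlet gap described in this abstract can be made concrete with a minimal sketch of static-gap flowlet detection. It is not FlowDyn itself: the class name `FlowletTracker`, the microsecond timestamps, and the round-robin choice of the next path are assumptions for illustration (schemes such as CONGA or HULA pick paths from congestion information, and FlowDyn adapts the gap rather than fixing it).

```python
from dataclasses import dataclass, field


@dataclass
class FlowletTracker:
    """Splits each flow into flowlets using a fixed inter-packet idle gap."""

    gap_us: int      # static flowlet gap in microseconds (hypothetical value)
    num_paths: int   # number of equal-cost paths to spread flowlets over
    last_seen: dict = field(default_factory=dict)  # flow id -> last packet time
    path_of: dict = field(default_factory=dict)    # flow id -> current path
    next_path: int = 0

    def route(self, flow_id, now_us):
        """Return the path index for a packet of flow_id arriving at now_us."""
        last = self.last_seen.get(flow_id)
        if last is None or now_us - last > self.gap_us:
            # Idle time exceeded the gap: a new flowlet starts and may move
            # to another path (round-robin here, purely as an assumption).
            self.path_of[flow_id] = self.next_path
            self.next_path = (self.next_path + 1) % self.num_paths
        self.last_seen[flow_id] = now_us
        return self.path_of[flow_id]


arrivals_us = (0, 100, 200, 1000, 1100)
wide = FlowletTracker(gap_us=500, num_paths=2)
narrow = FlowletTracker(gap_us=50, num_paths=2)
print([wide.route("f1", t) for t in arrivals_us])    # one path switch, at t=1000
print([narrow.route("f1", t) for t in arrivals_us])  # a path switch on every packet
```

With the small gap every packet opens a new flowlet, which is the situation that risks reordering when path latencies differ; with the large gap the flow only moves after a long idle period, which is the missed-opportunity side of the trade-off.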