{"title":"Learning the Optimal Controller Placement in Mobile Software-Defined Networks","authors":"I. Koutsopoulos","doi":"10.1109/WoWMoM54355.2022.00029","DOIUrl":null,"url":null,"abstract":"We formulate and study the problem of online learning of the optimal controller selection policy in mobile Software-Defined Networks, where the controller-switch round-trip-time (RTT) delays are unknown and time-varying. Static optimization approaches are not helpful, since delays vary significantly (and sometimes, arbitrarily) from one slot to another, and only RTT delays from the current active controller can be easily measured. First, we model the sequence of RTT delays across time as a stationary random process so that the value at each time slot is a sample from an unknown probability distribution with unknown mean. This approach is applicable in relatively static network settings, where stationarity can be assumed. We cast the problem as a stochastic multiarmed bandit, where the arms are the different controller choices, and we fit different bandit algorithms to that setting, such as: the Lowest Confidence Bound (LCB) algorithm by modifying the known Upper Confidence Bound (UCB) one, the LCB-tuned one, and the Boltzmann exploration one. The first two are known to achieve sublinear regret, while the last one turns out to be very efficient. In a second approach, the random process of RTTs is non-stationary and thus cannot be characterized statistically. This scenario is applicable in cases of arbitrary mobility and other dynamics that affect RTT delays in an unpredictable, adversarial manner. We pose the problem as an adversarial bandit that can be solved with the EXP3 algorithm which achieves sublinear regret. We argue that all approaches can be implemented in an SDN environment with lightweight messaging. We also compare the performance of these algorithms for different problem settings and hyper-parameters that reflect the efficiency of the learning process. Numerical evaluation shows that Boltzmann exploration achieves the best performance.","PeriodicalId":275324,"journal":{"name":"2022 IEEE 23rd International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 23rd International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/WoWMoM54355.2022.00029","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2
Abstract
We formulate and study the problem of online learning of the optimal controller selection policy in mobile Software-Defined Networks (SDNs), where the controller-switch round-trip-time (RTT) delays are unknown and time-varying. Static optimization approaches do not help, since delays vary significantly (and sometimes arbitrarily) from one slot to the next, and only the RTT delays from the currently active controller can be easily measured. First, we model the sequence of RTT delays over time as a stationary random process, so that the value at each time slot is a sample from an unknown probability distribution with unknown mean. This approach applies to relatively static network settings, where stationarity can be assumed. We cast the problem as a stochastic multi-armed bandit, where the arms are the different controller choices, and we adapt several bandit algorithms to this setting: the Lowest Confidence Bound (LCB) algorithm, obtained by modifying the well-known Upper Confidence Bound (UCB) algorithm; the LCB-tuned variant; and Boltzmann exploration. The first two are known to achieve sublinear regret, while the last turns out to be very efficient. In a second approach, the random process of RTTs is non-stationary and thus cannot be characterized statistically. This scenario covers arbitrary mobility and other dynamics that affect RTT delays in an unpredictable, adversarial manner. We pose the problem as an adversarial bandit that can be solved with the EXP3 algorithm, which achieves sublinear regret. We argue that all approaches can be implemented in an SDN environment with lightweight messaging. We also compare the performance of these algorithms across different problem settings and hyperparameters that reflect the efficiency of the learning process. Numerical evaluation shows that Boltzmann exploration achieves the best performance.
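The stochastic formulation maps directly onto classic index policies: RTTs are costs rather than rewards, so the usual UCB confidence bonus is subtracted instead of added, and the controller with the lowest resulting bound is the optimistic choice. Below is a minimal sketch of this idea, not the paper's implementation; the class name LCBControllerSelector, the boltzmann_select helper, and the temperature default are our own illustrative assumptions.

```python
import math
import random

class LCBControllerSelector:
    """Lowest Confidence Bound selection over K candidate controllers.

    Mirror image of UCB1 for cost minimization: subtract the confidence
    radius from the empirical mean RTT and probe the smallest bound.
    """
    def __init__(self, num_controllers):
        self.counts = [0] * num_controllers      # probes sent per controller
        self.mean_rtt = [0.0] * num_controllers  # empirical mean RTT per controller
        self.t = 0                               # time-slot counter

    def select(self):
        self.t += 1
        # Probe every controller once before trusting the bounds.
        for k, n in enumerate(self.counts):
            if n == 0:
                return k
        lcb = [m - math.sqrt(2.0 * math.log(self.t) / n)
               for m, n in zip(self.mean_rtt, self.counts)]
        return min(range(len(lcb)), key=lcb.__getitem__)

    def update(self, k, rtt):
        # Incremental empirical-mean update with the measured RTT.
        self.counts[k] += 1
        self.mean_rtt[k] += (rtt - self.mean_rtt[k]) / self.counts[k]

def boltzmann_select(mean_rtt, temperature=0.5):
    """Boltzmann (softmax) exploration: a lower empirical mean RTT gives an
    exponentially higher selection probability; temperature sets how greedy."""
    base = min(mean_rtt)  # shift exponents for numerical stability
    weights = [math.exp(-(m - base) / temperature) for m in mean_rtt]
    r, acc = random.uniform(0.0, sum(weights)), 0.0
    for k, w in enumerate(weights):
        acc += w
        if r <= acc:
            return k
    return len(weights) - 1
```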
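For the non-stationary case, EXP3 makes no distributional assumption and instead hedges against an adversarially chosen RTT sequence. The following is a minimal sketch under the assumption that RTTs can be clipped to a known upper bound rtt_max and mapped to rewards in [0, 1]; the class name and the gamma default are illustrative, not taken from the paper.

```python
import math
import random

class EXP3ControllerSelector:
    """EXP3 for adversarial RTT sequences: exponential weights over
    controllers, with uniform exploration mixed in at rate gamma."""
    def __init__(self, num_controllers, gamma=0.1, rtt_max=1.0):
        self.k = num_controllers
        self.gamma = gamma
        self.rtt_max = rtt_max  # assumed known bound used to normalize RTTs
        self.weights = [1.0] * num_controllers

    def _probs(self):
        total = sum(self.weights)
        return [(1.0 - self.gamma) * w / total + self.gamma / self.k
                for w in self.weights]

    def select(self):
        # Sample a controller from the mixed exploration/exploitation law.
        r, acc = random.random(), 0.0
        for i, p in enumerate(self._probs()):
            acc += p
            if r <= acc:
                return i
        return self.k - 1

    def update(self, chosen, rtt):
        probs = self._probs()
        reward = 1.0 - min(rtt / self.rtt_max, 1.0)  # low RTT -> high reward
        x_hat = reward / probs[chosen]               # importance-weighted estimate
        self.weights[chosen] *= math.exp(self.gamma * x_hat / self.k)
        # Rescale to avoid floating-point overflow over long horizons.
        m = max(self.weights)
        self.weights = [w / m for w in self.weights]
```

In either sketch the per-slot loop is the same and matches the feedback model described in the abstract: select() names the controller to contact in the current slot, the single measured RTT of that exchange feeds update(), and no other controller's delay needs to be probed.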