Lin Yang, M. Hajiesmaili, R. Sitaraman, A. Wierman, Enrique Mallada, W. Wong
This paper considers the problem of online linear optimization with inventory management constraints. Specifically, we consider an online scenario where a decision maker needs to satisfy her time-varying demand for some units of an asset, either from a market with a time-varying price or from her own inventory. In each time slot, the decision maker is presented with a (linear) price and must immediately decide the amount to purchase for covering the demand and/or for storing in the inventory for future use. The inventory has a limited capacity and can be used to buy and store assets at a low price and to cover the demand when the price is high. The ultimate goal of the decision maker is to cover the demand at each time slot while minimizing the cost of buying assets from the market. We propose ARP, an online algorithm for linear programming with inventory constraints, and ARPRate, an extended version that handles rate constraints to/from the inventory. Both ARP and ARPRate achieve optimal competitive ratios, meaning that no other online algorithm can achieve a better theoretical guarantee. To illustrate the results, we use the proposed algorithms in a case study focused on energy procurement and storage management strategies for data centers.
{"title":"Online Linear Optimization with Inventory Management Constraints","authors":"Lin Yang, M. Hajiesmaili, R. Sitaraman, A. Wierman, Enrique Mallada, W. Wong","doi":"10.1145/3393691.3394207","DOIUrl":"https://doi.org/10.1145/3393691.3394207","url":null,"abstract":"This paper considers the problem of online linear optimization with inventory management constraints. Specifically, we consider an online scenario where a decision maker needs to satisfy her timevarying demand for some units of an asset, either from a market with a time-varying price or from her own inventory. In each time slot, the decision maker is presented a (linear) price and must immediately decide the amount to purchase for covering the demand and/or for storing in the inventory for future use. The inventory has a limited capacity and can be used to buy and store assets at low price and cover the demand when the price is high. The ultimate goal of the decision maker is to cover the demand at each time slot while minimizing the cost of buying assets from the market. We propose ARP, an online algorithm for linear programming with inventory constraints, and ARPRate, an extended version that handles rate constraints to/from the inventory. Both ARP and ARPRate achieve optimal competitive ratios, meaning that no other online algorithm can achieve a better theoretical guarantee. To illustrate the results, we use the proposed algorithms in a case study focused on energy procurement and storage management strategies for data centers.","PeriodicalId":188517,"journal":{"name":"Abstracts of the 2020 SIGMETRICS/Performance Joint International Conference on Measurement and Modeling of Computer Systems","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127623309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An important class of computer software, such as network servers, exhibits concurrency through many loosely coupled and potentially long-running communication sessions. For these applications, a long-standing open question is whether thread-per-session programming can deliver comparable performance to event-driven programming. This paper clearly demonstrates, for the first time, that it is possible to employ user-level threading for building thread-per-session applications without compromising functionality, efficiency, performance, or scalability. We present the design and implementation of a general-purpose, yet nimble, user-level M:N threading runtime that is built from scratch to accomplish these objectives. Its key components are efficient and effective load balancing and user-level I/O blocking. While no other runtime exists with comparable characteristics, an important fundamental finding of this work is that building this runtime does not require particularly intricate data structures or algorithms. The runtime is thus a straightforward existence proof for user-level threading without performance compromises and can serve as a reference platform for future research. It is evaluated in comparison to event-driven software, system-level threading, and several other user-level threading runtimes. An experimental evaluation is conducted using benchmark programs, as well as the popular Memcached application. We demonstrate that our user-level runtime outperforms other threading runtimes and enables thread-per-session programming at high levels of concurrency and hardware parallelism without sacrificing performance.
{"title":"User-level Threading: Have Your Cake and Eat It Too","authors":"M. Karsten, Saman Barghi","doi":"10.1145/3393691.3394226","DOIUrl":"https://doi.org/10.1145/3393691.3394226","url":null,"abstract":"An important class of computer software, such as network servers, exhibits concurrency through many loosely coupled and potentially long-running communication sessions. For these applications, a long-standing open question is whether thread-per-session programming can deliver comparable performance to event-driven programming. This paper clearly demonstrates, for the first time, that it is possible to employ user-level threading for building thread-per-session applications without compromising functionality, efficiency, performance, or scalability. We present the design and implementation of a general-purpose, yet nimble, user-level M:N threading runtime that is built from scratch to accomplish these objectives. Its key components are efficient and effective load balancing and user-level I/O blocking. While no other runtime exists with comparable characteristics, an important fundamental finding of this work is that building this runtime does not require particularly intricate data structures or algorithms. The runtime is thus a straightforward existence proof for user-level threading without performance compromises and can serve as a reference platform for future research. It is evaluated in comparison to event-driven software, system-level threading, and several other user-level threading runtimes. An experimental evaluation is conducted using benchmark programs, as well as the popular Memcached application. We demonstrate that our user-level runtime outperforms other threading runtimes and enables thread-per-session programming at high levels of concurrency and hardware parallelism without sacrificing performance.","PeriodicalId":188517,"journal":{"name":"Abstracts of the 2020 SIGMETRICS/Performance Joint International Conference on Measurement and Modeling of Computer Systems","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127368936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents a systematic approach to identify and quantify the types of structures featured by packet traces in communication networks. Our approach leverages an information-theoretic methodology, based on iterative randomization and compression of the packet trace, which allows us to systematically remove and measure dimensions of structure in the trace. In particular, we introduce the notion of trace complexity which approximates the entropy rate of a packet trace. Considering several real-world traces, we show that trace complexity can provide unique insights into the characteristics of various applications. Based on our approach, we also propose a traffic generator model able to produce a synthetic trace that matches the complexity levels of its corresponding real-world trace. Using a case study in the context of datacenters, we show that insights into the structure of packet traces can lead to improved demand-aware network designs: datacenter topologies that are optimized for specific traffic patterns.
{"title":"On the Complexity of Traffic Traces and Implications","authors":"C. Avin, M. Ghobadi, Chen Griner, S. Schmid","doi":"10.1145/3393691.3394205","DOIUrl":"https://doi.org/10.1145/3393691.3394205","url":null,"abstract":"This paper presents a systematic approach to identify and quantify the types of structures featured by packet traces in communication networks. Our approach leverages an information-theoretic methodology, based on iterative randomization and compression of the packet trace, which allows us to systematically remove and measure dimensions of structure in the trace. In particular, we introduce the notion of trace complexity which approximates the entropy rate of a packet trace. Considering several real-world traces, we show that trace complexity can provide unique insights into the characteristics of various applications. Based on our approach, we also propose a traffic generator model able to produce a synthetic trace that matches the complexity levels of its corresponding real-world trace. Using a case study in the context of datacenters, we show that insights into the structure of packet traces can lead to improved demand-aware network designs: datacenter topologies that are optimized for specific traffic patterns.","PeriodicalId":188517,"journal":{"name":"Abstracts of the 2020 SIGMETRICS/Performance Joint International Conference on Measurement and Modeling of Computer Systems","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131125232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We study online optimization in a setting where an online learner seeks to optimize a per-round hitting cost, which may be non-convex, while incurring a movement cost when changing actions between rounds. We ask: under what general conditions is it possible for an online learner to leverage predictions of future cost functions in order to achieve near-optimal costs? Prior work has provided near-optimal online algorithms for specific combinations of assumptions about hitting and switching costs, but no general results are known. In this work, we give two general sufficient conditions that specify a relationship between the hitting and movement costs which guarantees that a new algorithm, Synchronized Fixed Horizon Control (SFHC), achieves a 1+O(1/w) competitive ratio, where w is the number of predictions available to the learner. Our conditions do not require the cost functions to be convex, and we also derive competitive ratio results for non-convex hitting and movement costs. Our results provide the first constant, dimension-free competitive ratio for online non-convex optimization with movement costs. We also give an example of a natural problem, Convex Body Chasing (CBC), where the sufficient conditions are not satisfied and prove that no online algorithm can have a competitive ratio that converges to 1.
{"title":"Online Optimization with Predictions and Non-convex Losses","authors":"Yiheng Lin, Gautam Goel, A. Wierman","doi":"10.1145/3393691.3394208","DOIUrl":"https://doi.org/10.1145/3393691.3394208","url":null,"abstract":"We study online optimization in a setting where an online learner seeks to optimize a per-round hitting cost, which may be non-convex, while incurring a movement cost when changing actions between rounds. We ask: under what general conditions is it possible for an online learner to leverage predictions of future cost functions in order to achieve near-optimal costs? Prior work has provided near-optimal online algorithms for specific combinations of assumptions about hitting and switching costs, but no general results are known. In this work, we give two general sufficient conditions that specify a relationship between the hitting and movement costs which guarantees that a new algorithm, Synchronized Fixed Horizon Control (SFHC), achieves a 1+O(1/w) competitive ratio, where w is the number of predictions available to the learner. Our conditions do not require the cost functions to be convex, and we also derive competitive ratio results for non-convex hitting and movement costs. Our results provide the first constant, dimension-free competitive ratio for online non-convex optimization with movement costs. We also give an example of a natural problem, Convex Body Chasing (CBC), where the sufficient conditions are not satisfied and prove that no online algorithm can have a competitive ratio that converges to 1.","PeriodicalId":188517,"journal":{"name":"Abstracts of the 2020 SIGMETRICS/Performance Joint International Conference on Measurement and Modeling of Computer Systems","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134476320","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Given the central role of webcams in monitoring physical surroundings, it behooves the research community to understand the characteristics of webcams' distribution and their privacy/security implications. In this paper, we conduct the first systematic study on live webcams from both aggregation sites and individual webcams (webpages/IP hosts). We propose a series of efficient, automated techniques for detecting and fingerprinting live webcams. In particular, we leverage distributed algorithms to detect aggregation sites and generate webcam fingerprints by utilizing the Graphical User Interface (GUI) of the built-in web server of a device. Overall, we observe 0.85 million webpages from aggregation sites hosting live webcams and 2.2 million live webcams in the public IPv4 space. Our study reveals that aggregation sites have a typical long-tail distribution in hosting live streams (5.8% of sites contain 90.44% of live streaming contents), and 85.4% of aggregation websites scrape webcams from others. Further, we observe that (1) 277,239 webcams from aggregation sites and IP hosts (11.7%) directly expose live streams to the public, (2) aggregation sites expose geolocation names for 187,897 webcams and more detailed longitude/latitude pairs for 23,083 webcams, (3) the default usernames and passwords of 38,942 webcams are visible on aggregation sites in plaintext, and (4) 1,237 webcams are detected as having been compromised to conduct malicious behaviors.
{"title":"Under the Concealing Surface: Detecting and Understanding Live Webcams in the Wild","authors":"Jinke Song, Qiang Li, Haining Wang, Limin Sun","doi":"10.1145/3393691.3394220","DOIUrl":"https://doi.org/10.1145/3393691.3394220","url":null,"abstract":"Given the central role of webcams in monitoring physical surroundings, it behooves the research community to understand the characteristics of webcams' distribution and their privacy/security implications. In this paper, we conduct the first systematic study on live webcams from both aggregation sites and individual webcams (webpages/IP hosts). We propose a series of efficient, automated techniques for detecting and fingerprinting live webcams. In particular, we leverage distributed algorithms to detect aggregation sites and generate webcam fingerprints by utilizing the Graphical User Interface (GUI) of the built-in web server of a device. Overall, we observe 0.85 million webpages from aggregation sites hosting live webcams and 2.2 million live webcams in the public IPv4 space. Our study reveals that aggregation sites have a typical long-tail distribution in hosting live streams (5.8% of sites contain 90.44% of live streaming contents), and 85.4% of aggregation websites scrape webcams from others. Further, we observe that (1) 277,239 webcams from aggregation sites and IP hosts (11.7%) directly expose live streams to the public, (2) aggregation sites expose 187,897 geolocation names and more detailed 23,083 longitude/latitude pairs of webcams, (3) the default usernames and passwords of 38,942 webcams are visible on aggregation sites in plaintext, and (4) 1,237 webcams are detected as having been compromised to conduct malicious behaviors.","PeriodicalId":188517,"journal":{"name":"Abstracts of the 2020 SIGMETRICS/Performance Joint International Conference on Measurement and Modeling of Computer Systems","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130304660","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
New memory technologies are blurring the previously distinctive performance characteristics of adjacent layers in the memory hierarchy. No longer are such layers orders of magnitude apart in request latency or capacity. Beyond the traditional single-layer view of caching, we now must re-cast the problem as a data placement challenge: which data should be cached in faster memory if it could instead be served directly from slower memory? We present CHOPT, an offline algorithm for data placement across multiple tiers of memory with asymmetric read and write costs. We show that CHOPT is optimal and can therefore serve as an upper bound on the performance gain of any data placement algorithm. We also demonstrate an approximation of CHOPT that uses spatial sampling of requests to make its execution time practical for long traces, incurring a small average error of 0.2% on representative workloads at a sampling ratio of 1%. Our evaluation of CHOPT on more than 30 production traces and benchmarks shows that optimal data placement decisions could improve average request latency by 8.2%-44.8% when compared with the long-established gold standard: Belady and Mattson's offline, evict-farthest-in-the-future optimal algorithms. Our results identify substantial improvement opportunities for future online memory management research.
{"title":"Optimal Data Placement for Heterogeneous Cache, Memory, and Storage Systems","authors":"Lei Zhang, Reza Karimi, I. Ahmad, Ymir Vigfusson","doi":"10.1145/3393691.3394229","DOIUrl":"https://doi.org/10.1145/3393691.3394229","url":null,"abstract":"New memory technologies are blurring the previously distinctive performance characteristics of adjacent layers in the memory hierarchy. No longer are such layers orders of magnitude different in request latency or capacity. Beyond the traditional single-layer view of caching, we now must re-cast the problem as a data placement challenge: which data should be cached in faster memory if it could instead be served directly from slower memory? We present CHOPT, an offline algorithm for data placement across multiple tiers of memory with asymmetric read and write costs. We show that CHOPT is optimal and can therefore serve as the upper bound of performance gain for any data placement algorithm. We also demonstrate an approximation of CHOPT which makes its execution time for long traces practical using spatial sampling of requests incurring a small 0.2% average error on representative workloads at a sampling ratio of 1%. Our evaluation of CHOPT on more than 30 production traces and benchmarks shows that optimal data placement decisions could improve average request latency by 8.2%-44.8% when compared with the long-established gold standard: Belady and Mattson's offline, evict-farthest-in-the-future optimal algorithms. Our results identify substantial improvement opportunities for future online memory management research.","PeriodicalId":188517,"journal":{"name":"Abstracts of the 2020 SIGMETRICS/Performance Joint International Conference on Measurement and Modeling of Computer Systems","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123822483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We consider stochastic bandit problems with a continuous set of arms and where the expected reward is a continuous and unimodal function of the arm. For these problems, we propose the Stochastic Polychotomy (SP) algorithms, and derive finite-time upper bounds on their regret and optimization error. We show that, for a class of reward functions, the SP algorithm achieves a regret and an optimization error with optimal scalings, i.e., O(√T) and O(1/√T) (up to a logarithmic factor), respectively.
{"title":"Unimodal Bandits with Continuous Arms: Order-optimal Regret without Smoothness","authors":"Richard Combes, A. Proutière, A. Fauquette","doi":"10.1145/3393691.3394225","DOIUrl":"https://doi.org/10.1145/3393691.3394225","url":null,"abstract":"We consider stochastic bandit problems with a continuous set of arms and where the expected reward is a continuous and unimodal function of the arm. For these problems, we propose the Stochastic Polychotomy (SP) algorithms, and derive finite-time upper bounds on their regret and optimization error. We show that, for a class of reward functions, the SP algorithm achieves a regret and an optimization error with optimal scalings, i.e., O(√T) and O(1/√T) (up to a logarithmic factor), respectively.","PeriodicalId":188517,"journal":{"name":"Abstracts of the 2020 SIGMETRICS/Performance Joint International Conference on Measurement and Modeling of Computer Systems","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130891941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We consider a large distributed service system consisting of n homogeneous servers with infinite capacity FIFO queues. Jobs arrive as a Poisson process of rate λn/k_n (for some positive constant λ and integer k_n). Each incoming job consists of k_n identical tasks that can be executed in parallel, and that can be encoded into at least k_n "replicas" of the same size (by introducing redundancy) so that the job is considered to be completed when any k_n replicas associated with it finish their service. Moreover, we assume that servers can experience random slowdowns in their processing rate so that the service time of a replica is the product of its size and a random slowdown. First, we assume that the server slowdowns are shifted exponential and independent of the replica sizes. In this setting we show that the delay of a typical job is asymptotically minimized (as n→∞) when the number of replicas per task is a constant that only depends on the arrival rate λ, and on the expected slowdown of servers. Second, we introduce a new model for the server slowdowns in which larger tasks experience less variable slowdowns than smaller tasks. In this setting we show that, under the class of policies where all replicas start their service at the same time, the delay of a typical job is asymptotically minimized (as n→∞) when the number of replicas per task is made to depend on the actual size of the tasks being replicated, with smaller tasks being replicated more than larger tasks.
{"title":"Delay-Optimal Policies in Partial Fork-Join Systems with Redundancy and Random Slowdowns","authors":"Martin Zubeldia","doi":"10.1145/3393691.3394181","DOIUrl":"https://doi.org/10.1145/3393691.3394181","url":null,"abstract":"We consider a large distributed service system consisting of n homogeneous servers with infinite capacity FIFO queues. Jobs arrive as a Poisson process of rate λ n/kn (for some positive constant λ and integer kn). Each incoming job consists of kn identical tasks that can be executed in parallel, and that can be encoded into at least kn \"replicas\" of the same size (by introducing redundancy) so that the job is considered to be completed when any kn replicas associated with it finish their service. Moreover, we assume that servers can experience random slowdowns in their processing rate so that the service time of a replica is the product of its size and a random slowdown. First, we assume that the server slowdowns are shifted exponential and independent of the replica sizes. In this setting we show that the delay of a typical job is asymptotically minimized (as n→∞) when the number of replicas per task is a constant that only depends on the arrival rate λ, and on the expected slowdown of servers. Second, we introduce a new model for the server slowdowns in which larger tasks experience less variable slowdowns than smaller tasks. In this setting we show that, under the class of policies where all replicas start their service at the same time, the delay of a typical job is asymptotically minimized (as n→∞) when the number of replicas per task is made to depend on the actual size of the tasks being replicated, with smaller tasks being replicated more than larger tasks.","PeriodicalId":188517,"journal":{"name":"Abstracts of the 2020 SIGMETRICS/Performance Joint International Conference on Measurement and Modeling of Computer Systems","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122572888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this work, we consider the popular tree-based search strategy within the framework of reinforcement learning, the Monte Carlo Tree Search (MCTS), in the context of infinite-horizon discounted cost Markov Decision Process (MDP) with deterministic transitions. While MCTS is believed to provide an approximate value function for a given state with enough simulations, cf. [Kocsis and Szepesvari 2006; Kocsis et al. 2006], the claimed proof of this property is incomplete. This is due to the fact that the variant of MCTS, the Upper Confidence Bound for Trees (UCT), analyzed in prior works utilizes a "logarithmic" bonus term for balancing exploration and exploitation within the tree-based search, following the insights from the stochastic multi-arm bandit (MAB) literature, cf. [Agrawal 1995; Auer et al. 2002]. In effect, such an approach assumes that the regret of the underlying recursively dependent non-stationary MABs concentrates around their mean exponentially in the number of steps, which is unlikely to hold as pointed out in [Audibert et al. 2009], even for stationary MABs. As the key contribution of this work, we establish a polynomial concentration property of regret for a class of non-stationary multi-arm bandits. This in turn establishes that MCTS with an appropriate polynomial, rather than logarithmic, bonus term in UCB has the claimed property of [Kocsis and Szepesvari 2006; Kocsis et al. 2006]. Interestingly enough, empirically successful approaches (cf. [Silver et al. 2017]) utilize a similar polynomial form of MCTS as suggested by our result. Using this as a building block, we argue that MCTS, combined with nearest neighbor supervised learning, acts as a "policy improvement" operator, i.e., it iteratively improves the value function approximation for all states, due to combining with supervised learning, despite evaluating at only finitely many states. In effect, we establish that to learn an ε-approximation of the value function for deterministic MDPs with respect to the ℓ∞ norm, MCTS combined with nearest neighbor requires a sample size scaling as Õ(ε^{-(d+4)}), where d is the dimension of the state space. This is nearly optimal due to a minimax lower bound of Ω̃(ε^{-(d+2)}) [Shah and Xie 2018], suggesting the strength of the variant of MCTS we propose here and our resulting analysis.
{"title":"Non-Asymptotic Analysis of Monte Carlo Tree Search","authors":"D. Shah, Qiaomin Xie, Zhi Xu","doi":"10.1145/3393691.3394202","DOIUrl":"https://doi.org/10.1145/3393691.3394202","url":null,"abstract":"In this work, we consider the popular tree-based search strategy within the framework of reinforcement learning, the Monte Carlo Tree Search (MCTS), in the context of infinite-horizon discounted cost Markov Decision Process (MDP) with deterministic transitions. While MCTS is believed to provide an approximate value function for a given state with enough simulations, cf. [Kocsis and Szepesvari 2006; Kocsis et al. 2006], the claimed proof of this property is incomplete. This is due to the fact that the variant of MCTS, the Upper Confidence Bound for Trees (UCT), analyzed in prior works utilizes \"logarithmic\" bonus term for balancing exploration and exploitation within the tree-based search, following the insights from stochastic multi-arm bandit (MAB) literature, cf. [Agrawal 1995; Auer et al. 2002]. In effect, such an approach assumes that the regret of the underlying recursively dependent non-stationary MABs concentrates around their mean exponentially in the number of steps, which is unlikely to hold as pointed out in [Audibert et al. 2009], even for stationary MABs. As the key contribution of this work, we establish polynomial concentration property of regret for a class of non-stationary multi-arm bandits. This in turn establishes that the MCTS with appropriate polynomial rather than logarithmic bonus term in UCB has the claimed property of [Kocsis and Szepesvari 2006; Kocsis et al. 2006]. Interestingly enough, empirically successful approaches (cf. [Silver et al. 2017]) utilize a similar polynomial form of MCTS as suggested by our result. Using this as a building block, we argue that MCTS, combined with nearest neighbor supervised learning, acts as a \"policy improvement\" operator, i.e., it iteratively improves value function approximation for all states, due to combining with supervised learning, despite evaluating at only finitely many states. In effect, we establish that to learn an ε-approximation of the value function for deterministic MDPs with respect to ℓ∞ norm, MCTS combined with nearest neighbor requires a sample size scaling as Õ (ε-(d+4), where d is the dimension of the state space. This is nearly optimal due to a minimax lower bound of ∼Ω (ε-(d+2) [Shah and Xie 2018], suggesting the strength of the variant of MCTS we propose here and our resulting analysis.","PeriodicalId":188517,"journal":{"name":"Abstracts of the 2020 SIGMETRICS/Performance Joint International Conference on Measurement and Modeling of Computer Systems","volume":"199 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132809674","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sushil Mahavir Varma, Pornpawee Bumpensanti, S. T. Maguluri, He Wang
Motivated by diverse applications in the sharing economy and online marketplaces, we consider optimal pricing and matching control in a two-sided queueing system. We assume that heterogeneous customers and servers arrive to the system with price-dependent arrival rates. The compatibility between servers and customers is specified by a bipartite graph. Once a pair of customer and server are matched, they depart from the system instantaneously. The objective is to maximize the long-run average profits of the system while minimizing average waiting time. We first propose a static pricing and max-weight matching policy, which achieves O(√η) optimality rate when all of the arrival rates are scaled by η. We further show that a dynamic pricing and modified max-weight matching policy achieves an improved O(η^{1/3}) optimality rate. In addition, we propose a constraint generation algorithm that solves a value function approximation of the MDP and demonstrate strong numerical performance of this algorithm.
{"title":"Dynamic Pricing and Matching for Two-Sided Queues","authors":"Sushil Mahavir Varma, Pornpawee Bumpensanti, S. T. Maguluri, He Wang","doi":"10.1145/3393691.3394183","DOIUrl":"https://doi.org/10.1145/3393691.3394183","url":null,"abstract":"Motivated by diverse applications in sharing economy and online marketplaces, we consider optimal pricing and matching control in a two-sided queueing system. We assume that heterogeneous customers and servers arrive to the system with price-dependent arrival rates. The compatibility between servers and customers is specified by a bipartite graph. Once a pair of customer and server are matched, they depart from the system instantaneously. The objective is to maximize the long-run average profits of the system while minimizing average waiting time. We first propose a static pricing and max-weight matching policy, which achieves O(√η) optimality rate when all of the arrival rates are scaled by η. We further show that a dynamic pricing and modified max-weight matching policy achieves an improved O(η1/3) optimality rate. In addition, we propose a constraint generation algorithm that solves value function approximation of the MDP and demonstrate strong numerical performance of this algorithm.","PeriodicalId":188517,"journal":{"name":"Abstracts of the 2020 SIGMETRICS/Performance Joint International Conference on Measurement and Modeling of Computer Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131247804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}