Stochastic Games with Synchronizing Objectives

Proceedings of the 37th Annual ACM/IEEE Symposium on Logic in Computer Science. Pub Date: 2022-02-25. DOI: 10.1145/3531130.3532439

L. Doyen


Citations: 2

Abstract




We consider two-player stochastic games played on a finite graph for infinitely many rounds. Stochastic games generalize both Markov decision processes (MDPs), by adding an adversary player, and two-player deterministic games, by adding stochasticity. The outcome of the game is a sequence of distributions over the states of the game graph. We consider synchronizing objectives, which require the probability mass to accumulate in a set of target states, either always, once, infinitely often, or always after some point in the outcome sequence; and the winning modes of sure winning (if the accumulated probability is equal to 1) and almost-sure winning (if the accumulated probability is arbitrarily close to 1). We present algorithms to compute the set of winning distributions for each of these synchronizing modes, showing that the corresponding decision problem is PSPACE-complete for synchronizing once and infinitely often, and PTIME-complete for synchronizing always and always after some point. These bounds are remarkably in line with the special case of MDPs, while the algorithmic solution and proof technique are considerably more involved, even for deterministic games. This is because those games have a flavour of imperfect information: in particular, they are not determined, and randomized strategies need to be considered even if there is no stochastic choice in the game graph. Moreover, in combination with stochasticity in the game graph, finite-memory strategies are not sufficient in general (for synchronizing infinitely often).
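To make "a sequence of distributions" and "accumulated probability" concrete, here is a minimal Python sketch. It is not the paper's algorithm. It assumes both players have already fixed memoryless strategies, so the game reduces to a Markov chain, and it only tracks the mass in the target set over a finite number of rounds. The chain, the state names `a`, `b`, `c`, and the helper functions are all hypothetical and chosen for illustration.

```python
from fractions import Fraction


def step(dist, transitions):
    """Push a distribution over states one round forward through the chain."""
    nxt = {}
    for state, mass in dist.items():
        for succ, prob in transitions[state].items():
            nxt[succ] = nxt.get(succ, 0) + mass * prob
    return nxt


def target_mass_sequence(initial, transitions, target, rounds):
    """Return the probability mass inside `target` at rounds 0..rounds."""
    masses = []
    dist = dict(initial)
    for _ in range(rounds + 1):
        masses.append(sum(m for s, m in dist.items() if s in target))
        dist = step(dist, transitions)
    return masses


# Hypothetical 3-state chain (an assumption, not taken from the paper):
# from "a" stay or move to "b" with probability 1/2 each; "b" moves to "c";
# "c" is absorbing and is the only target state.
half = Fraction(1, 2)
transitions = {
    "a": {"a": half, "b": half},
    "b": {"c": Fraction(1)},
    "c": {"c": Fraction(1)},
}
initial = {"a": Fraction(1)}

masses = target_mass_sequence(initial, transitions, {"c"}, 6)
print([str(m) for m in masses])
```

In this toy chain the mass in the target grows towards 1 round after round but never equals 1 at any finite round, which is the kind of gap the abstract draws between accumulated probability equal to 1 and accumulated probability arbitrarily close to 1. A finite prefix like this can only suggest the behaviour of "infinitely often" or "always after some point"; deciding those modes requires the algorithms developed in the paper.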
