Tomáš Brázdil, Václav Brožek, K. Chatterjee, Vojtěch Forejt, Antonín Kučera
{"title":"具有多个长期平均目标的马尔可夫决策过程","authors":"Tom'avs Br'azdil, V'aclav Brovzek, K. Chatterjee, Vojtvech Forejt, Anton'in Kuvcera","doi":"10.2168/LMCS-10(1:13)2014","DOIUrl":null,"url":null,"abstract":"We study Markov decision processes (MDPs) with multiple\nlimit-average (or mean-payoff) functions. We consider two\ndifferent objectives, namely, expectation and satisfaction\nobjectives. Given an MDP with k limit-average functions, in the\nexpectation objective the goal is to maximize the expected\nlimit-average value, and in the satisfaction objective the goal\nis to maximize the probability of runs such that the\nlimit-average value stays above a given vector. We show that\nunder the expectation objective, in contrast to the case of one\nlimit-average function, both randomization and memory are\nnecessary for strategies even for epsilon-approximation, and\nthat finite-memory randomized strategies are sufficient for\nachieving Pareto optimal values. Under the satisfaction\nobjective, in contrast to the case of one limit-average\nfunction, infinite memory is necessary for strategies achieving\na specific value (i.e. randomized finite-memory strategies are\nnot sufficient), whereas memoryless randomized strategies are\nsufficient for epsilon-approximation, for all epsilon>0. We\nfurther prove that the decision problems for both expectation\nand satisfaction objectives can be solved in polynomial time\nand the trade-off curve (Pareto curve) can be\nepsilon-approximated in time polynomial in the size of the MDP\nand 1/epsilon, and exponential in the number of limit-average\nfunctions, for all epsilon>0. Our analysis also reveals flaws\nin previous work for MDPs with multiple mean-payoff functions\nunder the expectation objective, corrects the flaws, and allows\nus to obtain improved results.","PeriodicalId":49904,"journal":{"name":"Logical Methods in Computer Science","volume":null,"pages":null},"PeriodicalIF":0.6000,"publicationDate":"2011-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"29","resultStr":"{\"title\":\"Markov Decision Processes with Multiple Long-Run AverageObjectives\",\"authors\":\"Tom'avs Br'azdil, V'aclav Brovzek, K. Chatterjee, Vojtvech Forejt, Anton'in Kuvcera\",\"doi\":\"10.2168/LMCS-10(1:13)2014\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We study Markov decision processes (MDPs) with multiple\\nlimit-average (or mean-payoff) functions. We consider two\\ndifferent objectives, namely, expectation and satisfaction\\nobjectives. Given an MDP with k limit-average functions, in the\\nexpectation objective the goal is to maximize the expected\\nlimit-average value, and in the satisfaction objective the goal\\nis to maximize the probability of runs such that the\\nlimit-average value stays above a given vector. We show that\\nunder the expectation objective, in contrast to the case of one\\nlimit-average function, both randomization and memory are\\nnecessary for strategies even for epsilon-approximation, and\\nthat finite-memory randomized strategies are sufficient for\\nachieving Pareto optimal values. Under the satisfaction\\nobjective, in contrast to the case of one limit-average\\nfunction, infinite memory is necessary for strategies achieving\\na specific value (i.e. randomized finite-memory strategies are\\nnot sufficient), whereas memoryless randomized strategies are\\nsufficient for epsilon-approximation, for all epsilon>0. 
We\\nfurther prove that the decision problems for both expectation\\nand satisfaction objectives can be solved in polynomial time\\nand the trade-off curve (Pareto curve) can be\\nepsilon-approximated in time polynomial in the size of the MDP\\nand 1/epsilon, and exponential in the number of limit-average\\nfunctions, for all epsilon>0. Our analysis also reveals flaws\\nin previous work for MDPs with multiple mean-payoff functions\\nunder the expectation objective, corrects the flaws, and allows\\nus to obtain improved results.\",\"PeriodicalId\":49904,\"journal\":{\"name\":\"Logical Methods in Computer Science\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.6000,\"publicationDate\":\"2011-04-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"29\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Logical Methods in Computer Science\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.2168/LMCS-10(1:13)2014\",\"RegionNum\":4,\"RegionCategory\":\"数学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, THEORY & METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Logical Methods in Computer Science","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.2168/LMCS-10(1:13)2014","RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
Markov Decision Processes with Multiple Long-Run Average Objectives
We study Markov decision processes (MDPs) with multiple
limit-average (or mean-payoff) functions. We consider two
different objectives, namely, expectation and satisfaction
objectives. Given an MDP with k limit-average functions, in the
expectation objective the goal is to maximize the expected
limit-average value, and in the satisfaction objective the goal
is to maximize the probability of runs such that the
limit-average value stays above a given vector. We show that
under the expectation objective, in contrast to the case of one
limit-average function, both randomization and memory are
necessary for strategies even for epsilon-approximation, and
that finite-memory randomized strategies are sufficient for
achieving Pareto optimal values. Under the satisfaction
objective, in contrast to the case of one limit-average
function, infinite memory is necessary for strategies achieving
a specific value (i.e. randomized finite-memory strategies are
not sufficient), whereas memoryless randomized strategies are
sufficient for epsilon-approximation, for all epsilon>0. We
further prove that the decision problems for both expectation
and satisfaction objectives can be solved in polynomial time
and the trade-off curve (Pareto curve) can be
epsilon-approximated in time polynomial in the size of the MDP
and 1/epsilon, and exponential in the number of limit-average
functions, for all epsilon>0. Our analysis also reveals flaws
in previous work for MDPs with multiple mean-payoff functions
under the expectation objective, corrects the flaws, and allows
us to obtain improved results.
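To make the two objectives concrete, the following is a sketch of the definitions in our own notation (not quoted from the paper). An MDP with k limit-average reward functions r_1, ..., r_k assigns to each run \omega the vector of limit averages

    \mathrm{lr}_i(\omega) = \liminf_{n \to \infty} \frac{1}{n} \sum_{t=0}^{n-1} r_i(\omega_t), \qquad i = 1, \dots, k,

where \omega_t denotes the t-th step of the run. Under the expectation objective, a strategy \sigma achieves a vector v \in \mathbb{R}^k if \mathbb{E}^{\sigma}[\mathrm{lr}] \ge v componentwise; under the satisfaction objective, \sigma achieves (v, \nu) for a probability threshold \nu if \Pr^{\sigma}(\{\omega : \mathrm{lr}(\omega) \ge v\}) \ge \nu.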
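One standard route to polynomial-time results of the kind mentioned above is linear programming over state-action frequencies. As an illustration only (this is not the paper's algorithm, which handles arbitrary MDPs and both objectives), the sketch below checks achievability of an expectation vector v in the special case where the whole MDP forms a single maximal end component; the names achievable_in_expectation, P, R, v are ours.

# Illustration only: achievability of an expectation vector v when the entire MDP
# is a single maximal end component (the paper's algorithm is more general).
# P[s][a] is a dict {next_state: probability}, R[i][s][a] is the i-th reward of
# state-action pair (s, a), and v is the target vector.
import numpy as np
from scipy.optimize import linprog

def achievable_in_expectation(P, R, v):
    states = range(len(P))
    pairs = [(s, a) for s in states for a in range(len(P[s]))]
    n = len(pairs)

    # Equality constraints: frequencies sum to 1, plus flow conservation per state.
    A_eq = [np.ones(n)]
    b_eq = [1.0]
    for s in states:
        row = np.zeros(n)
        for j, (s2, a) in enumerate(pairs):
            row[j] += P[s2][a].get(s, 0.0)  # expected inflow into state s
            if s2 == s:
                row[j] -= 1.0               # outflow: total frequency of leaving s
        A_eq.append(row)
        b_eq.append(0.0)

    # Inequality constraints: each expected limit-average reward is at least v[i]
    # (linprog expects "<=", so the rows are negated).
    A_ub, b_ub = [], []
    for i, Ri in enumerate(R):
        A_ub.append([-Ri[s][a] for (s, a) in pairs])
        b_ub.append(-v[i])

    res = linprog(c=np.zeros(n), A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0.0, None)] * n)
    return res.status == 0  # 0 means a feasible frequency vector exists

# Example: a deterministic two-state cycle with one action per state and two rewards.
# P = [[{1: 1.0}], [{0: 1.0}]]
# R = [[[1.0], [0.0]], [[0.0], [1.0]]]
# achievable_in_expectation(P, R, [0.5, 0.5])  -> True

For general MDPs the analysis in the paper is, roughly, obtained by decomposing the MDP into maximal end components and adding flow constraints that link the initial state to those components; getting this part right is where the corrections to earlier work come in.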
Journal introduction:
Logical Methods in Computer Science is a fully refereed, open access, free, electronic journal. It welcomes papers on theoretical and practical areas in computer science involving logical methods, taken in a broad sense; some particular areas within its scope are listed below. Papers are refereed in the traditional way, with two or more referees per paper. Copyright is retained by the author.
Topics of Logical Methods in Computer Science:
Algebraic methods
Automata and logic
Automated deduction
Categorical models and logic
Coalgebraic methods
Computability and logic
Computer-aided verification
Concurrency theory
Constraint programming
Cyber-physical systems
Database theory
Defeasible reasoning
Domain theory
Emerging topics: Computational systems in biology
Emerging topics: Quantum computation and logic
Finite model theory
Formalized mathematics
Functional programming and lambda calculus
Inductive logic and learning
Interactive proof checking
Logic and algorithms
Logic and complexity
Logic and games
Logic and probability
Logic for knowledge representation
Logic programming
Logics of programs
Modal and temporal logics
Program analysis and type checking
Program development and specification
Proof complexity
Real time and hybrid systems
Reasoning about actions and planning
Satisfiability
Security
Semantics of programming languages
Term rewriting and equational logic
Type theory and constructive mathematics.