DYNAMIC PROGRAMMING ON RECURSIVE REWARD SYSTEMS

Bulletin of Mathematical Statistics, Pub Date: 1976-03-01, DOI: 10.5109/13108

N. Furukawa, Seiichi Iwamoto

Citations: 21

Abstract

Dynamic programming (DP) was introduced by R. Bellman [2] as an important technique for solving nonlinear programming problems in which a sequence of decisions has to be chosen in an optimal manner. In his book, Bellman proposed the "Principle of Optimality" to show that the determination of an optimal policy can be reduced to the solution of an optimality equation, i.e., a functional equation that should be satisfied by the optimal return. Although the Principle of Optimality is a proposition that requires mathematical proof, his justification of the principle was not given in a precise mathematical form. For this reason, the class of cost structures to which the principle applies has been left unexplained. Later, G. L. Nemhauser [9] gave a sufficient condition on the cost structure for an optimality equation to hold. His condition is that the cost function should have both a separability property and a monotonicity property. Nemhauser did not make explicit the relation between the validity of Bellman's principle and the justification of an optimality equation, a relation that is no longer trivial under his condition. In this paper we are concerned with the optimization of finite-stage sequential decision processes. We give rigorous proofs for the justification of optimality equations and for the validity of optimality principles in two senses, without assuming the existence of maximum values of returns. Our condition is that the cost function should have a recursiveness property, a monotonicity property and a Lipschitz condition. Our recursiveness is essentially the same as separability in Nemhauser's sense. Our monotonicity has two senses: one is a wide sense, and the other a strict sense. The monotonicity properties in the wide and the strict senses, together with the recursiveness and the Lipschitz condition, induce optimality principles in a weak and a strong sense, respectively.
In our terms, Bellman's Principle of Optimality may well be identified with a principle in the strong sense. To the authors' knowledge, our principle in the weak sense has not been introduced elsewhere in the literature. If, as Nemhauser did, we assume the existence of maximum values of returns, then the Lipschitz condition can be dropped from the hypotheses of our arguments. In this paper we treat both deterministic and stochastic cases. Section 2 is
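
To make the roles of recursiveness and monotonicity concrete, here is a minimal Python sketch of our own; it is not the authors' formulation. It assumes a deterministic process with finite state and action sets, so maximum values of returns exist (the case in which, per the abstract, the Lipschitz condition can be dropped), and a return built recursively from a terminal value as v_k = phi(x_k, u_k, v_{k+1}). It takes "phi is nondecreasing in the tail value v" as a stand-in for monotonicity and compares the value given by the optimality equation (backward induction) with brute-force enumeration of all action sequences. The names phi, T, term, total_return, brute_force and backward_induction, and the toy numbers, are hypothetical.

```python
from itertools import product
from typing import Callable, Sequence

# Hypothetical toy model (not from the paper): a deterministic, finite-stage
# process with finite state and action sets, so every maximum below exists.
Transition = Callable[[int, int], int]        # T(x, u) -> next state
Compose = Callable[[int, int, float], float]  # phi(x, u, v): stage reward folded with tail value v
Terminal = Callable[[int], float]             # value attached to the final state


def total_return(x0: int, seq: Sequence[int], T: Transition,
                 phi: Compose, term: Terminal) -> float:
    """Recursive return of one action sequence, folded from the last stage inward."""
    states = [x0]
    for u in seq:                          # forward pass: record visited states
        states.append(T(states[-1], u))
    value = term(states[-1])
    for k in reversed(range(len(seq))):    # backward pass: v_k = phi(x_k, u_k, v_{k+1})
        value = phi(states[k], seq[k], value)
    return value


def brute_force(x0: int, n: int, actions: Sequence[int], T: Transition,
                phi: Compose, term: Terminal) -> float:
    """Optimal return by enumerating all len(actions) ** n action sequences."""
    return max(total_return(x0, seq, T, phi, term)
               for seq in product(actions, repeat=n))


def backward_induction(x0: int, n: int, states: Sequence[int], actions: Sequence[int],
                       T: Transition, phi: Compose, term: Terminal) -> float:
    """Solve the optimality equation f_k(x) = max_u phi(x, u, f_{k+1}(T(x, u)))."""
    f = {x: term(x) for x in states}       # f_n: terminal values
    for _ in range(n):                     # compute f_{n-1}, ..., f_0 in turn
        f = {x: max(phi(x, u, f[T(x, u)]) for u in actions) for x in states}
    return f[x0]


# Toy instance: two states, two actions, and the next state equals the action.
STATES = ACTIONS = (0, 1)


def T(x, u):
    return u


def term(x):
    return x + 1


def phi_mono(x, u, v):
    # Nonnegative multiplicative factor: nondecreasing in the tail value v.
    return (x + u + 1) * v


def term_flip(x):
    return 3 if x == 1 else 1


def phi_flip(x, u, v):
    # Action 1 negates the tail value, so phi is not monotone in v.
    return -v if u == 1 else v


print(brute_force(0, 2, ACTIONS, T, phi_mono, term),
      backward_induction(0, 2, STATES, ACTIONS, T, phi_mono, term))       # 12 12
print(brute_force(0, 2, ACTIONS, T, phi_flip, term_flip),
      backward_induction(0, 2, STATES, ACTIONS, T, phi_flip, term_flip))  # 3 1
```

With the monotone composition the two values agree. With phi_flip the optimal sequence (1, 1) earns 3, yet its one-stage tail from state 1 earns -3 while the best tail from that state earns 1. The optimal policy therefore does not consist of optimal tails, and the optimality equation undershoots the true optimum.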
