{"title":"A Course in Dynamic Optimization","authors":"Bar Light","doi":"arxiv-2408.03034","DOIUrl":null,"url":null,"abstract":"These lecture notes are derived from a graduate-level course in dynamic\noptimization, offering an introduction to techniques and models extensively\nused in management science, economics, operations research, engineering, and\ncomputer science. The course emphasizes the theoretical underpinnings of\ndiscrete-time dynamic programming models and advanced algorithmic strategies\nfor solving these models. Unlike typical treatments, it provides a proof for\nthe principle of optimality for upper semi-continuous dynamic programming, a\nmiddle ground between the simpler countable state space case\n\\cite{bertsekas2012dynamic}, and the involved universally measurable case\n\\cite{bertsekas1996stochastic}. This approach is sufficiently rigorous to\ninclude important examples such as dynamic pricing, consumption-savings, and\ninventory management models. The course also delves into the properties of\nvalue and policy functions, leveraging classical results\n\\cite{topkis1998supermodularity} and recent developments. Additionally, it\noffers an introduction to reinforcement learning, including a formal proof of\nthe convergence of Q-learning algorithms. Furthermore, the notes delve into\npolicy gradient methods for the average reward case, presenting a convergence\nresult for the tabular case in this context. This result is simple and similar\nto the discounted case but appears to be new.","PeriodicalId":501188,"journal":{"name":"arXiv - ECON - Theoretical Economics","volume":"39 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - ECON - Theoretical Economics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.03034","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
These lecture notes are derived from a graduate-level course in dynamic
optimization, offering an introduction to techniques and models extensively
used in management science, economics, operations research, engineering, and
computer science. The course emphasizes the theoretical underpinnings of
discrete-time dynamic programming models and advanced algorithmic strategies
for solving these models. Unlike typical treatments, it provides a proof of
the principle of optimality for upper semi-continuous dynamic programming, a
middle ground between the simpler countable state space case
\cite{bertsekas2012dynamic} and the more involved universally measurable case
\cite{bertsekas1996stochastic}. This approach is sufficiently rigorous to
include important examples such as dynamic pricing, consumption-savings, and
inventory management models. The course also delves into the properties of
value and policy functions, leveraging classical results
\cite{topkis1998supermodularity} and recent developments. Additionally, it
offers an introduction to reinforcement learning, including a formal proof of
the convergence of Q-learning algorithms. Furthermore, the notes cover
policy gradient methods in the average-reward setting, presenting a
convergence result for the tabular case. The result is simple and parallels
the discounted case, but appears to be new.
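As a concrete illustration of the kind of algorithm analyzed in the reinforcement learning part of the notes, the sketch below runs tabular Q-learning on a toy two-state MDP. Each update moves Q(s, a) toward the sampled Bellman optimality target r + γ max_{a'} Q(s', a'); under standard conditions (every state-action pair visited infinitely often and Robbins-Monro step sizes) the iterates converge to the optimal Q-function. The MDP, parameter values, and exploration scheme here are illustrative assumptions, not taken from the notes.

```python
# Minimal tabular Q-learning sketch on an illustrative two-state, two-action MDP.
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 2, 2
gamma = 0.9  # discount factor

# Illustrative transition probabilities P[s, a, s'] and rewards R[s, a] (assumed, not from the notes).
P = np.array([[[0.8, 0.2], [0.2, 0.8]],
              [[0.6, 0.4], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])

Q = np.zeros((n_states, n_actions))
counts = np.zeros((n_states, n_actions))  # visit counts used for the step sizes

s = 0
for t in range(100_000):
    # Epsilon-greedy exploration so every (s, a) pair is visited infinitely often.
    if rng.random() < 0.1:
        a = int(rng.integers(n_actions))
    else:
        a = int(np.argmax(Q[s]))
    s_next = rng.choice(n_states, p=P[s, a])
    r = R[s, a]
    counts[s, a] += 1
    # Harmonic step sizes satisfy the Robbins-Monro conditions used in convergence proofs.
    alpha = 1.0 / counts[s, a]
    # Q-learning update toward the sampled Bellman optimality target.
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
    s = s_next

print("Estimated optimal Q-values:\n", Q)
```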