In reality, investors are uncertain about the dynamics of risky asset returns and therefore prefer to make robust investment decisions. In this paper, we propose an α-robust utility maximization problem under uncertain parameters. The investor is allowed to invest in a financial market consisting of a risk-free asset and a risky asset, and the uncertainty about the expected return rate is parameterized by a nonempty set. Unlike most of the existing literature on robust utility maximization, where investors are assumed to be extremely ambiguity averse in that they consider only the expected utility in the worst-case scenario, we focus on investors who are not only ambiguity averse but also ambiguity seeking. Under power utility, we provide implicit-function representations for the precommitted strategy, the open-loop equilibrium strategy, and the closed-loop equilibrium strategy. We also establish properties of the optimal trading strategies and of the best-case and worst-case parameters under the three kinds of strategies. Funding: This work was supported by the National Natural Science Foundation of China [Grants 12071147, 12171169, 12271171, 12371470, 71721001, 71931004, 72371256], the Shanghai Philosophy Social Science Planning Office Project [Grant 2022ZJB005], the Fundamental Research Funds for the Central Universities [Grant 2022QKT001], the Excellent Young Team Project of the Natural Science Foundation of Guangdong Province of China [Grant 2023B1515040001], the Philosophy and Social Science Programming Foundation of Guangdong Province [Grant GD22CYJ17], the Natural Science Foundation of Guangdong Province of China [Grant 2022A1515011472], and the 111 Project [Grant B14019].
{"title":"Optimal Investment Strategy for α-Robust Utility Maximization Problem","authors":"Zhou Yang, Danping Li, Yan Zeng, Guanting Liu","doi":"10.1287/moor.2023.0076","DOIUrl":"https://doi.org/10.1287/moor.2023.0076","url":null,"abstract":"In reality, investors are uncertain about the dynamics of risky asset returns. Therefore, investors prefer to make robust investment decisions. In this paper, we propose an α-robust utility maximization problem under uncertain parameters. The investor is allowed to invest in a financial market consisting of a risk-free asset and a risky asset. The uncertainty about the expected return rate is parameterized by a nonempty set. Different from most existing literature on robust utility maximization problems where investors are generally assumed to be extremely ambiguity averse because they tend to consider only expected utility in the worst-case scenario, we pay attention to the investors who are not only ambiguity averse but also ambiguity seeking. Under power utility, we provide the implicit function representations for the precommitted strategy, equilibrium strategy of the open-loop type, and equilibrium strategy of the closed-loop type. Some properties about the optimal trading strategies, the best-case and worst-case parameters under three different kinds of strategies, are provided.Funding: This work was supported by National Natural Science Foundation of China [Grants 12071147, 12171169, 12271171, 12371470, 71721001, 71931004, 72371256], the Shanghai Philosophy Social Science Planning Office Project [Grant 2022ZJB005], Fundamental Research Funds for the Central Universities [Grant 2022QKT001], the Excellent Young Team Project Natural Science Foundation of Guangdong Province of China [Grant 2023B1515040001], the Philosophy and Social Science Programming Foundation of Guangdong Province [Grant GD22CYJ17], the Nature Science Foundation of Guangdong Province of China [Grant 2022A1515011472], and the 111 Project [Grant B14019].","PeriodicalId":49852,"journal":{"name":"Mathematics of Operations Research","volume":"183 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140199553","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We propose and study a new multilevel method for the numerical approximation of a Gibbs distribution π on [Formula: see text], based on (overdamped) Langevin diffusions. This method relies on a multilevel occupation measure, that is, on an appropriate combination of R occupation measures of (constant-step) Euler schemes with respective steps [Formula: see text]. We first state a quantitative result under general assumptions that guarantees an ε-approximation (in an L2-sense) with a cost of the order [Formula: see text], or [Formula: see text] under less contractive assumptions. We then apply it to overdamped Langevin diffusions with strongly convex potential [Formula: see text] and obtain an ε-complexity of the order [Formula: see text] or [Formula: see text] under additional assumptions on U. More precisely, up to universal constants, an appropriate choice of the parameters leads to a cost controlled by [Formula: see text] (where [Formula: see text] and [Formula: see text] respectively denote the supremum of the largest eigenvalue and the infimum of the lowest eigenvalue of [Formula: see text]). Finally, we complement these theoretical results with numerical illustrations, including comparisons to other algorithms in Bayesian learning and an opening toward the non–strongly convex setting. Funding: The authors are grateful to the SIRIC ILIAD Nantes-Angers program, supported by the French National Cancer Institute [INCA-DGOS-Inserm Grant 12558].
{"title":"Multilevel Langevin Pathwise Average for Gibbs Approximation","authors":"Maxime Egéa, Fabien Panloup","doi":"10.1287/moor.2021.0243","DOIUrl":"https://doi.org/10.1287/moor.2021.0243","url":null,"abstract":"We propose and study a new multilevel method for the numerical approximation of a Gibbs distribution π on [Formula: see text], based on (overdamped) Langevin diffusions. This method relies on a multilevel occupation measure, that is, on an appropriate combination of R occupation measures of (constant-step) Euler schemes with respective steps [Formula: see text]. We first state a quantitative result under general assumptions that guarantees an ε-approximation (in an L<jats:sup>2</jats:sup>-sense) with a cost of the order [Formula: see text] or [Formula: see text] under less contractive assumptions. We then apply it to overdamped Langevin diffusions with strongly convex potential [Formula: see text] and obtain an ε-complexity of the order [Formula: see text] or [Formula: see text] under additional assumptions on U. More precisely, up to universal constants, an appropriate choice of the parameters leads to a cost controlled by [Formula: see text] (where [Formula: see text] and [Formula: see text] respectively denote the supremum and the infimum of the largest and lowest eigenvalue of [Formula: see text]). We finally complete these theoretical results with some numerical illustrations, including comparisons to other algorithms in Bayesian learning and opening to the non–strongly convex setting.Funding: The authors are grateful to the SIRIC ILIAD Nantes-Angers program, supported by the French National Cancer Institute [INCA-DGOS-Inserm Grant 12558].","PeriodicalId":49852,"journal":{"name":"Mathematics of Operations Research","volume":"80 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140297973","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
One of the challenges for multiagent reinforcement learning (MARL) is designing efficient learning algorithms for a large system in which each agent has only limited or partial information about the entire system. While exciting progress has been made in analyzing decentralized MARL with a network of agents for social networks and team video games, little is known theoretically about decentralized MARL with a network of states for modeling self-driving vehicles, ride-sharing, and data and traffic routing. This paper proposes a framework of localized training and decentralized execution to study MARL with a network of states. Localized training means that agents only need to collect local information in their neighboring states during the training phase; decentralized execution implies that agents can afterward execute the learned decentralized policies, which depend only on agents' current states. The theoretical analysis consists of three key components: the first is the reformulation of the MARL system as a networked Markov decision process with teams of agents, enabling updating the associated team Q-function in a localized fashion; the second is the Bellman equation for the value function and the appropriate Q-function on the probability measure space; and the third is the exponential decay property of the team Q-function, facilitating its approximation with high sample efficiency and controllable error. The theoretical analysis paves the way for a new algorithm, LTDE-Neural-AC, which adopts an actor–critic approach with overparameterized neural networks. The convergence and sample complexity are established and shown to be scalable with respect to the numbers of both agents and states. To the best of our knowledge, this is the first neural network–based MARL algorithm with network structure and a provable convergence guarantee. Funding: X. Wei is partially supported by NSFC no. 12201343. R. Xu is partially supported by the NSF CAREER award DMS-2339240.
{"title":"Mean-Field Multiagent Reinforcement Learning: A Decentralized Network Approach","authors":"Haotian Gu, Xin Guo, Xiaoli Wei, Renyuan Xu","doi":"10.1287/moor.2022.0055","DOIUrl":"https://doi.org/10.1287/moor.2022.0055","url":null,"abstract":"One of the challenges for multiagent reinforcement learning (MARL) is designing efficient learning algorithms for a large system in which each agent has only limited or partial information of the entire system. Whereas exciting progress has been made to analyze decentralized MARL with the network of agents for social networks and team video games, little is known theoretically for decentralized MARL with the network of states for modeling self-driving vehicles, ride-sharing, and data and traffic routing. This paper proposes a framework of localized training and decentralized execution to study MARL with the network of states. Localized training means that agents only need to collect local information in their neighboring states during the training phase; decentralized execution implies that agents can execute afterward the learned decentralized policies, which depend only on agents’ current states. The theoretical analysis consists of three key components: the first is the reformulation of the MARL system as a networked Markov decision process with teams of agents, enabling updating the associated team Q-function in a localized fashion; the second is the Bellman equation for the value function and the appropriate Q-function on the probability measure space; and the third is the exponential decay property of the team Q-function, facilitating its approximation with efficient sample efficiency and controllable error. The theoretical analysis paves the way for a new algorithm LTDE-Neural-AC, in which the actor–critic approach with overparameterized neural networks is proposed. The convergence and sample complexity are established and shown to be scalable with respect to the sizes of both agents and states. To the best of our knowledge, this is the first neural network–based MARL algorithm with network structure and provable convergence guarantee.Funding: X. Wei is partially supported by NSFC no. 12201343. R. Xu is partially supported by the NSF CAREER award DMS-2339240.","PeriodicalId":49852,"journal":{"name":"Mathematics of Operations Research","volume":"30 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140147661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We investigate some graph parameters dealing with bi-independent pairs (A, B) in a bipartite graph [Formula: see text], that is, pairs (A, B) where [Formula: see text], and [Formula: see text] are independent. These parameters also allow us to study bicliques in general graphs. When maximizing the cardinality [Formula: see text], one finds the stability number [Formula: see text], well-known to be polynomial-time computable. When maximizing the product [Formula: see text], one finds the parameter g(G), shown to be NP-hard by Peeters in 2003, and when maximizing the ratio [Formula: see text], one finds h(G), introduced by Vallentin in 2020 for bounding product-free sets in finite groups. We show that h(G) is an NP-hard parameter and, as a crucial ingredient, that it is NP-complete to decide whether a bipartite graph G has a balanced maximum independent set. These hardness results motivate introducing semidefinite programming (SDP) bounds for g(G), h(G), and [Formula: see text] (the maximum cardinality of a balanced independent set). We show that these bounds can be seen as natural variations of the Lovász ϑ-number, a well-known semidefinite bound on [Formula: see text]. In addition, we formulate closed-form eigenvalue bounds, and we show relationships among them as well as with earlier spectral parameters by Hoffman and Haemers in 2001 and Vallentin in 2020. Funding: This work was supported by H2020 Marie Skłodowska-Curie Actions [Grant 813211 (POEMA)].
{"title":"Semidefinite Approximations for Bicliques and Bi-Independent Pairs","authors":"Monique Laurent, Sven Polak, Luis Felipe Vargas","doi":"10.1287/moor.2023.0046","DOIUrl":"https://doi.org/10.1287/moor.2023.0046","url":null,"abstract":"We investigate some graph parameters dealing with bi-independent pairs (A, B) in a bipartite graph [Formula: see text], that is, pairs (A, B) where [Formula: see text], and [Formula: see text] are independent. These parameters also allow us to study bicliques in general graphs. When maximizing the cardinality [Formula: see text], one finds the stability number [Formula: see text], well-known to be polynomial-time computable. When maximizing the product [Formula: see text], one finds the parameter g(G), shown to be NP-hard by Peeters in 2003, and when maximizing the ratio [Formula: see text], one finds h(G), introduced by Vallentin in 2020 for bounding product-free sets in finite groups. We show that h(G) is an NP-hard parameter and, as a crucial ingredient, that it is NP-complete to decide whether a bipartite graph G has a balanced maximum independent set. These hardness results motivate introducing semidefinite programming (SDP) bounds for g(G), h(G), and [Formula: see text] (the maximum cardinality of a balanced independent set). We show that these bounds can be seen as natural variations of the Lovász ϑ-number, a well-known semidefinite bound on [Formula: see text]. In addition, we formulate closed-form eigenvalue bounds, and we show relationships among them as well as with earlier spectral parameters by Hoffman and Haemers in 2001 and Vallentin in 2020.Funding: This work was supported by H2020 Marie Skłodowska-Curie Actions [Grant 813211 (POEMA)].","PeriodicalId":49852,"journal":{"name":"Mathematics of Operations Research","volume":"23 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140147775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zero-sum stochastic games are parameterized by payoffs, transitions, and possibly a discount rate. In this article, we study how the main solution concepts, the discounted and undiscounted values, vary when these parameters are perturbed. We focus on the marginal values, introduced by Mills in 1956 in the context of matrix games—that is, the directional derivatives of the value along any fixed perturbation. We provide a formula for the marginal values of a discounted stochastic game. Further, under mild assumptions on the perturbation, we provide a formula for their limit as the discount rate vanishes and for the marginal values of an undiscounted stochastic game. We also show, via an example, that the latter two differ in general. Funding: This work was supported by Fondation CFM pour la Recherche; the European Research Council [Grant ERC-CoG-863818 (ForM-SMArt)]; and Agence Nationale de la Recherche [Grant ANR-21-CE40-0020].
{"title":"Marginal Values of a Stochastic Game","authors":"Luc Attia, Miquel Oliu-Barton, Raimundo Saona","doi":"10.1287/moor.2023.0297","DOIUrl":"https://doi.org/10.1287/moor.2023.0297","url":null,"abstract":"Zero-sum stochastic games are parameterized by payoffs, transitions, and possibly a discount rate. In this article, we study how the main solution concepts, the discounted and undiscounted values, vary when these parameters are perturbed. We focus on the marginal values, introduced by Mills in 1956 in the context of matrix games—that is, the directional derivatives of the value along any fixed perturbation. We provide a formula for the marginal values of a discounted stochastic game. Further, under mild assumptions on the perturbation, we provide a formula for their limit as the discount rate vanishes and for the marginal values of an undiscounted stochastic game. We also show, via an example, that the two latter differ in general.Funding: This work was supported by Fondation CFM pour la Recherche; the European Research Council [Grant ERC-CoG-863818 (ForM-SMArt)]; and Agence Nationale de la Recherche [Grant ANR-21-CE40-0020].","PeriodicalId":49852,"journal":{"name":"Mathematics of Operations Research","volume":"50 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140147603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We propose a learning dynamics to model how strategic agents repeatedly play a continuous game while relying on an information platform to learn an unknown payoff-relevant parameter. In each time step, the platform updates a belief estimate of the parameter based on players’ strategies and realized payoffs using Bayes’ rule. Then, players adopt a generic learning rule to adjust their strategies based on the updated belief. We present results on the convergence of beliefs and strategies and the properties of convergent fixed points of the dynamics. We obtain sufficient and necessary conditions for the existence of globally stable fixed points. We also provide sufficient conditions for the local stability of fixed points. These results provide an approach to analyzing the long-term outcomes that arise from the interplay between Bayesian belief learning and strategy learning in games and enable us to characterize conditions under which learning leads to a complete information equilibrium. Funding: Financial support from the Air Force Office of Scientific Research [Project Building Attack Resilience into Complex Networks], the Simons Institute [research fellowship], and a Michael Hammer Fellowship is gratefully acknowledged.
{"title":"Convergence and Stability of Coupled Belief-Strategy Learning Dynamics in Continuous Games","authors":"Manxi Wu, Saurabh Amin, Asuman Ozdaglar","doi":"10.1287/moor.2022.0161","DOIUrl":"https://doi.org/10.1287/moor.2022.0161","url":null,"abstract":"We propose a learning dynamics to model how strategic agents repeatedly play a continuous game while relying on an information platform to learn an unknown payoff-relevant parameter. In each time step, the platform updates a belief estimate of the parameter based on players’ strategies and realized payoffs using Bayes’ rule. Then, players adopt a generic learning rule to adjust their strategies based on the updated belief. We present results on the convergence of beliefs and strategies and the properties of convergent fixed points of the dynamics. We obtain sufficient and necessary conditions for the existence of globally stable fixed points. We also provide sufficient conditions for the local stability of fixed points. These results provide an approach to analyzing the long-term outcomes that arise from the interplay between Bayesian belief learning and strategy learning in games and enable us to characterize conditions under which learning leads to a complete information equilibrium.Funding: Financial support from the Air Force Office of Scientific Research [Project Building Attack Resilience into Complex Networks], the Simons Institute [research fellowship], and a Michael Hammer Fellowship is gratefully acknowledged.","PeriodicalId":49852,"journal":{"name":"Mathematics of Operations Research","volume":"72 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140147400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We study the risk-sensitive exponential cost Markov decision process (MDP) formulation and develop a trajectory-based gradient algorithm to find a stationary point of the cost associated with a set of parameterized policies. We derive a formula that can be used to compute the policy gradient from (state, action, cost) information collected from sample paths of the MDP for each fixed parameterized policy. Unlike the traditional average cost problem, standard stochastic approximation theory cannot be used to exploit this formula. To address the issue, we introduce a truncated and smooth version of the risk-sensitive cost and show that this new cost criterion can be used to approximate the risk-sensitive cost and its gradient uniformly under some mild assumptions. We then develop a trajectory-based gradient algorithm to minimize the smooth truncated estimation of the risk-sensitive cost and derive conditions under which a sequence of truncations can be used to solve the original, untruncated cost problem. Funding: This work was supported by the Office of Naval Research Global [Grant N0001419-1-2566], the Division of Computer and Network Systems [Grant 21-06801], the Army Research Office [Grant W911NF-19-1-0379], and the Division of Computing and Communication Foundations [Grants 17-04970 and 19-34986].
{"title":"A Policy Gradient Algorithm for the Risk-Sensitive Exponential Cost MDP","authors":"Mehrdad Moharrami, Yashaswini Murthy, Arghyadip Roy, R. Srikant","doi":"10.1287/moor.2022.0139","DOIUrl":"https://doi.org/10.1287/moor.2022.0139","url":null,"abstract":"We study the risk-sensitive exponential cost Markov decision process (MDP) formulation and develop a trajectory-based gradient algorithm to find the stationary point of the cost associated with a set of parameterized policies. We derive a formula that can be used to compute the policy gradient from (state, action, cost) information collected from sample paths of the MDP for each fixed parameterized policy. Unlike the traditional average cost problem, standard stochastic approximation theory cannot be used to exploit this formula. To address the issue, we introduce a truncated and smooth version of the risk-sensitive cost and show that this new cost criterion can be used to approximate the risk-sensitive cost and its gradient uniformly under some mild assumptions. We then develop a trajectory-based gradient algorithm to minimize the smooth truncated estimation of the risk-sensitive cost and derive conditions under which a sequence of truncations can be used to solve the original, untruncated cost problem.Funding: This work was supported by the Office of Naval Research Global [Grant N0001419-1-2566], the Division of Computer and Network Systems [Grant 21-06801], the Army Research Office [Grant W911NF-19-1-0379], and the Division of Computing and Communication Foundations [Grants 17-04970 and 19-34986].","PeriodicalId":49852,"journal":{"name":"Mathematics of Operations Research","volume":"7 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140107747","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In many applications, solutions of convex optimization problems are updated on-line, as functions of time. In this paper, we consider parametric semidefinite programs, which are linear optimization problems in the semidefinite cone whose coefficients (input data) depend on a time parameter. We are interested in the geometry of the solution (output data) trajectory, defined as the set of solutions depending on the parameter. We provide an exhaustive description of the geometry of the solution trajectory. As our main result, we show that only six distinct behaviors can be observed at a neighborhood of a given point along the solution trajectory. Each possible behavior is then illustrated by an example. Funding: This work was supported by OP RDE [Grant CZ.02.1.01/0.0/0.0/16_019/0000765].
{"title":"Parametric Semidefinite Programming: Geometry of the Trajectory of Solutions","authors":"Antonio Bellon, Didier Henrion, Vyacheslav Kungurtsev, Jakub Mareček","doi":"10.1287/moor.2021.0097","DOIUrl":"https://doi.org/10.1287/moor.2021.0097","url":null,"abstract":"In many applications, solutions of convex optimization problems are updated on-line, as functions of time. In this paper, we consider parametric semidefinite programs, which are linear optimization problems in the semidefinite cone whose coefficients (input data) depend on a time parameter. We are interested in the geometry of the solution (output data) trajectory, defined as the set of solutions depending on the parameter. We propose an exhaustive description of the geometry of the solution trajectory. As our main result, we show that only six distinct behaviors can be observed at a neighborhood of a given point along the solution trajectory. Each possible behavior is then illustrated by an example.Funding: This work was supported by OP RDE [Grant CZ.02.1.01/0.0/0.0/16_019/0000765].","PeriodicalId":49852,"journal":{"name":"Mathematics of Operations Research","volume":"86 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140070190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A random variable is difference-form decomposable (DFD) if it may be written as the difference of two i.i.d. random terms. We show that densities of such variables exhibit a remarkable degree of structure. Specifically, a DFD density can be neither approximately uniform, nor quasiconvex, nor strictly concave. On the other hand, a DFD density need, in general, be neither unimodal nor logconcave. Regarding smoothness, we show that a compactly supported DFD density cannot be analytic and will often exhibit a kink even if its components are smooth. The analysis highlights the risks for model consistency resulting from the strategy widely adopted in the economics literature of imposing assumptions directly on a difference of noise terms rather than on its components.
{"title":"On the (Im-)Possibility of Representing Probability Distributions as a Difference of I.I.D. Noise Terms","authors":"Christian Ewerhart, Marco Serena","doi":"10.1287/moor.2023.0081","DOIUrl":"https://doi.org/10.1287/moor.2023.0081","url":null,"abstract":"A random variable is difference-form decomposable (DFD) if it may be written as the difference of two i.i.d. random terms. We show that densities of such variables exhibit a remarkable degree of structure. Specifically, a DFD density can be neither approximately uniform, nor quasiconvex, nor strictly concave. On the other hand, a DFD density need, in general, be neither unimodal nor logconcave. Regarding smoothness, we show that a compactly supported DFD density cannot be analytic and will often exhibit a kink even if its components are smooth. The analysis highlights the risks for model consistency resulting from the strategy widely adopted in the economics literature of imposing assumptions directly on a difference of noise terms rather than on its components.","PeriodicalId":49852,"journal":{"name":"Mathematics of Operations Research","volume":"276 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140070277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We develop a new dynamic continuous-time model of optimal consumption and investment that incorporates independent stochastic labor income. We reduce the problem of solving the Bellman equation to a problem of solving an integral equation. We then explicitly characterize the optimal consumption and investment strategy as a function of the income-to-wealth ratio. We provide some analytical comparative statics associated with the value function and optimal strategies. We also develop a quite general numerical algorithm for control iteration and solve the Bellman equation as a sequence of solutions to ordinary differential equations. This numerical algorithm can be readily applied to many other optimal consumption and investment problems, especially those with extra nondiversifiable Brownian risks that result in nonlinear Bellman equations. Finally, our numerical analysis illustrates how the presence of stochastic labor income affects the optimal consumption and investment strategy. Funding: A. Bensoussan was supported by the National Science Foundation under grant [DMS-2204795]. S. Park was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea, South Korea [NRF-2022S1A3A2A02089950].
{"title":"Optimal Consumption and Investment with Independent Stochastic Labor Income","authors":"Alain Bensoussan, Seyoung Park","doi":"10.1287/moor.2023.0119","DOIUrl":"https://doi.org/10.1287/moor.2023.0119","url":null,"abstract":"We develop a new dynamic continuous-time model of optimal consumption and investment to include independent stochastic labor income. We reduce the problem of solving the Bellman equation to a problem of solving an integral equation. We then explicitly characterize the optimal consumption and investment strategy as a function of income-to-wealth ratio. We provide some analytical comparative statics associated with the value function and optimal strategies. We also develop a quite general numerical algorithm for control iteration and solve the Bellman equation as a sequence of solutions to ordinary differential equations. This numerical algorithm can be readily applied to many other optimal consumption and investment problems especially with extra nondiversifiable Brownian risks, resulting in nonlinear Bellman equations. Finally, our numerical analysis illustrates how the presence of stochastic labor income affects the optimal consumption and investment strategy.Funding: A. Bensoussan was supported by the National Science Foundation under grant [DMS-2204795]. S. Park was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea, South Korea [NRF-2022S1A3A2A02089950].","PeriodicalId":49852,"journal":{"name":"Mathematics of Operations Research","volume":"278 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140055887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}