We study a dynamic mean-variance portfolio optimization problem under the reinforcement learning framework, where an entropy regularizer is introduced to induce exploration. Because the mean-variance criterion is time-inconsistent, we aim to learn an equilibrium policy. Under an incomplete market setting, we obtain a semi-analytical, exploratory, equilibrium mean-variance policy that turns out to follow a Gaussian distribution. We then focus on a Gaussian mean return model and propose a reinforcement learning algorithm to find the equilibrium policy. Thanks to a carefully designed policy iteration procedure, we prove that the algorithm converges under mild conditions, even though the dynamic programming principle and the usual policy improvement theorem fail to hold for an equilibrium policy. Numerical experiments are given to demonstrate our algorithm. The design and implementation of our reinforcement learning algorithm apply to a general market setup.
Min Dai, Yuchao Dong, Yanwei Jia. "Learning equilibrium mean-variance strategy." Mathematical Finance 33(4): 1166-1212. Published 2023-06-04. https://doi.org/10.1111/mafi.12402
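The Gaussian form of the exploratory policy can be motivated by a standard one-step calculation, sketched here under simplifying assumptions that are not taken from the paper. If the reward is quadratic in the action, r(u) = b·u − (a/2)·u² with a > 0, then maximizing the expected reward plus λ times the policy's entropy gives a density proportional to exp(r(u)/λ), which is Gaussian with mean b/a and variance λ/a. The coefficients `a`, `b`, `lam` and the function name are hypothetical; the check below is numerical, on a grid.

```python
import numpy as np


def gibbs_policy_moments(a: float, b: float, lam: float, n_grid: int = 20001):
    """Mean and variance of pi(u) proportional to exp((b*u - 0.5*a*u**2) / lam).

    Illustrative only: a, b, lam are hypothetical coefficients of a
    one-step quadratic reward, not quantities from the paper.
    """
    if a <= 0 or lam <= 0:
        raise ValueError("a and lam must be positive")
    center, sd = b / a, np.sqrt(lam / a)
    # Grid wide enough (10 standard deviations) that the tails are negligible.
    grid = np.linspace(center - 10.0 * sd, center + 10.0 * sd, n_grid)
    log_w = (b * grid - 0.5 * a * grid**2) / lam
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    w /= w.sum()                     # normalize to a discrete distribution
    mean = float((w * grid).sum())
    var = float((w * (grid - mean) ** 2).sum())
    return mean, var


mean, var = gibbs_policy_moments(a=2.0, b=1.0, lam=0.5)
# Closed form under these assumptions: mean = b / a, variance = lam / a.
```

A larger temperature `lam` widens the policy (more exploration) without moving its mean, which is the usual trade-off an entropy regularizer introduces.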
This paper solves the consumption-investment problem under Epstein-Zin preferences on a random horizon. In an incomplete market, we take the random horizon to be a stopping time adapted to the market filtration, generated by all observable, but not necessarily tradable, state processes. Contrary to prior studies, we do not impose any fixed upper bound for the random horizon, allowing for truly unbounded ones. Focusing on the empirically relevant case where the risk aversion and the elasticity of intertemporal substitution are both larger than one, we characterize the optimal consumption and investment strategies using backward stochastic differential equations with superlinear growth on unbounded random horizons. This characterization, compared with the classical fixed-horizon result, involves an additional stochastic process that serves to capture the randomness of the horizon. As demonstrated in two concrete examples, changing from a fixed horizon to a random one drastically alters the optimal strategies.
Joshua Aurand, Yu-Jui Huang. "Epstein-Zin utility maximization on a random horizon." Mathematical Finance 33(4): 1370-1411. Published 2023-06-01. doi:10.1111/mafi.12404
The widespread use of market-making algorithms in electronic over-the-counter markets may give rise to unexpected effects resulting from the autonomous learning dynamics of these algorithms. In particular, the possibility of “tacit collusion” among market makers has increasingly received regulatory scrutiny. We model the interaction of market makers in a dealer market as a stochastic differential game of intensity control with partial information and study the resulting dynamics of bid-ask spreads. Competition among dealers is modeled as a Nash equilibrium, while collusion is described in terms of Pareto optima. Using a decentralized multi-agent deep reinforcement learning algorithm to model how competing market makers learn to adjust their quotes, we show that the interaction of market-making algorithms via market prices, without any sharing of information, may give rise to tacit collusion, with spread levels strictly above the competitive equilibrium level.
Rama Cont, Wei Xiong. "Dynamics of market making algorithms in dealer markets: Learning and tacit collusion." Mathematical Finance 34(2): 467-521. Published 2023-05-30. doi:10.1111/mafi.12401. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/mafi.12401
We develop a continuous-time control approach to optimal trading in a Proof-of-Stake (PoS) blockchain, formulated as a consumption-investment problem that aims to strike the optimal balance between a participant's (or agent's) utility from holding/trading stakes and utility from consumption. We present solutions via dynamic programming and the Hamilton–Jacobi–Bellman (HJB) equations. When the utility functions are linear or convex, we derive closed-form solutions and show that the bang-bang strategy is optimal (i.e., always buy or sell at full capacity). Furthermore, we bring out the explicit connection between the rate of return in trading/holding stakes and the participant's risk-adjusted valuation of the stakes. In particular, we show that when a participant is risk-neutral or risk-seeking, corresponding to the risk-adjusted valuation being a martingale or a sub-martingale, the optimal strategy must be to either buy all the time, sell all the time, or first buy then sell, with both buying and selling executed at full capacity. We also propose a risk-control version of the consumption-investment problem; for a special case, the “stake-parity” problem, we show that a mean-reverting strategy is optimal.
Wenpin Tang, David D. Yao. "Trading under the proof-of-stake protocol – A continuous-time control approach." Mathematical Finance 33(4): 979-1004. Published 2023-05-24. doi:10.1111/mafi.12403
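The bang-bang structure mentioned in the abstract above reflects a general fact about linear objectives: when the term being maximized is linear in a trading rate constrained to an interval, the maximum sits at an endpoint. The following is a minimal sketch of that pointwise step only, not the paper's HJB solution; `marginal_value` (the coefficient multiplying the rate) and `cap` (the maximum trading rate) are hypothetical names.

```python
def bang_bang_rate(marginal_value: float, cap: float) -> float:
    """Maximize marginal_value * rate over rate in [-cap, cap].

    Illustrative only: marginal_value stands in for whatever coefficient
    multiplies the trading rate in a linear objective; cap is an assumed
    symmetric bound on the rate.
    """
    if cap < 0:
        raise ValueError("cap must be non-negative")
    if marginal_value > 0:
        return cap    # buy at full capacity
    if marginal_value < 0:
        return -cap   # sell at full capacity
    return 0.0        # indifferent: every admissible rate is optimal


# Brute-force comparison on a grid of admissible rates.
candidates = [-5.0 + 0.1 * k for k in range(101)]
best_on_grid = max(candidates, key=lambda rate: 0.3 * rate)
```

A sign change of the coefficient over time produces a single switch between the two extremes, which matches the "first buy then sell" pattern the abstract describes.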
Florian Bourgey, Stefano De Marco, Peter K. Friz, Paolo Pigato
Several asymptotic results for the implied volatility generated by a rough volatility model have been obtained in recent years (notably in the small-maturity regime), providing a better understanding of the shapes of the volatility surface induced by rough volatility models, supporting their calibration power to SP500 option data. Rough volatility models also generate a local volatility surface, via the so-called Markovian projection of the stochastic volatility. We complement the existing results on implied volatility by studying the asymptotic behavior of the local volatility surface generated by a class of rough stochastic volatility models, encompassing the rough Bergomi model. Notably, we observe that the celebrated “1/2 skew rule” linking the short-term at-the-money skew of the implied volatility to the short-term at-the-money skew of the local volatility, a consequence of the celebrated “harmonic mean formula” of [Berestycki et al. (2002). Quantitative Finance, 2, 61–69], is replaced by a new rule: the ratio of the at-the-money implied and local volatility skews tends to the constant