I use an important ruling, Marblegate Asset Management v. Education Management Corporation (EMC), to study the economic role of two publicly traded debt restructuring methods: coercive bond exchange offers and Chapter 11. This ruling restricted firms from employing coercive bond exchange offers to facilitate out-of-court restructurings, thereby increasing the likelihood of restructuring publicly traded debt under Chapter 11. Following the ruling, investment in affected distressed firms decreased substantially, but investment efficiency improved. The changes in covenant, maturity, and offering yield of newly issued bonds suggested that existing bondholders with covenants gained more bargaining power than shareholders and new bondholders. The paper provides causal evidence from a large sample analysis, demonstrating the divergent effects of these two publicly traded debt restructuring methods on investment policies. This paper was accepted by Victoria Ivashina, finance. Funding: Funding for this research was provided by University of Pittsburgh (Doctoral Fellowship) and Vanderbilt University (standard research funds). Supplemental Material: The online appendix and data files are available at https://doi.org/10.1287/mnsc.2022.01831 .
Xin Fan. "Publicly Traded Debt Restructuring Methods, Corporate Investment, and Debt Contracting." Management Science, journal article, published 2024-05-21. https://doi.org/10.1287/mnsc.2022.01831
Many professional and educational settings require individuals to be willing and able to perform under time pressure. We use a laboratory experiment and survey data to study preferences for working under time pressure. We make three main contributions. First, we develop an incentivized method to measure preferences for working under time pressure and document that participants in our laboratory experiment are averse to working under time pressure on average. Second, we show that there is substantial heterogeneity in the degree of time pressure aversion across individuals and that these individual preferences can be partially captured by simple survey questions. Third, we include these questions in a survey of bachelor’s degree students and a nationally representative survey panel and show that time pressure preferences predict career choices and income. Our results indicate that individual differences in time pressure aversion could be an influential factor in determining labor market outcomes. This paper was accepted by Yan Chen, behavioral economics and decision analysis. Funding: This work was supported by the Jan Wallander and Tom Hedelius Foundation, European Research Council under the European Union’s Horizon 2020 Research and Innovation Programme [Grant 850590]. Supplemental Material: The online appendix and data files are available at https://doi.org/10.1287/mnsc.2023.02078 .
Thomas Buser, R. van Veldhuizen, Yang Zhong. "Time Pressure Preferences." Management Science, journal article, published 2024-05-21. https://doi.org/10.1287/mnsc.2023.02078
Moran Koren. "The Gatekeeper Effect: The Implications of Pre-Screening, Self-Selection, and Bias for Hiring Processes." Management Science, Ahead of Print, journal article, published 2024-05-17. https://doi.org/10.1287/mnsc.2021.03918
Valuation plays a central role in determining Chapter 11 reorganization outcomes. However, obtaining accurate valuation estimates of reorganized firms is challenging because of limited firm-specific market-based information and the oft-conflicting incentives of claimholders. We examine the role of industry peer information in reducing misvaluations and its implications for unintended interclaimant wealth transfers and postreorganization performance. First, we find that the availability of relevant industry peer information is negatively associated with equity valuation errors for firms emerging from Chapter 11. Cross-sectional results suggest that the relation between industry peer information and valuation errors varies substantially with debtors’ information environment and case characteristics. Second, we find that industry peer information quality is associated with better ex post financial performance of emerged firms because of lower overvaluation. Finally, we document the role of industry peer information in substantially reducing the frequency and magnitude of unintended wealth transfers between claimants arising from equity valuation errors. This paper was accepted by Suraj Srinivasan, accounting. Funding: The authors appreciate financial support from the Social Sciences and Humanities Research Council of Canada [Grant 435-2020-0583] and the Canadian Academic Accounting Association. B. Fang acknowledges financial support from the Della Suantio Fellowship. Supplemental Material: The online appendix and data files are available at https://doi.org/10.1287/mnsc.2022.01233 .
Bingxu Fang, Sasan Saiy, Dushyantkumar Vyas. "Industry Peer Information and the Equity Valuation Accuracy of Firms Emerging from Chapter 11." Management Science, journal article, published 2024-05-17. https://doi.org/10.1287/mnsc.2022.01233
Taha Havakhor, Mohammad Saifur Rahman, Tianjian Zhang, Chenqi Zhu
Advancements in technology have reduced information acquisition costs, creating an improved information environment for retail investors. Specifically, new technologies, such as application programming interface (API), deliver high-volume, institutional-like raw data directly to Main Street investors. Although greater availability of information can be beneficial, it may also exacerbate retail investors’ existing trading deficiencies. Exploiting the sudden shutdown of Yahoo! Finance API, the largest free API for retail investors, this study examines how access to tech-enabled raw financial data affects retail investment. We find that retail trading volumes in stocks favored by active retail investors dropped by 8.6%–10.5% within one month of the API shutdown. The remaining retail trades collectively became more predictive of future returns, suggesting less gambling-like behavior after the API shutdown. Moreover, our randomized controlled experiment affirms the underlying mechanism: tech-enabled access to high-volume historical price data increases individuals’ overconfidence, which further leads them to engage in excessive trading. The study reveals an unintended consequence of technology-led, wider data access for retail investors. This paper was accepted by D. J. Wu, information systems. Supplemental Material: The online appendix and data files are available at https://doi.org/10.1287/mnsc.2021.01379 .
Taha Havakhor, Mohammad Saifur Rahman, Tianjian Zhang, Chenqi Zhu. "Tech-Enabled Financial Data Access, Retail Investors, and Gambling-Like Behavior in the Stock Market." Management Science, journal article, published 2024-05-16. https://doi.org/10.1287/mnsc.2021.01379
This study examines the effect of firms’ participation in platform-endorsed review solicitation programs on consumers’ online review generation. We leverage a natural experiment on TripAdvisor, which launched a review solicitation program that allows hotels to collect reviews directly from guests after their stays with the aid of certified connectivity partners. Applying a two-stage difference-in-differences approach to a panel data set of online reviews for a matched set of hotels across TripAdvisor and Expedia, we find that hotels’ participation in the review solicitation program results in a 34.3% increase in review volume, a 0.151 increase in review rating, but a 16.9% decrease in review length. Review solicitation, however, generates a notable negative spillover effect on the volume of organic reviews. Specifically, the volume of organic reviews is reduced by 15.5% after hotels start soliciting reviews. We provide evidence that the motivational crowding-out effect plays an important role in driving this negative spillover. Further analyses reveal that the effects of review solicitation are heterogeneous with respect to hotels of different types and consumers with different demographic and behavioral characteristics. Finally, using a novel structural topic model, we detect a significant shift in review content from specific and concrete topics to general and abstract topics. Our findings suggest that review platforms and firms should be cautious about the unintended negative consequences of review solicitation on consumers’ review generation. This paper was accepted by Hemant Bhargava, information systems. Funding: This work was supported by the National Natural Science Foundation of China [Grants 72371192, 72132008, 71872061, and 72061127002] and the Humanities and Social Science Fund of Ministry of Education of China [Grant 22YJA630021]. Supplemental Material: The online appendix and data files are available at https://doi.org/10.1287/mnsc.2023.01006 .
Baojun Gao, Jing Wang, Xiaojie Ding, Yue Guo. "The Pitfalls of Review Solicitation: Evidence from a Natural Experiment on TripAdvisor." Management Science, journal article, published 2024-05-16. https://doi.org/10.1287/mnsc.2023.01006
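The difference-in-differences idea referred to in the TripAdvisor abstract above can be illustrated with a minimal sketch. This is the textbook two-group, two-period estimator, not the authors' two-stage specification; the helper name `did_estimate` and all review counts are hypothetical and invented purely for illustration.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Return the basic 2x2 difference-in-differences estimate.

    The change in the control group is subtracted from the change in the
    treated group, so any trend common to both groups is differenced out.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical monthly review counts (not data from the paper).
treated_pre = [10, 12, 11]    # treated platform, before the program
treated_post = [15, 16, 17]   # treated platform, after the program
control_pre = [9, 10, 11]     # comparison platform, before
control_post = [10, 11, 12]   # comparison platform, after

effect = did_estimate(treated_pre, treated_post, control_pre, control_post)
print(effect)
```

In a setting like the one described, the treated series would come from one platform and the control series from the same hotels on the other platform; a real analysis would also need fixed effects and standard errors, which this sketch omits.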
Zhuo Chen, Andrea Lu, Xiaoquan Zhu. "Investor Sentiment and the Pricing of Macro Risks for Hedge Funds." Management Science, Ahead of Print, journal article, published 2024-05-14. https://doi.org/10.1287/mnsc.2022.02792
Hong Qu, Jared Williams, Ran Zhao, Anthony Kwasnica
We examine a beauty contest game with an option to analyze an additional disclosure. We analytically prove that in some scenarios, coordination incentives cause sophisticated players who can comprehend disclosures to choose not to analyze them to match unsophisticated players’ actions, a phenomenon we call “coordinated inattention.” Laboratory experiments provide support for the coordinated inattention mechanism: Coordination incentives reduce sophisticated subjects’ propensity to analyze disclosures, especially when they believe others are unlikely to comprehend them. We further find that psychological biases help reduce coordinated inattention. Subjects are overconfident, sophisticated subjects overestimate others’ ability to comprehend disclosures, and both biases are associated with a higher tendency to analyze disclosures. Our analysis suggests that unsophisticated decision makers’ inability to comprehend complex disclosures has a negative spillover effect by reducing sophisticated decision makers’ attention to disclosures. Our results highlight the importance of the recent efforts of the Securities and Exchange Commission (SEC) and the Financial Accounting Standards Board (FASB) to make disclosures easier to comprehend. This paper was accepted by Brian Bushee, accounting. Funding: This study involved no funding except the payments made to the experimental subjects; these funds were provided by Penn State University. Supplemental Material: The online appendix and electronic companion are available at https://doi.org/10.1287/mnsc.2021.01029 .
Hong Qu, Jared Williams, Ran Zhao, Anthony Kwasnica. "Coordinated Inattention and Disclosure Complexity." Management Science, journal article, published 2024-05-14. https://doi.org/10.1287/mnsc.2021.01029
Weichao Mao, Kaiqing Zhang, Ruihao Zhu, David Simchi-Levi, Tamer Başar
We consider model-free reinforcement learning (RL) in nonstationary Markov decision processes. Both the reward functions and the state transition functions are allowed to vary arbitrarily over time as long as their cumulative variations do not exceed certain variation budgets. We propose Restarted Q-Learning with Upper Confidence Bounds (RestartQ-UCB), the first model-free algorithm for nonstationary RL, and show that it outperforms existing solutions in terms of dynamic regret. Specifically, RestartQ-UCB with Freedman-type bonus terms achieves a dynamic regret bound of [Formula: see text], where S and A are the numbers of states and actions, respectively, [Formula: see text] is the variation budget, H is the number of time steps per episode, and T is the total number of time steps. We further present a parameter-free algorithm named Double-Restart Q-UCB that does not require prior knowledge of the variation budget. We show that our algorithms are nearly optimal by establishing an information-theoretical lower bound of [Formula: see text], the first lower bound in nonstationary RL. Numerical experiments validate the advantages of RestartQ-UCB in terms of both cumulative rewards and computational efficiency. We demonstrate the power of our results in examples of multiagent RL and inventory control across related products. This paper was accepted by Omar Besbes, revenue management and market analytics. Funding: The research of D. Simchi-Levi and R. Zhu was supported by the MIT Data Science Laboratory. The research of W. Mao, K. Zhang, and T. Başar was supported in part by the U.S. Army Research Laboratory (ARL) Cooperative Agreement W911NF-17-2-0196, in part by the Office of Naval Research (ONR) [MURI Grant N00014-16-1-2710], and in part by the Air Force Office of Scientific Research (AFOSR) [Grant FA9550-19-1-0353]. K. Zhang also acknowledges support from U.S. Army Research Laboratory (ARL) [Grant W911NF-24-1-0085]. 
Supplemental Material: The online appendix and data files are available at https://doi.org/10.1287/mnsc.2022.02533 .
Weichao Mao, Kaiqing Zhang, Ruihao Zhu, David Simchi-Levi, Tamer Başar. "Model-Free Nonstationary Reinforcement Learning: Near-Optimal Regret and Applications in Multiagent Reinforcement Learning and Inventory Control." Management Science, journal article, published 2024-05-14. https://doi.org/10.1287/mnsc.2022.02533
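The restart idea in the reinforcement learning abstract above can be sketched as follows: run optimistic tabular Q-learning, and periodically discard all estimates so that stale data from an earlier environment cannot mislead the learner. This is a simplified illustration, not the paper's RestartQ-UCB: it swaps in a plain Hoeffding-style exploration bonus in place of the Freedman-type bonus mentioned in the abstract, uses a fixed restart period, and every name in it (`restarted_q_ucb`, `drifting_env`, `epoch_len`, `bonus_scale`) is an assumption made for the example.

```python
import math
import random


def restarted_q_ucb(env_step, n_states, n_actions, horizon, n_episodes,
                    epoch_len, bonus_scale=0.1, seed=0):
    """Optimistic tabular Q-learning that resets every `epoch_len` episodes.

    env_step(episode, h, state, action, rng) -> (reward in [0, 1], next_state)
    is a hypothetical environment callback supplied by the caller.
    Returns the list of per-episode total rewards.
    """
    rng = random.Random(seed)
    log_term = math.log(n_episodes + 1)
    episode_returns = []
    for episode in range(n_episodes):
        if episode % epoch_len == 0:
            # Restart: optimistic initial values and zero visit counts.
            q = [[[float(horizon - h)] * n_actions for _ in range(n_states)]
                 for h in range(horizon)]
            counts = [[[0] * n_actions for _ in range(n_states)]
                      for _ in range(horizon)]
        state, total = 0, 0.0
        for h in range(horizon):
            # Act greedily with respect to the optimistic estimates.
            action = max(range(n_actions), key=lambda a: q[h][state][a])
            reward, next_state = env_step(episode, h, state, action, rng)
            counts[h][state][action] += 1
            t = counts[h][state][action]
            lr = (horizon + 1) / (horizon + t)  # step size decaying in t
            bonus = bonus_scale * math.sqrt(horizon ** 3 * log_term / t)
            if h == horizon - 1:
                next_value = 0.0  # no reward after the final step
            else:
                next_value = min(float(horizon - h - 1),
                                 max(q[h + 1][next_state]))
            target = reward + next_value + bonus
            updated = (1 - lr) * q[h][state][action] + lr * target
            # Clip at the largest return still achievable from step h.
            q[h][state][action] = min(float(horizon - h), updated)
            state, total = next_state, total + reward
        episode_returns.append(total)
    return episode_returns


def drifting_env(episode, h, state, action, rng):
    """Hypothetical 2-state MDP whose rewarding action switches at episode 200."""
    good_action = 0 if episode < 200 else 1
    reward = 1.0 if action == good_action else 0.0
    return reward, rng.randrange(2)


returns = restarted_q_ucb(drifting_env, n_states=2, n_actions=2, horizon=3,
                          n_episodes=400, epoch_len=100)
print(sum(returns[300:]) / 100)  # average return in the final epoch
```

The fixed `epoch_len` stands in for a restart schedule that, in the setting the abstract describes, would be tuned to the variation budget; the parameter-free variant mentioned there avoids needing that knowledge, which this sketch does not attempt.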