Risk preferences of learning algorithms
Authors: Andreas Haupt, Aroon Narayanan
Journal: Games and Economic Behavior, volume 148, pages 415-426
Publication date: 2024-11-01
DOI: 10.1016/j.geb.2024.09.013
URL: https://www.sciencedirect.com/science/article/pii/S089982562400143X
Many economic decision-makers today rely on learning algorithms for important decisions. This paper shows that a widely used learning algorithm, ε-Greedy, exhibits emergent risk aversion, favoring actions with lower payoff variance. When presented with actions of the same expected payoff, under a wide range of conditions, ε-Greedy chooses the lower-variance action with probability approaching one. This emergent preference can have wide-ranging consequences, from inequity to homogenization, and holds transiently even when the higher-variance action has a strictly higher expected payoff. We discuss two methods to restore risk neutrality. The first method reweights data as a function of how likely an action is to be chosen. The second method employs optimistic payoff estimates for actions that have not been taken often.
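The effect described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's setup or proof. It assumes a two-action problem with a constant exploration rate ε = 0.1, sample-mean payoff estimates, uniform tie-breaking, a "safe" action that always pays 0, and a "risky" action that pays +1 or -1 with equal probability. The function names `run_epsilon_greedy` and `average_safe_share` are hypothetical. Because ε is held constant here, the risky action is taken with probability at least ε/2 in every round, so the share of safe choices cannot approach one in this simplified version.

```python
import random


def run_epsilon_greedy(steps, epsilon, rng):
    """Return the fraction of rounds in which the safe action was chosen."""
    # Assumed payoffs: action 0 ("safe") always pays 0; action 1 ("risky")
    # pays +1 or -1 with equal probability. Both have expected payoff 0.
    totals = [0.0, 0.0]  # cumulative payoff per action
    counts = [0, 0]      # number of times each action was taken
    safe_choices = 0
    for _ in range(steps):
        # Sample-mean estimates; an untried action starts at 0 (assumption).
        means = [totals[a] / counts[a] if counts[a] else 0.0 for a in (0, 1)]
        if rng.random() < epsilon or means[0] == means[1]:
            # Explore, or break an exact tie, by choosing uniformly at random.
            action = rng.randrange(2)
        else:
            # Exploit: take the action with the higher estimated payoff.
            action = 0 if means[0] > means[1] else 1
        payoff = 0.0 if action == 0 else rng.choice((-1.0, 1.0))
        totals[action] += payoff
        counts[action] += 1
        safe_choices += action == 0
    return safe_choices / steps


def average_safe_share(runs=200, steps=2000, epsilon=0.1, seed=0):
    """Average the safe-action share over independent simulated runs."""
    rng = random.Random(seed)
    shares = [run_epsilon_greedy(steps, epsilon, rng) for _ in range(runs)]
    return sum(shares) / runs


if __name__ == "__main__":
    print(f"average share of safe choices: {average_safe_share():.3f}")
```

A risk-neutral learner would be indifferent between the two actions, so an average share well above one half is the kind of emergent preference the abstract refers to. One plausible intuition, stated here as an assumption rather than as the paper's argument: once the risky action's estimate drops below the safe one, it is revisited only through exploration, so pessimistic estimates are corrected slowly while optimistic ones are corrected quickly.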
About the journal:
Games and Economic Behavior facilitates cross-fertilization between theories and applications of game theoretic reasoning. It consistently attracts the best quality and most creative papers in interdisciplinary studies within the social, biological, and mathematical sciences. Most readers recognize it as the leading journal in game theory. Research areas include: game theory, economics, political science, biology, computer science, mathematics, and psychology.