Title: Reinforcement Learning in Contests
Author: V. Chaudhary
DOI: 10.2139/ssrn.3920906
Journal: PSN: Game Theory (Topic)
Publication type: Journal Article
Publication date: 2021-09-10

Abstract: We study contests as an example of winner-take-all competition with a linearly ordered, large strategy space. We consider a model in which each player maximizes the probability of winning above some subjective threshold. The environment is one of limited information: agents play the game repeatedly and know only their own efforts and outcomes. Players learn through reinforcement. Predictions are derived from the model's dynamics and asymptotic analysis. The model predicts the individual behavioral regularities found in experimental data and tracks behavior at the aggregate level with reasonable accuracy.
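The learning environment the abstract describes — repeated winner-take-all contests where each player observes only their own effort and outcome and adjusts by reinforcement — can be sketched as follows. This is a generic Roth–Erev-style reinforcement simulation in a two-player Tullock contest, offered only as an illustration: the effort grid, prize value, payoff-based update, and propensity floor are assumptions of this sketch, not the paper's exact specification.

```python
import random

EFFORTS = list(range(11))   # linearly ordered strategy space: efforts 0..10
PRIZE = 10.0                # winner-take-all prize (illustrative value)
ROUNDS = 2000


def simulate(seed=0):
    """Two players repeatedly play a Tullock contest and learn by reinforcement.

    Each player keeps a propensity for every effort level, chooses an effort
    with probability proportional to its propensity, and reinforces the chosen
    effort by the realized payoff. Players use only their own effort and
    outcome, matching the limited-information environment in the abstract.
    """
    rng = random.Random(seed)
    # One propensity vector per player, uniform at the start.
    prop = [[1.0] * len(EFFORTS) for _ in range(2)]

    for _ in range(ROUNDS):
        picks = [rng.choices(range(len(EFFORTS)), weights=p)[0] for p in prop]
        e1, e2 = EFFORTS[picks[0]], EFFORTS[picks[1]]

        # Winner-take-all: win probability proportional to own effort
        # (Tullock contest success function with exponent 1; ties at zero
        # effort resolved by a fair coin).
        total = e1 + e2
        p1_wins = rng.random() < (0.5 if total == 0 else e1 / total)

        for i, idx in enumerate(picks):
            won = p1_wins if i == 0 else not p1_wins
            payoff = (PRIZE if won else 0.0) - EFFORTS[idx]
            # Roth-Erev-style update: reinforce the chosen effort by its
            # realized payoff, floored at a small positive value so every
            # effort level remains reachable.
            prop[i][idx] = max(0.01, prop[i][idx] + payoff)

    return prop


props = simulate()
```

Under this kind of dynamic, propensity mass tends to concentrate on efforts that have historically yielded positive payoffs, which is the mechanism behind the individual-level regularities the abstract says the model reproduces.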