Dima Ivanov, Paul Dütting, Inbal Talgam-Cohen, Tonghan Wang, David C. Parkes
Principal-Agent Reinforcement Learning
arXiv - CS - Multiagent Systems, published 2024-07-25. DOI: arxiv-2407.18074 (https://doi.org/arxiv-2407.18074)
Contracts are the economic framework that allows a principal to delegate a
task to an agent, despite misaligned interests and even without directly
observing the agent's actions. In many modern reinforcement learning settings,
self-interested agents learn to perform a multi-stage task delegated to them by
a principal. We explore the potential of using contracts to
incentivize such agents. We model the delegated task as a Markov decision
process (MDP) and study a stochastic game between the principal and the agent
in which the principal learns which contracts to offer, and the agent learns
an MDP policy in response. We present a
learning-based algorithm for optimizing the principal's contracts, which
provably converges to the subgame-perfect equilibrium of the principal-agent
game. A deep RL implementation allows us to apply our method to very large MDPs
with unknown transition dynamics. We extend our approach to multiple agents,
and demonstrate its relevance to resolving a canonical sequential social
dilemma with minimal intervention to agent rewards.
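
To make the contract idea concrete, here is a minimal sketch of a single-state principal-agent problem. It is not the algorithm from the paper, which handles multi-stage MDPs and learning. The action names, costs, outcome probabilities, rewards, and the grid search are all hypothetical values chosen for illustration. The principal commits to a payment per observable outcome, the agent best-responds (with ties broken in the principal's favor, a common convention in contract theory), and the principal picks the contract that maximizes its own expected utility given that response.

```python
from itertools import product

# Hypothetical toy instance (not taken from the paper): one state, two agent
# actions, two observable outcomes ordered as (failure, success).
# All numbers are exactly representable in binary floating point, so the
# equality-based tie-breaking below is exact for this instance.
COSTS = {"shirk": 0.0, "work": 1.0}             # agent's private cost per action
OUTCOME_PROBS = {"shirk": (0.75, 0.25),         # P(outcome | action)
                 "work": (0.25, 0.75)}
PRINCIPAL_REWARD = (0.0, 10.0)                  # principal's reward per outcome


def expected(values, probs):
    """Expected value of per-outcome values under an outcome distribution."""
    return sum(v * p for v, p in zip(values, probs))


def agent_best_response(contract):
    """Action maximizing expected payment minus cost.

    Ties are broken in favor of the principal.
    """
    def agent_utility(action):
        return expected(contract, OUTCOME_PROBS[action]) - COSTS[action]

    def principal_utility(action):
        probs = OUTCOME_PROBS[action]
        return expected(PRINCIPAL_REWARD, probs) - expected(contract, probs)

    return max(COSTS, key=lambda a: (agent_utility(a), principal_utility(a)))


def evaluate_contract(contract):
    """Principal's expected utility and the agent's induced action."""
    action = agent_best_response(contract)
    probs = OUTCOME_PROBS[action]
    utility = expected(PRINCIPAL_REWARD, probs) - expected(contract, probs)
    return utility, action


def best_contract(grid):
    """Exhaustive search over non-negative payments (one per outcome)."""
    best = None
    for contract in product(grid, repeat=2):
        utility, action = evaluate_contract(contract)
        if best is None or utility > best[0]:
            best = (utility, contract, action)
    return best


if __name__ == "__main__":
    payments = [i * 0.5 for i in range(11)]     # 0.0, 0.5, ..., 5.0
    print(best_contract(payments))              # -> (6.0, (0.0, 2.0), 'work')
```

In this toy instance the cheapest contract that induces `work` pays 2.0 on success and nothing on failure: the agent is then exactly indifferent between the two actions, and the tie-break resolves in the principal's favor. The principal never observes the action itself, only the outcome, which mirrors the hidden-action setting described in the abstract.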