ReLExS: Reinforcement Learning Explanations for Stackelberg No-Regret Learners
Authors: Xiangge Huang, Jingyuan Li, Jiaqing Xie
Published in: arXiv - CS - Computer Science and Game Theory, 2024-08-26
DOI: arxiv-2408.14086 (https://doi.org/arxiv-2408.14086)
Citations: 0
Abstract
If the follower is constrained to be a no-regret learner, will the players in a
two-player Stackelberg game still reach a Stackelberg equilibrium? We first show
that when the follower's strategy is either reward-average or
transform-reward-average, the two players can always reach the Stackelberg
equilibrium. We then extend this result, showing that the players can achieve
the Stackelberg equilibrium in the two-player game under the no-regret
constraint. We also give a strict upper bound on the difference in the
follower's utility with and without the no-regret constraint. Moreover, in
constant-sum two-player Stackelberg games with no-regret action sequences, we
show that the total optimal utility of the game also remains bounded.
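
The question posed in the abstract can be made concrete with a small sketch. It is an illustration only, not the paper's construction. The 2x2 payoff matrices, the function names, and the choice of Hedge (multiplicative weights) as the no-regret rule are all assumptions made for this example. The paper's reward-average and transform-reward-average strategies are not reproduced here. The sketch computes the pure-strategy Stackelberg outcome of a toy game, then checks that a no-regret follower facing a committed leader drifts to the same follower action.

```python
import math

# Hypothetical 2x2 game (an assumption for illustration, not from the paper).
# Rows index leader actions, columns index follower actions.
LEADER_PAYOFF = [[2.0, 4.0], [1.0, 3.0]]
FOLLOWER_PAYOFF = [[1.0, 0.0], [0.0, 1.0]]


def best_response(leader_action):
    """Follower action that maximizes follower payoff against a committed leader action."""
    row = FOLLOWER_PAYOFF[leader_action]
    return max(range(len(row)), key=lambda b: row[b])


def pure_stackelberg():
    """Leader picks the pure commitment that is best once the follower best-responds."""
    leader_action = max(
        range(len(LEADER_PAYOFF)),
        key=lambda a: LEADER_PAYOFF[a][best_response(a)],
    )
    return leader_action, best_response(leader_action)


def hedge_follower(leader_action, rounds=2000, eta=0.1):
    """Run Hedge for the follower against a fixed leader action.

    Returns the follower's mixed strategy in the last round and its average regret
    against the best fixed action in hindsight.
    """
    rewards = FOLLOWER_PAYOFF[leader_action]
    cumulative = [0.0] * len(rewards)
    earned = 0.0
    probs = [1.0 / len(rewards)] * len(rewards)
    for _ in range(rounds):
        # Weights are proportional to exp(eta * cumulative reward);
        # subtracting the maximum keeps the exponentials numerically stable.
        top = max(cumulative)
        weights = [math.exp(eta * (c - top)) for c in cumulative]
        total = sum(weights)
        probs = [w / total for w in weights]
        earned += sum(p * r for p, r in zip(probs, rewards))
        cumulative = [c + r for c, r in zip(cumulative, rewards)]
    average_regret = (max(cumulative) - earned) / rounds
    return probs, average_regret


if __name__ == "__main__":
    leader, follower = pure_stackelberg()
    probs, regret = hedge_follower(leader)
    print("Stackelberg actions:", leader, follower)
    print("Hedge follower strategy:", probs, "average regret:", regret)
```

In this toy game the leader's first action dominates when both players move simultaneously, yet committing to the second action earns the leader more once the follower responds. That gap is why it matters whether a learning follower ends up at the Stackelberg outcome.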