{"title":"安全决策的框架:凸对偶方法","authors":"Martino Bernasconi, Federico Cacciamani, Matteo Castiglioni","doi":"10.3233/ia-230008","DOIUrl":null,"url":null,"abstract":"We study the problem of online interaction in general decision making problems, where the objective is not only to find optimal strategies, but also to satisfy certain safety guarantees, expressed in terms of costs accrued. In particular, we focus on the online learning problem in which an agent has to find the optimal solution of a linear objective. Moreover, the agent has to satisfy a linear safety constraint at each round. We propose a theoretical framework to address such problems and present BAN-SOLO, a UCB-like algorithm that, in an online interaction with an unknown environment, attains sublinear regret of order O ( T ) and satisfies a safety constraint with high probability at each iteration. BAN-SOLO provides a general framework that can be applied to any setting in which estimators of the objective and the cost function are available. At its core, it relies on tools from convex duality to manage environment exploration while satisfying the safety constraint imposed by the problem. To show the applicability of our framework, we provide two game theoretical applications: normal-form games and sequential decision-making problems.","PeriodicalId":42055,"journal":{"name":"Intelligenza Artificiale","volume":"14 4","pages":"0"},"PeriodicalIF":1.9000,"publicationDate":"2023-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A framework for safe decision making: A convex duality approach\",\"authors\":\"Martino Bernasconi, Federico Cacciamani, Matteo Castiglioni\",\"doi\":\"10.3233/ia-230008\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We study the problem of online interaction in general decision making problems, where the objective is not only to find optimal strategies, but also to satisfy certain safety guarantees, expressed in terms of costs accrued. In particular, we focus on the online learning problem in which an agent has to find the optimal solution of a linear objective. Moreover, the agent has to satisfy a linear safety constraint at each round. We propose a theoretical framework to address such problems and present BAN-SOLO, a UCB-like algorithm that, in an online interaction with an unknown environment, attains sublinear regret of order O ( T ) and satisfies a safety constraint with high probability at each iteration. BAN-SOLO provides a general framework that can be applied to any setting in which estimators of the objective and the cost function are available. At its core, it relies on tools from convex duality to manage environment exploration while satisfying the safety constraint imposed by the problem. To show the applicability of our framework, we provide two game theoretical applications: normal-form games and sequential decision-making problems.\",\"PeriodicalId\":42055,\"journal\":{\"name\":\"Intelligenza Artificiale\",\"volume\":\"14 4\",\"pages\":\"0\"},\"PeriodicalIF\":1.9000,\"publicationDate\":\"2023-10-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Intelligenza Artificiale\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3233/ia-230008\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Intelligenza Artificiale","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3233/ia-230008","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
A framework for safe decision making: A convex duality approach
We study the problem of online interaction in general decision making problems, where the objective is not only to find optimal strategies, but also to satisfy certain safety guarantees, expressed in terms of costs accrued. In particular, we focus on the online learning problem in which an agent has to find the optimal solution of a linear objective. Moreover, the agent has to satisfy a linear safety constraint at each round. We propose a theoretical framework to address such problems and present BAN-SOLO, a UCB-like algorithm that, in an online interaction with an unknown environment, attains sublinear regret of order O ( T ) and satisfies a safety constraint with high probability at each iteration. BAN-SOLO provides a general framework that can be applied to any setting in which estimators of the objective and the cost function are available. At its core, it relies on tools from convex duality to manage environment exploration while satisfying the safety constraint imposed by the problem. To show the applicability of our framework, we provide two game theoretical applications: normal-form games and sequential decision-making problems.