An offender–defender safety game
Miroslav Krstic
Annual Reviews in Control, vol. 57, Article 100939. Published 2024-01-01. DOI: 10.1016/j.arcontrol.2024.100939
Impact factor: 7.3 (JCR Q1, Automation & Control Systems; category: Computer Science, region 2)
Article page: https://www.sciencedirect.com/science/article/pii/S1367578824000087
PDF: https://www.sciencedirect.com/science/article/pii/S1367578824000087/pdfft?md5=3d4c0e415f10642f5626c050ea707e6a&pid=1-s2.0-S1367578824000087-main.pdf
In this tutorial we study a safety analog of the classical zero-sum differential game with positive definite penalties on the state and the two inputs. Consider a nonlinear system affine in two inputs, which are called “offender” and “defender.” Let the inputs have opposing objectives in relation to an infinite-time cost which, in addition to penalizing the inputs of both agents, incorporates a safety index of the system (a barrier function), with the defender aiming to maximize the system safety and the offender aiming to minimize it. If there is a pair of (offender, defender) non-Nash feedback policies of the $L_g h$ form with a safe outcome, namely, where the defender maintains safety while the offender fails to violate safety, then there exists an inverse optimal pair of policies that attain a Nash equilibrium relative to the safety minimax objective. In the tutorial we study both deterministic and stochastic offenders. The deterministic offender applies its feedback through its deterministic input value, while the stochastic offender applies its feedback through its incremental covariance. In addition to Nash policies for a minimax offender–defender formulation, we provide feedback laws for the defender in the scenario where the offender action is unrestricted by optimality, and where the defender ensures input-to-state safety in the deterministic and stochastic senses. This tutorial is derived from our recent article on inverse optimal safety filters, by setting the nominal control to zero and declaring the disturbance to be the offender agent.
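To fix notation for the setup described above, one possible formalization is sketched below. The symbols $f$, $g_1$, $g_2$, $w$ (offender), $u$ (defender), $h$, and the gains $k_w$, $k_u$ are assumptions introduced here for illustration; the abstract does not specify them, and the policies shown are only one reading of "the $L_g h$ form."

```latex
\begin{align*}
\dot{x} &= f(x) + g_1(x)\,w + g_2(x)\,u, \\
\mathcal{S} &= \{\, x : h(x) \ge 0 \,\}, \\
L_{g_i}h(x) &= \frac{\partial h}{\partial x}(x)\, g_i(x), \qquad i = 1, 2, \\
w &= -k_w \bigl(L_{g_1}h(x)\bigr)^{\top}, \qquad
u = k_u \bigl(L_{g_2}h(x)\bigr)^{\top}, \qquad k_w, k_u > 0, \\
\dot{h} &= L_f h + L_{g_1}h\, w + L_{g_2}h\, u
        = L_f h - k_w \lvert L_{g_1}h \rvert^{2} + k_u \lvert L_{g_2}h \rvert^{2}.
\end{align*}
```

Under these assumed policies the defender's term can only raise the safety index $h$ and the offender's term can only lower it, which is the opposition the minimax objective encodes.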
Among several illustrative examples, one is particularly interesting and unconventional. We consider a safety game played on a unicycle vehicle between its two inputs, the angular velocity and the linear velocity, as the opposing players. We consider two scenarios. In the first, the angular velocity, acting as an offender, attempts to run the vehicle into an obstacle by steering, while the linear velocity, acting as a defender, drives the vehicle forward or in reverse to prevent the vehicle from being run into the obstacle. In the second scenario, the linear velocity acts as an offender and the angular velocity acts as a defender (in the deterministic case by varying the heading rate; in the stochastic case by varying the variance of a white noise driving the heading rate). A “wind” towards the obstacle gives the offender an advantage in both scenarios. The input policies derived are optimal in the sense of their opposite objectives, under the best possible policy of the opponent, under meaningful costs on their actions. The linear velocity input prevails, whether acting in the role of a defender, in which case the collision with the obstacle is prevented, or in the role of an offender, in which case the collision with the obstacle is achieved.
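The first unicycle scenario can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the disc obstacle at the origin, the barrier $h = x^2 + y^2 - r^2$, the gain, the constant `wind` drift, and the function names. The offender's steering law `omega_fn` is arbitrary (not optimal), and the defender uses a linear velocity proportional to $L_g h$. With this particular $h$ the heading rate does not enter $\dot{h}$ directly, so the sketch does not cover the second scenario.

```python
import math

def barrier(x, y, r=1.0):
    """Assumed safety index h: positive outside a disc obstacle of radius r at the origin."""
    return x * x + y * y - r * r

def lg_h_linear(x, y, theta):
    """Lie derivative of h along the linear-velocity direction (cos(theta), sin(theta), 0)."""
    return 2.0 * (x * math.cos(theta) + y * math.sin(theta))

def defender_velocity(x, y, theta, gain=0.5):
    """Hypothetical defender policy of L_g h form: v = gain * L_g h."""
    return gain * lg_h_linear(x, y, theta)

def simulate(state, omega_fn, steps=2000, dt=0.005, gain=0.5, wind=(0.0, 0.0)):
    """Forward-Euler rollout of the unicycle.

    omega_fn(t, state) is an arbitrary offender steering law.
    wind is an assumed constant drift added to the position dynamics.
    Returns the final state and the smallest value of h seen along the way.
    """
    x, y, theta = state
    h_min = barrier(x, y)
    for k in range(steps):
        v = defender_velocity(x, y, theta, gain)
        omega = omega_fn(k * dt, (x, y, theta))
        x += dt * (v * math.cos(theta) + wind[0])
        y += dt * (v * math.sin(theta) + wind[1])
        theta += dt * omega
        h_min = min(h_min, barrier(x, y))
    return (x, y, theta), h_min

if __name__ == "__main__":
    # Offender steers sinusoidally; defender starts heading roughly toward the obstacle.
    final, h_min = simulate((2.0, 0.5, 2.5), lambda t, s: 1.5 * math.sin(3.0 * t))
    print("final state:", final)
    print("minimum h along the rollout:", h_min)
```

With zero wind, the defender's contribution to $\dot{h}$ is `gain * lg_h_linear(...)**2`, which is never negative, so $h$ does not decrease regardless of the steering input. With a nonzero `wind` this simple law carries no safety guarantee, since $L_g h$ vanishes whenever the heading is tangent to the obstacle.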
About the journal:
The field of Control is changing very fast now with technology-driven “societal grand challenges” and with the deployment of new digital technologies. The aim of Annual Reviews in Control is to provide comprehensive and visionary views of the field of Control, by publishing the following types of review articles:
Survey Article: Review papers on main methodologies or technical advances adding considerable technical value to the state of the art. Note that papers which purely rely on mechanistic searches and lack comprehensive analysis providing a clear contribution to the field will be rejected.
Vision Article: Cutting-edge and emerging topics with visionary perspective on the future of the field or how it will bridge multiple disciplines, and
Tutorial research Article: Fundamental guides for future studies.