
IEEE Control Systems Letters: Latest Articles

Tube-Based MPC for Uncertain Sampled-Data Control Systems With Inter-Sample Reachability Analysis
IF 2 Q2 AUTOMATION & CONTROL SYSTEMS Pub Date : 2025-12-22 DOI: 10.1109/LCSYS.2025.3647069
Yang Zhao;Elikplim Gah;Sze Zheng Yong
This letter presents an output-feedback tube-based model predictive control (MPC) framework for linear sampled-data control systems subject to external disturbances and non-convex constraints. The proposed approach rigorously incorporates inter-sample reachability analysis to account for the continuous-time evolution of system trajectories between discrete sampling instants and to ensure constraint satisfaction in the continuous-time domain. The resulting continuous-time tube-based MPC scheme is demonstrated to ensure that trajectories remain within (potentially non-convex) safe sets throughout the continuous-time evolution.
IEEE Control Systems Letters, vol. 9, pp. 3047-3052
Citations: 0
Convergence Analysis of Repeated Optimization in Performative Prediction
IF 2 Q2 AUTOMATION & CONTROL SYSTEMS Pub Date : 2025-12-22 DOI: 10.1109/LCSYS.2025.3646686
Siqi Du;Heling Zhang;Roy Dong
Classical data-driven methods can be conceptualized as mappings from data distributions to decisions. However, in practice, decisions can influence the data distributions themselves. One of the common methods for handling unknown decision-dependent distribution shift is repeated optimization. In this letter, we model repeated optimization as a discrete-time feedback interconnection system. Our framework enables convergence analysis based on dissipation inequalities and integral quadratic constraints, which provides a novel method to show convergence under unknown decision-dependent distribution shift. We bound the suboptimality when using repeated gradient descent and ignoring the distribution shift when taking gradient steps. Additionally, our framework provides a method to bound the distance between performatively stable points and performatively optimal points.
IEEE Control Systems Letters, vol. 9, pp. 2999-3004
Citations: 0
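As a rough illustration of the repeated gradient descent setting described in the abstract above, the sketch below uses a hypothetical scalar problem that does not come from the letter. The data mean is assumed to shift linearly with the decision, `E[z] = mu0 + eps * theta`, and the loss is assumed to be `0.5 * theta**2 - theta * z`. All function names and parameter values are illustrative assumptions, chosen so that the performatively stable and performatively optimal points have closed forms and visibly differ.

```python
def repeated_gradient_descent(mu0, eps, eta, theta0=0.0, steps=200):
    """Gradient steps that ignore how the data distribution reacts to theta.

    Assumed toy model: E[z | theta] = mu0 + eps * theta and
    loss(z; theta) = 0.5 * theta**2 - theta * z, so d(loss)/d(theta) = theta - z.
    Expectations are used directly, so the iteration is deterministic.
    """
    theta = theta0
    for _ in range(steps):
        mean_z = mu0 + eps * theta   # distribution induced by the current decision
        grad = theta - mean_z        # expected gradient with z treated as fixed
        theta -= eta * grad
    return theta


def performatively_stable(mu0, eps):
    # Fixed point of the shift-ignoring update: theta = mu0 + eps * theta.
    return mu0 / (1.0 - eps)


def performatively_optimal(mu0, eps):
    # Minimizer of the true risk (0.5 - eps) * theta**2 - mu0 * theta, valid for eps < 0.5.
    return mu0 / (1.0 - 2.0 * eps)


if __name__ == "__main__":
    mu0, eps, eta = 1.0, 0.25, 0.5
    theta = repeated_gradient_descent(mu0, eps, eta)
    print("limit of repeated gradient descent:", theta)
    print("performatively stable point:", performatively_stable(mu0, eps))
    print("performatively optimal point:", performatively_optimal(mu0, eps))
```

In this toy model the iteration settles at the stable point rather than the optimal one, and the gap between the two grows with `eps`. This is the kind of distance the abstract says the framework can bound. The letter's own analysis uses dissipation inequalities and integral quadratic constraints, which this sketch does not attempt to reproduce.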
Geometric Insight in Solving Optimal Control Problems and the Emergence of Generalized Functions
IF 2 Q2 AUTOMATION & CONTROL SYSTEMS Pub Date : 2025-12-22 DOI: 10.1109/LCSYS.2025.3646701
Erik I. Verriest
Geometric insight may lead to a quick solution for a class of non-LQ optimal control problems. We illustrate this with a simple, inconspicuous-looking example. While necessary conditions for optimality are easily obtained, their analytic solution may not be easy. Some problems, however, are locally reducible to a Euclidean distance problem, although the underlying topology may prevent a global reduction. This insight leads to the additional realization that in some cases, optimality may require impulsive inputs. However, Dirac deltas are not compatible with nonlinear operations in Schwartz’s distribution theory. Thus, it seems that we may have a solution but not a theory. Since the solution is transparent in its geometric form, it suggests that another approach to generalized functions, as proposed by Colombeau, should be used. This is very valuable as it corroborates our earlier work. Generalizations are then sought for other problems reducible to Euclidean minimum distance problems, and for even more general Riemannian spaces. We make some connections with the notion of persistence of behavior, where these results apply.
IEEE Control Systems Letters, vol. 9, pp. 3053-3058
Citations: 0
Symptom-Driven Personalized Proton Pump Inhibitors Therapy Using Bayesian Neural Networks and Model Predictive Control
IF 2 Q2 AUTOMATION & CONTROL SYSTEMS Pub Date : 2025-12-22 DOI: 10.1109/LCSYS.2025.3647098
Yutong Li;Ilya Kolmanovsky
Proton Pump Inhibitors (PPIs) are the standard of care for gastric acid disorders but carry significant risks when administered chronically at high doses. Precise long-term control of gastric acidity is challenged by the impracticality of invasive gastric acid monitoring beyond 72 hours and by wide inter-patient variability. We propose a noninvasive, symptom-based framework that tailors PPI dosing based solely on patient-reported reflux and digestive symptom patterns. A Bayesian Neural Network (BNN) prediction model learns to predict patient symptoms and quantifies its uncertainty from historical symptom scores, meal, and PPI intake data. These probabilistic forecasts feed a chance-constrained Model Predictive Control (MPC) algorithm that dynamically computes future PPI doses to minimize drug usage while enforcing acid suppression with high confidence, without any direct acid measurement. In silico studies over diverse dietary schedules and virtual patient profiles demonstrate that our learning-augmented MPC reduces total PPI consumption by 65% compared to standard fixed regimens, while maintaining acid suppression with at least 95% probability. The proposed approach offers a practical path to personalized PPI therapy, minimizing treatment burden and overdose risk without invasive sensors.
IEEE Control Systems Letters, vol. 9, pp. 3023-3028
Citations: 0
Safe Bayesian Optimization Across Noise Models via Scenario Programming
IF 2 Q2 AUTOMATION & CONTROL SYSTEMS Pub Date : 2025-12-18 DOI: 10.1109/LCSYS.2025.3645824
Abdullah Tokmak;Thomas B. Schön;Dominik Baumann
Safe Bayesian optimization (BO) with Gaussian processes is an effective tool for tuning control policies in safety-critical real-world systems, specifically due to its sample efficiency and safety guarantees. However, most safe BO algorithms assume homoscedastic sub-Gaussian measurement noise, an assumption that does not hold in many relevant applications. In this letter, we propose a straightforward yet rigorous approach for safe BO across noise models, including homoscedastic sub-Gaussian and heteroscedastic heavy-tailed distributions. We provide a high-probability bound on the measurement noise via the scenario approach, integrate these bounds into high probability confidence intervals, and prove safety and optimality for our proposed safe BO algorithm. We deploy our algorithm in synthetic examples and in tuning a controller for the Franka Emika manipulator in simulation.
IEEE Control Systems Letters, vol. 9, pp. 3029-3034
Citations: 0
Stability Analysis of Fast Extremum Seeking Control for Wiener Systems Using Online Complex Curve Fitting
IF 2 Q2 AUTOMATION & CONTROL SYSTEMS Pub Date : 2025-12-18 DOI: 10.1109/LCSYS.2025.3645829
Juan Javier Palacios Roman;Matthijs van Berkel;Maurice Heemels;Thijs van Keulen
In this letter, we show uniform semi-global practical asymptotic stability of fast extremum seeking control (ESC) for single-input single-output Wiener systems. While classic ESC requires a time-scale separation between plant and dither, the fast ESC method circumvents this time-scale separation by exploiting limited knowledge of the frequency response of the linear part of the Wiener system, thereby achieving faster convergence. The assumptions under which the fast ESC method works are relaxed compared to existing work and explicit bounds on the design parameters of the fast ESC scheme are provided. A numerical case study illustrates the enhanced convergence and the robustness of the fast ESC method.
IEEE Control Systems Letters, vol. 9, pp. 2993-2998
Citations: 0
A Small-Gain Look at Cyber-Physical Security
IF 2 Q2 AUTOMATION & CONTROL SYSTEMS Pub Date : 2025-12-18 DOI: 10.1109/LCSYS.2025.3645825
Sayan Chakraborty;Zhong-Ping Jiang
This letter studies the resilience of cyber-physical systems under denial-of-service attacks. We develop a novel framework for resilient control that avoids the need for detailed information about the system or attacker dynamics by treating the plant–attacker interaction as an interconnected system. Using small-gain analysis and switching systems theory, we derive explicit resilience conditions, and employ reinforcement learning to synthesize an optimal policy directly from input–state data, estimating the required small-gain bounds in a data-driven manner. A numerical example illustrates the effectiveness of the proposed approach.
IEEE Control Systems Letters, vol. 9, pp. 3035-3040
Citations: 0
Optimism as Risk-Seeking in Multi-Agent Reinforcement Learning
IF 2 Q2 AUTOMATION & CONTROL SYSTEMS Pub Date : 2025-12-17 DOI: 10.1109/LCSYS.2025.3645109
Runyu Zhang;Na Li;Asuman Ozdaglar;Jeff Shamma;Gioele Zardini
Risk sensitivity has become a central theme in reinforcement learning (RL), where convex risk measures and robust formulations provide principled ways to model preferences beyond expected return. Recent extensions to multi-agent RL (MARL) have largely emphasized the risk-averse setting, prioritizing robustness to uncertainty. In cooperative MARL, however, such conservatism often leads to suboptimal equilibria, and a parallel line of work has shown that optimism can promote cooperation. Existing optimistic methods, though effective in practice, are typically heuristic and lack theoretical grounding. Building on the dual representation for convex risk measures, we propose a principled framework that interprets risk-seeking objectives as optimism. We introduce optimistic value functions, which formalize optimism as divergence-penalized risk-seeking evaluations. Building on this foundation, we derive a policy-gradient theorem for optimistic value functions, including explicit formulas for the entropic risk/KL-penalty setting, and develop decentralized optimistic actor-critic algorithms that implement these updates. Empirical results on cooperative benchmarks demonstrate that risk-seeking optimism consistently improves coordination over both risk-neutral baselines and heuristic optimistic methods. Our framework thus unifies risk-sensitive learning and optimism, offering a theoretically grounded and practically effective approach to cooperation in MARL.
IEEE Control Systems Letters, vol. 10, pp. 1-6
Citations: 0
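The abstract above mentions explicit formulas for the entropic risk/KL-penalty setting but does not state them. As a hedged illustration, the sketch below checks the standard Donsker-Varadhan identity that usually underlies this pairing: for a return `X` under a nominal distribution `P` and `beta > 0`, `(1/beta) * log E_P[exp(beta * X)]` equals the maximum over `Q` of `E_Q[X] - KL(Q || P) / beta`. Whether the letter's optimistic value functions take exactly this form is an assumption, and all function names here are illustrative.

```python
import math


def entropic_optimistic_value(returns, probs, beta):
    """Risk-seeking entropic evaluation (1/beta) * log E_P[exp(beta * X)]."""
    shift = max(beta * x for x in returns)  # log-sum-exp shift for numerical stability
    total = sum(p * math.exp(beta * x - shift) for x, p in zip(returns, probs))
    return (shift + math.log(total)) / beta


def tilted_distribution(returns, probs, beta):
    """Maximizing distribution Q*, proportional to P(x) * exp(beta * x)."""
    shift = max(beta * x for x in returns)
    weights = [p * math.exp(beta * x - shift) for x, p in zip(returns, probs)]
    norm = sum(weights)
    return [w / norm for w in weights]


def kl_divergence(q, p):
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def penalized_value(returns, q, p, beta):
    """Divergence-penalized evaluation E_Q[X] - KL(Q || P) / beta."""
    expected = sum(qi * x for qi, x in zip(q, returns))
    return expected - kl_divergence(q, p) / beta


if __name__ == "__main__":
    returns = [0.0, 1.0, 5.0]   # hypothetical outcomes of a joint action
    probs = [0.5, 0.4, 0.1]     # hypothetical nominal probabilities
    beta = 0.8
    q_star = tilted_distribution(returns, probs, beta)
    print("entropic value:", entropic_optimistic_value(returns, probs, beta))
    print("penalized value at Q*:", penalized_value(returns, q_star, probs, beta))
    print("risk-neutral mean:", sum(p * x for p, x in zip(probs, returns)))
```

The tilted distribution puts more weight on high-return outcomes than `P` does, which is one way to read "optimism" as a divergence-penalized, risk-seeking evaluation. As `beta` approaches zero the value approaches the risk-neutral mean.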
Heterogeneous Pursuit of an Active Target Under Sensing Constraints
IF 2 Q2 AUTOMATION & CONTROL SYSTEMS Pub Date : 2025-12-17 DOI: 10.1109/LCSYS.2025.3645033
Prajakta Surve;Shaunak D. Bopardikar;Alexander Von Moll;Isaac Weintraub;David W. Casbeer
This letter studies a heterogeneous three-agent pursuit-evasion scenario in which a sensor–attacker team attempts to capture an active target capable of changing its heading at fixed time intervals. The sensor has a limited sensing range, and the attacker must intercept the target before it escapes sensing. We formulate this problem as a game of kind and extend the optimal sensor and attacker strategies from prior work on passive targets to the active target setting. The sensor updates its heading in each interval by assuming that the target will keep its heading fixed for the rest of the engagement, while the attacker uses an Apollonius circle-based approach for minimum-time interception and updates its heading in every interval according to the current target heading. We show that the conditions for capture or escape of a passive target also extend to the case of an active target. In particular, if the speed of the active target is less than a critical value identified for passive targets in our prior work, then capture is guaranteed.
IEEE Control Systems Letters, vol. 9, pp. 3017-3022
Citations: 0
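The abstract above refers to an Apollonius circle-based approach without giving details. The sketch below shows only the textbook construction, not the authors' strategy. For a pursuer at `P` with speed `v_p` and a slower target at `T` with speed `v_t`, the points both can reach at the same time form a circle with center `(T - mu^2 * P) / (1 - mu^2)` and radius `mu * |T - P| / (1 - mu^2)`, where `mu = v_t / v_p`. If the target holds a fixed heading, the minimum-time intercept point lies on that circle. Function names and the example numbers are assumptions.

```python
import math


def apollonius_circle(pursuer, target, v_p, v_t):
    """Center and radius of the set {X : |X - T| / |X - P| = v_t / v_p}."""
    mu = v_t / v_p
    if not 0.0 < mu < 1.0:
        raise ValueError("this sketch assumes a strictly faster pursuer")
    k = 1.0 - mu ** 2
    center = ((target[0] - mu ** 2 * pursuer[0]) / k,
              (target[1] - mu ** 2 * pursuer[1]) / k)
    radius = mu * math.dist(pursuer, target) / k
    return center, radius


def intercept_point(pursuer, target, v_p, v_t, heading):
    """Minimum-time intercept of a target that keeps a fixed heading (radians)."""
    hx, hy = math.cos(heading), math.sin(heading)
    dx, dy = target[0] - pursuer[0], target[1] - pursuer[1]
    # Solve |T + v_t * t * h - P| = v_p * t for the positive root t.
    a = v_t ** 2 - v_p ** 2            # negative because the pursuer is faster
    b = 2.0 * v_t * (dx * hx + dy * hy)
    c = dx * dx + dy * dy
    t = (-b - math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    point = (target[0] + v_t * t * hx, target[1] + v_t * t * hy)
    return point, t


if __name__ == "__main__":
    P, T = (0.0, 0.0), (4.0, 0.0)
    center, radius = apollonius_circle(P, T, v_p=2.0, v_t=1.0)
    point, t = intercept_point(P, T, v_p=2.0, v_t=1.0, heading=math.pi / 2)
    print("circle center and radius:", center, radius)
    print("intercept point and time:", point, t)
    print("distance of intercept from center:", math.dist(point, center))
```

In the active-target setting described in the abstract, the target heading can change at fixed intervals, so a computation of this kind would presumably be repeated with the current positions and heading at the start of each interval.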
Safe Navigation in the Presence of Range-Limited Pursuers
IF 2 Q2 AUTOMATION & CONTROL SYSTEMS Pub Date : 2025-12-17 DOI: 10.1109/LCSYS.2025.3645221
Thomas Chapman;Alexander Von Moll;Isaac E. Weintraub
This letter examines the degree to which an evader seeking a safe and efficient path to a target location can benefit from increasing levels of knowledge regarding one or more range-limited pursuers seeking to intercept it. Unlike previous work, this letter considers the time of flight of the pursuers actively attempting interception. It is shown that additional knowledge allows the evader to safely steer closer to the threats, shortening paths without accepting additional risk of capture. A control heuristic is presented, suitable for real-time implementation, which capitalizes on all knowledge available to the evader.
IEEE Control Systems Letters, vol. 9, pp. 2849-2854
Citations: 0