Pub Date : 2025-10-10 DOI: 10.1109/OJCSYS.2025.3620149
Shanting Wang;Panagiotis Typaldos;Chenjun Li;Andreas A. Malikopoulos
In this paper, we introduce VisioPath, a novel framework combining vision-language models (VLMs) with model predictive control (MPC) to enable safe autonomous driving in dynamic traffic environments. The proposed approach leverages a bird's-eye view video processing pipeline and zero-shot VLM capabilities to obtain structured information about surrounding vehicles, including their positions, dimensions, and velocities, while providing semantically informed initial trajectory guesses that warm-start the optimizer and enable contextually aware navigation decisions (e.g., yielding to emergency vehicles). Using this perception output, we shape elliptical collision-avoidance potential fields around other traffic participants, which are integrated into a finite-horizon optimal control problem for trajectory planning. The resulting trajectory optimization is solved via differential dynamic programming and is embedded in an event-triggered MPC loop. To ensure collision-free motion, the framework incorporates a safety verification layer that assesses potentially unsafe trajectories. Extensive simulations in the SUMO and CARLA simulators demonstrate that VisioPath outperforms baseline approaches, such as conventional MPC, A*, RRT, and CBF methods, across multiple metrics. By combining modern AI-driven perception with the rigorous foundation of optimal control, VisioPath represents a significant step forward in safe trajectory planning for complex traffic systems.
{"title":"VisioPath: Vision-Language Enhanced Model Predictive Control for Safe Autonomous Navigation in Mixed Traffic","authors":"Shanting Wang;Panagiotis Typaldos;Chenjun Li;Andreas A. Malikopoulos","doi":"10.1109/OJCSYS.2025.3620149","DOIUrl":"https://doi.org/10.1109/OJCSYS.2025.3620149","journal":{"name":"IEEE Open Journal of Control Systems","volume":"4","pages":"562-580"},"publicationDate":"2025-10-10","publicationTypes":"Journal Article","openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11199901"}
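To make the idea of an elliptical collision-avoidance potential field concrete, here is a minimal sketch. The abstract does not give the functional form used in VisioPath, so the Gaussian-shaped cost, the function name `elliptical_potential`, and all parameters below are assumptions chosen for illustration only.

```python
import math


def elliptical_potential(ego_xy, obs_xy, obs_heading, semi_major, semi_minor, gain=1.0):
    """Repulsive cost with elliptical level sets around one obstacle.

    Assumed form (not taken from the paper): gain * exp(-d2), where d2 is the
    squared distance to the obstacle center, normalized by the ellipse semi-axes.
    """
    dx = ego_xy[0] - obs_xy[0]
    dy = ego_xy[1] - obs_xy[1]
    # Rotate the offset into the obstacle's body frame (x axis along its heading).
    c, s = math.cos(obs_heading), math.sin(obs_heading)
    lon = c * dx + s * dy
    lat = -s * dx + c * dy
    # d2 equals 1.0 exactly on the ellipse boundary.
    d2 = (lon / semi_major) ** 2 + (lat / semi_minor) ** 2
    return gain * math.exp(-d2)


# Hypothetical vehicle at the origin, heading along +x, 4 m by 1.5 m semi-axes.
at_center = elliptical_potential((0.0, 0.0), (0.0, 0.0), 0.0, 4.0, 1.5)
far_away = elliptical_potential((50.0, 0.0), (0.0, 0.0), 0.0, 4.0, 1.5)
print(at_center, far_away)
```

A smooth cost of this kind is differentiable everywhere, which is convenient for gradient-based solvers such as differential dynamic programming; stretching the major axis along the obstacle's heading penalizes positions ahead of and behind the vehicle more than positions beside it.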
Pub Date : 2025-10-09 DOI: 10.1109/OJCSYS.2025.3619810
V. Scordamaglia;M. Mattei;G. Franzé
In this paper, a novel solution to the control allocation problem for over-actuated autonomous aircraft is presented. In particular, a detailed High Altitude Performance Demonstrator (HAPD) is used to show the effectiveness of the control/allocation architecture. The novelty of the proposed solution lies in designing a model predictive controller that handles input saturations, geometric constraints, and model uncertainties, provides tracking capabilities, and is used during online operations to adapt the nominal allocation unit to the time-varying conditions arising from the nonlinear aircraft dynamics. To make this approach viable, the state trajectories of the nonlinear envelope are formally embedded into those pertaining to a norm-bounded linear description. Then the allocation task is addressed by defining an online reference generator in charge of providing a feasible reference trajectory compatible with time-varying flight conditions. Finally, the nominal allocation is adapted online by exploiting state prediction features of the model predictive controller. A simulation campaign, including comparisons with a well-known competitor, highlights the effectiveness of the proposed approach in fulfilling constraints, ensuring accurate trajectory tracking, and optimally allocating the control effort.
{"title":"Norm-Bounded Model Predictive Control Allocation Strategy for an Over-Actuated Aircraft","authors":"V. Scordamaglia;M. Mattei;G. Franzé","doi":"10.1109/OJCSYS.2025.3619810","DOIUrl":"https://doi.org/10.1109/OJCSYS.2025.3619810","journal":{"name":"IEEE Open Journal of Control Systems","volume":"4","pages":"518-530"},"publicationDate":"2025-10-09","publicationTypes":"Journal Article","openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11197644"}
Pub Date : 2025-10-02 DOI: 10.1109/OJCSYS.2025.3617288
Maxwell Hammond;Ean Lovett;Vincenzo Pugliese;Venanzio Cichella;Caterina Lamuta
This manuscript presents a framework for trajectory generation for soft continuum robots using principles from optimal control. The problem is constrained over the partial differential kinematic equations of the Cosserat rod model, capturing all modes of deformation that soft continuum systems can achieve. The derived optimal control problem is transformed to a nonlinear programming problem which can be solved using the Bernstein polynomial basis. Non-unit quaternions are used to discretely apply rotational transformations to values approximated over Bernstein polynomials, allowing individual components of strain to be constrained separately. The manuscript includes numerical results as well as experimental validation.
{"title":"Constrained Path Planning for Soft Continuum Robots With Bernstein Surfaces","authors":"Maxwell Hammond;Ean Lovett;Vincenzo Pugliese;Venanzio Cichella;Caterina Lamuta","doi":"10.1109/OJCSYS.2025.3617288","DOIUrl":"https://doi.org/10.1109/OJCSYS.2025.3617288","journal":{"name":"IEEE Open Journal of Control Systems","volume":"4","pages":"618-628"},"publicationDate":"2025-10-02","publicationTypes":"Journal Article","openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11190071"}
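The Bernstein polynomial basis mentioned in the abstract can be illustrated with a short univariate sketch. The paper works with Bernstein surfaces; the one-dimensional evaluation below, and the function names `bernstein_basis` and `bernstein_eval`, are simplifying assumptions for illustration rather than the authors' implementation.

```python
from math import comb


def bernstein_basis(n, i, t):
    """i-th Bernstein basis polynomial of degree n, evaluated at t in [0, 1]."""
    return comb(n, i) * t**i * (1 - t) ** (n - i)


def bernstein_eval(coeffs, t):
    """Evaluate the Bernstein polynomial with the given coefficients at t."""
    n = len(coeffs) - 1
    return sum(c * bernstein_basis(n, i, t) for i, c in enumerate(coeffs))


coeffs = [1.0, 2.0, 3.0]  # hypothetical control coefficients
samples = [bernstein_eval(coeffs, k / 10) for k in range(11)]

# The basis functions are nonnegative and sum to one, so every sample lies
# between the smallest and largest coefficient.
print(min(samples) >= min(coeffs), max(samples) <= max(coeffs))
```

Because the basis functions are nonnegative and sum to one on [0, 1], bounds placed on the coefficients carry over to the whole curve. This is one reason the basis is attractive for turning a constrained optimal control problem into a finite-dimensional nonlinear program.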
Pub Date : 2025-09-26 DOI: 10.1109/OJCSYS.2025.3614875
Thomas O. de Jong;Khemraj Shukla;Mircea Lazar
In this paper, we consider the design of model predictive control (MPC) algorithms based on deep operator neural networks (DeepONets) (Lu et al. 2021). These neural networks are capable of accurately approximating real- and complex-valued solutions (Jiang et al. 2024) of continuous-time nonlinear systems without relying on recurrent architectures. The DeepONet architecture is made up of two feedforward neural networks: the branch network, which encodes the input function space, and the trunk network, which represents dependencies on temporal variables or initial conditions. Utilizing the original DeepONet architecture (Lu et al. 2021) as a predictor within MPC for Multi-Input Multi-Output (MIMO) systems requires multiple branch networks, one for each input, to generate multi-output predictions. Moreover, to predict multiple time steps into the future, the network has to be evaluated multiple times. Motivated by this, we introduce a multi-step DeepONet (MS-DeepONet) architecture that computes multi-step predictions of system outputs from multi-step input sequences in one shot, which is better suited for MPC. We prove that the MS-DeepONet is a universal approximator in terms of multi-step sequence prediction. Additionally, we develop automated hyperparameter selection strategies and implement MPC frameworks using both the standard DeepONet and the proposed MS-DeepONet architectures in PyTorch. We compare MS-DeepONet, standard DeepONet, and LSTM-based controllers on learning and predictive control tasks for the Van der Pol oscillator and the quadruple tank process. The MS-DeepONet is also evaluated on a challenging cart–pendulum system, where it successfully learns swing-up and stabilization policies. Across the examples, MS-DeepONet outperforms standard DeepONet in prediction accuracy and control performance, and achieves significantly lower computation times than Long Short-Term Memory (LSTM) based MPC.
{"title":"Deep Operator Neural Network Model Predictive Control","authors":"Thomas O. de Jong;Khemraj Shukla;Mircea Lazar","doi":"10.1109/OJCSYS.2025.3614875","DOIUrl":"https://doi.org/10.1109/OJCSYS.2025.3614875","journal":{"name":"IEEE Open Journal of Control Systems","volume":"4","pages":"501-517"},"publicationDate":"2025-09-26","publicationTypes":"Journal Article","openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11181185"}
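The branch/trunk structure described in the abstract can be sketched as a forward pass in plain Python. This is a minimal illustration of a standard single-output DeepONet with random, untrained weights, not the MS-DeepONet proposed in the paper and not the authors' PyTorch code. The layer sizes, the `tanh` activation, and all function names are assumptions.

```python
import math
import random


def dense(x, weights, biases, activate):
    """One fully connected layer; tanh is applied only when activate is True."""
    out = []
    for row, b in zip(weights, biases):
        z = sum(w * xi for w, xi in zip(row, x)) + b
        out.append(math.tanh(z) if activate else z)
    return out


def mlp(x, layers):
    """Feedforward network; the last layer is linear."""
    for k, (weights, biases) in enumerate(layers):
        x = dense(x, weights, biases, activate=(k < len(layers) - 1))
    return x


def init_layers(sizes, rng):
    """Random weights and zero biases for the given layer sizes."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def deeponet(u_samples, t, branch, trunk, bias=0.0):
    """Output = <branch(u), trunk(t)> + bias."""
    b = mlp(u_samples, branch)  # p coefficients from the sampled input signal
    tr = mlp([t], trunk)        # p basis values at the query time t
    return sum(bi * ti for bi, ti in zip(b, tr)) + bias


rng = random.Random(0)
branch = init_layers([10, 16, 8], rng)  # 10 input samples -> p = 8
trunk = init_layers([1, 16, 8], rng)    # scalar query time -> p = 8
u = [math.sin(0.1 * k) for k in range(10)]
y = deeponet(u, 0.5, branch, trunk)
print(y)
```

In this standard form, predicting N future steps means calling `deeponet` N times with different query times `t`, which is the repeated-evaluation cost the abstract says MS-DeepONet is designed to avoid.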
Pub Date : 2025-09-24 DOI: 10.1109/OJCSYS.2025.3614070
Brooks A. Butler;Philip E. Paré
As modern systems become increasingly connected with complex dynamic coupling relationships, developing safe control methods for such interconnected systems becomes paramount. In this paper, we explore the relationship of node-level safety definitions for individual agents to local neighborhood dynamics. We define a collaborative control barrier function and provide conditions under which sets defined by these functions will be forward invariant. We use collaborative control barrier functions to construct a novel decentralized algorithm for the safe control of collaborating network agents and provide conditions under which the algorithm is guaranteed to return a viable set of safe control actions for all agents. We then illustrate these results on a networked susceptible-infected-susceptible (SIS) model.
{"title":"Collaborative Safety-Critical Control in Coupled Networked Systems","authors":"Brooks A. Butler;Philip E. Paré","doi":"10.1109/OJCSYS.2025.3614070","DOIUrl":"https://doi.org/10.1109/OJCSYS.2025.3614070","journal":{"name":"IEEE Open Journal of Control Systems","volume":"4","pages":"433-446"},"publicationDate":"2025-09-24","publicationTypes":"Journal Article","openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11176994"}
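For context on the networked SIS example, the sketch below takes one forward-Euler step of the commonly used networked SIS dynamics, where node i has infection level x_i in [0, 1], recovery rate delta_i, and coupling weights beta_ij. The abstract does not state the exact model or the control inputs used in the paper, so the dynamics, the function name `sis_step`, and the numbers are assumptions.

```python
def sis_step(x, beta, delta, dt):
    """One forward-Euler step of
    dx_i/dt = -delta_i * x_i + (1 - x_i) * sum_j beta_ij * x_j.
    """
    n = len(x)
    new_x = []
    for i in range(n):
        infection = (1.0 - x[i]) * sum(beta[i][j] * x[j] for j in range(n))
        new_x.append(x[i] + dt * (infection - delta[i] * x[i]))
    return new_x


# Hypothetical two-node network: each node is infected only through the other.
beta = [[0.0, 1.0], [1.0, 0.0]]
delta = [0.5, 0.5]
x_next = sis_step([0.5, 0.0], beta, delta, dt=0.1)
print(x_next)
```

A node-level safety requirement in such a model could be an upper bound on x_i. Because the growth of x_i depends on the neighbors' states x_j, whether that bound can be maintained depends on what the neighbors do, which is the kind of coupling the collaborative barrier functions in the abstract are meant to handle.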
Pub Date : 2025-09-22 DOI: 10.1109/OJCSYS.2025.3612784
Jaap Eising;Jorge Cortés
This paper deals with the problem of accurately determining guaranteed suboptimal values of an unknown cost function on the basis of noisy measurements. We consider a set-valued variant to regression where, instead of finding a best estimate of the cost function, we reason over all functions compatible with the measurements and apply robust methods explicitly in terms of the data. Our treatment provides data-based conditions under which closed-form expressions of upper bounds of the unknown function can be obtained, and regularity properties like convexity and Lipschitzness can be established. These results allow us to perform point- and set-wise verification of suboptimality, and tackle the cautious optimization of the unknown function in both one-shot and online scenarios. We showcase the versatility of the proposed methods in two control-relevant problems: data-driven contraction analysis of unknown nonlinear systems and suboptimal regulation with unknown dynamics and cost. Simulations illustrate our results.
{"title":"Cautious Optimization via Data Informativity","authors":"Jaap Eising;Jorge Cortés","doi":"10.1109/OJCSYS.2025.3612784","DOIUrl":"https://doi.org/10.1109/OJCSYS.2025.3612784","journal":{"name":"IEEE Open Journal of Control Systems","volume":"4","pages":"400-417"},"publicationDate":"2025-09-22","publicationTypes":"Journal Article","openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11175185"}
Pub Date : 2025-09-19 DOI: 10.1109/OJCSYS.2025.3612246
Prajakta Surve;Shaunak D. Bopardikar;Alexander Von Moll;Isaac Weintraub;David W. Casbeer
We introduce a pursuit game played between a team of a sensor and an attacker and a mobile target in the unbounded Euclidean plane. The target is faster than the sensor, but slower than the attacker. The sensor’s objective is to keep the target within a sensing radius so that the attacker can capture the target, whereas the target seeks to escape by reaching beyond the sensing radius from the sensor without getting captured by the attacker. We assume that as long as the target is within the sensing radius from the sensor, the sensor-attacker team is able to measure the target’s instantaneous position and velocity. We pose and solve this problem as a Game of Kind in which the target uses an open-loop strategy (passive target). Aside from the novel formulation, our contributions are four-fold. First, we present optimal strategies for both the sensor and the attacker, according to their respective objectives. Specifically, we design a sensor strategy that maximizes the duration for which the target remains within its sensing range, while the attacker uses proportional navigation to capture the target. Second, we characterize the sensable region – the region in the plane in which the target remains within the sensing radius of the sensor during the game – and show that capture is guaranteed if and only if the Apollonius circle between the attacker and the target is fully contained within this region. Third, we derive a lower bound on the target’s speed below which capture is guaranteed, and an upper bound on the target speed above which there exists an escape strategy for the target, from an arbitrary initial orientation between the agents. Fourth, for a given initial orientation between the agents, we present a sharper upper bound on the target speed above which there exists an escape strategy for the target.
{"title":"Mutual Support by Sensor-Attacker Team for a Passive Target","authors":"Prajakta Surve;Shaunak D. Bopardikar;Alexander Von Moll;Isaac Weintraub;David W. Casbeer","doi":"10.1109/OJCSYS.2025.3612246","DOIUrl":"https://doi.org/10.1109/OJCSYS.2025.3612246","journal":{"name":"IEEE Open Journal of Control Systems","volume":"4","pages":"418-432"},"publicationDate":"2025-09-19","publicationTypes":"Journal Article","openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11173712"}
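The Apollonius circle used in the capture condition has a closed form, sketched below. For a target at T and a faster attacker at A with speed ratio nu = v_T / v_A < 1, it is the set of points X with |X - T| = nu |X - A|. The containment test against a disk is only a stand-in: the abstract does not give the shape of the sensable region, so `circle_inside_disk` and the example numbers are assumptions.

```python
import math


def apollonius_circle(target, attacker, speed_ratio):
    """Center and radius of the circle |X - T| = nu * |X - A|, with 0 < nu < 1."""
    if not 0.0 < speed_ratio < 1.0:
        raise ValueError("speed_ratio must lie strictly between 0 and 1")
    k2 = speed_ratio**2
    cx = (target[0] - k2 * attacker[0]) / (1.0 - k2)
    cy = (target[1] - k2 * attacker[1]) / (1.0 - k2)
    dist = math.hypot(attacker[0] - target[0], attacker[1] - target[1])
    radius = speed_ratio * dist / (1.0 - k2)
    return (cx, cy), radius


def circle_inside_disk(center, radius, disk_center, disk_radius):
    """True if the circle lies entirely inside the disk (hypothetical region)."""
    gap = math.hypot(center[0] - disk_center[0], center[1] - disk_center[1])
    return gap + radius <= disk_radius


# Target at the origin, attacker 3 units away, target half as fast as the attacker.
center, radius = apollonius_circle((0.0, 0.0), (3.0, 0.0), 0.5)
print(center, radius)
print(circle_inside_disk(center, radius, (0.0, 0.0), 4.0))
```

Points inside the circle are those the target can reach before the attacker when both move in straight lines at constant speed, which is why containment of the circle in the sensable region is a natural candidate for a capture condition.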
Pub Date : 2025-09-19 DOI: 10.1109/OJCSYS.2025.3612245
Shuo Liu;Zhe Huang;Jun Zeng;Koushil Sreenath;Calin A. Belta
Safety remains a central challenge in control of dynamical systems, particularly when the boundaries of unsafe sets are complex (e.g., nonconvex, nonsmooth) or unknown. This paper proposes a learning-enabled framework for safety-critical Model Predictive Control (MPC) that integrates Discrete-Time High-Order Control Barrier Functions (DHOCBFs) with iterative convex optimization. Unlike existing methods that primarily address CBFs of relative degree one with fully known unsafe set boundaries, our approach generalizes to arbitrary relative degrees and addresses scenarios where only samples are available for the unsafe set boundaries. We extract pixels from unsafe set boundaries and train a neural network to approximate local linearizations. The learned models are incorporated into the linearized DHOCBF constraints at each time step within the MPC framework. An iterative convex optimization procedure is developed to accelerate computation while maintaining formal safety guarantees. The benefits of computational performance and safe avoidance of obstacles with diverse shapes are examined and confirmed through numerical results. By bridging model-based control with learning-based environment modeling, this framework advances safe autonomy for discrete-time systems operating in complex and partially known settings.
{"title":"Learning-Enabled Iterative Convex Optimization for Safety-Critical Model Predictive Control","authors":"Shuo Liu;Zhe Huang;Jun Zeng;Koushil Sreenath;Calin A. Belta","doi":"10.1109/OJCSYS.2025.3612245","DOIUrl":"https://doi.org/10.1109/OJCSYS.2025.3612245","journal":{"name":"IEEE Open Journal of Control Systems","volume":"4","pages":"482-500"},"publicationDate":"2025-09-19","publicationTypes":"Journal Article","openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11174009"}
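As background for the barrier-function constraints mentioned in the abstract, the first-order discrete-time condition in its commonly used form is written out below. The symbols $f$, $h$, and $\gamma$ are assumptions introduced here; the abstract does not state the paper's exact definitions, and the high-order version in the paper may differ in detail.

```latex
% Assumed setup: x_{t+1} = f(x_t, u_t), safe set C = \{x : h(x) \ge 0\}, 0 < \gamma \le 1.
\begin{align}
  \Delta h(x_t, u_t) := h(x_{t+1}) - h(x_t) &\ge -\gamma\, h(x_t) \\
  \Longleftrightarrow \quad h(x_{t+1}) &\ge (1 - \gamma)\, h(x_t) \\
  \Longrightarrow \quad h(x_t) &\ge (1 - \gamma)^{t}\, h(x_0) \ge 0
  \quad \text{whenever } h(x_0) \ge 0 .
\end{align}
```

The last line follows by induction when the condition is enforced at every step, so a trajectory that starts in the safe set stays in it. When $h$ depends nonlinearly on the state, this constraint is generally nonconvex in the decision variables, which is the motivation for linearizing it inside an iterative convex optimization.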
Pub Date : 2025-09-18 DOI: 10.1109/OJCSYS.2025.3611725
Steven Carr;Georgios Bakirtzis;Ufuk Topcu
Agents controlled by the output of reinforcement learning (RL) algorithms often transition to unsafe states, particularly in uncertain and partially observable environments. Partially observable Markov decision processes (POMDPs) provide a natural setting for studying such scenarios with limited sensing. Shields filter undesirable actions to ensure safe RL by preserving safety requirements in the agents’ policy. However, synthesizing holistic shields is computationally expensive in complex deployment scenarios. We propose the compositional synthesis of shields by modeling safety requirements by parts, thereby improving scalability. In particular, problem formulations in the form of POMDPs using RL algorithms illustrate that an RL agent equipped with the resulting compositional shielding, beyond being safe, converges to higher values of expected reward. By using subproblem formulations, we preserve and improve the ability of shielded agents to require fewer training episodes than unshielded agents, especially in sparse-reward settings. Concretely, we find that compositional shield synthesis allows an RL agent to remain safe in environments two orders of magnitude larger than other state-of-the-art model-based approaches.
{"title":"Compositional Shield Synthesis for Safe Reinforcement Learning in Partial Observability","authors":"Steven Carr;Georgios Bakirtzis;Ufuk Topcu","doi":"10.1109/OJCSYS.2025.3611725","DOIUrl":"https://doi.org/10.1109/OJCSYS.2025.3611725","journal":{"name":"IEEE Open Journal of Control Systems","volume":"4","pages":"373-384"},"publicationDate":"2025-09-18","publicationTypes":"Journal Article","openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11172329"}
Pub Date : 2025-09-17 DOI: 10.1109/OJCSYS.2025.3611726
Daigo Shishika;Alexander Von Moll;Dipankar Maity;Michael Dorothy
Can deception exist in differential games? We provide a case study for a Turret-Attacker differential game, where two Attackers seek to score points by reaching a target region while a Turret tries to minimize the score by aligning itself with the Attackers before they reach the target. In contrast to the original problem solved with complete information, we assume that the Turret only has partial information about the maximum speed of the Attackers. We investigate whether there is any incentive for the Attackers to move slower than their maximum speed in order to “deceive” the Turret into taking suboptimal actions. We first describe the existence of a dilemma that the Turret may face. Then we derive a set of initial conditions from which the Attackers can force the Turret into a situation where it must take a guess.
Title: Deception in Turret Defense Game: Information Limiting Strategy to Induce Dilemma
Journal: IEEE Open Journal of Control Systems, vol. 4, pp. 385-399, published 2025-09-17
DOI: https://doi.org/10.1109/OJCSYS.2025.3611726
PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11169496