Pub Date: 2025-11-19 DOI: 10.1109/LCSYS.2025.3634657
Hikaru Hoshino
This letter proposes a gradient-based control co-design method for nonlinear optimal regulator problems, where physical design parameters and feedback controllers are optimized simultaneously. The proposed method is based on Galerkin approximations of the Hamilton–Jacobi–Bellman equation in a policy iteration framework. The key idea is to evaluate closed-loop performance as the expected cost over a prescribed distribution of initial states, which enables sensitivity analysis and gradient-based updates of the design parameters, while the controller is improved through policy iteration. As a result, the proposed method overcomes restrictive structural assumptions such as system equivalence, thereby avoiding conservatism and allowing flexible incorporation of design-dependent costs. Moreover, closed-loop stability is ensured at every iteration of the co-design procedure by embedding a recursive admissibility verification that combines two complementary Lyapunov conditions. The effectiveness of the proposed method is demonstrated through an example of a load positioning system.
{"title":"Gradient-Based Co-Design of Nonlinear Optimal Regulators With Stability Guarantee","authors":"Hikaru Hoshino","doi":"10.1109/LCSYS.2025.3634657","DOIUrl":"https://doi.org/10.1109/LCSYS.2025.3634657","url":null,"abstract":"This letter proposes a gradient-based control co-design method for nonlinear optimal regulator problems, where physical design parameters and feedback controllers are optimized simultaneously. The proposed method is based on Galerkin approximations of the Hamilton–Jacobi–Bellman equation in a policy iteration framework. The key idea is to evaluate closed-loop performance as the expected cost over a prescribed distribution of initial states, which enables sensitivity analysis and gradient-based updates of the design parameters, while the controller is improved through policy iteration. As a result, the proposed method overcomes restrictive structural assumptions such as system equivalence, thereby avoiding conservatism and allowing flexible incorporation of design-dependent costs. Moreover, closed-loop stability is ensured at every iteration of the co-design procedure by embedding a recursive admissibility verification that combines two complementary Lyapunov conditions. The effectiveness of the proposed method is demonstrated through an example of a load positioning system.","PeriodicalId":37235,"journal":{"name":"IEEE Control Systems Letters","volume":"9 ","pages":"2561-2566"},"PeriodicalIF":2.0,"publicationDate":"2025-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11260446","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-11-19 DOI: 10.1109/LCSYS.2025.3634944
Georgiy A. Bondar;Abhishek Halder
The Multimarginal Schrödinger Bridge (MSB) finds the optimal coupling among a collection of random vectors with known statistics and a known correlation structure. In the MSB formulation, this correlation structure is specified a priori as an undirected connected graph with measure-valued vertices. In this letter, we formulate and solve the problem of finding the optimal MSB in the sense that we seek the optimal coupling over all possible graph structures. We find that computing the optimal MSB amounts to solving the minimum spanning tree problem over measure-valued vertices. We show that the resulting problem can be solved in two steps. The first step constructs a complete graph in which each edge weight equals the sum of the optimal value of the corresponding bimarginal SB and the entropies of the endpoints. The second step solves a minimum spanning tree problem over that weighted graph. Numerical experiments illustrate the proposed solution.
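The two-step procedure described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the pairwise bimarginal SB values (`sb_value`) and the vertex entropies (`entropy`) are hypothetical placeholder numbers rather than quantities computed from actual measures, the sign and scaling conventions of the edge weight are those of the abstract's wording and may differ in the letter, and Kruskal's algorithm is used here as one standard minimum spanning tree solver, not necessarily the one the authors use.

```python
from itertools import combinations


def edge_weights(sb_value, entropy):
    """Step 1: build the complete graph.

    The weight of edge (i, j) is the bimarginal SB optimal value plus the
    entropies of both endpoints (all inputs here are placeholder numbers).
    """
    n = len(entropy)
    return {
        (i, j): sb_value[i][j] + entropy[i] + entropy[j]
        for i, j in combinations(range(n), 2)
    }


def minimum_spanning_tree(n, weights):
    """Step 2: Kruskal's algorithm with a simple union-find structure."""
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for (i, j), w in sorted(weights.items(), key=lambda kv: kv[1]):
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two components, so it creates no cycle
            parent[ri] = rj
            tree.append((i, j, w))
    return tree


# Hypothetical inputs for four measure-valued vertices.
sb_value = [
    [0.0, 1.0, 4.0, 3.0],
    [1.0, 0.0, 2.0, 5.0],
    [4.0, 2.0, 0.0, 1.5],
    [3.0, 5.0, 1.5, 0.0],
]
entropy = [0.5, 0.2, 0.1, 0.3]

weights = edge_weights(sb_value, entropy)
tree = minimum_spanning_tree(len(entropy), weights)
total_weight = sum(w for _, _, w in tree)
```

The expensive part in practice would be step 1, since it requires one bimarginal SB solve per vertex pair; step 2 is a standard graph computation on the resulting weights.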
{"title":"Optimal Multimarginal Schrödinger Bridge: Minimum Spanning Tree Over Measure-Valued Vertices","authors":"Georgiy A. Bondar;Abhishek Halder","doi":"10.1109/LCSYS.2025.3634944","DOIUrl":"https://doi.org/10.1109/LCSYS.2025.3634944","url":null,"abstract":"The Multimarginal Schrödinger Bridge (MSB) finds the optimal coupling among a collection of random vectors with known statistics and a known correlation structure. In the MSB formulation, this correlation structure is specified a priori as an undirected connected graph with measure-valued vertices. In this letter, we formulate and solve the problem of finding the optimal MSB in the sense we seek the optimal coupling over all possible graph structures. We find that computing the optimal MSB amounts to solving the minimum spanning tree problem over measure-valued vertices. We show that the resulting problem can be solved in two steps. The first step constructs a complete graph with edge weight equal to a sum of the optimal value of the corresponding bimarginal SB and the entropies of the endpoints. The second step solves a minimum spanning tree problem over that weighted graph. Numerical experiments illustrate the proposed solution.","PeriodicalId":37235,"journal":{"name":"IEEE Control Systems Letters","volume":"9 ","pages":"2555-2560"},"PeriodicalIF":2.0,"publicationDate":"2025-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-11-19 DOI: 10.1109/LCSYS.2025.3635516
Hanfeng Li;Min Li
The primary objective of this letter is to develop an adaptive output feedback quantized control scheme for strict-feedback nonlinear systems. Three technical challenges must be overcome in addressing this problem. First, it is necessary to compensate for the unknown parameter associated with sensor failures, but the existing compensating method is applicable only within the backstepping framework, which complicates the overall design process. Second, sensor failures lead to only corrupted output being available and prevent the observer and controller from utilizing the true system output. Third, the nature of input quantization renders the actual controller discontinuous, which unavoidably introduces an undesirable quantization error and thus poses a challenge in mitigating its effect. To circumvent these difficulties, this letter establishes a new compensation scheme and a transformation of the control signal to address the effects of sensor failures and input quantization. A concise dynamic gain approach is proposed to construct a novel adaptive observer by using only the corrupted output signal. It is shown that, with the derived output feedback quantized controller, all closed-loop signals are bounded and the tracking error converges to an adjustable region. Two simulation examples are presented to demonstrate the effectiveness of the proposed scheme.
{"title":"Output Feedback Quantized Tracking Control for a Class of Nonlinear Systems With Sensor Failures","authors":"Hanfeng Li;Min Li","doi":"10.1109/LCSYS.2025.3635516","DOIUrl":"https://doi.org/10.1109/LCSYS.2025.3635516","url":null,"abstract":"The primary objective of this letter is to develop an adaptive output feedback quantized control scheme for strict-feedback nonlinear systems. Three technical challenges must be overcome in addressing this problem. First, it is necessary to compensate for the unknown parameter associated with sensor failures, but the existing compensating method is applicable only within the backstepping framework, which complicates the overall design process. Second, sensor failures lead to only corrupted output being available and prevent the observer and controller from utilizing the true system output. Third, the nature of input quantization renders the actual controller discontinuous, which unavoidably introduces an undesirable quantization error and thus poses a challenge in mitigating its effect. To circumvent these difficulties, this letter establishes a new compensation scheme and a transformation of the control signal to address the effects of sensor failures and input quantization. A concise dynamic gain approach is proposed to construct a novel adaptive observer by using only the corrupted output signal. It is shown that, with the derived output feedback quantized controller, all closed-loop signals are bounded and the tracking error converges to an adjustable region. Two simulation examples are presented to demonstrate the effectiveness of the proposed scheme.","PeriodicalId":37235,"journal":{"name":"IEEE Control Systems Letters","volume":"9 ","pages":"2573-2578"},"PeriodicalIF":2.0,"publicationDate":"2025-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-11-17 DOI: 10.1109/LCSYS.2025.3633369
S. Akbari;S. Galeani;G. Manca;M. Sassano
The objective of this letter is to develop a dynamic control allocation framework for nonlinear systems subject to periodic exogenous signals. The proposed approach combines gradient-based optimization with parameter-dependent sensitivity dynamics, extending control allocation beyond conventional strategies, which are tailored to constant references and instantaneous costs. The methodology systematically addresses periodic reference trajectories and integral cost functionals while preserving the desirable property of output invisibility, as in the linear setting. The proposed technique is implemented and validated on a nonlinear mechanical system.
{"title":"Dynamic Control Allocation for Nonlinear Systems via a Sensitivity Approach","authors":"S. Akbari;S. Galeani;G. Manca;M. Sassano","doi":"10.1109/LCSYS.2025.3633369","DOIUrl":"https://doi.org/10.1109/LCSYS.2025.3633369","url":null,"abstract":"The objective of this letter is to develop a dynamic control allocation framework for nonlinear systems subject to periodic exogenous signals. The proposed approach combines gradient-based optimization with parameter-dependent sensitivity dynamics, enabling an extension of control allocation beyond conventional strategies, tailored to constant references and instantaneous costs; this methodology systematically addresses periodic reference trajectories and integral cost functionals, while preserving the desirable property of output invisibility, similarly to the linear settings. The proposed technique is implemented and validated on a nonlinear mechanical system.","PeriodicalId":37235,"journal":{"name":"IEEE Control Systems Letters","volume":"9 ","pages":"2549-2554"},"PeriodicalIF":2.0,"publicationDate":"2025-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145560765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This letter presents a robust backstepping-type controller formulation for the position tracking control of engineering systems actuated via electro-hydraulic actuators (EHAs). Specifically, a robust controller that does not require accurate knowledge of the system parameters and uses only position measurements is proposed. A filter-based approach is applied to remove the velocity dependency of the controller formulation. Stability of the closed-loop system and the uniform boundedness of the tracking error signals are ensured via Lyapunov-based arguments. The overall performance of the proposed method is illustrated, initially through physics-based MATLAB/Simscape studies, and then experimentally on a 1-degree-of-freedom (dof) EHA test-bed and a 2-dof robotic arm.
{"title":"Robust Position Tracking Control of Electro-Hydraulic Actuators: Elimination of Velocity Measurements","authors":"Sule Taskingollu;Alper Bayrak;Erman Selim;Enver Tatlicioglu;Erkan Zergeroglu","doi":"10.1109/LCSYS.2025.3633925","DOIUrl":"https://doi.org/10.1109/LCSYS.2025.3633925","url":null,"abstract":"This letter presents a robust backstepping type controller formulation for the position tracking control of engineering systems actuated via electro-hydraulic actuators (EHAs). Specifically, a robust controller that does not require accurate knowledge of the system parameters and uses only position measurements is proposed. A filtered based approach is applied to remove the velocity dependency of the controller formulation. Stability of the closed loop system and the uniform boundedness of the tracking error signals are ensured via Lyapunov based arguments. The overall performance of the proposed method is illustrated, initially through physics-based MATLAB/Simscape studies, and then experimentally on a 1 degree of freedom (dof) EHA test-bed and a 2 dof robotic arm.","PeriodicalId":37235,"journal":{"name":"IEEE Control Systems Letters","volume":"9 ","pages":"2543-2548"},"PeriodicalIF":2.0,"publicationDate":"2025-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145560764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-11-17 DOI: 10.1109/LCSYS.2025.3633320
Daniel E. Ochoa;Mahmoud Abdelgalil;Jorge I. Poveda
We study the instability of Nesterov’s ODE in non-conservative settings, where the driving term is not necessarily the gradient of a potential function. While convergence properties under Nesterov’s ODE are well-characterized for settings with gradient-based driving terms, we show that the presence of arbitrarily small non-conservative terms can lead to instability. To resolve the instability issue, we study a regularization mechanism based on restarting. For this mechanism, we establish novel explicit bounds on the resetting period that ensure the decrease of a suitable Lyapunov function, thereby guaranteeing stability and “accelerated” convergence rates under suitable smoothness and monotonicity properties on the driving term. Numerical simulations support our results.
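The instability and the effect of restarting can be illustrated numerically. The sketch below is not taken from the letter and rests on several assumptions: it uses the standard form of Nesterov's ODE, x'' + (3/t) x' + F(x) = 0, a hypothetical linear driving term F(x) = x + eps * J x with J skew-symmetric (so eps * J x is not the gradient of any potential), a start time `t0 = 1` to avoid the singularity at t = 0, and a restart period of 5 time units picked by hand rather than from the explicit bounds derived by the authors.

```python
import math


def simulate(eps, t_end, restart_period=None, t0=1.0, h=0.01):
    """Integrate x'' + (3/t) x' + F(x) = 0 in the plane with classical RK4.

    F(x) = x + eps * J x, with J = [[0, 1], [-1, 0]] (an assumed test field).
    If restart_period is given, the clock t is reset to t0 and the velocity
    is reset to zero every restart_period time units.
    Returns the Euclidean norm of the final position.
    """
    def rhs(t, s):
        x1, x2, v1, v2 = s
        f1 = x1 + eps * x2   # first component of F(x)
        f2 = x2 - eps * x1   # second component of F(x)
        damping = 3.0 / t
        return (v1, v2, -damping * v1 - f1, -damping * v2 - f2)

    def shifted(s, k, c):
        # Helper returning s + c * k component-wise.
        return tuple(a + c * b for a, b in zip(s, k))

    steps_per_restart = None
    if restart_period is not None:
        steps_per_restart = int(round(restart_period / h))

    s = (1.0, 0.0, 0.0, 0.0)  # position (1, 0), zero initial velocity
    t = t0
    for step in range(1, int(round(t_end / h)) + 1):
        k1 = rhs(t, s)
        k2 = rhs(t + h / 2, shifted(s, k1, h / 2))
        k3 = rhs(t + h / 2, shifted(s, k2, h / 2))
        k4 = rhs(t + h, shifted(s, k3, h))
        s = tuple(
            a + h / 6 * (p + 2 * q + 2 * r + w)
            for a, p, q, r, w in zip(s, k1, k2, k3, k4)
        )
        t += h
        if steps_per_restart and step % steps_per_restart == 0:
            s = (s[0], s[1], 0.0, 0.0)  # keep position, zero the velocity
            t = t0                      # reset the clock (restores damping)
    return math.hypot(s[0], s[1])


no_restart = simulate(eps=0.1, t_end=400.0)
with_restart = simulate(eps=0.1, t_end=400.0, restart_period=5.0)
print(f"|x(T)| without restart: {no_restart:.3e}")
print(f"|x(T)| with restart:    {with_restart:.3e}")
```

In this toy setting the damping 3/t vanishes as t grows, so the small rotational component eventually dominates and the trajectory grows; resetting the clock periodically keeps the damping bounded away from zero, which is the intuition behind the restarting mechanism.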
{"title":"On the Instability of Nesterov’s ODE Under Non-Conservative Vector Fields","authors":"Daniel E. Ochoa;Mahmoud Abdelgalil;Jorge I. Poveda","doi":"10.1109/LCSYS.2025.3633320","DOIUrl":"https://doi.org/10.1109/LCSYS.2025.3633320","url":null,"abstract":"We study the instability of Nesterov’s ODE in non-conservative settings, where the driving term is not necessarily the gradient of a potential function. While convergence properties under Nesterov’s ODE are well-characterized for settings with gradient-based driving terms, we show that the presence of arbitrarily small non-conservative terms can lead to instability. To resolve the instability issue, we study a regularization mechanism based on restarting. For this mechanism, we establish novel explicit bounds on the resetting period that ensure the decrease of a suitable Lyapunov function, thereby guaranteeing stability and “accelerated” convergence rates under suitable smoothness and monotonicity properties on the driving term. Numerical simulations support our results.","PeriodicalId":37235,"journal":{"name":"IEEE Control Systems Letters","volume":"9 ","pages":"2639-2644"},"PeriodicalIF":2.0,"publicationDate":"2025-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145674666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-11-13 DOI: 10.1109/LCSYS.2025.3632763
Ákos M. Bokor;Felix Biertümpfe;Peter Seiler;Roland Tóth
This letter proposes a robust tube-based model predictive control approach for spacecraft rendezvous subject to sector-bounded nonlinearities and bounded disturbances. Unlike existing methods that design the feedback controller and tube tightening parameters sequentially, we jointly optimize both through a convex Linear Matrix Inequality using static quadratic constraints. This eliminates the conservatism inherent in two-step design procedures, while maintaining computational tractability for real-time implementation. The approach is validated through a CubeSat docking simulation with tight operational constraints, showing around 70% fuel savings, a 21% reduction in average computation time, and smaller tube sizes compared to LQR-based fixed-gain methods.
{"title":"Robust Model Predictive Control for Spacecraft Rendezvous Under Sector-Bounded Nonlinearities","authors":"Ákos M. Bokor;Felix Biertümpfe;Peter Seiler;Roland Tóth","doi":"10.1109/LCSYS.2025.3632763","DOIUrl":"https://doi.org/10.1109/LCSYS.2025.3632763","url":null,"abstract":"This letter proposes a robust tube-based model predictive control approach for spacecraft rendezvous subject to sector-bounded nonlinearities and bounded disturbances. Unlike existing methods that design the feedback controller and tube tightening parameters sequentially, we jointly optimize both through a convex Linear Matrix Inequality using static quadratic constraints. This eliminates the conservatism inherent in two-step design procedures, while maintaining computational tractability for real-time implementation. The approach is validated through a CubeSat docking simulation with tight operational constraints, showing around 70% fuel savings, 21% reduction in average computational time and smaller tube sizes compared to LQR-based fixed-gain methods.","PeriodicalId":37235,"journal":{"name":"IEEE Control Systems Letters","volume":"9 ","pages":"2567-2572"},"PeriodicalIF":2.0,"publicationDate":"2025-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11245213","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-11-13 DOI: 10.1109/LCSYS.2025.3632307
Yanan Zhu;Wenwu Yu;Guanghui Wen
This letter provides the first rigorous theoretical analysis for the Distributed Fenchel Dual Gradient (DFDG) algorithm, a continuous-time method for solving distributed convex optimization problems with local set constraints over digraphs. The DFDG algorithm, originally proposed in our prior work (Zhu et al., 2020), transforms the primal problem into its Fenchel dual and solves it using a two-time-scale dynamical system. This letter provides a more comprehensive explanation of the algorithm’s design mechanism and formally establishes its convergence properties. Under strong convexity and Lipschitz continuity assumptions, Lyapunov stability theory is employed to prove the asymptotic convergence to the optimal solutions of both the primal and its dual problems. This analysis provides rigorous guarantees for a class of dual-based algorithms over digraphs, filling a critical gap in the existing literature.
{"title":"Distributed Fenchel Dual Gradient Algorithm for Constrained Convex Optimization Over Digraphs","authors":"Yanan Zhu;Wenwu Yu;Guanghui Wen","doi":"10.1109/LCSYS.2025.3632307","DOIUrl":"https://doi.org/10.1109/LCSYS.2025.3632307","url":null,"abstract":"This letter provides the first rigorous theoretical analysis for the Distributed Fenchel Dual Gradient (DFDG) algorithm, a continuous-time method for solving distributed convex optimization problems with local set constraints over digraphs. The DFDG algorithm, originally proposed in our prior work (Zhu et al., 2020), transforms the primal problem into its Fenchel dual and solves it using a two-time-scale dynamical system. This letter provides a more comprehensive explanation of the algorithm’s design mechanism and formally establishes its convergence properties. Under strong convexity and Lipschitz continuity assumptions, Lyapunov stability theory is employed to prove the asymptotic convergence to the optimal solutions of both the primal and its dual problems. This analysis provides rigorous guarantees for a class of dual-based algorithms over digraphs, filling a critical gap in the existing literature.","PeriodicalId":37235,"journal":{"name":"IEEE Control Systems Letters","volume":"9 ","pages":"2537-2542"},"PeriodicalIF":2.0,"publicationDate":"2025-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145560763","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-11-07 DOI: 10.1109/LCSYS.2025.3630241
Hesham Abdelfattah;Sameh A. Eisa;Peter Stechlinski
In this letter, we extend the sensitivity-based rank condition (SERC) test for local observability to another class of systems, namely smooth and nonsmooth differential-algebraic equation (DAE) systems of index-1. The newly introduced test for DAEs, which we call the lexicographic SERC (L-SERC) observability test, utilizes the theory of lexicographic differentiation to compute sensitivity information. Moreover, the L-SERC observability test can judge which states are observable and which are not. Additionally, we introduce a novel sensitivity-based extended Kalman filter (S-EKF) algorithm for state estimation, applicable to both smooth and nonsmooth DAE systems. Finally, we apply the newly developed S-EKF to estimate the states of a wind turbine power system model.
{"title":"Observability and State Estimation for Smooth and Nonsmooth Differential Algebraic Equation Systems","authors":"Hesham Abdelfattah;Sameh A. Eisa;Peter Stechlinski","doi":"10.1109/LCSYS.2025.3630241","DOIUrl":"https://doi.org/10.1109/LCSYS.2025.3630241","url":null,"abstract":"In this letter, we extend the sensitivity-based rank condition (SERC) test for local observability to another class of systems, namely smooth and nonsmooth differential-algebraic equation (DAE) systems of index-1. The newly introduced test for DAEs, which we call the lexicographic SERC (L-SERC) observability test, utilizes the theory of lexicographic differentiation to compute sensitivity information. Moreover, the newly introduced L-SERC observability test can judges which states are observable and which are not. Additionally, we introduce a novel sensitivity-based extended Kalman filter (S-EKF) algorithm for state estimation, applicable to both smooth and nonsmooth DAE systems. Finally, we apply the newly developed S-EKF to estimate the states of a wind turbine power system model.","PeriodicalId":37235,"journal":{"name":"IEEE Control Systems Letters","volume":"9 ","pages":"2507-2512"},"PeriodicalIF":2.0,"publicationDate":"2025-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145510102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hierarchical reinforcement learning (RL) aims to improve sample efficiency by decomposing complex long-horizon tasks into fast low-level myopic and slower high-level non-myopic subtasks. However, the unilateral nested policy structure in current goal-conditioned hierarchical RL (HRL) methods sets subgoals at the high level without considering feedback from the low level, which significantly degrades the performance of high-level subgoal generation and sampling efficiency. Hindsight action relabeling further weakens subgoal settings by submitting to low-level reachability. Inspired by feedback control of dynamic systems, we present Feedback for Improved HRL with Timed Subgoals (FIHTS), a mechanism allowing feedback control of subgoal generation for improved HRL. Unlike current HRL, FIHTS enables both the high level to set subgoals and the low level to receive rewards based on subgoal achievement. Our experiments in various challenging dynamic RL environments show that our FIHTS method achieves higher success rates with higher sample efficiency than existing subgoal-based HRL methods.
{"title":"Feedback for Improved Hierarchical Reinforcement Learning With Timed Subgoals","authors":"Yajie Bao;Dan Shen;Genshe Chen;Hao Xu;Samson Badlia;Simon Khan;Erik Blasch;Khanh Pham","doi":"10.1109/LCSYS.2025.3629008","DOIUrl":"https://doi.org/10.1109/LCSYS.2025.3629008","url":null,"abstract":"Hierarchical reinforcement learning (RL) aims to improve sample efficiency by decomposing complex long-horizon tasks into fast low-level myopic and slower high-level non-myopic subtasks. However, the unilateral nested policy structure in current goal-conditioned hierarchical RL (HRL) methods sets subgoals at the high level without considering feedback from the low level, which significantly degrades the performance of high-level subgoal generation and sampling efficiency. Hindsight action relabeling further weakens subgoal settings by submitting to low-level reachability. Inspired by feedback control of dynamic systems, we present Feedback for Improved HRL with Timed Subgoals (FIHTS), a mechanism allowing feedback control of subgoal generation for improved HRL. Unlike current HRL, FIHTS enables both the high level to set subgoals and the low level to receive rewards based on subgoal achievement. Our experiments in various challenging dynamic RL environments show that our FIHTS method achieves higher success rates with higher sample efficiency than existing subgoal-based HRL methods.","PeriodicalId":37235,"journal":{"name":"IEEE Control Systems Letters","volume":"9 ","pages":"2501-2506"},"PeriodicalIF":2.0,"publicationDate":"2025-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145510104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}