Library-Based Norm-Optimal Iterative Learning Control
Pub Date: 2021-12-14 | DOI: 10.1109/CDC45484.2021.9682812
James Reed, Maxwell J. Wu, K. Barton, C. Vermillion, K. Mishra
This paper presents a new iterative learning control (ILC) methodology, termed library-based norm-optimal ILC, which optimally accounts for variations in measurable disturbances and plant parameters from one iteration to the next. In this formulation, previous iteration-varying disturbance and/or plant parameters, along with the corresponding control and error sequences, are intelligently maintained in a dynamically evolving library. The library is then referenced at each iteration, in order to base the new control sequence on the most relevant prior iterations, according to an optimization metric. In contrast with the limited number of library-based ILC methodologies pursued in the literature, the present work (i) selects provably optimal interpolation weights, (ii) presents methods for starting with an empty library and intelligently truncating the library when it becomes too large, and (iii) demonstrates convergence to an optimal performance value. To demonstrate the effectiveness of our new methodology, we simulate our library-based norm-optimal ILC method on a linear time-varying model of a micro-robotic deposition system.
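The library management and interpolation weights are the paper's contribution; the norm-optimal ILC update they build on is standard and can be sketched as below. This is a minimal illustration assuming a lifted LTI plant G, weights Q and R, an iteration-invariant disturbance, and tracking error e_j = r - (G u_j + d); the plant, numbers, and names are illustrative, not taken from the paper.

```python
import numpy as np

def norm_optimal_ilc_update(G, Q, R, u_j, e_j):
    """One norm-optimal ILC step for a lifted plant y = G u + d: minimizes
    ||e_{j+1}||_Q^2 + ||u_{j+1} - u_j||_R^2, which gives the closed-form update
    u_{j+1} = u_j + (G^T Q G + R)^{-1} G^T Q e_j."""
    L = np.linalg.solve(G.T @ Q @ G + R, G.T @ Q)   # learning gain matrix
    return u_j + L @ e_j

# Toy lifted model of a first-order plant x_{k+1} = a x_k + b u_k, y_k = x_k.
N, a, b = 50, 0.9, 0.5
G = np.array([[b * a ** (i - k - 1) if k < i else 0.0 for k in range(N)]
              for i in range(N)])
r = np.linspace(0.0, 1.0, N)        # reference trajectory
d = 0.05 * np.ones(N)               # iteration-invariant disturbance
Q, R = np.eye(N), 0.1 * np.eye(N)

u = np.zeros(N)
for _ in range(20):
    e = r - (G @ u + d)
    u = norm_optimal_ilc_update(G, Q, R, u, e)
print("tracking error norm after 20 iterations:", np.linalg.norm(r - (G @ u + d)))
```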
{"title":"Library-Based Norm-Optimal Iterative Learning Control","authors":"James Reed, Maxwell J. Wu, K. Barton, C. Vermillion, K. Mishra","doi":"10.1109/CDC45484.2021.9682812","DOIUrl":"https://doi.org/10.1109/CDC45484.2021.9682812","url":null,"abstract":"This paper presents a new iterative learning control (ILC) methodology, termed library-based norm-optimal ILC, which optimally accounts for variations in measurable disturbances and plant parameters from one iteration to the next. In this formulation, previous iteration-varying disturbance and/or plant parameters, along with the corresponding control and error sequences, are intelligently maintained in a dynamically evolving library. The library is then referenced at each iteration, in order to base the new control sequence on the most relevant prior iterations, according to an optimization metric. In contrast with the limited number of library-based ILC methodologies pursued in the literature, the present work (i) selects provably optimal interpolation weights, (ii) presents methods for starting with an empty library and intelligently truncating the library when it becomes too large, and (iii) demonstrates convergence to an optimal performance value. To demonstrate the effectiveness of our new methodology, we simulate our library-based norm-optimal ILC method on a linear time-varying model of a micro-robotic deposition system.","PeriodicalId":229089,"journal":{"name":"2021 60th IEEE Conference on Decision and Control (CDC)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123560006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Controlling Epidemics via Testing
Pub Date: 2021-12-14 | DOI: 10.1109/CDC45484.2021.9683289
Kyriakos Lotidis, A. L. Moustakas, N. Bambos
In this paper, we focus on the effect that testing centers (which detect and quarantine infected individuals) have on mitigating the evolution of an epidemic. We incorporate diffusion-style mobility of infected but undetected individuals, as opposed to detected and quarantined ones. We compute the total and maximum (over time) spatially averaged density of infected individuals (detected or not), which are useful metrics of the epidemic’s impact on a population, as functions of the testing center spatial density. Even under conditions where the epidemic has the natural potential to spread, we find that a ‘phase transition’ occurs as the testing center spatial density increases. For any testing density above a certain threshold the epidemic is suppressed and dies out, while below it the epidemic propagates and evolves naturally, albeit still depending strongly on the testing center density. This analysis further allows one to optimize the testing center density so that the epidemic’s evolution does not inundate or exhaust critical health care resources, such as ICU bed capacity.
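The paper's model is spatial, with diffusion of undetected individuals and a density of testing centers; the sketch below is a much cruder, well-mixed compartmental stand-in that only illustrates the threshold ("phase transition") effect of a detection-and-quarantine rate. All rates and the S-I-Q-R structure are assumptions for illustration, not the paper's model.

```python
import numpy as np

def simulate_epidemic(beta=0.3, gamma=0.1, tau=0.0, i0=1e-3, days=400, dt=0.1):
    """Well-mixed S-I-Q-R sketch: tau is the detection-and-quarantine rate,
    used here as a rough proxy for the testing center density."""
    s, i, q, r = 1.0 - i0, i0, 0.0, 0.0
    peak = i
    for _ in range(int(days / dt)):
        new_inf = beta * s * i * dt            # new infections (undetected pool)
        detected = tau * i * dt                # detected and moved to quarantine
        rec_i, rec_q = gamma * i * dt, gamma * q * dt
        s -= new_inf
        i += new_inf - detected - rec_i
        q += detected - rec_q
        r += rec_i + rec_q
        peak = max(peak, i)
    return 1.0 - s, peak                       # total ever infected, peak undetected prevalence

# Effective reproduction number is beta / (gamma + tau): threshold at tau = beta - gamma.
for tau in (0.0, 0.1, 0.2, 0.4):
    total, peak = simulate_epidemic(tau=tau)
    print(f"tau = {tau:.1f}: total infected = {total:.3f}, peak undetected = {peak:.4f}")
```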
{"title":"Controlling Epidemics via Testing","authors":"Kyriakos Lotidis, A. L. Moustakas, N. Bambos","doi":"10.1109/CDC45484.2021.9683289","DOIUrl":"https://doi.org/10.1109/CDC45484.2021.9683289","url":null,"abstract":"In this paper, we focus on the effect that testing centers (which detect and quarantine infected individuals) have on mitigating the evolution of an epidemic. We incorporate diffusion-style mobility of infected but undetected individuals, as opposed to detected and quarantined ones. We compute the total and maximum (over time) spatially averaged density of infected individuals (detected or not), which are useful metrics of the epidemic’s impact on a population, as functions of the testing center spatial density.Even under conditions where the epidemic has the natural potential to spread, we find that a ‘phase transition’ occurs as the testing center spatial density increases. For any testing density above a certain threshold the epidemic is suppressed and dies out, while below it propagates and evolves naturally albeit still strongly depending on the testing center density. This analysis further allows to optimize the testing certain density so that the epidemic’s evolution does not inundate or exhaust critical health care resources, like ICU bed capacity.","PeriodicalId":229089,"journal":{"name":"2021 60th IEEE Conference on Decision and Control (CDC)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122123431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

A Distributionally Robust LQR for Systems with Multiple Uncertain Players
Pub Date: 2021-12-14 | DOI: 10.1109/CDC45484.2021.9682976
Ioannis Tzortzis, C. D. Charalambous, C. Hadjicostis
In this paper, we study the robust linear quadratic regulator (LQR) problem for a class of discrete-time dynamical systems composed of several uncertain players with unknown or ambiguous distribution information. A distinctive feature of the assumed model is that each player is prescribed a nominal probability distribution and categorized according to an uncertainty level of confidence. Our approach is based on minimax optimization. By following a dynamic programming approach, a closed-form expression of the robust control policy is derived. The effect of ambiguity on the performance of the LQR is studied via a sequential hierarchical game with one leader and several followers. The equilibrium solution is obtained through a maximizing, time-varying probability distribution characterizing each player’s optimal policy. The behavior of the proposed method is demonstrated through an application to a drop-shipping retail fulfillment model.
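The distributionally robust, multi-player game formulation is specific to the paper; the nominal building block it extends, a finite-horizon LQR solved by backward dynamic programming, is sketched below with purely illustrative matrices. The robust (minimax) layer over ambiguous distributions is not attempted here.

```python
import numpy as np

def finite_horizon_lqr(A, B, Q, R, Qf, N):
    """Backward dynamic-programming (Riccati) recursion for nominal
    finite-horizon LQR; returns gains K_0..K_{N-1} so that u_k = -K_k x_k."""
    P = Qf
    gains = []
    for _ in range(N):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1]

# Illustrative discrete-time double integrator.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
K = finite_horizon_lqr(A, B, np.eye(2), np.array([[0.1]]), np.eye(2), N=50)
print("first-stage gain K_0 =", K[0])
```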
{"title":"A Distributionally Robust LQR for Systems with Multiple Uncertain Players","authors":"Ioannis Tzortzis, C. D. Charalambous, C. Hadjicostis","doi":"10.1109/CDC45484.2021.9682976","DOIUrl":"https://doi.org/10.1109/CDC45484.2021.9682976","url":null,"abstract":"In this paper, we study the robust linear quadratic regulator (LQR) problem for a class of discrete-time dynamical systems composed of several uncertain players with unknown or ambiguous distribution information. A distinctive feature of the assumed model is that each player is prescribed by a nominal probability distribution and categorized according to an uncertainty level of confidence. Our approach is based on minimax optimization. By following a dynamic programming approach a closed-form expression of the robust control policy is derived. The effect of ambiguity on the performance of the LQR is studied via a sequential hierarchical game with one leader and several followers. The equilibrium solution is obtained through a maximizing, time-varying probability distribution characterizing each player’s optimal policy. The behavior of the proposed method is demonstrated through an application to a drop-shipping retail fulfillment model.","PeriodicalId":229089,"journal":{"name":"2021 60th IEEE Conference on Decision and Control (CDC)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125956731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Controllability of Sobolev-Type Linear Ensemble Systems
Pub Date: 2021-12-14 | DOI: 10.1109/CDC45484.2021.9683659
Wei Zhang, Lin Tie, Jr-Shin Li
Systems composed of large ensembles of isolated or interacting dynamic units are prevalent in nature and in engineered infrastructures. Linear ensemble systems are inarguably the simplest class of ensemble systems and have attracted intensive attention from control theorists and practitioners in recent years. Nevertheless, a comprehensive understanding of the dynamic properties of such systems remains elusive and requires considerable knowledge and techniques beyond the reach of modern control theory. In this paper, we explore classes of linear ensemble systems with system matrices that are not globally diagonalizable. In particular, we focus on analyzing their controllability properties in a Sobolev space setting and develop conditions under which uniform controllability of such ensemble systems is equivalent to that of their diagonalizable counterparts. This development significantly facilitates controllability analysis for linear ensemble systems by examining diagonalized linear systems.
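The paper's uniform-controllability conditions live in a Sobolev-space setting and go well beyond pointwise checks; as a hedged illustration only, the snippet below samples a non-diagonalizable parameterized family on a grid and verifies the classical Kalman rank condition for each sample, which is necessary for, but far from equivalent to, uniform ensemble controllability. The family and grid are made up.

```python
import numpy as np

def kalman_rank_ok(A, B, tol=1e-9):
    """Pointwise controllability via the Kalman rank condition."""
    n = A.shape[0]
    ctrb = np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(n)])
    return np.linalg.matrix_rank(ctrb, tol=tol) == n

# Non-diagonalizable parameterized family: a scaled Jordan block.
A_of = lambda beta: beta * np.array([[1.0, 1.0],
                                     [0.0, 1.0]])
B = np.array([[0.0],
              [1.0]])

betas = np.linspace(0.5, 1.5, 11)
print(all(kalman_rank_ok(A_of(b), B) for b in betas))   # pointwise check on a parameter grid
```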
{"title":"Controllability of Sobolev-Type Linear Ensemble Systems","authors":"Wei Zhang, Lin Tie, Jr-Shin Li","doi":"10.1109/CDC45484.2021.9683659","DOIUrl":"https://doi.org/10.1109/CDC45484.2021.9683659","url":null,"abstract":"Systems composed of large ensembles of isolated or interacted dynamic units are prevalent in nature and engineered infrastructures. Linear ensemble systems are inarguably the simplest class of ensemble systems and have attracted intensive attention to control theorists and practionars in the past years. Comprehensive understanding of dynamic properties of such systems yet remains far-fetched and requires considerable knowledge and techniques beyond the reach of modern control theory. In this paper, we explore the classes of linear ensemble systems with system matrices that are not globally diagonalizable. In particular, we focus on analyzing their controllability properties under a Sobolev space setting and develop conditions under which uniform controllability of such ensemble systems is equivalent to that of their diagonalizable counterparts. This development significantly facilitates controllability analysis for linear ensemble systems through examining diagonalized linear systems.","PeriodicalId":229089,"journal":{"name":"2021 60th IEEE Conference on Decision and Control (CDC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126194237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Linear Quadratic Tracking Control of Hidden Markov Jump Linear Systems Subject to Ambiguity
Pub Date: 2021-12-14 | DOI: 10.1109/CDC45484.2021.9683675
Ioannis Tzortzis, C. Hadjicostis, C. D. Charalambous
The linear quadratic tracking control problem is studied for a class of discrete-time uncertain Markov jump linear systems with time-varying conditional distributions. The controller is designed under the assumption that it has no access to the true states of the Markov chain, but instead relies on Markov chain state estimates. To deal with uncertainty, the transition probabilities of the Markov state estimates between the different operating modes of the system are assumed to belong to an ambiguity set around some nominal transition probabilities. The estimation problem is solved via the one-step forward Viterbi algorithm, while the stochastic control problem is solved via minimax optimization theory. An optimal control policy with desired robustness properties is designed, and a maximizing time-varying transition probability distribution is obtained. A numerical example is given to illustrate the applicability and effectiveness of the proposed approach.
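As a hedged, generic illustration of the estimation block, here is a single forward Viterbi step for a hidden Markov mode. The paper couples such a step to the LQ tracking controller and to ambiguous (worst-case) transition probabilities, which this sketch does not attempt; the transition matrix and observation likelihoods are made up.

```python
import numpy as np

def viterbi_step(delta_prev, log_T, log_lik):
    """One forward Viterbi step over the hidden modes: delta_prev holds the
    log-scores of the most likely paths ending in each mode, log_T is the
    log transition matrix, and log_lik the current observation log-likelihoods."""
    scores = delta_prev[:, None] + log_T        # scores[i, j]: best path ending in i, then i -> j
    backpointer = scores.argmax(axis=0)
    delta = scores.max(axis=0) + log_lik
    return delta, backpointer

# Two-mode illustration with made-up observation likelihoods.
log_T = np.log(np.array([[0.9, 0.1],
                         [0.2, 0.8]]))
delta = np.log(np.array([0.5, 0.5]))            # initial mode scores
for log_lik in (np.log([0.7, 0.3]), np.log([0.2, 0.8])):
    delta, back = viterbi_step(delta, log_T, log_lik)
    print("most likely current mode:", int(delta.argmax()))
```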
{"title":"Linear Quadratic Tracking Control of Hidden Markov Jump Linear Systems Subject to Ambiguity","authors":"Ioannis Tzortzis, C. Hadjicostis, C. D. Charalambous","doi":"10.1109/CDC45484.2021.9683675","DOIUrl":"https://doi.org/10.1109/CDC45484.2021.9683675","url":null,"abstract":"The linear quadratic tracking control problem is studied for a class of discrete-time uncertain Markov jump linear systems with time-varying conditional distributions. The controller is designed under the assumption that it has no access to the true states of the Markov chain, but rather it depends on the Markov chain state estimates. To deal with uncertainty, the transition probabilities of Markov state estimates between the different operating modes of the system are considered to belong in an ambiguity set of some nominal transition probabilities. The estimation problem is solved via the one-step forward Viterbi algorithm, while the stochastic control problem is solved via minimax optimization theory. An optimal control policy with some desired robustness properties is designed, and a maximizing time-varying transition probability distribution is obtained. A numerical example is given to illustrate the applicability and effectiveness of the proposed approach.","PeriodicalId":229089,"journal":{"name":"2021 60th IEEE Conference on Decision and Control (CDC)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124773159","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Distributed Consensus of Stochastic Multi-agent Systems with Prescribed Performance Constraints
Pub Date: 2021-12-14 | DOI: 10.1109/CDC45484.2021.9683249
Pushpak Jagtap, Dimos V. Dimarogonas
This paper focuses on the problem of distributed consensus control of multi-agent systems under two main practical concerns: (i) stochastic noise in the agent dynamics and (ii) predefined performance constraints over the evolutions of the multi-agent system. In particular, we consider that each agent is driven by a stochastic differential equation with state-dependent noise, which makes the considered problem more challenging compared to the case of non-stochastic agents. The work provides sufficient conditions under which the proposed time-varying distributed control laws ensure consensus in expectation and almost-sure consensus of stochastic multi-agent systems while satisfying prescribed performance constraints over the evolutions of the systems in the sense of the qth moment. Finally, we demonstrate the effectiveness of the proposed results with a numerical example.
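The paper's time-varying control law enforces prescribed-performance bounds in the qth moment; the snippet below is only a plain Euler-Maruyama simulation of Laplacian consensus with a state-dependent noise that vanishes at consensus, to illustrate the kind of agent dynamics considered. Graph, gains, and noise model are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_consensus(L, x0, sigma=0.1, dt=1e-3, steps=20000):
    """Euler-Maruyama simulation of dx = -L x dt + sigma * (x - mean(x)) dW,
    a simple state-dependent-noise consensus model (illustrative only)."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        dw = rng.normal(scale=np.sqrt(dt), size=x.shape)
        x = x + (-L @ x) * dt + sigma * (x - x.mean()) * dw
    return x

# Path graph on four agents.
L = np.array([[ 1, -1,  0,  0],
              [-1,  2, -1,  0],
              [ 0, -1,  2, -1],
              [ 0,  0, -1,  1]], dtype=float)
x_final = simulate_consensus(L, [1.0, -2.0, 0.5, 3.0])
print("spread at final time:", x_final.max() - x_final.min())
```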
{"title":"Distributed Consensus of Stochastic Multi-agent Systems with Prescribed Performance Constraints","authors":"Pushpak Jagtap, Dimos V. Dimarogonas","doi":"10.1109/CDC45484.2021.9683249","DOIUrl":"https://doi.org/10.1109/CDC45484.2021.9683249","url":null,"abstract":"This paper focuses on the problem of distributed consensus control of multi-agent systems while considering two main practical concerns (i) stochastic noise in the agent dynamics and (ii) predefined performance constraints over evolutions of multi-agent systems. In particular, we consider that each agent is driven by a stochastic differential equation with state-dependent noise which makes the considered problem more challenging compare to non-stochastic agents. The work provides sufficient conditions under which the proposed time-varying distributed control laws ensure consensus in expectation and almost sure consensus of stochastic multi-agent systems while satisfying prescribed performance constraints over evolutions of the systems in the sense of the qth moment. Finally, we demonstrate the effectiveness of the proposed results with a numerical example.","PeriodicalId":229089,"journal":{"name":"2021 60th IEEE Conference on Decision and Control (CDC)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128419082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Spatio-temporal constrained zonotopes for validation of optimal control problems
Pub Date: 2021-12-14 | DOI: 10.1109/CDC45484.2021.9683301
Etienne Bertin, B. Hérissé, Julien Alexandre Dit Sandretto, Alexandre Chapoutot
A controlled system subject to dynamics with unknown but bounded parameters is considered. The control is defined as the solution of an optimal control problem, which induces hybrid dynamics. A method to enclose all optimal trajectories of this system is proposed. Using interval- and zonotope-based validated simulation together with Pontryagin’s Maximum Principle, which characterizes optimal trajectories, a conservative enclosure is constructed. The usual validated simulation framework is modified so that possible trajectories are enclosed with spatio-temporal zonotopes that simplify simulation through events. Optimality conditions are then propagated backward in time and added as constraints on the previously computed enclosure. The resulting constrained zonotopes form a thin enclosure of all optimal trajectories that is less susceptible to accumulation of error. The algorithm is applied to Goddard’s problem, an aerospace problem with a bang-bang control.
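Constrained zonotopes add linear constraints on the generator coefficients, and the paper's spatio-temporal and backward-propagation machinery goes well beyond what is shown here; the sketch below only illustrates the basic zonotope operations (linear map, Minkowski sum, interval hull) that such set-based enclosures are built from, with made-up matrices.

```python
import numpy as np

class Zonotope:
    """Z = {c + G xi : ||xi||_inf <= 1}, with center c and generator matrix G."""
    def __init__(self, c, G):
        self.c = np.asarray(c, dtype=float)
        self.G = np.asarray(G, dtype=float)

    def linear_map(self, A):
        return Zonotope(A @ self.c, A @ self.G)

    def minkowski_sum(self, other):
        return Zonotope(self.c + other.c, np.hstack([self.G, other.G]))

    def interval_hull(self):
        r = np.abs(self.G).sum(axis=1)
        return self.c - r, self.c + r

# Propagate an uncertain initial set through one step of x+ = A x + w.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
X0 = Zonotope([0.0, 1.0], 0.1 * np.eye(2))
W = Zonotope([0.0, 0.0], 0.02 * np.eye(2))
X1 = X0.linear_map(A).minkowski_sum(W)
print(X1.interval_hull())
```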
{"title":"Spatio-temporal constrained zonotopes for validation of optimal control problems *","authors":"Etienne Bertin, B. Hérissé, Julien Alexandre Dit Sandretto, Alexandre Chapoutot","doi":"10.1109/CDC45484.2021.9683301","DOIUrl":"https://doi.org/10.1109/CDC45484.2021.9683301","url":null,"abstract":"A controlled system subject to dynamics with unknown but bounded parameters is considered. The control is defined as the solution of an optimal control problem, which induces hybrid dynamics. A method to enclose all optimal trajectories of this system is proposed. Using interval and zonotope based validated simulation and Pontryagin’s Maximum Principle, a characterization of optimal trajectories, a conservative enclosure is constructed. The usual validated simulation framework is modified so that possible trajectories are enclosed with spatio-temporal zonotopes that simplify simulation through events. Then optimality conditions are propagated backward in time and added as constraints on the previously computed enclosure. The obtained constrained zonotopes form a thin enclosure of all optimal trajectories that is less susceptible to accumulation of error. This algorithm is applied on Goddard’s problem, an aerospace problem with a bang-bang control.","PeriodicalId":229089,"journal":{"name":"2021 60th IEEE Conference on Decision and Control (CDC)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128283043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Voronoi Progressive Widening: Efficient Online Solvers for Continuous State, Action, and Observation POMDPs
Pub Date: 2021-12-14 | DOI: 10.1109/CDC45484.2021.9683490
M. H. Lim, C. Tomlin, Zachary Sunberg
This paper introduces Voronoi Progressive Widening (VPW), a generalization of Voronoi optimistic optimization (VOO) and action progressive widening to partially observable Markov decision processes (POMDPs). Tree search algorithms can use VPW to effectively handle continuous or hybrid action spaces by efficiently balancing local and global action searching. This paper proposes two VPW-based algorithms and analyzes them from theoretical and simulation perspectives. Voronoi Optimistic Weighted Sparse Sampling (VOWSS) is a theoretical tool that justifies VPW-based online solvers, and it is the first algorithm with global convergence guarantees for continuous state, action, and observation POMDPs. Voronoi Optimistic Monte Carlo Planning with Observation Weighting (VOMCPOW) is a versatile and efficient algorithm that consistently outperforms state-of-the-art POMDP algorithms in several simulation experiments.
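The full VOWSS/VOMCPOW algorithms combine this with tree search, observation weighting, and exploration bonuses, none of which appear below; this is only a hedged sketch of an action progressive-widening rule, with a crude local/global sampling split standing in for Voronoi-biased sampling. Parameter names and values are illustrative, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def select_action(actions, values, n_visits, lo, hi, k=2.0, alpha=0.5, p_local=0.5):
    """Progressive-widening sketch: a new continuous action is added only while
    |A| <= k * N^alpha; new actions are drawn either globally or near the current
    best action (a rough stand-in for Voronoi-biased sampling)."""
    if len(actions) <= k * n_visits ** alpha:
        if actions and rng.random() < p_local:
            best = actions[int(np.argmax(values))]
            a = float(np.clip(best + 0.1 * (hi - lo) * rng.normal(), lo, hi))
        else:
            a = float(rng.uniform(lo, hi))
        actions.append(a)
        values.append(0.0)
        return len(actions) - 1
    return int(np.argmax(values))              # otherwise reuse an existing action

actions, values = [], []
for n in range(1, 40):
    idx = select_action(actions, values, n, lo=-1.0, hi=1.0)
    values[idx] = -(actions[idx] - 0.3) ** 2   # deterministic toy reward, maximized at a = 0.3
print(len(actions), "actions tried; best action:", actions[int(np.argmax(values))])
```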
{"title":"Voronoi Progressive Widening: Efficient Online Solvers for Continuous State, Action, and Observation POMDPs","authors":"M. H. Lim, C. Tomlin, Zachary Sunberg","doi":"10.1109/CDC45484.2021.9683490","DOIUrl":"https://doi.org/10.1109/CDC45484.2021.9683490","url":null,"abstract":"This paper introduces Voronoi Progressive Widening (VPW), a generalization of Voronoi optimistic optimization (VOO) and action progressive widening to partially observable Markov decision processes (POMDPs). Tree search algorithms can use VPW to effectively handle continuous or hybrid action spaces by efficiently balancing local and global action searching. This paper proposes two VPW-based algorithms and analyzes them from theoretical and simulation perspectives. Voronoi Optimistic Weighted Sparse Sampling (VOWSS) is a theoretical tool that justifies VPW-based online solvers, and it is the first algorithm with global convergence guarantees for continuous state, action, and observation POMDPs. Voronoi Optimistic Monte Carlo Planning with Observation Weighting (VOMCPOW) is a versatile and efficient algorithm that consistently outperforms state-of-the-art POMDP algorithms in several simulation experiments.","PeriodicalId":229089,"journal":{"name":"2021 60th IEEE Conference on Decision and Control (CDC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128677081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Identification-based Adaptive Control for Systems with Time-varying Parameters
Pub Date: 2021-12-14 | DOI: 10.1109/CDC45484.2021.9683217
Kaiwen Chen, A. Astolfi
This paper proposes an identification-based adaptive control scheme for nonlinear systems with time-varying parameters, designed on the basis of the so-called congelation of variables method. First, a scalar example is discussed to demonstrate the design methodology, which relies on re-arranging the identifier subsystems from a cascaded topology into a cyclic topology. A small-gain-like control synthesis exploiting the cyclic topology is then presented to replace the classical control synthesis based on the swapping lemma, which exploits the cascaded topology. A state feedback design for a class of lower-triangular nonlinear systems is then presented, combining the same design methodology with backstepping techniques. Boundedness of all closed-loop signals and convergence of the system state are proved. Finally, simulation results are presented showing that the proposed controller achieves better performance than the classical design.
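The congelation-of-variables design and the cyclic identifier topology are the paper's contribution and are not reproduced here; as a hedged, generic illustration of the kind of identifier block such schemes rely on, here is a recursive least-squares update with a forgetting factor tracking a slowly varying parameter. The regression model and numbers are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def rls_step(theta, P, phi, y, lam=0.98):
    """One recursive-least-squares step with forgetting factor lam, for a
    regression y = phi' theta + noise with a slowly varying theta."""
    Pphi = P @ phi
    k = Pphi / (lam + phi @ Pphi)
    theta = theta + k * (y - phi @ theta)
    P = (P - np.outer(k, Pphi)) / lam
    return theta, P

theta_hat, P = np.zeros(2), 100.0 * np.eye(2)
for t in range(500):
    theta_true = np.array([1.0 + 0.3 * np.sin(0.01 * t), -0.5])   # time-varying parameter
    phi = rng.normal(size=2)                                      # regressor
    y = phi @ theta_true + 0.01 * rng.normal()
    theta_hat, P = rls_step(theta_hat, P, phi, y)
print("estimate:", theta_hat, " true:", theta_true)
```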
{"title":"Identification-based Adaptive Control for Systems with Time-varying Parameters","authors":"Kaiwen Chen, A. Astolfi","doi":"10.1109/CDC45484.2021.9683217","DOIUrl":"https://doi.org/10.1109/CDC45484.2021.9683217","url":null,"abstract":"This paper proposes an identification-based adaptive control scheme for nonlinear systems with time-varying parameters designed on the basis of the so-called congelation of variables method. First a scalar example to demonstrate the design methodology, which relies on re-arranging the identifier subsystems from a cascaded topology to a cyclic topology, is discussed. A small-gain-like control synthesis exploiting the cyclic topology is then presented to replace the classical control synthesis based on the swapping lemma, which exploits the cascaded topology. Then a state feedback design for a class of lower triangular nonlinear systems is presented: this combines the same design methodology with the backstepping techniques. Boundedness of all closed-loop signals and convergence of the system state are proved. Finally, simulation results showing that the proposed controller achieves superior performance than the classical design are presented.","PeriodicalId":229089,"journal":{"name":"2021 60th IEEE Conference on Decision and Control (CDC)","volume":"156 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128733623","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Data-Driven Control of Nonlinear Systems: Learning Koopman Operators for Policy Gradient
Pub Date: 2021-12-14 | DOI: 10.1109/CDC45484.2021.9683220
Francesco Zanini, A. Chiuso
Data-driven control of nonlinear dynamical systems is a largely open problem. In this paper, building upon the theory of Koopman operators and exploiting ideas from policy gradient methods in reinforcement learning, a novel approach for data-driven optimal control of unknown nonlinear dynamical systems is introduced.
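The paper's contribution is the combination with policy-gradient control, which is not attempted here; the snippet below only sketches a data-driven Koopman-operator estimate via an EDMD-style least-squares fit on a hand-picked dictionary of observables, for an illustrative scalar map that is not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def lift(x):
    """Illustrative dictionary of observables: [x, x^2, 1]."""
    return np.array([x, x ** 2, 1.0])

# Collect snapshot pairs from an "unknown" scalar map (here a simple polynomial).
f = lambda x: 0.9 * x - 0.1 * x ** 2
X = rng.uniform(-1.0, 1.0, 200)
Psi_X = np.stack([lift(x) for x in X])          # shape (200, 3)
Psi_Y = np.stack([lift(f(x)) for x in X])

# Least-squares Koopman matrix on the lifted space: Psi_Y ~= Psi_X @ K.
K, *_ = np.linalg.lstsq(Psi_X, Psi_Y, rcond=None)

# One-step prediction through the lifted linear model.
x0 = 0.5
print("predicted:", (lift(x0) @ K)[0], " true:", f(x0))
```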
{"title":"Data-Driven Control of Nonlinear Systems: Learning Koopman Operators for Policy Gradient","authors":"Francesco Zanini, A. Chiuso","doi":"10.1109/CDC45484.2021.9683220","DOIUrl":"https://doi.org/10.1109/CDC45484.2021.9683220","url":null,"abstract":"Data-driven control of nonlinear dynamical systems is a largely open problem. In this paper, building upon the theory of Koopman operators and exploiting ideas from policy gradient methods in reinforcement learning, a novel approach for data-driven optimal control of unknown nonlinear dynamical systems is introduced.","PeriodicalId":229089,"journal":{"name":"2021 60th IEEE Conference on Decision and Control (CDC)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129623430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}