Pub Date: 2020-07-01, DOI: 10.23919/ACC45564.2020.9147377
Liren Yang, Denise M. Rizzo, M. Castanier, N. Ozay
In this paper we propose a value-iteration-based algorithm to compute controlled invariant sets in cases where the ranges of certain parameters in the system model are not known a priori. By defining the value function in a way that is related to parameter ranges, the proposed computation allows us to analyze parameter sensitivity for the controlled invariant set. The convergence properties of the algorithm are analyzed for certain classes of systems. Finally, a vehicle team power management case study is used to illustrate the efficacy and scalability of the proposed algorithm.
Title: Parameter Sensitivity Analysis of Controlled Invariant Sets via Value Iteration. Venue: 2020 American Control Conference (ACC).
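As a rough illustration of how a value function can encode parameter ranges, the sketch below runs a max-min value iteration on a toy integer system x+ = 2x + u + w with |w| <= d. The dynamics, the grid, the candidate bounds, and every name in the code are assumptions made for illustration; this is not the authors' algorithm or case study. Here V(x) is the largest candidate disturbance bound d for which x lies in the maximal controlled invariant set, so each superlevel set of V is the invariant set for one parameter range.

```python
# Toy sketch (all values hypothetical): V(x) = largest disturbance bound d
# such that x belongs to the maximal controlled invariant set for |w| <= d.
STATES = range(-5, 6)                       # integer state grid
SAFE = {x for x in STATES if abs(x) <= 3}   # safe set to remain in
INPUTS = (-2, -1, 0, 1, 2)                  # admissible controls
BOUNDS = (0, 1, 2)                          # candidate disturbance bounds d
NOT_INVARIANT = -1                          # value for states that cannot stay safe


def step(x, u, w):
    """Assumed toy dynamics: x+ = 2x + u + w."""
    return 2 * x + u + w


def backup(V, x):
    """Largest d for which some input keeps every successor at value >= d."""
    if x not in SAFE:
        return NOT_INVARIANT
    best = NOT_INVARIANT
    for d in BOUNDS:
        for u in INPUTS:
            successors = (step(x, u, w) for w in range(-d, d + 1))
            if all(V.get(s, NOT_INVARIANT) >= d for s in successors):
                best = max(best, d)
                break
    return best


def value_iteration():
    """Iterate the backup from the safe-set indicator until a fixed point."""
    V = {x: (max(BOUNDS) if x in SAFE else NOT_INVARIANT) for x in STATES}
    while True:
        V_next = {x: backup(V, x) for x in STATES}
        if V_next == V:
            return V
        V = V_next


V = value_iteration()
for x in STATES:
    print(x, V[x])
```

In this toy problem the states with |x| <= 1 tolerate d = 1, the states with |x| = 2 are invariant only for d = 0, and |x| = 3 cannot be kept safe at all, which is the kind of sensitivity information a parameter-dependent value function exposes.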
Pub Date: 2020-07-01, DOI: 10.23919/ACC45564.2020.9147912
Michael B. Kane
Energy from current floating offshore wind turbines (FOWTs) is not economical due to inefficiencies and maintenance costs, leaving significant renewable energy resources untapped. Co-designing lighter, less expensive FOWTs with individual pitch control (IPC) of each blade could increase efficiencies, decrease costs, and make offshore wind economically viable. However, the nonlinear dynamics and the breadth of nonstationary wind and wave loading present challenges to designing effective and robust IPC for each desired location and situation. This manuscript presents the development, design, and simulation of machine learning control (MLC) for IPC of FOWTs. MLC has been shown to be effective for many complex nonlinear fluid-structure interaction problems. This project investigates scaling up these component-level control problems to the system-level control of the NREL 5MW OC3 FOWT. A massively parallel genetic program (GP) is developed using MATLAB Simulink and OpenFAST that efficiently evaluates new individuals and selectively tests the fitness of each generation in the most challenging design load case. The proposed controller was compared to a baseline PID controller using a cost function that captured the value of annual energy production with maintenance costs correlated to ultimate loads and harmonic fatigue. The proposed controller achieved 67% of the cost of the baseline PID controller, resulting in 4th place in the ARPA-E ATLAS Offshore competition for IPC of the OC3 FOWT for the given design load cases.
Title: Machine Learning Control for Floating Offshore Wind Turbine Individual Blade Pitch Control. Venue: 2020 American Control Conference (ACC).
Pub Date: 2020-07-01, DOI: 10.23919/acc45564.2020.9147946
Paul Stanfel, K. Johnson, C. Bay, J. King
In this paper, we present a reinforcement-learning-based distributed approach to wind farm energy capture maximization using yaw-based wake steering. In order to maximize the power output of a wind farm, individual turbines can use yaw misalignment to deflect their wakes away from downstream turbines. Although using model-based methods to achieve yaw misalignment is one option, a model-free method might be better suited to incorporate changing conditions and uncertainty. We propose an algorithm that adapts concepts of temporal difference reinforcement learning to a distributed multiagent environment, empowering individual turbines to optimize overall wind farm output and react to unforeseen disturbances.
Title: A Distributed Reinforcement Learning Yaw Control Approach for Wind Farm Energy Capture Maximization. Venue: 2020 American Control Conference (ACC).
Pub Date: 2020-07-01, DOI: 10.23919/ACC45564.2020.9147628
Demetris Coleman, Xiaobo Tan
Autonomous underwater gliders have become valuable, energy-efficient tools for a myriad of applications including ocean exploration, fish tracking, and environmental sampling. Many applications, such as exploring a large area of underwater ruins or navigating through a coral reef, would benefit from fine trajectory tracking. However, trajectory tracking control of underwater gliders is particularly challenging due to their under-actuated, nonlinear dynamics. Taking gliding robotic fish as an example, in this work we propose a backstepping-based controller for the gliding motion to track a desired reference for the pitch angle and position in 3D space. In particular, the challenge of under-actuation is addressed by exploiting the coupled dynamics and introducing a new modified error term that combines pitch and horizontal position tracking errors. The effectiveness of the proposed control scheme is demonstrated via simulation, and its advantages are shown via comparison with a PID controller.
Title: Backstepping Control of Gliding Robotic Fish for Trajectory Tracking in 3D Space. Venue: 2020 American Control Conference (ACC).
Pub Date: 2020-07-01, DOI: 10.23919/ACC45564.2020.9147943
Zhenyong Zhang, Ruilong Deng, David K. Y. Yau, Peng Cheng, Jiming Chen
False data injection (FDI) attacks are a class of threatening cyber attacks against power systems. It has been widely recognized that, assuming the attacker can obtain complete or incomplete information about the system topology and line parameters, carefully synthesized FDI attacks can evade bad data detection in state estimation. However, line parameters cannot be obtained or inferred easily in practice, because they may be changed or disturbed. In this paper, we find that it is possible for the attacker to execute stealthy FDI attacks against DC state estimation with zero knowledge of line parameters. We term them zero-parameter-information FDI attacks. Only the topology information about the cut line is required for designing such an attack. We prove that the attacker can arbitrarily modify the state variable of a one-degree bus, which is connected to the outside only by a single cut line, and can modify the state variables of all buses, with the same arbitrary bias, in a one-degree super-bus, which is a group of buses that is connected to the outside only by a single cut line. Moreover, we extend these results to a bus or a super-bus which is connected to the outside only by multiple cut lines. Finally, we illustrate and validate our findings using some test power systems.
Title: Zero-Parameter-Information FDI Attacks Against Power System State Estimation. Venue: 2020 American Control Conference (ACC).
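The one-degree-bus claim can be checked numerically on a small example. The sketch below is an illustration under stated assumptions rather than the authors' construction: a hypothetical 3-bus chain (bus 1 as reference, bus 3 hanging off bus 2 by a single cut line), made-up susceptances and noise, and an attacker who rescales the measured cut-line flow by a factor `alpha` and applies the same offset, with the appropriate sign, to the injection measurements at the two end buses. The attacker uses only the measurements and the topology; `b23` appears in the code only to simulate the grid and to verify the result.

```python
import math


def estimate(H, z):
    """Least-squares estimate for a two-state model via the normal equations."""
    g00 = sum(r[0] * r[0] for r in H)
    g01 = sum(r[0] * r[1] for r in H)
    g11 = sum(r[1] * r[1] for r in H)
    h0 = sum(r[0] * zi for r, zi in zip(H, z))
    h1 = sum(r[1] * zi for r, zi in zip(H, z))
    det = g00 * g11 - g01 * g01
    return [(g11 * h0 - g01 * h1) / det, (g00 * h1 - g01 * h0) / det]


def residual_norm(H, z, x):
    """Euclidean norm of z - Hx, the quantity a bad data detector thresholds."""
    return math.sqrt(sum((zi - (r[0] * x[0] + r[1] * x[1])) ** 2
                         for r, zi in zip(H, z)))


# Hypothetical chain: bus 1 (reference, angle 0) - bus 2 - bus 3.
b12, b23 = 10.0, 5.0            # susceptances, unknown to the attacker
H = [
    [-b12, 0.0],                # flow 1->2
    [b23, -b23],                # flow 2->3 (the cut line)
    [b12 + b23, -b23],          # injection at bus 2
    [-b23, b23],                # injection at bus 3
]
theta_true = [-0.05, -0.12]     # angles of buses 2 and 3 (assumed values)
noise = [0.003, -0.002, 0.001, 0.002]
z = [r[0] * theta_true[0] + r[1] * theta_true[1] + e for r, e in zip(H, noise)]

# Attack built from the measured cut-line flow and the topology only.
alpha = 0.5
cut_flow = z[1]
attack = [0.0, alpha * cut_flow, alpha * cut_flow, -alpha * cut_flow]
z_attacked = [zi + ai for zi, ai in zip(z, attack)]

x_clean = estimate(H, z)
x_att = estimate(H, z_attacked)
r_clean = residual_norm(H, z, x_clean)
r_att = residual_norm(H, z_attacked, x_att)
print("estimate shift:", [a - c for a, c in zip(x_att, x_clean)])
print("residual norms:", r_clean, r_att)
```

Because the attack vector is a multiple of the column of H associated with bus 3, the residual is unchanged while only the estimate of the bus 3 angle moves, which is why such an attack would pass a residual-based detector in this toy setting.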
Pub Date: 2020-07-01, DOI: 10.23919/ACC45564.2020.9147743
Pedro Casau, R. Cunha, C. Silvestre
In this paper, we present the design of trajectory tracking controllers for multirotor aerial vehicles that have the ability to operate both with and without thrust reversal. We follow a hierarchical control approach, in the sense that we start by designing a common saturated controller for the position subsystem and use it to provide a reference to an attitude tracking controller. The controllers for each operating mode are able to achieve global asymptotic stability as well as semiglobal exponential stabilization of the zero tracking error set. We demonstrate the capabilities of the proposed controllers in a simulation that performs a throw-and-catch maneuver.
Title: Improved Maneuverability for Multirotor Aerial Vehicles using Globally Stabilizing Feedbacks. Venue: 2020 American Control Conference (ACC).
Pub Date: 2020-07-01, DOI: 10.23919/ACC45564.2020.9147965
P. Shen, R. Caverly
This paper presents a dynamic model of a two-dimensional tower crane, including a Rayleigh-Ritz discretization of the crane’s flexible hoist cable, and proposes a passivity-based control approach for payload trajectory tracking using the µ-tip rate. It is assumed that the crane’s payload is massive, which allows for a decoupling of the rigid and elastic system dynamics. It is shown that the crane features a passive input-output mapping from modified force and torque inputs to a modified output formed using the position and velocity tracking errors of the payload. An input strictly passive derivative controller is proposed, which results in the velocity tracking error and the µ-tip position error of the payload converging to zero. A numerical example is presented that demonstrates the controller’s performance when the payload is to track an agile trajectory.
Title: Noncolocated Passivity-Based Control of a 2 DOF Tower Crane with a Flexible Hoist Cable. Venue: 2020 American Control Conference (ACC).
Pub Date: 2020-07-01, DOI: 10.23919/ACC45564.2020.9147668
Håkan Runvik, A. Medvedev, M. Kjellsson
A novel modeling approach capturing the multiple peak phenomenon in oral levodopa administration is proposed. Multiple peaks in the blood plasma concentration of the drug are attributed to the effects caused by gastric emptying. The developed model describes the instances of interrupted gastric emptying by an impulsive feedback of the dopamine concentration in the brain acting on the pyloric sphincter. A combination of the continuous levodopa clearing dynamics and the impulsive feedback results in a hybrid model, whose solutions are positive and bounded. The stability properties of the model are studied by means of a Poincaré map describing the propagation of the continuous model states through the firings of the impulsive feedback. Model feasibility is illustrated on data sets obtained in clinical experiments.
Title: Impulsive Feedback Modeling of Levodopa Pharmacokinetics Subject to Intermittently Interrupted Gastric Emptying. Venue: 2020 American Control Conference (ACC).
A control strategy for a leader-follower system without inter-agent communication is developed for cooperative transportation in this work. Two cascaded unscented Kalman filters (UKFs) are developed to estimate the external force of the leader so as to eliminate the need for force sensors. Compared to most existing results, the performance of the developed force estimators is invariant to lighting conditions, since the developed UKFs do not require measurements from vision systems. To enhance the robustness of the control scheme, a switching controller along with a triggering condition is developed for the follower, so that the impact of disturbances on the performance of the closed-loop system can be minimized. Experiments are conducted to evaluate the control performance. Additionally, interesting phenomena are observed from the experiments and discussed, which can facilitate the improvement of next-generation cable-based transportation systems.
Title: Cooperative Transportation of Drones without Inter-Agent Communication. Authors: Pin-Xian Wu, Hsin-Ai Hung, Cheng-Cheng Yang, Teng-Hu Cheng. Pub Date: 2020-07-01, DOI: 10.23919/ACC45564.2020.9147355. Venue: 2020 American Control Conference (ACC).
Pub Date: 2020-07-01, DOI: 10.23919/ACC45564.2020.9147706
Derek Machalek, Titus Quah, Kody M. Powell
Reinforcement learning (RL) algorithms are a set of goal-oriented machine learning algorithms that can perform control and optimization in a system. Most RL algorithms do not require any information about the underlying dynamics of the system; they only require input and output information. RL algorithms can therefore be applied to a wide range of systems. This paper explores the use of a custom environment to optimize a problem pertinent to process engineers. In this study the custom environment is a continuously stirred tank reactor (CSTR). The purpose of using a custom environment is to illustrate that any number of systems can readily become RL environments. Three RL algorithms are investigated: deep deterministic policy gradient (DDPG), twin-delayed DDPG (TD3), and proximal policy optimization. They are evaluated based on how they converge to a stable solution and how well they dynamically optimize the economics of the CSTR. All three algorithms perform 98% as well as a first-principles model coupled with a nonlinear solver, but only TD3 demonstrates convergence to a stable solution. While itself limited in scope, this paper seeks to further open the door to a coupling between powerful RL algorithms and process systems engineering.
Title: Dynamic Economic Optimization of a Continuously Stirred Tank Reactor Using Reinforcement Learning. Venue: 2020 American Control Conference (ACC).
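To make the "any system can become an RL environment" point concrete, here is a minimal sketch of a custom environment with a Gym-style `reset`/`step` interface. Everything in it is an assumption for illustration: the class name `CSTREnv`, the isothermal first-order reaction A -> B, the prices, and the explicit Euler integration are not taken from the paper, and no RL library is used. An agent would only ever see the observation and the reward, never the model equations.

```python
class CSTREnv:
    """Hypothetical isothermal CSTR (A -> B) with a Gym-style reset/step API."""

    def __init__(self, volume=1.0, k=1.0, ca_in=1.0, dt=0.1, horizon=200,
                 product_price=5.0, feed_cost=1.0):
        self.volume = volume                # reactor volume
        self.k = k                          # first-order rate constant
        self.ca_in = ca_in                  # feed concentration of A
        self.dt = dt                        # integration step
        self.horizon = horizon              # steps per episode
        self.product_price = product_price  # value per unit of B produced
        self.feed_cost = feed_cost          # cost per unit of A fed
        self.ca = ca_in
        self.t = 0

    def reset(self):
        """Start each episode with the reactor filled with fresh feed."""
        self.ca = self.ca_in
        self.t = 0
        return (self.ca,)

    def step(self, action):
        """Apply a feed flow rate and return (observation, reward, done, info)."""
        q = min(max(action, 0.0), 2.0)      # clip to an assumed valid range
        # Explicit Euler step of dCa/dt = (q/V) * (Ca_in - Ca) - k * Ca
        dca = (q / self.volume) * (self.ca_in - self.ca) - self.k * self.ca
        self.ca += self.dt * dca
        self.t += 1
        # With no B in the feed and the reactor started at Ca_in, Ca + Cb = Ca_in.
        cb = self.ca_in - self.ca
        # Economic reward: value of B leaving the reactor minus the cost of feed.
        reward = self.dt * q * (self.product_price * cb
                                - self.feed_cost * self.ca_in)
        done = self.t >= self.horizon
        return (self.ca,), reward, done, {}


# Roll out a constant-flow policy; an RL agent would choose the action instead.
env = CSTREnv()
obs = env.reset()
total_reward = 0.0
done = False
while not done:
    obs, reward, done, info = env.step(0.5)
    total_reward += reward
print("final Ca:", obs[0], "episode return:", total_reward)
```

Under the constant flow of 0.5 the concentration settles at the analytical steady state (q/V) * Ca_in / (q/V + k), which gives a simple sanity check before handing the environment to a learning algorithm.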