Distributed Nonlinear Model Predictive Control and Reinforcement Learning

Ifrah Saeed, T. Alpcan, S. Erfani, M. Yilmaz
2019 Australian & New Zealand Control Conference (ANZCC), November 2019.
DOI: 10.1109/ANZCC47194.2019.8945719
Citations: 3

Abstract

Coordinating two or more dynamic systems, such as autonomous vehicles or satellites, in a distributed manner poses an important research challenge. Multiple approaches to this problem have been proposed, including Nonlinear Model Predictive Control (NMPC) and its model-free counterparts in the reinforcement learning (RL) literature, such as Deep Q-Network (DQN). This initial study compares and contrasts the optimal control technique NMPC, where the model is known, with the popular model-free RL method DQN. Simple distributed variants of both, applied to the specific problem of balancing and synchronising two highly unstable cart-pole systems, are investigated numerically. We found that both NMPC and a trained DQN perform optimally under an ideal model and small communication delays. While NMPC performs sub-optimally under a model-mismatch scenario, DQN performance naturally does not suffer from this. Distributed DQN requires a large amount of real-world experience to train, but once trained, it does not have to spend time searching for the optimal action at every time step as NMPC does. This illustrative comparison lays a foundation for hybrid approaches, which can be applied to complex multi-agent scenarios.
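The core contrast drawn in the abstract — NMPC re-solving an optimisation at every time step versus a trained DQN acting with a single forward pass — can be illustrated with a minimal sketch. This is not the paper's implementation: the cart-pole dynamics, cost weights, and horizon below are standard textbook choices assumed for illustration, and the receding-horizon search is a brute-force enumeration over the two discrete forces.

```python
import itertools
import math

# Standard cart-pole parameters; simple Euler integration.
G, M_CART, M_POLE, L, F_MAG, DT = 9.8, 1.0, 0.1, 0.5, 10.0, 0.02
M_TOTAL = M_CART + M_POLE

def step(state, force):
    """One Euler step of the cart-pole; state = (x, x_dot, theta, theta_dot)."""
    x, x_dot, th, th_dot = state
    cos_th, sin_th = math.cos(th), math.sin(th)
    temp = (force + M_POLE * L * th_dot ** 2 * sin_th) / M_TOTAL
    th_acc = (G * sin_th - cos_th * temp) / (
        L * (4.0 / 3.0 - M_POLE * cos_th ** 2 / M_TOTAL))
    x_acc = temp - M_POLE * L * th_acc * cos_th / M_TOTAL
    return (x + DT * x_dot, x_dot + DT * x_acc,
            th + DT * th_dot, th_dot + DT * th_acc)

def nmpc_action(state, horizon=6):
    """Receding-horizon control with discrete forces {-F, +F}: enumerate all
    2**horizon action sequences, roll the model forward, and apply only the
    first action of the cheapest sequence (cost penalises pole angle)."""
    best_cost, best_first = float("inf"), F_MAG
    for seq in itertools.product((-F_MAG, F_MAG), repeat=horizon):
        s, cost = state, 0.0
        for force in seq:
            s = step(s, force)
            cost += s[2] ** 2 + 0.01 * s[3] ** 2  # angle + rate penalty
        if cost < best_cost:
            best_cost, best_first = cost, seq[0]
    # A trained DQN would replace this per-step search with a single
    # forward pass: action = argmax_a Q(state, a).
    return best_first

state = (0.0, 0.0, 0.05, 0.0)  # small initial pole tilt
for _ in range(100):
    state = step(state, nmpc_action(state))
print(f"final pole angle: {state[2]:.4f} rad")
```

The sketch also makes the abstract's model-mismatch point concrete: `nmpc_action` plans with the same `step` function that generates the "real" trajectory, so it behaves optimally; if the planning model's parameters diverged from the plant's, NMPC would degrade, whereas a policy trained on real experience would not depend on the model at all.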