Distributed Nonlinear Model Predictive Control and Reinforcement Learning

Ifrah Saeed, T. Alpcan, S. Erfani, M. Yilmaz
2019 Australian & New Zealand Control Conference (ANZCC), November 2019.
DOI: 10.1109/ANZCC47194.2019.8945719
Citations: 3

Abstract

Coordinating two or more dynamic systems, such as autonomous vehicles or satellites, in a distributed manner poses an important research challenge. Multiple approaches to this problem have been proposed, including Nonlinear Model Predictive Control (NMPC) and its model-free counterparts in the reinforcement learning (RL) literature, such as Deep Q-Network (DQN). This initial study compares and contrasts the optimal control technique NMPC, where the model is known, with the popular model-free RL method DQN. Simple distributed variants of both, applied to the specific problem of balancing and synchronising two highly unstable cart-pole systems, are investigated numerically. We found that both NMPC and a trained DQN perform optimally under an ideal model and small communication delays. While NMPC performs sub-optimally under a model-mismatch scenario, DQN performance naturally does not suffer from this. Distributed DQN requires a large amount of real-world experience to train, but once trained, it does not have to spend time searching for the optimal action at every time step as NMPC does. This illustrative comparison lays a foundation for hybrid approaches, which can be applied to complex multi-agent scenarios.
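The core contrast drawn in the abstract — NMPC re-solving an optimisation at every time step versus a trained DQN acting with a single forward pass — can be illustrated with a minimal sketch. This is not the paper's implementation: the cart-pole dynamics, cost weights, and horizon below are standard textbook choices assumed for illustration, and the receding-horizon search is a brute-force enumeration over the two discrete forces.

```python
import itertools
import math

# Standard cart-pole parameters; simple Euler integration.
G, M_CART, M_POLE, L, F_MAG, DT = 9.8, 1.0, 0.1, 0.5, 10.0, 0.02
M_TOTAL = M_CART + M_POLE

def step(state, force):
    """One Euler step of the cart-pole; state = (x, x_dot, theta, theta_dot)."""
    x, x_dot, th, th_dot = state
    cos_th, sin_th = math.cos(th), math.sin(th)
    temp = (force + M_POLE * L * th_dot ** 2 * sin_th) / M_TOTAL
    th_acc = (G * sin_th - cos_th * temp) / (
        L * (4.0 / 3.0 - M_POLE * cos_th ** 2 / M_TOTAL))
    x_acc = temp - M_POLE * L * th_acc * cos_th / M_TOTAL
    return (x + DT * x_dot, x_dot + DT * x_acc,
            th + DT * th_dot, th_dot + DT * th_acc)

def nmpc_action(state, horizon=6):
    """Receding-horizon control with discrete forces {-F, +F}: enumerate all
    2**horizon action sequences, roll the model forward, and apply only the
    first action of the cheapest sequence (cost penalises pole angle)."""
    best_cost, best_first = float("inf"), F_MAG
    for seq in itertools.product((-F_MAG, F_MAG), repeat=horizon):
        s, cost = state, 0.0
        for force in seq:
            s = step(s, force)
            cost += s[2] ** 2 + 0.01 * s[3] ** 2  # angle + rate penalty
        if cost < best_cost:
            best_cost, best_first = cost, seq[0]
    # A trained DQN would replace this per-step search with a single
    # forward pass: action = argmax_a Q(state, a).
    return best_first

state = (0.0, 0.0, 0.05, 0.0)  # small initial pole tilt
for _ in range(100):
    state = step(state, nmpc_action(state))
print(f"final pole angle: {state[2]:.4f} rad")
```

The sketch also makes the abstract's model-mismatch point concrete: `nmpc_action` plans with the same `step` function that generates the "real" trajectory, so it behaves optimally; if the planning model's parameters diverged from the plant's, NMPC would degrade, whereas a policy trained on real experience would not depend on the model at all.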