An improved reinforcement learning control strategy for batch processes

P. Zhang, Jie Zhang, Yang Long, Bingzhang Hu
{"title":"一种改进的批处理强化学习控制策略","authors":"P. Zhang, Jie Zhang, Yang Long, Bingzhang Hu","doi":"10.1109/MMAR.2019.8864632","DOIUrl":null,"url":null,"abstract":"Batch processes are significant and essential manufacturing route for the agile manufacturing of high value added products and they are typically difficult to control because of unknown disturbances, model plant mismatches, and highly nonlinear characteristic. Traditional one-step reinforcement learning and neural network have been applied to optimize and control batch processes. However, traditional one-step reinforcement learning and the neural network lack accuracy and robustness leading to unsatisfactory performance. To overcome these issues and difficulties, a modified multi-step action Q-learning algorithm (MMSA) based on multiple step action Q-learning (MSA) is proposed in this paper. For MSA, the action space is divided into some periods of same time steps and the same action is explored with fixed greedy policy being applied continuously during a period. Compared with MSA, the modification of MMSA is that the exploration and selection of action will follow an improved and various greedy policy in the whole system time which can improve the flexibility and speed of the learning algorithm. The proposed algorithm is applied to a highly nonlinear batch process and it is shown giving better control performance than the traditional one-step reinforcement learning and MSA.","PeriodicalId":392498,"journal":{"name":"2019 24th International Conference on Methods and Models in Automation and Robotics (MMAR)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"An improved reinforcement learning control strategy for batch processes\",\"authors\":\"P. Zhang, Jie Zhang, Yang Long, Bingzhang Hu\",\"doi\":\"10.1109/MMAR.2019.8864632\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Batch processes are significant and essential manufacturing route for the agile manufacturing of high value added products and they are typically difficult to control because of unknown disturbances, model plant mismatches, and highly nonlinear characteristic. Traditional one-step reinforcement learning and neural network have been applied to optimize and control batch processes. However, traditional one-step reinforcement learning and the neural network lack accuracy and robustness leading to unsatisfactory performance. To overcome these issues and difficulties, a modified multi-step action Q-learning algorithm (MMSA) based on multiple step action Q-learning (MSA) is proposed in this paper. For MSA, the action space is divided into some periods of same time steps and the same action is explored with fixed greedy policy being applied continuously during a period. Compared with MSA, the modification of MMSA is that the exploration and selection of action will follow an improved and various greedy policy in the whole system time which can improve the flexibility and speed of the learning algorithm. 
The proposed algorithm is applied to a highly nonlinear batch process and it is shown giving better control performance than the traditional one-step reinforcement learning and MSA.\",\"PeriodicalId\":392498,\"journal\":{\"name\":\"2019 24th International Conference on Methods and Models in Automation and Robotics (MMAR)\",\"volume\":\"28 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-08-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 24th International Conference on Methods and Models in Automation and Robotics (MMAR)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/MMAR.2019.8864632\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 24th International Conference on Methods and Models in Automation and Robotics (MMAR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MMAR.2019.8864632","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Cited by: 3

Abstract

Batch processes are an essential manufacturing route for the agile production of high value-added products, yet they are typically difficult to control because of unknown disturbances, model-plant mismatches, and highly nonlinear dynamics. Traditional one-step reinforcement learning and neural networks have been applied to optimize and control batch processes, but they lack accuracy and robustness, leading to unsatisfactory performance. To overcome these difficulties, this paper proposes a modified multi-step action Q-learning algorithm (MMSA) based on multi-step action Q-learning (MSA). In MSA, the control horizon is divided into periods of equal length, and within each period the same action is applied continuously under a fixed greedy policy. In MMSA, the exploration and selection of actions instead follow an improved, time-varying greedy policy over the whole horizon, which improves the flexibility and convergence speed of the learning algorithm. The proposed algorithm is applied to a highly nonlinear batch process and is shown to give better control performance than traditional one-step reinforcement learning and MSA.
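The paper itself does not include code; the following is a minimal, illustrative Python sketch of the idea described above: tabular Q-learning in which an action is held for a fixed period of time steps (as in MSA), with the exploration rate either kept constant (MSA-style fixed greedy policy) or varied over the batch horizon (MMSA-style time-varying greedy policy). The state/action discretisation, the epsilon schedules, and the toy environment are assumptions for illustration only, not the authors' implementation or their batch process model.

```python
import numpy as np

# Illustrative sketch only: tabular Q-learning with period-held actions.
# All sizes, schedules, and dynamics below are assumed for demonstration.
N_STATES, N_ACTIONS = 50, 11    # assumed discretisation of state and control input
N_STEPS, PERIOD = 100, 10       # assumed batch length and MSA action-holding period

def epsilon_msa(step):
    # MSA-style: a fixed greedy policy throughout the batch.
    return 0.1

def epsilon_mmsa(step):
    # MMSA-style: a time-varying greedy policy over the whole horizon
    # (the paper's exact schedule is not reproduced here).
    return max(0.01, 0.5 * (1.0 - step / N_STEPS))

def run_episode(Q, env_step, epsilon_fn, alpha=0.1, gamma=0.99):
    """One training episode; env_step(state, action) -> (next_state, reward, done)."""
    state, action = 0, None
    for step in range(N_STEPS):
        # Re-select the action only at the start of each period (MSA idea);
        # the exploration rate comes from the chosen epsilon schedule.
        if action is None or step % PERIOD == 0:
            if np.random.rand() < epsilon_fn(step):
                action = np.random.randint(N_ACTIONS)      # explore
            else:
                action = int(np.argmax(Q[state]))          # exploit
        next_state, reward, done = env_step(state, action)
        # Standard Q-learning temporal-difference update at every time step.
        td_target = reward + gamma * np.max(Q[next_state]) * (not done)
        Q[state, action] += alpha * (td_target - Q[state, action])
        state = next_state
        if done:
            break
    return Q

def toy_env_step(state, action):
    # Toy placeholder dynamics, not the nonlinear batch process from the paper.
    next_state = min(N_STATES - 1, max(0, state + action - N_ACTIONS // 2))
    reward = -abs(next_state - (N_STATES - 1))  # penalise distance from a target state
    return next_state, reward, False

Q = np.zeros((N_STATES, N_ACTIONS))
for _ in range(200):
    Q = run_episode(Q, toy_env_step, epsilon_mmsa)  # swap in epsilon_msa to compare
```

In this reading, the only difference between the two variants is the exploration schedule, which matches the abstract's description of MMSA as MSA with an improved, time-varying greedy policy.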