Learning Anticipatory Decision for Distributed Systems With Robustness Guarantees

IF 6.4 | Zone 2, Computer Science | Q1, Automation & Control Systems | IEEE Transactions on Automation Science and Engineering | Pub Date: 2024-11-13 | DOI: 10.1109/TASE.2024.3493912
Peijiang Liu;Xindi Yang;Hongliang Ren;Hao Zhang;Zhuping Wang
IEEE Transactions on Automation Science and Engineering, vol. 22, pp. 8965-8975. Journal Article. https://ieeexplore.ieee.org/document/10751796/
Citations: 0

Abstract

This paper investigates anticipatory decision-making for unknown distributed systems under robustness concerns. Anticipatory decision-making selects actions before the corresponding observations become available in time. First, the anticipatory decision is formulated as sequential feedback with min-max performance guarantees, with causality established through time series analysis. Next, distribution, robustness, and time consistency partition the optimization into spatial and temporal sub-games. The spatial sub-games resolve conflicts arising from distribution and robustness, while the temporal sub-games ensure stability and performance through time consistency. Finally, we propose a multi-step reinforcement learning algorithm built on causality analysis and a game-theoretic framework. Numerical results demonstrate the effectiveness of the approach, and practical experiments show potential real-world applications.

Note to Practitioners: This framework targets anticipatory decision-making for distributed systems that suffer from distributed communication, unknown dynamics, environmental disturbances, and loss of state observations. It applies to scenarios such as internal surgical robots, low-light autonomous driving, and non-GPS navigation, which mainly involve dynamic environments and weak signal feedback. For example, decision-making in autonomous driving requires not only reacting to current environmental conditions but also anticipating future scenarios and the uncertainties caused by poor visibility. Most existing results address these issues with model-driven approaches, which unknown dynamics render inapplicable. For implementation, we propose a multi-step reinforcement learning algorithm for the anticipatory decision framework with stability and robustness guarantees. It consists of three main parts: 1) data are collected during an offline phase and organized into a data structure consisting of a current-next observation pair with the multi-step decision and accumulated reward; 2) strategies and value functions are approximated with neural networks through Monte Carlo methods; 3) the strategy is deployed as sequential feedback in practical systems and predicts multi-step decisions from a single-step state observation. Finally, we select robot consensus with optical sensors as the implementation demo.
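The data structure in part 1) of the Note to Practitioners can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `MultiStepSample` and `build_samples`, the discount factor `gamma`, and the choice of non-overlapping windows of length `horizon` are all assumptions made for illustration. The abstract only states that each sample pairs a current and a next observation with a multi-step decision and an accumulated reward.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


@dataclass(frozen=True)
class MultiStepSample:
    """One offline sample (hypothetical layout, not taken from the paper)."""
    obs: Vector                    # observation available at step t
    decisions: Tuple[Vector, ...]  # decisions applied at steps t .. t+H-1
    reward: float                  # discounted reward accumulated over those H steps
    next_obs: Vector               # observation available at step t+H


def build_samples(
    observations: Sequence[Sequence[float]],
    decisions: Sequence[Sequence[float]],
    rewards: Sequence[float],
    horizon: int,
    gamma: float = 1.0,
) -> List[MultiStepSample]:
    """Cut one logged trajectory into non-overlapping H-step samples.

    Assumes the trajectory has T decisions, T rewards and T+1 observations.
    A trailing window shorter than `horizon` is dropped.
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    if len(decisions) != len(rewards) or len(observations) != len(decisions) + 1:
        raise ValueError("expected T decisions, T rewards and T+1 observations")

    samples: List[MultiStepSample] = []
    for t in range(0, len(decisions) - horizon + 1, horizon):
        # Accumulate the discounted reward over the H steps of this window.
        accumulated = sum(gamma ** k * rewards[t + k] for k in range(horizon))
        samples.append(
            MultiStepSample(
                obs=tuple(observations[t]),
                decisions=tuple(tuple(d) for d in decisions[t:t + horizon]),
                reward=accumulated,
                next_obs=tuple(observations[t + horizon]),
            )
        )
    return samples
```

Under these assumptions, a strategy network would take `obs` as input and output all `horizon` decisions at once, which matches the abstract's description of predicting multi-step decisions from a single-step state observation. The stored `reward` and `next_obs` would then serve as Monte Carlo targets for the value function.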
Source journal
IEEE Transactions on Automation Science and Engineering (Engineering & Technology: Automation & Control Systems)
CiteScore: 12.50
Self-citation rate: 14.30%
Articles published: 404
Review time: 3.0 months
Journal description: The IEEE Transactions on Automation Science and Engineering (T-ASE) publishes fundamental papers on Automation, emphasizing scientific results that advance efficiency, quality, productivity, and reliability. T-ASE encourages interdisciplinary approaches from computer science, control systems, electrical engineering, mathematics, mechanical engineering, operations research, and other fields. T-ASE welcomes results relevant to industries such as agriculture, biotechnology, healthcare, home automation, maintenance, manufacturing, pharmaceuticals, retail, security, service, supply chains, and transportation. T-ASE addresses a research community willing to integrate knowledge across disciplines and industries. For this purpose, each paper includes a Note to Practitioners that summarizes how its results can be applied or how they might be extended to apply in practice.