Learning Anticipatory Decision for Distributed Systems With Robustness Guarantees

IF 6.4 | Zone 2, Computer Science | Q1, AUTOMATION & CONTROL SYSTEMS | IEEE Transactions on Automation Science and Engineering | Pub Date: 2024-11-13 | DOI: 10.1109/TASE.2024.3493912

Peijiang Liu;Xindi Yang;Hongliang Ren;Hao Zhang;Zhuping Wang

IEEE Transactions on Automation Science and Engineering, vol. 22, pp. 8965-8975. Journal Article. URL: https://ieeexplore.ieee.org/document/10751796/

Citations: 0

Abstract

This paper investigates anticipatory decision for unknown distributed systems with robustness concerns. Anticipatory decision focuses on selecting actions before the corresponding observations become available in time. First, anticipatory decision forms sequential feedback with min-max performance guarantees, while causality comes from time series analysis. Next, distribution, robustness, and time consistency partition the optimization into spatial and temporal sub-games. The spatial sub-games resolve conflicts arising from distribution and robustness, while the temporal ones ensure stability and performance through time consistency. Finally, we propose a multi-step reinforcement learning algorithm under a causality-analysis and game-theoretic framework. Numerical results demonstrate the effectiveness of the approach, and practical experiments show potential real-world applications.

Note to Practitioners: This framework focuses on anticipatory decision for distributed systems, which suffer from distributed communication, unknown dynamics, environmental disturbances, and state observation loss. The framework has various application scenarios, e.g., internal surgical robots, low-light autonomous driving, and non-GPS navigation; these scenarios mainly involve dynamic environments and weak signal feedback. For example, decision-making in autonomous driving requires not only reacting to current environmental conditions but also anticipating future scenarios and uncertainties caused by poor visibility. Most existing results deal with these issues using model-driven approaches, but unknown dynamics render these methods inapplicable. For implementation, we propose a multi-step reinforcement learning algorithm for the anticipatory decision framework with stability and robustness guarantees. The details consist mainly of three parts: 1) We collect data during an offline phase and form the data structure, namely, a current-next observation pair with the multi-step decision and accumulated reward; 2) Strategies and value functions are approximated with neural networks through Monte-Carlo methods; 3) The strategy is deployed as sequential feedback in practical systems and predicts multi-step decisions from a single-step state observation. Finally, we select robot consensus with optical sensors as the implementation demo.
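The offline data structure described in part 1) can be pictured with a small sketch. This is only an illustration under stated assumptions, not the authors' implementation: the names `MultiStepSample` and `build_samples`, the fixed decision horizon `horizon`, and the discount factor `gamma` are all hypothetical, and the abstract does not say how the reward is accumulated (a discounted sum is assumed here).

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class MultiStepSample:
    """One offline sample (hypothetical layout, not taken from the paper)."""
    obs: Sequence[float]            # current observation at step t
    actions: List[Sequence[float]]  # multi-step decision for steps t .. t+H-1
    reward: float                   # reward accumulated over those H steps
    next_obs: Sequence[float]       # next observation, at step t+H


def build_samples(observations, actions, rewards, horizon, gamma=1.0):
    """Slice one recorded trajectory into multi-step samples.

    Expects one more observation than actions/rewards, because the
    trajectory ends with a final observation.
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    if not (len(observations) == len(actions) + 1 == len(rewards) + 1):
        raise ValueError("need len(observations) == len(actions) + 1 == len(rewards) + 1")

    samples = []
    for t in range(len(actions) - horizon + 1):
        # Assumed accumulation rule: discounted sum over the horizon.
        acc = sum(gamma ** k * rewards[t + k] for k in range(horizon))
        samples.append(
            MultiStepSample(
                obs=observations[t],
                actions=list(actions[t:t + horizon]),
                reward=acc,
                next_obs=observations[t + horizon],
            )
        )
    return samples


# Toy trajectory: 4 observations, 3 actions, 3 rewards.
demo_obs = [[0.0], [1.0], [2.0], [3.0]]
demo_act = [[0.1], [0.2], [0.3]]
demo_rew = [1.0, 2.0, 4.0]
demo_samples = build_samples(demo_obs, demo_act, demo_rew, horizon=2, gamma=0.5)
```

Each sample pairs a single observation with a block of `horizon` decisions, which matches the deployment idea in part 3): a strategy trained on such samples outputs several decisions from one observation, so the system can keep acting while later observations are missing.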


Source journal

IEEE Transactions on Automation Science and Engineering (Engineering & Technology: Automation & Control Systems)

CiteScore: 12.50

Self-citation rate: 14.30%

Articles published: 404

Review time: 3.0 months

Journal description: The IEEE Transactions on Automation Science and Engineering (T-ASE) publishes fundamental papers on Automation, emphasizing scientific results that advance efficiency, quality, productivity, and reliability. T-ASE encourages interdisciplinary approaches from computer science, control systems, electrical engineering, mathematics, mechanical engineering, operations research, and other fields. T-ASE welcomes results relevant to industries such as agriculture, biotechnology, healthcare, home automation, maintenance, manufacturing, pharmaceuticals, retail, security, service, supply chains, and transportation. T-ASE addresses a research community willing to integrate knowledge across disciplines and industries. For this purpose, each paper includes a Note to Practitioners that summarizes how its results can be applied or how they might be extended to apply in practice.