Peijiang Liu;Xindi Yang;Hongliang Ren;Hao Zhang;Zhuping Wang
Title: Learning Anticipatory Decision for Distributed Systems With Robustness Guarantees
Journal: IEEE Transactions on Automation Science and Engineering, vol. 22, pp. 8965-8975
Publication date: 2024-11-13
DOI: 10.1109/TASE.2024.3493912
URL: https://ieeexplore.ieee.org/document/10751796/
Impact factor: 6.4 (JCR Q1, Automation & Control Systems)
Citation count: 0
Abstract
This paper investigates anticipatory decision for unknown distributed systems with robustness concerns. Anticipatory decision focuses on selecting actions before the corresponding observations become available in time. First, anticipatory decision is formulated as sequential feedback with min-max performance guarantees, with causality derived from time series analysis. Next, distribution, robustness, and time consistency partition the optimization into spatial and temporal sub-games. The spatial sub-games resolve conflicts over distribution and robustness, while the temporal sub-games ensure stability and performance through time consistency. Finally, we propose a multi-step reinforcement learning algorithm under causality analysis and a game-theoretic framework. Numerical results demonstrate the effectiveness of the approach, and practical experiments show its potential for real-world applications.
Note to Practitioners: This framework focuses on anticipatory decision for distributed systems, which suffer from distributed communication, unknown dynamics, environmental disturbances, and loss of state observations. The framework applies to scenarios such as internal surgical robots, low-light autonomous driving, and non-GPS navigation, which mainly involve dynamic environments and weak signal feedback. For example, decision-making in autonomous driving requires not only reacting to current environmental conditions but also anticipating future scenarios and the uncertainties caused by poor visibility. Most existing results address these issues with model-driven approaches, but unknown dynamics render such methods inapplicable.
For implementation, we propose a multi-step reinforcement learning algorithm for the anticipatory decision framework with stability and robustness guarantees. It consists of three main parts: 1) data are collected during an offline phase and organized into a data structure, namely a current-next observation pair together with the multi-step decision and the accumulated reward; 2) strategies and value functions are approximated with neural networks through Monte Carlo methods; 3) the strategy is deployed as sequential feedback in practical systems and predicts multi-step decisions from a single-step state observation. Finally, we select robot consensus with optical sensors as the implementation demo.
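The data structure in part 1) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the names `MultiStepSample` and `build_samples`, the discount factor `gamma`, and the non-overlapping slicing of a logged trajectory are all hypothetical choices that the abstract does not specify.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class MultiStepSample:
    """One offline sample (hypothetical layout, not taken from the paper)."""
    obs: Sequence[float]              # current observation at step t
    decisions: List[Sequence[float]]  # decisions applied at steps t .. t+H-1
    accumulated_reward: float         # discounted sum of the H rewards
    next_obs: Sequence[float]         # next available observation at step t+H


def build_samples(observations, decisions, rewards, horizon, gamma=1.0):
    """Slice one logged trajectory into non-overlapping multi-step samples.

    Assumes observations has len(decisions) + 1 entries and rewards has
    len(decisions) entries. gamma is an assumed discount factor.
    """
    if len(observations) != len(decisions) + 1 or len(rewards) != len(decisions):
        raise ValueError("inconsistent trajectory lengths")
    samples = []
    # Step by `horizon` so each sample starts where the previous one ended.
    for t in range(0, len(decisions) - horizon + 1, horizon):
        ret = sum(gamma ** k * rewards[t + k] for k in range(horizon))
        samples.append(
            MultiStepSample(
                obs=observations[t],
                decisions=list(decisions[t:t + horizon]),
                accumulated_reward=ret,
                next_obs=observations[t + horizon],
            )
        )
    return samples
```

Each sample pairs one observation with a whole block of decisions, which matches the deployment described in part 3), where a single-step observation is mapped to a multi-step decision.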
About the journal:
The IEEE Transactions on Automation Science and Engineering (T-ASE) publishes fundamental papers on Automation, emphasizing scientific results that advance efficiency, quality, productivity, and reliability. T-ASE encourages interdisciplinary approaches from computer science, control systems, electrical engineering, mathematics, mechanical engineering, operations research, and other fields. T-ASE welcomes results relevant to industries such as agriculture, biotechnology, healthcare, home automation, maintenance, manufacturing, pharmaceuticals, retail, security, service, supply chains, and transportation. T-ASE addresses a research community willing to integrate knowledge across disciplines and industries. For this purpose, each paper includes a Note to Practitioners that summarizes how its results can be applied or how they might be extended to apply in practice.