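Reinforcement Learning Environment for Wavefront Sensorless Adaptive Optics in Single-Mode Fiber Coupled Optical Satellite Communications Downlinks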

Photonics | IF 2.1 | Zone 4, Physics and Astrophysics | JCR Q2, Optics | Pub Date: 2023-12-13 | DOI: 10.3390/photonics10121371

Payam Parvizi, Runnan Zou, Colin Bellinger, R. Cheriton, Davide Spinello

Citations: 0

Abstract

Optical satellite communications (OSC) downlinks can support much higher bandwidths than radio-frequency channels. However, atmospheric turbulence degrades the optical beam wavefront, leading to reduced data transfer rates. In this study, we propose using reinforcement learning (RL) as a lower-cost alternative to standard wavefront sensor-based solutions. We estimate that RL has the potential to reduce system latency, while lowering system costs by omitting the wavefront sensor and low-latency wavefront processing electronics. This is achieved by adopting a control policy learned through interactions with a cost-effective and ultra-fast readout of a low-dimensional photodetector array, rather than relying on a wavefront phase profiling camera. However, RL-based wavefront sensorless adaptive optics (AO) for OSC downlinks faces challenges relating to prediction latency, sample efficiency, and adaptability. To gain a deeper insight into these challenges, we have developed and shared the first OSC downlink RL environment and evaluated a diverse set of deep RL algorithms in the environment. Our results indicate that the Proximal Policy Optimization (PPO) algorithm outperforms the Soft Actor–Critic (SAC) and Deep Deterministic Policy Gradient (DDPG) algorithms. Moreover, PPO converges to within 86% of the maximum performance achievable by the predominant Shack–Hartmann wavefront sensor-based AO system. Our findings indicate the potential of RL in replacing wavefront sensor-based AO while reducing the cost of OSC downlinks.
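
As a rough illustration of the control loop the abstract describes, the sketch below sets up a toy environment in which an agent sees only a few photodetector intensities and issues incremental mirror corrections. It is not the environment released with the paper. The class name `ToySensorlessAOEnv`, the modal aberration vector, the random-walk turbulence, the detector model, and the use of the extended Maréchal approximation as a stand-in for fiber coupling efficiency are all assumptions made for illustration.

```python
import numpy as np


class ToySensorlessAOEnv:
    """Toy wavefront sensorless AO loop; an illustration, not the paper's environment."""

    def __init__(self, n_modes=5, n_pixels=4, drift=0.02, seed=0):
        self.rng = np.random.default_rng(seed)
        self.n_modes = n_modes
        self.drift = drift
        # Hypothetical fixed linear map from residual modes to detector pixels.
        self.mixing = self.rng.normal(size=(n_pixels, n_modes))
        self.reset()

    def reset(self):
        # Random modal aberration (radians, orthonormal modes assumed) and a flat mirror.
        self.aberration = self.rng.normal(scale=0.5, size=self.n_modes)
        self.correction = np.zeros(self.n_modes)
        return self._observe()

    def _strehl(self):
        # Extended Marechal approximation: Strehl ~ exp(-residual phase variance).
        residual = self.aberration - self.correction
        return float(np.exp(-np.sum(residual ** 2)))

    def _observe(self):
        # Made-up detector model: every pixel dims as the residual aberration grows.
        residual = self.aberration - self.correction
        return self._strehl() * np.exp(-np.abs(self.mixing @ residual))

    def step(self, action):
        # The action is an incremental mirror command, one value per mode.
        self.correction = self.correction + np.asarray(action, dtype=float)
        # Turbulence evolves as a small random walk between steps.
        self.aberration = self.aberration + self.rng.normal(
            scale=self.drift, size=self.n_modes
        )
        # The reward is the Strehl proxy; no wavefront sensor output is exposed.
        return self._observe(), self._strehl()


env = ToySensorlessAOEnv(seed=1)
obs = env.reset()
# A zero action leaves the mirror flat; an RL agent would pick actions from obs.
obs, reward = env.step(np.zeros(env.n_modes))
print(obs.shape, round(reward, 3))
```

A policy trained with an algorithm such as PPO, SAC, or DDPG would replace the zero action above, mapping the low-dimensional observation to a mirror command that raises the reward.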

Source journal

Photonics (Physics and Astronomy: Instrumentation)

CiteScore: 2.60
Self-citation rate: 20.80%
Articles published: 817
Review time: 8 weeks

About the journal: Photonics (ISSN 2304-6732) aims for a fast turnaround time in peer-reviewing manuscripts and producing accepted articles. The online-only, open access nature of the journal allows for speedy and wide circulation of research and review articles. We aim to establish Photonics as a leading venue for publishing high-impact fundamental research as well as applications of optics and photonics. The journal particularly welcomes both theoretical (simulation) and experimental research. Our aim is to encourage scientists to publish their experimental and theoretical results in as much detail as possible. There is no restriction on the length of papers. Full experimental details must be provided so that the results can be reproduced. Electronic files and software containing the full details of the calculations and experimental procedures, if they cannot be published in the normal way, can be deposited as supplementary material.