High-Fidelity Data-Driven Dynamics Model for Reinforcement Learning-based Magnetic Control in HL-3 Tokamak

arXiv - PHYS - Plasma Physics. Pub Date: 2024-09-14. DOI: arxiv-2409.09238

Niannian Wu, Zongyu Yang, Rongpeng Li, Ning Wei, Yihang Chen, Qianyun Dong, Jiyuan Li, Guohui Zheng, Xinwen Gong, Feng Gao, Bo Li, Min Xu, Zhifeng Zhao, Wulyu Zhong

Abstract

Controlling tokamaks, a prominent nuclear-fusion technology, is essential because they could provide a virtually unlimited source of clean energy. Reinforcement learning (RL) promises greater flexibility in managing the intricate, non-linear dynamics of the plasma confined in a tokamak. However, RL typically requires extensive interaction with a simulator that can accurately evolve the high-dimensional plasma state. First-principles-based simulators are computationally intensive, which makes RL training slow. Instead, we devise an effective method to obtain a fully data-driven simulator by mitigating the compounding error that arises from its autoregressive nature. With high accuracy and good extrapolation capability, this high-fidelity dynamics model enables rapid training of a qualified RL agent that directly generates engineering-reasonable magnetic coil commands to reach the desired long-term targets of plasma current and last closed flux surface. Together with EFITNN, a surrogate magnetic equilibrium reconstruction model, the RL agent maintains 100 ms of trajectory control at 1 kHz with accurate waveform tracking on the HL-3 tokamak. It also demonstrates the feasibility of zero-shot adaptation to changed triangularity targets, confirming the robustness of the developed data-driven dynamics model. Our work highlights the advantage of fully data-driven dynamics models in producing RL-based trajectory control policies quickly enough to meet an anticipated engineering requirement of daily discharge operations on the upcoming ITER device.
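The abstract does not specify the authors' model architecture or training objective, so the following is only a minimal toy sketch of the "compounding error" problem it mentions. It assumes a hypothetical scalar linear system (not a plasma model) with made-up parameters `a`, `b`, and shows two things: a small one-step model error grows when predictions are fed back autoregressively over 100 control steps (100 ms at 1 kHz), and a multi-step rollout loss, one common mitigation that is not necessarily the method used in the paper, penalizes that accumulated drift while a one-step loss does not. All function and variable names are illustrative.

```python
"""Toy illustration of compounding error in an autoregressive dynamics model.

Hypothetical scalar system, NOT the HL-3 plasma model: x[t+1] = a*x[t] + b*u[t].
"""

CONTROL_RATE_HZ = 1_000   # 1 kHz control loop, as stated in the abstract
EPISODE_MS = 100          # 100 ms trajectory, as stated in the abstract
N_STEPS = EPISODE_MS * CONTROL_RATE_HZ // 1_000  # -> 100 control steps


def step(x: float, u: float, a: float, b: float) -> float:
    """One transition of the (hypothetical) linear toy dynamics."""
    return a * x + b * u


def rollout(x0: float, actions: list[float], a: float, b: float) -> list[float]:
    """Autoregressive rollout: each prediction is fed back as the next input."""
    xs = [x0]
    for u in actions:
        xs.append(step(xs[-1], u, a, b))
    return xs


def multi_step_loss(a_hat: float, b_hat: float, xs_true: list[float],
                    actions: list[float], horizon: int) -> float:
    """Mean squared error of `horizon`-step model rollouts started from every
    ground-truth state. horizon=1 is the usual one-step (teacher-forced) loss;
    larger horizons also penalize error that accumulates through feedback."""
    total, count = 0.0, 0
    for t0 in range(len(actions) - horizon + 1):
        pred = rollout(xs_true[t0], actions[t0:t0 + horizon], a_hat, b_hat)
        for k in range(1, horizon + 1):
            total += (pred[k] - xs_true[t0 + k]) ** 2
            count += 1
    return total / count


# Illustrative demo with hypothetical parameters.
a_true, b_true = 0.99, 0.05    # assumed "real" dynamics
a_hat, b_hat = 0.995, 0.05     # assumed learned model with a 0.5% error in a
actions = [1.0] * N_STEPS      # constant command, purely illustrative

xs_true = rollout(1.0, actions, a_true, b_true)
xs_model = rollout(1.0, actions, a_hat, b_hat)
errors = [abs(m - t) for m, t in zip(xs_model, xs_true)]
print(f"steps={N_STEPS}  err@1={errors[1]:.4f}  err@{N_STEPS}={errors[-1]:.4f}")

for h in (1, 10):
    loss = multi_step_loss(a_hat, b_hat, xs_true, actions, h)
    print(f"horizon={h:2d}  loss={loss:.6f}")
```

Training against the multi-step loss rather than the one-step loss pushes a model toward parameters whose errors stay small under feedback, which is what matters when an RL agent interacts with the model for a full 100-step episode.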
