Hidden Brain State-Based Internal Evaluation Using Kernel Inverse Reinforcement Learning in Brain-Machine Interfaces

IF 4.8; Region 2 (Medicine); Q2 ENGINEERING, BIOMEDICAL; IEEE Transactions on Neural Systems and Rehabilitation Engineering; Pub Date: 2024-11-21; DOI: 10.1109/TNSRE.2024.3503713

Jieyuan Tan;Xiang Zhang;Shenghui Wu;Zhiwei Song;Yiwen Wang

IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 32, pp. 4219-4229, published 2024-11-21. Journal article. Full text: https://ieeexplore.ieee.org/document/10759843/

Citations: 0



Hidden Brain State-Based Internal Evaluation Using Kernel Inverse Reinforcement Learning in Brain-Machine Interfaces

Reinforcement learning (RL)-based brain-machine interfaces (BMIs) help paralyzed people control neural prostheses without requiring real limb movement as a supervised signal. The design of the reward signal strongly affects the learning efficiency of RL-based decoders. Existing reward designs in the RL-based BMI framework rely on external rewards or manually labeled internal rewards and therefore cannot accurately extract subjects' internal evaluation. In this paper, we propose a hidden brain state-based kernel inverse reinforcement learning (HBS-KIRL) method to accurately infer the subject-specific internal evaluation from neural activity during the BMI task. A state-space model projects the neural state into a low-dimensional hidden brain state space, which greatly reduces the dimension of the space to be explored. A kernel method is then applied to speed up the convergence of the policy, reward, and Q-value networks in a reproducing kernel Hilbert space (RKHS). We tested the proposed algorithm on data collected from the medial prefrontal cortex (mPFC) of rats while they performed a two-lever discrimination task. We assessed the state-value estimation performance of the proposed method and compared it with naïve IRL and PCA-based IRL. To validate that the extracted internal evaluation contributes to decoder training, we compared the decoding performance of decoders trained with different reward models: manually designed reward, naïve IRL, PCA-IRL, and the proposed HBS-KIRL. The results show that HBS-KIRL gives a stable and accurate estimate of the state-value distribution with respect to behavior. Compared with the other methods, the decoder guided by HBS-KIRL achieves consistent and better decoding performance across days. This study reveals the potential of IRL to better extract subject-specific evaluation and improve BMI decoding performance.
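The abstract does not specify the kernel, the update rule, or the network structure, so the following is only a minimal sketch of the general idea of representing a scalar evaluation function over a low-dimensional state as a kernel expansion in an RKHS. It uses kernel least-mean-squares (KLMS) regression as a stand-in technique; it is not the authors' HBS-KIRL algorithm and performs no inverse reinforcement learning. The names `KernelLMS`, `toy_evaluation`, and `grid`, along with the 2-D state, kernel width, and step size, are all assumptions made for illustration.

```python
import math


def gaussian_kernel(x, y, width=1.0):
    """Gaussian (RBF) kernel k(x, y) = exp(-||x - y||^2 / (2 * width^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * width ** 2))


class KernelLMS:
    """Online estimate f(x) = sum_i alpha_i * k(c_i, x) (hypothetical helper)."""

    def __init__(self, step_size=0.5, width=1.0):
        self.step_size = step_size
        self.width = width
        self.centers = []  # stored input states c_i
        self.alphas = []   # expansion coefficients alpha_i

    def predict(self, x):
        return sum(a * gaussian_kernel(c, x, self.width)
                   for c, a in zip(self.centers, self.alphas))

    def update(self, x, target):
        """One KLMS step: store x as a new center weighted by the current error."""
        error = target - self.predict(x)
        self.centers.append(tuple(x))
        self.alphas.append(self.step_size * error)
        return error


def toy_evaluation(state):
    """Made-up smooth target over a 2-D state; not data from the paper."""
    return math.exp(-((state[0] - 1.0) ** 2 + (state[1] + 0.5) ** 2))


# Deterministic 9 x 9 grid standing in for low-dimensional hidden states.
grid = [(-2.0 + 0.5 * i, -2.0 + 0.5 * j) for i in range(9) for j in range(9)]

model = KernelLMS(step_size=0.5, width=0.7)
for _ in range(10):  # a few sweeps over the toy data
    for s in grid:
        model.update(s, toy_evaluation(s))

mean_abs_error = sum(abs(model.predict(s) - toy_evaluation(s)) for s in grid) / len(grid)
print(f"mean absolute error on the grid: {mean_abs_error:.4f}")
```

Note that this plain KLMS variant stores one center per update, so the expansion grows with every sample; a practical system would likely need some form of sparsification, which is left out here to keep the sketch short.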


Source journal

IEEE Transactions on Neural Systems and Rehabilitation Engineering (Medicine; Engineering: Biomedical)

CiteScore: 8.60
Self-citation rate: 8.20%
Publication count: 479
Review time: 6-12 weeks

Journal description: Rehabilitative and neural aspects of biomedical engineering, including functional electrical stimulation, acoustic dynamics, human performance measurement and analysis, nerve stimulation, electromyography, motor control and stimulation; and hardware and software applications for rehabilitation engineering and assistive devices.