Biologically Plausible Variational Policy Gradient with Spiking Recurrent Winner-Take-All Networks

BMVC: Proceedings of the British Machine Vision Conference. Pub Date: 2022-10-21. DOI: 10.48550/arXiv.2210.13225

Zhile Yang, Shangqi Guo, Ying Fang, Jian K. Liu

Citations: 0

Abstract

One stream of reinforcement learning research explores biologically plausible models and algorithms, both to simulate biological intelligence and to fit neuromorphic hardware. Within this stream, reward-modulated spike-timing-dependent plasticity (R-STDP) is a recent branch with good potential for energy efficiency. However, current R-STDP methods rely on heuristic designs of local learning rules and therefore require task-specific expert knowledge. In this paper, we consider a spiking recurrent winner-take-all network and propose a new R-STDP method, spiking variational policy gradient (SVPG), whose local learning rules are derived from the global policy gradient and thus eliminate the need for heuristic designs. In experiments on MNIST classification and Gym InvertedPendulum, SVPG achieves good training performance and is more robust to various kinds of noise than conventional methods.
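To illustrate how a local learning rule can fall out of a global policy gradient, here is a minimal Python sketch. It is not the SVPG algorithm and it is not spiking: it assumes a rate-based softmax layer as a stand-in for soft winner-take-all competition, and applies the standard REINFORCE update. The resulting weight change is a product of three locally available factors: the reward, a postsynaptic term, and the presynaptic input. All names (`policy`, `reinforce_step`, `train_toy`) and the two-pattern toy task are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax; used here as a soft winner-take-all."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def policy(w, x):
    """Action probabilities of a linear-softmax policy with weights w."""
    scores = [sum(w_ki * x_i for w_ki, x_i in zip(row, x)) for row in w]
    return softmax(scores)


def reinforce_step(w, x, action, reward, lr):
    """REINFORCE update for the softmax policy, applied in place.

    d log pi(action | x) / d w[k][i] = (1[k == action] - p[k]) * x[i],
    so each weight change uses only the reward, a postsynaptic term,
    and the presynaptic input.
    """
    probs = policy(w, x)
    for k, row in enumerate(w):
        post = (1.0 if k == action else 0.0) - probs[k]
        for i, x_i in enumerate(x):
            row[i] += lr * reward * post * x_i


def train_toy(steps=500, lr=0.5, seed=0):
    """Toy task (an assumption, not from the paper): pick the action
    whose index matches the active input unit; reward is 1 or 0."""
    rng = random.Random(seed)
    patterns = [[1.0, 0.0], [0.0, 1.0]]
    w = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(steps):
        target = rng.randrange(2)
        x = patterns[target]
        probs = policy(w, x)
        action = rng.choices([0, 1], weights=probs)[0]
        reward = 1.0 if action == target else 0.0
        reinforce_step(w, x, action, reward, lr)
    return w, patterns


if __name__ == "__main__":
    w, patterns = train_toy()
    for x in patterns:
        print(x, policy(w, x))
```

In this sketch no weight update needs information beyond its own input, its own output unit, and a scalar reward signal, which is the property the abstract attributes to learning rules derived from the policy gradient rather than designed by hand.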


