Reward-based online learning in non-stationary environments: Adapting a P300-speller with a “backspace” key

E. Daucé, T. Proix, L. Ralaivola
DOI: 10.1109/IJCNN.2015.7280686
Published in: 2015 International Joint Conference on Neural Networks (IJCNN), pp. 1-8
Publication date: 2015-07-12
Citations: 5

Abstract

We adapt a policy gradient approach to the problem of reward-based online learning of a non-invasive EEG-based “P300”-speller. We first clarify the nature of the P300-speller classification problem and present a general regularized gradient ascent formula. We then show that when the reward is immediate and binary (namely “bad response” or “good response”), each update is expected to improve the classifier accuracy, whether the actual response is correct or not. We also estimate the robustness of the method to occasional mistaken rewards, i.e., we show that the learning efficacy may decrease only linearly with the rate of invalid rewards. The effectiveness of our approach is tested in a series of simulations reproducing the conditions of real experiments. We show in a first experiment that a systematic improvement of the spelling rate is obtained for all subjects in the absence of initial calibration. In a second experiment, we consider the case of online recovery following electrode failure. Combined with a specific failure detection algorithm, the spelling error information (typically contained in a “backspace” hit) is shown to be useful for the policy gradient to adapt the P300 classifier to the new situation, provided the feedback is reliable enough (namely, a reliability greater than 70%).
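The paper's exact update rule is not reproduced in this abstract; the following is a minimal sketch of what a regularized policy-gradient (REINFORCE-style) update for a linear-softmax symbol classifier with an immediate binary reward could look like. All names, shapes, and parameter values (`eta`, `lam`, the linear policy) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over symbol logits.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def policy_update(W, x, a, r, eta=0.05, lam=1e-3):
    """One regularized policy-gradient step.

    W : (K, D) weights of a linear-softmax policy over K speller symbols
    x : (D,)   EEG feature vector for the trial
    a : int    symbol that was selected
    r : float  immediate binary reward, +1 ("good response") or -1 ("bad response")
    """
    p = softmax(W @ x)
    # REINFORCE gradient of log pi(a | x) for a linear-softmax policy:
    # (onehot(a) - p) outer x.
    grad_logpi = (np.eye(W.shape[0])[a] - p)[:, None] * x[None, :]
    # Gradient *ascent* on the reward, with L2 regularization.
    return W + eta * (r * grad_logpi - lam * W)

def spell_one(W, x, rng):
    """Sample a symbol from the current stochastic policy."""
    p = softmax(W @ x)
    return int(rng.choice(len(p), p=p))
```

In this sketch the reward would come from the user's next action: +1 when the spelled symbol is accepted, -1 when a “backspace” hit signals an error. With `r = +1` the update raises the probability of the chosen symbol for similar feature vectors; with `r = -1` it lowers it, which is consistent with the abstract's claim that each update is expected to improve accuracy whether the response was correct or not.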