Reinforcement learning and the reward positivity with aversive outcomes.

Psychophysiology. Pub date: 2024-04-01. Epub date: 2023-11-22. DOI: 10.1111/psyp.14460
Elizabeth A Bauer, Brandon K Watanabe, Annmarie MacNamara

Abstract

The reinforcement learning (RL) theory of the reward positivity (RewP), an event-related potential (ERP) component that measures reward responsivity, suggests that the RewP should be largest when positive outcomes are unexpected and has been supported by work using appetitive outcomes (e.g., money). However, the RewP can also be elicited by the absence of aversive outcomes (e.g., shock). The limited work to date that has manipulated expectancy while using aversive outcomes has not supported the predictions of RL theory. Nonetheless, this work has been difficult to reconcile with the appetitive literature because the RewP was not observed as a reward signal in these studies, which used passive tasks that did not involve participant choice. Here, we tested the predictions of the RL theory by manipulating expectancy in an active/choice-based threat-of-shock doors task that was previously found to elicit the RewP as a reward signal. Moreover, we used principal components analysis to isolate the RewP from overlapping ERP components. Eighty participants viewed pairs of doors surrounded by a red or green border; shock delivery was expected (80%) following red-bordered doors and unexpected (20%) following green-bordered doors. The RewP was observed as a reward signal (i.e., no shock > shock) that was not potentiated for unexpected feedback. In addition, the RewP was larger overall for unexpected (vs expected) feedback. Therefore, the RewP appears to reflect the additive (not interactive) effects of reward and expectancy, challenging the RL theory of the RewP, at least when reward is defined as the absence of an aversive outcome.
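
The contrast between the RL prediction (an interaction, with a larger reward effect for unexpected feedback) and the additive pattern reported above can be made concrete with a toy calculation. This is a minimal sketch and not the authors' analysis, which used principal components analysis of ERP data. It rests on several assumptions. The RL account is reduced to a signed prediction error, delta = r - P(no shock | cue). Reward is coded r = 1 for no shock and r = 0 for shock. Feedback counts as "unexpected" when it was the less likely outcome for its cue. The 80%/20% contingencies are taken from the abstract. All function names and the additive-model coefficients are hypothetical.

```python
"""Toy comparison of RL (prediction-error) vs additive predictions for a
2x2 threat-of-shock design. Hypothetical illustration, not the paper's analysis.

Assumptions: reward r = 1 for no shock, 0 for shock; the RL account is reduced
to a signed prediction error delta = r - P(no shock | cue).
"""

# Probability of shock given the door-border cue (contingencies from the abstract).
P_SHOCK = {"red": 0.8, "green": 0.2}


def prediction_error(cue: str, shocked: bool) -> float:
    """Signed reward prediction error, treating 'no shock' as the reward."""
    reward = 0.0 if shocked else 1.0
    expected_reward = 1.0 - P_SHOCK[cue]
    return reward - expected_reward


def is_unexpected(cue: str, shocked: bool) -> bool:
    """Feedback is unexpected if it was the less likely outcome for its cue."""
    p_outcome = P_SHOCK[cue] if shocked else 1.0 - P_SHOCK[cue]
    return p_outcome < 0.5


def reward_effects() -> dict[str, float]:
    """RL-predicted reward effect (no shock minus shock) at each expectancy level."""
    cells = {}
    for cue in P_SHOCK:
        for shocked in (False, True):
            level = "unexpected" if is_unexpected(cue, shocked) else "expected"
            outcome = "shock" if shocked else "no_shock"
            cells[(level, outcome)] = prediction_error(cue, shocked)
    return {
        level: cells[(level, "no_shock")] - cells[(level, "shock")]
        for level in ("expected", "unexpected")
    }


def additive_reward_effects(b_reward: float, b_unexpected: float) -> dict[str, float]:
    """Reward effect under a hypothetical additive model:
    amplitude = b_reward * r + b_unexpected * u (no interaction term)."""
    def amplitude(r: int, u: int) -> float:
        return b_reward * r + b_unexpected * u

    return {
        level: amplitude(1, u) - amplitude(0, u)
        for level, u in (("expected", 0), ("unexpected", 1))
    }


if __name__ == "__main__":
    rl = reward_effects()
    print({k: round(v, 2) for k, v in rl.items()})  # {'expected': 0.4, 'unexpected': 1.6}
    add = additive_reward_effects(b_reward=1.0, b_unexpected=0.5)
    print({k: round(v, 2) for k, v in add.items()})  # {'expected': 1.0, 'unexpected': 1.0}
```

Under these assumptions, the prediction-error account yields a reward effect of 1.6 for unexpected feedback versus 0.4 for expected feedback, which is a valence × expectancy interaction. The additive model yields the same reward effect at both expectancy levels, while still allowing a main effect of expectancy. The abstract reports that the RewP followed the additive pattern.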
