Reinforcement learning and the reward positivity with aversive outcomes.

Psychophysiology. Pub Date: 2024-04-01. Epub Date: 2023-11-22. DOI: 10.1111/psyp.14460

Elizabeth A Bauer, Brandon K Watanabe, Annmarie MacNamara

Article e14460. Publication type: Journal Article. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10939817/pdf/

Citations: 0

Abstract

The reinforcement learning (RL) theory of the reward positivity (RewP), an event-related potential (ERP) component that measures reward responsivity, suggests that the RewP should be largest when positive outcomes are unexpected and has been supported by work using appetitive outcomes (e.g., money). However, the RewP can also be elicited by the absence of aversive outcomes (e.g., shock). The limited work to date that has manipulated expectancy while using aversive outcomes has not supported the predictions of RL theory. Nonetheless, this work has been difficult to reconcile with the appetitive literature because the RewP was not observed as a reward signal in these studies, which used passive tasks that did not involve participant choice. Here, we tested the predictions of the RL theory by manipulating expectancy in an active/choice-based threat-of-shock doors task that was previously found to elicit the RewP as a reward signal. Moreover, we used principal components analysis to isolate the RewP from overlapping ERP components. Eighty participants viewed pairs of doors surrounded by a red or green border; shock delivery was expected (80%) following red-bordered doors and unexpected (20%) following green-bordered doors. The RewP was observed as a reward signal (i.e., no shock > shock) that was not potentiated for unexpected feedback. In addition, the RewP was larger overall for unexpected (vs. expected) feedback. Therefore, the RewP appears to reflect the additive (not interactive) effects of reward and expectancy, challenging the RL theory of the RewP, at least when reward is defined as the absence of an aversive outcome.
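The distinction between additive and interactive effects in a 2 × 2 design (outcome × expectancy) can be illustrated with a minimal Python sketch. This is not the authors' analysis, and every amplitude below is a made-up illustrative number rather than data from the study. The helper names `contrasts` and `prediction_error`, and the coding of no shock as 1 and shock as 0, are assumptions introduced only for this example. The sketch contrasts a pattern in which amplitude tracks a signed prediction error (one simple reading of RL theory) with a purely additive pattern.

```python
# Illustrative only: all amplitudes are hypothetical, not results from the study.

def contrasts(cells):
    """Return (reward effect, expectancy effect, interaction) for a 2x2 design.

    `cells` maps (outcome, expectancy) -> mean amplitude, where outcome is
    "no_shock" or "shock" and expectancy is "unexpected" or "expected".
    """
    ns_u = cells[("no_shock", "unexpected")]
    ns_e = cells[("no_shock", "expected")]
    s_u = cells[("shock", "unexpected")]
    s_e = cells[("shock", "expected")]
    reward = (ns_u + ns_e) / 2 - (s_u + s_e) / 2      # no shock minus shock
    expectancy = (ns_u + s_u) / 2 - (ns_e + s_e) / 2  # unexpected minus expected
    interaction = (ns_u - s_u) - (ns_e - s_e)         # reward effect by expectancy
    return reward, expectancy, interaction


def prediction_error(outcome_value, p_no_shock):
    """Signed prediction error, assuming no shock is coded 1 and shock 0."""
    return outcome_value - p_no_shock


# Hypothetical RL-style pattern: amplitude follows the signed prediction error.
# Red border: 80% shock, so P(no shock) = 0.2. Green border: P(no shock) = 0.8.
rl_pattern = {
    ("no_shock", "unexpected"): prediction_error(1, 0.2),  # red border
    ("no_shock", "expected"): prediction_error(1, 0.8),    # green border
    ("shock", "unexpected"): prediction_error(0, 0.8),     # green border
    ("shock", "expected"): prediction_error(0, 0.2),       # red border
}

# Hypothetical additive pattern: baseline + reward bonus + expectancy bonus.
additive_pattern = {
    (o, e): 1.0
    + (2.0 if o == "no_shock" else 0.0)
    + (1.5 if e == "unexpected" else 0.0)
    for o in ("no_shock", "shock")
    for e in ("unexpected", "expected")
}

rl_effects = contrasts(rl_pattern)
additive_effects = contrasts(additive_pattern)

if __name__ == "__main__":
    print("RL-style pattern (reward, expectancy, interaction):", rl_effects)
    print("Additive pattern (reward, expectancy, interaction):", additive_effects)
```

In the RL-style pattern the reward effect is larger for unexpected than for expected feedback, so the interaction contrast is nonzero. In the additive pattern both main effects are present but the interaction contrast is zero, which is the qualitative shape the abstract describes.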


Source journal

Psychophysiology

Self-citation rate

0.00%

Articles published