Why learning progress needs absolute values: Comment on Poli et al. (2024)

European Journal of Neuroscience · IF 2.7 · Zone 4, Medicine · JCR Q3, Neurosciences · Pub Date: 2024-12-05 · DOI: 10.1111/ejn.16635
Augustin Chartouny, Benoît Girard, Mehdi Khamassi
Publisher page: https://onlinelibrary.wiley.com/doi/10.1111/ejn.16635
PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11664492/pdf/
Citations: 0

Abstract

In a recent issue of Trends in Cognitive Sciences (TiCS), Poli et al. (2024) reviewed the latest developments in computational models of curiosity in cognitive neuroscience and promoted learning progress as a key computational mechanism for optimal environmental exploration. Here, we highlight results from the machine learning literature showing that their mathematical formula for learning progress may be sub-optimal. We present an alternative formulation of learning progress with absolute values that addresses this problem. Learning progress with absolute values provides further insights into the decision-making mechanisms that may underlie exploration. It also demonstrates the need for new experiments to disambiguate the existing interpretations of learning progress.

Learning progress promotes exploration depending on how much an agent (e.g., human, animal or robot) is learning (Oudeyer et al., 2007). Agents should explore options on which they progress quickly because there is potentially more to learn. In contrast, agents should ignore options on which they have not made progress, as there might be nothing new to learn. Poli et al. suggest that a good proxy for learning progress is the change in prediction errors over time (Oudeyer et al., 2007). With this formulation, a decrease in prediction errors indicates that the agent is currently learning and should keep exploring to continue improving. Conversely, an increase in prediction errors makes the learning progress negative and should result in the agent avoiding options that become unpredictable. However, the machine learning literature has shown that exploration should increase when prediction errors increase, either after a task change, to adapt to the new task (Chartouny et al., 2024), or when the agent starts forgetting how to solve the task (Colas et al., 2019). Authors commonly use a formulation of learning progress with absolute values so that increases and decreases in performance induce exploration equally (Chartouny et al., 2024; Colas et al., 2019).
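The contrast between the two formulations can be made concrete with a small sketch. It is only an illustration under stated assumptions: the function name `learning_progress`, the window size and the error values are hypothetical, and learning progress is approximated here as the difference between the mean prediction error of an older window and that of the most recent window, which is one possible reading of "change in prediction errors over time" rather than the exact formula of any cited paper.

```python
from statistics import mean


def learning_progress(errors, window=3, absolute=False):
    """Difference between older and recent mean prediction errors.

    Signed version: positive when errors decrease, negative when they increase.
    Absolute version: positive whenever errors change, in either direction.
    The windowed mean is an assumption made for this sketch.
    """
    if len(errors) < 2 * window:
        raise ValueError("need at least 2 * window prediction errors")
    older = mean(errors[-2 * window:-window])
    recent = mean(errors[-window:])
    progress = older - recent
    return abs(progress) if absolute else progress


# Hypothetical prediction-error histories for two options.
improving = [0.9, 0.8, 0.7, 0.5, 0.4, 0.3]    # errors fall: the agent is learning
task_change = [0.1, 0.1, 0.1, 0.6, 0.7, 0.8]  # errors rise after a task change

for name, errs in [("improving", improving), ("task_change", task_change)]:
    signed = learning_progress(errs)
    unsigned = learning_progress(errs, absolute=True)
    print(f"{name}: signed={signed:.2f}, absolute={unsigned:.2f}")
```

In this toy example the signed measure is negative for the option whose task has changed, so an agent relying on it would avoid that option, whereas the absolute measure is positive and flags it as worth exploring.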

Learning progress with absolute values seems more efficient from a machine learning perspective, and we argue that it also seems more promising for explaining human exploration. With absolute values, increases in prediction error induce curious behaviours. This is consistent with experimental results showing that humans explore more when tasks suddenly become surprising. For example, Collins and Koechlin (2012) reported that humans' exploratory response rates went from 5% in a stable environment to 40% three or four trials after a task change and slowly decreased back to 5% as the surprise vanished. Furthermore, Stahl and Feigenson (2015) demonstrated that infants explore and learn more about the properties of objects that surprise them. Finally, in an arithmetic task with summations of varying difficulty, learning progress with absolute values explained human behaviour and pupil size variation significantly better than learning progress without absolute values (Sayalı et al., 2023). Thus, models of learning progress with absolute values have proven useful in cognitive neuroscience. Further research is required to see whether they account more generally for human exploratory behaviour in various situations and whether neural correlates of such a mechanism can be found in brain activity.

As highlighted by Poli et al. (2024), learning progress has limitations. However, their claim that learning progress does not indicate ‘how useful a given activity is to the agent's goal’ may be misleading. The article they cite praises the role of learning progress in goal-directed exploration (Molinaro & Collins, 2023). Molinaro and Collins state that ‘studies of human behavior have confirmed the prominent role of learning progress in dictating which goals people end up pursuing’. Thus, learning progress makes it possible to generate goals of increasing difficulty and to achieve them. Moreover, one of the main applications of learning progress in the reinforcement learning literature is learning tasks with multiple goals (Colas et al., 2022). This illustrates that learning progress is central to understanding goal-oriented behaviours. However, a formulation of learning progress as differences of prediction errors is not appropriate for studying goal-oriented behaviours. Thus, authors use alternative formulations such as measures of competence (Colas et al., 2019). This demonstrates that there is no canonical learning progress formula, that a taxonomy of the learning progress formulations in the literature is needed and that more experiments with humans should compare different learning progress formulations.

The diversity of interpretations of learning progress comes from the fact that learning progress is a simple heuristic: if curiosity is a desire for knowledge (Kang et al., 2009), curious humans should focus on options that maximize what they learn. Based on this observation, many formulations of learning progress have been introduced, such as the decrease in prediction errors (Oudeyer et al., 2007; Poli et al., 2024), the absolute difference in percentage correct (Sayalı et al., 2023), the absolute difference in competence (Colas et al., 2019) and the variation in the precision of an internal model (Chartouny et al., 2024). All these formulations share the idea that learning progress computes a variation in a performance metric, which makes it possible to avoid tasks that are either trivial or impossible. Future studies should investigate how well each formulation explains human exploration on the same task.
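The shared idea, that a variation in a performance metric filters out trivial and impossible tasks, can be sketched as follows. This is a hypothetical illustration: the task names, the competence scores, the window size and the rule of choosing tasks in proportion to absolute learning progress are all assumptions made for the example, not the procedure of any cited study.

```python
def abs_learning_progress(scores, window=2):
    """Absolute difference between recent and older mean competence."""
    older = sum(scores[-2 * window:-window]) / window
    recent = sum(scores[-window:]) / window
    return abs(recent - older)


# Hypothetical competence histories (fraction correct) for four tasks.
history = {
    "trivial": [1.0, 1.0, 1.0, 1.0],       # always solved: nothing to learn
    "impossible": [0.0, 0.0, 0.0, 0.0],    # never solved: no progress possible
    "learnable": [0.25, 0.25, 0.5, 0.5],   # competence is rising
    "forgotten": [1.0, 1.0, 0.5, 0.5],     # competence is dropping
}

lp = {task: abs_learning_progress(scores) for task, scores in history.items()}
total = sum(lp.values())

# Assumed selection rule: sample tasks in proportion to absolute progress,
# falling back to a uniform choice if no task shows any change.
if total > 0:
    probs = {task: value / total for task, value in lp.items()}
else:
    probs = {task: 1 / len(lp) for task in lp}

for task, p in probs.items():
    print(f"{task}: LP={lp[task]:.2f}, selection probability={p:.2f}")
```

Under these assumptions the trivial and impossible tasks receive zero selection probability, while both the task being learned and the task being forgotten attract exploration. A signed measure would instead rank the forgotten task below the flat ones.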

Augustin Chartouny: Writing—original draft. Benoît Girard: Supervision; validation; writing—original draft. Mehdi Khamassi: Supervision; validation; writing—original draft.

The authors declare no conflict of interest.
