Why learning progress needs absolute values: Comment on Poli et al. (2024)

European Journal of Neuroscience (IF 2.4; Zone 4, Medicine; Q3, Neurosciences). Pub Date: 2024-12-05. DOI: 10.1111/ejn.16635

Augustin Chartouny, Benoît Girard, Mehdi Khamassi

Abstract

In a recent issue of Trends in Cognitive Sciences (TiCS), Poli et al. (2024) reviewed the latest developments in computational models of curiosity in cognitive neuroscience and promoted learning progress as a key computational mechanism for optimal environmental exploration. Here, we emphasize results from the machine learning literature showing that their mathematical formula for learning progress may be sub-optimal. We present an alternative formulation of learning progress with absolute values that solves this problem. Learning progress with absolute values provides further insights into the decision-making mechanisms that may underlie exploration. It also demonstrates the need for new experiments to disambiguate the existing interpretations of learning progress.

Learning progress promotes exploration depending on how much an agent (e.g., human, animal or robot) is learning (Oudeyer et al., 2007). Agents should explore options on which they progress quickly because there is potentially more to learn. In contrast, agents should ignore options on which they have not made progress, as there might be nothing new to learn. Poli et al. suggest that a good proxy for learning progress is the change in prediction errors over time (Oudeyer et al., 2007). With this formulation, a decrease in prediction errors indicates that the agent is currently learning and should keep exploring to continue improving. Conversely, an increase in prediction errors makes the learning progress negative and should result in the agent avoiding options that become unpredictable. However, the machine learning literature has shown that exploration should increase when prediction errors increase, either after a task change, to adapt to the new task (Chartouny et al., 2024), or when the agent starts forgetting how to solve the task (Colas et al., 2019). Authors commonly use a formulation of learning progress with absolute values so that increases and decreases in performance induce exploration equally (Chartouny et al., 2024; Colas et al., 2019).
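The contrast between the two formulations can be illustrated with a minimal sketch. It is not the formula of any cited paper: the function names, the window size and the example error sequences are assumptions chosen for illustration, and learning progress is approximated as the difference between the mean prediction error of an earlier window and that of the most recent window.

```python
from statistics import mean


def signed_learning_progress(errors, window=3):
    """Mean error of the earlier window minus mean error of the recent window.

    Positive when prediction errors decrease, negative when they increase.
    The windowed average is an illustrative assumption.
    """
    if len(errors) < 2 * window:
        raise ValueError("need at least 2 * window prediction errors")
    earlier = mean(errors[-2 * window:-window])
    recent = mean(errors[-window:])
    return earlier - recent


def absolute_learning_progress(errors, window=3):
    """Same quantity with an absolute value: any change in error counts."""
    return abs(signed_learning_progress(errors, window))


# Hypothetical prediction-error histories for three options.
histories = {
    "improving": [0.9, 0.8, 0.7, 0.5, 0.4, 0.3],
    "after_task_change": [0.1, 0.1, 0.1, 0.6, 0.7, 0.8],
    "mastered": [0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
}

for name, errors in histories.items():
    signed = signed_learning_progress(errors)
    absolute = absolute_learning_progress(errors)
    print(f"{name}: signed={signed:+.2f}, absolute={absolute:.2f}")
```

Under these assumptions, the signed version gives the option whose errors rise after a task change a negative value, ranking it below the already mastered option, whereas the absolute version ranks it highest.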

Learning progress with absolute values seems more efficient from a machine learning perspective, but we argue that it also seems more promising in explaining human exploration. With absolute values, increases in prediction error induce curious behaviours. This is consistent with experimental results showing that humans explore more when tasks suddenly become surprising. For example, Collins and Koechlin (2012) reported that humans' exploratory response rates went from 5% in a stable environment to 40% three or four trials after a task change and slowly decreased back to 5% as the surprise vanished. Furthermore, Stahl and Feigenson (2015) demonstrated that infants explore and learn more about the properties of objects that surprise them. Finally, learning progress with absolute values explained human behaviour and pupil size variation significantly better than learning progress without absolute values in an arithmetic task with summations of varying difficulty (Sayalı et al., 2023). Thus, models of learning progress with absolute values have proven to be useful in cognitive neuroscience. Further research is required to see whether they account more generally for human exploratory behaviour in various situations and whether neural correlates of such a mechanism can be found in brain activity.

As highlighted by Poli et al. (2024), learning progress has limitations. However, their claim that learning progress does not provide ‘how useful a given activity is to the agent's goal’ may be misleading. The article they cite praises the role of learning progress in goal-directed exploration (Molinaro & Collins, 2023). Molinaro and Collins state that ‘studies of human behavior have confirmed the prominent role of learning progress in dictating which goals people end up pursuing’. Thus, learning progress makes it possible to generate goals of increasing difficulty and to achieve them. Moreover, one of the main applications of learning progress in the reinforcement learning literature is to learn tasks with multiple goals (Colas et al., 2022). This illustrates that learning progress is central to understanding goal-oriented behaviours. However, a formulation of learning progress as differences of prediction errors is not appropriate for studying goal-oriented behaviours. Thus, authors use alternative formulations such as measures of competence (Colas et al., 2019). This demonstrates that there is no canonical learning progress formula, that there is a need for a taxonomy of all the learning progress formulations in the literature and that more experiments with humans should compare different learning progress formulations.

The diversity of interpretations of learning progress comes from the fact that learning progress is a simple heuristic: if curiosity is a desire for knowledge (Kang et al., 2009), curious humans should focus on options that maximize what they learn. Based on this observation, many formulations of learning progress have been introduced, such as the decrease in prediction errors (Oudeyer et al., 2007; Poli et al., 2024), the absolute difference in percentage correct (Sayalı et al., 2023), the absolute difference in competence (Colas et al., 2019) and the variation of precision of an internal model (Chartouny et al., 2024). All these formulations share the idea that learning progress computes a variation of a performance metric, which makes it possible to avoid tasks that are either trivial or impossible. Future studies should investigate how well each formulation explains human exploration on the same task.
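Why a variation of a performance metric filters out both trivial and impossible tasks can be seen in a small sketch. It loosely follows the idea of an absolute difference in percentage correct, but the windowing, the function names and the three outcome sequences are hypothetical and are not taken from the cited studies.

```python
def percent_correct(outcomes):
    """Fraction of correct trials (1 = correct, 0 = incorrect)."""
    return sum(outcomes) / len(outcomes)


def absolute_progress(outcomes, window=3):
    """Absolute change in percentage correct between two consecutive windows.

    The two-window comparison is an illustrative assumption.
    """
    if len(outcomes) < 2 * window:
        raise ValueError("need at least 2 * window outcomes")
    earlier = percent_correct(outcomes[-2 * window:-window])
    recent = percent_correct(outcomes[-window:])
    return abs(recent - earlier)


# Hypothetical trial outcomes for three kinds of task.
tasks = {
    "trivial": [1, 1, 1, 1, 1, 1],      # always solved, nothing changes
    "impossible": [0, 0, 0, 0, 0, 0],   # never solved, nothing changes
    "learnable": [0, 0, 1, 0, 1, 1],    # performance is changing
}

progress = {name: absolute_progress(outcomes) for name, outcomes in tasks.items()}
preferred = max(progress, key=progress.get)

for name, value in progress.items():
    print(f"{name}: absolute progress = {value:.2f}")
print("preferred task:", preferred)
```

In this toy setting, performance is flat on the trivial and the impossible task, so both receive zero progress, and only the task on which performance is changing attracts exploration.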

Augustin Chartouny: Writing—original draft. Benoît Girard: Supervision; validation; writing—original draft. Mehdi Khamassi: Supervision; validation; writing—original draft.

The authors declare no conflict of interest.
