Why learning progress needs absolute values: Comment on Poli et al. (2024)
Augustin Chartouny, Benoît Girard, Mehdi Khamassi
European Journal of Neuroscience, 61(1). Published 2024-12-05. DOI: 10.1111/ejn.16635 (https://onlinelibrary.wiley.com/doi/10.1111/ejn.16635)
Author contributions: Augustin Chartouny: writing (original draft). Benoît Girard: supervision; validation; writing (original draft). Mehdi Khamassi: supervision; validation; writing (original draft).
The authors declare no conflict of interest.
In a recent issue of TiCS, Poli et al. (2024) reviewed the latest developments in computational models of curiosity in cognitive neuroscience and promoted learning progress as a key computational mechanism for optimal environmental exploration. Here, we highlight results from the machine learning literature showing that their mathematical formulation of learning progress may be sub-optimal. We present an alternative formulation of learning progress with absolute values that solves this problem. Learning progress with absolute values provides further insights into the decision-making mechanisms that may underlie exploration. It also demonstrates the need for new experiments to disambiguate the existing interpretations of learning progress.
Learning progress promotes exploration depending on how much an agent (e.g., human, animal or robot) is learning (Oudeyer et al., 2007). Agents should explore options for which they progress quickly because there is potentially more to learn. In contrast, agents should ignore options for which they have not made progress, as there might be nothing new to learn. Poli et al. suggest that a good proxy for learning progress is the change in prediction errors over time (Oudeyer et al., 2007). With this formulation, a decrease in prediction errors indicates that the agent is currently learning and should keep exploring to continue improving. Conversely, an increase in prediction errors makes the learning progress negative and should result in the agent avoiding options that become unpredictable. However, the machine learning literature has shown that exploration should increase when prediction errors increase, either after a task change, so that the agent adapts to the new task (Chartouny et al., 2024), or when the agent starts forgetting how to solve the task (Colas et al., 2019). Authors commonly use a formulation of learning progress with absolute values, so that increases and decreases in performance induce exploration equally (Chartouny et al., 2024; Colas et al., 2019).
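The contrast between the two formulations described above can be made concrete with a minimal sketch. It is only an illustration under stated assumptions, not the formula of any cited paper: the function names, the two-window averaging scheme, the `window` parameter and the example error histories are all hypothetical.

```python
def signed_learning_progress(errors, window=2):
    """Mean error over the older window minus mean error over the recent window.

    Positive when prediction errors decrease, negative when they increase.
    The windowed average is an assumption made for this illustration.
    """
    if window < 1 or len(errors) < 2 * window:
        raise ValueError("need at least 2 * window prediction errors")
    older = errors[-2 * window:-window]
    recent = errors[-window:]
    return sum(older) / window - sum(recent) / window


def absolute_learning_progress(errors, window=2):
    """Same quantity with an absolute value: any change in error counts as progress."""
    return abs(signed_learning_progress(errors, window))


# Hypothetical prediction-error histories for three options.
histories = {
    "still learning": [0.8, 0.6, 0.4, 0.2],  # errors decreasing
    "stable": [0.1, 0.1, 0.1, 0.1],          # nothing changes
    "task changed": [0.1, 0.1, 0.5, 0.9],    # errors suddenly increase
}

for name, errors in histories.items():
    signed = signed_learning_progress(errors)
    absolute = absolute_learning_progress(errors)
    print(f"{name}: signed={signed:.2f}, absolute={absolute:.2f}")
```

In this sketch, the signed version ranks the option whose task has changed below the stable option, so an agent driven by it would avoid the changed task. The absolute version ranks it above the stable option, which is the behaviour the machine learning results cited above argue for.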
Learning progress with absolute values seems more efficient from a machine learning perspective, and we argue that it also seems more promising for explaining human exploration. With absolute values, increases in prediction error induce curious behaviours. This is consistent with experimental results showing that humans explore more when tasks suddenly become surprising. For example, Collins and Koechlin (2012) reported that humans' exploratory response rates went from 5% in a stable environment to 40% three or four trials after a task change and slowly decreased back to 5% as the surprise vanished. Furthermore, Stahl and Feigenson (2015) demonstrated that infants explore and learn more about the properties of objects that surprise them. Finally, in an arithmetic task with summations of varying difficulty, learning progress with absolute values explained human behaviour and pupil size variation significantly better than learning progress without absolute values (Sayalı et al., 2023). Thus, models of learning progress with absolute values have proven useful in cognitive neuroscience. Further research is required to see whether they account more generally for human exploratory behaviour in various situations and whether neural correlates of such a mechanism can be found in brain activity.
As highlighted by Poli et al. (2024), learning progress has limitations. However, their claim that learning progress does not provide ‘how useful a given activity is to the agent's goal’ may be misleading. The article they cite praises the role of learning progress in goal-directed exploration (Molinaro & Collins, 2023). Molinaro and Collins state that ‘studies of human behavior have confirmed the prominent role of learning progress in dictating which goals people end up pursuing’. Thus, learning progress makes it possible to generate goals of increasing difficulty and to achieve them. Moreover, one of the main applications of learning progress in the reinforcement learning literature is learning tasks with multiple goals (Colas et al., 2022). This illustrates that learning progress is central to understanding goal-oriented behaviours. However, a formulation of learning progress as differences of prediction errors is not appropriate for studying goal-oriented behaviours. Thus, authors use alternative formulations such as measures of competence (Colas et al., 2019). This demonstrates that there is no canonical learning progress formula, that a taxonomy of the learning progress formulations in the literature is needed and that more experiments with humans should compare different learning progress formulations.
The diversity of interpretations of learning progress comes from the fact that learning progress is a simple heuristic: if curiosity is a desire for knowledge (Kang et al., 2009), curious humans should focus on options that maximize what they learn. Based on this observation, many formulations of learning progress have been introduced, such as the decrease in prediction errors (Oudeyer et al., 2007; Poli et al., 2024), the absolute difference in percentage correct (Sayalı et al., 2023), the absolute difference in competence (Colas et al., 2019) and the variation of precision of an internal model (Chartouny et al., 2024). All these formulations share the idea that learning progress computes a variation of a performance metric, which makes it possible to avoid tasks that are either trivial or impossible. Future studies should investigate how well each formulation explains human exploration on the same task.
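The shared idea stated above, that a variation of a performance metric steers the agent away from both trivial and impossible tasks, can be sketched as follows. This is a hedged illustration loosely inspired by the "absolute difference in percentage correct" formulation; the function name, the split of the outcome history into two halves and the example outcome sequences are assumptions made here, not the method of any cited study.

```python
def absolute_lp_from_outcomes(outcomes):
    """Absolute difference in proportion correct between the older and recent halves.

    `outcomes` is a list of 0/1 trial results, oldest first.
    Splitting the history into two halves is an assumption of this sketch.
    """
    half = len(outcomes) // 2
    if half == 0:
        raise ValueError("need at least two outcomes")
    older = outcomes[:half]
    recent = outcomes[-half:]
    return abs(sum(recent) / half - sum(older) / half)


# Hypothetical outcome histories (1 = correct, 0 = incorrect).
tasks = {
    "trivial": [1, 1, 1, 1, 1, 1, 1, 1],     # always solved: nothing to learn
    "impossible": [0, 0, 0, 0, 0, 0, 0, 0],  # never solved: no progress possible
    "learnable": [0, 0, 1, 0, 1, 1, 1, 1],   # performance is improving
}

learning_progress = {name: absolute_lp_from_outcomes(o) for name, o in tasks.items()}
preferred = max(learning_progress, key=learning_progress.get)

print(learning_progress)
print("preferred task:", preferred)
```

Both the trivial and the impossible task show no change in performance, so their learning progress is zero, and only the learnable task attracts exploration. Swapping in a different performance metric (competence, model precision) changes the inputs but not this basic logic.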