Credit Assignment: Challenges and Opportunities in Developing Human-like Learning Agents

Thuy Ngoc Nguyen, Chase McDonald, Cleotilde Gonzalez
{"title":"Credit Assignment: Challenges and Opportunities in Developing Human-like Learning Agents","authors":"Thuy Ngoc Nguyen, Chase McDonald, Cleotilde Gonzalez","doi":"10.1609/aaaiss.v3i1.31180","DOIUrl":null,"url":null,"abstract":"Temporal credit assignment is the process of distributing delayed outcomes to each action in a sequence, which is essential for learning to adapt and make decisions in dynamic environments. While computational methods in reinforcement learning, such as temporal difference (TD), have shown success in tackling this issue, it remains unclear whether these mechanisms accurately reflect how humans handle feedback delays. Furthermore, cognitive science research has not fully explored the credit assignment problem in humans and cognitive models. Our study uses a cognitive model based on Instance-Based Learning Theory (IBLT) to investigate various credit assignment mechanisms, including equal credit, exponential credit, and TD credit, using the IBL decision mechanism in a goal-seeking navigation task with feedback delays and varying levels of decision complexity. We compare the performance and process measures of the different models with human decision-making in two experiments. Our findings indicate that the human learning process cannot be fully explained by any of the mechanisms. We also observe that decision complexity affects human behavior but not model behavior. By examining the similarities and differences between human and model behavior, we summarize the challenges and opportunities for developing learning agents that emulate human decisions in dynamic environments.","PeriodicalId":516827,"journal":{"name":"Proceedings of the AAAI Symposium Series","volume":"12 9","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the AAAI Symposium Series","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1609/aaaiss.v3i1.31180","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Temporal credit assignment is the process of distributing delayed outcomes to each action in a sequence, which is essential for learning to adapt and make decisions in dynamic environments. While computational methods in reinforcement learning, such as temporal difference (TD), have shown success in tackling this issue, it remains unclear whether these mechanisms accurately reflect how humans handle feedback delays. Furthermore, cognitive science research has not fully explored the credit assignment problem in humans and cognitive models. Our study uses a cognitive model based on Instance-Based Learning Theory (IBLT) to investigate various credit assignment mechanisms, including equal credit, exponential credit, and TD credit, using the IBL decision mechanism in a goal-seeking navigation task with feedback delays and varying levels of decision complexity. We compare the performance and process measures of the different models with human decision-making in two experiments. Our findings indicate that the human learning process cannot be fully explained by any of the mechanisms. We also observe that decision complexity affects human behavior but not model behavior. By examining the similarities and differences between human and model behavior, we summarize the challenges and opportunities for developing learning agents that emulate human decisions in dynamic environments.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
学分作业:开发类人学习代理的挑战与机遇
时间学分分配是将延迟结果分配给序列中每个动作的过程,这对于在动态环境中学习适应和决策至关重要。虽然强化学习中的计算方法(如时间差(TD))在解决这一问题上取得了成功,但这些机制是否能准确反映人类如何处理反馈延迟仍不清楚。此外,认知科学研究尚未充分探讨人类和认知模型中的学分分配问题。我们的研究使用了基于实例学习理论(IBLT)的认知模型,在具有反馈延迟和不同决策复杂度的目标寻求导航任务中,利用 IBL 决策机制研究了各种学分分配机制,包括等额学分、指数学分和 TD 学分。我们在两个实验中将不同模型的性能和过程测量与人类决策进行了比较。我们的研究结果表明,人类的学习过程无法用任何一种机制来完全解释。我们还发现,决策复杂度会影响人类行为,但不会影响模型行为。通过研究人类行为与模型行为之间的异同,我们总结了在动态环境中开发模拟人类决策的学习代理所面临的挑战和机遇。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Modes of Tracking Mal-Info in Social Media with AI/ML Tools to Help Mitigate Harmful GenAI for Improved Societal Well Being Embodying Human-Like Modes of Balance Control Through Human-In-the-Loop Dyadic Learning Constructing Deep Concepts through Shallow Search Implications of Identity in AI: Creators, Creations, and Consequences ASMR: Aggregated Semantic Matching Retrieval Unleashing Commonsense Ability of LLM through Open-Ended Question Answering
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1