Understanding reinforcement learned crowds

Journal: Computer Graphics World (Q4, Computer Science), vol. 37, pp. 28-37
Publication date: 2022-09-19
DOI: 10.48550/arXiv.2209.09344
Authors: Ariel Kwiatkowski, Vicky S. Kalogeiton, Julien Pettré, Marie-Paule Cani
Citations: 5

Abstract

Simulating trajectories of virtual crowds is a commonly encountered task in Computer Graphics. Several recent works have applied Reinforcement Learning methods to animate virtual agents; however, they often make different design choices in the fundamental simulation setup. Each of these choices comes with a reasonable justification, so it is not obvious what their real impact is and how they affect the results. In this work, we analyze some of these arbitrary choices in terms of their impact on learning performance, as well as the quality of the resulting simulation measured by energy efficiency. We perform a theoretical analysis of the properties of the reward function design, and empirically evaluate the impact of using certain observation and action spaces on a variety of scenarios, with the reward function and energy usage as metrics. We show that directly using the neighboring agents' information as the observation generally outperforms the more widely used raycasting. Similarly, nonholonomic controls with egocentric observations tend to produce more efficient behaviors than holonomic controls with absolute observations. Each of these choices has a significant, and potentially nontrivial, impact on the results, so researchers should be mindful about choosing and reporting them in their work.
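To make the compared design choices concrete, the sketch below illustrates them under assumptions of our own: it is not taken from the paper's code, and all function names, parameters, and defaults are hypothetical. It contrasts the two observation representations (direct neighbor information versus raycasting) and the two control parameterizations (holonomic absolute velocity versus nonholonomic, egocentric speed and turning rate).

import numpy as np

def neighbor_observation(agent_pos, agent_vel, neighbors, k=4):
    # Observation built directly from the k nearest neighbors' relative
    # positions and velocities -- the representation the paper finds to
    # generally outperform raycasting.
    if not neighbors:
        return np.zeros(4 * k)
    rel = np.array([np.concatenate([n["pos"] - agent_pos, n["vel"] - agent_vel])
                    for n in neighbors])
    order = np.argsort(np.linalg.norm(rel[:, :2], axis=1))[:k]
    obs = rel[order].ravel()
    return np.pad(obs, (0, 4 * k - obs.size))  # zero-pad if fewer than k neighbors

def raycast_observation(agent_pos, heading, neighbors, n_rays=16, max_dist=5.0, radius=0.3):
    # Observation of normalized hit distances along evenly spaced rays,
    # treating each neighbor as a disc of the given radius.
    angles = heading + np.linspace(0.0, 2.0 * np.pi, n_rays, endpoint=False)
    dists = np.full(n_rays, max_dist)
    for n in neighbors:
        d = n["pos"] - agent_pos
        for i, a in enumerate(angles):
            ray = np.array([np.cos(a), np.sin(a)])
            along = float(d @ ray)            # distance to the closest point on the ray
            if along <= 0.0:
                continue                      # neighbor lies behind this ray
            perp = np.linalg.norm(d - along * ray)
            if perp < radius:                 # the ray intersects the neighbor's disc
                hit = along - np.sqrt(radius**2 - perp**2)
                dists[i] = min(dists[i], max(hit, 0.0))
    return dists / max_dist

def holonomic_step(pos, action, dt=0.1):
    # Holonomic control: the action is an absolute 2D velocity vector.
    return pos + dt * np.asarray(action)

def nonholonomic_step(pos, heading, action, dt=0.1):
    # Nonholonomic, egocentric control: the action is (linear speed, turning
    # rate); the heading integrates the turn and the agent moves along it.
    speed, turn = action
    heading = heading + dt * turn
    pos = pos + dt * speed * np.array([np.cos(heading), np.sin(heading)])
    return pos, heading

A simulation loop would call one observation function and one step function per agent per timestep; the abstract's findings suggest that the neighbor-based observation combined with the nonholonomic, egocentric controls is the more energy-efficient pairing.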
Source journal: Computer Graphics World (Engineering & Technology - Computer Science: Software Engineering)
CiteScore: 0.03
Self-citation rate: 0.00%
Articles published: 0
Review time: >12 weeks
Latest articles in this journal:
TARig: Adaptive template-aware neural rigging for humanoid characters
Numerical approximations for energy preserving microfacet models
Image super-resolution with multi-scale fractal residual attention network
An overview on Meta-learning approaches for Few-shot Weakly-supervised Segmentation
Omnidirectional visual computing: Foundations, challenges, and applications