Evaluating Large Language Models with RAG Capability: A Perspective from Robot Behavior Planning and Execution

Jin Yamanaka, Takashi Kido
{"title":"Evaluating Large Language Models with RAG Capability: A Perspective from Robot Behavior Planning and Execution","authors":"Jin Yamanaka, Takashi Kido","doi":"10.1609/aaaiss.v3i1.31254","DOIUrl":null,"url":null,"abstract":"After the significant performance of Large Language Models (LLMs) was revealed, their capabilities were rapidly expanded with techniques such as Retrieval Augmented Generation (RAG). Given their broad applicability and fast development, it's crucial to consider their impact on social systems. On the other hand, assessing these advanced LLMs poses challenges due to their extensive capabilities and the complex nature of social systems.\n\nIn this study, we pay attention to the similarity between LLMs in social systems and humanoid robots in open environments. We enumerate the essential components required for controlling humanoids in problem solving which help us explore the core capabilities of LLMs and assess the effects of any deficiencies within these components. This approach is justified because the effectiveness of humanoid systems has been thoroughly proven and acknowledged. To identify needed components for humanoids in problem-solving tasks, we create an extensive component framework for planning and controlling humanoid robots in an open environment. Then assess the impacts and risks of LLMs for each component, referencing the latest benchmarks to evaluate their current strengths and weaknesses. Following the assessment guided by our framework, we identified certain capabilities that LLMs lack and concerns in social systems.","PeriodicalId":516827,"journal":{"name":"Proceedings of the AAAI Symposium Series","volume":"83 7","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the AAAI Symposium Series","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1609/aaaiss.v3i1.31254","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

After the significant performance of Large Language Models (LLMs) was revealed, their capabilities were rapidly expanded with techniques such as Retrieval Augmented Generation (RAG). Given their broad applicability and fast development, it's crucial to consider their impact on social systems. On the other hand, assessing these advanced LLMs poses challenges due to their extensive capabilities and the complex nature of social systems. In this study, we pay attention to the similarity between LLMs in social systems and humanoid robots in open environments. We enumerate the essential components required for controlling humanoids in problem solving which help us explore the core capabilities of LLMs and assess the effects of any deficiencies within these components. This approach is justified because the effectiveness of humanoid systems has been thoroughly proven and acknowledged. To identify needed components for humanoids in problem-solving tasks, we create an extensive component framework for planning and controlling humanoid robots in an open environment. Then assess the impacts and risks of LLMs for each component, referencing the latest benchmarks to evaluate their current strengths and weaknesses. Following the assessment guided by our framework, we identified certain capabilities that LLMs lack and concerns in social systems.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
评估具有 RAG 功能的大型语言模型:机器人行为规划与执行视角
大语言模型(LLM)的显著性能被揭示出来后,其功能通过检索增强生成(RAG)等技术得到了迅速扩展。鉴于其广泛的适用性和快速的发展,考虑其对社会系统的影响至关重要。在本研究中,我们关注了社会系统中的 LLM 与开放环境中的仿人机器人之间的相似性。我们列举了在解决问题过程中控制仿人机器人所需的基本组件,这有助于我们探索 LLM 的核心能力,并评估这些组件中任何不足之处的影响。这种方法是合理的,因为仿人系统的有效性已得到充分证明和认可。为了确定仿人机器人在解决问题任务中所需的组件,我们创建了一个广泛的组件框架,用于在开放环境中规划和控制仿人机器人。然后,参考最新基准评估每个组件的 LLM 影响和风险,以评估其当前的优缺点。在我们的框架指导下进行评估后,我们确定了 LLMs 所缺乏的某些能力以及社会系统中存在的问题。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Modes of Tracking Mal-Info in Social Media with AI/ML Tools to Help Mitigate Harmful GenAI for Improved Societal Well Being Embodying Human-Like Modes of Balance Control Through Human-In-the-Loop Dyadic Learning Constructing Deep Concepts through Shallow Search Implications of Identity in AI: Creators, Creations, and Consequences ASMR: Aggregated Semantic Matching Retrieval Unleashing Commonsense Ability of LLM through Open-Ended Question Answering
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1