EmBARDiment: an Embodied AI Agent for Productivity in XR
Riccardo Bovo, Steven Abreu, Karan Ahuja, Eric J Gonzalez, Li-Te Cheng, Mar Gonzalez-Franco
arXiv - CS - Multiagent Systems · 2024-08-15 · arXiv-2408.08158
Abstract
XR devices running chatbots powered by Large Language Models (LLMs) have tremendous potential as always-on agents that can enable much better productivity scenarios. However, screen-based chatbots do not take advantage of the full suite of natural inputs available in XR, including inward-facing sensor data; instead, they over-rely on explicit voice or text prompts, sometimes paired with multi-modal data dropped in as part of the query. We propose a solution that leverages an attention framework to derive context implicitly from user actions, eye-gaze, and contextual memory within the XR environment. This minimizes the need for engineered explicit prompts, fostering grounded and intuitive interactions that glean user insights for the chatbot. Our user studies demonstrate the feasibility and transformative potential of our approach to streamline user interaction with chatbots in XR, while offering insights for the design of future XR-embodied LLM agents.
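
To make the idea concrete, here is a minimal, hypothetical sketch (in Python) of how an attention framework might fold eye-gaze targets and a rolling contextual memory into an implicit prompt, so a short spoken query needs no engineered context. All names here (GazeSample, ContextMemory, build_implicit_prompt, the dwell threshold) are illustrative assumptions, not the paper's actual implementation.

```python
from collections import deque
from dataclasses import dataclass

# Illustrative sketch only: names, thresholds, and structure are assumptions,
# not the EmBARDiment implementation described in the paper.

@dataclass
class GazeSample:
    target_id: str    # UI element or object the user is looking at
    text: str         # visible text content of that element
    dwell_ms: float   # how long gaze rested on it

class ContextMemory:
    """Rolling memory of recently attended content in the XR scene."""
    def __init__(self, max_items: int = 10):
        self.items = deque(maxlen=max_items)

    def update(self, sample: GazeSample, dwell_threshold_ms: float = 300.0):
        # Only content the user actually attended to (dwell above a threshold)
        # is promoted into contextual memory.
        if sample.dwell_ms >= dwell_threshold_ms:
            self.items.append(f"[{sample.target_id}] {sample.text}")

    def as_context(self) -> str:
        return "\n".join(self.items)

def build_implicit_prompt(memory: ContextMemory, user_utterance: str) -> str:
    """Fuse implicitly gathered context with the user's short, unengineered query."""
    return (
        "Recently attended content in the user's XR workspace:\n"
        f"{memory.as_context()}\n\n"
        f"User says: {user_utterance}\n"
        "Answer grounded in the attended content."
    )

# Example: the user glances at a document panel, then simply asks
# "summarize this" -- the gaze-derived context disambiguates "this".
memory = ContextMemory()
memory.update(GazeSample("doc_panel_3", "Q3 revenue grew 12%...", dwell_ms=850))
prompt = build_implicit_prompt(memory, "summarize this")
print(prompt)
```

The design choice this sketch illustrates is that the chatbot query carries context the user never typed: attention (here approximated by gaze dwell) decides what enters memory, and memory, rather than prompt engineering, grounds the LLM's answer.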