Prompting Robotic Modalities (PRM): A structured architecture for centralizing language models in complex systems

Future Generation Computer Systems-The International Journal of Escience (IF 6.2; Zone 2, Computer Science; JCR Q1, COMPUTER SCIENCE, THEORY & METHODS). Pub Date: 2025-05-01. Epub Date: 2025-01-31. DOI: 10.1016/j.future.2025.107723
Bilel Benjdira, Anis Koubaa, Anas M. Ali
Volume 166, Article 107723. Journal Article. https://www.sciencedirect.com/science/article/pii/S0167739X25000184

Abstract

Despite significant advancements in robotics and AI, existing systems often struggle to integrate diverse modalities (e.g., image, sound, actuator data) into a unified framework, resulting in fragmented architectures that limit adaptability, scalability, and explainability. To address these gaps, this paper introduces Prompting Robotic Modalities (PRM), a novel architecture that centralizes language models for controlling and managing complex systems through natural language. In PRM, each system modality (e.g., image, sound, actuator) is handled independently by a Modality Language Model (MLM), while a central Task Modality, powered by a Large Language Model (LLM), orchestrates complex tasks using information from the MLMs. Each MLM is trained on datasets that pair modality-specific data with rich textual descriptions, enabling intuitive, language-based interaction. We validate PRM with two main contributions: (1) ROSGPT_Vision, a new open-source ROS 2 package (available at https://github.com/bilel-bj/ROSGPT_Vision) for visual modality tasks, which achieves up to 66% classification accuracy in driver-focus monitoring, surpassing the other tested models in its category; and (2) CarMate, a driver-distraction detection application that significantly reduces development time and cost by allowing rapid adaptation to new monitoring tasks via simple prompt adjustments. In addition, we develop a Navigation Language Model (NLM) that converts free-form natural-language instructions into detailed ROS commands, underscoring PRM’s modality-agnostic adaptability. Experimental results demonstrate that PRM simplifies system development, outperforms baseline vision-language approaches in specialized tasks (e.g., driver monitoring), reduces complexity through prompt engineering rather than extensive coding, and enhances explainability via natural-language-based diagnostics. Hence, PRM lays a promising foundation for next-generation complex and robotic systems by integrating advanced language model capabilities at their core, making them more adaptable to new environments, cost-effective, and user-friendly.
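
The division of labor described in the abstract (one MLM per modality, each reporting in text, and a central Task Modality that reasons over those reports) can be illustrated with a minimal Python sketch. This is not the authors' implementation and does not use the ROSGPT_Vision API: the class names `ModalityLanguageModel` and `TaskModality`, the `run_cycle` helper, and the two stub models are all hypothetical, and sensor input is represented as plain text so the example runs without ROS 2 or any model service.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumption: any model is reduced to a callable from prompt text to reply text.
LanguageModel = Callable[[str], str]


@dataclass
class ModalityLanguageModel:
    """Hypothetical MLM wrapper: one modality behind a text-in, text-out interface."""
    name: str
    prompt: str           # task-specific instruction; editing it retargets the modality
    model: LanguageModel  # stand-in for a real vision or audio language model

    def describe(self, raw_input: str) -> str:
        # Combine the modality prompt with the (here textual) sensor payload.
        return self.model(f"{self.prompt}\nINPUT: {raw_input}")


@dataclass
class TaskModality:
    """Hypothetical central orchestrator that reasons over the MLM descriptions."""
    task_prompt: str
    llm: LanguageModel

    def decide(self, descriptions: Dict[str, str]) -> str:
        # Sort by modality name so the assembled prompt is deterministic.
        report = "\n".join(f"[{k}] {v}" for k, v in sorted(descriptions.items()))
        return self.llm(f"{self.task_prompt}\n{report}")


def run_cycle(mlms: List[ModalityLanguageModel], task: TaskModality,
              inputs: Dict[str, str]) -> str:
    """Run one describe-then-decide cycle over all registered modalities."""
    descriptions = {m.name: m.describe(inputs[m.name]) for m in mlms}
    return task.decide(descriptions)


# Stub models (assumptions) so the sketch runs without any external service.
def fake_vision_model(prompt: str) -> str:
    frame = prompt.rsplit("INPUT: ", 1)[-1]
    state = "distracted" if "phone" in frame else "focused"
    return f"The driver is {state}."


def fake_llm(prompt: str) -> str:
    report = prompt.split("\n", 1)[1]  # everything after the task prompt line
    return "ALERT" if "distracted" in report else "OK"


vision = ModalityLanguageModel(
    "image", "Describe whether the driver is focused.", fake_vision_model)
task = TaskModality(
    "Reply ALERT if any modality reports a hazard, otherwise OK.", fake_llm)
print(run_cycle([vision], task, {"image": "frame: hand holding phone"}))
```

In this sketch, adapting to a new monitoring task means changing only the `prompt` and `task_prompt` strings, which mirrors the prompt-adjustment idea the abstract attributes to CarMate. A real system would replace the stubs with trained models and ROS 2 topics.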

Source journal
CiteScore: 19.90
Self-citation rate: 2.70%
Articles published: 376
Review time: 10.6 months
Journal description: Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications. Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration. Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.