Multimodal Perception and Decision-Making Systems for Complex Roads Based on Foundation Models

IF 8.6 | Zone 1, Computer Science | Q1, AUTOMATION & CONTROL SYSTEMS | IEEE Transactions on Systems Man Cybernetics-Systems | Pub Date: 2024-10-07 | DOI: 10.1109/TSMC.2024.3444277

Lili Fan;Yutong Wang;Hui Zhang;Changxian Zeng;Yunjie Li;Chao Gou;Hui Yu

IEEE Transactions on Systems Man Cybernetics-Systems, vol. 54, no. 11, pp. 6561-6569. Journal Article. https://ieeexplore.ieee.org/document/10706115/

Citations: 0

Abstract

Since the inception of Industry 5.0 in 2021, a growing number of researchers have turned their attention to the revolutionary shift it brings. The principles of Industry 5.0, including human-centricity, sustainability, and an emphasis on ecological and social values, will become the new paradigm for future industrial development. In this transformative landscape, artificial intelligence (AI) plays a pivotal role, and foundation models based on ChatGPT are set to reshape the organizational structure of industries. In this article, we introduce a multimodal perception and decision-making system built upon a foundation model. This system integrates image and point cloud data to enhance perception accuracy and provide ample information for decision making. It is designed to achieve a deep integration of AI and human-centric autonomous driving within the context of Industry 5.0. We introduce a cross-domain learning approach in the system architecture, along with a model training method from foundation models to handle complex road conditions. The proposed method enables drivable area segmentation on complex unstructured roads. To address the increased variance caused by the residual structure employed in previous works, this article introduces a distribution correction module, which effectively mitigates this problem. Furthermore, to achieve high-performance perception in intricate road scenarios, we put forth a multimodal perception fusion method. The experiments demonstrate the superiority of this approach over single-sensor perception. This work contributes to the ongoing discourse on the convergence of AI, human-centric values, and advanced driving systems within the framework of Industry 5.0.




Source journal

IEEE Transactions on Systems Man Cybernetics-Systems (AUTOMATION & CONTROL SYSTEMS; COMPUTER SCIENCE, CYBERNETICS)

CiteScore: 18.50
Self-citation rate: 11.50%
Articles published: 812
Review time: 6 months

Journal description: The IEEE Transactions on Systems, Man, and Cybernetics: Systems encompasses the fields of systems engineering, covering issue formulation, analysis, and modeling throughout the systems engineering lifecycle phases. It addresses decision-making, issue interpretation, systems management, processes, and various methods such as optimization, modeling, and simulation in the development and deployment of large systems.
