Multimodal Perception and Decision-Making Systems for Complex Roads Based on Foundation Models
Authors: Lili Fan; Yutong Wang; Hui Zhang; Changxian Zeng; Yunjie Li; Chao Gou; Hui Yu
Journal: IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 54, no. 11, pp. 6561-6569 (JCR Q1, Automation & Control Systems; impact factor 8.6)
Published: 2024-10-07
DOI: 10.1109/TSMC.2024.3444277
URL: https://ieeexplore.ieee.org/document/10706115/
Citations: 0
Abstract
Since the inception of Industry 5.0 in 2021, a growing number of researchers have turned their attention to the shift it brings. The principles of Industry 5.0, including human-centricity, sustainability, and an emphasis on ecological and social values, will become the new paradigm for future industrial development. In this transformative landscape, artificial intelligence (AI) plays a pivotal role, and foundation models such as ChatGPT are set to reshape the organizational structure of industries. In this article, we introduce a multimodal perception and decision-making system built upon a foundation model. The system integrates image and point cloud data to improve perception accuracy and to provide ample information for decision making. It is designed to achieve a deep integration of AI and human-centric autonomous driving within the context of Industry 5.0. We introduce a cross-domain learning approach in the system architecture, along with a model training method derived from foundation models, to handle complex road conditions. The proposed method enables drivable area segmentation on complex unstructured roads. To address the increased variance caused by the residual structure employed in previous works, this article introduces a distribution correction module that effectively mitigates the problem. Furthermore, to achieve high-performance perception in intricate road scenarios, we propose a multimodal perception fusion method. Experiments demonstrate the superiority of this approach over single-sensor perception. This work contributes to the ongoing discourse on the convergence of AI, human-centric values, and advanced driving systems within the framework of Industry 5.0.
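The abstract's remark that a residual structure increases variance can be illustrated with a toy calculation. The sketch below is not the paper's distribution correction module, whose design the abstract does not describe. It assumes two independent, unit-variance branches and uses plain standardization as a stand-in correction step; all function and variable names are hypothetical.

```python
import random
import statistics


def residual_sum(x, fx):
    """Element-wise residual connection: y = x + f(x)."""
    return [a + b for a, b in zip(x, fx)]


def standardize(values, eps=1e-8):
    """Rescale to zero mean and unit variance (stand-in correction, an assumption)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / (std + eps) for v in values]


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]   # identity branch, variance close to 1
fx = [rng.gauss(0.0, 1.0) for _ in range(n)]  # hypothetical residual branch, independent of x

y = residual_sum(x, fx)
y_corrected = standardize(y)

var_x = statistics.pvariance(x)
var_y = statistics.pvariance(y)
var_corrected = statistics.pvariance(y_corrected)
print(f"var(x)={var_x:.3f}  var(x+f(x))={var_y:.3f}  corrected={var_corrected:.3f}")
```

For independent branches the variances add, so the sum has roughly twice the variance of its input, and stacking several such blocks compounds the growth. Rescaling after the addition is one simple way to bring the distribution back to a fixed scale.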
About the journal:
The IEEE Transactions on Systems, Man, and Cybernetics: Systems encompasses the fields of systems engineering, covering issue formulation, analysis, and modeling throughout the systems engineering lifecycle phases. It addresses decision-making, issue interpretation, systems management, processes, and various methods such as optimization, modeling, and simulation in the development and deployment of large systems.