{"title":"Cross-Modal Collaboration and Robust Feature Classifier for Open-Vocabulary 3D Object Detection.","authors":"Hengsong Liu, Tongle Duan","doi":"10.3390/s25020553","DOIUrl":null,"url":null,"abstract":"<p><p>The multi-sensor fusion, such as LiDAR and camera-based 3D object detection, is a key technology in autonomous driving and robotics. However, traditional 3D detection models are limited to recognizing predefined categories and struggle with unknown or novel objects. Given the complexity of real-world environments, research into open-vocabulary 3D object detection is essential. Therefore, this paper aims to address two key issues in this area: how to localize and classify novel objects. We propose Cross-modal Collaboration and Robust Feature Classifier to improve localization accuracy and classification robustness for novel objects. The Cross-modal Collaboration involves the collaborative localization between LiDAR and camera. In this approach, 2D images provide preliminary regions of interest for novel objects in the 3D point cloud, while the 3D point cloud offers more precise positional information to the 2D images. Through iterative updates between two modalities, the preliminary region and positional information are refined, achieving the accurate localization of novel objects. The Robust Feature Classifier aims to accurately classify novel objects. To prevent them from being misidentified as background or other incorrect categories, this method maps the semantic vectors of new categories into multiple sets of visual features distinguished from the background. And it clusters these visual features based on each individual semantic vector to maintain inter-class separability. Our method achieves state-of-the-art performance on various scenarios and datasets.</p>","PeriodicalId":21698,"journal":{"name":"Sensors","volume":"25 2","pages":""},"PeriodicalIF":3.4000,"publicationDate":"2025-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11768536/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Sensors","FirstCategoryId":"103","ListUrlMain":"https://doi.org/10.3390/s25020553","RegionNum":3,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"CHEMISTRY, ANALYTICAL","Score":null,"Total":0}
Citations: 0
Abstract
Multi-sensor fusion, such as LiDAR- and camera-based 3D object detection, is a key technology in autonomous driving and robotics. However, traditional 3D detection models are limited to recognizing predefined categories and struggle with unknown or novel objects. Given the complexity of real-world environments, research into open-vocabulary 3D object detection is essential, and this paper addresses two key issues in that area: how to localize and how to classify novel objects. We propose Cross-Modal Collaboration and a Robust Feature Classifier to improve localization accuracy and classification robustness for novel objects. Cross-Modal Collaboration performs collaborative localization between the LiDAR and the camera: 2D images provide preliminary regions of interest for novel objects in the 3D point cloud, while the 3D point cloud returns more precise positional information to the 2D images. Through iterative updates between the two modalities, the preliminary regions and positional information are progressively refined, yielding accurate localization of novel objects. The Robust Feature Classifier aims to classify novel objects accurately. To prevent them from being misidentified as background or assigned to incorrect categories, it maps the semantic vector of each new category to multiple sets of visual features that are distinguished from the background, and clusters these visual features around their corresponding semantic vector to maintain inter-class separability. Our method achieves state-of-the-art performance across various scenarios and datasets.
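To make the iterative localization loop concrete, the sketch below implements one plausible reading of the abstract's 2D-to-3D collaboration: a 2D detection seeds a frustum of LiDAR points, a 3D box is fitted to those points, and the box is projected back to tighten the 2D region. Everything here (the pinhole projection, the axis-aligned box fit, and all function names) is an assumption made for illustration, not the paper's implementation.

```python
# Minimal sketch of the cross-modal collaboration loop, assuming a pinhole
# camera model and an axis-aligned stand-in for the 3D box fit.
import numpy as np

def project_to_image(points, P):
    """Pinhole projection of (N, 3) points with a 3x4 camera matrix P -> (N, 2) pixels."""
    homo = np.hstack([points, np.ones((len(points), 1))])
    uvw = homo @ P.T
    return uvw[:, :2] / uvw[:, 2:3]

def frustum_points(points, box_2d, P, margin=0.1):
    """Keep LiDAR points whose projection lands inside a slightly padded 2D box."""
    x1, y1, x2, y2 = box_2d
    w, h = x2 - x1, y2 - y1
    x1, y1, x2, y2 = x1 - margin * w, y1 - margin * h, x2 + margin * w, y2 + margin * h
    uv = project_to_image(points, P)
    mask = (uv[:, 0] >= x1) & (uv[:, 0] <= x2) & (uv[:, 1] >= y1) & (uv[:, 1] <= y2)
    return points[mask]

def fit_3d_box(points):
    """Stand-in box fit: axis-aligned bounds (min_xyz, max_xyz) of the RoI points."""
    return points.min(axis=0), points.max(axis=0)

def box_to_2d(box_3d, P):
    """Project the 8 corners of an axis-aligned 3D box back to a tight 2D box."""
    lo, hi = box_3d
    corners = np.array([[x, y, z] for x in (lo[0], hi[0])
                                  for y in (lo[1], hi[1])
                                  for z in (lo[2], hi[2])])
    uv = project_to_image(corners, P)
    return uv[:, 0].min(), uv[:, 1].min(), uv[:, 0].max(), uv[:, 1].max()

def collaborative_localization(points, box_2d, P, iters=3):
    """Iteratively refine: 2D box -> 3D RoI points -> 3D box -> updated 2D box."""
    box_3d = None
    for _ in range(iters):
        roi = frustum_points(points, box_2d, P)
        if len(roi) < 5:            # too few points to fit a meaningful box
            break
        box_3d = fit_3d_box(roi)
        box_2d = box_to_2d(box_3d, P)
    return box_3d, box_2d
```

In a real system the box fit would come from a learned 3D detector head rather than axis-aligned bounds, but the alternation pattern is the point: each modality's estimate constrains the other's search space on the next pass.

The classifier side can be sketched the same way. Reading the abstract literally, each novel category's semantic vector is mapped to several visual prototypes held apart from a dedicated background bank, and region features are scored against all of them. The PyTorch module below is a hypothetical instantiation of that idea; all names, dimensions, and the max-over-prototypes scoring are invented for the example.

```python
# Illustrative classifier: one semantic vector -> K visual prototypes,
# scored alongside explicit background prototypes.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RobustFeatureClassifier(nn.Module):
    def __init__(self, sem_dim, vis_dim, k_prototypes=4, n_background=8):
        super().__init__()
        # Maps one semantic (text) vector to K visual-feature prototypes.
        self.sem_to_vis = nn.Linear(sem_dim, k_prototypes * vis_dim)
        # Background prototypes kept distinct from the category prototypes.
        self.background = nn.Parameter(torch.randn(n_background, vis_dim))
        self.k, self.vis_dim = k_prototypes, vis_dim

    def prototypes(self, sem_vectors):
        """(C, sem_dim) semantic vectors -> (C, K, vis_dim) unit-norm prototypes."""
        p = self.sem_to_vis(sem_vectors).view(-1, self.k, self.vis_dim)
        return F.normalize(p, dim=-1)

    def forward(self, feats, sem_vectors):
        """Score (N, vis_dim) region features against C categories plus background."""
        feats = F.normalize(feats, dim=-1)
        protos = self.prototypes(sem_vectors)                    # (C, K, D)
        cls_sim = torch.einsum('nd,ckd->nck', feats, protos)     # (N, C, K)
        cls_score = cls_sim.max(dim=-1).values                   # best prototype per class
        bg_sim = feats @ F.normalize(self.background, dim=-1).T  # (N, n_background)
        bg_score = bg_sim.max(dim=-1, keepdim=True).values       # (N, 1)
        return torch.cat([cls_score, bg_score], dim=1)           # (N, C + 1)
```

At inference the predicted class is the argmax over the C + 1 scores, so a novel object must explicitly beat the background prototypes rather than falling into background by default, which matches the failure mode the abstract says the classifier is designed to prevent.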
About the Journal
Sensors (ISSN 1424-8220) provides an advanced forum for the science and technology of sensors and biosensors. It publishes reviews (including comprehensive reviews of complete sensor products), regular research papers, and short notes. Our aim is to encourage scientists to publish their experimental and theoretical results in as much detail as possible. There is no restriction on the length of papers. Full experimental details must be provided so that the results can be reproduced.