{"title":"Logit prototype learning with active multimodal representation for robust open-set recognition","authors":"Yimin Fu, Zhunga Liu, Zicheng Wang","doi":"10.1007/s11432-023-3924-x","DOIUrl":null,"url":null,"abstract":"<p>Robust open-set recognition (OSR) performance has become a prerequisite for pattern recognition systems in real-world applications. However, the existing OSR methods are primarily implemented on the basis of single-modal perception, and their performance is limited when single-modal data fail to provide sufficient descriptions of the objects. Although multimodal data can provide more comprehensive information than single-modal data, the learning of decision boundaries can be affected by the feature representation gap between different modalities. To effectively integrate multimodal data for robust OSR performance, we propose logit prototype learning (LPL) with active multimodal representation. In LPL, the input multimodal data are transformed into the logit space, enabling a direct exploration of intermodal correlations without the impact of scale inconsistency. Then, the fusion weights of each modality are determined using an entropybased uncertainty estimation method. This approach realizes adaptive adjustment of the fusion strategy to provide comprehensive descriptions in the presence of external disturbances. Moreover, the single-modal and multimodal representations are jointly optimized interactively to learn discriminative decision boundaries. Finally, a stepwise recognition rule is employed to reduce the misclassification risk and facilitate the distinction between known and unknown classes. Extensive experiments on three multimodal datasets have been done to demonstrate the effectiveness of the proposed method.</p>","PeriodicalId":21618,"journal":{"name":"Science China Information Sciences","volume":"8 1","pages":""},"PeriodicalIF":7.3000,"publicationDate":"2024-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Science China Information Sciences","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s11432-023-3924-x","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Abstract
Robust open-set recognition (OSR) performance has become a prerequisite for pattern recognition systems in real-world applications. However, existing OSR methods rely primarily on single-modal perception, and their performance is limited when single-modal data fail to provide sufficient descriptions of the objects. Although multimodal data can provide more comprehensive information than single-modal data, the learning of decision boundaries can be affected by the feature representation gap between different modalities. To effectively integrate multimodal data for robust OSR performance, we propose logit prototype learning (LPL) with active multimodal representation. In LPL, the input multimodal data are transformed into the logit space, enabling a direct exploration of intermodal correlations without the impact of scale inconsistency. Then, the fusion weight of each modality is determined using an entropy-based uncertainty estimation method. This allows the fusion strategy to adapt so as to provide comprehensive descriptions in the presence of external disturbances. Moreover, the single-modal and multimodal representations are jointly and interactively optimized to learn discriminative decision boundaries. Finally, a stepwise recognition rule is employed to reduce the misclassification risk and facilitate the distinction between known and unknown classes. Extensive experiments on three multimodal datasets demonstrate the effectiveness of the proposed method.
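To make the two computational ideas in the abstract more concrete, below is a minimal sketch of entropy-based fusion weighting and a stepwise known/unknown recognition rule. Everything in it is an illustrative assumption rather than the paper's actual method: the exact formulas, prototype construction, and threshold are not given in this abstract, and the function names (entropy_fusion_weights, stepwise_recognize), the prototype matrix, and the parameter tau are hypothetical placeholders.

import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy_fusion_weights(logits_per_modality):
    # Hypothetical entropy-based weighting: a modality whose softmax
    # distribution has lower entropy (i.e., is more confident) receives
    # a larger fusion weight. Illustrative only, not the paper's rule.
    entropies = []
    for logits in logits_per_modality:
        p = softmax(logits)
        entropies.append(-(p * np.log(p + 1e-12)).sum(axis=-1))
    h = np.stack(entropies)            # shape: (modalities, batch)
    conf = np.exp(-h)                  # low entropy -> high confidence
    return conf / conf.sum(axis=0, keepdims=True)

def stepwise_recognize(fused_logits, prototypes, tau):
    # Illustrative two-step rule: (1) assign each sample to its nearest
    # class prototype in logit space; (2) reject it as unknown (-1) when
    # the distance to that prototype exceeds the threshold tau.
    d = np.linalg.norm(fused_logits[:, None, :] - prototypes[None, :, :], axis=-1)
    nearest = d.argmin(axis=1)
    return np.where(d.min(axis=1) <= tau, nearest, -1)

# Usage with made-up data: two modalities, 4 samples, 5 known classes.
rgb_logits = np.random.randn(4, 5)
depth_logits = np.random.randn(4, 5)
w = entropy_fusion_weights([rgb_logits, depth_logits])   # (2, 4)
fused = w[0][:, None] * rgb_logits + w[1][:, None] * depth_logits
prototypes = np.random.randn(5, 5)                       # one per known class
labels = stepwise_recognize(fused, prototypes, tau=3.0)  # -1 marks "unknown"

The design intuition being illustrated: weighting by exp(-entropy) shrinks the influence of a modality whose prediction is nearly uniform (e.g., under sensor noise), matching the abstract's claim of adaptive fusion under external disturbances, while the distance threshold implements the rejection step of a stepwise known/unknown rule.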
About the journal:
Science China Information Sciences is a dedicated journal that showcases high-quality, original research across various domains of information sciences. It encompasses Computer Science & Technologies, Control Science & Engineering, Information & Communication Engineering, Microelectronics & Solid-State Electronics, and Quantum Information, providing a platform for the dissemination of significant contributions in these fields.