利用主动多模态表示的对数原型学习,实现稳健的开放集识别

IF 7.3 2区 计算机科学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Science China Information Sciences Pub Date : 2024-05-17 DOI:10.1007/s11432-023-3924-x
Yimin Fu, Zhunga Liu, Zicheng Wang
{"title":"利用主动多模态表示的对数原型学习,实现稳健的开放集识别","authors":"Yimin Fu, Zhunga Liu, Zicheng Wang","doi":"10.1007/s11432-023-3924-x","DOIUrl":null,"url":null,"abstract":"<p>Robust open-set recognition (OSR) performance has become a prerequisite for pattern recognition systems in real-world applications. However, the existing OSR methods are primarily implemented on the basis of single-modal perception, and their performance is limited when single-modal data fail to provide sufficient descriptions of the objects. Although multimodal data can provide more comprehensive information than single-modal data, the learning of decision boundaries can be affected by the feature representation gap between different modalities. To effectively integrate multimodal data for robust OSR performance, we propose logit prototype learning (LPL) with active multimodal representation. In LPL, the input multimodal data are transformed into the logit space, enabling a direct exploration of intermodal correlations without the impact of scale inconsistency. Then, the fusion weights of each modality are determined using an entropybased uncertainty estimation method. This approach realizes adaptive adjustment of the fusion strategy to provide comprehensive descriptions in the presence of external disturbances. Moreover, the single-modal and multimodal representations are jointly optimized interactively to learn discriminative decision boundaries. Finally, a stepwise recognition rule is employed to reduce the misclassification risk and facilitate the distinction between known and unknown classes. Extensive experiments on three multimodal datasets have been done to demonstrate the effectiveness of the proposed method.</p>","PeriodicalId":21618,"journal":{"name":"Science China Information Sciences","volume":null,"pages":null},"PeriodicalIF":7.3000,"publicationDate":"2024-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Logit prototype learning with active multimodal representation for robust open-set recognition\",\"authors\":\"Yimin Fu, Zhunga Liu, Zicheng Wang\",\"doi\":\"10.1007/s11432-023-3924-x\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Robust open-set recognition (OSR) performance has become a prerequisite for pattern recognition systems in real-world applications. However, the existing OSR methods are primarily implemented on the basis of single-modal perception, and their performance is limited when single-modal data fail to provide sufficient descriptions of the objects. Although multimodal data can provide more comprehensive information than single-modal data, the learning of decision boundaries can be affected by the feature representation gap between different modalities. To effectively integrate multimodal data for robust OSR performance, we propose logit prototype learning (LPL) with active multimodal representation. In LPL, the input multimodal data are transformed into the logit space, enabling a direct exploration of intermodal correlations without the impact of scale inconsistency. Then, the fusion weights of each modality are determined using an entropybased uncertainty estimation method. This approach realizes adaptive adjustment of the fusion strategy to provide comprehensive descriptions in the presence of external disturbances. Moreover, the single-modal and multimodal representations are jointly optimized interactively to learn discriminative decision boundaries. Finally, a stepwise recognition rule is employed to reduce the misclassification risk and facilitate the distinction between known and unknown classes. Extensive experiments on three multimodal datasets have been done to demonstrate the effectiveness of the proposed method.</p>\",\"PeriodicalId\":21618,\"journal\":{\"name\":\"Science China Information Sciences\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":7.3000,\"publicationDate\":\"2024-05-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Science China Information Sciences\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1007/s11432-023-3924-x\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Science China Information Sciences","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s11432-023-3924-x","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0

摘要

强大的开放集识别(OSR)性能已成为实际应用中模式识别系统的先决条件。然而,现有的开放集识别方法主要是基于单模态感知实现的,当单模态数据无法提供足够的物体描述时,其性能就会受到限制。虽然多模态数据能提供比单模态数据更全面的信息,但决策边界的学习会受到不同模态之间特征表示差距的影响。为了有效地整合多模态数据,实现稳健的 OSR 性能,我们提出了具有主动多模态表征的 logit 原型学习(LPL)。在 LPL 中,输入的多模态数据被转换到 logit 空间,从而可以直接探索模态间的相关性,而不会受到尺度不一致的影响。然后,使用基于熵的不确定性估计方法确定每种模态的融合权重。这种方法实现了融合策略的自适应调整,从而在存在外部干扰的情况下提供全面的描述。此外,对单模态和多模态表征进行交互式联合优化,以学习判别决策边界。最后,采用逐步识别规则来降低误分类风险,并促进已知和未知类别之间的区分。我们在三个多模态数据集上进行了广泛的实验,以证明所提方法的有效性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Logit prototype learning with active multimodal representation for robust open-set recognition

Robust open-set recognition (OSR) performance has become a prerequisite for pattern recognition systems in real-world applications. However, the existing OSR methods are primarily implemented on the basis of single-modal perception, and their performance is limited when single-modal data fail to provide sufficient descriptions of the objects. Although multimodal data can provide more comprehensive information than single-modal data, the learning of decision boundaries can be affected by the feature representation gap between different modalities. To effectively integrate multimodal data for robust OSR performance, we propose logit prototype learning (LPL) with active multimodal representation. In LPL, the input multimodal data are transformed into the logit space, enabling a direct exploration of intermodal correlations without the impact of scale inconsistency. Then, the fusion weights of each modality are determined using an entropybased uncertainty estimation method. This approach realizes adaptive adjustment of the fusion strategy to provide comprehensive descriptions in the presence of external disturbances. Moreover, the single-modal and multimodal representations are jointly optimized interactively to learn discriminative decision boundaries. Finally, a stepwise recognition rule is employed to reduce the misclassification risk and facilitate the distinction between known and unknown classes. Extensive experiments on three multimodal datasets have been done to demonstrate the effectiveness of the proposed method.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Science China Information Sciences
Science China Information Sciences COMPUTER SCIENCE, INFORMATION SYSTEMS-
CiteScore
12.60
自引率
5.70%
发文量
224
审稿时长
8.3 months
期刊介绍: Science China Information Sciences is a dedicated journal that showcases high-quality, original research across various domains of information sciences. It encompasses Computer Science & Technologies, Control Science & Engineering, Information & Communication Engineering, Microelectronics & Solid-State Electronics, and Quantum Information, providing a platform for the dissemination of significant contributions in these fields.
期刊最新文献
Weighted sum power maximization for STAR-RIS-aided SWIPT systems with nonlinear energy harvesting TSCompiler: efficient compilation framework for dynamic-shape models NeurDB: an AI-powered autonomous data system State and parameter identification of linearized water wave equation via adjoint method An STP look at logical blocking of finite state machines: formulation, detection, and search
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1