Task-Augmented Cross-View Imputation Network for Partial Multi-View Incomplete Multi-Label Classification

arXiv - CS - Computer Vision and Pattern Recognition Pub Date : 2024-09-12 DOI:arxiv-2409.07931

Xiaohuan Lu, Lian Zhao, Wai Keung Wong, Jie Wen, Jiang Long, Wulin Xie


Citations: 0

Abstract

Task-Augmented Cross-View Imputation Network for Partial Multi-View Incomplete Multi-Label Classification

In real-world scenarios, multi-view multi-label learning often encounters the challenge of incomplete training data due to limitations in data collection and unreliable annotation processes. The absence of multi-view features impairs the comprehensive understanding of samples, omitting crucial details essential for classification. To address this issue, we present a task-augmented cross-view imputation network (TACVI-Net) to handle partial multi-view incomplete multi-label classification. Specifically, we employ a two-stage network to derive highly task-relevant features and use them to recover the missing views. In the first stage, we leverage information bottleneck theory to obtain a discriminative representation of each view by extracting task-relevant information through a view-specific encoder-classifier architecture. In the second stage, an autoencoder-based multi-view reconstruction network is used to extract high-level semantic representations of the augmented features and recover the missing data, thereby aiding the final classification task. Extensive experiments on five datasets demonstrate that TACVI-Net outperforms other state-of-the-art methods.
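To make the problem setting concrete, the following is a minimal sketch of how partial views and incomplete labels are commonly represented with binary masks. It is not the TACVI-Net implementation: the abstract does not give the loss or the fusion rule, so the masked binary cross-entropy and the simple averaging of available views are assumptions chosen for illustration, and all names (`masked_bce`, `fuse_available_views`, the sample values) are hypothetical.

```python
import math


def masked_bce(probs, labels, label_mask):
    """Binary cross-entropy averaged over the known labels only."""
    eps = 1e-12  # guards against log(0)
    total, known = 0.0, 0
    for p, y, m in zip(probs, labels, label_mask):
        if m:  # skip labels whose annotation is missing
            total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
            known += 1
    return total / known if known else 0.0


def fuse_available_views(view_feats, view_mask):
    """Average per-view feature vectors over the views that are present.

    Assumed stand-in for a learned fusion or imputation step.
    """
    present = [f for f, m in zip(view_feats, view_mask) if m]
    if not present:
        raise ValueError("sample has no available view")
    dim = len(present[0])
    return [sum(f[d] for f in present) / len(present) for d in range(dim)]


# One hypothetical sample: 3 views (the second is missing),
# 4 labels (the last annotation is unknown).
view_feats = [[1.0, 2.0], [0.0, 0.0], [3.0, 4.0]]
view_mask = [1, 0, 1]
labels = [1, 0, 1, 0]
label_mask = [1, 1, 1, 0]
probs = [0.9, 0.2, 0.7, 0.5]  # hypothetical classifier outputs

fused = fuse_available_views(view_feats, view_mask)
loss = masked_bce(probs, labels, label_mask)
```

In this sketch the placeholder features of the missing view and the prediction for the unknown label have no effect on `fused` or `loss`. A method such as the one described above would go further and reconstruct the missing view rather than only skipping it.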
