{"title":"UVaT: Uncertainty Incorporated View-Aware Transformer for Robust Multi-View Classification","authors":"Yapeng Li;Yong Luo;Bo Du","doi":"10.1109/TIP.2024.3451931","DOIUrl":null,"url":null,"abstract":"Existing multi-view classification algorithms usually assume that all examples have observations on all views, and the data in different views are clean. However, in real-world applications, we are often provided with data that have missing representations or contain noise on some views (i.e., missing or noise views). This may lead to significant performance degeneration, and thus many algorithms are proposed to address the incomplete view or noisy view issues. However, most of existing algorithms deal with the two issues separately, and hence may fail when both missing and noisy views exist. They are also usually not flexible in that the view or feature significance cannot be adaptively identified. Besides, the view missing patterns may vary in the training and test phases, and such difference is often ignored. To remedy these drawbacks, we propose a novel multi-view classification framework that is simultaneously robust to both incomplete and noisy views. This is achieved by integrating early fusion and late fusion in a single framework. Specifically, in our early fusion module, we propose a view-aware transformer to mask the missing views and adaptively explore the relationships between views and target tasks to deal with missing views. Considering that view missing patterns may change from the training to the test phase, we also design single-view classification and category-consistency constraints to reduce the dependence of our model on view-missing patterns. In our late fusion module, we quantify the view uncertainty in an ensemble way to estimate the noise level of that view. Then the uncertainty and prediction logits of different views are integrated to make our model robust to noisy views. The framework is trained in an end-to-end manner. Experimental results on diverse datasets demonstrate the robustness and effectiveness of our model for both incomplete and noisy views. Codes are available at \n<uri>https://github.com/li-yapeng/UVaT</uri>\n.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"5129-5143"},"PeriodicalIF":0.0000,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10666988/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Existing multi-view classification algorithms usually assume that every example has observations on all views and that the data in each view are clean. In real-world applications, however, the data often have missing representations or contain noise on some views (i.e., missing or noisy views). This can cause significant performance degradation, and many algorithms have therefore been proposed to address incomplete or noisy views. Most existing algorithms, however, handle the two issues separately and hence may fail when missing and noisy views coexist. They are also typically inflexible in that view or feature significance cannot be adaptively identified. Moreover, the view-missing patterns may differ between the training and test phases, a difference that is often ignored. To remedy these drawbacks, we propose a novel multi-view classification framework that is simultaneously robust to both incomplete and noisy views. This is achieved by integrating early fusion and late fusion in a single framework. Specifically, in our early fusion module, we propose a view-aware transformer that masks the missing views and adaptively explores the relationships between views and target tasks. Because view-missing patterns may change from the training to the test phase, we also design single-view classification and category-consistency constraints to reduce the model's dependence on these patterns. In our late fusion module, we quantify the uncertainty of each view in an ensemble manner to estimate its noise level; the uncertainties and prediction logits of the different views are then integrated to make the model robust to noisy views. The framework is trained end to end. Experimental results on diverse datasets demonstrate the robustness and effectiveness of our model under both incomplete and noisy views. Code is available at https://github.com/li-yapeng/UVaT.
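
To make the two fusion ideas in the abstract concrete, below is a minimal PyTorch sketch of (a) attending over view tokens while masking missing views and (b) down-weighting noisy views by their estimated uncertainty at late fusion. This is an illustration only, not the authors' implementation: the names (ViewAwareEncoder, fuse_logits) and the specific weighting rule (1 - uncertainty, renormalized over observed views) are assumptions; the actual UVaT model is in the linked repository.

# Hypothetical sketch (not the authors' code): masked view-token encoding
# and uncertainty-weighted late fusion for multi-view classification.
import torch
import torch.nn as nn


class ViewAwareEncoder(nn.Module):
    # Early-fusion sketch: a transformer over one token per view, with
    # missing views excluded from attention via the key-padding mask.
    def __init__(self, d_model=256, n_heads=4, n_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, view_tokens, view_mask):
        # view_tokens: (batch, n_views, d_model), one embedded token per view
        # view_mask:   (batch, n_views), True where the view is observed
        # src_key_padding_mask is True at positions attention should ignore.
        return self.encoder(view_tokens, src_key_padding_mask=~view_mask)


def fuse_logits(logits, uncertainty, view_mask):
    # Late-fusion sketch: weight each view's logits by (1 - uncertainty),
    # zeroing out missing views, so noisy views contribute less.
    # logits:      (batch, n_views, n_classes) per-view prediction logits
    # uncertainty: (batch, n_views), estimated noise level in [0, 1]
    # view_mask:   (batch, n_views), True where the view is observed
    confidence = (1.0 - uncertainty) * view_mask.float()
    weights = confidence / confidence.sum(dim=1, keepdim=True).clamp(min=1e-8)
    return (weights.unsqueeze(-1) * logits).sum(dim=1)


if __name__ == "__main__":
    batch, n_views, d_model, n_classes = 4, 3, 256, 10
    tokens = torch.randn(batch, n_views, d_model)
    mask = torch.tensor([[True, True, False]] * batch)  # third view missing
    encoded = ViewAwareEncoder(d_model)(tokens, mask)
    logits = torch.randn(batch, n_views, n_classes)
    unc = torch.rand(batch, n_views)
    fused = fuse_logits(logits, unc, mask)
    print(encoded.shape, fused.shape)  # (4, 3, 256) then (4, 10)

One point the sketch makes visible: because missing views are handled by an attention mask at inference time rather than baked into the architecture, the same trained model can, in principle, accept a different view-missing pattern at test time than it saw during training, which is the scenario the paper's single-view classification and category-consistency constraints are designed to address.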