Title: CSTrans: cross-subdomain transformer for unsupervised domain adaptation
Authors: Junchi Liu, Xiang Zhang, Zhigang Luo
Journal: Complex & Intelligent Systems (JCR Q1, Computer Science, Artificial Intelligence; Impact Factor 5.0)
DOI: 10.1007/s40747-024-01709-4
Published: 2025-01-04 (Journal Article)
Citations: 0
Abstract
Unsupervised domain adaptation (UDA) aims to make full use of labeled source-domain data to classify unlabeled target-domain data. With the success of the Transformer in various vision tasks, existing UDA methods borrow the strong Transformer framework to learn globally domain-invariant feature representations at the domain level or category level. Among its components, cross-attention is key to cross-domain feature alignment, benefiting from its robustness. Intriguingly, we find that this robustness makes the model insensitive to the sub-grouping property within the same category of both source and target domains, known as the subdomain structure. This is because the robustness treats some fine-grained information as noise and removes it. To overcome this shortcoming, we propose an end-to-end Cross-Subdomain Transformer framework (CSTrans) that exploits the transferability of subdomain structures and the robustness of cross-attention to calibrate inter-domain features. Specifically, this paper makes two innovations. First, we devise an efficient Index Matching Module (IMM) to compute cross-attention within the same category across domains and learn domain-invariant representations. This not only simplifies the traditionally daunting image-pair selection but also paves a safer way for preserving fine-grained subdomain information, because the IMM implements reliable feature confusion. Second, we introduce discriminative clustering to mine the subdomain structures within the same category and further learn subdomain discrimination. The two aspects cooperate with each other, requiring fewer training stages. We perform extensive studies on five benchmarks, and the experimental results show that, compared to existing UDA siblings, CSTrans attains remarkable results with average classification accuracies of 94.3%, 92.1%, and 85.4% on the Office-31, ImageCLEF-DA, and Office-Home datasets, respectively.
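The core idea behind the IMM can be illustrated with a minimal NumPy sketch: instead of selecting cross-domain image pairs, target features are matched to source features of the same category by index, and cross-attention is computed only within that class. The function name, shapes, and use of pseudo-labels here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def class_indexed_cross_attention(src_feats, src_labels, tgt_feats, tgt_labels, cls):
    """Sketch of index matching: target features of class `cls` attend over
    source features of the same class, so no explicit image-pair selection
    is needed. `tgt_labels` would be pseudo-labels in the UDA setting."""
    q = tgt_feats[tgt_labels == cls]    # target queries, shape (Nt_c, d)
    kv = src_feats[src_labels == cls]   # source keys/values, shape (Ns_c, d)
    d = q.shape[1]
    attn = softmax(q @ kv.T / np.sqrt(d), axis=-1)  # (Nt_c, Ns_c); rows sum to 1
    return attn @ kv  # target features pulled toward same-class source features
```

In a full framework these attended features would feed a domain-invariance loss, while a separate clustering objective preserves the sub-groupings within each class.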
About the journal
Complex & Intelligent Systems aims to provide a forum for presenting and discussing novel approaches, tools and techniques meant for attaining a cross-fertilization between the broad fields of complex systems, computational simulation, and intelligent analytics and visualization. The transdisciplinary research that the journal focuses on will expand the boundaries of our understanding by investigating the principles and processes that underlie many of the most profound problems facing society today.