Low Rank Adaptation for Stable Domain Adaptation of Vision Transformers

Optical Memory and Neural Networks (IF 1; Q4, OPTICS). Pub Date: 2023-11-28. DOI: 10.3103/S1060992X2306005X

N. Filatov, M. Kindulov

Volume 32 (2), pages S277-S283. Publication type: Journal Article. Full text: https://link.springer.com/article/10.3103/S1060992X2306005X

Citations: 0

Abstract

Unsupervised domain adaptation plays a crucial role in semantic segmentation because annotating data is expensive. Existing approaches often rely on large transformer models and momentum networks to stabilize and improve the self-training process. In this study, we investigate the applicability of low-rank adaptation (LoRA) to domain adaptation in computer vision. Our focus is on unsupervised domain adaptation for semantic segmentation, which requires adapting models from a synthetic dataset (GTA5) to a real-world dataset (Cityscapes). We employ the Swin Transformer as the feature extractor and TransDA as the domain adaptation framework. Through experiments, we demonstrate that LoRA effectively stabilizes the self-training process, achieving training dynamics similar to those of the exponential moving average (EMA) mechanism. Moreover, LoRA provides metrics comparable to EMA under the same limited computation budget. In GTA5 → Cityscapes experiments, the adaptation pipeline with LoRA achieves an mIoU of 0.515, slightly surpassing the EMA baseline's mIoU of 0.513, while also offering an 11% speedup in training time and video memory savings. These results highlight LoRA as a promising approach for domain adaptation in computer vision, offering a viable alternative to momentum networks that also saves computational resources.
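The abstract contrasts two stabilizers for self-training, LoRA and an EMA momentum network, without giving formulas. The following is a minimal sketch of both, assuming the commonly used formulations: a LoRA-adapted weight W + (alpha / r) · B · A with a frozen W and a zero-initialized B, and an EMA teacher updated as teacher ← m · teacher + (1 − m) · student. The matrix sizes, `alpha`, `rank`, `momentum`, and all function names are hypothetical illustration values and are not taken from the paper.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w, a, b, alpha, rank):
    """Effective weight W + (alpha / rank) * B @ A; W itself stays frozen."""
    scale = alpha / rank
    delta = matmul(b, a)  # (d_out x r) @ (r x d_in) -> d_out x d_in
    return [[wij + scale * dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def ema_update(teacher, student, momentum):
    """Momentum-network update: teacher <- m * teacher + (1 - m) * student."""
    return [[momentum * t + (1.0 - momentum) * s for t, s in zip(t_row, s_row)]
            for t_row, s_row in zip(teacher, student)]


random.seed(0)
d_out, d_in, rank, alpha = 8, 8, 2, 4.0  # assumed toy sizes, not the paper's

# Frozen pretrained weight (d_out x d_in).
w = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
# Trainable low-rank factors: A is r x d_in, B is d_out x r and starts at zero.
a = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
b = [[0.0] * rank for _ in range(d_out)]

# With B = 0 the adapted layer starts exactly at the pretrained weights.
w_start = lora_weight(w, a, b, alpha, rank)
starts_at_pretrained = w_start == w

# Trainable parameters: full fine-tuning versus the two low-rank factors.
full_params = d_out * d_in
lora_params = rank * (d_in + d_out)

# EMA alternative: a second full copy of the weights slowly tracks the student.
student = [[wij + 1.0 for wij in row] for row in w]
teacher = ema_update(w, student, momentum=0.9)
max_teacher_shift = max(abs(t - wij)
                        for t_row, w_row in zip(teacher, w)
                        for t, wij in zip(t_row, w_row))

print(starts_at_pretrained, full_params, lora_params, round(max_teacher_shift, 6))
```

In this toy setting the low-rank factors hold 32 trainable values against 64 for the full matrix, and no second copy of the weights is kept, whereas the EMA teacher is a full duplicate that moves only a fraction (1 − m) of the way toward the student at each step. This is one plausible reading of why replacing the momentum network could save memory; the paper itself should be consulted for the actual configuration.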

Source journal

Optical Memory and Neural Networks (category: OPTICS)

CiteScore: 1.50

Self-citation rate: 11.10%

Journal description: The journal covers a wide range of issues in information optics such as optical memory, mechanisms for optical data recording and processing, photosensitive materials, optical, optoelectronic and holographic nanostructures, and many other related topics. Papers on memory systems using holographic and biological structures and concepts of brain operation are also included. The journal pays particular attention to research in the field of neural net systems that may lead to a new generation of computational technologies by endowing them with intelligence.