Exploiting Unlabeled Data with Multiple Expert Teachers for Open Vocabulary Aerial Object Detection and Its Orientation Adaptation
Pub Date: 2026-03-06 | DOI: 10.1007/s11263-026-02733-2
Yan Li, Weiwei Guo, Xue Yang, Ning Liao, Shaofeng Zhang, Yi Yu, Wenxian Yu, Junchi Yan
{"title":"Exploiting Unlabeled Data with Multiple Expert Teachers for Open Vocabulary Aerial Object Detection and Its Orientation Adaptation","authors":"Yan Li, Weiwei Guo, Xue Yang, Ning Liao, Shaofeng Zhang, Yi Yu, Wenxian Yu, Junchi Yan","doi":"10.1007/s11263-026-02733-2","DOIUrl":"https://doi.org/10.1007/s11263-026-02733-2","url":null,"abstract":"","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"693 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2026-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147368103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Parameter-Efficient Fine-Tuning via Meta-Regularizer
Pub Date: 2026-03-06 | DOI: 10.1007/s11263-025-02693-z
Jinyoung Park, Juyeon Ko, Sanghyeok Lee, Joonmyung Choi, Hyunwoo J. Kim
Pre-trained vision-language models (e.g., CLIP) have shown impressive success in various computer vision tasks thanks to their generalization capability. Recently, parameter-efficient fine-tuning (PEFT) approaches have been actively explored to effectively and efficiently adapt pre-trained vision-language models to a variety of downstream tasks. However, most existing PEFT approaches suffer from a task-overfitting issue: the general knowledge of the pre-trained model is forgotten while a small number of learnable parameters in soft prompts/adapters are fine-tuned on a small dataset from a specific target task. We therefore propose Parameter-Efficient Fine-Tuning via Meta-Regularization (PEFT-MetaR) to improve the generalizability of parameter-efficient fine-tuning methods for vision-language models. Specifically, PEFT-MetaR meta-learns both the regularizer and the learnable parameters to harness task-specific knowledge from the downstream tasks and task-agnostic general knowledge from the pre-trained model. Further, PEFT-MetaR augments each task into multiple virtual tasks to alleviate meta-overfitting. In addition, we provide an analysis of how PEFT-MetaR improves generalizability from the perspective of gradient alignment. Our experiments demonstrate that PEFT-MetaR improves the generalizability of parameter-efficient fine-tuning methods on various datasets.
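As a rough illustration of the bi-level structure described in the abstract, below is a minimal PyTorch sketch of meta-regularized prompt tuning. The quadratic form of the learned regularizer, the single-step inner loop, the random support/query split standing in for the virtual-task augmentation, and all shapes and module names are our own assumptions, not the authors' implementation.

```python
# Hedged sketch of meta-regularized prompt tuning in the spirit of PEFT-MetaR.
# All module names, shapes, and the quadratic regularizer are illustrative
# assumptions, not the paper's actual method.
import torch
import torch.nn.functional as F

D, C = 64, 10                                    # assumed feature/class dims
prompt = torch.zeros(D, requires_grad=True)      # learnable soft prompt
prompt_init = prompt.detach().clone()            # frozen pre-trained anchor
reg_weight = torch.zeros(D, requires_grad=True)  # meta-learned regularizer

head = torch.nn.Linear(D, C)      # stand-in for a frozen VLM head
head.requires_grad_(False)        # backbone/head stays frozen, as in PEFT

def task_loss(p, x, y):
    # Assumed: the prompt shifts input features before the frozen head.
    return F.cross_entropy(head(x + p), y)

def regularizer(p):
    # Meta-learned penalty pulling the prompt toward its pre-trained init,
    # so task-agnostic knowledge is retained (our assumption on its form).
    return (reg_weight.sigmoid() * (p - prompt_init) ** 2).sum()

meta_opt = torch.optim.Adam([reg_weight, prompt], lr=1e-2)
for step in range(100):
    # Virtual task: random support/query split standing in for the paper's
    # task augmentation.
    xs, ys = torch.randn(16, D), torch.randint(0, C, (16,))
    xq, yq = torch.randn(16, D), torch.randint(0, C, (16,))

    # Inner step: adapt the prompt on the support set with the regularizer.
    inner_loss = task_loss(prompt, xs, ys) + regularizer(prompt)
    (grad,) = torch.autograd.grad(inner_loss, prompt, create_graph=True)
    adapted = prompt - 0.1 * grad

    # Outer step: update regularizer and prompt so the adapted prompt
    # generalizes to the query set (second-order grads flow via `adapted`).
    meta_opt.zero_grad()
    task_loss(adapted, xq, yq).backward()
    meta_opt.step()
```

Meta-learning the regularizer rather than hand-tuning a fixed penalty is what lets the method balance task-specific adaptation against retention of the pre-trained, task-agnostic knowledge.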
{"title":"Parameter-Efficient Fine-Tuning via Meta-Regularizer","authors":"Jinyoung Park, Juyeon Ko, Sanghyeok Lee, Joonmyung Choi, Hyunwoo J. Kim","doi":"10.1007/s11263-025-02693-z","DOIUrl":"https://doi.org/10.1007/s11263-025-02693-z","url":null,"abstract":"Pre-trained vision-language models ( <jats:italic>e.g</jats:italic> ., CLIP) have shown impressive success in various computer vision tasks with their generalization capability. Recently, parameter-efficient fine-tuning (PEFT) approaches have been actively explored to effectively and efficiently adapt the pre-trained vision-language models to a variety of downstream tasks. However, most existing PEFT approaches suffer from a task overfitting issue since the general knowledge of the pre-trained models is forgotten while a small number of learnable parameters in soft prompts/adapters are fine-tuned on a small data set from a specific target task. Thus, we propose a P arameter- E fficient F ine- T uning via Meta - R egularization (PEFT-MetaR) to improve the generalizability of parameter-efficient fine-tuning methods for vision-language models. Specifically, PEFT-MetaR meta-learns both the regularizer and learnable parameters to harness the task-specific knowledge from the downstream tasks and task-agnostic general knowledge from the pretrained models. Further, PEFT-MetaR augments the task to generate multiple virtual tasks to alleviate the meta-overfitting. In addition, we provide the analysis to comprehend how PEFT-MetaR improves the generalizability from the perspective of the gradient alignment. Our experiments demonstrate that PEFT-MetaR improves the generalizability of parameter-efficient fine-tuning methods on various datasets.","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"15 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2026-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147368097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
EMUFormer: Efficient Multi-task Uncertainties for Reliable Joint Semantic Segmentation and Monocular Depth Estimation
Pub Date: 2026-03-06 | DOI: 10.1007/s11263-026-02751-0
Steven Landgraf, Markus Hillemann, Theodor Kapler, Markus Ulrich
Quantifying predictive uncertainty has emerged as a possible remedy for common challenges of deep neural networks, such as overconfidence, limited explainability, and lack of robustness, albeit one that is often computationally expensive. Many real-world applications are multi-modal in nature and hence benefit from multi-task learning. In autonomous driving or robotics, for example, jointly solving semantic segmentation and monocular depth estimation has proven valuable. To this end, we introduce EMUFormer, a novel student-teacher distillation approach for efficient multi-task uncertainties in joint semantic segmentation and monocular depth estimation. By leveraging the predictive uncertainties of the teacher, EMUFormer achieves new state-of-the-art results on Cityscapes and NYUv2, and it additionally estimates reliable predictive uncertainties for both tasks that are comparable or superior to those of a Deep Ensemble, despite being an order of magnitude cheaper to compute. These findings extend even to out-of-domain and domain-adaptation scenarios, highlighting EMUFormer’s remarkable reliability.
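To make the student-teacher idea concrete, here is a minimal, self-contained PyTorch sketch of distilling ensemble uncertainties into a single multi-task student. The tiny convolutional stand-in networks, the choice of predictive entropy (segmentation) and ensemble variance (depth) as uncertainty targets, and the unit loss weights are all illustrative assumptions rather than the EMUFormer architecture.

```python
# Hedged sketch: distilling Deep Ensemble uncertainties into one multi-task
# student, in the spirit of EMUFormer. Network sizes, uncertainty targets,
# and loss weights are illustrative assumptions.
import torch
import torch.nn.functional as F

C, H, W = 19, 8, 8               # assumed class count and a tiny spatial grid

def make_net():
    # Stand-in backbone: maps RGB to C seg logits, 1 depth channel, and two
    # per-pixel uncertainty channels (seg and depth).
    return torch.nn.Conv2d(3, C + 3, kernel_size=3, padding=1)

teachers = [make_net() for _ in range(5)]   # stand-in Deep Ensemble teacher
student = make_net()
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

x = torch.randn(2, 3, H, W)

with torch.no_grad():
    seg_probs = torch.stack([t(x)[:, :C].softmax(1) for t in teachers])
    depths = torch.stack([t(x)[:, C] for t in teachers])
    mean_probs = seg_probs.mean(0)
    # Assumed teacher uncertainty targets: predictive entropy for
    # segmentation, ensemble variance for depth.
    seg_unc = -(mean_probs * mean_probs.clamp_min(1e-8).log()).sum(1)
    depth_unc = depths.var(0)

out = student(x)
s_seg, s_depth = out[:, :C], out[:, C]
s_seg_unc, s_depth_unc = out[:, C + 1], out[:, C + 2]

loss = (
    F.kl_div(s_seg.log_softmax(1), mean_probs, reduction="batchmean")
    + F.l1_loss(s_depth, depths.mean(0))    # depth distillation
    + F.l1_loss(s_seg_unc, seg_unc)         # distill seg uncertainty
    + F.l1_loss(s_depth_unc, depth_unc)     # distill depth uncertainty
)
opt.zero_grad(); loss.backward(); opt.step()
```

At inference, the student alone produces both predictions and uncertainty maps, which is where the order-of-magnitude saving over running the full ensemble comes from.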
{"title":"EMUFormer: Efficient Multi-task Uncertainties for Reliable Joint Semantic Segmentation and Monocular Depth Estimation","authors":"Steven Landgraf, Markus Hillemann, Theodor Kapler, Markus Ulrich","doi":"10.1007/s11263-026-02751-0","DOIUrl":"https://doi.org/10.1007/s11263-026-02751-0","url":null,"abstract":"Quantifying the predictive uncertainty emerged as a possible solution to common challenges like overconfidence, lack of explainability, and robustness of deep neural networks, albeit one that is often computationally expensive. Many real-world applications are multi-modal in nature and hence benefit from multi-task learning. In autonomous driving or robotics, for example, the joint solution of semantic segmentation and monocular depth estimation has proven to be valuable. To this end, we introduce EMUFormer, a novel student-teacher distillation approach for efficient multi-task uncertainties in the context of joint semantic segmentation and monocular depth estimation. By leveraging the predictive uncertainties of the teacher, EMUFormer achieves new state-of-the-art results on Cityscapes and NYUv2 and additionally estimates reliable predictive uncertainties for both tasks that are comparable or superior to a Deep Ensemble despite being an order of magnitude more efficient to compute. These findings even extend to out-of-domain and domain adaptation scenarios, highlighting EMUFormer’s remarkable reliability.","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"199 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2026-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147368089","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}