Guest Editorial: Special Issue for the British Machine Vision Conference (BMVC), 2024 (Glasgow, Scotland, UK)
Carlos Francisco Moreno-García, Gerardo Aragon Camarasa, Edmond S. L. Ho, Paul Henderson, Nicolas Pugeault, Jungong Han, Sergio Escalera
International Journal of Computer Vision | Pub Date: 2026-01-12 | DOI: 10.1007/s11263-025-02721-y
Cross-domain Few-shot Classification via Invariant-content Feature Reconstruction
Hongduan Tian, Feng Liu, Ka Chun Cheung, Zhen Fang, Simon See, Tongliang Liu, Bo Han
International Journal of Computer Vision | Pub Date: 2026-01-11 | DOI: 10.1007/s11263-025-02601-5
In cross-domain few-shot classification (CFC), mainstream studies train a simple module (e.g., a linear transformation head) on top of a powerful pre-trained model to select or transform features (a.k.a. high-level semantic features) for previously unseen domains, using only a few labeled training examples. These studies usually assume that high-level semantic features are shared across domains, so that simple feature selection or transformation is enough to adapt to previously unseen domains. In this paper, however, we find that the simply transformed features are too general to fully cover the key content features of each class. We therefore propose an effective method, invariant-content feature reconstruction (IFR), which trains a simple module that jointly considers high-level and fine-grained invariant-content features for previously unseen domains. Specifically, the fine-grained invariant-content features are a set of informative and discriminative features learned from the few labeled training examples of tasks sampled from unseen domains; they are extracted by an attention module that retrieves features invariant to style modifications from a set of content-preserving, pixel-level augmented data. Extensive experiments on the Meta-Dataset benchmark show that IFR generalizes well to unseen domains, demonstrating the effectiveness of fusing the high-level features with the fine-grained invariant-content features. Specifically, IFR improves the average accuracy on unseen domains by 1.6% and 6.5% under two different cross-domain few-shot classification settings.
BayesAdapter: Enhanced Uncertainty Estimation in CLIP Few-Shot Adaptation
Pablo Morales-Álvarez, Stergios Christodoulidis, Maria Vakalopoulou, Pablo Piantanida, Jose Dolz
International Journal of Computer Vision | Pub Date: 2026-01-10 | DOI: 10.1007/s11263-025-02630-0
End-to-End Full-Page Optical Music Recognition for Pianoform Sheet Music
Antonio Ríos-Vila, Jorge Calvo-Zaragoza, David Rizo, Thierry Paquet
International Journal of Computer Vision | Pub Date: 2026-01-09 | DOI: 10.1007/s11263-025-02654-6
Optical Music Recognition (OMR) has made significant progress since its inception, with various approaches now capable of accurately transcribing music scores into digital formats. Despite these advancements, most so-called end-to-end OMR approaches still rely on multi-stage processing pipelines to transcribe full-page score images, which entails challenges such as the need for dedicated layout analysis and specifically annotated data, thereby limiting the general applicability of such methods. In this paper, we present the first truly end-to-end approach for page-level OMR on complex layouts. Our system, which combines convolutional layers with autoregressive Transformers, processes an entire music score page and outputs a complete transcription in a music encoding format. This is made possible by both the architecture and the training procedure, which uses curriculum learning over incrementally generated synthetic data. We evaluate the proposed system on pianoform corpora, which are among the most complex sources in the OMR literature, first in a controlled scenario with synthetic data and subsequently on two real-world corpora of varying conditions. Our approach is compared with leading commercial OMR software. The results demonstrate that our system not only successfully transcribes full-page music scores but also outperforms the commercial tool both in zero-shot settings and after fine-tuning on the target domain, representing a significant contribution to the field of OMR.
Delving into Pre-training for Domain Transfer: A Broad Study of Pre-training for Domain Generalization and Domain Adaptation
Jungmyung Wi, Youngkyun Jang, Dujin Lee, Myeongseok Nam, Donghyun Kim
International Journal of Computer Vision | Pub Date: 2026-01-09 | DOI: 10.1007/s11263-025-02590-5