Multi-Granularity Prediction with Learnable Fusion for Scene Text Recognition
Cheng Da, Peng Wang, Cong Yao
Pub Date: 2026-01-07 · DOI: 10.1007/s11263-025-02653-7

FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds
Yiming Zhang, Yicheng Gu, Yanhong Zeng, Zhening Xing, Yuancheng Wang, Zhizheng Wu, Bin Liu, Kai Chen
Pub Date: 2026-01-07 · DOI: 10.1007/s11263-025-02649-3

A Traditional Approach for Color Constancy and Color Assimilation Illusions with Its Applications to Low-Light Image Enhancement
Oguzhan Ulucan, Diclehan Ulucan, Marc Ebner
Pub Date: 2026-01-06 · DOI: 10.1007/s11263-025-02595-0

The human visual system achieves color constancy, allowing consistent color perception under varying environmental contexts, while also being deceived by color illusions, where contextual information affects our perception. Despite the close relationship between color constancy and color illusions, and their potential benefits to the field, both phenomena are rarely studied together in computer vision. In this study, we present the benefits of considering color illusions in computer vision. In particular, we introduce a learning-free method, namely multiresolution color constancy, which combines insights from computational neuroscience and computer vision to address both phenomena within a single framework. Our approach performs color constancy in both multi- and single-illuminant scenarios, while also being deceived by assimilation illusions. Additionally, we extend our method to low-light image enhancement, thereby demonstrating its usability across different computer vision tasks. Through comprehensive experiments on color constancy, we show the effectiveness of our method in multi-illuminant and single-illuminant scenarios. Furthermore, we compare our method with state-of-the-art learning-based models on low-light image enhancement, where it shows competitive performance. This work presents the first method that integrates color constancy, color illusions, and low-light image enhancement in a single, explainable framework.
Hallucination Early Detection in Diffusion Models
Federico Betti, Lorenzo Baraldi, Lorenzo Baraldi, Rita Cucchiara, Nicu Sebe
Pub Date: 2026-01-06 · DOI: 10.1007/s11263-025-02622-0

A Lightweight Hybrid Gabor Deep Learning Approach and its Application to Medical Image Classification
Rayyan Ahmed, Hamza Baali, Abdesselam Bouzerdoum
Pub Date: 2026-01-05 · DOI: 10.1007/s11263-025-02658-2

Deep learning has revolutionized image analysis, but its applications are limited by the need for large datasets and high computational resources. Hybrid approaches that combine a domain-specific, universal feature extractor with learnable neural networks offer a promising balance of efficiency and accuracy. This paper presents a hybrid model integrating a Gabor filter bank front-end with compact neural networks for efficient feature extraction and classification. Gabor filters are inherently bandpass and extract early-stage features, with spatially shifted filters covering the frequency plane to balance spatial and spectral localization. We introduce separate channels capturing low- and high-frequency components to enhance feature representation while maintaining efficiency. The approach reduces trainable parameters and training time while preserving accuracy, making it suitable for resource-constrained environments. Compared to MobileNetV2 and EfficientNetB0, our model trains approximately 4–6× faster on average while using fewer parameters and FLOPs. We also compare it to pretrained networks used as feature extractors, lightweight fine-tuned models, and classical descriptors (HOG, LBP); it achieves competitive results with faster training and reduced computation. The hybrid model uses only around 0.60 GFLOPs and 0.34M parameters, and we apply statistical significance testing (ANOVA, paired t-tests) to validate the performance gains. Inference takes 0.01–0.02 s per image, up to 15× faster than EfficientNetB0 and 8× faster than MobileNetV2. Grad-CAM visualizations confirm localized attention on relevant regions. This work highlights the value of integrating traditional features with deep learning to improve efficiency in resource-limited applications. Future work will address color fusion, robustness to noise, and automated filter optimization.
Learning from History: Task-agnostic Model Contrastive Learning for Image Restoration
Gang Wu, Junjun Jiang, Kui Jiang, Xianming Liu, Wangmeng Zuo
Pub Date: 2026-01-05 · DOI: 10.1007/s11263-025-02669-z