Pub Date: 2024-10-17 | DOI: 10.1109/TIP.2024.3478854
Jingren Liu;Zhong Ji;Yanwei Pang;Yunlong Yu
The proliferation of Few-Shot Class Incremental Learning (FSCIL) methodologies has highlighted the critical challenge of maintaining robust anti-amnesia capabilities in FSCIL learners. In this paper, we present a novel conceptualization of anti-amnesia in terms of mathematical generalization, leveraging the Neural Tangent Kernel (NTK) perspective. Our method focuses on two key aspects: ensuring optimal NTK convergence and minimizing NTK-related generalization loss, which serve as the theoretical foundation for cross-task generalization. To achieve global NTK convergence, we introduce a principled meta-learning mechanism that guides optimization within an expanded network architecture. Concurrently, to reduce the NTK-related generalization loss, we systematically optimize its constituent factors. Specifically, we initiate self-supervised pre-training on the base session to enhance NTK-related generalization potential. These self-supervised weights are then carefully refined through curricular alignment, followed by the application of dual NTK regularization tailored specifically for both convolutional and linear layers. Through the combined effects of these measures, our network acquires robust NTK properties, ensuring optimal convergence and stability of the NTK matrix and minimizing the NTK-related generalization loss, significantly enhancing its theoretical generalization. On popular FSCIL benchmark datasets, our NTK-FSCIL surpasses contemporary state-of-the-art approaches, elevating end-session accuracy by 2.9% to 9.3%.
Title: NTK-Guided Few-Shot Class Incremental Learning
Journal: IEEE Transactions on Image Processing, vol. 33, pp. 6029-6044
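The NTK quantities this abstract reasons about can be made concrete with a small sketch. The empirical NTK of a network is the Gram matrix of per-example parameter gradients; its stability and conditioning are what NTK-convergence arguments rest on. The toy one-hidden-layer network below is purely illustrative and is not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny one-hidden-layer ReLU network: f(x) = w2 . relu(W1 @ x)
d, h = 4, 16
W1 = rng.normal(size=(h, d)) / np.sqrt(d)
w2 = rng.normal(size=h) / np.sqrt(h)

def param_grad(x):
    """Gradient of the scalar output f(x) w.r.t. all parameters, flattened."""
    pre = W1 @ x                   # hidden pre-activations
    act = np.maximum(pre, 0)       # ReLU activations
    mask = (pre > 0).astype(float) # ReLU derivative
    dW1 = np.outer(w2 * mask, x)   # d f / d W1
    dw2 = act                      # d f / d w2
    return np.concatenate([dW1.ravel(), dw2])

def empirical_ntk(X):
    """K[i, j] = <grad f(x_i), grad f(x_j)>, the empirical NTK Gram matrix."""
    J = np.stack([param_grad(x) for x in X])
    return J @ J.T

X = rng.normal(size=(5, d))
K = empirical_ntk(X)
assert K.shape == (5, 5)
assert np.allclose(K, K.T)                      # NTK is symmetric
assert np.all(np.linalg.eigvalsh(K) >= -1e-9)   # and positive semi-definite
```

Tracking how K drifts as tasks are added is one way to visualize the cross-task stability that the paper's meta-learning and dual regularization aim to enforce.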
Pub Date: 2024-10-17 | DOI: 10.1109/TIP.2024.3477356
Mu Li;Youneng Bao;Xiaohang Sui;Jinxing Li;Guangming Lu;Yong Xu
Learned 360° image compression methods using equirectangular projection (ERP) often confront a non-uniform sampling issue inherent to sphere-to-rectangle projection. While uniform or near-uniform sampling representations, along with their corresponding convolution operations, have been proposed to mitigate this issue, these methods concentrate solely on uniform sampling rates and thus neglect the content of the image. In this paper, we argue that different contents within 360° images have varying significance and advocate a content-adaptive parametric representation for 360° image compression that takes into account both the content and the sampling rate. We first introduce the parametric pseudocylindrical representation and its corresponding convolution operation, upon which we build a learned 360° image codec. We then model the hyperparameter of the representation as the output of a network, derived from the image's content and its spherical coordinates. We treat the optimization of hyperparameters for different 360° images as distinct compression tasks and propose a meta-learning algorithm to jointly optimize the codec and the metaknowledge, i.e., the hyperparameter estimation network. A significant challenge is the lack of a direct derivative from the compression loss to the hyperparameter network. To address this, we present a novel method that relaxes the rate-distortion loss as a function of the hyperparameters, enabling gradient-based optimization of the metaknowledge. Experimental results on omnidirectional images demonstrate that our method achieves state-of-the-art performance and superior visual quality.
Title: Learning Content-Weighted Pseudocylindrical Representation for 360° Image Compression
Journal: IEEE Transactions on Image Processing, vol. 33, pp. 5975-5988
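The relaxation idea in this abstract, making a rate-distortion loss differentiable in quantities that involve rounding, can be sketched with the standard additive-uniform-noise proxy used in learned compression. This toy objective is an assumption for illustration; the paper's actual relaxation over representation hyperparameters differs in detail:

```python
import numpy as np

rng = np.random.default_rng(0)

def relaxed_rd_loss(latent, lam, width):
    """Toy rate-distortion objective R + lam * D, with hard quantization
    replaced by additive uniform noise of the same bin width -- a common
    differentiable surrogate for rounding in learned codecs."""
    noisy = latent + rng.uniform(-width / 2, width / 2, size=latent.shape)
    # Crude rate proxy: wider bins mean fewer distinguishable symbols,
    # hence a lower (negative log bin-width) rate term.
    rate = -latent.size * np.log(width)
    distortion = np.mean((noisy - latent) ** 2)
    return rate + lam * distortion

# Sweeping the bin width exposes the rate-distortion trade-off that a
# hyperparameter network could be trained to navigate per image.
for width in (0.1, 0.5, 1.0):
    loss = relaxed_rd_loss(rng.normal(size=64), lam=100.0, width=width)
    print(f"width={width}: relaxed loss {loss:.2f}")
```

Because the noise-relaxed loss is a smooth function of `width`, a gradient can flow from the compression loss back into whatever network predicts that hyperparameter, which is the mechanism the meta-learning step needs.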
Pub Date: 2024-10-17 | DOI: 10.1109/TIP.2024.3473301
Khang Truong Giang;Soohwan Song;Sungho Jo
This study tackles image matching in difficult scenarios, such as scenes with significant variations or limited texture, with a strong emphasis on computational efficiency. Previous studies have attempted to address this challenge by encoding global scene contexts using Transformers. However, these approaches have high computational costs and may not capture sufficient high-level contextual information, such as spatial structures or semantic shapes. To overcome these limitations, we propose a novel image-matching method that leverages a topic-modeling strategy to capture high-level contexts in images. Our method represents each image as a multinomial distribution over topics, where each topic represents a semantic structure. By incorporating these topics, we can effectively capture comprehensive context information and obtain discriminative, high-quality features. Notably, our coarse-level matching network improves efficiency by applying attention layers only to a fixed number of topics and to small-sized features. Finally, we design a dynamic feature refinement network for precise results at the finer matching stage. Through extensive experiments, we demonstrate the superiority of our method in challenging scenarios. Specifically, our method ranks in the top 9% in the Image Matching Challenge 2023 without using ensemble techniques, and it achieves an approximately 50% reduction in computational cost compared to other Transformer-based methods. Code is available at https://github.com/TruongKhang/TopicFM.
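The efficiency argument in this abstract, attending to a fixed set of topics rather than all feature pairs, can be illustrated with a minimal cross-attention sketch. The shapes and the topic embeddings below are hypothetical placeholders, not the released TopicFM implementation; the point is only that cost scales as O(N·K) with K topics instead of O(N²) for full self-attention:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def topic_attention(feats, topics):
    """Attend N image features to K fixed topic embeddings.
    attn is (N, K), so the cost is O(N*K) rather than O(N^2)."""
    attn = softmax(feats @ topics.T / np.sqrt(feats.shape[1]))
    augmented = feats + attn @ topics      # inject topic context into features
    image_topic_dist = attn.mean(axis=0)   # image as a multinomial over topics
    return augmented, image_topic_dist

rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 32))   # N=100 coarse features (illustrative)
topics = rng.normal(size=(8, 32))    # K=8 learned topic embeddings (illustrative)
aug, dist = topic_attention(feats, topics)
assert aug.shape == feats.shape
assert np.isclose(dist.sum(), 1.0)   # a valid distribution over topics
```

Matching two images would then compare topic-augmented features only within shared high-probability topics, which is where the efficiency gain over all-pairs Transformer matching comes from.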