Pub Date: 2025-12-30 | DOI: 10.1109/TIP.2025.3646940
Yangfan Li;Wei Li
Feature reconstruction networks have achieved remarkable performance in few-shot fine-grained classification tasks. Nonetheless, traditional feature reconstruction networks rely on linear regression, and this linearity may discard subtle discriminative cues, ultimately yielding less precise reconstructed features. Moreover, when the background dominates the image, background reconstruction errors tend to overshadow foreground reconstruction errors, making the overall reconstruction error inaccurate. To address these two issues, we propose the Foreground-Aware Kernelized Feature Reconstruction Network (FKFRN). Specifically, to address imprecise reconstructed features, we introduce kernel methods into linear feature reconstruction, extending it to nonlinear feature reconstruction and thus enabling the reconstruction of richer, finer-grained discriminative features. To tackle inaccurate reconstruction errors, we propose a foreground-aware reconstruction error: the model assigns higher weights to features containing more foreground information and lower weights to those dominated by background content, which reduces the impact of background errors on the overall reconstruction. To estimate these weights accurately, we design two complementary strategies: an explicit probabilistic graphical model and an implicit neural network-based approach. Extensive experimental results on eight datasets validate the effectiveness of the proposed approach for few-shot fine-grained classification.
{"title":"Few-Shot Fine-Grained Classification With Foreground-Aware Kernelized Feature Reconstruction Network","authors":"Yangfan Li;Wei Li","doi":"10.1109/TIP.2025.3646940","DOIUrl":"10.1109/TIP.2025.3646940","url":null,"abstract":"Feature reconstruction networks have achieved remarkable performance in few-shot fine-grained classification tasks. Nonetheless, traditional feature reconstruction networks rely on linear regression. This linearity may cause the loss of subtle discriminative cues, ultimately resulting in less precise reconstructed features. Moreover, in situations where the background predominantly occupies the image, the background reconstruction errors tend to overshadow foreground reconstruction errors, resulting in inaccurate reconstruction errors. In order to address the two key issues, a novel approach called the Foreground-Aware Kernelized Feature Reconstruction Network (FKFRN) is proposed. Specifically, to address the problem of imprecise reconstructed features, we introduce kernel methods into linear feature reconstruction, extending it to nonlinear feature reconstruction, thus enabling the reconstruction of richer, finer-grained discriminative features. To tackle the issue of inaccurate reconstruction errors, the foreground-aware reconstruction error is proposed. Specifically, the model assigns higher weights to features containing more foreground information and lower weights to those dominated by background content, which reduces the impact of background errors on the overall reconstruction. To estimate these weights accurately, we design two complementary strategies: an explicit probabilistic graphical model and an implicit neural network–based approach. Extensive experimental results on eight datasets validate the effectiveness of the proposed approach for few-shot fine-grained classification.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"35 ","pages":"150-165"},"PeriodicalIF":13.7,"publicationDate":"2025-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145866581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-30 | DOI: 10.1109/TIP.2025.3646893
Daosong Hu;Xi Li;Mingyue Cui;Kai Huang
In resource-constrained vehicle systems, establishing consistency between multi-view scenes and driver gaze remains challenging. Prior methods mainly focus on cross-source data fusion, estimating gaze or attention maps through unidirectional implicit links between scene and facial features. Although bidirectional projection can correct misalignment between predictions and ground truth, the high resolution of scene images and complex semantic extraction incur heavy computational loads. To address these issues, we propose a lightweight driver-attention estimation framework that leverages geometric consistency between scene and gaze to guide feature extraction bidirectionally, thereby strengthening representation. Specifically, we first introduce a lightweight feature extraction module that captures global and local information in parallel through dual asymmetric branches to efficiently extract facial and scene features. An information cross fusion module is then designed to promote interaction between the scene and gaze streams. The multi-branch architecture extracts gaze and geometric cues at multiple scales, reducing the computational redundancy caused by mixed features when modeling geometric consistency across both views. Experiments on a large public dataset show that incorporating scene information introduces no significant computational overhead and yields a better trade-off between accuracy and efficiency. Moreover, leveraging bidirectional projection and the temporal continuity of gaze, we preliminarily explore the framework’s potential for predicting attention trends.
{"title":"LNet: Lightweight Network for Driver Attention Estimation via Scene and Gaze Consistency","authors":"Daosong Hu;Xi Li;Mingyue Cui;Kai Huang","doi":"10.1109/TIP.2025.3646893","DOIUrl":"10.1109/TIP.2025.3646893","url":null,"abstract":"In resource-constrained vehicle systems, establishing consistency between multi-view scenes and driver gaze remains challenging. Prior methods mainly focus on cross-source data fusion, estimating gaze or attention maps through unidirectional implicit links between scene and facial features. Although bidirectional projection can correct misalignment between predictions and ground truth, the high resolution of scene images and complex semantic extraction incur heavy computational loads. To address these issues, we propose a lightweight driver-attention estimation framework that leverages geometric consistency between scene and gaze to guide feature extraction bidirectionally, thereby strengthening representation. Specifically, we first introduce a lightweight feature extraction module that captures global and local information in parallel through dual asymmetric branches to efficiently extract facial and scene features. An information cross fusion module is then designed to promote interaction between the scene and gaze streams. The multi-branch architecture extracts gaze and geometric cues at multiple scales, reducing the computational redundancy caused by mixed features when modeling geometric consistency across both views. Experiments on a large public dataset show that incorporating scene information introduces no significant computational overhead and yields a better trade-off between accuracy and efficiency. Moreover, leveraging bidirectional projection and the temporal continuity of gaze, we preliminarily explore the framework’s potential for predicting attention trends.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"35 ","pages":"27-41"},"PeriodicalIF":13.7,"publicationDate":"2025-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145866772","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-26 | DOI: 10.1109/TIP.2025.3644791
Heyang Sun;Chuanxing Geng;Songcan Chen
The known class bias in open set recognition is conventionally viewed as a fatal problem, i.e., models trained solely on known classes tend to fit unknown classes to known classes with high confidence at inference. Existing methods therefore take one of two approaches: most opt to eliminate the known class bias as much as possible, while others circumvent it by employing a reconstruction method. In this paper, however, we challenge both widely accepted approaches and present a novel proposition: the known class bias that is harmful to most methods is, conversely, beneficial for reconstruction-based methods, and such bias can thus serve as a positive incentive for open set recognition (OSR) models from a reconstruction perspective. Along this line, we propose the Bias Enhanced Reconstruction Learning (BERL) framework to enhance the known class bias at the class, model, and sample levels. Specifically, at the class level, a specific representation is constructed in a supervised contrastive manner to avoid overgeneralization, while at the model level a diffusion model is employed, injecting the class prior to guide the biased reconstruction. Additionally, we leverage the diffusion model to design a self-adaptive strategy that enables effective sample-level biased sampling based on the information bottleneck theory. Experiments on various benchmarks demonstrate the effectiveness and performance superiority of the proposed method.
{"title":"Embracing the Power of Known Class Bias in Open Set Recognition From a Reconstruction Perspective","authors":"Heyang Sun;Chuanxing Geng;Songcan Chen","doi":"10.1109/TIP.2025.3644791","DOIUrl":"10.1109/TIP.2025.3644791","url":null,"abstract":"The open set known class bias is conventionally viewed as a fatal problem i.e., the models trained solely on known classes tend to fit unknown classes to known classes with high confidence in inference. Thus existing methods, without exception make a choice in two manners: most methods opt for eliminating the known class bias as much as possible with tireless efforts, while others circumvent the known class bias by employing a reconstruction method. However, in this paper, we challenge the two widely accepted approaches and present a novel proposition: the so-called harmful known class bias for most methods is, exactly conversely, beneficial for the reconstruction-based method and thus such known class bias can serve as a positive-incentive to the Open set recognition (OSR) models from a reconstruction perspective. Along this line, we propose the Bias Enhanced Reconstruction Learning (BERL) framework to enhance the known class bias respectively from the class level, model level and sample level. Specifically, at the class level, a specific representation is constructed in a supervised contrastive manner to avoid overgeneralization, while a diffusion model is employed by injecting the class prior to guide the biased reconstruction at the model level. Additionally, we leverage the advantages of the diffusion model to design a self-adaptive strategy, enabling effective sample-level biased sampling based on the information bottleneck theory. Experiments on various benchmarks demonstrate the effectiveness and performance superiority of the proposed method.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"35 ","pages":"14-26"},"PeriodicalIF":13.7,"publicationDate":"2025-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145836281","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-25 | DOI: 10.1109/TIP.2025.3646471
Nyi Nyi Naing;Huazhen Chen;Qing Cai;Lili Xia;Zhongke Gao;Jianpeng An
Nuclei segmentation and classification in Hematoxylin and Eosin (H&E) stained histology images play a vital role in cancer diagnosis, treatment planning, and research. However, accurate segmentation can be hindered by factors such as irregular cell shapes, unclear boundaries, and class imbalance. To address these challenges, we propose the Adaptive Gated Attention Fusion Network (AGAFNet), which integrates three innovative attention-based blocks into a U-shaped architecture complemented by dedicated decoders for both segmentation and classification tasks. These blocks comprise the Channel-wise and Spatial Attention Integration Block (CSAIB) for enhanced feature representation and selective focus on informative regions; the Adaptive Gated Convolutional Block (AGCB) for robust feature selection throughout the network; and the Fusion Attention Refinement Block (FARB) for effective information fusion. AGAFNet leverages these elements to provide a robust solution for precise nuclei segmentation and classification in H&E stained histology images. We evaluate the performance of AGAFNet on three large-scale multi-tissue datasets: PanNuke, CoNSeP, and Lizard. The experimental results demonstrate that our proposed AGAFNet achieves performance comparable to state-of-the-art methods.
{"title":"AGAFNet: Adaptive Gated Attention Fusion Network for Accurate Nuclei Segmentation and Classification in Histology Images","authors":"Nyi Nyi Naing;Huazhen Chen;Qing Cai;Lili Xia;Zhongke Gao;Jianpeng An","doi":"10.1109/TIP.2025.3646471","DOIUrl":"10.1109/TIP.2025.3646471","url":null,"abstract":"Nuclei segmentation and classification in Hematoxylin and Eosin (H&E) stained histology images play a vital role in cancer diagnosis, treatment planning, and research. However, accurate segmentation can be hindered by factors like irregular cell shapes, unclear boundaries, and class imbalance. To address these challenges, we propose the Adaptive Gated Attention Fusion Network (AGAFNet), which integrates three innovative attention-based blocks into a U-shaped architecture complemented by dedicated decoders for both segmentation and classification tasks. These blocks comprise the Channel-wise and Spatial Attention Integration Block (CSAIB) for enhanced feature representation and selective focus on informative regions; the Adaptive Gated Convolutional Block (AGCB) for robust feature selection throughout the network; and the Fusion Attention Refinement Block (FARB) for effective information fusion. AGAFNet leverages these elements to provide a robust solution for precise nuclei segmentation and classification in H&E stained histology images. We evaluate the performance of AGAFNet on three large-scale multi-tissue datasets: PanNuke, CoNSeP, and Lizard. The experimental results demonstrate our proposed AGAFNet achieves comparable performance to state-of-the-art methods.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"35 ","pages":"98-111"},"PeriodicalIF":13.7,"publicationDate":"2025-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145830044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-25 | DOI: 10.1109/TIP.2025.3646455
Kang Li;Feiniu Yuan;Chunmei Wang;Chunli Meng
Lightweight smoke image segmentation is essential for fire warning systems, particularly on mobile devices. In recent years, although numerous high-precision, large-scale smoke segmentation models have been developed, there are few lightweight solutions specifically designed for mobile applications. Therefore, we propose a Multi-stage Group Interaction and Cross-domain Fusion Network (MGICFN) with low computational complexity for real-time smoke segmentation. To improve the model’s ability to effectively analyze smoke features, we incorporate a Cross-domain Interaction Attention Module (CIAM) that merges spatial and frequency domain features to create a lightweight smoke encoder. To alleviate the loss of critical information from small smoke objects during downsampling, we design a Multi-stage Group Interaction Module (MGIM), which calibrates the information discrepancies between high- and low-dimensional features. To enhance the boundary information of smoke targets, we introduce an Edge Enhancement Module (EEM), which utilizes predicted target boundaries as advanced guidance to refine lower-level smoke features. Furthermore, we implement a Group Convolutional Block Attention Module (GCBAM) and a Group Fusion Module (GFM) to connect the encoder and decoder efficiently. Experimental results demonstrate that MGICFN achieves an 88.70% Dice coefficient (Dice), an 81.16% mean Intersection over Union (mIoU), and a 91.93% accuracy (Acc) on the SFS3K dataset. It also achieves an 87.30% Dice, a 78.68% mIoU, and a 92.95% Acc on the SYN70K test dataset. Our MGICFN model has 0.73M parameters and requires 0.3G FLOPs.
{"title":"Multi-Stage Group Interaction and Cross-Domain Fusion Network for Real-Time Smoke Segmentation","authors":"Kang Li;Feiniu Yuan;Chunmei Wang;Chunli Meng","doi":"10.1109/TIP.2025.3646455","DOIUrl":"10.1109/TIP.2025.3646455","url":null,"abstract":"Lightweight smoke image segmentation is essential for fire warning systems, particularly on mobile devices. In recent years, although numerous high-precision, large-scale smoke segmentation models have been developed, there are few lightweight solutions specifically designed for mobile applications. Therefore, we propose a Multi-stage Group Interaction and Cross-domain Fusion Network (MGICFN) with low computational complexity for real-time smoke segmentation. To improve the model’s ability to effectively analyze smoke features, we incorporate a Cross-domain Interaction Attention Module (CIAM) to merge spatial and frequency domain features for creating a lightweight smoke encoder. To alleviate the loss of critical information from small smoke objects during downsampling, we design a Multi-stage Group Interaction Module (MGIM). The MGIM calibrates the information discrepancies between high and low-dimensional features. To enhance the boundary information of smoke targets, we introduce an Edge Enhancement Module (EEM), which utilizes predicted target boundaries as advanced guidance to refine lower-level smoke features. Furthermore, we implement a Group Convolutional Block Attention Module (GCBAM) and a Group Fusion Module (GFM) to connect the encoder and decoder efficiently. Experimental results demonstrate that MGICFN achieves an 88.70% Dice coefficient (Dice), an 81.16% mean Intersection over Union (mIoU), and a 91.93% accuracy (Acc) on the SFS3K dataset. It also achieves an 87.30% Dice, a 78.68% mIoU, and a 92.95% Acc on the SYN70K test dataset. Our MGICFN model has 0.73M parameters and requires 0.3G FLOPs.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"35 ","pages":"124-135"},"PeriodicalIF":13.7,"publicationDate":"2025-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145829977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-24 | DOI: 10.1109/TIP.2025.3635475
Haihong Xiao;Wenxiong Kang;Yulan Guo;Hao Liu;Ying He
Giving machines the ability to infer the complete 3D geometry and semantics of complex scenes is crucial for many downstream tasks, such as decision-making and planning. Vision-centric Semantic Scene Completion (SSC) has emerged as a trendy 3D perception paradigm due to its compatibility with task properties, low cost, and rich visual cues. Despite impressive results, current approaches inevitably suffer from problems such as depth errors or depth ambiguities during the 2D-to-3D transformation process. To overcome these limitations, in this paper, we first introduce an Optical Flow-Guided (OFG) DepthNet that leverages the strengths of pretrained depth estimation models while incorporating optical flow images to improve depth prediction accuracy in regions with significant depth changes. Then, we propose a depth ambiguity-mitigated feature lifting strategy that implements deformable cross-attention in 3D pixel space to avoid the depth ambiguities caused by the 3D-to-2D projection process, and that further enhances feature updating by utilizing prior mask indices. Moreover, we customize two subnetworks, a residual voxel network and a sparse UNet, to enhance the network’s geometric prediction capabilities and ensure consistent semantic reasoning across varying scales. By doing so, our method achieves performance improvements over state-of-the-art methods on the SemanticKITTI, SSCBench-KITTI-360, and Occ3D-nuScene benchmarks.
{"title":"Enhanced Geometry and Semantics for Camera-Based 3D Semantic Scene Completion","authors":"Haihong Xiao;Wenxiong Kang;Yulan Guo;Hao Liu;Ying He","doi":"10.1109/TIP.2025.3635475","DOIUrl":"10.1109/TIP.2025.3635475","url":null,"abstract":"Giving machines the ability to infer the complete 3D geometry and semantics of complex scenes is crucial for many downstream tasks, such as decision-making and planning. Vision-centric Semantic Scene Completion (SSC) has emerged as a trendy 3D perception paradigm due to its compatibility with task properties, low cost, and rich visual cues. Despite impressive results, current approaches inevitably suffer from problems such as depth errors or depth ambiguities during the 2D-to-3D transformation process. To overcome these limitations, in this paper, we first introduce an Optical Flow-Guided (OFG) DepthNet that leverages the strengths of pretrained depth estimation models, while incorporating optical flow images to improve depth prediction accuracy in regions with significant depth changes. Then, we propose a depth ambiguity-mitigated feature lifting strategy that implements deformable cross-attention in 3D pixel space to avoid depth ambiguities caused by the projection process from 3D to 2D and further enhances the effectiveness of feature updating through the utilization of prior mask indices. Moreover, we customize two subnetworks: a residual voxel network and a sparse UNet, to enhance the network’s geometric prediction capabilities and ensure consistent semantic reasoning across varying scales. By doing so, our method achieves performance improvements over state-of-the-art methods on the SemanticKITTI, SSCBench-KITTI-360 and Occ3D-nuScene benchmarks.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"35 ","pages":"1-13"},"PeriodicalIF":13.7,"publicationDate":"2025-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145823141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-24 | DOI: 10.1109/TIP.2025.3646073
Qixing Yu;Zhongwei Li;Ziqi Xin;Fangming Guo;Guangbo Ren;Jianbu Wang;Zhenggang Bi
Hyperspectral image classification (HSIC) is a valuable method for identifying coastal wetland vegetation, but environmental complexity and the difficulty of distinguishing land cover types make large-scale labeling challenging. Cross-domain few-shot learning (CDFSL) offers a potential solution to limited labeling. Existing CDFSL HSIC methods have made significant progress, but they still suffer from prototype deviation and covariate shifts, and rely on complex domain alignment (DA) methods. To address these issues, a feature reconstruction-based CDFSL (FRFSL) algorithm is proposed. Within FRFSL, a Prototype Calibration Module (PCM) is designed to handle prototype deviation; it employs a Bayesian inference-enhanced Gaussian Mixture Model to select reliable query features for prototype reconstruction, aligning the prototypes more closely with the actual distribution. Additionally, a ridge regression closed-form solution is incorporated into the Distance Metric Module (DMM), which employs a projection matrix for prototype reconstruction to mitigate covariate shifts between the support and query sets. Features from both the source and target domains are reconstructed into dynamic graphs, transforming DA into a graph matching problem guided by optimal transport theory, and a novel shared transport matrix algorithm is developed to achieve lightweight and interpretable alignment. Extensive experiments on three self-constructed coastal wetland datasets and one public dataset show that FRFSL outperforms eleven state-of-the-art algorithms. The code will be available at https://github.com/Yqx-ACE/TIP_2025_FRFSL
{"title":"FRFSL: Feature Reconstruction-Based Cross-Domain Few-Shot Learning for Coastal Wetland Hyperspectral Image Classification","authors":"Qixing Yu;Zhongwei Li;Ziqi Xin;Fangming Guo;Guangbo Ren;Jianbu Wang;Zhenggang Bi","doi":"10.1109/TIP.2025.3646073","DOIUrl":"10.1109/TIP.2025.3646073","url":null,"abstract":"Hyperspectral image classification (HSIC) is a valuable method for identifying coastal wetland vegetation, but challenges like environmental complexity and difficulty in distinguishing land cover types make large-scale labeling difficult. Cross-domain few-shot learning (CDFSL) offers a potential solution to limited labeling. Existing CDFSL HSIC methods have made significant progress, but still face challenges like prototype deviation, covariate shifts, and rely on complex domain alignment (DA) methods. To address these issues, a feature reconstruction-based CDFSL (FRFSL) algorithm is proposed. Within FRFSL, a Prototype Calibration Module (PCM) is designed for the prototype deviation, which employs a Bayesian inference-enhanced Gaussian Mixture Model to select reliable query features for prototype reconstruction, aligning the prototypes more closely with the actual distribution. Additionally, a ridge regression closed-form solution is incorporated into the Distance Metric Module (DMM), employing a projection matrix for prototype reconstruction to mitigate covariate shifts between the support and query sets. Features from both source and target domains are reconstructed into dynamic graphs, transforming DA into a graph matching problem guided by optimal transport theory. A novel shared transport matrix implementation algorithm is developed to achieve lightweight and interpretable alignment. Extensive experiments on three self-constructed coastal wetland datasets and one public dataset show that FRFSL outperforms eleven state-of-the-art algorithms. The code will be available at <uri>https://github.com/Yqx-ACE/TIP_2025_FRFSL</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"35 ","pages":"194-207"},"PeriodicalIF":13.7,"publicationDate":"2025-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145823140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-23 | DOI: 10.1109/TIP.2025.3645574
Kuan-Chung Ting;Sheng-Jyh Wang;Ruey-Bing Hwang
In this paper, we propose an efficient blind image deblurring algorithm for uniform blur based on the second-order cross-partial derivative (CPD). The proposed method consists of two stages: we first apply a novel blur kernel estimation method to quickly estimate the blur kernel, and then use the estimated kernel to perform non-blind deconvolution to restore the image. A key discovery behind the proposed kernel estimation method is that the blur kernel information is usually embedded in the CPD image of the blurred image. By exploiting this property, we propose a pipeline that extracts a set of kernel candidates directly from the CPD image and then selects the most suitable candidate as the estimated blur kernel. Since our kernel estimation method can obtain a fairly accurate blur kernel, we can achieve effective image restoration using a relatively simple Tikhonov regularization in the subsequent non-blind deconvolution process. To further improve the quality of the restored image, we adopt an efficient filtering technique to suppress periodic artifacts that may appear in the restored images. Experimental results demonstrate that our algorithm can efficiently restore high-quality sharp images on standard CPUs without relying on GPU acceleration or parallel computation. For blurred images of approximately $800\times 800$ resolution, the proposed method completes deblurring within 1 to 5 seconds, which is significantly faster than most state-of-the-art methods. Our MATLAB codes are available at https://github.com/e11tkcee06-a11y/CPD-Deblur.git.
{"title":"Fast Blind Image Deblurring Based on Cross Partial Derivative","authors":"Kuan-Chung Ting;Sheng-Jyh Wang;Ruey-Bing Hwang","doi":"10.1109/TIP.2025.3645574","DOIUrl":"10.1109/TIP.2025.3645574","url":null,"abstract":"In this paper, based on second-order cross-partial derivative (CPD), we propose an efficient blind image deblurring algorithm for uniform blur. The proposed method consists of two stages. We first apply a novel blur kernel estimation method to quickly estimate the blur kernel. Then, we use the estimated kernel to perform non-blind deconvolution to restore the image. A key discovery of the proposed kernel estimation method is that the blur kernel information is usually embedded in the cross-partial-derivative (CPD) image of the blurred image. By exploiting this property, we propose a pipeline to extract a set of kernel candidates directly from the CPD image and then select the most suitable kernel as the estimated blur kernel. Since our kernel estimation method can obtain a fairly accurate blur kernel, we can achieve effective image restoration using a relatively simple Tikhonov regularization in the subsequent non-blind deconvolution process. To improve the quality of the restored image, we further adopt an efficient filtering technique to suppress periodic artifacts that may appear in the restored images. Experimental results demonstrate that our algorithm can efficiently restore high-quality sharp images on standard CPUs without relying on GPU acceleration or parallel computation. For blurred images of approximately <inline-formula> <tex-math>$800times 800$ </tex-math></inline-formula> resolution, the proposed method can complete image deblurring within 1 to 5 seconds, which is significantly faster than most state-of-the-art methods. Our MATLAB codes are available at <uri>https://github.com/e11tkcee06-a11y/CPD-Deblur.git</uri>.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"8627-8640"},"PeriodicalIF":13.7,"publicationDate":"2025-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145812858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-23 | DOI: 10.1109/TIP.2025.3645572
Xin Guo;Yifan Zhao;Jia Li
Generating 3D body movements from speech shows great potential in many downstream applications, yet it still struggles to imitate realistic human movements. Predominant research efforts focus on end-to-end schemes for generating co-speech gestures, spanning GANs, VQ-VAEs, and recent diffusion models. Since the task is ill-posed, we argue in this paper that these prevailing learning schemes fail to model crucial inter- and intra-correlations across different motion units, i.e., head, body, and hands, thus leading to unnatural movements and poor coordination. To capture these intrinsic correlations, we propose a unified Hierarchical Implicit Periodicity (HIP) learning approach for audio-inspired 3D gesture generation. Different from predominant research, our approach models this multi-modal implicit relationship through two explicit technical insights: i) to disentangle the complicated gesture movements, we first explore gesture motion phase manifolds with periodic autoencoders to imitate natural human motion from realistic distributions, while incorporating non-periodic components from the current latent states for instance-level diversity; ii) to model the hierarchical relationship among face motions, body gestures, and hand movements, we drive the animation with cascaded guidance during learning. We demonstrate our approach on 3D avatars, and extensive experiments show that our method outperforms state-of-the-art co-speech gesture generation methods in both quantitative and qualitative evaluations. Code and models will be publicly available.
{"title":"Toward Unified Co-Speech Gesture Generation via Hierarchical Implicit Periodicity Learning","authors":"Xin Guo;Yifan Zhao;Jia Li","doi":"10.1109/TIP.2025.3645572","DOIUrl":"10.1109/TIP.2025.3645572","url":null,"abstract":"Generating 3D-based body movements from speech shows great potential in extensive downstream applications, while it still suffers challenges in imitating realistic human movements. Predominant research efforts focus on end-to-end generation schemes to generate co-speech gestures, spanning GANs, VQ-VAE, and recent diffusion models. As an ill-posed problem, in this paper, we argue that these prevailing learning schemes fail to model crucial inter- and intra-correlations across different motion units, i.e. head, body, and hands, thus leading to unnatural movements and poor coordination. To delve into these intrinsic correlations, we propose a unified Hierarchical Implicit Periodicity (HIP) learning approach for audio-inspired 3D gesture generation. Different from predominant research, our approach models this multi-modal implicit relationship by two explicit technique insights: i) To disentangle the complicated gesture movements, we first explore the gesture motion phase manifolds with periodic autoencoders to imitate human natures from realistic distributions while incorporating non-period ones from current latent states for instance-level diversities. ii) To model the hierarchical relationship of face motions, body gestures, and hand movements, driving the animation with cascaded guidance during learning. We exhibit our proposed approach on 3D avatars and extensive experiments show our method outperforms the state-of-the-art co-speech gesture generation methods by both quantitative and qualitative evaluations. Code and models will be publicly available.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"35 ","pages":"208-220"},"PeriodicalIF":13.7,"publicationDate":"2025-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145812856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-23 | DOI: 10.1109/TIP.2025.3645630
Chunwei Tian;Chengyuan Zhang;Bob Zhang;Zhiwu Li;C. L. Philip Chen;David Zhang
Deep convolutional neural networks can use hierarchical information to progressively extract structural information for recovering high-quality images. However, preserving the effectiveness of the obtained structural information is important in image super-resolution. In this paper, we propose a cosine network for image super-resolution (CSRNet) by improving the network architecture and optimizing the training strategy. To extract complementary homologous structural information, odd and even heterogeneous blocks are designed to enlarge the architectural differences and improve the performance of image super-resolution. Combining linear and non-linear structural information can overcome the drawback of homologous information and enhance the robustness of the obtained structural information in image super-resolution. To account for local minima in gradient descent, a cosine annealing mechanism is used to optimize the training procedure by performing warm restarts and adjusting the learning rate. Experimental results illustrate that the proposed CSRNet is competitive with state-of-the-art methods in image super-resolution.
{"title":"A Cosine Network for Image Super-Resolution","authors":"Chunwei Tian;Chengyuan Zhang;Bob Zhang;Zhiwu Li;C. L. Philip Chen;David Zhang","doi":"10.1109/TIP.2025.3645630","DOIUrl":"10.1109/TIP.2025.3645630","url":null,"abstract":"Deep convolutional neural networks can use hierarchical information to progressively extract structural information to recover high-quality images. However, preserving the effectiveness of the obtained structural information is important in image super-resolution. In this paper, we propose a cosine network for image super-resolution (CSRNet) by improving a network architecture and optimizing the training strategy. To extract complementary homologous structural information, odd and even heterogeneous blocks are designed to enlarge the architectural differences and improve the performance of image super-resolution. Combining linear and non-linear structural information can overcome the drawback of homologous information and enhance the robustness of the obtained structural information in image super-resolution. Taking into account the local minimum of gradient descent, a cosine annealing mechanism is used to optimize the training procedure by performing warm restarts and adjusting the learning rate. Experimental results illustrate that the proposed CSRNet is competitive with state-of-the-art methods in image super-resolution.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"35 ","pages":"305-316"},"PeriodicalIF":13.7,"publicationDate":"2025-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145812860","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}