In intelligent transportation systems, low-bitrate transmission via lossy point cloud compression is vital for facilitating real-time collaborative perception among connected agents, such as vehicles and infrastructure, under restricted bandwidth. In existing compression-based transmission systems, the sender lossily compresses both point coordinates and reflectance to generate a transmission code stream, which incurs an extra transmission burden from reflectance encoding and limits detection robustness due to information loss. To address these issues, this paper proposes a 3D object detection framework with reflectance prediction-based knowledge distillation (RPKD). We compress point coordinates while discarding reflectance during low-bitrate transmission, and feed the decoded, reflectance-free compressed point clouds into a student detector. The discarded reflectance is then reconstructed by a geometry-based reflectance prediction (RP) module within the student detector for precise detection. A teacher detector with the same structure as the student detector is designed to perform reflectance knowledge distillation (RKD) and detection knowledge distillation (DKD) from raw to compressed point clouds. Our cross-source distillation training strategy (CDTS) equips the student detector with robustness to low-quality compressed data while preserving the accuracy benefits of raw data through the transferred distillation knowledge. Experimental results on the KITTI and DAIR-V2X-V datasets demonstrate that our method boosts detection accuracy for compressed point clouds across multiple code rates. We will release the code publicly at https://github.com/HaoJing-SX/RPKD.
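As a rough illustration of how such a combined objective could be assembled, the sketch below mixes a reflectance prediction term with feature-level and detection-level distillation terms. All tensor names (pred_reflectance, student_feat, teacher_cls_logits, etc.), the loss weights, and the temperature are assumptions for illustration, not the authors' released code.

```python
# Hypothetical sketch of an RPKD-style training objective (not the authors' implementation).
import torch
import torch.nn.functional as F

def rpkd_loss(pred_reflectance, gt_reflectance,
              student_feat, teacher_feat,
              student_cls_logits, teacher_cls_logits,
              lambda_rp=1.0, lambda_rkd=1.0, lambda_dkd=1.0, tau=2.0):
    """Combine reflectance prediction (RP), reflectance knowledge distillation (RKD),
    and detection knowledge distillation (DKD); weights and temperature are assumed."""
    # RP: regress the reflectance that was discarded before transmission.
    loss_rp = F.smooth_l1_loss(pred_reflectance, gt_reflectance)
    # RKD: align student features (compressed input) with teacher features (raw input).
    loss_rkd = F.mse_loss(student_feat, teacher_feat.detach())
    # DKD: match softened detection class distributions between student and teacher.
    loss_dkd = F.kl_div(F.log_softmax(student_cls_logits / tau, dim=-1),
                        F.softmax(teacher_cls_logits.detach() / tau, dim=-1),
                        reduction="batchmean") * tau ** 2
    return lambda_rp * loss_rp + lambda_rkd * loss_rkd + lambda_dkd * loss_dkd
```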
Recent advances in surgical robotics and computer vision have greatly improved the autonomy and perception of intelligent systems in the operating room (OR), especially in endoscopic and minimally invasive surgeries. However, open surgery, which remains the predominant form of surgical intervention worldwide, has seen relatively limited exploration due to its inherent complexity and the lack of large-scale, diverse datasets. To close this gap, we present OpenSurgery, by far the largest video-text pretraining and evaluation dataset for open surgery understanding. OpenSurgery consists of two subsets: OpenSurgery-Pretrain and OpenSurgery-EVAL. OpenSurgery-Pretrain comprises 843 publicly available open surgery videos for pretraining, spanning 102 hours and encompassing over 20 distinct surgical types. OpenSurgery-EVAL is a benchmark dataset for evaluating model performance in open surgery understanding, comprising 280 training and 120 test videos totaling 49 hours. Each video in OpenSurgery is meticulously annotated by expert surgeons at three hierarchical levels (video, operation, and frame) to ensure both high quality and strong clinical applicability. We further propose the Hierarchical Surgical Knowledge Pretraining (HierSKP) framework to facilitate large-scale multimodal representation learning for open surgery understanding. HierSKP leverages a granularity-aware contrastive learning strategy and enhances procedural comprehension by constructing hard negative samples and incorporating a Dynamic Time Warping (DTW)-based loss to capture fine-grained temporal alignment of visual semantics. Extensive experiments show that HierSKP achieves state-of-the-art performance on OpenSurgery-EVAL across multiple tasks, including operation recognition, temporal action localization, and zero-shot cross-modal retrieval, demonstrating its strong generalizability for further advances in open surgery understanding.
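The following sketch shows, in generic terms, how a video-text contrastive objective with hard negatives might look at one annotation granularity. The function name, tensor shapes, the way hard negatives are appended, and the temperature are assumptions; HierSKP's actual formulation (including its DTW-based loss) is not reproduced here.

```python
# Illustrative granularity-aware video-text contrastive objective (not the HierSKP code).
import torch
import torch.nn.functional as F

def contrastive_loss(video_emb, text_emb, hard_neg_text_emb=None, temperature=0.07):
    """video_emb, text_emb: (B, D) paired embeddings at one granularity
    (video / operation / frame); hard_neg_text_emb: optional (B, D) hard negatives."""
    video_emb = F.normalize(video_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = video_emb @ text_emb.t() / temperature            # (B, B) similarity matrix
    if hard_neg_text_emb is not None:
        hard_neg = F.normalize(hard_neg_text_emb, dim=-1)
        extra = (video_emb * hard_neg).sum(-1, keepdim=True) / temperature
        logits = torch.cat([logits, extra], dim=1)             # append hard-negative column
    targets = torch.arange(video_emb.size(0), device=video_emb.device)
    # Symmetric InfoNCE over video-to-text and text-to-video directions.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits[:, :video_emb.size(0)].t(), targets))
```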
Deep unfolding networks (DUNs), which combine conventional iterative optimization algorithms and deep neural networks into a multi-stage framework, have achieved remarkable results in Image Restoration (IR) tasks such as spectral imaging reconstruction, compressive sensing, and super-resolution. A DUN unfolds the iterative optimization steps into a stack of sequentially linked blocks, each consisting of a Gradient Descent Module (GDM) and a Proximal Mapping Module (PMM); from a Bayesian perspective, the PMM is equivalent to a denoiser operating on Gaussian noise with a known level. However, existing DUNs suffer from two critical limitations: 1) their PMMs share identical architectures and denoising objectives across stages, ignoring the need for stage-specific adaptation to varying noise levels; and 2) their chain of structurally repetitive blocks results in severe parameter redundancy and high memory consumption, hindering deployment in large-scale or resource-constrained scenarios. To address these challenges, we introduce generalized Deep Low-rank Adaptation (LoRA) Unfolding Networks for image restoration, named LoRun, which harmonizes denoising objectives across stages, adapts to stage-specific denoising levels, and compresses memory usage for a more efficient DUN. LoRun introduces a novel paradigm in which a single pretrained base denoiser is shared across all stages, while lightweight, stage-specific LoRA adapters are injected into the PMMs to dynamically modulate denoising behavior according to the noise level at each unfolding step. This design decouples the core restoration capability from task-specific adaptation, enabling precise control over denoising intensity without duplicating full network parameters and achieving up to $N$ times parameter reduction for an $N$-stage DUN with on-par or better performance. Extensive experiments conducted on three IR tasks validate the efficiency of our method.
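A minimal sketch of the shared-denoiser-plus-stage-specific-LoRA idea is given below for a single convolutional layer. The class name, the rank, the number of stages, and the use of 1x1 convolutions for the low-rank path are illustrative assumptions (the base convolution is assumed to preserve spatial resolution); this is not LoRun's actual module.

```python
# Sketch: one shared frozen conv layer plus per-stage low-rank (LoRA-style) adapters.
import torch
import torch.nn as nn

class LoRAConv2d(nn.Module):
    """Shared frozen base convolution with a low-rank update selected per unfolding stage."""
    def __init__(self, base_conv: nn.Conv2d, rank: int = 4, num_stages: int = 8):
        super().__init__()
        self.base = base_conv
        for p in self.base.parameters():
            p.requires_grad_(False)                      # base denoiser is shared and frozen
        c_in, c_out = base_conv.in_channels, base_conv.out_channels
        # One (down, up) pair of 1x1 convs per stage approximates a low-rank weight delta.
        self.down = nn.ModuleList(nn.Conv2d(c_in, rank, 1, bias=False)
                                  for _ in range(num_stages))
        self.up = nn.ModuleList(nn.Conv2d(rank, c_out, 1, bias=False)
                                for _ in range(num_stages))
        for up in self.up:
            nn.init.zeros_(up.weight)                    # adapters start as a no-op

    def forward(self, x, stage: int):
        # Base response plus the stage-specific low-rank correction.
        return self.base(x) + self.up[stage](self.down[stage](x))
```

Only the adapter parameters are trained per stage, which is how an N-stage DUN can avoid storing N full copies of the denoiser.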
Recovering High Dynamic Range (HDR) images from multiple Standard Dynamic Range (SDR) images becomes challenging when the SDR images exhibit noticeable degradation and missing content. Leveraging scene-specific semantic priors offers a promising solution for restoring heavily degraded regions. However, these priors are typically extracted from sRGB SDR images, and the resulting domain/format gap poses a significant challenge when applying them to HDR imaging. To address this issue, we propose a general framework that transfers semantic knowledge derived from the SDR domain via self-distillation to boost existing HDR reconstruction methods. Specifically, the proposed framework first introduces the Semantic Priors Guided Reconstruction Model (SPGRM), which leverages SDR image semantic knowledge to address ill-posed problems in the initial HDR reconstruction results. Subsequently, we leverage a self-distillation mechanism that constrains the color and content information with semantic knowledge, aligning the external outputs of the baseline and SPGRM. Furthermore, to transfer the semantic knowledge of the internal features, we utilize a Semantic Knowledge Alignment Module (SKAM) to fill in the missing semantic contents with complementary masks. Extensive experiments demonstrate that our framework significantly boosts the HDR imaging quality of existing methods without altering their network architectures.
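One plausible way to read the complementary-mask idea is sketched below: features from the semantics-guided model fill the regions where the baseline's features are unreliable, and the baseline is distilled toward the fused result. The fusion rule, tensor names, and the L1 distillation term are assumptions made for illustration, not the paper's exact SKAM formulation.

```python
# Speculative sketch of complementary-mask feature alignment (SKAM-like), details assumed.
import torch
import torch.nn.functional as F

def skam_align(baseline_feat, spgrm_feat, valid_mask):
    """baseline_feat, spgrm_feat: (B, C, H, W) internal features of the baseline and the
    semantic-priors-guided model; valid_mask: (B, 1, H, W) in [0, 1], 1 = well-exposed."""
    # Fill regions the baseline struggles with (mask near 0) using semantic-rich features.
    fused = valid_mask * baseline_feat + (1.0 - valid_mask) * spgrm_feat
    # Self-distillation term: pull baseline features toward the semantics-filled target.
    loss_feat = F.l1_loss(baseline_feat, fused.detach())
    return fused, loss_feat
```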
Lossy compression of point clouds reduces storage and transmission costs; however, it inevitably leads to irreversible distortion of the geometry structure and attribute information. To address these issues, we propose a unified geometry and attribute enhancement (UGAE) framework, which consists of three core components: post-geometry enhancement (PoGE), pre-attribute enhancement (PAE), and post-attribute enhancement (PoAE). In PoGE, a Transformer-based sparse convolutional U-Net reconstructs the geometry structure with high precision by predicting voxel occupancy probabilities. Building on the refined geometry structure, PAE introduces an enhanced geometry-guided recoloring strategy, which uses a detail-aware K-Nearest Neighbors (DA-KNN) method to achieve accurate recoloring and effectively preserve high-frequency details before attribute compression. Finally, at the decoder side, PoAE applies an attribute residual prediction network with a weighted mean squared error (W-MSE) loss to enhance the quality of high-frequency regions while maintaining the fidelity of low-frequency regions. UGAE significantly outperforms existing methods on three benchmark datasets: 8iVFB, Owlii, and MVUB. Compared to the latest G-PCC test model (TMC13v29), under the total bitrate setting, UGAE achieves an average BD-PSNR gain of 9.98 dB and a -90.54% BD-bitrate for geometry under the D1 metric, as well as a 3.34 dB BD-PSNR improvement with a -55.53% BD-bitrate for attributes. It also significantly improves perceptual quality. Our source code will be released on GitHub at: https://github.com/yuanhui0325/UGAE.
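To make the W-MSE idea concrete, the sketch below weights per-point attribute errors more heavily in regions flagged as high-frequency. The weighting values, the mask convention, and the function signature are assumptions for illustration; the paper's actual weighting scheme may differ.

```python
# Illustrative weighted MSE (W-MSE) attribute loss emphasizing high-frequency regions.
import torch

def weighted_mse(pred_attr, gt_attr, hf_mask=None, hf_weight=4.0, lf_weight=1.0):
    """pred_attr, gt_attr: (N, 3) per-point colors; hf_mask: (N,) bool marking
    high-frequency points (e.g., selected by local color variance among neighbors)."""
    err = (pred_attr - gt_attr) ** 2
    if hf_mask is None:
        return err.mean()
    w = torch.where(hf_mask.unsqueeze(-1),
                    torch.full_like(err, hf_weight),
                    torch.full_like(err, lf_weight))
    return (w * err).sum() / w.sum()   # weighted average over all points and channels
```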
Directly reconstructing a 3D CT volume from few-view 2D X-rays using an end-to-end deep learning network is a challenging task, as X-ray images are merely projection views of the 3D CT volume. In this work, we facilitate the complex mapping from 2D X-ray images to 3D CT by incorporating new view synthesis, and we reduce the learning difficulty through view-guided feature alignment. Specifically, we propose a dual-view guided diffusion model (DVG-Diffusion), which couples a real input X-ray view and a synthesized new X-ray view to jointly guide CT reconstruction. First, a novel view parameter-guided encoder captures features from the X-rays that are spatially aligned with the CT. Next, we concatenate the extracted dual-view features as conditions for the latent diffusion model to learn and refine the CT latent representation. Finally, the CT latent representation is decoded into a CT volume in pixel space. By incorporating view parameter-guided encoding and dual-view guided CT reconstruction, DVG-Diffusion achieves an effective balance between high fidelity and perceptual quality in CT reconstruction. Experimental results demonstrate that our method outperforms state-of-the-art methods. We also present a comprehensive analysis and discussion of views and reconstruction based on the experiments. The model and code are available at https://github.com/xiexing0916/DVG-Diffusion.
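A generic sketch of dual-view conditioned latent diffusion training is given below: noise is added to the CT latent at a random timestep and a denoiser predicts it, conditioned on the concatenated real and synthesized view features. The denoiser interface, tensor shapes, and the epsilon-prediction objective are assumptions standard to latent diffusion, not DVG-Diffusion's specific API.

```python
# Generic sketch of one training step for dual-view conditioned latent diffusion.
import torch
import torch.nn.functional as F

def diffusion_step(denoiser, ct_latent, real_view_feat, synth_view_feat, alphas_cumprod):
    """ct_latent: (B, C, D, H, W) CT latent; *_view_feat: CT-aligned X-ray features;
    alphas_cumprod: (T,) noise schedule. Returns the denoising loss for random timesteps."""
    B = ct_latent.size(0)
    t = torch.randint(0, alphas_cumprod.numel(), (B,), device=ct_latent.device)
    a = alphas_cumprod[t].view(B, 1, 1, 1, 1)
    noise = torch.randn_like(ct_latent)
    noisy = a.sqrt() * ct_latent + (1 - a).sqrt() * noise       # forward diffusion
    cond = torch.cat([real_view_feat, synth_view_feat], dim=1)  # dual-view condition
    pred = denoiser(noisy, t, cond)                             # epsilon prediction
    return F.mse_loss(pred, noise)
```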
Diffusion models have demonstrated impressive abilities in generating photo-realistic and creative images. To offer more controllability over the generation process of diffusion models, previous studies typically adopt extra modules that integrate condition signals by manipulating the intermediate features of the noise predictors, but these often fail under conditions unseen during training. Although subsequent studies attempt to handle multi-condition control, they are mostly resource-intensive to implement, and more generalizable and efficient solutions for controllable visual generation are still needed. In this paper, we present a late-constraint controllable visual generation method, namely LaCon, which generalizes across various modalities and granularities for single-condition control. LaCon establishes an alignment between the external condition and specific diffusion timesteps, and guides diffusion models to produce conditional results based on this learned alignment. Experimental results on prevailing benchmark datasets illustrate the promising performance and generalization capability of LaCon under various conditions and settings. Ablation studies analyze the different components of LaCon, illustrating its great potential to offer flexible condition controls for different backbones.
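One way to picture a late-constraint mechanism is as guidance applied at sampling time: an alignment head scores how well the intermediate latent at a given timestep matches the external condition, and its gradient nudges the sample. The interfaces below (aligner, guidance_scale) are purely illustrative and are not LaCon's actual implementation.

```python
# Speculative sketch of late-constraint, alignment-based guidance at one sampling step.
import torch

def guided_update(x_t, t, cond_emb, aligner, guidance_scale=1.0):
    """x_t: current diffusion latent; cond_emb: external condition embedding;
    aligner: a differentiable head predicting alignment between (x_t, t) and cond_emb."""
    x_t = x_t.detach().requires_grad_(True)
    score = aligner(x_t, t, cond_emb).sum()           # higher score = better aligned
    grad = torch.autograd.grad(score, x_t)[0]
    return (x_t + guidance_scale * grad).detach()     # steer the latent toward the condition
```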
A colored point cloud, comprising geometry and attribute components, is one of the mainstream representations enabling realistic and immersive 3D applications. To generate large-scale and denser colored point clouds, we propose a deep learning-based Joint Geometry and Attribute Up-sampling (JGAU) method that learns to model both geometry and attribute patterns while leveraging spatial attribute correlation. First, we establish and release a large-scale dataset for colored point cloud up-sampling, named SYSU-PCUD, containing 121 large-scale colored point clouds with diverse geometry and attribute complexities across six categories and four sampling rates. Second, to improve the quality of up-sampled point clouds, we propose a deep learning-based JGAU framework that up-samples geometry and attributes jointly. It consists of a geometry up-sampling network and an attribute up-sampling network, where the latter leverages the up-sampled auxiliary geometry to model neighborhood correlations of the attributes. Third, we propose two coarse attribute up-sampling methods, Geometric Distance Weighted Attribute Interpolation (GDWAI) and Deep Learning-based Attribute Interpolation (DLAI), to generate coarsely up-sampled attributes for each point. An attribute enhancement module then refines these coarse attributes and produces high-quality point clouds by further exploiting intrinsic attribute and geometry patterns. Extensive experiments show that the Peak Signal-to-Noise Ratios (PSNRs) achieved by the proposed JGAU are 33.90 dB, 32.10 dB, 31.10 dB, and 30.39 dB at up-sampling rates of $4\times$, $8\times$, $12\times$, and $16\times$, respectively. Compared to state-of-the-art schemes, JGAU achieves average PSNR gains of 2.32 dB, 2.47 dB, 2.28 dB, and 2.11 dB at the four up-sampling rates, respectively, which is significant. The code is released at https://github.com/SYSU-Video/JGAU.
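The sketch below illustrates a GDWAI-style coarse interpolation as described above: each up-sampled point takes an inverse-distance-weighted average of the colors of its k nearest sparse points. It is written from the abstract's description; the value of k, the epsilon, and the function signature are assumptions rather than the released code.

```python
# Sketch of geometric-distance-weighted coarse attribute interpolation (GDWAI-style).
import torch

def gdwai(upsampled_xyz, sparse_xyz, sparse_attr, k=3, eps=1e-8):
    """upsampled_xyz: (M, 3) up-sampled coordinates; sparse_xyz: (N, 3) input coordinates;
    sparse_attr: (N, 3) input colors. Returns (M, 3) coarsely interpolated colors."""
    dist = torch.cdist(upsampled_xyz, sparse_xyz)           # (M, N) pairwise distances
    knn_dist, knn_idx = dist.topk(k, dim=1, largest=False)  # k nearest sparse neighbors
    weights = 1.0 / (knn_dist + eps)                        # closer neighbors weigh more
    weights = weights / weights.sum(dim=1, keepdim=True)    # normalize per query point
    neighbor_attr = sparse_attr[knn_idx]                    # (M, k, 3) neighbor colors
    return (weights.unsqueeze(-1) * neighbor_attr).sum(dim=1)
```

In JGAU these coarse attributes would then be refined by the learned attribute enhancement module.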
Unsupervised feature selection (UFS) has recently gained attention for its effectiveness in processing unlabeled high-dimensional data. However, existing methods overlook the intrinsic causal mechanisms within the data, resulting in the selection of irrelevant features and poor interpretability. Additionally, previous graph-based methods fail to account for the differing impacts of non-causal and causal features in constructing the similarity graph, which leads to false links in the generated graph. To address these issues, a novel UFS method, called Causally-Aware UnSupErvised Feature Selection learning (CAUSE-FS), is proposed. CAUSE-FS introduces a novel causal regularizer that reweights samples to balance the confounding distribution of each treatment feature. This regularizer is subsequently integrated into a generalized unsupervised spectral regression model to mitigate spurious associations between features and clustering labels, thus achieving causal feature selection. Furthermore, CAUSE-FS employs causality-guided hierarchical clustering to partition features with varying causal contributions into multiple granularities. By integrating similarity graphs learned adaptively at different granularities, CAUSE-FS increases the importance of causal features when constructing the fused similarity graph to capture the reliable local structure of data. Extensive experimental results demonstrate the superiority of CAUSE-FS over state-of-the-art methods, with its interpretability further validated through feature visualization.
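To illustrate the flavor of a confounder-balancing reweighting step like the causal regularizer described above, the sketch below learns sample weights that match the weighted means of the remaining features across the high/low groups of one treatment feature. The median-based binarization, the softmax parameterization of the weights, and the optimizer settings are assumptions, not the CAUSE-FS formulation.

```python
# Rough sketch of confounder-balancing sample reweighting for one treatment feature.
import torch

def balance_weights(X, treat_idx, num_iters=200, lr=0.05):
    """X: (n, d) feature matrix; treat_idx: index of the treatment feature.
    Returns (n,) sample weights that balance the confounding features."""
    treatment = (X[:, treat_idx] > X[:, treat_idx].median()).float()
    confounders = torch.cat([X[:, :treat_idx], X[:, treat_idx + 1:]], dim=1)
    log_w = torch.zeros(X.size(0), requires_grad=True)
    opt = torch.optim.Adam([log_w], lr=lr)
    for _ in range(num_iters):
        w = torch.softmax(log_w, dim=0) * X.size(0)        # positive weights, mean ~ 1
        t_mean = (w * treatment) @ confounders / (w * treatment).sum()
        c_mean = (w * (1 - treatment)) @ confounders / (w * (1 - treatment)).sum()
        loss = ((t_mean - c_mean) ** 2).sum()              # confounder imbalance
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (torch.softmax(log_w, dim=0) * X.size(0)).detach()
```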

