
Pattern Recognition: Latest Publications

Multi-scale implicit transformer with re-parameterization for arbitrary-scale super-resolution
IF 7.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-01-07 DOI: 10.1016/j.patcog.2024.111327
Jinchen Zhu, Mingjian Zhang, Ling Zheng, Shizhuang Weng
Methods based on implicit neural representations have recently exhibited excellent capabilities for arbitrary-scale super-resolution (ASSR). Although these methods represent the features of an image by generating latent codes, these latent codes adapt poorly to the different magnification factors of super-resolution (SR), which seriously affects performance. To address this issue, we design a multi-scale implicit transformer (MSIT) that consists of a multi-scale neural operator (MSNO) and multi-scale self-attention (MSSA). MSNO obtains multi-scale latent codes through feature enhancement, multi-scale characteristic extraction, and multi-scale characteristic merging. MSSA further enhances the multi-scale characteristics of the latent codes, resulting in improved performance. Furthermore, we propose a re-interaction module combined with a cumulative training strategy to improve the diversity of the information learned by the network during training. We are the first to systematically introduce multi-scale characteristics into ASSR. Extensive experiments validate the effectiveness of MSIT, and our method achieves state-of-the-art performance on ASSR tasks.
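For readers unfamiliar with implicit-representation ASSR, the sketch below shows the generic decoding step such methods share: a latent code is sampled at an arbitrary high-resolution coordinate and, together with the coordinate and cell (scale) size, fed to an MLP that predicts the RGB value. This is a minimal illustration of the common pipeline, not the authors' MSNO/MSSA modules; the function names and shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def implicit_sr_query(latent, coords, cell, mlp):
    """Generic implicit-SR decoding: sample a latent code at each arbitrary
    high-resolution coordinate and let an MLP predict the RGB value.

    latent: (B, C, h, w) latent-code map from the encoder
    coords: (B, N, 2) query coordinates in [-1, 1] (any magnification factor)
    cell:   (B, N, 2) size of each query pixel (carries the scale information)
    mlp:    module mapping the last dimension (C + 2 + 2) to 3
    """
    # Sample the nearest latent code at each query location.
    codes = F.grid_sample(latent, coords.unsqueeze(1), mode='nearest',
                          align_corners=False)          # (B, C, 1, N)
    codes = codes.squeeze(2).permute(0, 2, 1)            # (B, N, C)
    # Condition the decoder on the coordinate and the cell (scale) size.
    inp = torch.cat([codes, coords, cell], dim=-1)       # (B, N, C + 4)
    return mlp(inp)                                      # (B, N, 3) RGB values
```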
{"title":"Multi-scale implicit transformer with re-parameterization for arbitrary-scale super-resolution","authors":"Jinchen Zhu,&nbsp;Mingjian Zhang,&nbsp;Ling Zheng,&nbsp;Shizhuang Weng","doi":"10.1016/j.patcog.2024.111327","DOIUrl":"10.1016/j.patcog.2024.111327","url":null,"abstract":"<div><div>Methods based on implicit neural representations have recently exhibited excellent capabilities for arbitrary-scale super-resolution (ASSR). Although these methods represent the features of an image by generating latent codes, these latent codes are difficult to adapt to the different magnification factors of super-resolution (SR) imaging, seriously affecting their performance. To address this issue, we design a multi-scale implicit transformer (MSIT) that consists of a multi-scale neural operator (MSNO) and multi-scale self-attention (MSSA). MSNO obtains multi-scale latent codes through feature enhancement, multi-scale characteristic extraction, and multi-scale characteristic merging. MSSA further enhances the multi-scale characteristics of latent codes, resulting in improved performance. Furthermore, we propose the re-interaction module combined with a cumulative training strategy to improve the diversity of learned information for the network during training. We have systematically introduced multi-scale characteristics for the first time into ASSR. Extensive experiments are performed to validate the effectiveness of MSIT, and our method achieves state-of-the-art performance in ASSR tasks.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"162 ","pages":"Article 111327"},"PeriodicalIF":7.5,"publicationDate":"2025-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143150555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Deep face template protection in the wild
IF 7.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-01-06 DOI: 10.1016/j.patcog.2024.111336
Sunpill Kim, Hoyong Shin, Jae Hong Seo
Advances in deep neural networks (NNs) have led to practical recognition systems for biometrics such as the face, but they have also increased threats to privacy, such as the recovery of original biometrics from templates. Efficiency, security, and usability are three important properties of template protection that are difficult to achieve simultaneously. IronMask (CVPR 2021) shows the importance of an efficient error-correcting mechanism on the metric used by the recognition system when designing template protection that satisfies all three properties at the same time. It is the first modular protection that can be added to any NN-based face recognition system independently (pre)trained by metric learning with cosine similarity. In addition, its performance on three datasets widely used for evaluating template protection (Multi-PIE, FEI, Color FERET) is comparable to that of integrated protection-recognition systems, which limit usability due to inefficient registration. In this paper, we first demonstrate and analyze the limits of IronMask using wilder and larger face datasets (LFW, AgeDB-30, CFP-FP, IJB-C). Based on these analyses, we propose a new face template protection that offers several benefits over IronMask while preserving its modular nature. First, ours provides more flexibility to adjust the error-correcting capacity to balance the true accept rate (TAR) and false accept rate (FAR). Second, ours minimizes performance degradation while keeping an appropriate level of security; even when evaluated on the large IJB-C dataset, we achieve a TAR of 96.31% at a FAR of 0.05% with 118-bit security when combined with ArcFace, which itself achieves a TAR of 96.97% at a FAR of 0.01%.
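As a reference for how operating points such as "a TAR of 96.31% at a FAR of 0.05%" are typically read off, the sketch below picks the cosine-similarity threshold that yields a target FAR on impostor scores and reports the resulting TAR on genuine scores. It is a generic verification-evaluation sketch, not the proposed template-protection scheme; the score arrays are assumed inputs.

```python
import numpy as np

def tar_at_far(genuine_scores, impostor_scores, target_far=0.0005):
    """Read a TAR@FAR operating point from cosine-similarity scores.

    genuine_scores:  similarities of mated pairs (same identity)
    impostor_scores: similarities of non-mated pairs
    target_far:      e.g. 0.0005 for FAR = 0.05%
    """
    # Threshold chosen so the desired fraction of impostors is accepted.
    threshold = np.quantile(impostor_scores, 1.0 - target_far)
    tar = np.mean(genuine_scores >= threshold)   # accepted genuine pairs
    far = np.mean(impostor_scores >= threshold)  # accepted impostor pairs
    return tar, far, threshold
```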
{"title":"Deep face template protection in the wild","authors":"Sunpill Kim ,&nbsp;Hoyong Shin ,&nbsp;Jae Hong Seo","doi":"10.1016/j.patcog.2024.111336","DOIUrl":"10.1016/j.patcog.2024.111336","url":null,"abstract":"<div><div>The advancement in the field of deep neural network (NN) leads practical recognition systems for biometrics such as face but also increases the threat to privacy such as recovering original biometrics from templates. The efficiency, the security and the usability are three points of important but difficult-to-achieve simultaneously in template protection. IronMask (CVPR 2021) shows the importance of efficient error-correcting mechanism on the metric used in the recognition system when designing template protection satisfying these three points at the same time. It is a first modular protection that can be added to any NN-based face recognition system independently (pre)trained by metric learning with cosine similarity. In addition, its performance with three datasets (Multi-PIE, FEI, Color FERET), which are widely used for evaluating template protection, is comparable with protection-recognition integrated systems that limit the usability due to inefficient registration. In this paper, we first demonstrate and analyze limit of IronMask by using more wilder and larger face datasets (LFW, AgeDB-30, CFP-FP, IJB-C). On the basis of our analyses on IronMask, we propose a new face template protection that has several benefits over IronMask with preserving modular feature. First, ours provides more flexibility to manipulate the error-correcing capacity for balancing between true accept rate (TAR) and false accept rate (FAR). Second, ours minimizes performance degradation while keeping appropriate level of security; even evaluating with a large dataset IJB-C, we achieve a TAR of 96.31% at a FAR of 0.05% with 118-bit security when combined with ArcFace that achieves 96.97% TAR at 0.01% FAR.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"162 ","pages":"Article 111336"},"PeriodicalIF":7.5,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143149783","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A projected gradient solution to the minimum connector problem with extensions to support vector machines
IF 7.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-01-04 DOI: 10.1016/j.patcog.2024.111339
Raul Fonseca Neto, Saulo Moraes Villela, Antonio Padua Braga
In this paper, we present a comprehensive study of the problem of finding the minimum connector between two convex sets, focusing in particular on polytopes, and extend it to large-margin classification problems. The problem is highly relevant in diverse fields such as pattern recognition, machine learning, convex analysis, and applied linear algebra. Notably, it plays a crucial role in binary classification tasks by determining the maximum-margin hyperplane that separates two sets of data. Our main contribution is an innovative iterative approach that employs a projected gradient method to compute the minimum connector solution using only first-order information. Furthermore, we demonstrate the applicability of our method to the one-class problem with a single projection step, and to the multi-class problem with a novel multi-objective quadratic function and a multiple projection step, both of which are important in pattern recognition and machine learning. Our formulation incorporates a dual representation, enabling the use of kernel functions to address non-linearly separable problems. Moreover, we establish a connection between the solutions of the Minimum Connector and the Maximum Margin Hyperplane problems through a reparameterization technique based on collinear projection. To validate the effectiveness of our method, we conduct extensive experiments on various benchmark datasets commonly used in the field. The experimental results demonstrate the effectiveness of our approach and its ability to handle diverse applications.
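A minimal NumPy sketch of the projected-gradient idea for the minimum connector, under stated assumptions (points in each polytope parameterized as convex combinations of its vertices, a fixed step size, and Euclidean projection onto the probability simplex); it illustrates the first-order approach, not the authors' exact algorithm.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def minimum_connector(A, B, lr=0.01, iters=2000):
    """Projected gradient for the minimum connector between conv(A) and conv(B).

    A: (n1, d) vertices of the first polytope, B: (n2, d) of the second.
    Points in the hulls are parameterized as convex combinations p = a @ A,
    q = b @ B with a, b constrained to the probability simplex.
    """
    a = np.full(len(A), 1.0 / len(A))
    b = np.full(len(B), 1.0 / len(B))
    for _ in range(iters):
        w = a @ A - b @ B                        # current connector vector
        a = project_simplex(a - lr * 2 * A @ w)  # gradient step, then project
        b = project_simplex(b + lr * 2 * B @ w)
    p, q = a @ A, b @ B
    w = p - q                                    # normal of the separating hyperplane
    bias = -w @ (p + q) / 2                      # hyperplane through the midpoint
    return w, bias
```

The hyperplane orthogonal to the returned connector and passing through its midpoint is the maximum-margin separator referred to in the abstract.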
{"title":"A projected gradient solution to the minimum connector problem with extensions to support vector machines","authors":"Raul Fonseca Neto ,&nbsp;Saulo Moraes Villela ,&nbsp;Antonio Padua Braga","doi":"10.1016/j.patcog.2024.111339","DOIUrl":"10.1016/j.patcog.2024.111339","url":null,"abstract":"<div><div>In this paper, we present a comprehensive study on the problem of finding the minimum connector between two convex sets, particularly focusing on polytopes, and extended to large margin classification problems. The problem holds significant relevance in diverse fields such as pattern recognition, machine learning, convex analysis, and applied linear algebra. Notably, it plays a crucial role in binary classification tasks by determining the maximum margin hyperplane that separates two sets of data. Our main contribution is the introduction of an innovative iterative approach that employs a projected gradient method to compute the minimum connector solution using only first-order information. Furthermore, we demonstrate the applicability of our method to solve the one-class problem with a single projection step, and the multi-class problem with a novel multi-objective quadratic function and a multiple projection step, which have important significance in pattern recognition and machine learning fields. Our formulation incorporates a dual representation, enabling utilization of kernel functions to address non-linearly separable problems. Moreover, we establish a connection between the solutions of the Minimum Connector and the Maximum Margin Hyperplane problems through a reparameterization technique based on collinear projection. To validate the effectiveness of our method, we conduct extensive experiments on various benchmark datasets commonly used in the field. The experimental results demonstrate the effectiveness of our approach and its ability to handle diverse applications.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"162 ","pages":"Article 111339"},"PeriodicalIF":7.5,"publicationDate":"2025-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143150552","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
3D human pose estimation and action recognition using fisheye cameras: A survey and benchmark
IF 7.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-01-03 DOI: 10.1016/j.patcog.2024.111334
Yahui Zhang, Shaodi You, Sezer Karaoglu, Theo Gevers
3D human pose estimation based on visual information aims to predict the 3D poses of humans in images or videos. Human action recognition aims to classify the actions that people perform. Both topics are widely studied in the field of computer vision.

Existing methods mainly focus on 3D human pose estimation and human action recognition using images/videos recorded by perspective cameras. In contrast to perspective cameras, fisheye cameras use wide-angle lenses that capture a wider field of view (FOV). Fisheye cameras are used in many applications, such as surveillance and autonomous driving.

In this paper, we survey monocular 3D human pose estimation and action recognition. A new benchmark dataset captured with a fisheye camera is proposed to quantitatively compare and analyze existing methods.
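For context on why fisheye cameras capture a wider FOV, the sketch below contrasts the perspective projection (r = f·tan θ, which diverges as θ approaches 90 degrees) with the common equidistant fisheye model (r = f·θ). This is a textbook model shown for illustration only; the survey covers cameras and datasets, not this particular formula.

```python
import numpy as np

def project_points(X, f, model='fisheye_equidistant'):
    """Project 3D camera-frame points with a perspective or an equidistant
    fisheye model, illustrating why fisheye lenses capture a wider FOV.

    X: (N, 3) points in the camera frame, f: focal length in pixels.
    """
    x, y, z = X[:, 0], X[:, 1], X[:, 2]
    if model == 'perspective':
        # r = f * tan(theta): diverges as theta -> 90 deg, so the FOV stays below 180 deg.
        return np.stack([f * x / z, f * y / z], axis=1)
    theta = np.arctan2(np.hypot(x, y), z)      # angle from the optical axis
    phi = np.arctan2(y, x)
    r = f * theta                              # equidistant model: radius grows linearly in theta
    return np.stack([r * np.cos(phi), r * np.sin(phi)], axis=1)
```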
{"title":"3D human pose estimation and action recognition using fisheye cameras: A survey and benchmark","authors":"Yahui Zhang ,&nbsp;Shaodi You ,&nbsp;Sezer Karaoglu ,&nbsp;Theo Gevers","doi":"10.1016/j.patcog.2024.111334","DOIUrl":"10.1016/j.patcog.2024.111334","url":null,"abstract":"<div><div>3D human pose estimation based on visual information aims to predict 3D poses of humans in images or videos. The aim of human action recognition is to classify what kind of actions people do. Both topics are widely studied in the field of computer vision.</div><div>Existing methods mainly focus on 3D human pose estimation and human action recognition using images/videos recorded by perspective cameras. In contrast to perspective cameras, fisheye cameras use wide-angle lenses capturing wider field-of-views (FOV). Fisheye cameras are used in many applications such as surveillance and autonomous driving.</div><div>In this paper, a survey is given on monocular 3D human pose estimation and action recognition. A new benchmark dataset is proposed using a fisheye camera to quantitatively compare and analyze existing methods.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"162 ","pages":"Article 111334"},"PeriodicalIF":7.5,"publicationDate":"2025-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143149487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Class-aware Universum Inspired re-balance learning for long-tailed recognition
IF 7.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-01-03 DOI: 10.1016/j.patcog.2024.111337
Enhao Zhang, Chuanxing Geng, Songcan Chen
Data augmentation for minority classes serves as an effective strategy for long-tailed recognition, prompting the emergence of numerous methods. Although these methods achieve balance in sample quantity, the quality of the augmented samples cannot be guaranteed, leading to issues such as over-fitting, lack of variety, and semantic drift. To this end, we propose Class-aware Universum Inspired Re-balance Learning (CaUIRL) for long-tailed recognition, which endows the Universum with class-aware ability to re-balance individual minority classes in terms of both sample quantity and quality. In particular, we theoretically prove, from a Bayesian perspective, that the classifiers learned by CaUIRL are consistent with those learned under the balanced condition. In addition, we develop a higher-order mixup approach, which can automatically generate class-aware Universum (CaU) data without resorting to any external data. Unlike the traditional Universum, CaU additionally takes domain similarity, class separability, and sample diversity into account. Comprehensive experiments on benchmark datasets reveal that the proposed method substantially enhances model performance, especially on minority classes (e.g., the top-1 accuracy of the last two tail classes is improved by 6% on Cifar10-LR).
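The following is a hedged sketch of how a higher-order (k-way) mixup can produce Universum-style samples: blending examples from several classes yields data that belongs to no single class. The class-aware partner selection and weighting of CaUIRL are not reproduced here; the random partner sampling and the Dirichlet concentration are illustrative assumptions.

```python
import torch

def higher_order_mixup_universum(images, labels, num_classes, k=3, alpha=1.0):
    """Generate Universum-like samples by mixing k examples with Dirichlet weights.

    A sample blended from several classes belongs to none of them, which is the
    defining property of Universum data. CaUIRL's class-aware selection is not
    reproduced; partners are drawn at random for illustration.
    """
    n = images.size(0)
    weights = torch.distributions.Dirichlet(
        torch.full((k,), alpha)).sample((n,)).to(images.device)        # (n, k)
    idx = torch.stack([torch.randperm(n) for _ in range(k)], dim=1)    # (n, k) partner indices
    mixed = torch.einsum('nk,nkchw->nchw', weights, images[idx])       # blended images
    soft_labels = torch.zeros(n, num_classes, device=images.device).scatter_add_(
        1, labels[idx], weights)                                       # (n, C) mixed label mass
    return mixed, soft_labels
```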
{"title":"Class-aware Universum Inspired re-balance learning for long-tailed recognition","authors":"Enhao Zhang ,&nbsp;Chuanxing Geng ,&nbsp;Songcan Chen","doi":"10.1016/j.patcog.2024.111337","DOIUrl":"10.1016/j.patcog.2024.111337","url":null,"abstract":"<div><div>Data augmentation for minority classes serves as an effective strategy for long-tailed recognition, prompting the emergence of numerous methods. Although these methods achieve balance in sample quantity, the quality of the augmented samples cannot be guaranteed, invoking issues like over-fitting, lack of variety, and semantic drift. To this end, we propose the Class-aware Universum Inspired Re-balance Learning (CaUIRL) for long-tailed recognition, which endows the Universum with class-aware ability to re-balance individual minority classes in terms of both sample quantity and quality. In particular, we theoretically prove that the classifiers learned by CaUIRL are consistent with those learned under the balanced condition from a Bayesian perspective. In addition, we develop a higher-order mixup approach, which can automatically generate class-aware Universum (CaU) data without resorting to any external data. Unlike the traditional Universum, CaU additionally takes into account domain similarity, class separability, and sample diversity into account. Comprehensive experiments on benchmark datasets reveal that the proposed method substantially enhances model performance, especially in minority classes (e.g., the top-1 accuracy of the last two tail classes is improved by 6% on Cifar10-LR).</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"161 ","pages":"Article 111337"},"PeriodicalIF":7.5,"publicationDate":"2025-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143146854","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Source-free video domain adaptation by learning from noisy labels
IF 7.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-01-02 DOI: 10.1016/j.patcog.2024.111328
Avijit Dasgupta, C.V. Jawahar, Karteek Alahari
Despite the progress seen in classification methods, current approaches for handling videos with distribution shifts between source and target domains remain source-dependent, as they require access to the source data during the adaptation stage. In this paper, we present a self-training based source-free video domain adaptation approach that addresses this challenge by bridging the gap between the source and target domains. We use the source pre-trained model to generate pseudo-labels for the target domain samples, which are inevitably noisy. Thus, we treat the problem of source-free video domain adaptation as learning from noisy labels and argue that the samples with correct pseudo-labels can help us in adaptation. To this end, we leverage the cross-entropy loss as an indicator of the correctness of the pseudo-labels and use the resulting small-loss samples from the target domain for fine-tuning the model. We further enhance the adaptation performance by implementing a teacher-student (TS) framework, in which the teacher, which is updated gradually, produces reliable pseudo-labels. Meanwhile, the student is fine-tuned on the target domain videos using these generated pseudo-labels to improve its performance. Extensive experimental evaluations show that our methods, termed CleanAdapt and CleanAdapt + TS, achieve state-of-the-art results, outperforming existing approaches on various open datasets. Our source code is publicly available at https://avijit9.github.io/CleanAdapt.
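A simplified sketch of the two ingredients described above: small-loss selection of pseudo-labeled target clips and a gradually updated (EMA) teacher. The keep ratio, the use of the teacher's own predictions as pseudo-labels, and the update schedule are assumptions; the paper's exact procedure may differ.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def select_small_loss(teacher, videos, keep_ratio=0.5):
    """Pseudo-label target clips with the teacher and keep the fraction with the
    smallest cross-entropy loss, treating those pseudo-labels as 'clean'."""
    logits = teacher(videos)
    pseudo = logits.argmax(dim=1)
    loss = F.cross_entropy(logits, pseudo, reduction='none')   # per-sample loss
    k = max(1, int(keep_ratio * len(loss)))
    keep = loss.topk(k, largest=False).indices                 # small-loss samples
    return videos[keep], pseudo[keep]

@torch.no_grad()
def ema_update(teacher, student, momentum=0.999):
    """Gradually update the teacher as an exponential moving average of the student."""
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.mul_(momentum).add_(p_s, alpha=1.0 - momentum)
```

The selected clips and their pseudo-labels would then be used to fine-tune the student with a standard cross-entropy loss.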
{"title":"Source-free video domain adaptation by learning from noisy labels","authors":"Avijit Dasgupta ,&nbsp;C.V. Jawahar ,&nbsp;Karteek Alahari","doi":"10.1016/j.patcog.2024.111328","DOIUrl":"10.1016/j.patcog.2024.111328","url":null,"abstract":"<div><div>Despite the progress seen in classification methods, current approaches for handling videos with distribution shifts in source and target domains remain source-dependent as they require access to the source data during the adaptation stage. In this paper, we present a self-training based <em>source-free</em> video domain adaptation approach to address this challenge by bridging the gap between the source and the target domains. We use the source pre-trained model to generate pseudo-labels for the target domain samples, which are inevitably noisy. Thus, we treat the problem of source-free video domain adaptation as learning from noisy labels and argue that the samples with correct pseudo-labels can help us in adaptation. To this end, we leverage the cross-entropy loss as an indicator of the correctness of the pseudo-labels and use the resulting small-loss samples from the target domain for fine-tuning the model. We further enhance the adaptation performance by implementing a teacher–student (TS) framework, in which the teacher, which is updated gradually, produces reliable pseudo-labels. Meanwhile, the student undergoes fine-tuning on the target domain videos using these generated pseudo-labels to improve its performance. Extensive experimental evaluations show that our methods, termed as <em>CleanAdapt, CleanAdapt + TS</em>, achieve state-of-the-art results, outperforming the existing approaches on various open datasets. Our source code is publicly available at <span><span>https://avijit9.github.io/CleanAdapt</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"161 ","pages":"Article 111328"},"PeriodicalIF":7.5,"publicationDate":"2025-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143146234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
PartSeg: Few-shot part segmentation via part-aware prompt learning
IF 7.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-01-02 DOI: 10.1016/j.patcog.2024.111326
Mengya Han, Heliang Zheng, Chaoyue Wang, Yong Luo, Han Hu, Jing Zhang, Bo Du
In this work, we address the task of few-shot part segmentation, which aims to segment the different parts of an unseen object using very few labeled examples. It has been found that leveraging the textual space of a powerful pre-trained image-language model, such as CLIP, can substantially enhance the learning of visual features in few-shot tasks. However, CLIP-based methods primarily focus on high-level visual features that are fully aligned with textual features representing the “summary” of the image, which often struggle to understand the concept of object parts through textual descriptions. To address this, we propose PartSeg, a novel method that learns part-aware prompts to grasp the concept of “part” and better utilize the textual space of CLIP to enhance few-shot part segmentation. Specifically, we design a part-aware prompt learning module that generates part-aware prompts, enabling the CLIP model to better understand the concept of “part” and effectively utilize its textual space. The part-aware prompt learning module includes a part-specific prompt generator that produces part-specific tokens for each part class. Furthermore, since the concept of the same part across different object categories is general, we establish relationships between these parts to estimate part-shared tokens during the prompt learning process. Finally, the part-specific and part-shared tokens, along with the textual tokens encoded from textual descriptions of parts (i.e., part labels), are combined to form the part-aware prompt used to generate textual prototypes for segmentation. We conduct extensive experiments on the PartImageNet and Pascal_Part datasets, and the results demonstrate that our proposed method achieves state-of-the-art performance.
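The sketch below illustrates one way part-aware prompts could be assembled: learnable part-specific tokens, tokens shared across parts, and embedded part-label tokens are concatenated and encoded into one textual prototype per part class. Token counts, dimensions, and the text-encoder interface are placeholder assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class PartAwarePrompt(nn.Module):
    """Illustrative prompt assembly for part classes (hypothetical shapes/interface)."""

    def __init__(self, num_parts, dim, n_specific=4, n_shared=4):
        super().__init__()
        # Learnable tokens: one set per part class, plus a set shared by all parts.
        self.specific = nn.Parameter(torch.randn(num_parts, n_specific, dim) * 0.02)
        self.shared = nn.Parameter(torch.randn(n_shared, dim) * 0.02)

    def forward(self, label_tokens, text_encoder):
        # label_tokens: (num_parts, L, dim) embeddings of the part-label text.
        num_parts = label_tokens.size(0)
        shared = self.shared.unsqueeze(0).expand(num_parts, -1, -1)
        prompts = torch.cat([self.specific, shared, label_tokens], dim=1)
        prototypes = text_encoder(prompts)        # assumed to return (num_parts, dim)
        return prototypes
```

Dense segmentation logits would then be obtained, for example, by cosine similarity between pixel features and these textual prototypes.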
{"title":"PartSeg: Few-shot part segmentation via part-aware prompt learning","authors":"Mengya Han ,&nbsp;Heliang Zheng ,&nbsp;Chaoyue Wang ,&nbsp;Yong Luo ,&nbsp;Han Hu ,&nbsp;Jing Zhang ,&nbsp;Bo Du","doi":"10.1016/j.patcog.2024.111326","DOIUrl":"10.1016/j.patcog.2024.111326","url":null,"abstract":"<div><div>In this work, we address the task of few-shot part segmentation, which aims to segment the different parts of an unseen object using very few labeled examples. It has been found that leveraging the textual space of a powerful pre-trained image-language model, such as CLIP, can substantially enhance the learning of visual features in few-shot tasks. However, CLIP-based methods primarily focus on high-level visual features that are fully aligned with textual features representing the “summary” of the image, which often struggle to understand the concept of object parts through textual descriptions. To address this, we propose PartSeg, a novel method that learns part-aware prompts to grasp the concept of “part” and better utilize the textual space of CLIP to enhance few-shot part segmentation. Specifically, we design a part-aware prompt learning module that generates part-aware prompts, enabling the CLIP model to better understand the concept of “part” and effectively utilize its textual space. The part-aware prompt learning module includes a part-specific prompt generator that produces part-specific tokens for each part class. Furthermore, since the concept of the same part across different object categories is general, we establish relationships between these parts to estimate part-shared tokens during the prompt learning process. Finally, the part-specific and part-shared tokens, along with the textual tokens encoded from textual descriptions of parts (i.e., part labels), are combined to form the part-aware prompt used to generate textual prototypes for segmentation. We conduct extensive experiments on the PartImageNet and Pascal_Part datasets, and the results demonstrate that our proposed method achieves state-of-the-art performance.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"162 ","pages":"Article 111326"},"PeriodicalIF":7.5,"publicationDate":"2025-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143150554","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Enhancing knowledge distillation for semantic segmentation through text-assisted modular plugins
IF 7.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-01-02 DOI: 10.1016/j.patcog.2024.111329
Letian Wu, Shen Zhang, Chuankai Zhang, Zhenyu Zhao, Jiajun Liang, Wankou Yang
Compared with other model compression methods, such as pruning and quantization, knowledge distillation offers superior compatibility and flexibility. Current knowledge distillation (KD) methods for semantic segmentation predominantly guide the student model to replicate the structured information of the teacher model solely through image data. However, these approaches often overlook the potential benefits of incorporating auxiliary modalities, such as textual information, into the distillation process, thereby failing to effectively bridge the gap between the student and teacher models. This paper introduces a novel text-assisted distillation methodology. Leveraging the framework of Contrastive Language-Image Pretraining (CLIP), we propose two modular plugins: the Text-Channel Distillation module and the Text-Region Distillation module, designed to integrate textual priors into the distillation process. These modules serve as a bridge between the student and teacher models, enhancing the emulation of teacher networks by student models. Characterized by their simplicity, versatility, and seamless integration with existing knowledge distillation frameworks, these modules facilitate improved performance. Experimental evaluations conducted on the Cityscapes, Pascal VOC, and CamVid datasets demonstrate that augmenting state-of-the-art distillation techniques with these plug-and-play modules yields significant improvements in distillation effectiveness.
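As a rough illustration of text-guided distillation, the sketch below lets class-name embeddings from a CLIP-style text encoder define soft region masks on the teacher features and aligns region-pooled student features with the teacher's. It is one plausible form of such a loss, not the paper's Text-Channel or Text-Region modules; the shapes and the masking rule are assumptions.

```python
import torch
import torch.nn.functional as F

def text_region_distillation(feat_s, feat_t, text_emb, tau=1.0):
    """Hedged sketch: text embeddings define soft class regions, and the student's
    region-pooled features are pulled toward the teacher's.

    feat_s, feat_t: (B, C, H, W) student / teacher feature maps (same dims assumed)
    text_emb:       (K, C) embeddings of the K class names from a text encoder
    """
    t = F.normalize(feat_t.flatten(2), dim=1)                     # (B, C, HW)
    txt = F.normalize(text_emb, dim=1)                            # (K, C)
    # Soft spatial mask per class from text-feature similarity on the teacher.
    masks = torch.softmax(torch.einsum('kc,bcn->bkn', txt, t) / tau, dim=-1)  # (B, K, HW)
    # Region pooling: mask-weighted average of features per class.
    pool = lambda f: torch.einsum('bkn,bcn->bkc', masks, f.flatten(2))
    return F.mse_loss(pool(feat_s), pool(feat_t))
```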
{"title":"Enhancing knowledge distillation for semantic segmentation through text-assisted modular plugins","authors":"Letian Wu ,&nbsp;Shen Zhang ,&nbsp;Chuankai Zhang ,&nbsp;Zhenyu Zhao ,&nbsp;Jiajun Liang ,&nbsp;Wankou Yang","doi":"10.1016/j.patcog.2024.111329","DOIUrl":"10.1016/j.patcog.2024.111329","url":null,"abstract":"<div><div>Compared with other model compression methods, such as pruning and quantization, knowledge distillation offers superior compatibility and flexibility. Current knowledge distillation (KD) methods for semantic segmentation predominantly guide the student model to replicate the structured information of the teacher model solely through image data. However, these approaches often overlook the potential benefits of incorporating auxiliary modalities, such as textual information, into the distillation process, thereby failing to effectively bridge the gap between the student and teacher models. This paper introduces a novel text-assisted distillation methodology. Leveraging the framework of Contrastive Language-Image Pretraining (CLIP), we propose two modular plugins: the <strong>Text-Channel Distillation</strong> module and the <strong>Text-Region Distillation</strong> module, designed to integrate textual priors into the distillation process. These modules serve as a bridge between the student and teacher models, enhancing the emulation of teacher networks by student models. Characterized by their simplicity, versatility, and seamless integration with existing knowledge distillation frameworks, these modules facilitate improved performance. Experimental evaluations conducted on the Cityscapes, Pascal VOC, and CamVid datasets demonstrate that augmenting state-of-the-art distillation techniques with these plug-and-play modules yields significant improvements in distillation effectiveness.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"161 ","pages":"Article 111329"},"PeriodicalIF":7.5,"publicationDate":"2025-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143147642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
An event-based motion scene feature extraction framework
IF 7.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-01-01 DOI: 10.1016/j.patcog.2024.111320
Zhaoxin Liu, Jinjian Wu, Guangming Shi, Wen Yang, Jupo Ma
Integral cameras cause motion blur during relative object displacement, leading to degraded image aesthetics and reduced performance of image-based algorithms. Event cameras capture high-temporal-resolution dynamic scene changes, providing spatially aligned motion information that complements images. However, external modules for event-based motion feature extraction, such as optical flow estimation, introduce additional computational costs and inference time. Moreover, achieving a globally optimal solution becomes challenging without joint optimization. In this paper, we propose a cross-modal motion scene feature extraction framework for motion-sensitive tasks, addressing the challenges of motion feature extraction and dual-path feature fusion. The framework, serving as a versatile feature encoder, can adapt its feature extractor structure to meet diverse task requirements. We first analyze and identify the spatially concentrated and temporally continuous feature extraction tendency of spiking neural networks (SNNs). Based on this observation, we propose the hybrid spiking motion object feature extractor (HSME). Within this module, a novel fusion block is introduced to avoid feature-level blurring during the fusion of spike-float features. Furthermore, to ensure that the two-modal networks acquire complementary scene features, we devise a spatial feature disentanglement that constrains the network during the optimization process. Event-based motion deblurring is a prototypical motion-sensitive task, and our approach was assessed on prevalent datasets, attaining state-of-the-art performance while maintaining an exceptionally low parameter count. We also conducted ablation experiments to evaluate the influence of each framework component on the results. Code and pre-trained models will be published after the paper is accepted.
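For readers unfamiliar with spiking networks, the sketch below shows minimal leaky integrate-and-fire dynamics, the basic temporally continuous mechanism that hybrid extractors such as HSME build on. It is illustrative only; HSME's actual architecture and fusion block are not reproduced, and the time constants are assumed values.

```python
import torch

def lif_forward(inputs, tau=2.0, v_threshold=1.0):
    """Minimal leaky integrate-and-fire dynamics over T time steps.

    inputs: (T, B, C, H, W) per-step input currents, e.g. from event voxel grids.
    Returns binary spike trains of the same shape.
    """
    v = torch.zeros_like(inputs[0])
    spikes = []
    for x in inputs:                      # iterate over time steps
        v = v + (x - v) / tau             # leaky integration of the membrane potential
        s = (v >= v_threshold).float()    # fire when the threshold is crossed
        v = v * (1.0 - s)                 # hard reset after a spike
        spikes.append(s)
    return torch.stack(spikes)            # (T, B, C, H, W)
```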
{"title":"An event-based motion scene feature extraction framework","authors":"Zhaoxin Liu,&nbsp;Jinjian Wu,&nbsp;Guangming Shi,&nbsp;Wen Yang,&nbsp;Jupo Ma","doi":"10.1016/j.patcog.2024.111320","DOIUrl":"10.1016/j.patcog.2024.111320","url":null,"abstract":"<div><div>Integral cameras cause motion blur during relative object displacement, leading to degraded image aesthetics and reduced performance of image-based algorithms. Event cameras capture high-temporal-resolution dynamic scene changes, providing spatially aligned motion information to complement images. However, external modules for event-based motion feature extraction, such as optical flow estimation, introduce additional computational costs and inference time. Moreover, achieving a globally optimal solution becomes challenging without joint optimization. In this paper, we propose a cross-modal motion scene feature extraction framework for motion-sensitive tasks, addressing challenges in motion feature extraction and dual-path feature fusion. The framework, serving as a versatile feature encoder, can adapt its feature extractor structure to meet diverse task requirements. We initially analyzed and identified the spatially concentrated and temporally continuous feature extraction tendency of spiking neural networks (SNNs). Based on this observation, we propose the hybrid spiking motion object feature extractor (HSME). Within this module, a novel fusion block is introduced to avoid feature-level blurring during the fusion of spike-float features. Furthermore, to ensure the acquisition of complementary scene features by the two-modal networks, we devise a spatial feature disentanglement that constraints the network during the optimization process. Event-based motion deblurring represents a prototypical motion-sensitive task, and our approach was assessed on prevalent datasets, attaining a state-of-the-art performance while maintaining an exceptionally low parameter count. We also conducted ablation experiments to evaluate the influence of each framework component on the results. Code and pre-trained models will be published after the paper is accepted.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"161 ","pages":"Article 111320"},"PeriodicalIF":7.5,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143147639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Deep multi-view clustering with diverse and discriminative feature learning
IF 7.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-01-01 DOI: 10.1016/j.patcog.2024.111322
Junpeng Xu, Min Meng, Jigang Liu, Jigang Wu
Multi-view clustering (MVC) has gained significant attention in unsupervised learning. However, existing methods often face two key limitations: (1) many approaches rely on feature fusion from all views to identify cluster patterns, which inevitably reduces the distinctiveness of the learned representations; (2) existing methods primarily focus on uncovering common semantic features across different views while neglecting to promote the diversity of representations. As a result, they fail to fully leverage the complementary information across views, which potentially inhibits the effectiveness of representation learning. To address these challenges, we propose a novel diverse and discriminative feature learning framework for deep multi-view clustering (DDMVC) in a fusion-free manner. Specifically, we introduce a consistency constraint that performs preliminary alignment of low-level features to ensure consistent relationships between samples from different views. Following this, our model leverages contrastive learning to achieve consistency across multiple views and enhances the diversity of multi-view representations by ensuring that the embedding vectors of samples within a batch are distinct and by decorrelating the embedding dimensions (or variables). In this way, the proposed model can preserve the information content of each view at a certain level and reduce redundancy across multiple views, thereby facilitating the exploration of the underlying complementarity among views. This approach successfully incorporates dimension independence into contrastive learning and can be easily integrated into other deep neural networks. Extensive evaluations on eight widely used benchmark datasets demonstrate that the proposed approach outperforms several state-of-the-art MVC methods. The code is available at https://github.com/xujunpeng832/DDMVC.
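A hedged sketch of the diversity terms described above: decorrelating the embedding dimensions and pushing embeddings of different samples in a batch apart. The exact normalization and weighting in DDMVC may differ; the loss weights here are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def diversity_loss(z, lambda_offdiag=1.0, lambda_sample=1.0):
    """Diversity terms on a batch of embeddings from one view.

    z: (N, D) embeddings of N samples.
    """
    n, d = z.shape
    # Decorrelate embedding dimensions: penalize off-diagonal correlations.
    z_std = (z - z.mean(0)) / (z.std(0) + 1e-6)
    corr = (z_std.T @ z_std) / n                                    # (D, D)
    off_diag = corr.pow(2).sum() - corr.diagonal().pow(2).sum()
    # Keep samples distinct: penalize high pairwise cosine similarity.
    z_n = F.normalize(z, dim=1)
    sim = z_n @ z_n.T                                               # (N, N)
    sample_sim = (sim - torch.eye(n, device=z.device)).pow(2).sum() / (n * (n - 1))
    return lambda_offdiag * off_diag + lambda_sample * sample_sim
```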
{"title":"Deep multi-view clustering with diverse and discriminative feature learning","authors":"Junpeng Xu ,&nbsp;Min Meng ,&nbsp;Jigang Liu ,&nbsp;Jigang Wu","doi":"10.1016/j.patcog.2024.111322","DOIUrl":"10.1016/j.patcog.2024.111322","url":null,"abstract":"<div><div>Multi-view clustering (MVC) has gained significant attention in unsupervised learning. However, existing methods often face two key limitations: (1) many approaches rely on feature fusion from all views to identify cluster patterns, which inevitably reduces the distinctiveness of the learned representations; (2) existing methods primarily focus on uncovering common semantic features across different views while neglecting to promote the diversity of representations. As a result, they fail to fully leverage the complementary information across views, which potentially inhibits the effectiveness of representation learning. To address these challenges, we propose a novel diverse and discriminative feature learning framework for deep multi-view clustering (DDMVC) in a fusion-free manner. Specifically, we introduce a consistency constraint that performs preliminary alignment of low-level features to ensure consistent relationships between samples from different views. Following this, our model leverages contrastive learning to achieve consistency across multiple views and enhances the diversity of multi-view representations by ensuring the embedding vectors of samples within a batch to be distinct and by decorrelating the embedding dimensions (or variables). In this way, the proposed model can preserve the information content of each view at a certain level and reduce redundancy across multiple views, thereby facilitating the exploration of underlying complementarity among views. This approach successfully incorporates dimension independence in contrastive learning and can be easily integrated into other deep neural networks. Extensive evaluations on eight widely used benchmark datasets demonstrate that the proposed approach outperforms several state-of-the-art MVC methods. The code is available at <span><span>https://github.com/xujunpeng832/DDMVC</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"161 ","pages":"Article 111322"},"PeriodicalIF":7.5,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143146442","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0