
Latest Publications in Pattern Recognition

Active learning for cross-modal cardiac segmentation with sparse annotation
IF 7.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-02-04 | DOI: 10.1016/j.patcog.2025.111403
Zihang Chen , Weijie Zhao , Jingyang Liu , Puguang Xie , Siyu Hou , Yongjian Nian , Xiaochao Yang , Ruiyan Ma , Haiyan Ding , Jingjing Xiao
This work presents a new dual-domain active learning method for cross-modal cardiac image segmentation with sparse annotations. Our network uses tilted Variational Auto-Encoders (tVAE) to extract and align invariant features from different domains. A Category Diversity Maximization approach computes category statistics within a region to quantify its category diversity, and an Uncertainty Region Selection Strategy measures the uncertainty of each predicted pixel. Used jointly, these two criteria identify high-risk regions to annotate in subsequent active learning rounds. The method was benchmarked against leading algorithms on two public cardiac datasets. On the MS-CMRSeg bSSFP-to-LGE segmentation task, our method achieved a DSC of 87.2% with only six-pixel annotations, surpassing the best results from the MS-CMRSeg Challenge 2019. On the MM-WHS dataset, using only 0.1% of annotations, it achieved a DSC of 91.8% for CT-to-MR and 88.9% for MR-to-CT, surpassing fully supervised models.
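To make the region-selection idea concrete, here is a minimal sketch assuming a softmax probability map from any segmentation network: patch uncertainty is scored by mean pixel entropy, and category diversity by the entropy of the category histogram inside the patch. The region size, the additive scoring, and the function names are illustrative assumptions, not the paper's exact Category Diversity Maximization or Uncertainty Region Selection formulations.

```python
import numpy as np

def region_scores(probs: np.ndarray, region: int = 32):
    """Score each (region x region) patch by pixel uncertainty and category diversity.

    probs: softmax probability map of shape (C, H, W).
    """
    C, H, W = probs.shape
    entropy = -(probs * np.log(probs + 1e-8)).sum(axis=0)  # (H, W) per-pixel uncertainty
    labels = probs.argmax(axis=0)                          # hard predictions
    scores = []
    for y in range(0, H - region + 1, region):
        for x in range(0, W - region + 1, region):
            patch = labels[y:y + region, x:x + region]
            # diversity: entropy of the category histogram inside the patch
            hist = np.bincount(patch.ravel(), minlength=C) / patch.size
            diversity = -(hist[hist > 0] * np.log(hist[hist > 0])).sum()
            uncertainty = entropy[y:y + region, x:x + region].mean()
            scores.append(((y, x), uncertainty + diversity))
    # highest-scoring regions are proposed for annotation
    return sorted(scores, key=lambda s: -s[1])
```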
{"title":"Active learning for cross-modal cardiac segmentation with sparse annotation","authors":"Zihang Chen ,&nbsp;Weijie Zhao ,&nbsp;Jingyang Liu ,&nbsp;Puguang Xie ,&nbsp;Siyu Hou ,&nbsp;Yongjian Nian ,&nbsp;Xiaochao Yang ,&nbsp;Ruiyan Ma ,&nbsp;Haiyan Ding ,&nbsp;Jingjing Xiao","doi":"10.1016/j.patcog.2025.111403","DOIUrl":"10.1016/j.patcog.2025.111403","url":null,"abstract":"<div><div>This work presents a new dual-domain active learning method for cross-modal cardiac image segmentation with sparse annotations. Our network uses tilted Variational Auto-Encoders (tVAE) to extract and align invariant features from different domains. A proposed innovative Category Diversity Maximization approach that calculates statistical information regarding categories within a region reflects category diversity. The Uncertainty Region Selection Strategy is devised to measure the uncertainty of each predicted pixel. By jointly using these two methodologies, we identify risky areas for future annotation in active learning. The method was benchmarked against leading algorithms using two public cardiac datasets. In the MS-CMRSeg bSSFP to LGE segmentation task, our method achieved a DSC of 87.2% with just six-pixel annotations, surpassing the best results from the MS-CMRSeg Challenge 2019. In the MM-WHS dataset, our method using only 0.1% of annotations achieved a DSC of 91.8% for CT to MR and 88.9% for MR to CT, surpassing fully supervised models.<span><span><sup>1</sup></span></span></div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"162 ","pages":"Article 111403"},"PeriodicalIF":7.5,"publicationDate":"2025-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143403362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Incremental clustering based on Wasserstein distance between histogram models
IF 7.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-02-04 | DOI: 10.1016/j.patcog.2025.111414
Xiaotong Qian , Guénaël Cabanes , Parisa Rastin , Mohamed Alae Guidani , Ghassen Marrakchi , Marianne Clausel , Nistor Grozavu
In this article, we present an innovative clustering framework designed for large datasets and real-time data streams. It uses a sliding window and histogram models to address memory congestion while reducing computational complexity and improving cluster quality for both static and dynamic clustering. The framework provides a simple way to characterize the probability distribution of each cluster through histogram models, regardless of the distribution type, which allows it to be used efficiently with a variety of conventional clustering algorithms. To enable effective clustering across windows, we use a statistical measure that compares and merges clusters based on the Wasserstein distance between their histograms.
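The cluster-comparison step can be sketched with SciPy's one-dimensional Wasserstein distance, treating each histogram's bin centers as support points weighted by its counts. The bin edges and the merge threshold below are illustrative assumptions, not values from the paper.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def histogram_wasserstein(h1, h2, bin_edges):
    """1-D Wasserstein distance between two histograms over shared bins."""
    centers = (bin_edges[:-1] + bin_edges[1:]) / 2
    return wasserstein_distance(centers, centers, u_weights=h1, v_weights=h2)

# Two clusters summarized as histograms over the same bins
edges = np.linspace(0.0, 1.0, 21)
a, _ = np.histogram(np.random.beta(2.0, 5.0, 1000), bins=edges)
b, _ = np.histogram(np.random.beta(2.2, 5.0, 1000), bins=edges)

d = histogram_wasserstein(a, b, edges)
if d < 0.05:  # hypothetical merge threshold
    print(f"merge clusters (W = {d:.4f})")
```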
{"title":"Incremental clustering based on Wasserstein distance between histogram models","authors":"Xiaotong Qian ,&nbsp;Guénaël Cabanes ,&nbsp;Parisa Rastin ,&nbsp;Mohamed Alae Guidani ,&nbsp;Ghassen Marrakchi ,&nbsp;Marianne Clausel ,&nbsp;Nistor Grozavu","doi":"10.1016/j.patcog.2025.111414","DOIUrl":"10.1016/j.patcog.2025.111414","url":null,"abstract":"<div><div>In this article, we present an innovative clustering framework designed for large datasets and real-time data streams which uses a sliding window and histogram model to address the challenge of memory congestion while reducing computational complexity and improving cluster quality for both static and dynamic clustering. The framework provides a simple way to characterize the probability distribution of cluster distributions through histogram models, regardless of their distribution type. This advantage allows for efficient use with various conventional clustering algorithms. To facilitate effective clustering across windows, we use a statistical measure that allows the comparison and merging of different clusters based on the calculation of the Wasserstein distance between histograms.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"162 ","pages":"Article 111414"},"PeriodicalIF":7.5,"publicationDate":"2025-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143372153","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Deep clustering via dual-supervised multi-kernel mapping
IF 7.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-02-04 | DOI: 10.1016/j.patcog.2025.111419
Lina Ren , Ruizhang Huang , Shengwei Ma , Yongbin Qin , Yanping Chen , Chuan Lin
In this paper, we propose a novel deep clustering framework based on dual-supervised multi-kernel mapping, named DCDMK, which improves clustering performance by learning linearly and structurally separable data representations. In the DCDMK framework, we introduce a kernel-aid encoder comprising two key components: a semantic representation learner, which captures the semantic information essential for clustering, and a multi-kernel representation learner, which dynamically selects the optimal combination of kernel functions through dual-supervised multi-kernel mapping to learn structurally separable kernel representations. A dual self-supervised mechanism is devised to jointly optimize kernel representation learning and structural partitioning. Within this framework, we explore different fusion strategies for learning the multi-kernel representation of data samples, yielding two variants: DCDMK-WL (with layer-level kernel representation learning) and DCDMK-OL (without layer-level kernel representation learning). Extensive experiments on six real-world datasets demonstrate the effectiveness of our DCDMK framework.
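As a rough illustration of the multi-kernel mapping idea, the sketch below forms a convex combination of several base kernel matrices with softmax weights. The kernel set and the standalone weighting are assumptions; in DCDMK the combination is learned end-to-end inside the dual-supervised framework rather than with a snippet like this.

```python
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel

def combined_kernel(X, logits):
    """Convex combination K = sum_m w_m * K_m with w = softmax(logits)."""
    kernels = [
        rbf_kernel(X, gamma=0.5),
        polynomial_kernel(X, degree=3),
        linear_kernel(X),
    ]
    w = np.exp(logits) / np.exp(logits).sum()  # softmax keeps weights positive, summing to 1
    return sum(wi * Ki for wi, Ki in zip(w, kernels))

X = np.random.randn(100, 16)
K = combined_kernel(X, logits=np.zeros(3))     # start from uniform kernel weights
```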
{"title":"Deep clustering via dual-supervised multi-kernel mapping","authors":"Lina Ren ,&nbsp;Ruizhang Huang ,&nbsp;Shengwei Ma ,&nbsp;Yongbin Qin ,&nbsp;Yanping Chen ,&nbsp;Chuan Lin","doi":"10.1016/j.patcog.2025.111419","DOIUrl":"10.1016/j.patcog.2025.111419","url":null,"abstract":"<div><div>In this paper, we propose a novel deep clustering framework via dual-supervised multi-kernel mapping, namely DCDMK, to improve clustering performance by learning linearly structural separable data representations. In the DCDMK framework, we introduce a kernel-aid encoder comprising two key components: a semantic representation learner, which captures the essential semantic information for clustering, and a multi-kernel representation learner, which dynamically selects the optimal combination of kernel functions through dual-supervised multi-kernel mapping to learn structurally separable kernel representations. The dual self-supervised mechanism is devised to jointly optimize both kernel representation learning and structural partitioning. Based on this framework, we introduce different fusion strategies to learn the multi-kernel representation of data samples for the clustering task. We derive two variants, namely DCDMK-WL (with layer-level kernel representation learning) and DCDMK-OL (without layer-level kernel representation learning). Extensive experiments on six real-world datasets demonstrate the effectiveness of our DCDMK framework.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"162 ","pages":"Article 111419"},"PeriodicalIF":7.5,"publicationDate":"2025-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143298593","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Interpretable multi-view clustering
IF 7.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-02-03 | DOI: 10.1016/j.patcog.2025.111418
Mudi Jiang , Lianyu Hu , Zengyou He , Zhikui Chen
Multi-view clustering has become a significant area of research, with numerous methods proposed over the past decades to enhance clustering accuracy. However, in many real-world applications it is crucial to expose a clear decision-making process: specifically, to explain why samples are assigned to particular clusters. There remains a notable gap in developing interpretable methods for clustering multi-view data. To fill this gap, we make a first attempt in this direction by introducing an interpretable multi-view clustering framework. Our method begins by extracting embedded features from each view and generates pseudo-labels to guide the initial construction of a decision tree. It then iteratively optimizes the feature representation of each view while refining the interpretable decision tree. Experimental results on real datasets demonstrate that our method not only provides a transparent clustering process for multi-view data but also delivers performance comparable to state-of-the-art multi-view clustering methods. To the best of our knowledge, this is the first effort to design an interpretable clustering framework specifically for multi-view data, opening a new avenue in this field.
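A minimal single-view sketch of the pseudo-label-guided tree construction, using scikit-learn (an assumption, not the authors' code): cluster the features, then fit a shallow decision tree whose printed rules explain each cluster assignment. The paper iterates this jointly across views, refining both the features and the tree.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, _ = load_iris(return_X_y=True)

# Pseudo-labels from an initial clustering of the (embedded) features
pseudo = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# A shallow tree whose splits give human-readable cluster assignment rules
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, pseudo)
print(export_text(tree))  # e.g. "feature_2 <= 2.45 -> cluster 1"
```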
{"title":"Interpretable multi-view clustering","authors":"Mudi Jiang ,&nbsp;Lianyu Hu ,&nbsp;Zengyou He ,&nbsp;Zhikui Chen","doi":"10.1016/j.patcog.2025.111418","DOIUrl":"10.1016/j.patcog.2025.111418","url":null,"abstract":"<div><div>Multi-view clustering has become a significant area of research, with numerous methods proposed over the past decades to enhance clustering accuracy. However, in many real-world applications, it is crucial to demonstrate a clear decision-making process-specifically, explaining why samples are assigned to particular clusters. Consequently, there remains a notable gap in developing interpretable methods for clustering multi-view data. To fill this crucial gap, we make the first attempt towards this direction by introducing an interpretable multi-view clustering framework. Our method begins by extracting embedded features from each view and generates pseudo-labels to guide the initial construction of the decision tree. Subsequently, it iteratively optimizes the feature representation for each view along with refining the interpretable decision tree. Experimental results on real datasets demonstrate that our method not only provides a transparent clustering process for multi-view data but also delivers performance comparable to state-of-the-art multi-view clustering methods. To the best of our knowledge, this is the first effort to design an interpretable clustering framework specifically for multi-view data, opening a new avenue in this field.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"162 ","pages":"Article 111418"},"PeriodicalIF":7.5,"publicationDate":"2025-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143298461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Generative feature style augmentation for domain generalization in medical image segmentation
IF 7.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-02-03 | DOI: 10.1016/j.patcog.2025.111416
Yunzhi Huang , Luyi Han , Haoran Dou
Although learning-based models have achieved tremendous success in medical image segmentation on independent and identically distributed data, model performance often deteriorates on out-of-distribution data. Training a separate model for each domain requires extra time and computing resources and increases the annotation burden on physicians; it is therefore more practical to generalize a segmentation model trained on a single source domain. In this work, we model domain-level feature style with a flexible probabilistic block that is framework-agnostic and can be integrated into an arbitrary segmentation network to enhance generalization to unseen datasets. Specifically, we employ a variational auto-encoder to learn feature style representations, enabling the generation of diverse feature styles by sampling from a prior distribution. During inference, we replace the target feature style with that of the source domain using a linear transformation. We compare our method with five state-of-the-art domain generalization (DG) methods on prostate MRI data from six centers and spinal cord MRI data from four sites. Evaluation with the Dice similarity coefficient and the 95th percentile Hausdorff distance demonstrates that our method improves model generalizability more than the other DG models.
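The inference-time style replacement can be sketched as channel-wise moment matching: strip the target feature's statistics, then re-apply the source domain's mean and standard deviation via a linear transformation. The paper's exact transform is not specified here, so this AdaIN-style version is an assumption.

```python
import torch

def style_swap(feat_tgt: torch.Tensor, src_mean: torch.Tensor,
               src_std: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Replace the style of target features with source-domain statistics.

    feat_tgt: (N, C, H, W) target-domain feature maps.
    src_mean, src_std: (C,) channel statistics collected on the source domain.
    """
    mu = feat_tgt.mean(dim=(2, 3), keepdim=True)
    sigma = feat_tgt.std(dim=(2, 3), keepdim=True) + eps
    normalized = (feat_tgt - mu) / sigma  # strip the target style
    # linear transformation: re-scale and re-shift with source statistics
    return normalized * src_std.view(1, -1, 1, 1) + src_mean.view(1, -1, 1, 1)
```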
{"title":"Generative feature style augmentation for domain generalization in medical image segmentation","authors":"Yunzhi Huang ,&nbsp;Luyi Han ,&nbsp;Haoran Dou","doi":"10.1016/j.patcog.2025.111416","DOIUrl":"10.1016/j.patcog.2025.111416","url":null,"abstract":"<div><div>Although learning-based models have achieved tremendous success in medical image segmentation for independent and identical distributed data, model performance often deteriorates for out-of-distribution data. Training a model for each domain requires extra time and computing resources and increases the annotation burden of physicians. It is hence more practical to generalize the segmentation model trained using a single source domain. In this work, we model domain-level feature style with a flexible probabilistic block, that is framework-agnostic and can be integrated into an arbitrary segmentation network to enhance the model generality on unseen datasets. Specifically, we employ a variational auto-encoder to learn the feature style representations, enabling the generation of diverse feature styles through sampling from a prior distribution. During inference, we replace the target feature style with that of the source domain using a linear transformation. We compare our method with five state-of-the-art domain generalization (DG) methods using prostate MRI data from six centers and spinal cord MRI data from four sites. Evaluation with Dice similarity coefficient score and 95th percentile Hausdorff distance demonstrates that our method achieves superior improvement in model generalizability over other DG models.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"162 ","pages":"Article 111416"},"PeriodicalIF":7.5,"publicationDate":"2025-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143298460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Beyond boundaries: Hierarchical-contrast unsupervised temporal action localization with high-coupling feature learning
IF 7.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-02-03 | DOI: 10.1016/j.patcog.2025.111421
Yuanyuan Liu , Ning Zhou , Yuxuan Huang , Shuyang Liu , Leyuan Liu , Wujie Zhou , Chang Tang , Ke Wang
Current unsupervised temporal action localization (UTAL) methods mainly perform clustering and localization with independent learning mechanisms. These mechanisms, however, are loosely coupled and struggle to finely localize action-background boundaries because features do not interact during clustering and localization. To address this, we propose an end-to-end Hierarchical-Contrast UTAL (HC-UTAL) framework with highly coupled multi-task feature learning. HC-UTAL incorporates coarse-to-fine contrastive learning (CL) at three levels: video level, instance level, and boundary level, thus obtaining adaptive interaction and robust performance. We first employ video-level CL on video-level and cluster-level feature learning, generating video action pseudo-labels. Using these pseudo-labels, we then devise instance-level CL on action-related feature learning for coarse localization, and boundary-level CL on ambiguous action-background boundary feature learning for finer localization. Extensive experiments on the THUMOS'14, ActivityNet v1.2, and ActivityNet v1.3 datasets demonstrate that our method achieves state-of-the-art performance. The code and trained models are available at: https://github.com/bugcat9/HC-UTAL.
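All three CL levels can be built on an InfoNCE-style objective; which embeddings form the positive and negative pairs (video, instance, or boundary) is the framework's contribution and is abstracted away in this sketch. The temperature value is an illustrative default.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor: torch.Tensor, positive: torch.Tensor,
             temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE loss for a batch of (anchor, positive) embedding pairs.

    anchor, positive: (B, D). Row i of `positive` is the positive for row i
    of `anchor`; all other rows in the batch serve as negatives.
    """
    a = F.normalize(anchor, dim=1)
    p = F.normalize(positive, dim=1)
    logits = a @ p.t() / temperature                 # (B, B) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)          # pull diagonal, push rest
```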
{"title":"Beyond boundaries: Hierarchical-contrast unsupervised temporal action localization with high-coupling feature learning","authors":"Yuanyuan Liu ,&nbsp;Ning Zhou ,&nbsp;Yuxuan Huang ,&nbsp;Shuyang Liu ,&nbsp;Leyuan Liu ,&nbsp;Wujie Zhou ,&nbsp;Chang Tang ,&nbsp;Ke Wang","doi":"10.1016/j.patcog.2025.111421","DOIUrl":"10.1016/j.patcog.2025.111421","url":null,"abstract":"<div><div>Current unsupervised temporal action localization (UTAL) methods mainly use clustering and localization with independent learning mechanisms. However, these individual mechanisms are low-coupled and struggle to finely localize action-background boundary information due to the lack of feature interactions in the clustering and localization process. To address this, we propose an end-to-end Hierarchical-Contrast UTAL (HC-UTAL) framework with high-coupling multi-task feature learning. HC-UTAL incorporates coarse-to-fine contrastive learning (CL) at three levels: <em>video level</em>, <em>instance level</em> and <em>boundary level</em>, thus obtaining adaptive interaction and robust performance. We first employ the <em>video-level CL</em> on video-level and cluster-level feature learning, generating video action pseudo-labels. Then, using the video action pseudo-labels, we further devise the <em>instance-level CL</em> on action-related feature learning for coarse localization and the <em>boundary-level CL</em> on ambiguous action-background boundary feature learning for finer localization, respectively. We conduct extensive experiments on THUMOS’14, ActivityNet v1.2, and ActivityNet v1.3 datasets. The results demonstrate that our method achieves state-of-the-art performance. The code and trained models are available at: <span><span>https://github.com/bugcat9/HC-UTAL</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"162 ","pages":"Article 111421"},"PeriodicalIF":7.5,"publicationDate":"2025-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143377928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Variable multi-scale attention fusion network and adaptive correcting gradient optimization for multi-task learning
IF 7.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-02-01 | DOI: 10.1016/j.patcog.2025.111423
Naihua Ji , Yongqiang Sun , Fanyun Meng , Liping Pang , Yuzhu Tian
Network architecture and optimization are two indispensable parts of multi-task learning that jointly determine its performance, yet previous work has rarely addressed both simultaneously. In this paper, we analyze multi-task learning from both perspectives. On the architecture side, we propose a variable multi-scale attention fusion network, which avoids feature loss when upsampling small-scale feature maps and resolves the inadequate learning of conventional multi-scale models caused by large spatial size disparities. On the optimization side, an adaptive correcting gradient scheme is put forward to treat conflicts and dominance among tasks during training, effectively alleviating the imbalance of multi-task training. Extensive ablation and comparison experiments demonstrate that considering the network framework and the optimization together greatly improves multi-task learning performance. Our code is available at https://github.com/SyqxhSt/Net-Opt-MTL
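The gradient-correction idea can be sketched with PCGrad-style surgery: when two task gradients conflict (negative dot product), project one off the other's direction before averaging. The paper's adaptive correcting scheme differs in its details; this shows only the conflict-resolution principle it builds on.

```python
import torch

def correct_gradients(grads):
    """Resolve conflicts among per-task gradients before a shared update.

    grads: list of flattened per-task gradients (1-D tensors of equal size).
    """
    corrected = [g.clone() for g in grads]
    for i, gi in enumerate(corrected):
        for j, gj in enumerate(grads):
            if i == j:
                continue
            dot = torch.dot(gi, gj)
            if dot < 0:  # conflict: gradients point more than 90 degrees apart
                gi -= dot / gj.norm().pow(2) * gj  # drop the conflicting component
    return torch.stack(corrected).mean(dim=0)      # averaged update direction
```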
{"title":"Variable multi-scale attention fusion network and adaptive correcting gradient optimization for multi-task learning","authors":"Naihua Ji ,&nbsp;Yongqiang Sun ,&nbsp;Fanyun Meng ,&nbsp;Liping Pang ,&nbsp;Yuzhu Tian","doi":"10.1016/j.patcog.2025.111423","DOIUrl":"10.1016/j.patcog.2025.111423","url":null,"abstract":"<div><div>Network architecture and optimization are two indispensable parts in multi-task learning, which together improve the performance of multi-task learning. Previous work has rarely focused on both aspects simultaneously. In this paper, we analyze the multi-task learning from network architecture and optimization. In network architecture aspect, we propose a variable multi-scale attention fusion network, which overcomes the issue of feature loss when processing small-scale feature maps during upsampling and resolves the problem of inadequate learning in conventional multi-scale models due to significant spatial size disparities. In optimization aspect, a adaptive correcting gradient scheme is put forward to treat the defects of conflicts and dominance among multiple tasks during the process of training, and it effectively alleviates the imbalance of multi-task training. Various ablation experiments and comparative experiments demonstrate that simultaneously considering the network framework and optimization can make great improvement for the performance of multi-task learning. Our code is available at <span><span>https://github.com/SyqxhSt/Net-Opt-MTL</span><svg><path></path></svg></span></div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"162 ","pages":"Article 111423"},"PeriodicalIF":7.5,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143150570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A closer look at the explainability of Contrastive language-image pre-training
IF 7.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-02-01 | DOI: 10.1016/j.patcog.2025.111409
Yi Li , Hualiang Wang , Yiqun Duan , Jiheng Zhang , Xiaomeng Li
Contrastive language-image pre-training (CLIP) is a powerful vision-language model that has shown great benefits for various tasks. However, we have identified issues with its explainability that undermine its credibility and limit its capacity for related tasks. Specifically, we find that CLIP tends to focus on background regions rather than foregrounds, with noisy activations at irrelevant positions in the visualization results. These phenomena conflict with conventional explainability methods based on the class attention map (CAM), where a raw model can highlight local foreground regions using global supervision without alignment. To address these problems, we take a closer look at CLIP's architecture and features. Our analysis reveals that the raw self-attention links to inconsistent semantic regions, producing opposite visualizations, while the noisy activations stem from features that are redundant across categories. Building on these insights, we propose CLIP Surgery for reliable CAM, a method that applies surgery-like modifications to the inference architecture and features without further fine-tuning, as in classical CAM methods. This approach significantly improves the explainability of CLIP, surpassing existing methods by large margins. It also enables multimodal visualization and extends the capacity of raw CLIP to open-vocabulary tasks without extra alignment. The code is available at https://github.com/xmed-lab/CLIP_Surgery.
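The kind of visualization being probed can be sketched as a similarity map between image patch tokens and a text embedding. The tensors below are random stand-ins; how CLIP Surgery modifies the attention and features to make such maps reliable is the paper's contribution and is not reproduced here.

```python
import torch
import torch.nn.functional as F

def similarity_map(patch_tokens: torch.Tensor, text_embed: torch.Tensor,
                   hw: tuple) -> torch.Tensor:
    """Per-patch alignment between image tokens and one text embedding.

    patch_tokens: (N, D) patch features from a ViT-style image encoder.
    text_embed:   (D,) embedding of a class/prompt from the text encoder.
    hw:           (H, W) patch-grid shape with H * W == N.
    """
    patches = F.normalize(patch_tokens, dim=-1)
    text = F.normalize(text_embed, dim=-1)
    sim = patches @ text        # (N,) cosine similarity per patch
    return sim.view(*hw)        # heatmap over the patch grid

# 14 x 14 grid of 512-d tokens, as in a typical ViT-B/16 layout (stand-ins)
heat = similarity_map(torch.randn(196, 512), torch.randn(512), (14, 14))
```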
{"title":"A closer look at the explainability of Contrastive language-image pre-training","authors":"Yi Li ,&nbsp;Hualiang Wang ,&nbsp;Yiqun Duan ,&nbsp;Jiheng Zhang ,&nbsp;Xiaomeng Li","doi":"10.1016/j.patcog.2025.111409","DOIUrl":"10.1016/j.patcog.2025.111409","url":null,"abstract":"<div><div>Contrastive language-image pre-training (CLIP) is a powerful vision-language model that has shown great benefits for various tasks. However, we have identified some issues with its explainability, which undermine its credibility and limit the capacity for related tasks. Specifically, we find that CLIP tends to focus on background regions rather than foregrounds, with noisy activations at irrelevant positions on the visualization results. These phenomena conflict with conventional explainability methods based on the class attention map (CAM), where the raw model can highlight the local foreground regions using global supervision without alignment. To address these problems, we take a closer look at its architecture and features. Our analysis revealed that raw self-attentions link to inconsistent semantic regions, resulting in the opposite visualization. Besides, the noisy activations stem from redundant features among categories. Building on these insights, we propose the CLIP Surgery for reliable CAM, a method that allows surgery-like modifications to the inference architecture and features, without further fine-tuning as classical CAM methods. This approach significantly improves the explainability of CLIP, surpassing existing methods by large margins. Besides, it enables multimodal visualization and extends the capacity of raw CLIP on open-vocabulary tasks without extra alignment. The code is available at <span><span>https://github.com/xmed-lab/CLIP_Surgery</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"162 ","pages":"Article 111409"},"PeriodicalIF":7.5,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143149859","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
HiPPO: Enhancing proximal policy optimization with highlight replay
IF 7.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-31 | DOI: 10.1016/j.patcog.2025.111408
Shutong Zhang , Xing Chen , Zhaogeng Liu , Hechang Chen , Yi Chang
Sample efficiency remains a paramount challenge for policy gradient methods in reinforcement learning. The success of experience replay demonstrates the importance of leveraging historical experiences, often through off-policy methods that maximize the reuse of interaction samples while keeping the approximate policy aligned with the target objective. However, inaccurate approximation can harm the actual optimization, leaving current experiences poorer than past ones. We propose Highlight Replay Enhanced Proximal Policy Optimization (HiPPO) to address this challenge. Specifically, HiPPO optimizes by replaying highlight policies and introduces a penalty reward function for constrained optimization, which relaxes the policy-similarity constraint and boosts adaptability to historical experiences. Empirical studies on MuJoCo continuous-control tasks show that HiPPO outperforms state-of-the-art algorithms in both performance and learning speed, and an in-depth analysis of the experimental results validates the effectiveness of the highlight replay and the penalty reward function.
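For reference, the clipped PPO surrogate that HiPPO builds on looks as follows; the highlight replay buffer and the penalty reward are the paper's additions and are not modeled in this sketch. The epsilon value is the standard clipping default.

```python
import torch

def ppo_clip_loss(log_probs: torch.Tensor, old_log_probs: torch.Tensor,
                  advantages: torch.Tensor, eps: float = 0.2) -> torch.Tensor:
    """Clipped PPO surrogate over a batch of state-action samples (1-D tensors)."""
    ratio = torch.exp(log_probs - old_log_probs)          # importance weight
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * advantages
    # take the pessimistic bound, negate to minimize with gradient descent
    return -torch.min(unclipped, clipped).mean()
```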
{"title":"HiPPO: Enhancing proximal policy optimization with highlight replay","authors":"Shutong Zhang ,&nbsp;Xing Chen ,&nbsp;Zhaogeng Liu ,&nbsp;Hechang Chen ,&nbsp;Yi Chang","doi":"10.1016/j.patcog.2025.111408","DOIUrl":"10.1016/j.patcog.2025.111408","url":null,"abstract":"<div><div>Sample efficiency remains a paramount challenge in policy gradient methods within reinforcement learning. The success of experience replay demonstrates the importance of leveraging historical experiences, often through off-policy methods to enhance approximate policy learning algorithms that aim to maximize current interaction sample reuse, aligning approximate policies with target objectives. However, the inaccurate approximation can negatively affect actual optimization, leading to poorer current experiences than past ones. We propose Highlight Replay Enhanced Proximal Policy Optimization (HiPPO) to address the challenge. Specifically, HiPPO optimizes by highlighting policies and introducing a penalty reward function for constrained optimization, which alleviates the constraints of policy similarity and boosts adaptability to historical experiences. Empirical studies show HiPPO outperforming state-of-the-art algorithms in MuJoCo continuous tasks in performance and learning speed. An in-depth analysis of the experimental results validates the effectiveness of employing highlight replay and penalty reward functions in our proposed method.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"162 ","pages":"Article 111408"},"PeriodicalIF":7.5,"publicationDate":"2025-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143149493","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
TFDNet: Time–Frequency enhanced Decomposed Network for long-term time series forecasting
IF 7.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-31 | DOI: 10.1016/j.patcog.2025.111412
Yuxiao Luo , Songming Zhang , Ziyu Lyu , Yuhan Hu
Long-term time series forecasting is a vital task applicable across diverse fields. Recent methods focus on capturing the underlying patterns from a single domain (e.g., the time domain or the frequency domain), lacking a holistic view that processes long-term series in the time-frequency domain. In this paper, we propose a Time-Frequency enhanced Decomposed Network (TFDNet) to capture both long-term temporal variations and periodicity from the time-frequency domain. In TFDNet, we devise a multi-scale time-frequency enhanced encoder backbone with separate trend and seasonal time-frequency blocks that capture the distinct patterns of the decomposed components at multiple resolutions. We further explore diverse kernel learning strategies for the kernel operations in the time-frequency blocks, investigating and incorporating the potentially different channel-wise correlation patterns of multivariate time series. Experimental evaluation on eight datasets demonstrates that TFDNet is superior to state-of-the-art approaches in both effectiveness and efficiency. The code is available at https://github.com/YuxiaoLuo0013/TFDNet.
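The trend/seasonal split that feeds the two time-frequency blocks can be sketched with a moving-average trend and an STFT view of the seasonal residual. The kernel size and STFT parameters are illustrative assumptions, not TFDNet's actual configuration.

```python
import torch

def decompose(x: torch.Tensor, kernel: int = 25):
    """Split series into trend and seasonal parts via a moving average.

    x: (B, L) batch of univariate series. Returns (trend, seasonal), each (B, L).
    """
    pad = kernel // 2
    xp = torch.nn.functional.pad(x.unsqueeze(1), (pad, pad), mode="replicate")
    trend = torch.nn.functional.avg_pool1d(xp, kernel, stride=1).squeeze(1)
    return trend, x - trend

x = torch.randn(4, 336)                 # a batch of length-336 series
trend, seasonal = decompose(x)

# Time-frequency view of the seasonal component
spec = torch.stft(seasonal, n_fft=64, window=torch.hann_window(64),
                  return_complex=True)  # (B, freq_bins, frames)
```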
{"title":"TFDNet: Time–Frequency enhanced Decomposed Network for long-term time series forecasting","authors":"Yuxiao Luo ,&nbsp;Songming Zhang ,&nbsp;Ziyu Lyu ,&nbsp;Yuhan Hu","doi":"10.1016/j.patcog.2025.111412","DOIUrl":"10.1016/j.patcog.2025.111412","url":null,"abstract":"<div><div>Long-term time series forecasting is a vital task and applicable across diverse fields. Recent methods focus on capturing the underlying patterns from one single domain (e.g. the time domain or the frequency domain) without a holistic view to process long-term time series from the time–frequency domains. In this paper, we propose a <strong>T</strong>ime-<strong>F</strong>requency enhanced <strong>D</strong>ecomposed <strong>Net</strong>work (<strong>TFDNet</strong>) to capture both the long-term temporal variations and periodicity from the time–frequency domain. In TFDNet, we devise a multi-scale time–frequency enhanced encoder backbone with two separate trend and seasonal time–frequency blocks to capture the distinct patterns within the decomposed components in multi-resolutions. Diverse kernel learning strategies of the kernel operations in time–frequency blocks have been explored, by investigating and incorporating the potential different channel-wise correlation patterns of multivariate time series. Experimental evaluation of eight datasets demonstrated that TFDNet is superior to state-of-the-art approaches in both effectiveness and efficiency. The code is available at <span><span>https://github.com/YuxiaoLuo0013/TFDNet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"162 ","pages":"Article 111412"},"PeriodicalIF":7.5,"publicationDate":"2025-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143149860","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0