Parallel consensus transformer for local feature matching
Pub Date: 2025-12-13 | DOI: 10.1016/j.patcog.2025.112905 | Pattern Recognition, vol. 173, Article 112905
Xiaoyong Lu, Yuhan Chen, Bin Kang, Songlin Du
Local feature matching, the task of establishing correspondences between two sets of image features, is fundamental yet challenging in computer vision. Existing Transformer-based methods achieve strong global modeling but suffer from high computational costs and limited locality. We propose PCMatcher, a detector-based feature matching framework that leverages parallel consensus attention to address these issues. Parallel consensus attention integrates a local consensus module, which incorporates neighborhood information, with a parallel attention mechanism that reuses parameters and computations efficiently. Additionally, a multi-scale fusion module combines features from different layers to improve robustness. Extensive experiments show that PCMatcher achieves a competitive accuracy-efficiency trade-off across various downstream tasks. The source code will be publicly released upon acceptance.
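The exact form of parallel consensus attention is defined in the paper; as an illustration only, the PyTorch sketch below approximates the two ingredients named in the abstract: a local consensus step (here, averaging attention logits over each keypoint's k nearest spatial neighbours) and a parallel mechanism that reuses one shared QKV projection for both self- and cross-attention. All module and parameter names are placeholders.

```python
# Illustrative sketch only; the operators below are assumptions, not the paper's design.
import torch
import torch.nn as nn

class ParallelConsensusAttention(nn.Module):
    def __init__(self, dim: int, k_neighbors: int = 8):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)   # single projection reused for both images
        self.out = nn.Linear(dim, dim)
        self.k = k_neighbors
        self.scale = dim ** -0.5

    @staticmethod
    def _neighbor_mean(logits, xy, k):
        # Replace each query's attention logits by the mean over its k nearest keypoints.
        B = logits.shape[0]
        knn = torch.cdist(xy, xy).topk(k, largest=False).indices         # (B, N, k)
        batch = torch.arange(B, device=logits.device)[:, None, None]
        return logits[batch, knn].mean(dim=2)                            # (B, N, M)

    def forward(self, feats_a, xy_a, feats_b):
        # feats_a: (B, N, D) descriptors, xy_a: (B, N, 2) keypoint coordinates, feats_b: (B, M, D).
        q_a, k_a, v_a = self.qkv(feats_a).chunk(3, dim=-1)
        _,   k_b, v_b = self.qkv(feats_b).chunk(3, dim=-1)               # projection reuse

        self_logits  = (q_a @ k_a.transpose(-2, -1)) * self.scale        # (B, N, N)
        cross_logits = (q_a @ k_b.transpose(-2, -1)) * self.scale        # (B, N, M)

        # Local consensus: mix each query's logits with those of its spatial neighbours.
        self_logits  = 0.5 * (self_logits  + self._neighbor_mean(self_logits,  xy_a, self.k))
        cross_logits = 0.5 * (cross_logits + self._neighbor_mean(cross_logits, xy_a, self.k))

        msg = self_logits.softmax(-1) @ v_a + cross_logits.softmax(-1) @ v_b
        return feats_a + self.out(msg)

# Example: 128 keypoints with 256-d descriptors matched against 160 keypoints.
attn = ParallelConsensusAttention(dim=256)
out = attn(torch.randn(2, 128, 256), torch.rand(2, 128, 2) * 640, torch.randn(2, 160, 256))
```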
{"title":"Parallel consensus transformer for local feature matching","authors":"Xiaoyong Lu , Yuhan Chen , Bin Kang , Songlin Du","doi":"10.1016/j.patcog.2025.112905","DOIUrl":"10.1016/j.patcog.2025.112905","url":null,"abstract":"<div><div>Local feature matching establishes correspondences between two sets of image features, a fundamental yet challenging task in computer vision. Existing Transformer-based methods achieve strong global modeling but suffer from high computational costs and limited locality. We propose PCMatcher, a detector-based feature matching framework that leverages parallel consensus attention to address these issues. Parallel consensus attention integrates a local consensus module to incorporate neighborhood information and a parallel attention mechanism to reuse parameters and computations efficiently. Additionally, a multi-scale fusion module combines features from different layers to improve robustness. Extensive experiments indicate that PCMatcher achieves a competitive accuracy-efficiency trade-off across various downstream tasks. The source code will be publicly released upon acceptance.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"173 ","pages":"Article 112905"},"PeriodicalIF":7.6,"publicationDate":"2025-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145790640","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CRB-NCE: An adaptable cohesion rule-based approach to number of clusters estimation
Pub Date: 2025-12-13 | DOI: 10.1016/j.patcog.2025.112909 | Pattern Recognition, vol. 173, Article 112909
J. Tinguaro Rodríguez, Xabier Gonzalez-Garcia, Daniel Gómez, Humberto Bustince
Accurate number-of-clusters estimation (NCE) is a central task in many clustering applications, particularly for prototype-based k-centers methods like k-Means, which require the number of clusters k to be specified in advance. This paper presents CRB-NCE, a general cluster cohesion rule-based framework for NCE integrating three main innovations: (i) the introduction of tail ratios to reliably identify decelerations in sequences of cohesion measures, (ii) a threshold-based rule system supporting accurate NCE, and (iii) an optimization-driven approach to learn these thresholds from synthetic datasets with controlled clustering complexity. Two cohesion measures are considered: inertia (SSE) and a new, scale-invariant metric called the mean coverage index. CRB-NCE is mainly applied to derive general-purpose NCE methods, but, most importantly, it also provides an adaptable framework that enables producing specialized procedures with enhanced performance under specific conditions, such as particular clustering algorithms or overlapping cluster structures. Extensive evaluations on synthetic Gaussian datasets (both standard and high-dimensional), clustering benchmarks, and real-world datasets show that CRB-NCE methods consistently achieve robust and competitive NCE performance with efficient runtimes compared to a broad baseline of internal clustering validity indices and other NCE methods.
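The abstract does not give the exact tail-ratio or rule definitions, so the sketch below is only a stand-in: it builds the cohesion sequence from k-means inertia (SSE), defines a surrogate tail ratio (the drop obtained at a candidate k relative to the largest drop in the remaining tail), and applies a single threshold tau in place of the thresholds CRB-NCE would learn from synthetic data.

```python
# Surrogate number-of-clusters estimation; the tail ratio and tau below are assumptions.
import numpy as np
from sklearn.cluster import KMeans

def estimate_k(X, k_max=10, tau=3.0, random_state=0):
    # Cohesion sequence: k-means inertia (SSE) for k = 1..k_max.
    sse = np.array([
        KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X).inertia_
        for k in range(1, k_max + 1)
    ])
    drops = sse[:-1] - sse[1:]                   # improvement from adding one more cluster
    best_k, best_ratio = 1, -np.inf              # fall back to k = 1 if no deceleration found
    for k in range(1, k_max - 1):                # candidate numbers of clusters 2..k_max-1
        tail = drops[k:]                         # drops observed after the candidate k
        ratio = drops[k - 1] / (tail.max() + 1e-12)
        if ratio > tau and ratio > best_ratio:   # strong deceleration right after this k
            best_k, best_ratio = k + 1, ratio
    return best_k

# Example: three well-separated Gaussian blobs; the surrogate rule should return k = 3.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(loc=c, scale=0.3, size=(100, 2)) for c in ([0, 0], [5, 5], [0, 5])])
    print(estimate_k(X))
```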
{"title":"CRB-NCE: An adaptable cohesion rule-based approach to number of clusters estimation","authors":"J. Tinguaro Rodríguez , Xabier Gonzalez-Garcia , Daniel Gómez , Humberto Bustince","doi":"10.1016/j.patcog.2025.112909","DOIUrl":"10.1016/j.patcog.2025.112909","url":null,"abstract":"<div><div>Accurate number-of-clusters estimation (NCE) is a central task in many clustering applications, particularly for prototype-based <em>k</em>-centers methods like <em>k</em>-Means, which require the number of clusters <em>k</em> to be specified in advance. This paper presents CRB-NCE, a general cluster cohesion rule-based framework for NCE integrating three main innovations: (i) the introduction of tail ratios to reliably identify decelerations in sequences of cohesion measures, (ii) a threshold-based rule system supporting accurate NCE, and (iii) an optimization-driven approach to learn these thresholds from synthetic datasets with controlled clustering complexity. Two cohesion measures are considered: inertia (SSE) and a new, scale-invariant metric called the mean coverage index. CRB-NCE is mainly applied to derive general-purpose NCE methods, but, most importantly, it also provides an adaptable framework that enables producing specialized procedures with enhanced performance under specific conditions, such as particular clustering algorithms or overlapping cluster structures. Extensive evaluations on synthetic Gaussian datasets (both standard and high-dimensional), clustering benchmarks, and real-world datasets show that CRB-NCE methods consistently achieve robust and competitive NCE performance with efficient runtimes compared to a broad baseline of internal clustering validity indices and other NCE methods.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"173 ","pages":"Article 112909"},"PeriodicalIF":7.6,"publicationDate":"2025-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145790841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Selective intra- and inter-slice interaction for efficient anisotropic medical image segmentation
Pub Date: 2025-12-13 | DOI: 10.1016/j.patcog.2025.112895 | Pattern Recognition, vol. 174, Article 112895
Xian Lin, Xiayu Guo, Zengqiang Yan, Li Yu
Volumetric medical image segmentation relies on efficient intra- and inter-slice interaction. However, 2D and 3D approaches are sub-optimal when segmenting anisotropic volumes, due to absent spatial information or excessive spatial noise. Though 2.5D approaches aim to strike a balance by treating the imaging dimensions differently, their rigid inter-slice interaction fails to build efficient cross-slice dependencies for various objects. To address this, we present a novel 2.5D framework named ACSFormer, allowing dense-yet-lightweight intra-slice interaction and sparse-yet-adaptive inter-slice interaction. Specifically, we propose intra-slice class-aware attention (ICA), which introduces class messengers to capture class-wise global semantics and builds dependencies between tokens and messengers. In this way, ICA effectively establishes global intra-slice interaction with linear computational complexity. For inter-slice interaction, slice-wise entropy estimation is adopted to select reference slices for each target slice. To ensure flexible inter-slice interaction, we propose an inter-slice token-specific transformer (ITT) that localizes cross-slice relevant regions based on feature relevance and builds customized inter-slice dependencies for each token. Extensive experiments on four publicly available datasets demonstrate the superiority of ACSFormer, which consistently outperforms existing 2D, 2.5D, and 3D approaches with much lower model and computational complexity than 3D approaches. The code will be available at https://github.com/xianlin7/ACSFormer.
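As a concrete illustration of the slice-wise entropy step, the PyTorch sketch below scores each slice by the mean entropy of its softmax predictions and lets every target slice pick its most confident nearby slices as references. ACSFormer's actual selection rule, window size, and number of references are not specified in the abstract, so those choices are assumptions.

```python
# Hedged sketch of entropy-based reference-slice selection; shapes and window are assumptions.
import torch

def select_reference_slices(probs: torch.Tensor, num_refs: int = 2, window: int = 5):
    """probs: (S, C, H, W) per-slice class probabilities from a first 2D pass."""
    # Mean per-pixel entropy of each slice: low entropy = confident slice.
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean(dim=(1, 2))  # (S,)
    S = probs.shape[0]
    refs = []
    for s in range(S):
        lo, hi = max(0, s - window), min(S, s + window + 1)
        candidates = [i for i in range(lo, hi) if i != s]
        # Keep the most confident neighbouring slices as references for slice s.
        candidates.sort(key=lambda i: entropy[i].item())
        refs.append(candidates[:num_refs])
    return refs

# Example with random probabilities for a 10-slice volume.
probs = torch.softmax(torch.randn(10, 4, 64, 64), dim=1)
print(select_reference_slices(probs))
```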
{"title":"Selective intra- and inter-slice interaction for efficient anisotropic medical image segmentation","authors":"Xian Lin , Xiayu Guo , Zengqiang Yan , Li Yu","doi":"10.1016/j.patcog.2025.112895","DOIUrl":"10.1016/j.patcog.2025.112895","url":null,"abstract":"<div><div>Volumetric medical image segmentation relies on efficient intra- and inter-slice interaction. However, 2D and 3D approaches are sub-optimal when segmenting the anisotropic volumes due to absent spatial information or excessive spatial noise. Though 2.5D approaches aim to strike a balance by treating imaging dimensions differently, their rigid inter-slice interaction fails to build efficient cross-slice dependency for various objects. To address this, in this paper, we present a novel 2.5D framework named ACSFormer, allowing dense-yet-lightweight intra-slice interaction and sparse-yet-adaptive inter-slice interaction. Specifically, we propose intra-slice class-aware attention (ICA) by introducing class messengers to capture class-wise global semantics and build dependency between tokens and messengers. In this way, ICA effectively builds global intra-slice interaction with linear-level computational complexity. For inter-slice interaction, slice-wise entropy estimation is adopted to select reference slices for each target slice. To ensure flexible inter-slice interaction, we propose an inter-slice token-specific transformer (ITT) to localize cross-slice relevant regions based on feature relevance and build customized inter-slice dependency for each token. Extensive experiments on four publicly available datasets demonstrate the superiority of ACSFormer, consistently outperforming existing 2D, 2.5D, and 3D approaches with much lower model and computational complexity compared to 3D approaches. The code will be available at <span><span>https://github.com/xianlin7/ACSFormer</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"174 ","pages":"Article 112895"},"PeriodicalIF":7.6,"publicationDate":"2025-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145842786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NuclSeg-v2.0: Nuclei segmentation using semi-supervised stain deconvolution with real-time user feedback
Pub Date: 2025-12-11 | DOI: 10.1016/j.patcog.2025.112823 | Pattern Recognition, vol. 173, Article 112823
Haixin Wang, Jian Yang, Ryohei Katayama, Michiya Matusaki, Tomoyuki Miyao, Ying Li, Jinjia Zhou
Deep learning-based stain deconvolution approaches translate affordable IHC slides into informative mpIF images for nuclei segmentation; however, performance drops when the inputs are H&E slides, owing to domain shift. We prepend a stain-transfer step from H&E to IHC and then perform stain deconvolution from IHC to mpIF. To improve deconvolution, we adopt a semi-supervised scheme with paired GANs (I2M/M2I) that combines supervised and unsupervised objectives to diversify the training data and mitigate pseudo-input noise. We further integrate a user interface for manual correction and leverage its real-time feedback to estimate adaptive weights, enabling dataset-specific refinement without retraining. Across benchmark datasets, the proposed method surpasses state-of-the-art performance while improving robustness and usability for histopathological image analysis.
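A minimal sketch of how the supervised and unsupervised objectives of the paired I2M/M2I generators could be combined is given below. The actual NuclSeg-v2.0 losses, loss weights, and generator architectures are not described in the abstract, so g_i2m, g_m2i, and the weights are placeholders.

```python
# Hedged sketch: supervised L1 on paired samples plus an unsupervised reconstruction term.
import torch.nn.functional as F

def semi_supervised_loss(g_i2m, g_m2i, ihc, mpif=None, w_sup=1.0, w_cyc=0.5):
    """ihc: IHC images; mpif: matching mpIF targets when a paired sample is available."""
    fake_mpif = g_i2m(ihc)
    loss = 0.0
    if mpif is not None:                                   # supervised branch (paired data)
        loss = loss + w_sup * F.l1_loss(fake_mpif, mpif)
    # Unsupervised branch: reconstructing the input through the reverse generator
    # provides a training signal even when no mpIF ground truth exists.
    recon_ihc = g_m2i(fake_mpif)
    loss = loss + w_cyc * F.l1_loss(recon_ihc, ihc)
    return loss
```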
{"title":"NuclSeg-v2.0: Nuclei segmentation using semi-supervised stain deconvolution with real-time user feedback","authors":"Haixin Wang , Jian Yang , Ryohei Katayama , Michiya Matusaki , Tomoyuki Miyao , Ying Li , Jinjia Zhou","doi":"10.1016/j.patcog.2025.112823","DOIUrl":"10.1016/j.patcog.2025.112823","url":null,"abstract":"<div><div>Deep learning-based stain deconvolution approaches translate affordable IHC slides into informative mpIF images for nuclei segmentation; however, performance drops when inputs are H&E owing to domain shift. We prepended a stain transfer from H&E to IHC, then performed stain deconvolution from IHC to mpIF. To improve deconvolution, we adopted a semi-supervised scheme with paired GANs (I2M/M2I) that combines supervised and unsupervised objectives to diversify training data and mitigate pseudo-input noise. We further integrated a user interface for manual correction and leveraged its real-time feedback to estimate adaptive weights, enabling dataset-specific refinement without retraining. Across benchmark datasets, the proposed method surpasses state-of-the-art performance while improving robustness and usability for histopathological image analysis.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"173 ","pages":"Article 112823"},"PeriodicalIF":7.6,"publicationDate":"2025-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145790756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CLIP-driven rain perception: Adaptive deraining with pattern-aware network routing and mask-guided cross-attention
Pub Date: 2025-12-11 | DOI: 10.1016/j.patcog.2025.112886 | Pattern Recognition, vol. 173, Article 112886
Cong Guan, Osamu Yoshie
Existing deraining models process all rainy images within a single network. However, different rain patterns vary significantly, which makes it challenging for a single network to handle diverse types of raindrops and streaks. To address this limitation, we propose a novel CLIP-driven rain perception network (CLIP-RPN) that leverages CLIP to automatically perceive rain patterns by computing visual-language matching scores and adaptively routing inputs to sub-networks that handle different rain patterns, such as varying raindrop densities, streak orientations, and rainfall intensities. CLIP-RPN establishes semantic-aware rain pattern recognition through CLIP’s cross-modal visual-language alignment capabilities, enabling automatic identification of precipitation characteristics across different rain scenarios. This rain pattern awareness drives an adaptive sub-network routing mechanism in which specialized processing branches are dynamically activated based on the detected rain type, significantly enhancing the model’s capacity to handle diverse rainfall conditions. Furthermore, within the sub-networks of CLIP-RPN, we introduce a mask-guided cross-attention mechanism (MGCA) that predicts precise rain masks at multiple scales to facilitate contextual interactions between rainy regions and clean background areas. We also introduce a dynamic loss scheduling mechanism (DLS) to adaptively adjust the gradients during the optimization of CLIP-RPN. Compared with the commonly used l1 or l2 loss, DLS is better matched to the inherent dynamics of network training, thus achieving enhanced outcomes. Our method achieves state-of-the-art performance across multiple datasets, particularly excelling on complex mixed datasets.
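To make the routing step concrete, the sketch below uses the public OpenAI CLIP package to compute visual-language matching scores between an input image and a few rain-pattern prompts, then dispatches the image to the best-matching sub-network. The prompts, the argmax routing rule, and the sub-network interface are assumptions for illustration, not the paper's exact design.

```python
# Hedged sketch of CLIP-based rain-pattern routing (https://github.com/openai/CLIP).
import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Hypothetical rain-pattern descriptions, one per specialised sub-network.
prompts = [
    "a photo degraded by dense raindrops",
    "a photo degraded by long rain streaks",
    "a photo degraded by light drizzle",
]
text_tokens = clip.tokenize(prompts).to(device)

@torch.no_grad()
def route(image_tensor, subnetworks):
    """image_tensor: (1, 3, H, W) preprocessed image; subnetworks: one module per prompt."""
    image_feat = model.encode_image(image_tensor)
    text_feat = model.encode_text(text_tokens)
    image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    scores = (image_feat @ text_feat.T).squeeze(0)         # visual-language matching scores
    branch = scores.argmax().item()                        # activate the best-matching branch
    return subnetworks[branch](image_tensor), branch
```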
{"title":"CLIP-driven rain perception: Adaptive deraining with pattern-aware network routing and mask-guided cross-attention","authors":"Cong Guan, Osamu Yoshie","doi":"10.1016/j.patcog.2025.112886","DOIUrl":"10.1016/j.patcog.2025.112886","url":null,"abstract":"<div><div>Existing deraining models process all rainy images within a single network. However, different rain patterns have significant variations, which makes it challenging for a single network to handle diverse types of raindrops and streaks. To address this limitation, we propose a novel CLIP-driven rain perception network (CLIP-RPN) that leverages CLIP to automatically perceive rain patterns by computing visual-language matching scores and adaptively routing to sub-networks to handle different rain patterns, such as varying raindrop densities, streak orientations, and rainfall intensity. CLIP-RPN establishes semantic-aware rain pattern recognition through CLIP’s cross-modal visual-language alignment capabilities, enabling automatic identification of precipitation characteristics across different rain scenarios. This rain pattern awareness drives an adaptive subnetwork routing mechanism where specialized processing branches are dynamically activated based on the detected rain type, significantly enhancing the model’s capacity to handle diverse rainfall conditions. Furthermore, within sub-networks of CLIP-RPN, we introduce a mask-guided cross-attention mechanism (MGCA) that predicts precise rain masks at multi-scale to facilitate contextual interactions between rainy regions and clean background areas by cross-attention. We also introduces a dynamic loss scheduling mechanism (DLS) to adaptively adjust the gradients for the optimization process of CLIP-RPN. Compared with the commonly used <em>l</em><sub>1</sub> or <em>l</em><sub>2</sub> loss, DLS is more compatible with the inherent dynamics of the network training process, thus achieving enhanced outcomes. Our method achieves state-of-the-art performance across multiple datasets, particularly excelling in complex mixed datasets.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"173 ","pages":"Article 112886"},"PeriodicalIF":7.6,"publicationDate":"2025-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145790686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A comprehensive approach for image quality assessment using quality-centric embedding and ranking networks
Pub Date: 2025-12-11 | DOI: 10.1016/j.patcog.2025.112890 | Pattern Recognition, vol. 173, Article 112890
Zeeshan Ali Haider, Sareer Ul Amin, Muhammad Fayaz, Fida Muhammad Khan, Hyeonjoon Moon, Sanghyun Seo
This paper presents a blind image quality assessment (BIQA) framework called the Quality-Centric Embedding and Ranking Network (QCERN), designed to process images efficiently across diverse scenarios. Unlike contemporary BIQA techniques, which regress quality scores without structured embeddings, the proposed model centers on a well-defined embedding space in which image quality is both clustered and ordered. This structure enables QCERN to employ several adaptive ranking transformers over a geometric space populated by dynamic score anchors, each representing images of equivalent quality. A distinct advantage of QCERN is that unlabeled images can be placed inductively in the embedding space by evaluating their distance to these score anchors, improving accuracy and generalization across disparate datasets. Multiple loss functions, including order and metric losses, ensure that images are positioned according to their quality while maintaining distinct quality divisions. Extensive experiments demonstrate that QCERN consistently outperforms existing models across various datasets. This quality-centric embedding and ranking methodology is well suited to reliable quality assessment applications such as photography, medical imaging, and surveillance.
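A minimal sketch of the anchor-based scoring described above: an image embedding is compared to learnable score anchors and its quality is predicted from the distances to them. The anchor parameterization, the softmax weighting, and the temperature are assumptions rather than QCERN's exact formulation.

```python
# Hedged sketch of distance-to-anchor quality prediction.
import torch
import torch.nn as nn

class AnchorScorer(nn.Module):
    def __init__(self, embed_dim: int, num_anchors: int = 10, temperature: float = 0.1):
        super().__init__()
        self.anchors = nn.Parameter(torch.randn(num_anchors, embed_dim))          # anchor embeddings
        self.anchor_scores = nn.Parameter(torch.linspace(0.0, 1.0, num_anchors))  # ordered quality levels
        self.temperature = temperature

    def forward(self, embedding: torch.Tensor) -> torch.Tensor:
        # embedding: (B, D) image embedding produced by the backbone.
        dists = torch.cdist(embedding, self.anchors)                 # (B, num_anchors)
        weights = torch.softmax(-dists / self.temperature, dim=-1)   # closer anchors weigh more
        return weights @ self.anchor_scores                          # predicted quality scores (B,)

# Example: score a batch of four 512-d embeddings.
scorer = AnchorScorer(embed_dim=512)
print(scorer(torch.randn(4, 512)))
```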
{"title":"A comprehensive approach for image quality assessment using quality-centric embedding and ranking networks","authors":"Zeeshan Ali Haider , Sareer Ul Amin , Muhammad Fayaz , Fida Muhammad Khan , Hyeonjoon Moon , Sanghyun Seo","doi":"10.1016/j.patcog.2025.112890","DOIUrl":"10.1016/j.patcog.2025.112890","url":null,"abstract":"<div><div>This paper presents a new technology that focuses on blind image quality assessment (BIQA) through a framework known as Quality-Centric Embedding and Ranking Network (QCERN). The framework ensures maximum efficiency when processing images under various possible scenarios. QCERN is entirely different from contemporary BIQA techniques, which focus solely on regressing quality scores without structured embeddings. In contrast, the proposed model features a well-defined embedding space as its principal focus, in which picture quality is both clustered and ordered. This dynamic quality of images enables QCERN to utilize several adaptive ranking transformers along a geometric space populated by dynamic score anchors representing images of equivalent quality QCERN features a distinct advantage since unlabeled images of interest can be placed by evaluation of their distance to these specified score anchors inductively in the embedding space, improving accuracy as well as generalization across disparate datasets. Multiple loss functions are utilized in this instance, including order and metric loss, to ensure that images are positioned correctly according to their quality while maintaining distinct divisions of quality. With the application of QCERN, numerous experiments have demonstrated its ability to outperform existing models by consistently delivering high-quality predictions across various datasets, making it a competitive option. This quality-centric embedding and ranking methodology is excellent for reliable quality assessment applications, such as in photography, medical imaging, and surveillance.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"173 ","pages":"Article 112890"},"PeriodicalIF":7.6,"publicationDate":"2025-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145790639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enhancing the impact of model performance gains for semi-supervised medical image segmentation
Pub Date: 2025-12-11 | DOI: 10.1016/j.patcog.2025.112889 | Pattern Recognition, vol. 174, Article 112889
Wenbin Zuo, Hongying Liu, Huadeng Wang, Lingqi Zeng, Ningning Tang, Fanhua Shang, Liang Wan, Jingjing Deng
Semi-supervised methods aim to alleviate the high cost of annotating medical images by incorporating unlabeled data into the training set. Recently, various consistency regularization methods based on the mean-teacher model have emerged. However, their performance is limited by the small number and poor quality of confident pixels in the pseudo-labels. Based on experimental observations, we propose a new argument: the performance gains of the model do not proportionally translate into improvements in pseudo-label quality, mainly due to constraints on pixel diversity representation and model expressiveness. We therefore propose a novel semi-supervised framework, DOC-MLE, which consists of two key components: a dynamic orthogonal constraint (DyOrCon) method and a multi-level election (MLElect) strategy. Specifically, DyOrCon imposes orthogonal constraints on multiple intermediate projection heads to enhance pixel diversity and fully exploit the model’s potential representation capacity. MLElect combines unsupervised pixel-level and supervised feature-level strategies to generate reliable pseudo-labels. Moreover, to obtain more robust prototype representations, this paper proposes new threshold filtering, edge erosion, and dynamic convolution strategies to address errors associated with low confidence, high confidence, and local morphological constraints. Extensive experiments on coronary angiography, polyp, and retinal fundus image datasets demonstrate the effectiveness of the proposed method.
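As an illustration of the orthogonal-constraint idea, the sketch below penalizes the cosine similarity between features produced by different intermediate projection heads. DyOrCon's dynamic weighting and exact formulation are not given in the abstract, so this is only a simplified stand-in.

```python
# Hedged sketch of a pairwise orthogonality penalty over projection-head features.
import torch.nn.functional as F

def orthogonal_constraint(head_features):
    """head_features: list of (B, C, H, W) tensors, one per intermediate projection head."""
    flat = [F.normalize(f.flatten(start_dim=1), dim=1) for f in head_features]
    loss = 0.0
    for i in range(len(flat)):
        for j in range(i + 1, len(flat)):
            # Orthogonal heads produce (near-)zero inner products between their features.
            loss = loss + (flat[i] * flat[j]).sum(dim=1).abs().mean()
    return loss
```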
{"title":"Enhancing the impact of model performance gains for semi-supervised medical image segmentation","authors":"Wenbin Zuo , Hongying Liu , Huadeng Wang , Lingqi Zeng , Ningning Tang , Fanhua Shang , Liang Wan , Jingjing Deng","doi":"10.1016/j.patcog.2025.112889","DOIUrl":"10.1016/j.patcog.2025.112889","url":null,"abstract":"<div><div>Semi-supervised methods aim to alleviate the high cost of annotating medical images by incorporating unlabeled data into the training set. Recently, various consistency regularization methods based on the mean-teacher model have emerged. However, their performance is limited by the small number and poor quality of confident pixels in the pseudo-labels. Based on experimental observations, we propose a new argument: the performance gains of the model do not proportionally translate into improvements in pseudo-label quality, mainly due to constraints in pixel diversity representation and model expressiveness. Therefore, we propose a novel semi-supervised framework, DOC-MLE, which consists of two key components: a dynamic orthogonal constraint (DyOrCon) method and one multi-level election (MLElect) strategy. Specifically, DyOrCon imposes orthogonal constraints on multiple intermediate projection heads to enhance pixel diversity and fully exploit the model’s potential representation capacity. MLElect is designed considering both unsupervised pixel-level and supervised feature-level strategies, to generate reliable pseudo-labels. Moreover, to generate more robust prototype representations, this paper proposes new threshold filtering, edge erosion, and dynamic convolution strategies to address errors associated with low-confidence, high-confidence, and local morphological constraints. Extensive experiments on coronary angiography, polyp dataset, and retinal fundus images have proven the effectiveness of the proposed method.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"174 ","pages":"Article 112889"},"PeriodicalIF":7.6,"publicationDate":"2025-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145842785","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-view fuzzy C-means clustering via multi-objective slime mould and cooperative learning
Pub Date: 2025-12-11 | DOI: 10.1016/j.patcog.2025.112908 | Pattern Recognition, vol. 173, Article 112908
Lin Sun, Yiman Zhang, Weiping Ding, Jiucheng Xu
Multi-view fuzzy C-means clustering (MFCMC) analyzes samples from different views. However, it is sensitive to randomly initialized cluster centers and fails to fully account for the differing importance of view and feature weights. To overcome these defects, an MFCMC methodology via multi-objective slime mould and cooperative learning is proposed. First, by combining the uniform distribution and strong ergodicity of the Tent, Logistic, and Cosine mappings, a hybrid chaotic mapping, namely Tent-Logistic-Cosine, is designed to initialize the slime mould algorithm (SMA). An adaptive step based on the cosine function is applied to the anisotropic search to achieve an optimal trade-off between the exploration and exploitation of the SMA. Second, exploiting the changing characteristics of the exponential function, an adjustable feedback factor is applied to update the venation-tube formation stage, and the global and local search of the SMA are updated through this nonlinear adjustment. A multi-objective SMA (MSMA) is then constructed from these strategies (hybrid chaotic mapping, adaptive step, and adjustable feedback factor), and its optimal solution initializes the cluster centers and feature weights of MFCMC. Third, view and feature weights reflecting the differing importance of features and views are incorporated into the objective function, and a novel MFCMC model based on collaborative learning is developed to identify irrelevant features in each view. Finally, the MFCMC scheme with MSMA reduces sensitivity to initial cluster centers and improves clustering accuracy. Experiments on 24 benchmark functions (optimization) and 14 multi-view datasets (clustering) demonstrate the effectiveness of the developed methodology.
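The exact Tent-Logistic-Cosine composition is defined in the paper; the sketch below is only a stand-in showing how a hybrid chaotic sequence (tent and logistic maps with a cosine perturbation) can replace random initialization of cluster centers. The specific composition, the initial seed, and the mapping onto the data range are assumptions.

```python
# Hedged sketch of chaotic initialization for cluster centers.
import numpy as np

def hybrid_chaotic_sequence(length, x0=0.37):
    xs, x = [], x0
    for _ in range(length):
        x = 2 * x if x < 0.5 else 2 * (1 - x)          # tent map
        x = 4 * x * (1 - x)                            # logistic map
        x = abs(np.cos(np.pi * (x + 0.25)))            # cosine perturbation (stand-in)
        xs.append(x)
    return np.array(xs)

def init_cluster_centers(X, n_clusters, x0=0.37):
    # Map chaotic values in [0, 1] onto the per-feature range of the data.
    lo, hi = X.min(axis=0), X.max(axis=0)
    chaos = hybrid_chaotic_sequence(n_clusters * X.shape[1], x0).reshape(n_clusters, -1)
    return lo + chaos * (hi - lo)

# Example: chaotic initial centers for 4 clusters over 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
print(init_cluster_centers(X, n_clusters=4))
```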
{"title":"Multi-view fuzzy C-means clustering via multi-objective slime mould and cooperative learning","authors":"Lin Sun , Yiman Zhang , Weiping Ding , Jiucheng Xu","doi":"10.1016/j.patcog.2025.112908","DOIUrl":"10.1016/j.patcog.2025.112908","url":null,"abstract":"<div><div>Recently, multi-view fuzzy C-means clustering (MFCMC) can analyze samples from different views. However, it is affected by randomly initializing cluster centers and fails to comprehensively consider important differences between view and feature weights. To overcome these defects, an MFCMC methodology via multi-objective slime mould and cooperative learning is proposed. First, by combining the uniform distribution and great ergodicity of Tent mapping, Logistic mapping and Cosine mapping, a hybrid chaotic mapping, namely Tent-Logistic- Cosine, is designed to initialize the slime mould algorithm (SMA). An adaptive step via cosine function is applied into the anisotropic search to arrive at an optimal trade-off between the exploration and exploitation of SMA. Second, via changing characteristics of exponential function, an adjustable feedback factor is applied to update venation tube formation stage, and the global and local search of SMA is updated by the nonlinear adjustment. Then multi-objective SMA (MSMA) is studied by multiple strategies of hybrid chaotic mapping, adaptive step and adjustable feedback factor, and the optimal solution of MSMA can initialize the cluster center and feature weight of MFCMC. Third, via important differences between features and views, view and feature weights are designed for an objective function, and a novel MFCMC model via collaborative learning is developed to identify irrelevant features in each view. Finally, an MFCMC scheme with MSMA can reduce sensitivity of initial cluster centers and improve accuracy of clustering. Experiments on 24 benchmark functions for optimization and 14 multi-view datasets for clustering show the effectiveness of our developed methodology, respectively.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"173 ","pages":"Article 112908"},"PeriodicalIF":7.6,"publicationDate":"2025-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145840894","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Towards desiderata-driven design of visual counterfactual explainers
Pub Date: 2025-12-11 | DOI: 10.1016/j.patcog.2025.112811 | Pattern Recognition, vol. 174, Article 112811
Sidney Bender, Jan Herrmann, Klaus-Robert Müller, Grégoire Montavon
Visual counterfactual explainers (VCEs) are a straightforward and promising approach to enhancing the transparency of image classifiers. VCEs complement other types of explanations, such as feature attribution, by revealing the specific data transformations to which a machine learning model responds most strongly. In this paper, we argue that existing VCEs tend to focus too narrowly on optimizing sample quality or change minimality; they do not consider the more holistic desiderata for an explanation, such as fidelity, understandability, and sufficiency. To address this shortcoming, we explore new mechanisms for counterfactual generation and investigate how they can help fulfill these desiderata. We combine these mechanisms into a novel ‘smooth counterfactual explorer’ (SCE) algorithm and demonstrate its effectiveness through systematic evaluations on synthetic and real data.
{"title":"Towards desiderata-driven design of visual counterfactual explainers","authors":"Sidney Bender , Jan Herrmann , Klaus-Robert Müller , Grégoire Montavon","doi":"10.1016/j.patcog.2025.112811","DOIUrl":"10.1016/j.patcog.2025.112811","url":null,"abstract":"<div><div>Visual counterfactual explainers (VCEs) are a straightforward and promising approach to enhancing the transparency of image classifiers. VCEs complement other types of explanations, such as feature attribution, by revealing the specific data transformations to which a machine learning model responds most strongly. In this paper, we argue that existing VCEs tend to focus too narrowly on optimizing sample quality or change minimality; they do not consider the more holistic desiderata for an explanation, such as fidelity, understandability, and sufficiency. To address this shortcoming, we explore new mechanisms for counterfactual generation and investigate how they can help fulfill these desiderata. We combine these mechanisms into a novel ‘smooth counterfactual explorer’ (SCE) algorithm and demonstrate its effectiveness through systematic evaluations on synthetic and real data.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"174 ","pages":"Article 112811"},"PeriodicalIF":7.6,"publicationDate":"2025-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145842782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Boosting the patch-based self-supervised learning through past-to-present smoothing
Pub Date: 2025-12-11 | DOI: 10.1016/j.patcog.2025.112871 | Pattern Recognition, vol. 173, Article 112871
Hanpeng Liu, Shuoxi Zhang, Kaiyuan Gao, Kun He
Self-supervised learning (SSL) has recently achieved remarkable success in computer vision, primarily through joint embedding architectures. These models train dual networks by aligning different augmentations of the same image while preventing feature-space collapse. Building upon this, previous work establishes a mathematical connection between joint embedding SSL and the co-occurrences of image patches. Moreover, a number of efforts have scaled patch-based SSL to a vast number of image patches, demonstrating rapid convergence and notable performance. However, the efficiency of these methods is hindered by the excessive use of cropped patches. To address this issue, we propose a novel framework named Past-to-Present (P2P) smoothing that leverages the model’s previous outputs as a supervisory signal. Specifically, we divide the patch augmentations of a single image into two portions. One portion is used to update the model at iteration t−1 and is retained as past information for iteration t. The other portion is used for comparison at iteration t, serving as present information complementary to the past. This design allows us to spread the patches of the same image across different batches, thereby increasing the utilization rate of patch-based learning in our model. Through extensive experimentation and validation, our method achieves outstanding accuracy, scoring 94.2% on CIFAR-10, 74.2% on CIFAR-100, 49.5% on TinyImageNet, and 78.2% on ImageNet-100. In addition, experiments demonstrate its enhanced transferability to out-of-domain datasets compared to other SSL baselines.
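A schematic sketch of the past-to-present idea follows: half of an image's patch embeddings from the previous visit are cached (detached) and used as the supervisory "past" target for the other half at the current iteration. The encoder, cosine-similarity loss, and caching policy are placeholders, not the paper's exact training recipe.

```python
# Hedged sketch of one P2P-style training step; all design choices here are assumptions.
import torch
import torch.nn.functional as F

def p2p_step(encoder, patches, past_cache, image_ids, optimizer):
    """patches: (B, P, 3, h, w) patch augmentations per image; image_ids: (B,) ints;
    past_cache: dict mapping image id -> cached patch embeddings from the previous visit."""
    B, P = patches.shape[:2]
    half = P // 2
    present = encoder(patches[:, :half].flatten(0, 1)).view(B, half, -1)    # used now
    with torch.no_grad():
        future_past = encoder(patches[:, half:].flatten(0, 1)).view(B, half, -1)

    loss = torch.tensor(0.0, device=present.device)
    for b, img_id in enumerate(image_ids.tolist()):
        if img_id in past_cache:                        # smooth present outputs toward past ones
            target = past_cache[img_id].mean(dim=0)     # aggregate the cached past embeddings
            loss = loss + (1 - F.cosine_similarity(present[b], target.unsqueeze(0), dim=-1)).mean()
        past_cache[img_id] = future_past[b].detach()    # becomes the "past" at the next visit

    optimizer.zero_grad()
    if loss.requires_grad:                              # skip backward if no cached past was found
        loss.backward()
        optimizer.step()
    return loss.item()
```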
{"title":"Boosting the patch-based self-supervised learning through past-to-present smoothing","authors":"Hanpeng Liu, Shuoxi Zhang, Kaiyuan Gao, Kun He","doi":"10.1016/j.patcog.2025.112871","DOIUrl":"10.1016/j.patcog.2025.112871","url":null,"abstract":"<div><div>Self-supervised learning (SSL) has recently achieved remarkable success in computer vision, primarily through joint embedding architectures. These models train dual networks by aligning different augmentations of the same image, as well as preventing feature space collapse. Building upon this, previous work establishes a mathematical connection between joint embedding SSL and the co-occurrences of image patches. Moreover, there have been a number of efforts to scale patch-based SSL to a vast number of image patches, demonstrating rapid convergence and notable performance. However, the efficiency of these methods is hindered by the excessive use of cropped patches. Addressing this issue, we propose a novel framework named Past-to-Present (P2P) smoothing that leverages the model’s previous outputs as a supervisory signal. Specifically, we divide the patch augmentations of a single image into two portions. One portion is used to update the model at iteration <span><math><mrow><mi>t</mi><mo>−</mo><mn>1</mn></mrow></math></span> and retained as past information of iteration <em>t</em>. The other portion is used for comparison in iteration <em>t</em>, serving as present information to be complementary to the past. This design allows us to spread the patches of the same image across different batches, thereby enhancing the utilization rate of patch-based learning in our model. Through extensive experimentation and validation, our method achieves outstanding accuracy, scoring 94.2 % on CIFAR-10, 74.2 % on CIFAR-100, 49.5 % on TinyImageNet, and 78.2 % on ImageNet-100. Besides, additional experiments demonstrate its enhanced transferability to out-of-domain datasets when compared to other SSL baselines.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"173 ","pages":"Article 112871"},"PeriodicalIF":7.6,"publicationDate":"2025-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145790684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}