Pub Date : 2025-02-07 DOI: 10.1016/j.patcog.2025.111427
One-hot constrained symmetric nonnegative matrix factorization for image clustering
Jie Li, Chaoqian Li
Semi-supervised variants of Symmetric Non-Negative Matrix Factorization (SNMF) have proven to be effective for clustering. However, most existing semi-supervised SNMF approaches rely on sophisticated techniques to incorporate supervised information, which results in increased hyper-parameter tuning and model complexity. To achieve better performance with lower complexity, we propose a novel semi-supervised SNMF method called One-hot Constrained SNMF (OCSNMF). This method introduces a parameter-free embedding strategy for partial label information, representing the clustering assignments of labeled data points as one-hot vectors in the SNMF decomposition. We present an iterative algorithm to solve the optimization problem of the proposed OCSNMF, along with analyses of convergence and complexity. Experimental results on six image datasets demonstrate the superiority of OCSNMF over several state-of-the-art methods. The code can be obtained from https://github.com/ljisxz/OCSNMF.
{"title":"One-hot constrained symmetric nonnegative matrix factorization for image clustering","authors":"Jie Li , Chaoqian Li","doi":"10.1016/j.patcog.2025.111427","DOIUrl":"10.1016/j.patcog.2025.111427","url":null,"abstract":"<div><div>Semi-supervised Symmetric Non-Negative Matrix Factorization (SNMF) has proven to be an effective clustering method. However, most existing semi-supervised SNMF approaches rely on sophisticated techniques to incorporate supervised information, which results in increased hyper-parameter tuning and model complexity. To achieve better performance with lower complexity, we propose a novel semi-supervised SNMF method called One-hot Constrained SNMF (OCSNMF). This method introduces a parameter-free embedding strategy for partial label information, representing the clustering assignments of labeled data points as one-hot vectors in the SNMF decomposition. We present an iterative algorithm to solve the optimization problem of the proposed OCSNMF, along with analyses of convergence and complexity. Experimental results on six image datasets demonstrate the superiority of OCSNMF compared to several state-of-the-art methods. The code can be obtained from: <span><span>https://github.com/ljisxz/OCSNMF</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"162 ","pages":"Article 111427"},"PeriodicalIF":7.5,"publicationDate":"2025-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143377929","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-02-07 DOI: 10.1016/j.patcog.2025.111425
FSENet: Feature suppression and enhancement network for tiny object detection
Heng Hu, Sibao Chen, Zhihui You, Jin Tang
Although feature fusion has been widely used to improve detection performance, it can also lead to the mixing of feature information from different layers, which affects the detection of tiny objects. To alleviate the feature-mixing problem, suppress complex background interference, and further improve detection performance, a new feature suppression and enhancement network is designed in this paper. To suppress background information and object feature information from non-local feature layers, we propose a feature suppression and enhancement module (FSEM). Within FSEM, the feature suppression module (FSM) suppresses background information and redundant features while emphasizing the features of tiny objects, which mitigates the blending of irrelevant features and increases focus on tiny-object features. The feature enhancement module (FEM) highlights deep large-object feature information by combining it with shallow features; by enhancing features at different scales, FEM helps maintain feature discrimination. FSM adopts a plug-and-play design and can be embedded into detectors with feature fusion capabilities. In addition, we propose an improved Kullback–Leibler divergence (IKLD) as the loss function. Distribution shifting convolution (DSConv) is adopted instead of standard convolution in the neck to reduce computational effort. The effectiveness of our method is validated on the AI-TOD, VisDrone and DOTA datasets.
{"title":"FSENet: Feature suppression and enhancement network for tiny object detection","authors":"Heng Hu, Sibao Chen, Zhihui You, Jin Tang","doi":"10.1016/j.patcog.2025.111425","DOIUrl":"10.1016/j.patcog.2025.111425","url":null,"abstract":"<div><div>Although feature fusion has been widely used to improve detection performance, it can also lead to the mixing of feature information from different layers, which affects detection of tiny objects. To alleviate feature mixing problem, suppress the complex background interference and further improve detection performance, a new feature suppression and enhancement network is designed in this paper. In order to suppress background information and object feature information from non-local feature layers, we propose a feature suppression and enhancement module (FSEM). In FSEM, feature suppression module (FSM) aims to suppress background information and redundant features while emphasizing features of tiny objects. This helps to mitigate blending of irrelevant features and increase focusing on tiny object features. Feature enhancement module (FEM) aims to highlight deep large object feature information by combining it with shallow features. By enhancing features at different scales, FEM helps maintain feature discrimination. FSM adopts a plug-and-play design and can be embedded into detectors with feature fusion capabilities. In addition, we propose an improved Kullback–Leibler divergence (IKLD) as loss function. Distribution shifting convolution (DSConv) is adopted instead of convolution in neck to reduce computational effort. The effectiveness of our method is validated on the AI-TOD, VisDrone and DOTA datasets.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"162 ","pages":"Article 111425"},"PeriodicalIF":7.5,"publicationDate":"2025-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143395922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-02-07 DOI: 10.1016/j.patcog.2025.111424
Collaboratively enhanced and integrated detail-context information for low-light image enhancement
Yuzhen Niu, Xiaofeng Lin, Huangbiao Xu, Rui Xu, Yuzhong Chen
Low-light image enhancement (LLIE) is a challenging task due to the multiple degradation problems involved, such as low brightness, color distortion, heavy noise, and detail degradation. Existing deep learning-based LLIE methods mainly use encoder–decoder networks or full-resolution networks, which excel at extracting context or detail information, respectively. Since both detail and context information are required for LLIE, existing methods cannot solve all the degradation problems. To address this, we propose an LLIE method based on collaboratively enhanced and integrated detail-context information (CoEIDC). Specifically, we propose a full-resolution network with two collaborative subnetworks, namely the detail extraction and enhancement subnetwork (DE²-Net) and the context extraction and enhancement subnetwork (CE²-Net). CE²-Net extracts context information from the features of DE²-Net at different stages through large receptive field convolutions. Moreover, a collaborative attention module (CAM) and a detail-context integration module are proposed to enhance and integrate detail and context information. CAM is reused to enhance the detail features from multiple receptive fields and the context features from multiple stages. Extensive experimental results demonstrate that our method outperforms state-of-the-art LLIE methods and is applicable to other image enhancement tasks, such as underwater image enhancement.
Pub Date : 2025-02-06 DOI: 10.1016/j.patcog.2025.111413
A novel 6DoF pose estimation method using transformer fusion
Huafeng Wang, Haodu Zhang, Wanquan Liu, Zhimin Hu, Haoqi Gao, Weifeng Lv, Xianfeng Gu
Effectively combining different data types (RGB, depth) for 6D pose estimation in deep learning remains challenging. Extracting complementary information from these modalities and achieving implicit alignment is crucial for accurate pose estimation. This work proposes a novel fusion module that utilizes a Transformer-based architecture for cross-modal fusion. This design fosters feature combination and strengthens global information processing, reducing dependence on traditional convolutional methods. Additionally, a residual attentional structure tackles two key issues: (1) mitigating the information loss commonly encountered in deep networks, and (2) enhancing modal alignment through learned attention weights. We evaluate our method on the LineMOD (Hinterstoisser et al., 2011) and YCB-Video (Xiang et al., 2018) datasets, achieving state-of-the-art performance on YCB-Video and outperforming most existing methods on LineMOD. These results demonstrate the effectiveness of our approach and its strong generalization capabilities.
{"title":"A novel 6DoF pose estimation method using transformer fusion","authors":"Huafeng Wang , Haodu Zhang , Wanquan Liu , Zhimin Hu , Haoqi Gao , Weifeng Lv , Xianfeng Gu","doi":"10.1016/j.patcog.2025.111413","DOIUrl":"10.1016/j.patcog.2025.111413","url":null,"abstract":"<div><div>Effectively combining different data types (RGB, depth) for 6D pose estimation in deep learning remains challenging. Effectively extracting complementary information from these modalities and achieving implicit alignment is crucial for accurate pose estimation. This work proposes a novel fusion module that utilizes Transformer-based architecture for cross-modal fusion. This design fosters feature combination and strengthens global information processing, reducing dependence on traditional convolutional methods. Additionally, a residual attentional structure tackles two key issues: (1) mitigating information loss commonly encountered in deep networks, and (2) enhancing modal alignment through learned attention weights. We evaluate our method on the LineMOD Hinterstoisser et al. (2011) and YCB-Video Xiang et al. (2018) datasets, achieving state-of-the-art performance on YCB-Video and outperforming most existing methods on LineMOD. These results demonstrate the effectiveness of our approach and its strong generalization capabilities.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"162 ","pages":"Article 111413"},"PeriodicalIF":7.5,"publicationDate":"2025-02-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143348361","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-02-05 DOI: 10.1016/j.patcog.2025.111422
SS ViT: Observing pathologies of multi-layer perceptron weights and re-setting vision transformer
Chao Ning, Hongping Gan
The Vision Transformer (ViT) usually adopts a columnar or hierarchical structure with four stages, where identical block settings are applied within the same stage. To achieve more nuanced configurations for each ViT block, additional search is typically conducted to explore stronger architectures. However, the search cost is usually expensive, and the results may not be transferable to different ViT architectures. In this paper, we present a DFC module, which exploits two lightweight grouped linear (GL) layers to learn the representations of the expansion layer between the two fully connected layers and of the nonlinear activation of the multi-layer perceptron (MLP), respectively. Afterwards, we introduce the DFC module into the vanilla ViT and analyze the learned weights of its GL layers. Interestingly, several pathologies arise even though the GL layers share the same initialization strategy. For instance, the GL layer weights display different patterns across depths, and the GL1 and GL2 weights have different patterns at the same depth. We progressively compare and analyze these pathologies and derive a specific setting (SS) for ViT blocks at different depths. Experimental results demonstrate that SS generically improves the performance of various ViT architectures, not only enhancing accuracy but also reducing inference time and computational complexity. For example, on the ImageNet-1k classification task, SS yields a significant 0.8% accuracy improvement, approximately 12.9% faster inference, and 25% fewer floating-point operations (FLOPs) on the PVTv2 model. The code and trained models are available at https://github.com/ICSResearch/SS.
{"title":"SS ViT: Observing pathologies of multi-layer perceptron weights and re-setting vision transformer","authors":"Chao Ning, Hongping Gan","doi":"10.1016/j.patcog.2025.111422","DOIUrl":"10.1016/j.patcog.2025.111422","url":null,"abstract":"<div><div>Vision Transformer (ViT) usually adopts a columnar or hierarchical structure with four stages, where identical block settings are applied within the same stage. To achieve more nuanced configurations for each ViT block, additional search is conducted to explore stronger architectures. However, the search cost is typically expensive and the results may not be transferable to different ViT architectures. In this paper, we present a DFC module, which exploits two lightweight grouped linear (GL) layers to learn the representations of the expansion layer between two fully connected layers and the nonlinear activation of multi-layer perceptron (MLP), respectively. Afterwards, we introduce the DFC module into vanilla ViT and analyze the learned weights of its GL layers. Interestingly, several pathologies arise even though the GL layers share the same initialization strategy. For instance, the GL layer weights display different patterns across various depths, and the GL1 and GL2 weights have different patterns in the same depth. We progressively compare and analyze these pathologies and derive a specific setting (SS) for ViT blocks at different depths. Experimental results demonstrate that SS generically improves the performance of various ViT architectures, not only enhancing accuracy but also reducing inference time and computational complexity. For example, on ImageNet-1k classification task, SS yields a significant 0.8% accuracy improvement, approximately 12.9% faster inference speed, and 25% fewer floating-point operations (FLOPs) on PVTv2 model. The codes and trained models are available at <span><span>https://github.com/ICSResearch/SS</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"162 ","pages":"Article 111422"},"PeriodicalIF":7.5,"publicationDate":"2025-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143377507","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-02-05 DOI: 10.1016/j.patcog.2025.111417
Memory-Guided Transformer with group attention for knee MRI diagnosis
Rui Huang, Zonghai Huang, Hantang Zhou, Qiang Zhai, Fengjun Mu, Huayi Zhan, Hong Cheng, Xiao Yang
Magnetic resonance imaging (MRI) plays an important role in the diagnosis of knee injuries because of the detailed information it provides, which greatly enhances physicians’ diagnostic accuracy. However, this complex image information also makes MRI difficult for physicians to interpret, so there is an urgent need for a computer-assisted method that helps physicians extract key information from MRIs and make diagnostic decisions. Knee MRI includes features at three different levels: anatomical plane-level, dataset-level, and case-level. In this paper, we approach the intelligent diagnosis of knee injuries as an interpretable MRI classification task, implemented with a three-stage Memory-Guided Transformer (MGT). The first stage extracts anatomical plane-level and dataset-level features through group attention and cross-attention, which are then stored in a memory matrix. In the second stage, the trained memory matrix guides the extraction of case-level features from the different anatomical planes of each case. Finally, the probability of knee injury is determined using linear regression. The MGT was trained on the publicly available MRNet dataset. Compared with the previous best model, PERMIT, it shows a 5.7% improvement in the Youden index. A high level of consistency was observed between the physician-labeled diagnostic regions and the regions identified by group attention. Visualization of the trained memory revealed specific patterns, with column 62 corresponding to healthy subjects and column 81 to patients. These results demonstrate that MGT can effectively assist physicians in diagnosing knee injuries while offering excellent interpretability.
{"title":"Memory-Guided Transformer with group attention for knee MRI diagnosis","authors":"Rui Huang , Zonghai Huang , Hantang Zhou , Qiang Zhai , Fengjun Mu , Huayi Zhan , Hong Cheng , Xiao Yang","doi":"10.1016/j.patcog.2025.111417","DOIUrl":"10.1016/j.patcog.2025.111417","url":null,"abstract":"<div><div>Magnetic resonance imaging (MRI) plays an important role in the diagnosis of knee injuries, due to its detailed information, which greatly enhances physicians’ diagnostic accuracy. However, the complex image information also makes it difficult for physicians to interpret MRI. There is an urgent need for a computer-assisted method to help physicians extract key information from MRIs and make diagnostic decisions. Knee MRI includes features across three different levels: anatomical plane-level, dataset-level, and case-level. In this paper, we approach the intelligent diagnosis of knee injuries as an interpretable MRI classification task, using a three-stage Memory-Guided Transformer (MGT) for implementation. The first stage focuses on extracting anatomical plane-level and dataset-level features through group attention and cross-attention, which are then stored in the memory matrix. In the second stage, the trained memory matrix guides the extraction of case-level features from different anatomical planes for each case. Finally, the probability of knee injury is determined using linear regression. The MGT was trained with the publicly available MRNet dataset. Compared with the original optimal model PERMIT, it shows a 5.7% improvement in the Youden index. A high level of consistency was observed between the physician-labeled diagnostic regions and the regions identified by group attention. Visualization of the trained memory revealed specific patterns, with column 62 corresponding to healthy subjects and column 81 to patients. These results demonstrate that MGT can effectively assist physicians in diagnosing knee injuries while offering excellent interpretability.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"162 ","pages":"Article 111417"},"PeriodicalIF":7.5,"publicationDate":"2025-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143387705","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-02-04 DOI: 10.1016/j.patcog.2025.111403
Active learning for cross-modal cardiac segmentation with sparse annotation
Zihang Chen, Weijie Zhao, Jingyang Liu, Puguang Xie, Siyu Hou, Yongjian Nian, Xiaochao Yang, Ruiyan Ma, Haiyan Ding, Jingjing Xiao
This work presents a new dual-domain active learning method for cross-modal cardiac image segmentation with sparse annotations. Our network uses tilted Variational Auto-Encoders (tVAE) to extract and align invariant features from different domains. The proposed Category Diversity Maximization approach calculates statistical information about the categories within a region to reflect category diversity, and the Uncertainty Region Selection Strategy measures the uncertainty of each predicted pixel. By jointly using these two methodologies, we identify risky areas for future annotation in active learning. The method was benchmarked against leading algorithms on two public cardiac datasets. In the MS-CMRSeg bSSFP to LGE segmentation task, our method achieved a DSC of 87.2% with just six-pixel annotations, surpassing the best results of the MS-CMRSeg Challenge 2019. On the MM-WHS dataset, our method achieved a DSC of 91.8% for CT to MR and 88.9% for MR to CT using only 0.1% of annotations, surpassing fully supervised models.
{"title":"Active learning for cross-modal cardiac segmentation with sparse annotation","authors":"Zihang Chen , Weijie Zhao , Jingyang Liu , Puguang Xie , Siyu Hou , Yongjian Nian , Xiaochao Yang , Ruiyan Ma , Haiyan Ding , Jingjing Xiao","doi":"10.1016/j.patcog.2025.111403","DOIUrl":"10.1016/j.patcog.2025.111403","url":null,"abstract":"<div><div>This work presents a new dual-domain active learning method for cross-modal cardiac image segmentation with sparse annotations. Our network uses tilted Variational Auto-Encoders (tVAE) to extract and align invariant features from different domains. A proposed innovative Category Diversity Maximization approach that calculates statistical information regarding categories within a region reflects category diversity. The Uncertainty Region Selection Strategy is devised to measure the uncertainty of each predicted pixel. By jointly using these two methodologies, we identify risky areas for future annotation in active learning. The method was benchmarked against leading algorithms using two public cardiac datasets. In the MS-CMRSeg bSSFP to LGE segmentation task, our method achieved a DSC of 87.2% with just six-pixel annotations, surpassing the best results from the MS-CMRSeg Challenge 2019. In the MM-WHS dataset, our method using only 0.1% of annotations achieved a DSC of 91.8% for CT to MR and 88.9% for MR to CT, surpassing fully supervised models.<span><span><sup>1</sup></span></span></div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"162 ","pages":"Article 111403"},"PeriodicalIF":7.5,"publicationDate":"2025-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143403362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-02-04 DOI: 10.1016/j.patcog.2025.111414
Incremental clustering based on Wasserstein distance between histogram models
Xiaotong Qian, Guénaël Cabanes, Parisa Rastin, Mohamed Alae Guidani, Ghassen Marrakchi, Marianne Clausel, Nistor Grozavu
In this article, we present an innovative clustering framework designed for large datasets and real-time data streams that uses a sliding window and histogram model to address the challenge of memory congestion while reducing computational complexity and improving cluster quality for both static and dynamic clustering. The framework provides a simple way to characterize the probability distribution of clusters through histogram models, regardless of their distribution type, which allows it to be used efficiently with various conventional clustering algorithms. To facilitate effective clustering across windows, we use a statistical measure that allows different clusters to be compared and merged based on the Wasserstein distance between their histograms.
Pub Date : 2025-02-04 DOI: 10.1016/j.patcog.2025.111419
Deep clustering via dual-supervised multi-kernel mapping
Lina Ren, Ruizhang Huang, Shengwei Ma, Yongbin Qin, Yanping Chen, Chuan Lin
In this paper, we propose a novel deep clustering framework via dual-supervised multi-kernel mapping, namely DCDMK, to improve clustering performance by learning linearly and structurally separable data representations. In the DCDMK framework, we introduce a kernel-aid encoder comprising two key components: a semantic representation learner, which captures the essential semantic information for clustering, and a multi-kernel representation learner, which dynamically selects the optimal combination of kernel functions through dual-supervised multi-kernel mapping to learn structurally separable kernel representations. A dual self-supervised mechanism is devised to jointly optimize kernel representation learning and structural partitioning. Based on this framework, we introduce different fusion strategies to learn the multi-kernel representation of data samples for the clustering task, and derive two variants: DCDMK-WL (with layer-level kernel representation learning) and DCDMK-OL (without layer-level kernel representation learning). Extensive experiments on six real-world datasets demonstrate the effectiveness of our DCDMK framework.
{"title":"Deep clustering via dual-supervised multi-kernel mapping","authors":"Lina Ren , Ruizhang Huang , Shengwei Ma , Yongbin Qin , Yanping Chen , Chuan Lin","doi":"10.1016/j.patcog.2025.111419","DOIUrl":"10.1016/j.patcog.2025.111419","url":null,"abstract":"<div><div>In this paper, we propose a novel deep clustering framework via dual-supervised multi-kernel mapping, namely DCDMK, to improve clustering performance by learning linearly structural separable data representations. In the DCDMK framework, we introduce a kernel-aid encoder comprising two key components: a semantic representation learner, which captures the essential semantic information for clustering, and a multi-kernel representation learner, which dynamically selects the optimal combination of kernel functions through dual-supervised multi-kernel mapping to learn structurally separable kernel representations. The dual self-supervised mechanism is devised to jointly optimize both kernel representation learning and structural partitioning. Based on this framework, we introduce different fusion strategies to learn the multi-kernel representation of data samples for the clustering task. We derive two variants, namely DCDMK-WL (with layer-level kernel representation learning) and DCDMK-OL (without layer-level kernel representation learning). Extensive experiments on six real-world datasets demonstrate the effectiveness of our DCDMK framework.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"162 ","pages":"Article 111419"},"PeriodicalIF":7.5,"publicationDate":"2025-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143298593","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-02-03 DOI: 10.1016/j.patcog.2025.111418
Interpretable multi-view clustering
Mudi Jiang, Lianyu Hu, Zengyou He, Zhikui Chen
Multi-view clustering has become a significant area of research, with numerous methods proposed over the past decades to enhance clustering accuracy. However, in many real-world applications it is crucial to demonstrate a clear decision-making process, specifically to explain why samples are assigned to particular clusters. Consequently, there remains a notable gap in developing interpretable methods for clustering multi-view data. To fill this gap, we make a first attempt in this direction by introducing an interpretable multi-view clustering framework. Our method begins by extracting embedded features from each view and generates pseudo-labels to guide the initial construction of a decision tree. Subsequently, it iteratively optimizes the feature representation for each view while refining the interpretable decision tree. Experimental results on real datasets demonstrate that our method not only provides a transparent clustering process for multi-view data but also delivers performance comparable to state-of-the-art multi-view clustering methods. To the best of our knowledge, this is the first effort to design an interpretable clustering framework specifically for multi-view data, opening a new avenue in this field.
{"title":"Interpretable multi-view clustering","authors":"Mudi Jiang , Lianyu Hu , Zengyou He , Zhikui Chen","doi":"10.1016/j.patcog.2025.111418","DOIUrl":"10.1016/j.patcog.2025.111418","url":null,"abstract":"<div><div>Multi-view clustering has become a significant area of research, with numerous methods proposed over the past decades to enhance clustering accuracy. However, in many real-world applications, it is crucial to demonstrate a clear decision-making process-specifically, explaining why samples are assigned to particular clusters. Consequently, there remains a notable gap in developing interpretable methods for clustering multi-view data. To fill this crucial gap, we make the first attempt towards this direction by introducing an interpretable multi-view clustering framework. Our method begins by extracting embedded features from each view and generates pseudo-labels to guide the initial construction of the decision tree. Subsequently, it iteratively optimizes the feature representation for each view along with refining the interpretable decision tree. Experimental results on real datasets demonstrate that our method not only provides a transparent clustering process for multi-view data but also delivers performance comparable to state-of-the-art multi-view clustering methods. To the best of our knowledge, this is the first effort to design an interpretable clustering framework specifically for multi-view data, opening a new avenue in this field.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"162 ","pages":"Article 111418"},"PeriodicalIF":7.5,"publicationDate":"2025-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143298461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}