An information bottleneck approach for feature selection
Pub Date: 2025-03-13 | DOI: 10.1016/j.patcog.2025.111564
Qi Zhang, Mingfei Lu, Shujian Yu, Jingmin Xin, Badong Chen
Feature selection has been studied extensively over the last few decades. Among the many approaches, information-theoretic feature selection methods have attracted considerable attention due to their interpretability and desirable performance. From an information-theoretic perspective, a golden rule for feature selection is to maximize the mutual information I(X_s, Y) between the selected feature subset X_s and the class labels Y. Despite its simplicity, explicitly optimizing this objective is a non-trivial task. In this work, we propose a novel global neural network-based feature selection framework built on the information bottleneck principle and establish its connection to the rule of maximizing I(X_s, Y). Using the matrix-based Rényi's α-order entropy functional, our framework enjoys a simple and tractable objective without any variational approximation or distributional assumption. We further extend the framework to multi-view scenarios and verify it on two large-scale, high-dimensional real-world biomedical applications. Comprehensive experimental results demonstrate the superior performance of our framework not only in terms of classification accuracy but also in terms of interpretability within and across views, demonstrating that the proposed framework is trustworthy. Code is available at https://github.com/archy666/IBFS.
{"title":"An information bottleneck approach for feature selection","authors":"Qi Zhang , Mingfei Lu , Shujian Yu , Jingmin Xin , Badong Chen","doi":"10.1016/j.patcog.2025.111564","DOIUrl":"10.1016/j.patcog.2025.111564","url":null,"abstract":"<div><div>Feature selection has been studied extensively over the last few decades. As a widely used method, the information-theoretic feature selection methods have attracted considerable attention due to their better interpretation and desirable performance. From an information-theoretic perspective, a golden rule for feature selection is to maximize the mutual information <span><math><mrow><mi>I</mi><mrow><mo>(</mo><msub><mrow><mi>X</mi></mrow><mrow><mi>s</mi></mrow></msub><mo>,</mo><mi>Y</mi><mo>)</mo></mrow></mrow></math></span> between the selected feature subset <span><math><msub><mrow><mi>X</mi></mrow><mrow><mi>s</mi></mrow></msub></math></span> and the class labels <span><math><mi>Y</mi></math></span>. Despite its simplicity, explicitly optimizing this objective is a non-trivial task. In this work, we propose a novel global neural network-based feature selection framework with the information bottleneck principle and establish its connection to the rule of maximizing <span><math><mrow><mi>I</mi><mrow><mo>(</mo><msub><mrow><mi>X</mi></mrow><mrow><mi>s</mi></mrow></msub><mo>,</mo><mi>Y</mi><mo>)</mo></mrow></mrow></math></span>. Using the matrix-based Rényi’s <span><math><mi>α</mi></math></span>-order entropy functional, our framework enjoys a simple and tractable objective without any variational approximation or distributional assumption. We further extend the framework to multi-view scenarios and verify it with two large-scale, high-dimensional real-world biomedical applications. Comprehensive experimental results demonstrate the superior performance of our framework not only in terms of classification accuracy but also in terms of good interpretability within and across each view, effectively proving that the proposed framework is trustworthy. Code is available at <span><span>https://github.com/archy666/IBFS</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"164 ","pages":"Article 111564"},"PeriodicalIF":7.5,"publicationDate":"2025-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143654595","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dynamic feature regularized loss for weakly supervised semantic segmentation
Pub Date: 2025-03-13 | DOI: 10.1016/j.patcog.2025.111540
Bingfeng Zhang, Jimin Xiao, Yao Zhao
We focus on confronting weakly supervised semantic segmentation with scribble-level annotation. The regularized loss has proven to be an effective solution for this task. However, most existing regularized losses leverage only static shallow features (color, spatial information) to compute the regularized kernel, which limits final performance, since such static shallow features fail to describe pair-wise pixel relationships in complicated cases. In this paper, we propose a new regularized loss that utilizes both shallow and deep features, dynamically updated to aggregate sufficient information to represent the relationships between different pixels. Moreover, to provide accurate deep features, we design a feature consistency head to train the pair-wise feature relationship. In contrast to most approaches, which adopt a multi-stage training strategy with complicated training settings and time-consuming steps, our approach can be trained directly in an end-to-end manner, in which the feature consistency head and our regularized loss benefit from each other. We evaluate our approach on different backbones, and extensive experiments show that it achieves new state-of-the-art performance in different settings; e.g., with a vision transformer it outperforms other approaches by a substantial margin (more than a 5% mIoU increase). The source code will be released at: https://github.com/zbf1991/DFR.
{"title":"Dynamic feature regularized loss for weakly supervised semantic segmentation","authors":"Bingfeng Zhang , Jimin Xiao , Yao Zhao","doi":"10.1016/j.patcog.2025.111540","DOIUrl":"10.1016/j.patcog.2025.111540","url":null,"abstract":"<div><div>We focus on confronting weakly supervised semantic segmentation with scribble-level annotation. The regularized loss has proven to be an effective solution for this task. However, most existing regularized losses only leverage static shallow features (color, spatial information) to compute the regularized kernel, which limits its final performance since such static shallow features fail to describe pair-wise pixel relationships in complicated cases. In this paper, we propose a new regularized loss that utilizes both shallow and deep features that are dynamically updated to aggregate sufficient information to represent the relationship of different pixels. Moreover, to provide accurate deep features, we design a feature consistency head to train the pair-wise feature relationship. In contrast to most approaches that adopt a multi-stage training strategy with complicated training settings and high time-consuming steps, our approach can be directly trained in an end-to-end manner, in which the feature consistency head and our regularized loss can benefit from each other. We evaluate our approach on different backbones, and extensive experiments show that our approach achieves new state-of-the-art performances on different cases, <em>e.g.</em>, using our approach with a vision transformer outperforms other approaches by a substantial margin (more than 5% mIoU increase). The source code will be released at: <span><span>https://github.com/zbf1991/DFR</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"164 ","pages":"Article 111540"},"PeriodicalIF":7.5,"publicationDate":"2025-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143642409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DC-CLIP: Multilingual CLIP Compression via vision-language distillation and vision-language alignment
Pub Date: 2025-03-13 | DOI: 10.1016/j.patcog.2025.111547
Wenbo Zhang, Yifan Zhang, Jianfeng Lin, Binqiang Huang, Jinlu Zhang, Wenhao Yu
Pre-trained vision-language (V-L) models such as CLIP have shown excellent performance in many downstream cross-modal tasks. However, most of them are applicable only to the English context. Subsequent research has focused on this problem and proposed improved models, such as CN-CLIP and AltCLIP, to extend their applicability to Chinese and even other languages. Nevertheless, these models suffer from high latency and a large memory footprint during inference, which limits their deployment on resource-constrained edge devices. In this work, we propose a conceptually simple yet effective multilingual CLIP compression framework and train a lightweight multilingual vision-language model, called DC-CLIP, for both Chinese and English contexts. In this framework, we collect a high-quality Chinese-English multi-source dataset and design two training stages: multilingual vision-language feature distillation and alignment. During the first stage, lightweight image and text student models learn robust visual and multilingual textual feature representations from their corresponding teacher models. Subsequently, the multilingual vision-language alignment stage aligns visual and multilingual textual features to further improve the model's multilingual performance. Comprehensive experiments in zero-shot image classification on the ELEVATER benchmark show that DC-CLIP achieves superior performance in the English context and competitive performance in the Chinese context, even with less training data, compared to existing models of similar parameter magnitude. The evaluation demonstrates the effectiveness of our designed training mechanism.
{"title":"DC-CLIP: Multilingual CLIP Compression via vision-language distillation and vision-language alignment","authors":"Wenbo Zhang , Yifan Zhang , Jianfeng Lin , Binqiang Huang , Jinlu Zhang , Wenhao Yu","doi":"10.1016/j.patcog.2025.111547","DOIUrl":"10.1016/j.patcog.2025.111547","url":null,"abstract":"<div><div>Pre-trained vision-language (V-L) models such as CLIP have shown excellent performance in many downstream cross-modal tasks. However, most of them are only applicable to the English context. Subsequent research has focused on this problem and proposed improved models, such as CN-CLIP and AltCLIP, to facilitate their applicability to Chinese and even other languages. Nevertheless, these models suffer from high latency and a large memory footprint in inference, which limits their further deployment on resource-constrained edge devices. In this work, we propose a conceptually simple yet effective multilingual CLIP Compression framework and train a lightweight multilingual vision-language model, called DC-CLIP, for both Chinese and English contexts. In this framework, we collect a high-quality Chinese–English multi-source dataset and design two training stages, including multilingual vision-language feature distillation and alignment. During the first stage, lightweight image/text student models are designed to learn robust visual/multilingual textual feature representation ability from corresponding teacher models, respectively. Subsequently, the multilingual vision-language alignment stage enables effective alignment of visual and multilingual textual features to further improve the model’s multilingual performance. Comprehensive experiments in zero-shot image classification, conducted based on the ELEVATER benchmark, showcase that DC-CLIP achieves superior performance in the English context and competitive performance in the Chinese context, even with less training data, when compared to existing models of similar parameter magnitude. The evaluation demonstrates the effectiveness of our designed training mechanism.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"164 ","pages":"Article 111547"},"PeriodicalIF":7.5,"publicationDate":"2025-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143629130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Spectral approximation of Gaussian random graph Laplacians and applications to pattern recognition
Pub Date: 2025-03-13 | DOI: 10.1016/j.patcog.2025.111555
Rajeev Airani, Sachin Kamble
The spectral decomposition of Gaussian Random Graph Laplacians (GRGLs) is at the core of the solutions to many graph-based problems, most prevalently graph signal processing, graph matching, and graph learning. Proposed here is the Eigen Approximation Theorem (EAT), which states that, under certain general conditions, the diagonal entries of a GRGL matrix are reliable empirical approximations of its eigenvalues. This theorem provides a more precise bound for eigenvalues in a subspace, derived from the Courant–Fischer min–max theorem. Consequently, the kth eigenvalue and eigenvector of a GRGL can be computed efficiently using deflated power iteration. Simulation results demonstrate the accuracy and computational speed of applying the EAT. Hence, it can be used to solve problems involving GRGLs, such as graph signal processing, graph matching, and graph learning. The EAT can also be used directly when approximations to the spectral decomposition suffice. Real-time applications are also demonstrated.
{"title":"Spectral approximation of Gaussian random graph Laplacians and applications to pattern recognition","authors":"Rajeev Airani , Sachin Kamble","doi":"10.1016/j.patcog.2025.111555","DOIUrl":"10.1016/j.patcog.2025.111555","url":null,"abstract":"<div><div>The spectral decomposition of Gaussian Random Graph Laplacian (GRGLs) is at the core of the solutions to many graph-based problems. Most prevalent are graph signal processing, graph matching, and graph learning problems. Proposed here is the Eigen Approximation Theorem (EAT), which states that the diagonal entries of a GRGL matrix are reliable empirical approximations of its eigenvalues, given certain general conditions. This theorem provides a more precise bound for eigenvalues in a subspace derived from the Courant–Fischer min–max theorem. Consequently, the <span><math><mi>k</mi></math></span>th eigenvalue and eigenvector of a GRGL can be computed efficiently using deflated power iteration. Simulation results demonstrate the accuracy and computational speed of the EAT application. Hence, it can solve problems involving GRGLs like graph signal processing, graph matching, and graph learning. The EAT can also be used directly when approximations to spectral decomposition suffice. The real-time applications are also demonstrated.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"164 ","pages":"Article 111555"},"PeriodicalIF":7.5,"publicationDate":"2025-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143642426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Balanced seed selection for K-means clustering with determinantal point process
Pub Date: 2025-03-12 | DOI: 10.1016/j.patcog.2025.111548
Namita Bajpai, Jiaul H. Paik, Sudeshna Sarkar
K-means is one of the most popular and effective partitional clustering algorithms. However, in K-means, the initial seeds (centroids) play a critical role in determining the quality of the clusters. Existing methods address this problem either by factoring in the distances between points in n-dimensional space so that the seeds are spaced apart, or by choosing points from dense regions to avoid selecting outliers. We introduce a novel approach to seed selection that jointly models the diversity as well as the quality of the seeds in a unified probabilistic framework based on a fixed-size determinantal point process. The quality indicator measures the reliability of a point as a potential seed, while the diversity measure factors in the spatial relations between points in Euclidean space. The results show that the proposed algorithm outperforms state-of-the-art models on several datasets.
{"title":"Balanced seed selection for K-means clustering with determinantal point process","authors":"Namita Bajpai, Jiaul H. Paik, Sudeshna Sarkar","doi":"10.1016/j.patcog.2025.111548","DOIUrl":"10.1016/j.patcog.2025.111548","url":null,"abstract":"<div><div>K-means is one of the most popular and effective partitional clustering algorithms. However, in K-means, the initial seeds (centroids) play a critical role in determining the quality of the clusters. The existing methods address this problem either by factoring in the distance between the points on n-dimensional space so that the seeds are spaced apart or by choosing points from the dense regions to avoid the selection of outliers. We introduce a novel approach for seed selection that jointly models diversity as well as the quality of the seeds in a unified probabilistic framework based on a fixed-size determinantal point process. The quality indicator measures the reliability of the point to be considered as a potential seed, while the diversity measure factors in the spatial relation between the points on Euclidean space. The results show that the proposed algorithm outperforms the state-of-the-art models on several datasets.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"164 ","pages":"Article 111548"},"PeriodicalIF":7.5,"publicationDate":"2025-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143654596","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tensor-based incomplete multiple kernel clustering with auto-weighted late fusion alignment
Pub Date: 2025-03-12 | DOI: 10.1016/j.patcog.2025.111601
Xiaoxing Guo, Gui-Fu Lu
In the era of big data, the rapid increase in data volume is accompanied by substantial missing-data issues. Incomplete multiple kernel clustering (IMKC) investigates how to perform clustering when certain rows or columns of the predefined kernel matrix are missing. Among existing IMKC methods, the recently proposed late fusion IMKC (LF-IMKC) algorithm has garnered considerable attention due to its superior clustering accuracy and computational efficiency. However, existing LF-IMKC algorithms still suffer from several limitations. Firstly, in existing methods, the missing kernel imputation, kernel partition learning, and subsequent late fusion processes are treated separately, which may lead to suboptimal solutions and adversely affect clustering performance. Secondly, existing LF-IMKC algorithms treat each base partition equally, overlooking the differences in their contributions to the consistent clustering process. Thirdly, existing algorithms typically overlook the higher-order correlations between the base partitions, as well as the strong correlations between the base and consensus partitions, let alone leverage these correlations for clustering. To address these issues, we propose a novel method, i.e., tensor-based incomplete multiple kernel clustering with auto-weighted late fusion alignment (TIKC-ALFA). Specifically, we first integrate the missing kernel imputation, base partition learning, and subsequent late fusion processes within a unified framework. Secondly, we construct a third-order tensor from the weighted base partitions, offering an innovative perspective on tensor slices through the lens of weight distribution, and utilize the tensor nuclear norm (TNN) to approximate the true rank of the tensor. Furthermore, we incorporate the consensus partition into the tensor structure originally constructed solely from the weighted base partitions to further exploit the strong correlations between the base partitions and the consensus partition. Experimental results on six commonly used datasets demonstrate the effectiveness of our algorithm.
{"title":"Tensor-based incomplete multiple kernel clustering with auto-weighted late fusion alignment","authors":"Xiaoxing Guo, Gui-Fu Lu","doi":"10.1016/j.patcog.2025.111601","DOIUrl":"10.1016/j.patcog.2025.111601","url":null,"abstract":"<div><div>In the era of big data, the rapid increase in data volume is accompanied by substantial missing data issues. Incomplete multiple kernel clustering (IMKC) investigates how to perform clustering when certain rows or columns of the predefined kernel matrix are missing. Among existing IMKC methods, the recent proposed late fusion IMKC (LF-IMKC) algorithm has garnered considerable attention due to its superior clustering accuracy and computational efficiency. However, existing LF-IMKC algorithms still suffer from several limitations. Firstly, we observe that in existing methods, the missing kernel imputation, kernel partition learning and subsequent late fusion processes are treated separately, which may lead to suboptimal solutions and adversely affect the clustering performance. Secondly, existing LF-IMKC algorithms treat each base partition equally, overlooking the differences in their contributions to the consistent clustering process. Thirdly, Existing algorithms typically overlook the higher-order correlations between the base partitions as well as the strong correlations between the base and consensus partitions, let alone leveraging these correlations for clustering. To address these issues, we propose a novel method, i.e., tensor-based incomplete multiple kernel clustering with auto-weighted late fusion alignment (TIKC-ALFA). Specifically, we first integrate the missing kernel imputation, base partition learning and subsequent late fusion processes within a unified framework. Secondly, we construct a third-order tensor using the weighted base partitions, offering an innovative perspective on tensor slices through the lens of weight distribution and then utilize the tensor nuclear norm (TNN) to approximate the true rank of the tensor. Furthermore, we incorporate the consensus partition into the tensor structure originally constructed solely from weighted base partitions to further investigate the strong correlations between the base partitions and the consensus partition. The experimental results on six commonly used datasets demonstrate the effectiveness of our algorithm.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"164 ","pages":"Article 111601"},"PeriodicalIF":7.5,"publicationDate":"2025-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143643508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Robustifying vision transformer for image forgery localization with multi-exit architectures
Pub Date: 2025-03-12 | DOI: 10.1016/j.patcog.2025.111565
Zenan Shi, Haipeng Chen, Dong Zhang
The proliferation of image manipulation tools has led to an increase in the number of manipulated images disseminated online, posing risks such as the propagation of fake news and telecom fraud. Thus, there is an increasing demand for precise, generic, and robust methods for detecting and locating manipulated images. In this paper, we propose a simple and clean model, named MEAFormer, for image forgery localization that does not heavily rely on pre-trained models. MEAFormer comprises three main components: an encoder network, a neck network, and a decoder network. Specifically, the transformer-based encoder network extracts hierarchical feature representations from the input image, providing rich contextual information at each layer. The neck network, incorporating our proposed cross-layer feature aggregation (CFA), aggregates these hierarchical features. To achieve better spatial feature co-occurrence, instead of using noise or edge artifacts, we introduce a multi-scale graph reasoning (MGR) module within the decoder network, built via bipartite graphs over the encoder and decoder features. The cross-level enhancement (CLE) further performs adjacent-level feature fusion to amplify the regions of interest in the aggregated manipulation features. Finally, the multi-exit architecture (MEA) guides the model to learn fine-grained features and segment out the manipulated region. Extensive experiments across diverse and challenging datasets conclusively establish the superiority of MEAFormer over existing state-of-the-art methods, excelling in accuracy, generalization, and robustness.
{"title":"Robustifying vision transformer for image forgery localization with multi-exit architectures","authors":"Zenan Shi , Haipeng Chen , Dong Zhang","doi":"10.1016/j.patcog.2025.111565","DOIUrl":"10.1016/j.patcog.2025.111565","url":null,"abstract":"<div><div>The proliferation of image manipulation tools has led to an increase in the number of manipulated images being disseminated online, posing risks like the propagation of fake news and telecom fraud. Thus, there is an increasing demand for precise, generic, and robust methods for detecting and locating manipulated images. In this paper, we propose a simple and clean model, named MEAFormer, for image forgery localization that does not heavily rely on pre-trained models. MEAFormer comprises three main components: an <em>encoder network</em>, a <em>neck network</em>, and a <em>decoder network</em>. Specifically, the transformer-based <em>encoder network</em> extracts hierarchical feature representations from the input image, providing rich contextual information in each layer. The <em>neck network</em>, incorporating our proposed cross-layer feature aggregation (CFA), aggregates these hierarchical features. To achieve better spatial feature co-occurrence, instead of using noise or edge artifacts, we introduce a multi-scale graph reasoning (MGR) module within the <em>decoder network</em> via bipartite graphs over the encoder and decoder features in a multi-scale fashion. The cross-level enhancement (CLE) further performs adjacent-level feature fusion to amplify the regions of interest in aggregated manipulation features. Finally, the multi-exit architecture (MEA) guides the model to learn fine-grained features and segment out the manipulated region. Extensive experiments across diverse and challenging datasets conclusively establish the superiority of MEAFormer over existing state-of-the-art methods, excelling in accuracy, generalization, and robustness.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"164 ","pages":"Article 111565"},"PeriodicalIF":7.5,"publicationDate":"2025-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143629135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Graph augmentation guided federated knowledge distillation for multisite functional MRI analysis
Pub Date: 2025-03-12 | DOI: 10.1016/j.patcog.2025.111559
Qianqian Wang, Junhao Zhang, Long Li, Lishan Qiao, Pew-Thian Yap, Mingxia Liu
Resting-state functional MRI (rs-fMRI) is a non-invasive tool increasingly used to detect abnormalities in brain connectivity for disorder analysis. Many learning models have been explored for fMRI analysis but usually require extensive data for reliable training. Multisite studies increase sample sizes by pooling data from multiple sites, but often face data security and privacy challenges. Federated learning (FL) facilitates collaborative model training without pooling fMRI data from different sites/clients. However, many FL methods share model parameters between clients, posing significant security risks during communication and greatly increasing communication costs. Besides, the fMRI data available at each site are usually limited, which may hinder local model training. To this end, we propose a graph augmentation guided federated distillation (GAFD) framework for multisite fMRI analysis and brain disorder identification. At each client, we augment each input functional connectivity network/graph derived from fMRI by perturbing node features and edges, followed by a feature encoder for graph representation learning. A contrastive loss is used to maximize the agreement of learned representations from the same subject, further enhancing the discriminative power of fMRI representations. On the server side, the server receives model outputs (i.e., logit scores) corresponding to the augmented graphs from each client and merges them. The merged logit scores are then sent back to each client for knowledge distillation. This promotes knowledge sharing among clients, reduces the risk of privacy leakage, and greatly decreases communication costs. Experimental results on two multisite fMRI datasets indicate that our approach outperforms several state-of-the-art methods.
{"title":"Graph augmentation guided federated knowledge distillation for multisite functional MRI analysis","authors":"Qianqian Wang , Junhao Zhang , Long Li , Lishan Qiao , Pew-Thian Yap , Mingxia Liu","doi":"10.1016/j.patcog.2025.111559","DOIUrl":"10.1016/j.patcog.2025.111559","url":null,"abstract":"<div><div>Resting-state functional MRI (rs-fMRI) is a non-invasive tool increasingly used to detect abnormalities in brain connectivity for disorder analysis. Many learning models have been explored for fMRI analysis but usually require extensive data for reliable training. Multisite studies increase sample sizes by pooling data from multiple sites, but often face data security and privacy challenges. Federated learning (FL) facilitates collaborative model training without pooling fMRI data from different sites/clients. However, many FL methods share model parameters between clients, posing significant security risks during communication and greatly increasing communication costs. Besides, fMRI data for local model training is usually limited at each site, which may hinder local model training. To this end, we propose a graph augmentation guided federated distillation (GAFD) framework for multisite fMRI analysis and brain disorder identification. At each client, we augment each input functional connectivity network/graph derived from fMRI by perturbing node features and edges, followed by a feature encoder for graph representation learning. A contrastive loss is used to maximize the agreement of learned representations from the same subject, further enhancing discriminative power of fMRI representations. On the server side, the server receives model outputs (<em>i.e.</em>, logit scores) corresponding to augmented graphs from each client and merges them. The merged logit score is then sent back to each client for knowledge distillation. This can promote knowledge sharing among clients, reduce the risk of privacy leakage, and greatly decrease communication costs. Experimental results on two multisite fMRI datasets indicate that our approach outperforms several state-of-the-arts.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"164 ","pages":"Article 111559"},"PeriodicalIF":7.5,"publicationDate":"2025-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143654555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SAPFormer: Shape-aware propagation Transformer for point clouds
Pub Date: 2025-03-12 | DOI: 10.1016/j.patcog.2025.111578
Gang Xiao, Sihan Ge, Yangsheng Zhong, Zhongcheng Xiao, Junfeng Song, Jiawei Lu
Transformer-based networks have achieved impressive performance on three-dimensional point cloud data. However, most existing methods focus on aggregating local features within the neighborhoods of a point cloud, ignoring global feature information, which makes it difficult to capture the long-range dependencies of a point cloud. In this paper, we propose the Shape-Aware Propagation Transformer (SAPFormer), which flexibly captures the semantic information of point clouds in geometric space and effectively extracts contextual geometric information. Specifically, we first design local group self-attention (LGA) to capture the local interaction information in each region. To relate separated local regions, we propose local group propagation (LGP), which passes information between different regions via query points, allowing features to propagate among neighbors for more fine-grained feature information. To further enlarge the receptive field, we propose the global shape feature module (GSFM) to learn global context information through key shape points (KSPs). Finally, to recover positional cues within the global context, we introduce spatial-shape relative position encoding (SS-RPE), which encodes the positional relationships between points. Extensive experiments demonstrate the effectiveness and superiority of our method on the S3DIS, SensatUrban, ScanNet V2, ShapeNetPart, and ModelNet40 datasets. The code is available at https://github.com/viivan/SAPFormer-main.
{"title":"SAPFormer: Shape-aware propagation Transformer for point clouds","authors":"Gang Xiao , Sihan Ge , Yangsheng Zhong , Zhongcheng Xiao , Junfeng Song , Jiawei Lu","doi":"10.1016/j.patcog.2025.111578","DOIUrl":"10.1016/j.patcog.2025.111578","url":null,"abstract":"<div><div>Transformer-based networks have achieved impressive performance on three-dimensional point cloud data. However, most existing methods focus on aggregating local features in the neighborhoods of a point cloud, ignoring the global feature information. Therefore, it is difficult to capture the long-range dependencies of a point cloud. In this paper, we propose the <strong>Shape-Aware Propagation Transformer (SAPFormer)</strong>, which flexibly captures the semantic information of point clouds in geometric space and effectively extracts the contextual geometric space information. Specifically, we first design local group self-attention (LGA) to capture the local interaction information in each region. To capture the separated local region feature relationships, we propose local group propagation (LGP) to pass the information between different regions via query points. This allows features to propagate among neighbors for more fine-grained feature information. To further enlarge the receptive field, we propose the global shape feature module (GSFM) to learn global context information through key shape points (KSPs). Finally, to solve the positional information cues between global contexts, we introduce spatial-shape relative position encoding (SS-RPE), which obtains positional relationships between points. Extensive experiments demonstrate the effectiveness and superiority of our method on the S3DIS, SensatUrban, ScanNet V2, ShapeNetPart, and ModelNet40 datasets. The code is available at <span><span>https://github.com/viivan/SAPFormer-main</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"164 ","pages":"Article 111578"},"PeriodicalIF":7.5,"publicationDate":"2025-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143654598","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MPE: Multi-frame prediction error-based video anomaly detection framework for robust anomaly inference
Pub Date: 2025-03-11 | DOI: 10.1016/j.patcog.2025.111595
Yujun Kim, Young-Gab Kim
As video surveillance has become increasingly widespread, the necessity of video anomaly detection to support surveillance-related tasks has grown significantly. We propose a novel multi-frame prediction error-based framework (MPE) to enhance anomaly detection accuracy and efficiency. MPE mitigates false positives in prediction models by leveraging multi-frame prediction errors and reduces the time required for their generation through a frame prediction error storage method. The core idea of MPE is to reduce the prediction error of a normal frame while increasing the prediction error of an abnormal frame by leveraging the prediction errors of adjacent frames. We evaluated our method on the Ped2, Avenue, and ShanghaiTech datasets. The experimental results demonstrate that MPE improved the frame-level area under the curve (AUC) of prediction models while maintaining low computational overhead across all datasets. These results show that MPE makes prediction models robust and efficient for video anomaly detection in real-world scenarios.
{"title":"MPE: Multi-frame prediction error-based video anomaly detection framework for robust anomaly inference","authors":"Yujun Kim, Young-Gab Kim","doi":"10.1016/j.patcog.2025.111595","DOIUrl":"10.1016/j.patcog.2025.111595","url":null,"abstract":"<div><div>As video surveillance has become increasingly widespread, the necessity of video anomaly detection to support surveillance-related tasks has grown significantly. We propose a novel multi-frame prediction error-based framework (MPE) to enhance anomaly detection accuracy and efficiency. MPE mitigates false positives in prediction models by leveraging multi-frame prediction errors and reduces the time required for their generation through a frame prediction error storage method. The core idea of MPE is to reduce the prediction error of a normal frame while increasing the prediction error of an abnormal frame by leveraging the prediction errors of adjacent frames. We evaluated our method on the Ped2, Avenue, and ShanghaiTech datasets. The experimental results demonstrate that MPE improved the frame-level area under the curve (AUC) of prediction models while maintaining low computational overhead across all datasets. These results show that MPE makes prediction models robust and efficient for video anomaly detection in real-world scenarios.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"164 ","pages":"Article 111595"},"PeriodicalIF":7.5,"publicationDate":"2025-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143654600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}