Bounds on the Natarajan dimension of a class of linear multi-class predictors
Pub Date: 2025-12-25 | DOI: 10.1016/j.patrec.2025.12.012 | Pattern Recognition Letters, Vol. 200, pp. 129–134
Yanru Pan, Benchong Li
The Natarajan dimension is a crucial metric for measuring the capacity of a learning model and analyzing the generalization ability of a classifier in multi-class classification tasks. In this paper, we present a tight upper bound on the Natarajan dimension of linear multi-class predictors based on a class-sensitive feature mapping with the multi-vector construction, and we give the exact Natarajan dimension when the feature dimension is 2.
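For readers who want the background objects pinned down, the standard definitions involved (textbook material, not the paper's new bound) are sketched below in LaTeX; the symbols Psi, w, and d_N follow the usual multi-vector construction.

```latex
% Linear multi-class predictor with class-sensitive feature mapping \Psi : X x Y -> R^d
% (multi-vector construction: one weight block per class):
\[
  h_{w}(x) \;=\; \operatorname*{arg\,max}_{y \in \mathcal{Y}} \, \langle w , \Psi(x, y) \rangle .
\]
% Natarajan shattering: C \subseteq X is N-shattered by a hypothesis class H if there exist
% f_1, f_2 : C -> Y with f_1(x) \neq f_2(x) for all x in C such that for every B \subseteq C
% some h in H satisfies h(x) = f_1(x) on B and h(x) = f_2(x) on C \setminus B.
% The Natarajan dimension d_N(H) is the largest cardinality of an N-shattered set.
```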
{"title":"Bounds on the Natarajan dimension of a class of linear multi-class predictors","authors":"Yanru Pan, Benchong Li","doi":"10.1016/j.patrec.2025.12.012","DOIUrl":"10.1016/j.patrec.2025.12.012","url":null,"abstract":"<div><div>The Natarajan dimension is a crucial metric for measuring the capacity of a learning model and analyzing generalization ability of a classifier in multi-class classification tasks. In this paper, we present a tight upper bound of Natarajan dimension for linear multi-class predictors based on class sensitive feature mapping for multi-vector construction, and provide the exact Natarajan dimension when the dimension of feature is 2.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"200 ","pages":"Pages 129-134"},"PeriodicalIF":3.3,"publicationDate":"2025-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145884447","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cross-domain detection of AI-generated text: Integrating linguistic richness and lexical pair dispersion via deep learning
Pub Date: 2025-12-25 | DOI: 10.1016/j.patrec.2025.12.010 | Pattern Recognition Letters, Vol. 200, pp. 123–128
Jingang Wang , Tong Xiao , Hui Du , Cheng Zhang , Peng Liu
Cross-domain detection of AI-generated text is a crucial task for cybersecurity. In practical scenarios, after being trained on one or multiple known text generation sources (source domain), a detection model must be capable of effectively identifying text generated by unknown and unseen sources (target domain). Current approaches suffer from limited cross-domain generalization due to insufficient structural adaptation to domain discrepancies. To address this critical limitation, we propose RiDis, a classification model that synergizes Linguistic Richness and Lexical Pair Dispersion for cross-domain AI-generated text detection. Through comprehensive statistical analysis, we establish Linguistic Richness and Lexical Pair Dispersion as discriminative indicators for distinguishing human-authored and machine-generated texts. Our architecture features two innovative components: a Semantic Coherence Extraction Module employing long-range receptive fields to capture linguistic richness through global semantic trend analysis, and a Contextual Dependency Extraction Module utilizing localized receptive fields to quantify lexical pair dispersion via fine-grained word association patterns. The framework further incorporates domain adaptation learning to enhance cross-domain detection robustness. Extensive evaluations demonstrate that our method achieves superior detection accuracy compared to state-of-the-art baselines across multiple domains, with experimental results showing significant performance improvements in cross-domain test scenarios.
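The abstract does not give closed-form definitions of the two indicators; as a rough, purely illustrative proxy for the kind of surface statistics involved, the sketch below computes a type-token ratio for richness and the positional spread of repeated word bigrams for dispersion. The function names and formulas are assumptions, not the authors' definitions.

```python
from collections import defaultdict
import statistics

def lexical_richness(tokens):
    """Type-token ratio: unique words / total words (a simple richness proxy)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def bigram_dispersion(tokens):
    """Crude 'lexical pair dispersion' proxy: average spread of the positions
    at which each repeated word bigram occurs in the text."""
    positions = defaultdict(list)
    for i in range(len(tokens) - 1):
        positions[(tokens[i], tokens[i + 1])].append(i)
    spreads = [statistics.pstdev(p) for p in positions.values() if len(p) > 1]
    return statistics.mean(spreads) if spreads else 0.0

tokens = "the cat sat on the mat and the cat sat on the chair".split()
print(lexical_richness(tokens), bigram_dispersion(tokens))
```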
{"title":"Cross-Domain detection of AI-Generated text: Integrating linguistic richness and lexical pair dispersion via deep learning","authors":"Jingang Wang , Tong Xiao , Hui Du , Cheng Zhang , Peng Liu","doi":"10.1016/j.patrec.2025.12.010","DOIUrl":"10.1016/j.patrec.2025.12.010","url":null,"abstract":"<div><div>Cross-domain detection of AI-generated text is a crucial task for cybersecurity. In practical scenarios, after being trained on one or multiple known text generation sources (source domain), a detection model must be capable of effectively identifying text generated by unknown and unseen sources (target domain). Current approaches suffer from limited cross-domain generalization due to insufficient structural adaptation to domain discrepancies. To address this critical limitation, we propose <strong>RiDis</strong>,a classification model that synergizes Linguistic <strong>Ri</strong>chness and Lexical Pair <strong>Dis</strong>persion for cross-domain AI-generated text detection. Through comprehensive statistical analysis, we establish Linguistic Richness and Lexical Pair Dispersion as discriminative indicators for distinguishing human-authored and machine-generated texts. Our architecture features two innovative components, a Semantic Coherence Extraction Module employing long-range receptive fields to capture linguistic richness through global semantic trend analysis, and a Contextual Dependency Extraction Module utilizing localized receptive fields to quantify lexical pair dispersion via fine-grained word association patterns. The framework further incorporates domain adaptation learning to enhance cross-domain detection robustness. Extensive evaluations demonstrate that our method achieves superior detection accuracy compared to state-of-the-art baselines across multiple domains, with experimental results showing significant performance improvements on cross-domain test scenarios.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"200 ","pages":"Pages 123-128"},"PeriodicalIF":3.3,"publicationDate":"2025-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145884448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
LIFR-Net: A lightweight hybrid neural network with feature grouping for efficient food image recognition
Pub Date: 2025-12-25 | DOI: 10.1016/j.patrec.2025.12.011 | Pattern Recognition Letters, Vol. 201, pp. 22–28
Qingshuo Sun , Guorui Sheng , Xiangyi Zhu , Jingru Song , Yongqiang Song , Tao Yao , Haiyang Wang , Lili Wang
Food image recognition based on deep learning plays a crucial role in the field of food computing. However, its high demand for computing resources limits its deployment on end devices and hinders intelligent diet and nutrition management. To address this issue, we aim to balance computational efficiency with recognition accuracy and propose a compact food image recognition model named Lightweight Inter-Group Food Recognition Net (LIFR-Net) that combines a Convolutional Neural Network (CNN) and a Vision Transformer (ViT). In LIFR-Net, a lightweight ViT module called the Lightweight Inter-group Transformer (LIT) is designed, and a lightweight component named the Feature Grouping Transformer is constructed, which can efficiently extract local and global features of food images while reducing the number of parameters and the computational complexity. In addition, by shuffling and fusing irregularly grouped feature maps, information exchange among channels is enhanced and the recognition accuracy of the model is improved. Extensive experiments on three commonly used public food image recognition datasets, namely ETHZ Food-101, Vireo Food-172, and UEC Food-256, show that LIFR-Net achieves recognition accuracies of 90.49%, 91.04%, and 74.23%, respectively, with fewer parameters and lower computational cost.
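The channel shuffling mentioned in the abstract can be illustrated with a standard ShuffleNet-style grouped shuffle in PyTorch; the irregular grouping and the LIT / Feature Grouping Transformer internals are specific to the paper and not reproduced here.

```python
import torch

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """Shuffle channels across groups so information can flow between them.
    x: (N, C, H, W) feature map with C divisible by `groups`."""
    n, c, h, w = x.shape
    x = x.view(n, groups, c // groups, h, w)   # split channels into groups
    x = x.transpose(1, 2).contiguous()         # interleave the groups
    return x.view(n, c, h, w)

feats = torch.randn(2, 32, 14, 14)
print(channel_shuffle(feats, groups=4).shape)  # torch.Size([2, 32, 14, 14])
```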
{"title":"LIFR-Net: A lightweight hybrid neural network with feature grouping for efficient food image recognition","authors":"Qingshuo Sun , Guorui Sheng , Xiangyi Zhu , Jingru Song , Yongqiang Song , Tao Yao , Haiyang Wang , Lili Wang","doi":"10.1016/j.patrec.2025.12.011","DOIUrl":"10.1016/j.patrec.2025.12.011","url":null,"abstract":"<div><div>Food image recognition based on deep learning plays a crucial role in the field of food computing. However, its high demand for computing resources limits its deployment on end devices and fails to effectively achieve intelligent diet and nutrition management. To address this issue, we aim to balance computational efficiency with recognition accuracy and propose a compact food image recognition model named Lightweight Inter-Group Food Recognition Net (LIFR-Net) that combines Convolutional Neural Network (CNN) and Vision Transformer (ViT). In LIFR-Net, a lightweight ViT module called Lightweight Inter-group Transformer (LIT) is designed, and a lightweight component named Feature Grouping Transformer is constructed, which can efficiently extract local and global features of food images and optimize the number of parameters and computational complexity. In addition, by shuffling and fusing irregularly grouped feature maps, the information exchange among channels is enhanced, and the recognition accuracy of the model is improved. Extensive experiments on three commonly used public food image recognition datasets, namely ETHZ Food–101, Vireo Food–172, and UEC Food–256, show that LIFR-Net achieves recognition accuracies of 90.49%, 91.04%, and 74.23% with lower numbers of parameters and computational amounts.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"201 ","pages":"Pages 22-28"},"PeriodicalIF":3.3,"publicationDate":"2025-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145940477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Towards robust and reliable multi-modal 3D segmentation of multiple sclerosis lesions
Pub Date: 2025-12-24 | DOI: 10.1016/j.patrec.2025.12.008 | Pattern Recognition Letters, Vol. 200, pp. 115–122
Edoardo Coppola , Mattia Savardi , Alberto Signoroni
Accurate 3D segmentation of multiple sclerosis lesions is critical for clinical practice, yet existing approaches face key limitations: many models rely on 2D architectures or partial modality combinations, while others struggle to generalise across scanners and protocols. Although large-scale, multi-site training can improve robustness, its data demands are often prohibitive. To address these challenges, we propose a 3D multi-modal network that simultaneously processes T1-weighted, T2-weighted, and FLAIR scans, leveraging full cross-modal interactions and volumetric context to achieve state-of-the-art performance across four diverse public datasets. To tackle data scarcity, we quantify the minimal fine-tuning effort needed to adapt to individual unseen datasets and reformulate the few-shot learning paradigm at an “instance-per-dataset” level (rather than traditional “instance-per-class”), enabling the quantification of the minimal fine-tuning effort to adapt to multiple unseen sources simultaneously. Finally, we introduce Latent Distance Analysis, a novel label-free reliability estimation technique that anticipates potential distribution shifts and supports any form of test-time adaptation, thereby strengthening efficient robustness and physicians’ trust.
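How Latent Distance Analysis scores reliability is detailed only in the full paper; one common way to realize a label-free latent-distance check, shown here purely as an assumption-laden sketch, is a Mahalanobis distance between a test volume's latent feature and the training-set feature distribution.

```python
import numpy as np

def mahalanobis_shift_score(train_feats: np.ndarray, test_feat: np.ndarray) -> float:
    """Distance of one test latent vector from the training latent distribution.
    train_feats: (N, D) latent features from source-domain training data.
    test_feat:   (D,) latent feature of an unseen test volume."""
    mu = train_feats.mean(axis=0)
    cov = np.cov(train_feats, rowvar=False) + 1e-6 * np.eye(train_feats.shape[1])
    diff = test_feat - mu
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

rng = np.random.default_rng(0)
train = rng.normal(size=(500, 16))
print(mahalanobis_shift_score(train, rng.normal(size=16)))        # in-distribution
print(mahalanobis_shift_score(train, rng.normal(3.0, 1.0, 16)))   # shifted domain
```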
{"title":"Towards robust and reliable multi-modal 3D segmentation of multiple sclerosis lesions","authors":"Edoardo Coppola , Mattia Savardi , Alberto Signoroni","doi":"10.1016/j.patrec.2025.12.008","DOIUrl":"10.1016/j.patrec.2025.12.008","url":null,"abstract":"<div><div>Accurate 3D segmentation of multiple sclerosis lesions is critical for clinical practice, yet existing approaches face key limitations: many models rely on 2D architectures or partial modality combinations, while others struggle to generalise across scanners and protocols. Although large-scale, multi-site training can improve robustness, its data demands are often prohibitive. To address these challenges, we propose a 3D multi-modal network that simultaneously processes T1-weighted, T2-weighted, and FLAIR scans, leveraging full cross-modal interactions and volumetric context to achieve state-of-the-art performance across four diverse public datasets. To tackle data scarcity, we quantify the <em>minimal</em> fine-tuning effort needed to adapt to individual unseen datasets and reformulate the few-shot learning paradigm at an “instance-per-dataset” level (rather than traditional “instance-per-class”), enabling the quantification of the <em>minimal</em> fine-tuning effort to adapt to <em>multiple</em> unseen sources simultaneously. Finally, we introduce <em>Latent Distance Analysis</em>, a novel label-free reliability estimation technique that anticipates potential distribution shifts and supports any form of test-time adaptation, thereby strengthening efficient robustness and physicians’ trust.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"200 ","pages":"Pages 115-122"},"PeriodicalIF":3.3,"publicationDate":"2025-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145840468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Audio prompt driven reprogramming for diagnosing major depressive disorder
Pub Date: 2025-12-24 | DOI: 10.1016/j.patrec.2025.12.007 | Pattern Recognition Letters, Vol. 201, pp. 1–8
Hyunseo Kim, Longbin Jin, Eun Yi Kim
Diagnosing depression is critical due to its profound impact on individuals and associated risks. Although deep learning techniques like convolutional neural networks and transformers have been employed to detect depression, they require large, labeled datasets and substantial computational resources, posing challenges in data-scarce environments. We introduce p-DREAM (Prompt-Driven Reprogramming Exploiting Audio Mapping), a novel and data-efficient model designed to diagnose depression from speech data alone. The p-DREAM combines two main strategies: data augmentation and model reprogramming. First, it utilizes audio-specific data augmentation techniques to generate a richer set of training examples. Next, it employs audio prompts to aid in domain adaptation. These prompts guide a frozen pre-trained transformer, which extracts meaningful features. Finally, these features are fed into a lightweight classifier for prediction. The p-DREAM outperforms traditional fine-tuning and linear probing methods, while requiring only a small number of trainable parameters. Evaluations on three benchmark datasets (DAIC-WoZ, E-DAIC, and AVEC 2014) demonstrate consistent improvements. In particular, p-DREAM achieves a leading macro F1 score of 0.7734 using only acoustic features. We further conducted ablation studies on prompt length, position, and initialization, confirming their importance in effective model adaptation. p-DREAM offers a practical and privacy-conscious approach for speech-based depression assessment in low-resource environments. To promote reproducibility and community adoption, we plan to release our codebase in compliance with the ethical protocols outlined in the AVEC challenges.
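The reprogramming recipe (learnable audio prompts prepended to the input of a frozen pre-trained transformer, with only the prompts and a lightweight classifier trained) can be sketched as below; the module names, prompt length, and pooling are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class PromptedFrozenEncoder(nn.Module):
    """Prepend learnable prompt tokens to audio features and pass them through
    a frozen transformer encoder; only prompts and the classifier are trained."""
    def __init__(self, encoder: nn.TransformerEncoder, dim: int,
                 n_prompts: int = 8, n_classes: int = 2):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():        # freeze the pre-trained backbone
            p.requires_grad_(False)
        self.prompts = nn.Parameter(torch.randn(n_prompts, dim) * 0.02)
        self.classifier = nn.Linear(dim, n_classes)  # lightweight head

    def forward(self, audio_feats: torch.Tensor) -> torch.Tensor:
        # audio_feats: (batch, time, dim) frame-level acoustic features
        b = audio_feats.size(0)
        prompts = self.prompts.unsqueeze(0).expand(b, -1, -1)
        x = torch.cat([prompts, audio_feats], dim=1)
        h = self.encoder(x)                        # frozen forward pass
        return self.classifier(h.mean(dim=1))      # pooled -> depression logits

layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
model = PromptedFrozenEncoder(nn.TransformerEncoder(layer, num_layers=2), dim=64)
print(model(torch.randn(3, 100, 64)).shape)        # torch.Size([3, 2])
```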
{"title":"Audio prompt driven reprogramming for diagnosing major depressive disorder","authors":"Hyunseo Kim, Longbin Jin, Eun Yi Kim","doi":"10.1016/j.patrec.2025.12.007","DOIUrl":"10.1016/j.patrec.2025.12.007","url":null,"abstract":"<div><div>Diagnosing depression is critical due to its profound impact on individuals and associated risks. Although deep learning techniques like convolutional neural networks and transformers have been employed to detect depression, they require large, labeled datasets and substantial computational resources, posing challenges in data-scarce environments. We introduce p-DREAM (Prompt-Driven Reprogramming Exploiting Audio Mapping), a novel and data-efficient model designed to diagnose depression from speech data alone. The p-DREAM combines two main strategies: data augmentation and model reprogramming. First, it utilizes audio-specific data augmentation techniques to generate a richer set of training examples. Next, it employs audio prompts to aid in domain adaptation. These prompts guide a frozen pre-trained transformer, which extracts meaningful features. Finally, these features are fed into a lightweight classifier for prediction. The p-DREAM outperforms traditional fine-tuning and linear probing methods, while requiring only a small number of trainable parameters. Evaluations on three benchmark datasets (DAIC-WoZ, E-DAIC, and AVEC 2014) demonstrate consistent improvements. In particular, p-DREAM achieves a leading macro F1 score of 0.7734 using only acoustic features. We further conducted ablation studies on prompt length, position, and initialization, confirming their importance in effective model adaptation. p-DREAM offers a practical and privacy-conscious approach for speech-based depression assessment in low-resource environments. To promote reproducibility and community adoption, we plan to release our codebase in compliance with the ethical protocols outlined in the AVEC challenges.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"201 ","pages":"Pages 1-8"},"PeriodicalIF":3.3,"publicationDate":"2025-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145886216","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MUSIC: Multi-coil unified sparsity regularization using inter-slice correlation for arterial spin labeling MRI denoising
Pub Date: 2025-12-24 | DOI: 10.1016/j.patrec.2025.12.009 | Pattern Recognition Letters, Vol. 200, pp. 142–148
Hangfan Liu , Bo Li , Yiran Li , Manuel Taso , Dylan Tisdall , Yulin Chang , John A Detre , Ze Wang
Arterial spin labeling (ASL) perfusion MRI stands as the sole non-invasive method to quantify regional cerebral blood flow (CBF), a crucial physiological parameter. However, ASL MRI typically suffers from a relatively low signal-to-noise ratio. In this study, we introduce a novel ASL denoising approach termed Multi-coil Unified Sparsity regularization using Inter-slice Correlation (MUSIC). While MRI, including ASL data, is routinely captured using multi-channel coils, existing denoising techniques are tailored for coil-combined data, overlooking inherent multi-channel correlations. MUSIC capitalizes on the fact that multi-channel images are primarily distinguished by coil sensitivity weighting and random noise, resulting in an intrinsic low-rank structure within the stacked multi-channel data matrix. This low rankness can be further enhanced by grouping highly correlated slices. Our approach involves adapting regularization to each slice individually, forming potentially low-rank matrices by stacking vectorized slices selected from different channels based on their Euclidean distance from the current slice under processing. Matrix rank is then approximated using the logarithm-determinant of the covariance matrix. Importantly, MUSIC operates directly on complex data, eliminating the need for separating magnitude and phase or dividing real and imaginary data, thereby minimizing information loss. The degree of low-rank regularization is controlled by the estimated noise level, achieving a balance between noise reduction and texture preservation. Experimental validation on real-world imaging data demonstrates the efficacy of MUSIC in significantly enhancing ASL perfusion quality. By effectively suppressing noise while retaining essential textural information, MUSIC holds promise for improving the utility and accuracy of ASL perfusion MRI, thus advancing neuroimaging research and clinical diagnoses.
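The central rank surrogate, the log-determinant of the covariance of a matrix formed by stacking vectorized slices, can be illustrated on synthetic data as below; the slice grouping by Euclidean distance, the noise-adaptive weighting, and the complex-valued processing are left to the paper.

```python
import numpy as np

def logdet_rank_surrogate(stack: np.ndarray, eps: float = 1e-3) -> float:
    """Smooth low-rank measure of a matrix whose rows are vectorized slices
    from different coils: log-determinant of the regularized covariance.
    Highly correlated rows -> near-singular covariance -> small value."""
    centered = stack - stack.mean(axis=1, keepdims=True)
    cov = centered @ centered.T / stack.shape[1]
    return float(np.linalg.slogdet(cov + eps * np.eye(cov.shape[0]))[1])

rng = np.random.default_rng(0)
base = rng.normal(size=4096)                                  # one underlying slice
coils = np.stack([c * base for c in (0.8, 1.0, 1.2, 0.9)])    # coil-weighted copies
print(logdet_rank_surrogate(coils + 0.01 * rng.normal(size=coils.shape)))  # small (low rank)
print(logdet_rank_surrogate(rng.normal(size=coils.shape)))                 # larger (full rank)
```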
{"title":"MUSIC: Multi-coil unified sparsity regularization using inter-slice correlation for arterial spin labeling MRI denoising","authors":"Hangfan Liu , Bo Li , Yiran Li , Manuel Taso , Dylan Tisdall , Yulin Chang , John A Detre , Ze Wang","doi":"10.1016/j.patrec.2025.12.009","DOIUrl":"10.1016/j.patrec.2025.12.009","url":null,"abstract":"<div><div>Arterial spin labeling (ASL) perfusion MRI stands as the sole non-invasive method to quantify regional cerebral blood flow (CBF), a crucial physiological parameter. However, ASL MRI typically suffers from a relatively low signal-to-noise ratio. In this study, we introduce a novel ASL denoising approach termed Multi-coil Unified Sparsity regularization using Inter-slice Correlation (MUSIC). While MRI, including ASL data, is routinely captured using multi-channel coils, existing denoising techniques are tailored for coil-combined data, overlooking inherent multi-channel correlations. MUSIC capitalizes on the fact that multi-channel images are primarily distinguished by coil sensitivity weighting and random noise, resulting in an intrinsic low-rank structure within the stacked multi-channel data matrix. This low rankness can be further enhanced by grouping highly correlated slices. Our approach involves adapting regularization to each slice individually, forming potentially low-rank matrices by stacking vectorized slices selected from different channels based on their Euclidean distance from the current slice under processing. Matrix rank is then approximated using the logarithm-determinant of the covariance matrix. Importantly, MUSIC operates directly on complex data, eliminating the need for separating magnitude and phase or dividing real and imaginary data, thereby minimizing information loss. The degree of low-rank regularization is controlled by the estimated noise level, achieving a balance between noise reduction and texture preservation. Experimental validation on real-world imaging data demonstrates the efficacy of MUSIC in significantly enhancing ASL perfusion quality. By effectively suppressing noise while retaining essential textural information, MUSIC holds promise for improving the utility and accuracy of ASL perfusion MRI, thus advancing neuroimaging research and clinical diagnoses.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"200 ","pages":"Pages 142-148"},"PeriodicalIF":3.3,"publicationDate":"2025-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145938874","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enhanced facial expression manipulation through domain-aware transformation and dual-level classification with expression awareness loss in the CLIP space
Pub Date: 2025-12-19 | DOI: 10.1016/j.patrec.2025.11.045 | Pattern Recognition Letters, Vol. 200, pp. 102–107
Qi Guo, Xiaodong Gu
Accurate facial expression manipulation, particularly transforming complex, non-neutral expressions into specific target states, remains challenging due to substantial disparities among expression domains. Existing methods often struggle with such domain shifts, leading to suboptimal editing results. To address these challenges, we propose a novel framework called Domain-Aware Expression Transformation with Dual-Level Label Information Classifier (DAET-DLIC). The DAET-DLIC architecture consists of two major modules. The Domain-Aware Expression Transformation module enhances domain awareness by processing latent codes to model expression-domain distributions. The Dual-Level Label Information Classifier performs classification at both the latent and image levels to ensure comprehensive and reliable label supervision. Furthermore, the Expression Awareness Loss Function provides precise control over the directionality of expression transformations, effectively reducing the risk of expression semantic drift in the CLIP (Contrastive Language-Image Pretraining) space. We validate our method through extensive quantitative and qualitative experiments on the Radboud Faces Database and CelebA-HQ datasets and introduce a comprehensive quantitative metric to assess manipulation efficacy.
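The abstract does not state the Expression Awareness Loss explicitly; a common directional formulation used in CLIP-guided editing (given here only as an illustrative stand-in, not the paper's exact loss) aligns the image-embedding shift with the corresponding text-embedding shift:

```python
import torch
import torch.nn.functional as F

def directional_clip_loss(src_img_emb, tgt_img_emb, src_txt_emb, tgt_txt_emb):
    """Penalize misalignment between the image-space edit direction and the
    text-space direction (e.g. 'angry face' -> 'happy face') in CLIP embedding
    space. All inputs are (batch, dim) CLIP embeddings."""
    img_dir = F.normalize(tgt_img_emb - src_img_emb, dim=-1)
    txt_dir = F.normalize(tgt_txt_emb - src_txt_emb, dim=-1)
    return (1.0 - F.cosine_similarity(img_dir, txt_dir, dim=-1)).mean()

src_i, tgt_i, src_t, tgt_t = (torch.randn(4, 512) for _ in range(4))  # stand-in embeddings
print(directional_clip_loss(src_i, tgt_i, src_t, tgt_t))
```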
{"title":"Enhanced facial expression manipulation through domain-aware transformation and dual-level classification with expression awarness loss in the CLIP space","authors":"Qi Guo, Xiaodong Gu","doi":"10.1016/j.patrec.2025.11.045","DOIUrl":"10.1016/j.patrec.2025.11.045","url":null,"abstract":"<div><div>Accurate facial expression manipulation, particularly transforming complex, non-neutral expressions into specific target states, remains challenging due to substantial disparities among expression domains. Existing methods often struggle with such domain shifts, leading to suboptimal editing results. To address these challenges, we propose a novel framework called Domain-Aware Expression Transformation with Dual-Level Label Information Classifier (DAET-DLIC). The DAET-DLIC architecture consists of two major modules. The Domain-Aware Expression Transformation module enhances domain awareness by processing latent codes to model expression-domain distributions. The Dual-Level Label Information Classifier performs classification at both the latent and image levels to ensure comprehensive and reliable label supervision. Furthermore, the Expression Awareness Loss Function provides precise control over the directionality of expression transformations, effectively reducing the risk of expression semantic drift in the CLIP (Contrastive Language-Image Pretraining) space. We validate our method through extensive quantitative and qualitative experiments on the Radboud Faces Database and CelebA-HQ datasets and introduce a comprehensive quantitative metric to assess manipulation efficacy.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"200 ","pages":"Pages 102-107"},"PeriodicalIF":3.3,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145840465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An illumination-robust feature decomposition approach for low-light crowd counting
Pub Date: 2025-12-16 | DOI: 10.1016/j.patrec.2025.12.005 | Pattern Recognition Letters, Vol. 200, pp. 108–114
Jian Cheng, Chen Feng, Yang Xiao, Zhiguo Cao
Crowd counting is widely studied, yet its reliability in low-light environments remains underexplored. Regular counters fail to perform well due to poor image quality; applying image enhancement pre-processing yields limited improvement; and introducing additional thermal inputs increases cost. This study presents an approach that only requires annotated normal-light RGB data. To learn illumination-robust representations, we construct normal- and low-light image pairs and decompose their features into common and unique components. The common components preserve the shared, and therefore illumination-robust, information, so they are optimized for density map prediction. We also introduce a dataset for evaluating crowd counting performance in low-light conditions. Experiments show that our approach consistently improves performance on multiple baseline architectures with negligible computational overhead. The source code and dataset will be made publicly available upon acceptance at https://github.com/hustaia/Feature_Decomposition_Counting.
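A minimal sketch of the decomposition idea (features split into a common, illumination-robust part used for density prediction and a unique, illumination-specific part, with the common parts of a normal/low-light pair pulled together) is given below; the convolutional heads and the L1 consistency term are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureDecomposer(nn.Module):
    """Split backbone features into a common (illumination-robust) component,
    used for density prediction, and a unique (illumination-specific) one."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.common_head = nn.Conv2d(channels, channels, 3, padding=1)
        self.unique_head = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, feats: torch.Tensor):
        return self.common_head(feats), self.unique_head(feats)

decomp = FeatureDecomposer()
f_normal = torch.randn(2, 64, 32, 32)   # backbone features of a normal-light image
f_low = torch.randn(2, 64, 32, 32)      # features of its synthesized low-light pair
c_n, u_n = decomp(f_normal)
c_l, u_l = decomp(f_low)
consistency = F.l1_loss(c_n, c_l)       # common parts should agree across lighting
print(consistency.item())
```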
{"title":"An illumination-robust feature decomposition approach for low-light crowd counting","authors":"Jian Cheng, Chen Feng, Yang Xiao, Zhiguo Cao","doi":"10.1016/j.patrec.2025.12.005","DOIUrl":"10.1016/j.patrec.2025.12.005","url":null,"abstract":"<div><div>Crowd counting is widely studied, yet its reliability in low-light environments remains underexplored. Regular counters fail to perform well due to poor image quality; applying image enhancement pre-processing yields limited improvement; and introducing additional thermal inputs increases cost. This study presents an approach that only requires annotated normal-light RGB data. To learn illumination-robust representations, we construct normal- and low-light image pairs and decompose their features into common and unique components. The common components preserve shared thus illumination-robust information, so they are optimized for density map prediction. We also introduce a dataset for evaluating crowd counting performance in low-light conditions. Experiments show that our approach consistently improves performance on multiple baseline architectures with negligible computational overhead. The source code and dataset will be made publicly available upon acceptance at <span><span>https://github.com/hustaia/Feature_Decomposition_Counting</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"200 ","pages":"Pages 108-114"},"PeriodicalIF":3.3,"publicationDate":"2025-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145840466","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Psychology-informed safety attributes recognition in dense crowds
Pub Date: 2025-12-16 | DOI: 10.1016/j.patrec.2025.12.006 | Pattern Recognition Letters, Vol. 200, pp. 88–94
Jiaqi Yu, Yanshan Zhou, Renjie Pan, Cunyan Li, Hua Yang
Understanding dense crowd scenes requires analyzing multiple spatial and behavioral attributes. However, existing attributes often fall short of identifying potential safety risks such as panic. To address this, we propose two safety-aware crowd attributes: Crowd Motion Stability (CMS) and Individual Comfort Distance (ICD). CMS characterizes macro-level motion coordination based on the spatial-temporal consistency of crowd movement. In contrast, ICD is grounded in social psychology and captures individuals’ preferred interpersonal distance under varying densities. To accurately recognize these attributes, we propose a Psychology-Guided Safety-Aware Network (PGSAN), which integrates the Spatial-Temporal Consistency Network (STCN) and the Spatial Distance Network (SDN). Specifically, STCN is constructed based on behavioral coherence theory to measure CMS. Meanwhile, SDN models ICD by integrating dynamic crowd states and dual perceptual mechanisms (intuitive and analytical) in psychology, enabling adaptive comfort distance extraction. Features from both sub-networks are fused to support attribute recognition across diverse video scenes. Experimental results demonstrate the proposed method’s superior performance in recognizing safety attributes in dense crowds.
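The abstract characterizes Crowd Motion Stability through spatial-temporal consistency of movement; one simple proxy (an illustrative assumption, not the paper's STCN) is the average cosine similarity between neighboring optical-flow vectors:

```python
import numpy as np

def motion_stability(flow: np.ndarray, eps: float = 1e-8) -> float:
    """Mean cosine similarity between each flow vector and its right/bottom
    neighbors; flow has shape (H, W, 2). Values near 1 = coordinated motion."""
    def cos(a, b):
        num = (a * b).sum(axis=-1)
        den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + eps
        return num / den
    horiz = cos(flow[:, :-1], flow[:, 1:])
    vert = cos(flow[:-1, :], flow[1:, :])
    return float(np.concatenate([horiz.ravel(), vert.ravel()]).mean())

coherent = np.tile(np.array([1.0, 0.0]), (20, 20, 1))         # everyone moving right
chaotic = np.random.default_rng(0).normal(size=(20, 20, 2))   # panic-like motion
print(motion_stability(coherent), motion_stability(chaotic))
```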
{"title":"Psychology-informed safety attributes recognition in dense crowds","authors":"Jiaqi Yu, Yanshan Zhou, Renjie Pan, Cunyan Li, Hua Yang","doi":"10.1016/j.patrec.2025.12.006","DOIUrl":"10.1016/j.patrec.2025.12.006","url":null,"abstract":"<div><div>Understanding dense crowd scenes requires analyzing multiple spatial and behavioral attributes. However, existing attributes often fall short of identifying potential safety risks such as panic. To address this, we propose two safety-aware crowd attributes: Crowd Motion Stability (CMS) and Individual Comfort Distance (ICD). CMS characterizes macro-level motion coordination based on the spatial-temporal consistency of crowd movement. In contrast, ICD is grounded in social psychology and captures individuals’ preferred interpersonal distance under varying densities. To accurately recognize these attributes, we propose a Psychology-Guided Safety-Aware Network (PGSAN), which integrates the Spatial-Temporal Consistency Network (STCN) and the Spatial Distance Network (SDN). Specifically, STCN is constructed based on behavioral coherence theory to measure CMS. Meanwhile, SDN models ICD by integrating dynamic crowd states and dual perceptual mechanisms (intuitive and analytical) in psychology, enabling adaptive comfort distance extraction. Features from both sub-networks are fused to support attribute recognition across diverse video scenes. Experimental results demonstrate the proposed method’s superior performance in recognizing safety attributes in dense crowds.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"200 ","pages":"Pages 88-94"},"PeriodicalIF":3.3,"publicationDate":"2025-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145797789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Adaptive recursive channel selection for robust decoding of motor imagery EEG signal in patients with intracerebral hemorrhage
Pub Date: 2025-12-16 | DOI: 10.1016/j.patrec.2025.12.004 | Pattern Recognition Letters, Vol. 200, pp. 95–101
Shengjie Li , Jian Shi , Danyang Chen , Zheng Zhu , Feng Hu , Wei Jiang , Kai Shu , Zheng You , Ping Zhang , Zhouping Tang
In the study of electroencephalography (EEG)-based motor imagery (MI) brain-computer interfaces (BCIs), neurorehabilitation technologies hold significant potential for recovery from intracerebral hemorrhage (ICH). However, the clinical practicality of such systems is considerably reduced by lengthy setup procedures caused by an excessive number of channels, which hinders the rehabilitation process. Accordingly, this study proposes a channel selection method based on an adaptive recursive learning framework, which establishes a comprehensive evaluation metric by combining time-frequency domain features. Experimental results demonstrate that, when using 37.50% fewer channels, the average accuracy of MI classification increased from 65.44% to 69.28% in healthy subjects and from 65.00% to 67.64% in patients with ICH. This study presents the first EEG-based MI BCI channel selection method specifically designed for ICH patients, paving the way for personalized rehabilitation protocols and facilitating the translation of neurotechnology into clinical practice.
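The adaptive recursive framework can be pictured as a backward channel-elimination loop driven by a scoring metric; in the skeleton below the score function is left abstract and the toy band-power scorer is an assumption, not the paper's combined time-frequency metric.

```python
import numpy as np

def recursive_channel_selection(X, y, score_fn, n_keep):
    """Recursively drop the channel whose removal hurts the score least.
    X: (trials, channels, samples) EEG epochs; score_fn(X_sub, y) -> float."""
    channels = list(range(X.shape[1]))
    while len(channels) > n_keep:
        best_score, worst_ch = -np.inf, None
        for ch in channels:
            remaining = [c for c in channels if c != ch]
            s = score_fn(X[:, remaining, :], y)
            if s > best_score:              # removing `ch` costs the least
                best_score, worst_ch = s, ch
        channels.remove(worst_ch)
    return channels

def toy_score(X_sub, y):
    """Toy scorer: class separability of average band power (assumption)."""
    power = (X_sub ** 2).mean(axis=(1, 2))
    return abs(power[y == 0].mean() - power[y == 1].mean())

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 8, 250))
y = np.array([0, 1] * 20)
X[y == 1, :3, :] *= 1.5                     # make the first 3 channels informative
print(recursive_channel_selection(X, y, toy_score, n_keep=3))
```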
{"title":"Adaptive recursive channel selection for robust decoding of motor imagery EEG signal in patients with intracerebral hemorrhage","authors":"Shengjie Li , Jian Shi , Danyang Chen , Zheng Zhu , Feng Hu , Wei Jiang , Kai Shu , Zheng You , Ping Zhang , Zhouping Tang","doi":"10.1016/j.patrec.2025.12.004","DOIUrl":"10.1016/j.patrec.2025.12.004","url":null,"abstract":"<div><div>In the study of electroencephalography (EEG)-based motor imagery (MI) brain-computer interfaces (BCIs), neurorehabilitation technologies hold significant potential for recovering from intracerebral hemorrhage (ICH). However, the rehabilitation process is hindered as the clinical practicality of such systems is reduced considerably due to their lengthy setup procedures caused by excessive number of channels. Accordingly, this study proposes a channel selection method based on an adaptive recursive learning framework, which establishes a comprehensive evaluation metric by combining time-frequency domain features. Experimental results demonstrate that, upon using 37.50 % fewer channels, the average accuracy of MI classification increased from 65.44 % to 69.28 % in healthy subjects and from 65.00 % to 67.64 % in patients with ICH. This study presents the pioneering EEG-based MI BCI channel selection process specifically designed for ICH patients, paving the way for personalized rehabilitation protocols and facilitating the translation of neurotechnology into clinical practice.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"200 ","pages":"Pages 95-101"},"PeriodicalIF":3.3,"publicationDate":"2025-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145840467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}