Pub Date: 2025-12-19 | DOI: 10.1016/j.patrec.2025.11.045
Qi Guo, Xiaodong Gu
Accurate facial expression manipulation, particularly transforming complex, non-neutral expressions into specific target states, remains challenging due to substantial disparities among expression domains. Existing methods often struggle with such domain shifts, leading to suboptimal editing results. To address these challenges, we propose a novel framework called Domain-Aware Expression Transformation with Dual-Level Label Information Classifier (DAET-DLIC). The DAET-DLIC architecture consists of two major modules. The Domain-Aware Expression Transformation module enhances domain awareness by processing latent codes to model expression-domain distributions. The Dual-Level Label Information Classifier performs classification at both the latent and image levels to ensure comprehensive and reliable label supervision. Furthermore, the Expression Awareness Loss Function provides precise control over the directionality of expression transformations, effectively reducing the risk of expression semantic drift in the CLIP (Contrastive Language-Image Pretraining) space. We validate our method through extensive quantitative and qualitative experiments on the Radboud Faces Database and CelebA-HQ datasets and introduce a comprehensive quantitative metric to assess manipulation efficacy.
{"title":"Enhanced facial expression manipulation through domain-aware transformation and dual-level classification with expression awarness loss in the CLIP space","authors":"Qi Guo, Xiaodong Gu","doi":"10.1016/j.patrec.2025.11.045","DOIUrl":"10.1016/j.patrec.2025.11.045","url":null,"abstract":"<div><div>Accurate facial expression manipulation, particularly transforming complex, non-neutral expressions into specific target states, remains challenging due to substantial disparities among expression domains. Existing methods often struggle with such domain shifts, leading to suboptimal editing results. To address these challenges, we propose a novel framework called Domain-Aware Expression Transformation with Dual-Level Label Information Classifier (DAET-DLIC). The DAET-DLIC architecture consists of two major modules. The Domain-Aware Expression Transformation module enhances domain awareness by processing latent codes to model expression-domain distributions. The Dual-Level Label Information Classifier performs classification at both the latent and image levels to ensure comprehensive and reliable label supervision. Furthermore, the Expression Awareness Loss Function provides precise control over the directionality of expression transformations, effectively reducing the risk of expression semantic drift in the CLIP (Contrastive Language-Image Pretraining) space. We validate our method through extensive quantitative and qualitative experiments on the Radboud Faces Database and CelebA-HQ datasets and introduce a comprehensive quantitative metric to assess manipulation efficacy.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"200 ","pages":"Pages 102-107"},"PeriodicalIF":3.3,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145840465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-16 | DOI: 10.1016/j.patrec.2025.12.005
Jian Cheng, Chen Feng, Yang Xiao, Zhiguo Cao
Crowd counting is widely studied, yet its reliability in low-light environments remains underexplored. Regular counters fail to perform well due to poor image quality; applying image enhancement pre-processing yields limited improvement; and introducing additional thermal inputs increases cost. This study presents an approach that requires only annotated normal-light RGB data. To learn illumination-robust representations, we construct normal- and low-light image pairs and decompose their features into common and unique components. The common components preserve shared, and thus illumination-robust, information, so they are optimized for density map prediction. We also introduce a dataset for evaluating crowd counting performance in low-light conditions. Experiments show that our approach consistently improves performance on multiple baseline architectures with negligible computational overhead. The source code and dataset will be made publicly available upon acceptance at https://github.com/hustaia/Feature_Decomposition_Counting.
{"title":"An illumination-robust feature decomposition approach for low-light crowd counting","authors":"Jian Cheng, Chen Feng, Yang Xiao, Zhiguo Cao","doi":"10.1016/j.patrec.2025.12.005","DOIUrl":"10.1016/j.patrec.2025.12.005","url":null,"abstract":"<div><div>Crowd counting is widely studied, yet its reliability in low-light environments remains underexplored. Regular counters fail to perform well due to poor image quality; applying image enhancement pre-processing yields limited improvement; and introducing additional thermal inputs increases cost. This study presents an approach that only requires annotated normal-light RGB data. To learn illumination-robust representations, we construct normal- and low-light image pairs and decompose their features into common and unique components. The common components preserve shared thus illumination-robust information, so they are optimized for density map prediction. We also introduce a dataset for evaluating crowd counting performance in low-light conditions. Experiments show that our approach consistently improves performance on multiple baseline architectures with negligible computational overhead. The source code and dataset will be made publicly available upon acceptance at <span><span>https://github.com/hustaia/Feature_Decomposition_Counting</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"200 ","pages":"Pages 108-114"},"PeriodicalIF":3.3,"publicationDate":"2025-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145840466","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-16 | DOI: 10.1016/j.patrec.2025.12.006
Jiaqi Yu, Yanshan Zhou, Renjie Pan, Cunyan Li, Hua Yang
Understanding dense crowd scenes requires analyzing multiple spatial and behavioral attributes. However, existing attributes often fall short of identifying potential safety risks such as panic. To address this, we propose two safety-aware crowd attributes: Crowd Motion Stability (CMS) and Individual Comfort Distance (ICD). CMS characterizes macro-level motion coordination based on the spatial-temporal consistency of crowd movement. In contrast, ICD is grounded in social psychology and captures individuals’ preferred interpersonal distance under varying densities. To accurately recognize these attributes, we propose a Psychology-Guided Safety-Aware Network (PGSAN), which integrates the Spatial-Temporal Consistency Network (STCN) and the Spatial Distance Network (SDN). Specifically, STCN is constructed based on behavioral coherence theory to measure CMS. Meanwhile, SDN models ICD by integrating dynamic crowd states and dual perceptual mechanisms (intuitive and analytical) in psychology, enabling adaptive comfort distance extraction. Features from both sub-networks are fused to support attribute recognition across diverse video scenes. Experimental results demonstrate the proposed method’s superior performance in recognizing safety attributes in dense crowds.
{"title":"Psychology-informed safety attributes recognition in dense crowds","authors":"Jiaqi Yu, Yanshan Zhou, Renjie Pan, Cunyan Li, Hua Yang","doi":"10.1016/j.patrec.2025.12.006","DOIUrl":"10.1016/j.patrec.2025.12.006","url":null,"abstract":"<div><div>Understanding dense crowd scenes requires analyzing multiple spatial and behavioral attributes. However, existing attributes often fall short of identifying potential safety risks such as panic. To address this, we propose two safety-aware crowd attributes: Crowd Motion Stability (CMS) and Individual Comfort Distance (ICD). CMS characterizes macro-level motion coordination based on the spatial-temporal consistency of crowd movement. In contrast, ICD is grounded in social psychology and captures individuals’ preferred interpersonal distance under varying densities. To accurately recognize these attributes, we propose a Psychology-Guided Safety-Aware Network (PGSAN), which integrates the Spatial-Temporal Consistency Network (STCN) and the Spatial Distance Network (SDN). Specifically, STCN is constructed based on behavioral coherence theory to measure CMS. Meanwhile, SDN models ICD by integrating dynamic crowd states and dual perceptual mechanisms (intuitive and analytical) in psychology, enabling adaptive comfort distance extraction. Features from both sub-networks are fused to support attribute recognition across diverse video scenes. Experimental results demonstrate the proposed method’s superior performance in recognizing safety attributes in dense crowds.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"200 ","pages":"Pages 88-94"},"PeriodicalIF":3.3,"publicationDate":"2025-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145797789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-16 | DOI: 10.1016/j.patrec.2025.12.004
Shengjie Li , Jian Shi , Danyang Chen , Zheng Zhu , Feng Hu , Wei Jiang , Kai Shu , Zheng You , Ping Zhang , Zhouping Tang
In the study of electroencephalography (EEG)-based motor imagery (MI) brain-computer interfaces (BCIs), neurorehabilitation technologies hold significant potential for recovery from intracerebral hemorrhage (ICH). However, the clinical practicality of such systems is reduced considerably by the lengthy setup procedures that an excessive number of channels entails, which hinders the rehabilitation process. Accordingly, this study proposes a channel selection method based on an adaptive recursive learning framework, which establishes a comprehensive evaluation metric by combining time-frequency domain features. Experimental results demonstrate that, with 37.50% fewer channels, the average accuracy of MI classification increased from 65.44% to 69.28% in healthy subjects and from 65.00% to 67.64% in patients with ICH. This study presents the pioneering EEG-based MI BCI channel selection process specifically designed for ICH patients, paving the way for personalized rehabilitation protocols and facilitating the translation of neurotechnology into clinical practice.
{"title":"Adaptive recursive channel selection for robust decoding of motor imagery EEG signal in patients with intracerebral hemorrhage","authors":"Shengjie Li , Jian Shi , Danyang Chen , Zheng Zhu , Feng Hu , Wei Jiang , Kai Shu , Zheng You , Ping Zhang , Zhouping Tang","doi":"10.1016/j.patrec.2025.12.004","DOIUrl":"10.1016/j.patrec.2025.12.004","url":null,"abstract":"<div><div>In the study of electroencephalography (EEG)-based motor imagery (MI) brain-computer interfaces (BCIs), neurorehabilitation technologies hold significant potential for recovering from intracerebral hemorrhage (ICH). However, the rehabilitation process is hindered as the clinical practicality of such systems is reduced considerably due to their lengthy setup procedures caused by excessive number of channels. Accordingly, this study proposes a channel selection method based on an adaptive recursive learning framework, which establishes a comprehensive evaluation metric by combining time-frequency domain features. Experimental results demonstrate that, upon using 37.50 % fewer channels, the average accuracy of MI classification increased from 65.44 % to 69.28 % in healthy subjects and from 65.00 % to 67.64 % in patients with ICH. This study presents the pioneering EEG-based MI BCI channel selection process specifically designed for ICH patients, paving the way for personalized rehabilitation protocols and facilitating the translation of neurotechnology into clinical practice.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"200 ","pages":"Pages 95-101"},"PeriodicalIF":3.3,"publicationDate":"2025-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145840467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-09 | DOI: 10.1016/j.patrec.2025.12.001
Jian Zou , Jun Wang , Kezhong Lu , Yingxin Lai , Kaiwen Luo , Zitong Yu
Generative models, such as GANs and Diffusion models, have achieved remarkable advancements in Artificial Intelligence Generated Content (AIGC), creating images that are nearly indistinguishable from real ones. However, existing detection methods often face challenges in identifying images generated by unseen models and exhibit limited generalization across different domains. In this paper, we aim to improve the generalization capacity of AIGC image detectors by leveraging artifact features exposed during the upsampling process. Specifically, we reexamine the upsampling operations employed by generative models and observe that, in high-frequency regions of an image (e.g., edge areas with significant pixel intensity differences), generative models often struggle to accurately replicate the pixel distributions of real images, thereby leaving behind unavoidable artifact information. Based on this observation, we propose to utilize edge detection operators to enrich edge-aware detailed clues, enabling the model to focus on these critical features. Furthermore, we design a module that combines upsampling and downsampling to analyze pixel correlation changes introduced by interpolation artifacts. The integrated approach effectively enhances the detection of subtle generative traces, thereby improving generalization across diverse generative models. Extensive experiments on three benchmark datasets demonstrate the superior performance of the proposed approach against previous state-of-the-art methods under cross-domain testing scenarios. The code is available at https://github.com/zj56/EdgeEnhanced-DeepfakeDetection.
{"title":"E2GenF: Universal AIGC image detection based on edge enhanced generalizable features","authors":"Jian Zou , Jun Wang , Kezhong Lu , Yingxin Lai , Kaiwen Luo , Zitong Yu","doi":"10.1016/j.patrec.2025.12.001","DOIUrl":"10.1016/j.patrec.2025.12.001","url":null,"abstract":"<div><div>Generative models, such as GANs and Diffusion models, have achieved remarkable advancements in Artificial Intelligence Generated Content (AIGC), creating images that are nearly indistinguishable from real ones. However, existing detection methods often face challenges in identifying images generated by unseen models and exhibit limited generalization across different domains. In this paper, our aim is to improve the generalization capacity of AIGC image detectors by leveraging artifact features exposed during the upsampling process. Specifically, we reexamine the upsampling operations employed by generative models and observe that, in high-frequency regions of an image (e.g., edge areas with significant pixel intensity differences), generative models often struggle to accurately replicate the pixel distributions of real images, thereby leaving behind unavoidable artifact information. Based on this observation, we propose to utilize edge detection operators to enrich edge-aware detailed clues, enabling the model to focus on these critical features. Furthermore, We designed a module that combines upsampling and downsampling to analyze pixel correlation changes introduced by interpolation artifacts. The integrated approach effectively enhances the detection of subtle generative traces, thereby improving generalization across diverse generative models. Extensive experiments on three benchmark datasets demonstrate the superior performance of the proposed approach against previous state-of-the-art methods under cross-domain testing scenarios. The code is available at <span><span>https://github.com/zj56/EdgeEnhanced-DeepfakeDetection</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"200 ","pages":"Pages 74-80"},"PeriodicalIF":3.3,"publicationDate":"2025-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145797788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-07 | DOI: 10.1016/j.patrec.2025.12.003
Yue Liu, Wenxi Yang, Jianbin Jiao
Diffusion Transformers (DiTs) combine the scalability of transformers with the fidelity of diffusion models, achieving state-of-the-art image generation performance. However, their high computational cost hinders efficient deployment. Post-Training Quantization (PTQ) offers a remedy, yet existing methods struggle with the temporal and spatial dynamics of DiTs. We propose a simplified PTQ framework that combines computationally efficient rotation and randomness for stable and effective DiT quantization. By replacing block-wise rotations with Hadamard transforms and zigzag permutations with random permutations, our method preserves the decorrelation effect while greatly reducing computational overhead. Experiments show that our approach maintains near full-precision performance at 8-bit and 6-bit precision levels. This work demonstrates that lightweight PTQ with structured randomness can effectively balance efficiency and fidelity, enabling practical deployment of DiTs in resource-constrained environments.
{"title":"Quantized DiT with hadamard transformation: A technical report","authors":"Yue Liu, Wenxi Yang, Jianbin Jiao","doi":"10.1016/j.patrec.2025.12.003","DOIUrl":"10.1016/j.patrec.2025.12.003","url":null,"abstract":"<div><div>Diffusion Transformers (DiTs) combine the scalability of transformers with the fidelity of diffusion models, achieving state-of-the-art image generation performance. However, their high computational cost hinders efficient deployment. Post-Training Quantization (PTQ) offers a remedy, yet existing methods struggle with the temporal and spatial dynamics of DiTs. We propose a simplified PTQ framework-combining computationally efficient rotation and randomness-for stable and effective DiT quantization. By replacing block-wise rotations with Hadamard transforms and zigzag permutations with random permutations, our method preserves the decorrelation effect while greatly reducing computational overhead. Experiments show that our approach maintains near full-precision performance at 8-bit and 6-bit precision levels. This work demonstrates that lightweight PTQ with structured randomness can effectively balance efficiency and fidelity, enabling practical deployment of DiTs in resource-constrained environments.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"200 ","pages":"Pages 81-87"},"PeriodicalIF":3.3,"publicationDate":"2025-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145797787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-06 | DOI: 10.1016/j.patrec.2025.11.034
Quanquan Xiao , Haiyan Jin , Haonan Su , Yuanlin Zhang
Infrared and visible image fusion is a current research hotspot in multimodal image fusion, aiming to improve perception and understanding of the scene through effective fusion. However, current deep learning-based fusion methods often fail to fully account for the difference between the brightness of visible images and the thermal information of infrared images, resulting in brightness artifacts that seriously degrade the visual quality of the generated fused images. To solve this problem, we propose a lightweight infrared and visible image decomposition fusion method (DECFusion). The method decomposes the luminance information of the visible image and the thermal information of the infrared image into illumination and reflection components through a learnable lightweight network, and adaptively adjusts the illumination component to remove unnecessary luminance interference. In the reconstruction stage, we draw on Retinex theory to reconstruct the image. Experiments show that our method not only avoids generating luminance artifacts but is also more lightweight and outperforms current state-of-the-art infrared and visible image fusion methods in terms of the visual quality of the fused images. Our code is available at https://github.com/tianzhiya/DECFusion.
{"title":"DECFusion: A lightweight decomposition fusion method for luminance artifact removal in infrared and visible images","authors":"Quanquan Xiao , Haiyan Jin , Haonan Su , Yuanlin Zhang","doi":"10.1016/j.patrec.2025.11.034","DOIUrl":"10.1016/j.patrec.2025.11.034","url":null,"abstract":"<div><div>Infrared and visible image fusion is a current research hotspot in the field of multimodal image fusion, which aims to improve the perception and understanding of the scene through effective fusion. However, current deep learning-based fusion methods often fail to fully consider the difference between visible light brightness and thermal information of infrared images, resulting in brightness artifacts in the generated fused images, which seriously affects the visual effect of the fused images. To solve this problem, we propose a lightweight infrared and visible image decomposition fusion method (DECFusion). The method decomposes the luminance information of the visible image and the thermal information of the infrared image into illumination and reflection components through a learnable lightweight network, and adaptively adjusts the illumination component to remove unnecessary luminance interference. In the reconstruction stage, we combine the Retinex theory to reconstruct the image. Experiments show that the fused images generated by our method not only avoid the generation of luminance artifacts, but also are more lightweight and outperform the current state-of-the-art infrared and visible image fusion methods in terms of the visual quality of the fused images. Our code is available at <span><span>https://github.com/tianzhiya/DECFusion</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"200 ","pages":"Pages 67-73"},"PeriodicalIF":3.3,"publicationDate":"2025-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145749050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-05 | DOI: 10.1016/j.patrec.2025.12.002
Pei-Sze Tan, Sailaja Rajanala, Arghya Pal, Raphaël C.-W. Phan, Huey-Fang Ong
Detecting concealed emotions within apparently normal expressions is crucial for identifying potential mental health issues and facilitating timely support and intervention. The task of spotting macro- and micro-expressions involves predicting the emotional timeline within a video by identifying the onset (the beginning), apex (the peak of emotion), and offset (the end of emotion) frames of the displayed emotions. In particular, closely monitoring the key emotion-conveying regions of the face, namely the foundational muscle-movement cues known as facial action units (AUs), greatly aids in the clear identification of micro-expressions. One major roadblock is the inadvertent introduction of biases into the training process, which degrades performance regardless of feature quality. Biases are spurious factors that falsely inflate or deflate performance metrics. For instance, neural networks tend to falsely attribute certain AUs in specific facial regions to particular emotion classes, a phenomenon also termed inductive bias. To remove these false attributions, we must identify and mitigate biases that arise from mere correlation between some features and the output class labels. We hence introduce action-unit causal graphs. Unlike the traditional action-unit graph, which connects AUs based solely on spatial adjacency, the causal AU graph is derived from statistical tests and retains an edge between two AUs only when there is significant evidence that one causally influences the other. Our model, named Causal-Ex (Causal-based Expression spotting), employs a fast causal inference algorithm to construct a causal graph of facial regions of interest (ROIs), enabling us to select causally relevant facial action units in the ROIs. Our work demonstrates improvements in overall F1-score over state-of-the-art approaches, with 0.388 on the CAS(ME)² and 0.3701 on the SAMM-Long Video datasets. Our code can be found at: https://github.com/noobasuna/causal_ex.git.
{"title":"Causal-Ex: Causal graph-based micro and macro expression spotting","authors":"Pei-Sze Tan, Sailaja Rajanala, Arghya Pal, Raphaël C.-W. Phan, Huey-Fang Ong","doi":"10.1016/j.patrec.2025.12.002","DOIUrl":"10.1016/j.patrec.2025.12.002","url":null,"abstract":"<div><div>Detecting concealed emotions within apparently normal expressions is crucial for identifying potential mental health issues and facilitating timely support and intervention. The task of spotting macro- and micro-expressions involves predicting the emotional timeline within a video by identifying the onset (i.e., the beginning), apex (the peak of emotion), and offset (the end of emotion) frames of the displayed emotions. More particularly, closely monitoring the key emotion-conveying regions of the face; namely, the foundational muscle-movement cues known as facial action units (AUs)–greatly aids in the clear identification of micro-expressions. One major roadblock is the inadvertent introduction of biases into the training process, which degrades performance regardless of feature quality. Biases are spurious factors that falsely inflate or deflate performance metrics. For instance, the neural networks tend to falsely attribute certain AUs in specific facial regions to particular emotion classes, a phenomenon also termed as Inductive biases. To remove these false attributions, we must identify and mitigate biases that arise from mere correlation between some features and the output class labels. We hence introduce action-unit causal graphs. Unlike the traditional action-unit graph, which connects AUs based solely on spatial adjacency, the causal AU graph is derived from statistical tests and retains edges between AUs only when there is significant evidence that one AU causally influences another. Our model, named <span>Causal-Ex</span> (<strong>Causal</strong>-based <strong>Ex</strong>pression spotting), employs a fast causal inference algorithm to construct a causal graph of facial region of interests (ROIs). This enables us to select causally relevant facial action units in the ROIs. Our work demonstrates improvement in overall F1-scores compared to state-of-the-art approaches with 0.388 on CAS(ME)<sup>2</sup> and 0.3701 on SAMM-Long Video datasets. Our code can be found at: <span><span>https://github.com/noobasuna/causal_ex.git</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"200 ","pages":"Pages 52-59"},"PeriodicalIF":3.3,"publicationDate":"2025-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145749054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-02 | DOI: 10.1016/j.patrec.2025.11.044
Bradley Hurst , Nicola Bellotto , Petra Bosilj
Semi-supervised teacher-student pseudo-labelling improves instance segmentation by exploiting unlabelled data: a teacher network, trained on a small annotated dataset, generates pseudo-labels for the remaining data to train the student model. However, mask selection typically relies heavily on class confidence scores. In single-class settings these scores saturate, offering little discrimination between masks. In this work we propose a mask symmetry score that evaluates logits from the mask prediction head, enabling more reliable pseudo-label selection without architectural changes. Evaluations on both CNN- and Transformer-based models show our method outperforms state-of-the-art approaches on a real-world agri-robotic dataset of densely clustered potato tubers.
{"title":"Improving pseudo-labelling for semi-supervised single-class instance segmentation via mask symmetry scoring","authors":"Bradley Hurst , Nicola Bellotto , Petra Bosilj","doi":"10.1016/j.patrec.2025.11.044","DOIUrl":"10.1016/j.patrec.2025.11.044","url":null,"abstract":"<div><div>Semi-supervised teacher-student pseudo-labelling improves instance segmentation by exploiting unlabelled data, where a teacher network, trained with a small annotated dataset, generates pseudo labels for the remaining data, to train the student model. However, mask selection typically relies heavily on the class confidence scores. In single-class settings these scores saturate, offering little discrimination between masks. In this work we propose a mask symmetry score that evaluates logits from the mask prediction head, enabling more reliable pseudo-label selection without architectural changes. Evaluations on both CNN- and Transformer-based models show our method outperforms state-of-the-art approaches on a real-world agri-robotic dataset of densely clustered potato tubers.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"200 ","pages":"Pages 60-66"},"PeriodicalIF":3.3,"publicationDate":"2025-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145749053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-11-30 | DOI: 10.1016/j.patrec.2025.11.043
Bo Huang , Yiwei Lu , Changsheng Yin , Ruopeng Yang , Yu Tao , Yongqi Shi , Shijie Wang , Qian Zhao
With the rapid development of artificial intelligence technology, deep learning has been widely applied to the semantic segmentation of remote sensing images. Current methods mainly employ architectures based on convolutional neural networks and Transformer networks, achieving good performance in segmentation tasks. However, existing approaches do not optimize segmentation for diverse terrain characteristics, which limits segmentation accuracy in complex scenes. To address this, we propose a novel network called DBASNet, which consists of two decoding branches: road topology and terrain classification. The former focuses on the integrity of the topological structure of roads, while the latter emphasizes the accuracy of segmenting the other terrain classes. Experiments demonstrate that DBASNet achieves state-of-the-art semantic segmentation results by balancing terrain segmentation accuracy with road connectivity on the LoveDA and LandCover.ai datasets.
{"title":"DBASNet: A double-branch adaptive segmentation network for remote sensing image","authors":"Bo Huang , Yiwei Lu , Changsheng Yin , Ruopeng Yang , Yu Tao , Yongqi Shi , Shijie Wang , Qian Zhao","doi":"10.1016/j.patrec.2025.11.043","DOIUrl":"10.1016/j.patrec.2025.11.043","url":null,"abstract":"<div><div>With the rapid development of artificial intelligence technology, deep learning has been widely applied in the semantic segmentation of remote sensing images. Current methods for remote sensing semantic segmentation mainly employ architectures based on convolutional neural networks and Transformer networks, achieving good performance in segmentation tasks. However, existing approaches fail to optimize segmentation for diverse terrain characteristics, leading to limitations in segmentation accuracy in complex scenes. To address this, we propose a novel network called DBASNet, which consists of two decoding branches: road topology and terrain classification. The former focuses on the integrity of the topological structure of road terrains, while the latter emphasizes the accuracy of other terrain segmentations. Experiments demonstrate that DBASNet achieves state-of-the-art semantic segmentation results by balancing terrain segmentation accuracy with road connectivity on the LoveDA and LandCover.ai datasets.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"201 ","pages":"Pages 9-14"},"PeriodicalIF":3.3,"publicationDate":"2025-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145886217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}