
Pattern Recognition Letters: Latest Articles

Enhanced facial expression manipulation through domain-aware transformation and dual-level classification with expression awarness loss in the CLIP space
IF 3.3 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-19 | DOI: 10.1016/j.patrec.2025.11.045
Qi Guo, Xiaodong Gu
Accurate facial expression manipulation, particularly transforming complex, non-neutral expressions into specific target states, remains challenging due to substantial disparities among expression domains. Existing methods often struggle with such domain shifts, leading to suboptimal editing results. To address these challenges, we propose a novel framework called Domain-Aware Expression Transformation with Dual-Level Label Information Classifier (DAET-DLIC). The DAET-DLIC architecture consists of two major modules. The Domain-Aware Expression Transformation module enhances domain awareness by processing latent codes to model expression-domain distributions. The Dual-Level Label Information Classifier performs classification at both the latent and image levels to ensure comprehensive and reliable label supervision. Furthermore, the Expression Awareness Loss Function provides precise control over the directionality of expression transformations, effectively reducing the risk of expression semantic drift in the CLIP (Contrastive Language-Image Pretraining) space. We validate our method through extensive quantitative and qualitative experiments on the Radboud Faces Database and CelebA-HQ datasets and introduce a comprehensive quantitative metric to assess manipulation efficacy.
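The abstract does not spell out the exact form of the Expression Awareness Loss, but a common way to constrain edit directionality in CLIP space is a directional cosine loss between the image-edit direction and a text-prompt direction. The sketch below illustrates only that general idea; the function name, prompts, and use of the open-source openai/CLIP package are assumptions, not the authors' implementation, and the image tensors are assumed to be CLIP-preprocessed.

```python
# Hypothetical sketch of a CLIP-space directional "expression awareness" constraint.
# Assumes the open-source `clip` package (openai/CLIP); names are illustrative.
import torch
import torch.nn.functional as F
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)

def expression_direction_loss(src_img, edited_img, src_text, tgt_text):
    """src_img / edited_img: CLIP-preprocessed tensors of shape (N, 3, 224, 224).

    Encourages the image edit direction to follow the text direction in CLIP space."""
    with torch.no_grad():
        t_src = clip_model.encode_text(clip.tokenize([src_text]).to(device))
        t_tgt = clip_model.encode_text(clip.tokenize([tgt_text]).to(device))
    i_src = clip_model.encode_image(src_img)      # (N, 512) for ViT-B/32
    i_edit = clip_model.encode_image(edited_img)

    img_dir = F.normalize(i_edit - i_src, dim=-1)  # how the image actually moved
    txt_dir = F.normalize(t_tgt - t_src, dim=-1)   # how the expression should move
    # 1 - cosine similarity: zero when the edit follows the target expression direction.
    return 1.0 - F.cosine_similarity(img_dir, txt_dir).mean()
```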
{"title":"Enhanced facial expression manipulation through domain-aware transformation and dual-level classification with expression awarness loss in the CLIP space","authors":"Qi Guo,&nbsp;Xiaodong Gu","doi":"10.1016/j.patrec.2025.11.045","DOIUrl":"10.1016/j.patrec.2025.11.045","url":null,"abstract":"<div><div>Accurate facial expression manipulation, particularly transforming complex, non-neutral expressions into specific target states, remains challenging due to substantial disparities among expression domains. Existing methods often struggle with such domain shifts, leading to suboptimal editing results. To address these challenges, we propose a novel framework called Domain-Aware Expression Transformation with Dual-Level Label Information Classifier (DAET-DLIC). The DAET-DLIC architecture consists of two major modules. The Domain-Aware Expression Transformation module enhances domain awareness by processing latent codes to model expression-domain distributions. The Dual-Level Label Information Classifier performs classification at both the latent and image levels to ensure comprehensive and reliable label supervision. Furthermore, the Expression Awareness Loss Function provides precise control over the directionality of expression transformations, effectively reducing the risk of expression semantic drift in the CLIP (Contrastive Language-Image Pretraining) space. We validate our method through extensive quantitative and qualitative experiments on the Radboud Faces Database and CelebA-HQ datasets and introduce a comprehensive quantitative metric to assess manipulation efficacy.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"200 ","pages":"Pages 102-107"},"PeriodicalIF":3.3,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145840465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
An illumination-robust feature decomposition approach for low-light crowd counting
IF 3.3 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-16 | DOI: 10.1016/j.patrec.2025.12.005
Jian Cheng, Chen Feng, Yang Xiao, Zhiguo Cao
Crowd counting is widely studied, yet its reliability in low-light environments remains underexplored. Standard counters perform poorly on such scenes because of degraded image quality; applying image-enhancement pre-processing yields limited improvement; and introducing additional thermal inputs increases cost. This study presents an approach that requires only annotated normal-light RGB data. To learn illumination-robust representations, we construct normal- and low-light image pairs and decompose their features into common and unique components. The common components preserve the shared, and therefore illumination-robust, information, and are optimized for density map prediction. We also introduce a dataset for evaluating crowd counting performance in low-light conditions. Experiments show that our approach consistently improves performance on multiple baseline architectures with negligible computational overhead. The source code and dataset will be made publicly available upon acceptance at https://github.com/hustaia/Feature_Decomposition_Counting.
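As a rough illustration of the common/unique split described above, the PyTorch sketch below (all module and loss names are assumptions, not the released code) pairs features of a normal-light image and its low-light counterpart, pulls their common components together, and decorrelates common from unique components; the common features would then drive a density-map head.

```python
# Illustrative sketch of decomposing paired features into common (illumination-robust)
# and unique components; module names and losses are assumptions, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureDecomposer(nn.Module):
    def __init__(self, channels=256):
        super().__init__()
        self.common_head = nn.Conv2d(channels, channels, kernel_size=1)
        self.unique_head = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, feat):
        return self.common_head(feat), self.unique_head(feat)

def decomposition_losses(decomposer, feat_normal, feat_low):
    c_n, u_n = decomposer(feat_normal)   # features of the normal-light image
    c_l, u_l = decomposer(feat_low)      # features of its low-light counterpart
    # Common parts of a pair should agree: they carry the shared, illumination-robust content.
    l_common = F.mse_loss(c_n, c_l)
    # Common and unique parts should stay decorrelated (cosine similarity pushed toward zero).
    l_orth = F.cosine_similarity(c_n.flatten(1), u_n.flatten(1)).abs().mean() + \
             F.cosine_similarity(c_l.flatten(1), u_l.flatten(1)).abs().mean()
    return l_common, l_orth, c_n  # c_n would feed the density-map prediction head
```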
{"title":"An illumination-robust feature decomposition approach for low-light crowd counting","authors":"Jian Cheng,&nbsp;Chen Feng,&nbsp;Yang Xiao,&nbsp;Zhiguo Cao","doi":"10.1016/j.patrec.2025.12.005","DOIUrl":"10.1016/j.patrec.2025.12.005","url":null,"abstract":"<div><div>Crowd counting is widely studied, yet its reliability in low-light environments remains underexplored. Regular counters fail to perform well due to poor image quality; applying image enhancement pre-processing yields limited improvement; and introducing additional thermal inputs increases cost. This study presents an approach that only requires annotated normal-light RGB data. To learn illumination-robust representations, we construct normal- and low-light image pairs and decompose their features into common and unique components. The common components preserve shared thus illumination-robust information, so they are optimized for density map prediction. We also introduce a dataset for evaluating crowd counting performance in low-light conditions. Experiments show that our approach consistently improves performance on multiple baseline architectures with negligible computational overhead. The source code and dataset will be made publicly available upon acceptance at <span><span>https://github.com/hustaia/Feature_Decomposition_Counting</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"200 ","pages":"Pages 108-114"},"PeriodicalIF":3.3,"publicationDate":"2025-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145840466","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Psychology-informed safety attributes recognition in dense crowds
IF 3.3 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-16 | DOI: 10.1016/j.patrec.2025.12.006
Jiaqi Yu, Yanshan Zhou, Renjie Pan, Cunyan Li, Hua Yang
Understanding dense crowd scenes requires analyzing multiple spatial and behavioral attributes. However, existing attributes often fall short of identifying potential safety risks such as panic. To address this, we propose two safety-aware crowd attributes: Crowd Motion Stability (CMS) and Individual Comfort Distance (ICD). CMS characterizes macro-level motion coordination based on the spatial-temporal consistency of crowd movement. In contrast, ICD is grounded in social psychology and captures individuals’ preferred interpersonal distance under varying densities. To accurately recognize these attributes, we propose a Psychology-Guided Safety-Aware Network (PGSAN), which integrates the Spatial-Temporal Consistency Network (STCN) and the Spatial Distance Network (SDN). Specifically, STCN is constructed based on behavioral coherence theory to measure CMS. Meanwhile, SDN models ICD by integrating dynamic crowd states and dual perceptual mechanisms (intuitive and analytical) in psychology, enabling adaptive comfort distance extraction. Features from both sub-networks are fused to support attribute recognition across diverse video scenes. Experimental results demonstrate the proposed method’s superior performance in recognizing safety attributes in dense crowds.
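Crowd Motion Stability is defined above only conceptually. One simple, assumed way to quantify spatial-temporal motion coordination from dense optical flow is sketched below; the paper's actual CMS formulation may differ, and the function name is illustrative.

```python
# Illustrative estimate of macro-level motion coordination from dense optical flow.
import numpy as np

def motion_stability(flow_seq, eps=1e-6):
    """flow_seq: (T, H, W, 2) dense optical flow over T consecutive frame pairs.

    Returns a score in [0, 1]: high when neighbouring frames and pixels move coherently."""
    mags = np.linalg.norm(flow_seq, axis=-1, keepdims=True)
    dirs = flow_seq / (mags + eps)                      # unit flow directions
    # Temporal consistency: cosine similarity of directions between consecutive frames.
    temporal = np.mean(np.sum(dirs[1:] * dirs[:-1], axis=-1))
    # Spatial consistency: agreement of each frame's directions with its own mean direction.
    mean_dir = dirs.mean(axis=(1, 2), keepdims=True)
    mean_dir /= (np.linalg.norm(mean_dir, axis=-1, keepdims=True) + eps)
    spatial = np.mean(np.sum(dirs * mean_dir, axis=-1))
    # Map each term from [-1, 1] to [0, 1] and weight them equally.
    return 0.5 * ((temporal + 1) / 2 + (spatial + 1) / 2)
```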
{"title":"Psychology-informed safety attributes recognition in dense crowds","authors":"Jiaqi Yu,&nbsp;Yanshan Zhou,&nbsp;Renjie Pan,&nbsp;Cunyan Li,&nbsp;Hua Yang","doi":"10.1016/j.patrec.2025.12.006","DOIUrl":"10.1016/j.patrec.2025.12.006","url":null,"abstract":"<div><div>Understanding dense crowd scenes requires analyzing multiple spatial and behavioral attributes. However, existing attributes often fall short of identifying potential safety risks such as panic. To address this, we propose two safety-aware crowd attributes: Crowd Motion Stability (CMS) and Individual Comfort Distance (ICD). CMS characterizes macro-level motion coordination based on the spatial-temporal consistency of crowd movement. In contrast, ICD is grounded in social psychology and captures individuals’ preferred interpersonal distance under varying densities. To accurately recognize these attributes, we propose a Psychology-Guided Safety-Aware Network (PGSAN), which integrates the Spatial-Temporal Consistency Network (STCN) and the Spatial Distance Network (SDN). Specifically, STCN is constructed based on behavioral coherence theory to measure CMS. Meanwhile, SDN models ICD by integrating dynamic crowd states and dual perceptual mechanisms (intuitive and analytical) in psychology, enabling adaptive comfort distance extraction. Features from both sub-networks are fused to support attribute recognition across diverse video scenes. Experimental results demonstrate the proposed method’s superior performance in recognizing safety attributes in dense crowds.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"200 ","pages":"Pages 88-94"},"PeriodicalIF":3.3,"publicationDate":"2025-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145797789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Adaptive recursive channel selection for robust decoding of motor imagery EEG signal in patients with intracerebral hemorrhage
IF 3.3 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-16 | DOI: 10.1016/j.patrec.2025.12.004
Shengjie Li, Jian Shi, Danyang Chen, Zheng Zhu, Feng Hu, Wei Jiang, Kai Shu, Zheng You, Ping Zhang, Zhouping Tang
In the study of electroencephalography (EEG)-based motor imagery (MI) brain-computer interfaces (BCIs), neurorehabilitation technologies hold significant potential for recovery from intracerebral hemorrhage (ICH). However, the clinical practicality of such systems is considerably reduced by the lengthy setup procedures caused by an excessive number of channels, which hinders the rehabilitation process. Accordingly, this study proposes a channel selection method based on an adaptive recursive learning framework, which establishes a comprehensive evaluation metric by combining time-frequency domain features. Experimental results demonstrate that, using 37.50% fewer channels, the average accuracy of MI classification increased from 65.44% to 69.28% in healthy subjects and from 65.00% to 67.64% in patients with ICH. This study presents the pioneering EEG-based MI BCI channel selection process specifically designed for ICH patients, paving the way for personalized rehabilitation protocols and facilitating the translation of neurotechnology into clinical practice.
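The abstract describes the selection loop only at a high level. A hedged sketch of a recursive backward-elimination procedure driven by a combined time-frequency score follows; the scoring function, frequency band, keep ratio (62.5% retained, matching the reported 37.5% reduction), and stopping rule are illustrative assumptions, and `evaluate` stands in for whatever cross-validated classifier the pipeline uses.

```python
# Sketch of recursive backward channel elimination driven by a combined
# time-frequency score; the paper's exact metric and classifier are not reproduced here.
import numpy as np
from scipy.signal import welch

def channel_score(epochs, ch, fs=250, band=(8, 30)):
    """Combined score for one channel: mu/beta band power (frequency) x signal variance (time)."""
    x = epochs[:, ch, :]                              # (n_trials, n_samples)
    freqs, psd = welch(x, fs=fs, axis=-1)
    band_mask = (freqs >= band[0]) & (freqs <= band[1])
    return psd[:, band_mask].mean() * x.var()

def recursive_channel_selection(epochs, evaluate, keep_ratio=0.625):
    """Drop the lowest-scoring channel while cross-validated accuracy does not degrade."""
    channels = list(range(epochs.shape[1]))
    target = max(1, int(round(keep_ratio * len(channels))))
    best_acc = evaluate(channels)                     # user-supplied CV accuracy on a channel subset
    while len(channels) > target:
        scores = {ch: channel_score(epochs, ch) for ch in channels}
        worst = min(scores, key=scores.get)
        candidate = [ch for ch in channels if ch != worst]
        acc = evaluate(candidate)
        if acc + 1e-3 < best_acc:                     # stop if removing the channel clearly hurts
            break
        channels, best_acc = candidate, max(best_acc, acc)
    return channels
```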
{"title":"Adaptive recursive channel selection for robust decoding of motor imagery EEG signal in patients with intracerebral hemorrhage","authors":"Shengjie Li ,&nbsp;Jian Shi ,&nbsp;Danyang Chen ,&nbsp;Zheng Zhu ,&nbsp;Feng Hu ,&nbsp;Wei Jiang ,&nbsp;Kai Shu ,&nbsp;Zheng You ,&nbsp;Ping Zhang ,&nbsp;Zhouping Tang","doi":"10.1016/j.patrec.2025.12.004","DOIUrl":"10.1016/j.patrec.2025.12.004","url":null,"abstract":"<div><div>In the study of electroencephalography (EEG)-based motor imagery (MI) brain-computer interfaces (BCIs), neurorehabilitation technologies hold significant potential for recovering from intracerebral hemorrhage (ICH). However, the rehabilitation process is hindered as the clinical practicality of such systems is reduced considerably due to their lengthy setup procedures caused by excessive number of channels. Accordingly, this study proposes a channel selection method based on an adaptive recursive learning framework, which establishes a comprehensive evaluation metric by combining time-frequency domain features. Experimental results demonstrate that, upon using 37.50 % fewer channels, the average accuracy of MI classification increased from 65.44 % to 69.28 % in healthy subjects and from 65.00 % to 67.64 % in patients with ICH. This study presents the pioneering EEG-based MI BCI channel selection process specifically designed for ICH patients, paving the way for personalized rehabilitation protocols and facilitating the translation of neurotechnology into clinical practice.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"200 ","pages":"Pages 95-101"},"PeriodicalIF":3.3,"publicationDate":"2025-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145840467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
E2GenF: Universal AIGC image detection based on edge enhanced generalizable features
IF 3.3 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-09 | DOI: 10.1016/j.patrec.2025.12.001
Jian Zou, Jun Wang, Kezhong Lu, Yingxin Lai, Kaiwen Luo, Zitong Yu
Generative models, such as GANs and diffusion models, have achieved remarkable advancements in Artificial Intelligence Generated Content (AIGC), creating images that are nearly indistinguishable from real ones. However, existing detection methods often face challenges in identifying images generated by unseen models and exhibit limited generalization across different domains. In this paper, we aim to improve the generalization capacity of AIGC image detectors by leveraging artifact features exposed during the upsampling process. Specifically, we reexamine the upsampling operations employed by generative models and observe that, in high-frequency regions of an image (e.g., edge areas with significant pixel intensity differences), generative models often struggle to accurately replicate the pixel distributions of real images, thereby leaving behind unavoidable artifact information. Based on this observation, we propose to utilize edge detection operators to enrich edge-aware detailed clues, enabling the model to focus on these critical features. Furthermore, we design a module that combines upsampling and downsampling to analyze pixel correlation changes introduced by interpolation artifacts. The integrated approach effectively enhances the detection of subtle generative traces, thereby improving generalization across diverse generative models. Extensive experiments on three benchmark datasets demonstrate the superior performance of the proposed approach against previous state-of-the-art methods under cross-domain testing scenarios. The code is available at https://github.com/zj56/EdgeEnhanced-DeepfakeDetection.
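As a minimal illustration of the edge-aware idea (not the paper's specific operators or fusion module), the sketch below appends a Sobel gradient-magnitude channel to the input so that a detector can attend to high-frequency regions where upsampling artifacts tend to appear; the function name is an assumption.

```python
# Illustrative edge-enhancement step: Sobel gradient magnitude appended as an extra channel.
import torch
import torch.nn.functional as F

def edge_enhanced_input(images):
    """images: (N, 3, H, W) in [0, 1]. Returns (N, 4, H, W) with a gradient-magnitude channel."""
    gray = images.mean(dim=1, keepdim=True)                       # simple luminance proxy
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=images.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)                                       # Sobel kernel for the y direction
    gx = F.conv2d(gray, kx, padding=1)
    gy = F.conv2d(gray, ky, padding=1)
    edges = torch.sqrt(gx ** 2 + gy ** 2 + 1e-12)                 # high-frequency (edge) regions
    return torch.cat([images, edges], dim=1)                      # fed to the downstream detector
```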
{"title":"E2GenF: Universal AIGC image detection based on edge enhanced generalizable features","authors":"Jian Zou ,&nbsp;Jun Wang ,&nbsp;Kezhong Lu ,&nbsp;Yingxin Lai ,&nbsp;Kaiwen Luo ,&nbsp;Zitong Yu","doi":"10.1016/j.patrec.2025.12.001","DOIUrl":"10.1016/j.patrec.2025.12.001","url":null,"abstract":"<div><div>Generative models, such as GANs and Diffusion models, have achieved remarkable advancements in Artificial Intelligence Generated Content (AIGC), creating images that are nearly indistinguishable from real ones. However, existing detection methods often face challenges in identifying images generated by unseen models and exhibit limited generalization across different domains. In this paper, our aim is to improve the generalization capacity of AIGC image detectors by leveraging artifact features exposed during the upsampling process. Specifically, we reexamine the upsampling operations employed by generative models and observe that, in high-frequency regions of an image (e.g., edge areas with significant pixel intensity differences), generative models often struggle to accurately replicate the pixel distributions of real images, thereby leaving behind unavoidable artifact information. Based on this observation, we propose to utilize edge detection operators to enrich edge-aware detailed clues, enabling the model to focus on these critical features. Furthermore, We designed a module that combines upsampling and downsampling to analyze pixel correlation changes introduced by interpolation artifacts. The integrated approach effectively enhances the detection of subtle generative traces, thereby improving generalization across diverse generative models. Extensive experiments on three benchmark datasets demonstrate the superior performance of the proposed approach against previous state-of-the-art methods under cross-domain testing scenarios. The code is available at <span><span>https://github.com/zj56/EdgeEnhanced-DeepfakeDetection</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"200 ","pages":"Pages 74-80"},"PeriodicalIF":3.3,"publicationDate":"2025-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145797788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Quantized DiT with hadamard transformation: A technical report
IF 3.3 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-07 | DOI: 10.1016/j.patrec.2025.12.003
Yue Liu, Wenxi Yang, Jianbin Jiao
Diffusion Transformers (DiTs) combine the scalability of transformers with the fidelity of diffusion models, achieving state-of-the-art image generation performance. However, their high computational cost hinders efficient deployment. Post-Training Quantization (PTQ) offers a remedy, yet existing methods struggle with the temporal and spatial dynamics of DiTs. We propose a simplified PTQ framework that combines computationally efficient rotation and randomness for stable and effective DiT quantization. By replacing block-wise rotations with Hadamard transforms and zigzag permutations with random permutations, our method preserves the decorrelation effect while greatly reducing computational overhead. Experiments show that our approach maintains near full-precision performance at 8-bit and 6-bit precision levels. This work demonstrates that lightweight PTQ with structured randomness can effectively balance efficiency and fidelity, enabling practical deployment of DiTs in resource-constrained environments.
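A minimal sketch of the rotation-plus-randomness idea is shown below: an orthonormal Hadamard transform followed by a random column permutation is applied to a weight matrix before uniform quantization and is undone (conceptually folded into the next layer) afterwards. Scale handling, activation quantization, and the exact placement inside a DiT block are omitted and assumed.

```python
# Minimal sketch of Hadamard rotation + random permutation before uniform weight quantization.
import numpy as np
from scipy.linalg import hadamard

def quantize_with_hadamard(W, bits=8, seed=0):
    """W: (out, d) weight matrix with d a power of two. Returns dequantized rotated weights."""
    d = W.shape[1]
    H = hadamard(d) / np.sqrt(d)                 # orthonormal Hadamard matrix
    perm = np.random.default_rng(seed).permutation(d)
    R = H[:, perm]                               # rotation followed by a random column permutation
    W_rot = W @ R                                # decorrelate channels before quantization
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(W_rot).max() / qmax
    W_q = np.clip(np.round(W_rot / scale), -qmax - 1, qmax)
    # R is orthogonal, so the inverse rotation R.T can be folded into the adjacent layer.
    return (W_q * scale) @ R.T
```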
{"title":"Quantized DiT with hadamard transformation: A technical report","authors":"Yue Liu,&nbsp;Wenxi Yang,&nbsp;Jianbin Jiao","doi":"10.1016/j.patrec.2025.12.003","DOIUrl":"10.1016/j.patrec.2025.12.003","url":null,"abstract":"<div><div>Diffusion Transformers (DiTs) combine the scalability of transformers with the fidelity of diffusion models, achieving state-of-the-art image generation performance. However, their high computational cost hinders efficient deployment. Post-Training Quantization (PTQ) offers a remedy, yet existing methods struggle with the temporal and spatial dynamics of DiTs. We propose a simplified PTQ framework-combining computationally efficient rotation and randomness-for stable and effective DiT quantization. By replacing block-wise rotations with Hadamard transforms and zigzag permutations with random permutations, our method preserves the decorrelation effect while greatly reducing computational overhead. Experiments show that our approach maintains near full-precision performance at 8-bit and 6-bit precision levels. This work demonstrates that lightweight PTQ with structured randomness can effectively balance efficiency and fidelity, enabling practical deployment of DiTs in resource-constrained environments.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"200 ","pages":"Pages 81-87"},"PeriodicalIF":3.3,"publicationDate":"2025-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145797787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
DECFusion: A lightweight decomposition fusion method for luminance artifact removal in infrared and visible images
IF 3.3 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-06 | DOI: 10.1016/j.patrec.2025.11.034
Quanquan Xiao, Haiyan Jin, Haonan Su, Yuanlin Zhang
Infrared and visible image fusion is a current research hotspot in the field of multimodal image fusion, aiming to improve the perception and understanding of a scene through effective fusion. However, current deep learning-based fusion methods often fail to fully account for the difference between the luminance of visible images and the thermal information of infrared images, resulting in brightness artifacts that seriously degrade the visual quality of the generated fused images. To solve this problem, we propose a lightweight infrared and visible image decomposition fusion method (DECFusion). The method decomposes the luminance information of the visible image and the thermal information of the infrared image into illumination and reflection components through a learnable lightweight network, and adaptively adjusts the illumination component to remove unnecessary luminance interference. In the reconstruction stage, we apply Retinex theory to reconstruct the image. Experiments show that our method not only avoids luminance artifacts but is also more lightweight, and it outperforms current state-of-the-art infrared and visible image fusion methods in terms of the visual quality of the fused images. Our code is available at https://github.com/tianzhiya/DECFusion.
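As a toy illustration of the Retinex-style split (not the DECFusion architecture), the sketch below uses a tiny network to predict reflectance and illumination maps and supervises them with a reconstruction constraint; layer sizes and names are assumptions.

```python
# Illustrative Retinex-style decomposition: a small network predicts a reflectance map and a
# one-channel illumination map, and the input is reconstructed as their element-wise product.
import torch
import torch.nn as nn

class TinyDecomposer(nn.Module):
    def __init__(self, in_ch=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, in_ch + 1, 3, padding=1),   # reflectance channels + 1 illumination channel
        )

    def forward(self, x):
        out = self.body(x)
        reflectance = torch.sigmoid(out[:, :-1])
        illumination = torch.sigmoid(out[:, -1:])
        return reflectance, illumination

# Retinex reconstruction: I = R * L (approximately), so a reconstruction loss supervises the split.
decomposer = TinyDecomposer(in_ch=1)
visible_y = torch.rand(2, 1, 64, 64)              # luminance channel of a visible image
R, L = decomposer(visible_y)
recon_loss = torch.mean((R * L - visible_y) ** 2)
```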
{"title":"DECFusion: A lightweight decomposition fusion method for luminance artifact removal in infrared and visible images","authors":"Quanquan Xiao ,&nbsp;Haiyan Jin ,&nbsp;Haonan Su ,&nbsp;Yuanlin Zhang","doi":"10.1016/j.patrec.2025.11.034","DOIUrl":"10.1016/j.patrec.2025.11.034","url":null,"abstract":"<div><div>Infrared and visible image fusion is a current research hotspot in the field of multimodal image fusion, which aims to improve the perception and understanding of the scene through effective fusion. However, current deep learning-based fusion methods often fail to fully consider the difference between visible light brightness and thermal information of infrared images, resulting in brightness artifacts in the generated fused images, which seriously affects the visual effect of the fused images. To solve this problem, we propose a lightweight infrared and visible image decomposition fusion method (DECFusion). The method decomposes the luminance information of the visible image and the thermal information of the infrared image into illumination and reflection components through a learnable lightweight network, and adaptively adjusts the illumination component to remove unnecessary luminance interference. In the reconstruction stage, we combine the Retinex theory to reconstruct the image. Experiments show that the fused images generated by our method not only avoid the generation of luminance artifacts, but also are more lightweight and outperform the current state-of-the-art infrared and visible image fusion methods in terms of the visual quality of the fused images. Our code is available at <span><span>https://github.com/tianzhiya/DECFusion</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"200 ","pages":"Pages 67-73"},"PeriodicalIF":3.3,"publicationDate":"2025-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145749050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Causal-Ex: Causal graph-based micro and macro expression spotting
IF 3.3 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-05 | DOI: 10.1016/j.patrec.2025.12.002
Pei-Sze Tan, Sailaja Rajanala, Arghya Pal, Raphaël C.-W. Phan, Huey-Fang Ong
Detecting concealed emotions within apparently normal expressions is crucial for identifying potential mental health issues and facilitating timely support and intervention. The task of spotting macro- and micro-expressions involves predicting the emotional timeline within a video by identifying the onset (the beginning), apex (the peak of emotion), and offset (the end of emotion) frames of the displayed emotions. In particular, closely monitoring the key emotion-conveying regions of the face, namely the foundational muscle-movement cues known as facial action units (AUs), greatly aids the clear identification of micro-expressions. One major roadblock is the inadvertent introduction of biases into the training process, which degrades performance regardless of feature quality. Biases are spurious factors that falsely inflate or deflate performance metrics. For instance, neural networks tend to falsely attribute certain AUs in specific facial regions to particular emotion classes, a phenomenon also termed inductive bias. To remove these false attributions, we must identify and mitigate biases that arise from mere correlation between some features and the output class labels. We hence introduce action-unit causal graphs. Unlike the traditional action-unit graph, which connects AUs based solely on spatial adjacency, the causal AU graph is derived from statistical tests and retains edges between AUs only when there is significant evidence that one AU causally influences another. Our model, named Causal-Ex (Causal-based Expression spotting), employs a fast causal inference algorithm to construct a causal graph over facial regions of interest (ROIs), enabling us to select causally relevant facial action units within the ROIs. Our work demonstrates improvement in overall F1-scores compared to state-of-the-art approaches, with 0.388 on the CAS(ME)² and 0.3701 on the SAMM-Long Video datasets. Our code can be found at https://github.com/noobasuna/causal_ex.git.
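The paper uses a fast causal inference algorithm; as a simplified, assumed stand-in for the graph-construction step, the sketch below keeps an AU-AU edge only when a partial-correlation conditional-independence test finds the two action units dependent given all the others. Function names and the linear-Gaussian test are illustrative, not the authors' procedure.

```python
# Simplified stand-in for statistically pruned AU edges via conditional-independence tests.
import numpy as np
from scipy import stats

def partial_corr_pvalue(X, i, j):
    """p-value for dependence of AU i and AU j given all remaining AUs (linear-Gaussian test)."""
    n, d = X.shape
    others = [k for k in range(d) if k not in (i, j)]
    Z = np.column_stack([X[:, others], np.ones(n)])
    ri = X[:, i] - Z @ np.linalg.lstsq(Z, X[:, i], rcond=None)[0]   # residualize AU i on the rest
    rj = X[:, j] - Z @ np.linalg.lstsq(Z, X[:, j], rcond=None)[0]
    r, _ = stats.pearsonr(ri, rj)
    r = float(np.clip(r, -0.999999, 0.999999))
    # Fisher z-test on the partial correlation.
    z = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(max(n - len(others) - 3, 1))
    return 2 * (1 - stats.norm.cdf(abs(z)))

def causal_au_edges(au_activations, alpha=0.05):
    """au_activations: (n_frames, n_AUs). Returns undirected AU pairs passing the CI test."""
    d = au_activations.shape[1]
    return [(i, j) for i in range(d) for j in range(i + 1, d)
            if partial_corr_pvalue(au_activations, i, j) < alpha]
```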
{"title":"Causal-Ex: Causal graph-based micro and macro expression spotting","authors":"Pei-Sze Tan,&nbsp;Sailaja Rajanala,&nbsp;Arghya Pal,&nbsp;Raphaël C.-W. Phan,&nbsp;Huey-Fang Ong","doi":"10.1016/j.patrec.2025.12.002","DOIUrl":"10.1016/j.patrec.2025.12.002","url":null,"abstract":"<div><div>Detecting concealed emotions within apparently normal expressions is crucial for identifying potential mental health issues and facilitating timely support and intervention. The task of spotting macro- and micro-expressions involves predicting the emotional timeline within a video by identifying the onset (i.e., the beginning), apex (the peak of emotion), and offset (the end of emotion) frames of the displayed emotions. More particularly, closely monitoring the key emotion-conveying regions of the face; namely, the foundational muscle-movement cues known as facial action units (AUs)–greatly aids in the clear identification of micro-expressions. One major roadblock is the inadvertent introduction of biases into the training process, which degrades performance regardless of feature quality. Biases are spurious factors that falsely inflate or deflate performance metrics. For instance, the neural networks tend to falsely attribute certain AUs in specific facial regions to particular emotion classes, a phenomenon also termed as Inductive biases. To remove these false attributions, we must identify and mitigate biases that arise from mere correlation between some features and the output class labels. We hence introduce action-unit causal graphs. Unlike the traditional action-unit graph, which connects AUs based solely on spatial adjacency, the causal AU graph is derived from statistical tests and retains edges between AUs only when there is significant evidence that one AU causally influences another. Our model, named <span>Causal-Ex</span> (<strong>Causal</strong>-based <strong>Ex</strong>pression spotting), employs a fast causal inference algorithm to construct a causal graph of facial region of interests (ROIs). This enables us to select causally relevant facial action units in the ROIs. Our work demonstrates improvement in overall F1-scores compared to state-of-the-art approaches with 0.388 on CAS(ME)<sup>2</sup> and 0.3701 on SAMM-Long Video datasets. Our code can be found at: <span><span>https://github.com/noobasuna/causal_ex.git</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"200 ","pages":"Pages 52-59"},"PeriodicalIF":3.3,"publicationDate":"2025-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145749054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Improving pseudo-labelling for semi-supervised single-class instance segmentation via mask symmetry scoring
IF 3.3 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-02 | DOI: 10.1016/j.patrec.2025.11.044
Bradley Hurst, Nicola Bellotto, Petra Bosilj
Semi-supervised teacher-student pseudo-labelling improves instance segmentation by exploiting unlabelled data: a teacher network, trained on a small annotated dataset, generates pseudo labels for the remaining data, which are then used to train the student model. However, mask selection typically relies heavily on class confidence scores. In single-class settings these scores saturate, offering little discrimination between masks. In this work we propose a mask symmetry score that evaluates logits from the mask prediction head, enabling more reliable pseudo-label selection without architectural changes. Evaluations on both CNN- and Transformer-based models show our method outperforms state-of-the-art approaches on a real-world agri-robotic dataset of densely clustered potato tubers.
{"title":"Improving pseudo-labelling for semi-supervised single-class instance segmentation via mask symmetry scoring","authors":"Bradley Hurst ,&nbsp;Nicola Bellotto ,&nbsp;Petra Bosilj","doi":"10.1016/j.patrec.2025.11.044","DOIUrl":"10.1016/j.patrec.2025.11.044","url":null,"abstract":"<div><div>Semi-supervised teacher-student pseudo-labelling improves instance segmentation by exploiting unlabelled data, where a teacher network, trained with a small annotated dataset, generates pseudo labels for the remaining data, to train the student model. However, mask selection typically relies heavily on the class confidence scores. In single-class settings these scores saturate, offering little discrimination between masks. In this work we propose a mask symmetry score that evaluates logits from the mask prediction head, enabling more reliable pseudo-label selection without architectural changes. Evaluations on both CNN- and Transformer-based models show our method outperforms state-of-the-art approaches on a real-world agri-robotic dataset of densely clustered potato tubers.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"200 ","pages":"Pages 60-66"},"PeriodicalIF":3.3,"publicationDate":"2025-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145749053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
DBASNet: A double-branch adaptive segmentation network for remote sensing image
IF 3.3 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-11-30 | DOI: 10.1016/j.patrec.2025.11.043
Bo Huang, Yiwei Lu, Changsheng Yin, Ruopeng Yang, Yu Tao, Yongqi Shi, Shijie Wang, Qian Zhao
With the rapid development of artificial intelligence technology, deep learning has been widely applied in the semantic segmentation of remote sensing images. Current methods for remote sensing semantic segmentation mainly employ architectures based on convolutional neural networks and Transformer networks, achieving good performance in segmentation tasks. However, existing approaches fail to optimize segmentation for diverse terrain characteristics, leading to limitations in segmentation accuracy in complex scenes. To address this, we propose a novel network called DBASNet, which consists of two decoding branches: road topology and terrain classification. The former focuses on the integrity of the topological structure of road terrains, while the latter emphasizes the accuracy of other terrain segmentations. Experiments demonstrate that DBASNet achieves state-of-the-art semantic segmentation results by balancing terrain segmentation accuracy with road connectivity on the LoveDA and LandCover.ai datasets.
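As a schematic of the double-branch idea (layer sizes and names are illustrative, not DBASNet's), the sketch below shares an encoder between a road-topology branch and a terrain-classification branch and fuses the road logits back into the final terrain prediction.

```python
# Schematic dual-branch segmenter: shared encoder, road-topology head, terrain head, late fusion.
import torch
import torch.nn as nn

class DualBranchSegmenter(nn.Module):
    def __init__(self, num_classes=7):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.road_branch = nn.Conv2d(64, 1, 1)               # binary road-topology logits
        self.terrain_branch = nn.Conv2d(64, num_classes, 1)  # multi-class terrain logits
        self.fuse = nn.Conv2d(num_classes + 1, num_classes, 1)

    def forward(self, x):
        feat = self.encoder(x)
        road = self.road_branch(feat)
        terrain = self.terrain_branch(feat)
        fused = self.fuse(torch.cat([terrain, road], dim=1))
        # `road` could take a connectivity-aware loss, `fused` a standard cross-entropy loss.
        return fused, road
```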
{"title":"DBASNet: A double-branch adaptive segmentation network for remote sensing image","authors":"Bo Huang ,&nbsp;Yiwei Lu ,&nbsp;Changsheng Yin ,&nbsp;Ruopeng Yang ,&nbsp;Yu Tao ,&nbsp;Yongqi Shi ,&nbsp;Shijie Wang ,&nbsp;Qian Zhao","doi":"10.1016/j.patrec.2025.11.043","DOIUrl":"10.1016/j.patrec.2025.11.043","url":null,"abstract":"<div><div>With the rapid development of artificial intelligence technology, deep learning has been widely applied in the semantic segmentation of remote sensing images. Current methods for remote sensing semantic segmentation mainly employ architectures based on convolutional neural networks and Transformer networks, achieving good performance in segmentation tasks. However, existing approaches fail to optimize segmentation for diverse terrain characteristics, leading to limitations in segmentation accuracy in complex scenes. To address this, we propose a novel network called DBASNet, which consists of two decoding branches: road topology and terrain classification. The former focuses on the integrity of the topological structure of road terrains, while the latter emphasizes the accuracy of other terrain segmentations. Experiments demonstrate that DBASNet achieves state-of-the-art semantic segmentation results by balancing terrain segmentation accuracy with road connectivity on the LoveDA and LandCover.ai datasets.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"201 ","pages":"Pages 9-14"},"PeriodicalIF":3.3,"publicationDate":"2025-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145886217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0