Enhanced facial expression manipulation through domain-aware transformation and dual-level classification with expression awareness loss in the CLIP space
Pub Date: 2026-02-01 | Epub Date: 2025-12-19 | DOI: 10.1016/j.patrec.2025.11.045
Qi Guo, Xiaodong Gu
Accurate facial expression manipulation, particularly transforming complex, non-neutral expressions into specific target states, remains challenging due to substantial disparities among expression domains. Existing methods often struggle with such domain shifts, leading to suboptimal editing results. To address these challenges, we propose a novel framework called Domain-Aware Expression Transformation with Dual-Level Label Information Classifier (DAET-DLIC). The DAET-DLIC architecture consists of two major modules. The Domain-Aware Expression Transformation module enhances domain awareness by processing latent codes to model expression-domain distributions. The Dual-Level Label Information Classifier performs classification at both the latent and image levels to ensure comprehensive and reliable label supervision. Furthermore, the Expression Awareness Loss Function provides precise control over the directionality of expression transformations, effectively reducing the risk of expression semantic drift in the CLIP (Contrastive Language-Image Pretraining) space. We validate our method through extensive quantitative and qualitative experiments on the Radboud Faces Database and CelebA-HQ datasets and introduce a comprehensive quantitative metric to assess manipulation efficacy.
{"title":"Enhanced facial expression manipulation through domain-aware transformation and dual-level classification with expression awarness loss in the CLIP space","authors":"Qi Guo, Xiaodong Gu","doi":"10.1016/j.patrec.2025.11.045","DOIUrl":"10.1016/j.patrec.2025.11.045","url":null,"abstract":"<div><div>Accurate facial expression manipulation, particularly transforming complex, non-neutral expressions into specific target states, remains challenging due to substantial disparities among expression domains. Existing methods often struggle with such domain shifts, leading to suboptimal editing results. To address these challenges, we propose a novel framework called Domain-Aware Expression Transformation with Dual-Level Label Information Classifier (DAET-DLIC). The DAET-DLIC architecture consists of two major modules. The Domain-Aware Expression Transformation module enhances domain awareness by processing latent codes to model expression-domain distributions. The Dual-Level Label Information Classifier performs classification at both the latent and image levels to ensure comprehensive and reliable label supervision. Furthermore, the Expression Awareness Loss Function provides precise control over the directionality of expression transformations, effectively reducing the risk of expression semantic drift in the CLIP (Contrastive Language-Image Pretraining) space. We validate our method through extensive quantitative and qualitative experiments on the Radboud Faces Database and CelebA-HQ datasets and introduce a comprehensive quantitative metric to assess manipulation efficacy.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"200 ","pages":"Pages 102-107"},"PeriodicalIF":3.3,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145840465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Psychology-informed safety attributes recognition in dense crowds
Pub Date: 2026-02-01 | Epub Date: 2025-12-16 | DOI: 10.1016/j.patrec.2025.12.006
Jiaqi Yu, Yanshan Zhou, Renjie Pan, Cunyan Li, Hua Yang
Understanding dense crowd scenes requires analyzing multiple spatial and behavioral attributes. However, existing attributes often fall short of identifying potential safety risks such as panic. To address this, we propose two safety-aware crowd attributes: Crowd Motion Stability (CMS) and Individual Comfort Distance (ICD). CMS characterizes macro-level motion coordination based on the spatial-temporal consistency of crowd movement. In contrast, ICD is grounded in social psychology and captures individuals’ preferred interpersonal distance under varying densities. To accurately recognize these attributes, we propose a Psychology-Guided Safety-Aware Network (PGSAN), which integrates the Spatial-Temporal Consistency Network (STCN) and the Spatial Distance Network (SDN). Specifically, STCN is constructed based on behavioral coherence theory to measure CMS. Meanwhile, SDN models ICD by integrating dynamic crowd states and dual perceptual mechanisms (intuitive and analytical) in psychology, enabling adaptive comfort distance extraction. Features from both sub-networks are fused to support attribute recognition across diverse video scenes. Experimental results demonstrate the proposed method’s superior performance in recognizing safety attributes in dense crowds.
{"title":"Psychology-informed safety attributes recognition in dense crowds","authors":"Jiaqi Yu, Yanshan Zhou, Renjie Pan, Cunyan Li, Hua Yang","doi":"10.1016/j.patrec.2025.12.006","DOIUrl":"10.1016/j.patrec.2025.12.006","url":null,"abstract":"<div><div>Understanding dense crowd scenes requires analyzing multiple spatial and behavioral attributes. However, existing attributes often fall short of identifying potential safety risks such as panic. To address this, we propose two safety-aware crowd attributes: Crowd Motion Stability (CMS) and Individual Comfort Distance (ICD). CMS characterizes macro-level motion coordination based on the spatial-temporal consistency of crowd movement. In contrast, ICD is grounded in social psychology and captures individuals’ preferred interpersonal distance under varying densities. To accurately recognize these attributes, we propose a Psychology-Guided Safety-Aware Network (PGSAN), which integrates the Spatial-Temporal Consistency Network (STCN) and the Spatial Distance Network (SDN). Specifically, STCN is constructed based on behavioral coherence theory to measure CMS. Meanwhile, SDN models ICD by integrating dynamic crowd states and dual perceptual mechanisms (intuitive and analytical) in psychology, enabling adaptive comfort distance extraction. Features from both sub-networks are fused to support attribute recognition across diverse video scenes. Experimental results demonstrate the proposed method’s superior performance in recognizing safety attributes in dense crowds.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"200 ","pages":"Pages 88-94"},"PeriodicalIF":3.3,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145797789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bounds on the Natarajan dimension of a class of linear multi-class predictors
Pub Date: 2026-02-01 | Epub Date: 2025-12-25 | DOI: 10.1016/j.patrec.2025.12.012
Yanru Pan, Benchong Li
The Natarajan dimension is a crucial metric for measuring the capacity of a learning model and analyzing the generalization ability of a classifier in multi-class classification tasks. In this paper, we present a tight upper bound on the Natarajan dimension of linear multi-class predictors based on a class-sensitive feature mapping with the multi-vector construction, and we provide the exact Natarajan dimension when the feature dimension is 2.
{"title":"Bounds on the Natarajan dimension of a class of linear multi-class predictors","authors":"Yanru Pan, Benchong Li","doi":"10.1016/j.patrec.2025.12.012","DOIUrl":"10.1016/j.patrec.2025.12.012","url":null,"abstract":"<div><div>The Natarajan dimension is a crucial metric for measuring the capacity of a learning model and analyzing generalization ability of a classifier in multi-class classification tasks. In this paper, we present a tight upper bound of Natarajan dimension for linear multi-class predictors based on class sensitive feature mapping for multi-vector construction, and provide the exact Natarajan dimension when the dimension of feature is 2.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"200 ","pages":"Pages 129-134"},"PeriodicalIF":3.3,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145884447","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cross-domain detection of AI-generated text: Integrating linguistic richness and lexical pair dispersion via deep learning
Pub Date: 2026-02-01 | Epub Date: 2025-12-25 | DOI: 10.1016/j.patrec.2025.12.010
Jingang Wang , Tong Xiao , Hui Du , Cheng Zhang , Peng Liu
Cross-domain detection of AI-generated text is a crucial task for cybersecurity. In practical scenarios, after being trained on one or multiple known text generation sources (source domain), a detection model must be capable of effectively identifying text generated by unknown and unseen sources (target domain). Current approaches suffer from limited cross-domain generalization due to insufficient structural adaptation to domain discrepancies. To address this critical limitation, we propose RiDis, a classification model that synergizes Linguistic Richness and Lexical Pair Dispersion for cross-domain AI-generated text detection. Through comprehensive statistical analysis, we establish Linguistic Richness and Lexical Pair Dispersion as discriminative indicators for distinguishing human-authored and machine-generated texts. Our architecture features two innovative components: a Semantic Coherence Extraction Module employing long-range receptive fields to capture linguistic richness through global semantic trend analysis, and a Contextual Dependency Extraction Module utilizing localized receptive fields to quantify lexical pair dispersion via fine-grained word association patterns. The framework further incorporates domain adaptation learning to enhance cross-domain detection robustness. Extensive evaluations demonstrate that our method achieves superior detection accuracy compared to state-of-the-art baselines across multiple domains, with experimental results showing significant performance improvements in cross-domain test scenarios.
{"title":"Cross-Domain detection of AI-Generated text: Integrating linguistic richness and lexical pair dispersion via deep learning","authors":"Jingang Wang , Tong Xiao , Hui Du , Cheng Zhang , Peng Liu","doi":"10.1016/j.patrec.2025.12.010","DOIUrl":"10.1016/j.patrec.2025.12.010","url":null,"abstract":"<div><div>Cross-domain detection of AI-generated text is a crucial task for cybersecurity. In practical scenarios, after being trained on one or multiple known text generation sources (source domain), a detection model must be capable of effectively identifying text generated by unknown and unseen sources (target domain). Current approaches suffer from limited cross-domain generalization due to insufficient structural adaptation to domain discrepancies. To address this critical limitation, we propose <strong>RiDis</strong>,a classification model that synergizes Linguistic <strong>Ri</strong>chness and Lexical Pair <strong>Dis</strong>persion for cross-domain AI-generated text detection. Through comprehensive statistical analysis, we establish Linguistic Richness and Lexical Pair Dispersion as discriminative indicators for distinguishing human-authored and machine-generated texts. Our architecture features two innovative components, a Semantic Coherence Extraction Module employing long-range receptive fields to capture linguistic richness through global semantic trend analysis, and a Contextual Dependency Extraction Module utilizing localized receptive fields to quantify lexical pair dispersion via fine-grained word association patterns. The framework further incorporates domain adaptation learning to enhance cross-domain detection robustness. Extensive evaluations demonstrate that our method achieves superior detection accuracy compared to state-of-the-art baselines across multiple domains, with experimental results showing significant performance improvements on cross-domain test scenarios.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"200 ","pages":"Pages 123-128"},"PeriodicalIF":3.3,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145884448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The uncertainty advantage: Enhancing large language models’ reliability through chain of uncertainty reasoning
Pub Date: 2026-02-01 | Epub Date: 2025-11-28 | DOI: 10.1016/j.patrec.2025.11.040
Zirong Peng , Xiaoming Liu , Guan Yang , Jie Liu , Xueping Peng , Yang Long
The rapid evolution of large language models (LLMs) has significantly advanced the capabilities of natural language processing (NLP), enabling a broad range of applications from text generation to complex problem-solving. However, these models often struggle with verifying the reliability of their outputs for complex tasks. Chain-of-Thought (CoT) reasoning, a technique that asks LLMs to generate step-by-step reasoning paths, attempts to address the challenge by making reasoning steps explicit, yet it falls short when assumptions of process faithfulness are unmet, leading to inaccuracies. This reveals a critical gap: the absence of a mechanism to handle inherent uncertainties in reasoning processes. To bridge this gap, we propose a novel approach, the Chain of Uncertainty Reasoning (CUR), which integrates uncertainty management into LLMs’ reasoning. CUR employs prompt-based techniques to express uncertainty effectively and leverages a structured approach to introduce uncertainty through a small number of samples. This enables the model to self-assess its uncertainty and adapt to different perspectives, thus enhancing the faithfulness of its outputs. Experimental results on the datasets of StrategyQA, HotpotQA, and FEVER demonstrate that our method significantly improves performance compared to baselines, confirming the utility of incorporating uncertainty into LLM reasoning processes. This approach offers a promising direction for enhancing the reliability and trustworthiness of LLMs’ applications in various domains. Our code is publicly available at: https://github.com/PengZirong/ChainofUncertaintyReasoning.
{"title":"The uncertainty advantage: Enhancing large language models’ reliability through chain of uncertainty reasoning","authors":"Zirong Peng , Xiaoming Liu , Guan Yang , Jie Liu , Xueping Peng , Yang Long","doi":"10.1016/j.patrec.2025.11.040","DOIUrl":"10.1016/j.patrec.2025.11.040","url":null,"abstract":"<div><div>The rapid evolution of large language models (LLMs) has significantly advanced the capabilities of natural language processing (NLP), enabling a broad range of applications from text generation to complex problem-solving. However, these models often struggle with verifying the reliability of their outputs for complex tasks. Chain-of-Thought (CoT) reasoning, a technique that asks LLMs to generate step-by-step reasoning paths, attempts to address the challenge by making reasoning steps explicit, yet it falls short when assumptions of process faithfulness are unmet, leading to inaccuracies. This reveals a critical gap: the absence of a mechanism to handle inherent uncertainties in reasoning processes. To bridge this gap, we propose a novel approach, the Chain of Uncertainty Reasoning (CUR), which integrates uncertainty management into LLMs’ reasoning. CUR employs prompt-based techniques to express uncertainty effectively and leverages a structured approach to introduce uncertainty through a small number of samples. This enables the model to self-assess its uncertainty and adapt to different perspectives, thus enhancing the faithfulness of its outputs. Experimental results on the datasets of StrategyQA, HotpotQA, and FEVER demonstrate that our method significantly improves performance compared to baselines, confirming the utility of incorporating uncertainty into LLM reasoning processes. This approach offers a promising direction for enhancing the reliability and trustworthiness of LLMs’ applications in various domains. Our code is publicly available at: <span><span>https://github.com/PengZirong/ChainofUncertaintyReasoning</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"200 ","pages":"Pages 30-36"},"PeriodicalIF":3.3,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145694689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
E2GenF: Universal AIGC image detection based on edge enhanced generalizable features
Pub Date: 2026-02-01 | Epub Date: 2025-12-09 | DOI: 10.1016/j.patrec.2025.12.001
Jian Zou , Jun Wang , Kezhong Lu , Yingxin Lai , Kaiwen Luo , Zitong Yu
Generative models, such as GANs and Diffusion models, have achieved remarkable advancements in Artificial Intelligence Generated Content (AIGC), creating images that are nearly indistinguishable from real ones. However, existing detection methods often face challenges in identifying images generated by unseen models and exhibit limited generalization across different domains. In this paper, our aim is to improve the generalization capacity of AIGC image detectors by leveraging artifact features exposed during the upsampling process. Specifically, we reexamine the upsampling operations employed by generative models and observe that, in high-frequency regions of an image (e.g., edge areas with significant pixel intensity differences), generative models often struggle to accurately replicate the pixel distributions of real images, thereby leaving behind unavoidable artifact information. Based on this observation, we propose to utilize edge detection operators to enrich edge-aware detail cues, enabling the model to focus on these critical features. Furthermore, we design a module that combines upsampling and downsampling to analyze pixel correlation changes introduced by interpolation artifacts. The integrated approach effectively enhances the detection of subtle generative traces, thereby improving generalization across diverse generative models. Extensive experiments on three benchmark datasets demonstrate the superior performance of the proposed approach against previous state-of-the-art methods under cross-domain testing scenarios. The code is available at https://github.com/zj56/EdgeEnhanced-DeepfakeDetection.
{"title":"E2GenF: Universal AIGC image detection based on edge enhanced generalizable features","authors":"Jian Zou , Jun Wang , Kezhong Lu , Yingxin Lai , Kaiwen Luo , Zitong Yu","doi":"10.1016/j.patrec.2025.12.001","DOIUrl":"10.1016/j.patrec.2025.12.001","url":null,"abstract":"<div><div>Generative models, such as GANs and Diffusion models, have achieved remarkable advancements in Artificial Intelligence Generated Content (AIGC), creating images that are nearly indistinguishable from real ones. However, existing detection methods often face challenges in identifying images generated by unseen models and exhibit limited generalization across different domains. In this paper, our aim is to improve the generalization capacity of AIGC image detectors by leveraging artifact features exposed during the upsampling process. Specifically, we reexamine the upsampling operations employed by generative models and observe that, in high-frequency regions of an image (e.g., edge areas with significant pixel intensity differences), generative models often struggle to accurately replicate the pixel distributions of real images, thereby leaving behind unavoidable artifact information. Based on this observation, we propose to utilize edge detection operators to enrich edge-aware detailed clues, enabling the model to focus on these critical features. Furthermore, We designed a module that combines upsampling and downsampling to analyze pixel correlation changes introduced by interpolation artifacts. The integrated approach effectively enhances the detection of subtle generative traces, thereby improving generalization across diverse generative models. Extensive experiments on three benchmark datasets demonstrate the superior performance of the proposed approach against previous state-of-the-art methods under cross-domain testing scenarios. The code is available at <span><span>https://github.com/zj56/EdgeEnhanced-DeepfakeDetection</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"200 ","pages":"Pages 74-80"},"PeriodicalIF":3.3,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145797788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PE-ViT: Parameter-efficient vision transformer with dimension-adaptive experts and economical attention
Pub Date: 2026-02-01 | Epub Date: 2025-12-26 | DOI: 10.1016/j.patrec.2025.12.013
Qun Li , Jiru He , Tiancheng Guo , Xinping Gao , Bir Bhanu
Recent advances in Mixture of Experts (MoE) have improved the representational capacity of Vision Transformer (ViT), but most existing methods remain constrained to token-level routing or homogeneous expert scaling, overlooking the diverse representation requirements across different layers and the parameter redundancy within attention modules. To address these problems, we propose PE-ViT, a novel parameter-efficient architecture that integrates the Dimension-adaptive Mixture of Experts (DMoE) and the Selective and Shared Attention (SSA) mechanisms to improve both computational efficiency and model performance. Specifically, DMoE adaptively allocates expert dimensions through layer-wise representation analysis and incorporates shared experts to enhance parameter utilization, while SSA reduces the parameter overhead of attention by dynamically selecting attention heads and sharing query-key projections. Experimental results demonstrate that PE-ViT consistently outperforms existing MoE methods across multiple benchmark datasets.
{"title":"PE-ViT: Parameter-efficient vision transformer with dimension-adaptive experts and economical attention","authors":"Qun Li , Jiru He , Tiancheng Guo , Xinping Gao , Bir Bhanu","doi":"10.1016/j.patrec.2025.12.013","DOIUrl":"10.1016/j.patrec.2025.12.013","url":null,"abstract":"<div><div>Recent advances in Mixture of Experts (MoE) have improved the representational capacity of Vision Transformer (ViT), but most existing methods remain constrained to token-level routing or homogeneous expert scaling, overlooking the diverse representation requirements across different layers and the parameter redundancy within attention modules. To address these problems, we propose PE-ViT, a novel parameter-efficient architecture that integrates the Dimension-adaptive Mixture of Experts (DMoE) and the Selective and Shared Attention (SSA) mechanisms to improve both computational efficiency and model performance. Specifically, DMoE adaptively allocates expert dimensions through layer-wise representation analysis and incorporates shared experts to enhance parameter utilization, while SSA reduces the parameter overhead of attention by dynamically selecting attention heads and sharing query-key projections. Experimental results demonstrate that PE-ViT consistently outperforms existing MoE methods across multiple benchmark datasets.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"200 ","pages":"Pages 135-141"},"PeriodicalIF":3.3,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145884449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CAMN-FSOD: Class-aware memory network for few-shot infrared object detection
Pub Date: 2026-02-01 | Epub Date: 2025-11-29 | DOI: 10.1016/j.patrec.2025.11.033
Jing Hu , Hengkang Ye , Weiwei Zhong , Zican Shi , Yifan Chen , Jie Ren , Xiaohui Zhu , Li Fan
Cross-Domain Few-Shot Object Detection (CD-FSOD) from visible to infrared domains faces a critical challenge: object classification proves significantly more error-prone than localization under fine-tuning adaptation. This stems from substantial representational discrepancies in internal object features between domains, which hinder effective transfer. To enhance the saliency of infrared internal object features and mitigate classification errors in few-shot visible-to-infrared transfer, we propose the Class-Aware Memory Network for Few-Shot Object Detection (CAMN-FSOD). CAMN explicitly memorizes high-quality internal object features during fine-tuning and leverages this memory to augment features, boosting recognition accuracy during inference. Furthermore, we introduce a two-stage Decoupled-Coupled Fine-tuning approach (DCFA) to combat CAMN overfitting in few-shot training and maximize its effectiveness. We establish a visible-infrared FSOD benchmark dataset for evaluation. Extensive experiments demonstrate that CAMN-FSOD significantly enhances the few-shot learning capability of the base model without increasing trainable parameters. In the 1-shot setting, our method achieves 42.0 mAP50, which is 14.4 points higher than the baseline, and an overall mAP of 25.2, showing an improvement of 2.3 points and outperforming existing methods.
{"title":"CAMN-FSOD: Class-aware memory network for few-shot infrared object detection","authors":"Jing Hu , Hengkang Ye , Weiwei Zhong , Zican Shi , Yifan Chen , Jie Ren , Xiaohui Zhu , Li Fan","doi":"10.1016/j.patrec.2025.11.033","DOIUrl":"10.1016/j.patrec.2025.11.033","url":null,"abstract":"<div><div>Cross-Domain Few-Shot Object Detection (CD-FSOD) from visible to infrared domains faces a critical challenge: object classification proves significantly more error-prone than localization under fine-tuning adaptation. This stems from substantial representational discrepancies in internal object features between domains, which hinder effective transfer. To enhance the saliency of infrared internal object features and mitigate classification errors in few-shot visible-to-infrared transfer, we propose the Class-Aware Memory Network for Few-Shot Object Detection (CAMN-FSOD). CAMN explicitly memories high-quality internal object features during fine-tuning and leverages memory to augment features,boosting recognition accuracy during inference. Furthermore, we introduce our two-stage Decoupled-Coupled Fine-tuning approach (DCFA) to combat CAMN overfitting in few-shot training and maximize its effectiveness. We establish a visible-infrared FSOD benchmark dataset for evaluation. Extensive experiments demonstrate that CAMN-FSOD significantly enhances the few-shot learning capability of the base model without increasing trainable parameters. In the 1-shot setting, our method achieves 42.0 mAP<sub>50</sub>, which is 14.4 points higher than the baseline, and an overall mAP of 25.2, showing an improvement of 2.3 points, outperforming existing methods.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"200 ","pages":"Pages 16-22"},"PeriodicalIF":3.3,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145694677","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Special section: CIARP-24
Pub Date: 2026-02-01 | Epub Date: 2025-11-27 | DOI: 10.1016/j.patrec.2025.11.039
Sergio A. Velastin , Ruber Hernández-García
The Iberoamerican Congress on Pattern Recognition (CIARP) is a well-established scientific event, endorsed by the International Association for Pattern Recognition (IAPR), that focuses on all aspects of pattern recognition, computer vision, artificial intelligence, data mining, and related areas. Since 1995, it has provided an important forum for researchers in Iberoamerica and beyond to present ongoing research, scientific results, and experiences on mathematical models, computational methods, and their applications in areas such as robotics, industry, health, space exploration, telecommunications, document analysis, and natural language processing. CIARP has helped strengthen regional cooperation and has contributed to the development of emerging research groups across Iberoamerica. The 27th edition was held at Universidad Católica del Maule in Talca, Chile, from November 26-29, 2024, and comprised an engaging four-day program of single-track sessions, tutorials, and invited keynotes. I had the privilege of being its Program Chair. As guest editor of this Special Section, I am pleased to introduce fully extended and peer-reviewed versions of the two papers that were awarded best paper prizes at CIARP-24. In the first one, from Argentina and Uruguay, [1] expand their work to describe a multi-sensor approach for automatic remote-sensing precipitation detection using Conditional GANs and Recurrent Networks, which is of special relevance in places where precipitation events are uncommon. They integrate satellite infrared brightness temperature (IR-BT) with lightning temporal signals and argue that their proposed architecture achieves better precision than alternative methods. They suggest that their results have potential applications in predicting cyanobacteria bloom events and in helping to set social policies for water resource management. This is a good example of how pattern recognition research can have a clear impact. In the second paper, from Chile, [2] extend their previous work and consider the problem of dealing with Out-Of-Distribution (OOD) data in text classification. They propose a new method, BBMOE, which fine-tunes pre-trained models using labeled OOD data with a bimodal Beta mixture distribution regularization that enhances differentiation between near-OOD and far-OOD data in multi-class text classification. Their results show improvements over the state of the art on various datasets. We thank the authors and the reviewers for their thorough work and hope that you enjoy reading these papers and perhaps consider submitting work to a future CIARP.
{"title":"Special section: CIARP-24","authors":"Sergio A. Velastin , Ruber Hernández-García","doi":"10.1016/j.patrec.2025.11.039","DOIUrl":"10.1016/j.patrec.2025.11.039","url":null,"abstract":"<div><div>The Iberoamerican Congress on Pattern Recognition (CIARP) is a well-established scientific event, endorsed by the International Association for Pattern Recognition (IAPR), that focuses on all aspects of pattern recognition, computer vision, artificial intelligence, data mining, and related areas. Since 1995, it has provided an important forum for researchers in IberoAmerica and beyond for presenting ongoing research, scientific results, and experiences on mathematical models, computational methods, and their applications in areas such as robotics, industry, health, space exploration, telecommunications, document analysis, and natural language processing. CIARP has helped strengthening regional cooperation and had contributed to the development of emerging research groups across Iberoamerica. The 27th edition, was held at Universidad Católica del Maule in Talca, Chile, from November 26-29, 2024, and comprised an engaging four-day program of single-track sessions, tutorials, and invited keynotes. I had the privilege to be its Program Chair. As guest editor of this Special Section, I am pleased to introduce fully extended and peer-reviewed versions of the two papers that were awarded best paper prizes in CIAPR-24. In the first one, from Argentina and Uruguay, <span><span>[1]</span></span> expand their work to describe a multi-sensor approach for automatic precipitation remote sensing detection using Conditional GANs and Recurrent Networks of special relevance in places where precipitations are not very common events. They integrate satellite infrared brightness temperature (IR-BT) with lighting temporal signals and argue that their proposed architecture achieves better precision than alternative methods. They suggest that their results have potential applications in cyanobacteria bloom event prediction and to help setting social policies for water resource management. This is a good example on how pattern recognition research may have a clear impact. In the second paper, from Chile, <span><span>[2]</span></span> extend their previous work and consider the problem of dealing with Out-Of-Distribution (ODD) data in text classification. They propose a new method, BBMOE, based on bimodal beta mixture distribution that fine-tunes pre- trained models using labeled OOD data with a bimodal Beta mixture distribution regularization that enhances differentiation between near-OOD and far-OOD data in multi-class text classification. Their results show improvements over the state-of-the-art for various datasets. We thank the authors and the reviewers for their thorough work and hope that you enjoy reading these papers and perhaps consider submitting work to a future CIARP.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"200 ","pages":"Page 149"},"PeriodicalIF":3.3,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145938872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Quantized DiT with Hadamard transformation: A technical report
Pub Date: 2026-02-01 | Epub Date: 2025-12-07 | DOI: 10.1016/j.patrec.2025.12.003
Yue Liu, Wenxi Yang, Jianbin Jiao
Diffusion Transformers (DiTs) combine the scalability of transformers with the fidelity of diffusion models, achieving state-of-the-art image generation performance. However, their high computational cost hinders efficient deployment. Post-Training Quantization (PTQ) offers a remedy, yet existing methods struggle with the temporal and spatial dynamics of DiTs. We propose a simplified PTQ framework, combining computationally efficient rotation and randomness, for stable and effective DiT quantization. By replacing block-wise rotations with Hadamard transforms and zigzag permutations with random permutations, our method preserves the decorrelation effect while greatly reducing computational overhead. Experiments show that our approach maintains near full-precision performance at 8-bit and 6-bit precision levels. This work demonstrates that lightweight PTQ with structured randomness can effectively balance efficiency and fidelity, enabling practical deployment of DiTs in resource-constrained environments.
{"title":"Quantized DiT with hadamard transformation: A technical report","authors":"Yue Liu, Wenxi Yang, Jianbin Jiao","doi":"10.1016/j.patrec.2025.12.003","DOIUrl":"10.1016/j.patrec.2025.12.003","url":null,"abstract":"<div><div>Diffusion Transformers (DiTs) combine the scalability of transformers with the fidelity of diffusion models, achieving state-of-the-art image generation performance. However, their high computational cost hinders efficient deployment. Post-Training Quantization (PTQ) offers a remedy, yet existing methods struggle with the temporal and spatial dynamics of DiTs. We propose a simplified PTQ framework-combining computationally efficient rotation and randomness-for stable and effective DiT quantization. By replacing block-wise rotations with Hadamard transforms and zigzag permutations with random permutations, our method preserves the decorrelation effect while greatly reducing computational overhead. Experiments show that our approach maintains near full-precision performance at 8-bit and 6-bit precision levels. This work demonstrates that lightweight PTQ with structured randomness can effectively balance efficiency and fidelity, enabling practical deployment of DiTs in resource-constrained environments.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"200 ","pages":"Pages 81-87"},"PeriodicalIF":3.3,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145797787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}