TriGAN-SiaMT: A triple-segmentor adversarial network with bounding box priors for semi-supervised brain lesion segmentation
Pub Date: 2025-11-29 · DOI: 10.1016/j.patrec.2025.11.032
Mohammad Alshurbaji , Maregu Assefa , Ahmad Obeid , Mohamed L. Seghier , Taimur Hassan , Kamal Taha , Naoufel Werghi
Accurate brain lesion segmentation in MRI is critical for clinical decision-making, but pixel-wise annotations remain costly and time-consuming. We propose TriGAN-SiaMT, a novel semi-supervised segmentation framework that combines adversarial learning, consistency regularization, and bounding box priors. Our architecture comprises three segmentors (S0, S1, S2) and two discriminators (D0, D1). It includes: (1) a supervised branch (S0↔D0) trained on a small labeled subset; (2) a Siamese branch (S1↔D1) with an identical architecture to S0↔D0, but trained on unlabeled data; and (3) a teacher branch (S2) updated via exponential moving average (EMA) from S1, following the Mean Teacher (MT) paradigm. The teacher S2 generates pseudo-labels to supervise S1. It also provides soft segmentations to guide D1, which does not see any labeled data. The model enforces consistency at multiple levels: between S0 and S1 (Siamese consistency), and between S1 and S2 (EMA consistency). Bounding box priors are incorporated as weak supervision for both labeled and unlabeled images, improving lesion localization. Evaluated on the ISLES 2022 and BraTS 2019 datasets, TriGAN-SiaMT achieves DSC scores of 84.80 % and 86.32 %, respectively, using only 5 % labeled data. These results demonstrate strong performance under limited supervision and robust generalization across brain lesions.
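As a concrete illustration of the Mean Teacher mechanism described above (teacher S2 updated via EMA from student S1), the sketch below shows the standard EMA weight update in PyTorch. The decay value and the stand-in modules are assumptions for illustration, not the paper's implementation.

```python
import copy

import torch
from torch import nn

@torch.no_grad()
def ema_update(teacher: nn.Module, student: nn.Module, alpha: float = 0.99) -> None:
    """EMA update of the teacher (S2) from the student (S1): t = a*t + (1-a)*s."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(alpha).add_(s_param, alpha=1.0 - alpha)

# The teacher starts as a frozen copy of the student and receives no gradients;
# it is refreshed once per training step via the EMA rule above.
student = nn.Conv2d(1, 2, kernel_size=3)   # stand-in for segmentor S1
teacher = copy.deepcopy(student)           # stand-in for teacher S2
for p in teacher.parameters():
    p.requires_grad_(False)

ema_update(teacher, student, alpha=0.99)
```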
{"title":"TriGAN-SiaMT: A triple-segmentor adversarial network with bounding box priors for semi-supervised brain lesion segmentation","authors":"Mohammad Alshurbaji , Maregu Assefa , Ahmad Obeid , Mohamed L. Seghier , Taimur Hassan , Kamal Taha , Naoufel Werghi","doi":"10.1016/j.patrec.2025.11.032","DOIUrl":"10.1016/j.patrec.2025.11.032","url":null,"abstract":"<div><div>Accurate brain lesion segmentation in MRI is critical for clinical decision-making, but pixel-wise annotations remain costly and time-consuming. We propose TriGAN-SiaMT, a novel semi-supervised segmentation framework that combines adversarial learning, consistency regularization, and bounding box priors. Our architecture comprises three segmentors (<em>S</em><sub>0</sub>, <em>S</em><sub>1</sub>, <em>S</em><sub>2</sub>) and two discriminators (<em>D</em><sub>0</sub>, <em>D</em><sub>1</sub>). It includes: (1) a supervised branch (<em>S</em><sub>0</sub>↔<em>D</em><sub>0</sub>) trained on a small labeled subset; (2) a Siamese branch (<em>S</em><sub>1</sub>↔<em>D</em><sub>1</sub>) with an identical architecture to <em>S</em><sub>0</sub>↔<em>D</em><sub>0</sub>, but trained on unlabeled data; and (3) a teacher branch (<em>S</em><sub>2</sub>) updated via exponential moving average (EMA) from <em>S</em><sub>1</sub>, following the Mean Teacher (MT) paradigm. The teacher <em>S</em><sub>2</sub> generates pseudo-labels to supervise <em>S</em><sub>1</sub>. It also provides soft segmentations to guide <em>D</em><sub>1</sub>, which does not see any labeled data. The model enforces consistency at multiple levels: between <em>S</em><sub>0</sub> and <em>S</em><sub>1</sub> (Siamese consistency), and between <em>S</em><sub>1</sub> and <em>S</em><sub>2</sub> (EMA consistency). Bounding box priors are incorporated as weak supervision for both labeled and unlabeled images, improving lesion localization. Evaluated on the ISLES 2022 and BraTS 2019 datasets, TriGAN-SiaMT achieves DSC scores of 84.80 % and 86.32 %, respectively, using only 5 % labeled data. These results demonstrate strong performance under limited supervision and robust generalization across brain lesions.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"200 ","pages":"Pages 37-43"},"PeriodicalIF":3.3,"publicationDate":"2025-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145694687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CAMN-FSOD: Class-aware memory network for few-shot infrared object detection
Pub Date: 2025-11-29 · DOI: 10.1016/j.patrec.2025.11.033
Jing Hu , Hengkang Ye , Weiwei Zhong , Zican Shi , Yifan Chen , Jie Ren , Xiaohui Zhu , Li Fan
Cross-Domain Few-Shot Object Detection (CD-FSOD) from visible to infrared domains faces a critical challenge: object classification proves significantly more error-prone than localization under fine-tuning adaptation. This stems from substantial representational discrepancies in internal object features between domains, which hinder effective transfer. To enhance the saliency of infrared internal object features and mitigate classification errors in few-shot visible-to-infrared transfer, we propose the Class-Aware Memory Network for Few-Shot Object Detection (CAMN-FSOD). CAMN explicitly memorizes high-quality internal object features during fine-tuning and leverages this memory to augment features, boosting recognition accuracy during inference. Furthermore, we introduce a two-stage Decoupled-Coupled Fine-tuning Approach (DCFA) to combat CAMN overfitting in few-shot training and maximize its effectiveness. We establish a visible-infrared FSOD benchmark dataset for evaluation. Extensive experiments demonstrate that CAMN-FSOD significantly enhances the few-shot learning capability of the base model without increasing trainable parameters. In the 1-shot setting, our method achieves 42.0 mAP50, which is 14.4 points higher than the baseline, and an overall mAP of 25.2, showing an improvement of 2.3 points, outperforming existing methods.
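The memorize-then-augment idea in the abstract can be pictured as per-class running prototypes written during fine-tuning and mixed back into features at inference. The momentum write rule, nearest-prototype lookup, and mixing weight below are illustrative assumptions, not CAMN's actual design.

```python
import torch

class ClassMemory:
    """Per-class running prototypes: a minimal sketch of a class-aware memory."""

    def __init__(self, num_classes: int, dim: int, momentum: float = 0.9):
        self.protos = torch.zeros(num_classes, dim)
        self.momentum = momentum

    @torch.no_grad()
    def write(self, feats: torch.Tensor, labels: torch.Tensor) -> None:
        # feats: (N, dim) object features, labels: (N,) class ids
        for c in labels.unique():
            mean_c = feats[labels == c].mean(dim=0)
            self.protos[c] = self.momentum * self.protos[c] + (1 - self.momentum) * mean_c

    def augment(self, feats: torch.Tensor, weight: float = 0.5) -> torch.Tensor:
        # Mix each feature with its most similar class prototype at inference time.
        sims = feats @ self.protos.T               # (N, num_classes)
        nearest = self.protos[sims.argmax(dim=1)]  # (N, dim)
        return (1.0 - weight) * feats + weight * nearest

memory = ClassMemory(num_classes=5, dim=256)
memory.write(torch.randn(32, 256), torch.randint(0, 5, (32,)))
augmented = memory.augment(torch.randn(8, 256))
```

Because the memory holds only one prototype vector per class, this kind of scheme adds no trainable parameters, consistent with the claim in the abstract.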
{"title":"CAMN-FSOD: Class-aware memory network for few-shot infrared object detection","authors":"Jing Hu , Hengkang Ye , Weiwei Zhong , Zican Shi , Yifan Chen , Jie Ren , Xiaohui Zhu , Li Fan","doi":"10.1016/j.patrec.2025.11.033","DOIUrl":"10.1016/j.patrec.2025.11.033","url":null,"abstract":"<div><div>Cross-Domain Few-Shot Object Detection (CD-FSOD) from visible to infrared domains faces a critical challenge: object classification proves significantly more error-prone than localization under fine-tuning adaptation. This stems from substantial representational discrepancies in internal object features between domains, which hinder effective transfer. To enhance the saliency of infrared internal object features and mitigate classification errors in few-shot visible-to-infrared transfer, we propose the Class-Aware Memory Network for Few-Shot Object Detection (CAMN-FSOD). CAMN explicitly memories high-quality internal object features during fine-tuning and leverages memory to augment features,boosting recognition accuracy during inference. Furthermore, we introduce our two-stage Decoupled-Coupled Fine-tuning approach (DCFA) to combat CAMN overfitting in few-shot training and maximize its effectiveness. We establish a visible-infrared FSOD benchmark dataset for evaluation. Extensive experiments demonstrate that CAMN-FSOD significantly enhances the few-shot learning capability of the base model without increasing trainable parameters. In the 1-shot setting, our method achieves 42.0 mAP<sub>50</sub>, which is 14.4 points higher than the baseline, and an overall mAP of 25.2, showing an improvement of 2.3 points, outperforming existing methods.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"200 ","pages":"Pages 16-22"},"PeriodicalIF":3.3,"publicationDate":"2025-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145694677","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
P-RoPE: A polar-based rotary position embedding for polar transformed images in rotation-invariant tasks
Pub Date: 2025-11-29 · DOI: 10.1016/j.patrec.2025.11.037
Stavros N. Moutsis , Konstantinos A. Tsintotas , Ioannis Kansizoglou , Antonios Gasteratos
Rotation-invariant frameworks are crucial in many computer vision tasks, such as human action recognition (HAR), especially when applied in real-world scenarios. Since most datasets, including those on fall detection, have been generated in controlled environments with fixed camera angles, heights, and movements, approaches developed to address such tasks tend to fail when individual appearance variations occur. To address this challenge, our study proposes the use of the EVA-02-Ti lightweight vision transformer for processing people’s polar mappings and handling the task of fall detection. In particular, we strive to leverage the transformation’s rotation-invariant characteristic and correctly classify the rotated images. Towards this goal, a polar-based rotary position embedding (P-RoPE), which generates relative positions among polar patches according to the r and θ axes instead of the Cartesian x and y axes, is presented. Replacing the original RoPE with P-RoPE enhances the ViT’s performance, as demonstrated in our experimental protocol, and also outperforms a state-of-the-art approach. An evaluation was conducted on E-FPDS and VFP290k, where training was performed on the initial images and testing on the rotated ones. Finally, when assessed on Fashion-MNIST-rot-12k, a standard dataset for rotation-invariant scenarios, P-RoPE again surpasses both the baseline version and another benchmark method.
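A minimal sketch of the idea, following the common axial two-dimensional RoPE recipe: half the embedding channels are rotated according to each patch's radial index and half according to its angular index. The frequency base, channel split, and grid sizes below are assumptions; the paper's exact formulation may differ.

```python
import torch

def rope_1d(x: torch.Tensor, pos: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Standard rotary embedding along one axis; x: (N, d) with d even, pos: (N,)."""
    d = x.shape[-1]
    freqs = base ** (-torch.arange(0, d, 2, dtype=torch.float32) / d)
    angles = pos[:, None] * freqs               # (N, d/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin        # rotate each channel pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def p_rope(x: torch.Tensor, r_idx: torch.Tensor, theta_idx: torch.Tensor) -> torch.Tensor:
    """Rotate the first half of the channels by the radial index, the second by the angular index."""
    d = x.shape[-1] // 2
    return torch.cat([rope_1d(x[..., :d], r_idx), rope_1d(x[..., d:], theta_idx)], dim=-1)

# Tokens from a polar-mapped image: 8 radial bins x 16 angular bins, embedding dim 64.
tokens = torch.randn(8 * 16, 64)
r = torch.arange(8).repeat_interleave(16).float()
theta = torch.arange(16).repeat(8).float()
queries = p_rope(tokens, r, theta)
```

Since an image rotation becomes a shift along the θ axis after the polar transform, a relative embedding over (r, θ) leaves attention scores unchanged up to that shift, which is the rotation-invariance property the abstract exploits.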
{"title":"P-RoPE: A polar-based rotary position embedding for polar transformed images in rotation-invariant tasks","authors":"Stavros N. Moutsis , Konstantinos A. Tsintotas , Ioannis Kansizoglou , Antonios Gasteratos","doi":"10.1016/j.patrec.2025.11.037","DOIUrl":"10.1016/j.patrec.2025.11.037","url":null,"abstract":"<div><div>Rotation-invariant frameworks are crucial in many computer vision tasks, such as human action recognition (HAR), especially when applied in real-world scenarios. Since most datasets, including those on fall detection, have been generated in controlled environments with fixed camera angles, heights, and movements, approaches developed to address such tasks tend to fail when individual appearance variations occur. To address this challenge, our study proposes the use of the EVA-02-Ti lightweight vision transformer for processing people’s polar mappings and handling the task of fall detection. In particular, we strive to leverage the transformation’s rotation-invariant characteristic and correctly classify the rotated images. Towards this goal, a polar-based rotary position embedding (P-RoPE), which generates relative positions among polar patches according to <em>r</em> and <em>θ</em> axes instead of the Cartesian <em>x</em> and <em>y</em> axes, is presented. Replacing the original RoPE, we achieve an enhancement of ViT’s performance, as demonstrated in our experimental protocol, while it also outperforms a state-of-the-art approach. An evaluation was conducted on E-FPDS and VFP290k, where training was performed on initial images and testing was performed on the rotated ones. Finally, when assessed on Fashion-MNIST-rot-12k, a standard dataset for rotation-invariant scenarios, P-RoPE again surpasses both the baseline version and another benchmark method.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"200 ","pages":"Pages 23-29"},"PeriodicalIF":3.3,"publicationDate":"2025-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145694688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The uncertainty advantage: Enhancing large language models’ reliability through chain of uncertainty reasoning
Pub Date: 2025-11-28 · DOI: 10.1016/j.patrec.2025.11.040
Zirong Peng , Xiaoming Liu , Guan Yang , Jie Liu , Xueping Peng , Yang Long
The rapid evolution of large language models (LLMs) has significantly advanced the capabilities of natural language processing (NLP), enabling a broad range of applications from text generation to complex problem-solving. However, these models often struggle with verifying the reliability of their outputs for complex tasks. Chain-of-Thought (CoT) reasoning, a technique that asks LLMs to generate step-by-step reasoning paths, attempts to address the challenge by making reasoning steps explicit, yet it falls short when assumptions of process faithfulness are unmet, leading to inaccuracies. This reveals a critical gap: the absence of a mechanism to handle inherent uncertainties in reasoning processes. To bridge this gap, we propose a novel approach, the Chain of Uncertainty Reasoning (CUR), which integrates uncertainty management into LLMs’ reasoning. CUR employs prompt-based techniques to express uncertainty effectively and leverages a structured approach to introduce uncertainty through a small number of samples. This enables the model to self-assess its uncertainty and adapt to different perspectives, thus enhancing the faithfulness of its outputs. Experimental results on the datasets of StrategyQA, HotpotQA, and FEVER demonstrate that our method significantly improves performance compared to baselines, confirming the utility of incorporating uncertainty into LLM reasoning processes. This approach offers a promising direction for enhancing the reliability and trustworthiness of LLMs’ applications in various domains. Our code is publicly available at: https://github.com/PengZirong/ChainofUncertaintyReasoning.
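The abstract describes prompt-based uncertainty expression over a few samples. The template below is a hypothetical illustration of that style only; the paper's actual prompts and few-shot exemplars live in the repository linked above.

```python
# Hypothetical prompt template in the spirit of uncertainty-aware
# chain-of-thought; names and wording are assumptions, not CUR's prompts.
CUR_TEMPLATE = """Answer the question step by step.
After each step, state your confidence in that step as low / medium / high.
Before giving the final answer, re-examine every low-confidence step from a
different perspective and revise it if needed.

Question: {question}
Reasoning:"""

def build_cur_prompt(question: str) -> str:
    return CUR_TEMPLATE.format(question=question)

print(build_cur_prompt("Did the ancient Romans use paper currency?"))
```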
{"title":"The uncertainty advantage: Enhancing large language models’ reliability through chain of uncertainty reasoning","authors":"Zirong Peng , Xiaoming Liu , Guan Yang , Jie Liu , Xueping Peng , Yang Long","doi":"10.1016/j.patrec.2025.11.040","DOIUrl":"10.1016/j.patrec.2025.11.040","url":null,"abstract":"<div><div>The rapid evolution of large language models (LLMs) has significantly advanced the capabilities of natural language processing (NLP), enabling a broad range of applications from text generation to complex problem-solving. However, these models often struggle with verifying the reliability of their outputs for complex tasks. Chain-of-Thought (CoT) reasoning, a technique that asks LLMs to generate step-by-step reasoning paths, attempts to address the challenge by making reasoning steps explicit, yet it falls short when assumptions of process faithfulness are unmet, leading to inaccuracies. This reveals a critical gap: the absence of a mechanism to handle inherent uncertainties in reasoning processes. To bridge this gap, we propose a novel approach, the Chain of Uncertainty Reasoning (CUR), which integrates uncertainty management into LLMs’ reasoning. CUR employs prompt-based techniques to express uncertainty effectively and leverages a structured approach to introduce uncertainty through a small number of samples. This enables the model to self-assess its uncertainty and adapt to different perspectives, thus enhancing the faithfulness of its outputs. Experimental results on the datasets of StrategyQA, HotpotQA, and FEVER demonstrate that our method significantly improves performance compared to baselines, confirming the utility of incorporating uncertainty into LLM reasoning processes. This approach offers a promising direction for enhancing the reliability and trustworthiness of LLMs’ applications in various domains. Our code is publicly available at: <span><span>https://github.com/PengZirong/ChainofUncertaintyReasoning</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"200 ","pages":"Pages 30-36"},"PeriodicalIF":3.3,"publicationDate":"2025-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145694689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Function-based labels for complementary recommendation: Definition, annotation, and LLM-as-a-Judge
Pub Date: 2025-11-28 · DOI: 10.1016/j.patrec.2025.11.042
Chihiro Yamasaki, Kai Sugahara, Yuma Nagi, Kazushi Okamoto
Complementary recommendations enhance the user experience by suggesting items that are frequently purchased together while serving different functions from the query item. Inferring or evaluating whether two items have a complementary relationship requires complementary relationship labels; however, defining these labels is challenging because of the inherent ambiguity of such relationships. Complementary labels based on user historical behavior logs attempt to capture these relationships, but often produce inconsistent and unreliable results. Recent efforts have introduced large language models (LLMs) to infer these relationships. However, these approaches provide a binary classification without a nuanced understanding of complementary relationships. In this study, we address these challenges by introducing Function-Based Labels (FBLs), a novel definition of complementary relationships independent of user purchase logs and the opaque decision processes of LLMs. We constructed a human-annotated FBLs dataset comprising 2759 item pairs and demonstrated that it covered possible item relationships and minimized ambiguity. We then evaluated whether machine learning methods using annotated FBLs could accurately infer labels for unseen item pairs, and whether LLM-generated complementary labels align with human perception. Among machine learning methods, ModernBERT achieved the highest performance with a Macro-F1 of 0.911, demonstrating accuracy and robustness even under limited supervision. For LLMs, GPT-4o-mini achieved high consistency (0.989) and classification accuracy (0.849) under the detailed FBL definition, while requiring only 1/842 the cost and 1/75 the time of human annotation. Overall, our study presents FBLs as a clear definition of complementary relationships, enabling more accurate inferences and automated labeling of complementary recommendations.
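Since ModernBERT is evaluated as a supervised classifier over item pairs, FBL inference can be sketched as text-pair classification with Hugging Face transformers. The checkpoint path and the mapping from logit index to FBL label below are hypothetical.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_DIR = "path/to/finetuned-modernbert-fbl"  # hypothetical checkpoint path
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_DIR)
model.eval()

def classify_pair(query_item: str, candidate_item: str) -> int:
    """Return the predicted class index (into the FBL label set) for an item pair."""
    inputs = tokenizer(query_item, candidate_item, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return int(logits.argmax(dim=-1))

print(classify_pair("tennis racket", "pack of tennis balls"))
```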
{"title":"Function-based labels for complementary recommendation: Definition, annotation, and LLM-as-a-Judge","authors":"Chihiro Yamasaki, Kai Sugahara, Yuma Nagi, Kazushi Okamoto","doi":"10.1016/j.patrec.2025.11.042","DOIUrl":"10.1016/j.patrec.2025.11.042","url":null,"abstract":"<div><div>Complementary recommendations enhance the user experience by suggesting items that are frequently purchased together while serving different functions from the query item. Inferring or evaluating whether two items have a complementary relationship requires complementary relationship labels; however, defining these labels is challenging because of the inherent ambiguity of such relationships. Complementary labels based on user historical behavior logs attempt to capture these relationships, but often produce inconsistent and unreliable results. Recent efforts have introduced large language models (LLMs) to infer these relationships. However, these approaches provide a binary classification without a nuanced understanding of complementary relationships. In this study, we address these challenges by introducing Function-Based Labels (FBLs), a novel definition of complementary relationships independent of user purchase logs and the opaque decision processes of LLMs. We constructed a human-annotated FBLs dataset comprising 2759 item pairs and demonstrated that it covered possible item relationships and minimized ambiguity. We then evaluated whether machine learning methods using annotated FBLs could accurately infer labels for unseen item pairs, and whether LLM-generated complementary labels align with human perception. Among machine learning methods, ModernBERT achieved the highest performance with a Macro-F1 of 0.911, demonstrating accuracy and robustness even under limited supervision. For LLMs, GPT-4o-mini achieved high consistency (0.989) and classification accuracy (0.849) under the detailed FBL definition, while requiring only 1/842 the cost and 1/75 the time of human annotation. Overall, our study presents FBLs as a clear definition of complementary relationships, enabling more accurate inferences and automated labeling of complementary recommendations.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"200 ","pages":"Pages 8-15"},"PeriodicalIF":3.3,"publicationDate":"2025-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145694690","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
End-to-end interactive joint model: Clause-phrase multi-task learning for suicidal ideation cause extraction (SICE) in Chinese Weibo text
Pub Date: 2025-11-27 · DOI: 10.1016/j.patrec.2025.11.036
Qi Fu , Yuhao Zhang , Dexi Liu , Liyuan Zhang , Wenzhong Peng
Suicide prevention has been a critical research focus for governments, mental health professionals, and social work researchers worldwide. With the increasing number of individuals seeking help through social networks and psychological counseling platforms, timely analysis of the causes of Suicidal Ideation (SI) in help-seeking texts can provide scientific evidence and actionable insights for suicide prevention efforts. Existing approaches to Suicidal Ideation Cause (SIC) extraction face challenges: (i) SIC clause extraction is coarse-grained and thus imprecise in localization; (ii) SIC phrase extraction is more precise but inherently harder. To address this, we propose an end-to-end interactive joint model (EIJM) based on a clause-phrase multi-task learning (MTL) framework, where SIC phrase extraction serves as the main task and SIC clause extraction as the auxiliary task. By leveraging joint learning, EIJM enhances extraction accuracy while reducing task difficulty. Experimental results demonstrate that EIJM outperforms the two-stage independent multi-task (2SIM) approach across multiple evaluation metrics. Specifically, in the SIC phrase extraction task, EIJM achieves a 1.1 % improvement in recall over 2SIM without compromising precision. In the SIC clause extraction task, EIJM improves precision, recall, and F1-score by 0.4 %, 0.9 %, and 0.7 %, respectively. Furthermore, in 2SIM, incorporating clause-level representations from the auxiliary task into the main task enhances local matching and fuzzy matching metrics, with the Fuzzy Match method improving the most, by 0.9 %. However, it yields limited improvement in exact matching performance.
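A minimal sketch of the clause-phrase multi-task setup: a shared encoder feeding a token-level head for SIC phrase extraction (main task) and a pooled clause-level head (auxiliary task), trained with a weighted joint loss. The encoder choice, dimensions, and loss weight are assumptions, not EIJM's actual architecture.

```python
import torch
from torch import nn

class JointSICModel(nn.Module):
    """Shared encoder with a token-level phrase head and a pooled clause head."""

    def __init__(self, hidden: int = 256, num_tags: int = 3):
        super().__init__()
        self.encoder = nn.GRU(input_size=128, hidden_size=hidden,
                              batch_first=True, bidirectional=True)
        self.phrase_head = nn.Linear(2 * hidden, num_tags)  # BIO tag per token
        self.clause_head = nn.Linear(2 * hidden, 2)         # SIC clause: yes/no

    def forward(self, x):
        h, _ = self.encoder(x)                              # (B, T, 2*hidden)
        return self.phrase_head(h), self.clause_head(h.mean(dim=1))

model = JointSICModel()
x = torch.randn(4, 20, 128)            # stand-in for pre-computed token embeddings
tags = torch.randint(0, 3, (4, 20))    # per-token BIO labels for SIC phrases
clause = torch.randint(0, 2, (4,))     # clause-level SIC labels
phrase_logits, clause_logits = model(x)
ce = nn.CrossEntropyLoss()
# Joint loss: main (phrase) task plus a down-weighted auxiliary (clause) task.
loss = ce(phrase_logits.reshape(-1, 3), tags.reshape(-1)) + 0.5 * ce(clause_logits, clause)
loss.backward()
```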
{"title":"End-to-end interactive joint model: Clause-phrase multi-task learning for suicidal ideation cause extraction (SICE) in Chinese Weibo text","authors":"Qi Fu , Yuhao Zhang , Dexi Liu , Liyuan Zhang , Wenzhong Peng","doi":"10.1016/j.patrec.2025.11.036","DOIUrl":"10.1016/j.patrec.2025.11.036","url":null,"abstract":"<div><div>Suicide prevention has been a critical research focus for governments, mental health professionals, and social work researchers worldwide. With the increasing number of individuals seeking help through social networks and psychological counseling platforms, timely analysis of the causes of Suicidal Ideation (SI) in help-seeking texts can provide scientific evidence and actionable insights for suicide prevention efforts. Existing approaches face challenges: (i) SIC clause extraction is coarse-grained and thus imprecise in localization; (ii) SIC phrase extraction is more precise but inherently harder. To address this, we propose an end-to-end interactive joint model (EIJM) based on a clause-phrase multi-task learning (MTL) framework, where SIC phrase extraction serves as the main task and SIC clause extraction as the auxiliary task. By leveraging joint learning, EIJM enhances extraction accuracy while reducing task difficulty. Experimental results demonstrate that EIJM outperforms the two-stage independent multi-task (2SIM) approach across multiple evaluation metrics. Specifically, in the SIC phrase extraction task, EIJM achieves a 1.1 % improvement in recall over 2SIM without compromising precision. In the SIC clause extraction task, EIJM improves precision, recall, and F1-score by 0.4 %, 0.9 %, and 0.7 %, respectively. Furthermore, in 2SIM, incorporating clause-level representations from the auxiliary task into the main task enhances local matching and fuzzy matching metrics, with the Fuzzy Match method improving the most by 0.9 %. However, it yielded limited improvement in exact matching performance.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"200 ","pages":"Pages 1-7"},"PeriodicalIF":3.3,"publicationDate":"2025-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145658791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Special section: CIARP-24
Pub Date: 2025-11-27 · DOI: 10.1016/j.patrec.2025.11.039
Sergio A. Velastin , Ruber Hernández-García
The Iberoamerican Congress on Pattern Recognition (CIARP) is a well-established scientific event, endorsed by the International Association for Pattern Recognition (IAPR), that focuses on all aspects of pattern recognition, computer vision, artificial intelligence, data mining, and related areas. Since 1995, it has provided an important forum for researchers in Iberoamerica and beyond to present ongoing research, scientific results, and experiences on mathematical models, computational methods, and their applications in areas such as robotics, industry, health, space exploration, telecommunications, document analysis, and natural language processing. CIARP has helped strengthen regional cooperation and has contributed to the development of emerging research groups across Iberoamerica. The 27th edition was held at Universidad Católica del Maule in Talca, Chile, from November 26-29, 2024, and comprised an engaging four-day program of single-track sessions, tutorials, and invited keynotes. I had the privilege to be its Program Chair. As guest editor of this Special Section, I am pleased to introduce fully extended and peer-reviewed versions of the two papers that were awarded best paper prizes at CIARP-24. In the first one, from Argentina and Uruguay, [1] expand their work to describe a multi-sensor approach for automatic remote-sensing detection of precipitation using Conditional GANs and recurrent networks, of special relevance in places where precipitation events are uncommon. They integrate satellite infrared brightness temperature (IR-BT) with lightning temporal signals and argue that their proposed architecture achieves better precision than alternative methods. They suggest that their results have potential applications in predicting cyanobacteria bloom events and in helping to set social policies for water resource management. This is a good example of how pattern recognition research can have a clear impact. In the second paper, from Chile, [2] extend their previous work and consider the problem of dealing with Out-Of-Distribution (OOD) data in text classification. They propose a new method, BBMOE, which fine-tunes pre-trained models on labeled OOD data with a bimodal Beta mixture distribution regularization that enhances differentiation between near-OOD and far-OOD data in multi-class text classification. Their results show improvements over the state of the art on various datasets. We thank the authors and the reviewers for their thorough work and hope that you enjoy reading these papers and perhaps consider submitting work to a future CIARP.
{"title":"Special section: CIARP-24","authors":"Sergio A. Velastin , Ruber Hernández-García","doi":"10.1016/j.patrec.2025.11.039","DOIUrl":"10.1016/j.patrec.2025.11.039","url":null,"abstract":"<div><div>The Iberoamerican Congress on Pattern Recognition (CIARP) is a well-established scientific event, endorsed by the International Association for Pattern Recognition (IAPR), that focuses on all aspects of pattern recognition, computer vision, artificial intelligence, data mining, and related areas. Since 1995, it has provided an important forum for researchers in IberoAmerica and beyond for presenting ongoing research, scientific results, and experiences on mathematical models, computational methods, and their applications in areas such as robotics, industry, health, space exploration, telecommunications, document analysis, and natural language processing. CIARP has helped strengthening regional cooperation and had contributed to the development of emerging research groups across Iberoamerica. The 27th edition, was held at Universidad Católica del Maule in Talca, Chile, from November 26-29, 2024, and comprised an engaging four-day program of single-track sessions, tutorials, and invited keynotes. I had the privilege to be its Program Chair. As guest editor of this Special Section, I am pleased to introduce fully extended and peer-reviewed versions of the two papers that were awarded best paper prizes in CIAPR-24. In the first one, from Argentina and Uruguay, <span><span>[1]</span></span> expand their work to describe a multi-sensor approach for automatic precipitation remote sensing detection using Conditional GANs and Recurrent Networks of special relevance in places where precipitations are not very common events. They integrate satellite infrared brightness temperature (IR-BT) with lighting temporal signals and argue that their proposed architecture achieves better precision than alternative methods. They suggest that their results have potential applications in cyanobacteria bloom event prediction and to help setting social policies for water resource management. This is a good example on how pattern recognition research may have a clear impact. In the second paper, from Chile, <span><span>[2]</span></span> extend their previous work and consider the problem of dealing with Out-Of-Distribution (ODD) data in text classification. They propose a new method, BBMOE, based on bimodal beta mixture distribution that fine-tunes pre- trained models using labeled OOD data with a bimodal Beta mixture distribution regularization that enhances differentiation between near-OOD and far-OOD data in multi-class text classification. Their results show improvements over the state-of-the-art for various datasets. We thank the authors and the reviewers for their thorough work and hope that you enjoy reading these papers and perhaps consider submitting work to a future CIARP.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"200 ","pages":"Page 149"},"PeriodicalIF":3.3,"publicationDate":"2025-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145938872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Monocular 3D lane detection with geometry-guided transformation and contextual enhancement
Pub Date: 2025-11-26 · DOI: 10.1016/j.patrec.2025.11.041
Chunying Song, Qiong Wang, Zeren Sun, Huafeng Liu
Monocular 3D lane detection is a critical yet challenging task in autonomous driving, largely due to the lack of depth cues, complex road geometries, and appearance variations in real-world environments. Existing approaches often depend on bird’s-eye-view transformations or rigid geometric assumptions, which may introduce projection artifacts and hinder generalization. In this paper, we present GeoCNet, a BEV-free framework that directly estimates 3D lanes in the perspective domain. The architecture incorporates three key components: a Geometry-Guided Spatial Transformer (GST) for adaptive multi-plane ground modeling, a Perception-Aware Feature Modulation (PFM) module for context-driven feature refinement, and a Structure-Aware Lane Decoder (SALD) that reconstructs lanes as curvature-regularized anchor-aligned sequences. Extensive experiments on the OpenLane dataset demonstrate that GeoCNet achieves competitive performance in overall accuracy and shows clear improvements in challenging conditions such as night scenes and complex intersections. Additional evaluation on the Apollo Synthetic dataset further confirms the robustness and cross-domain generalization of the proposed framework. These results underscore the effectiveness of jointly leveraging geometry and contextual cues for accurate and reliable monocular 3D lane detection. Our code has been released at https://github.com/chunyingsong/GeoCNet.
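As one way to picture the "curvature-regularized" lane sequences produced by SALD, the sketch below implements a generic second-difference smoothness penalty over ordered 3D lane points; the abstract does not specify the paper's exact regularizer, so this is an assumption.

```python
import torch

def curvature_penalty(lane_pts: torch.Tensor) -> torch.Tensor:
    """Second-difference smoothness penalty over ordered 3D lane points (N, 3).

    Large second differences mean abrupt bends; minimizing this term encourages
    smoothly curving lane reconstructions.
    """
    d2 = lane_pts[2:] - 2.0 * lane_pts[1:-1] + lane_pts[:-2]
    return (d2 ** 2).sum(dim=-1).mean()

pts = torch.tensor([[0.0, 0.0, 0.0],
                    [1.0, 0.1, 0.0],
                    [2.0, 0.4, 0.0],
                    [3.0, 0.9, 0.0]])
print(curvature_penalty(pts))  # small value for a nearly straight lane
```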
{"title":"Monocular 3D lane detection with geometry-guided transformation and contextual enhancement","authors":"Chunying Song, Qiong Wang, Zeren Sun, Huafeng Liu","doi":"10.1016/j.patrec.2025.11.041","DOIUrl":"10.1016/j.patrec.2025.11.041","url":null,"abstract":"<div><div>Monocular 3D lane detection is a critical yet challenging task in autonomous driving, largely due to the lack of depth cues, complex road geometries, and appearance variations in real-world environments. Existing approaches often depend on bird’s-eye-view transformations or rigid geometric assumptions, which may introduce projection artifacts and hinder generalization. In this paper, we present GeoCNet, a BEV-free framework that directly estimates 3D lanes in the perspective domain. The architecture incorporates three key components: a Geometry-Guided Spatial Transformer (GST) for adaptive multi-plane ground modeling, a Perception-Aware Feature Modulation (PFM) module for context-driven feature refinement, and a Structure-Aware Lane Decoder (SALD) that reconstructs lanes as curvature-regularized anchor-aligned sequences. Extensive experiments on the OpenLane dataset demonstrate that GeoCNet achieves competitive performance in overall accuracy and shows clear improvements in challenging conditions such as night scenes and complex intersections. Additional evaluation on the Apollo Synthetic dataset further confirms the robustness and cross-domain generalization of the proposed framework. These results underscore the effectiveness of jointly leveraging geometry and contextual cues for accurate and reliable monocular 3D lane detection. Our code has been released at <span><span>https://github.com/chunyingsong/GeoCNet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"199 ","pages":"Pages 278-284"},"PeriodicalIF":3.3,"publicationDate":"2025-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145684597","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DBIDM: Implementing blind image separation through a dual branch interactive diffusion model
Pub Date: 2025-11-26 · DOI: 10.1016/j.patrec.2025.11.038
Jiaxin Gong, Jindong Xu, Haoqin Sun
In the field of Blind Image Separation (BIS), typical applications include rain/snow removal and reflection/shadow layer separation. However, existing BIS methods generally rely on handcrafted priors or on CNN- and GAN-based variants, which struggle to describe the complex and variable feature distributions of source images in real scenes; under strong noise, nonlinear mixing, and highly coupled texture details, this leads to biased source-separation estimates, texture distortion, and residual artifacts. To address this issue, this paper introduces diffusion models into BIS and proposes an efficient Dual Branch Interactive Diffusion Model (DBIDM). DBIDM employs a conditional diffusion model to learn the feature distribution of the source images and performs an initial reconstruction of their feature structures. Furthermore, considering that the two source images are mutually coupled with noise, we designed a Wavelet Interactive Decoupling Module (WIDM). This module is integrated into the diffusion denoising process to improve the separation of detailed information in mixed images. Experiments on synthetic datasets containing rain/snow and complex mixed interference demonstrate that the proposed DBIDM method achieves breakthrough performance in both image restoration and blind separation tasks. Specifically, in single-source degraded scenarios, DBIDM reaches 35.0023 dB (PSNR) and 0.9549 (SSIM) in the rain removal task, outperforming the comparison methods by an average of 1.2570 dB and 0.0262. For the snow removal task, improvements of 0.9272 dB and 0.0289 over the second-best results are also achieved. In complex dual blind separation scenarios, the restored dual-source images significantly surpass other methods in texture fidelity and detail integrity, with improvements of 4.1249 dB in PSNR and 0.0926 in SSIM. This effectively addresses the information loss and artifact remnants caused by complex coupling interference.
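To illustrate the wavelet-domain decoupling idea behind WIDM, the sketch below decomposes two coupled estimates into Haar sub-bands with PyWavelets and exchanges their detail bands before reconstruction. The exchange rule is illustrative only, not DBIDM's actual module.

```python
import numpy as np
import pywt

def wavelet_exchange(est_a: np.ndarray, est_b: np.ndarray):
    """Split two coupled source estimates into Haar sub-bands and swap details."""
    cA_a, details_a = pywt.dwt2(est_a, "haar")   # details = (cH, cV, cD)
    cA_b, details_b = pywt.dwt2(est_b, "haar")
    # Each reconstruction keeps its own low-frequency approximation band but
    # takes the other's high-frequency detail bands -- an interaction step.
    rec_a = pywt.idwt2((cA_a, details_b), "haar")
    rec_b = pywt.idwt2((cA_b, details_a), "haar")
    return rec_a, rec_b

a, b = np.random.rand(64, 64), np.random.rand(64, 64)
rec_a, rec_b = wavelet_exchange(a, b)
```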
{"title":"DBIDM: Implementing blind image separation through a dual branch interactive diffusion model","authors":"Jiaxin Gong, Jindong Xu, Haoqin Sun","doi":"10.1016/j.patrec.2025.11.038","DOIUrl":"10.1016/j.patrec.2025.11.038","url":null,"abstract":"<div><div>In the field of Blind Image Separation (BIS), typical applications include rain/snow removal and reflection/shadow layer separation. However, the existing BIS methods generally rely on artificial priors or variants based on CNN or GAN, which are difficult to describe the complex and variable feature distribution of the source image in the real scene, resulting in defects such as source image separation estimation bias, texture distortion and artifact residue in the case of strong noise, nonlinear mixing and high coupling of texture details. To address this issue, this paper innovatively introduces diffusion models into BIS and proposes an efficient Dual Branch Interactive Diffusion Model (DBIDM). DBIDM employs a conditional diffusion model to learn the feature distribution of source images and performs an initial reconstruction of source image feature structures. Furthermore, considering that the two source images are mutually coupled with noise, we designed a Wavelet Interactive Decoupling Module (WIDM). This module is integrated into the diffusion denoising process to improve the separation of detailed information in mixed images. Experiments on synthetic datasets containing rain/snow and complex mixed interference demonstrate that the proposed DBIDM method achieves breakthrough performance in both image restoration and blind separation tasks. Specifically, in single-source degraded scenarios, DBIDM reaches optimal levels of 35.0023 dB (PSNR) and 0.9549 (SSIM) in the rain removal task. It outperforms comparison method by an average of 1.2570 dB and 0.0262. For the snow removal task, improvements of 0.9272 dB and 0.0289 over the second-best indicators are also achieved. In complex dual blind separation scenarios, the restored dual-source images significantly surpass other methods in terms of texture fidelity and detail integrity. There are improvements of 4.1249 dB in PSNR and 0.0926 in SSIM. This effectively addresses the issues of information loss and artifact remnants caused by complex coupling interference.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"200 ","pages":"Pages 44-51"},"PeriodicalIF":3.3,"publicationDate":"2025-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145748995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Assessing demographic bias in brain age prediction models using multiple deep learning paradigms
Pub Date: 2025-11-25 · DOI: 10.1016/j.patrec.2025.11.029
Michela Gravina , Giuseppe Pontillo , Zeena Shawa , James H. Cole , Carlo Sansone
Predicting brain age from structural Magnetic Resonance Imaging (MRI) has emerged as a critical task at the intersection of medical imaging and Artificial Intelligence, with Deep learning (DL) models achieving state-of-the-art performance. However, despite their predictive power, such models remain susceptible to algorithmic bias, especially when applied to populations whose demographic characteristics differ from those seen during training. In this paper, we investigate how demographic factors influence the performance of brain age prediction models. We leverage a large, demographically diverse MRI dataset including 7480 healthy subjects (3599 female and 3881 male) spanning three major racial groups: White, Black, and Asian. To explore the effects of data composition and model architecture on generalization, we design and compare multiple training paradigms, including models trained on single group and a Multi-Input architecture that explicitly incorporates demographic metadata. Results on an external test set including 3194 subjects (2162 White, 694 Black, and 338 Asian) reveal evidence of demographic bias, with the Multi-Input model achieving the most balanced performance across groups (mean absolute error: 2.94 ± 0.07 for White, 2.91 ± 0.16 for Black, and 3.34 ± 0.17 for Asian subjects). These findings highlight the need for fairness-aware approaches, advocating for strategies that mitigate bias, and enhance generalizability.
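The Multi-Input paradigm described above feeds demographic metadata alongside the image. One common realization, sketched below, concatenates demographic covariates with imaging features before the regression head; the backbone, feature sizes, and metadata encoding are assumptions, not the paper's architecture.

```python
import torch
from torch import nn

class MultiInputBrainAge(nn.Module):
    """Imaging branch plus demographic covariates, fused before regression."""

    def __init__(self, img_feat_dim: int = 128, demo_dim: int = 4):
        super().__init__()
        self.backbone = nn.Sequential(          # stand-in for a 3D CNN over MRI
            nn.Conv3d(1, 8, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
            nn.Linear(8, img_feat_dim), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Linear(img_feat_dim + demo_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, mri: torch.Tensor, demographics: torch.Tensor) -> torch.Tensor:
        z = self.backbone(mri)                                 # (B, img_feat_dim)
        return self.head(torch.cat([z, demographics], dim=1)).squeeze(1)

model = MultiInputBrainAge()
# Toy volumes (B, 1, D, H, W) and encoded covariates such as sex and race.
ages = model(torch.randn(2, 1, 32, 32, 32), torch.randn(2, 4))  # (B,) predictions
```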
{"title":"Assessing demographic bias in brain age prediction models using multiple deep learning paradigms","authors":"Michela Gravina , Giuseppe Pontillo , Zeena Shawa , James H. Cole , Carlo Sansone","doi":"10.1016/j.patrec.2025.11.029","DOIUrl":"10.1016/j.patrec.2025.11.029","url":null,"abstract":"<div><div>Predicting brain age from structural Magnetic Resonance Imaging (MRI) has emerged as a critical task at the intersection of medical imaging and Artificial Intelligence, with Deep learning (DL) models achieving state-of-the-art performance. However, despite their predictive power, such models remain susceptible to algorithmic bias, especially when applied to populations whose demographic characteristics differ from those seen during training. In this paper, we investigate how demographic factors influence the performance of brain age prediction models. We leverage a large, demographically diverse MRI dataset including 7480 healthy subjects (3599 female and 3881 male) spanning three major racial groups: White, Black, and Asian. To explore the effects of data composition and model architecture on generalization, we design and compare multiple training paradigms, including models trained on single group and a Multi-Input architecture that explicitly incorporates demographic metadata. Results on an external test set including 3194 subjects (2162 White, 694 Black, and 338 Asian) reveal evidence of demographic bias, with the Multi-Input model achieving the most balanced performance across groups (mean absolute error: 2.94 ± 0.07 for White, 2.91 ± 0.16 for Black, and 3.34 ± 0.17 for Asian subjects). These findings highlight the need for fairness-aware approaches, advocating for strategies that mitigate bias, and enhance generalizability.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"199 ","pages":"Pages 246-253"},"PeriodicalIF":3.3,"publicationDate":"2025-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145617786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}