
Pattern Recognition Letters: Latest Publications

TriGAN-SiaMT: A triple-segmentor adversarial network with bounding box priors for semi-supervised brain lesion segmentation
IF 3.3 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-11-29 | DOI: 10.1016/j.patrec.2025.11.032
Mohammad Alshurbaji , Maregu Assefa , Ahmad Obeid , Mohamed L. Seghier , Taimur Hassan , Kamal Taha , Naoufel Werghi
Accurate brain lesion segmentation in MRI is critical for clinical decision-making, but pixel-wise annotations remain costly and time-consuming. We propose TriGAN-SiaMT, a novel semi-supervised segmentation framework that combines adversarial learning, consistency regularization, and bounding box priors. Our architecture comprises three segmentors (S0, S1, S2) and two discriminators (D0, D1). It includes: (1) a supervised branch (S0D0) trained on a small labeled subset; (2) a Siamese branch (S1D1) with an identical architecture to S0D0, but trained on unlabeled data; and (3) a teacher branch (S2) updated via exponential moving average (EMA) from S1, following the Mean Teacher (MT) paradigm. The teacher S2 generates pseudo-labels to supervise S1. It also provides soft segmentations to guide D1, which does not see any labeled data. The model enforces consistency at multiple levels: between S0 and S1 (Siamese consistency), and between S1 and S2 (EMA consistency). Bounding box priors are incorporated as weak supervision for both labeled and unlabeled images, improving lesion localization. Evaluated on the ISLES 2022 and BraTS 2019 datasets, TriGAN-SiaMT achieves DSC scores of 84.80 % and 86.32 %, respectively, using only 5 % labeled data. These results demonstrate strong performance under limited supervision and robust generalization across brain lesions.
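The Mean Teacher EMA update the abstract refers to can be sketched in a few lines; the flat-list parameter representation and the decay value below are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of the Mean Teacher exponential-moving-average update:
# the teacher S2's parameters track an EMA of the student S1's
# parameters. Parameters are shown as flat lists of floats for clarity;
# the decay value is an assumed, typical choice.

def ema_update(teacher_params, student_params, decay=0.99):
    """Return updated teacher parameters: t <- decay*t + (1-decay)*s."""
    return [decay * t + (1.0 - decay) * s
            for t, s in zip(teacher_params, student_params)]

teacher = [0.0, 1.0]
student = [1.0, 0.0]
teacher = ema_update(teacher, student, decay=0.9)
```

Because the teacher only averages student weights, it receives no gradients of its own, which is what makes its pseudo-labels comparatively stable.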
Pattern Recognition Letters, Volume 200, Pages 37-43.
Citations: 0
CAMN-FSOD: Class-aware memory network for few-shot infrared object detection
IF 3.3 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-11-29 | DOI: 10.1016/j.patrec.2025.11.033
Jing Hu , Hengkang Ye , Weiwei Zhong , Zican Shi , Yifan Chen , Jie Ren , Xiaohui Zhu , Li Fan
Cross-Domain Few-Shot Object Detection (CD-FSOD) from visible to infrared domains faces a critical challenge: object classification proves significantly more error-prone than localization under fine-tuning adaptation. This stems from substantial representational discrepancies in internal object features between domains, which hinder effective transfer. To enhance the saliency of infrared internal object features and mitigate classification errors in few-shot visible-to-infrared transfer, we propose the Class-Aware Memory Network for Few-Shot Object Detection (CAMN-FSOD). CAMN explicitly memorizes high-quality internal object features during fine-tuning and leverages this memory to augment features, boosting recognition accuracy during inference. Furthermore, we introduce our two-stage Decoupled-Coupled Fine-tuning approach (DCFA) to combat CAMN overfitting in few-shot training and maximize its effectiveness. We establish a visible-infrared FSOD benchmark dataset for evaluation. Extensive experiments demonstrate that CAMN-FSOD significantly enhances the few-shot learning capability of the base model without increasing trainable parameters. In the 1-shot setting, our method achieves 42.0 mAP50, which is 14.4 points higher than the baseline, and an overall mAP of 25.2, an improvement of 2.3 points, outperforming existing methods.
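A class-keyed feature memory of the kind the abstract describes might look like the following minimal sketch: store a running prototype per class during fine-tuning, then blend it into query features at inference. The momentum and blend weight are our assumptions; the paper's actual CAMN design may differ.

```python
# Illustrative class-aware memory: one EMA prototype per class, used to
# augment query features at inference time. Features are plain lists of
# floats; `momentum` and `alpha` are assumed hyperparameters.

class ClassMemory:
    def __init__(self, momentum=0.9):
        self.momentum = momentum
        self.proto = {}  # class id -> prototype feature vector

    def write(self, cls, feat):
        """EMA-update the stored prototype for this class."""
        if cls not in self.proto:
            self.proto[cls] = list(feat)
        else:
            m = self.momentum
            self.proto[cls] = [m * p + (1 - m) * f
                               for p, f in zip(self.proto[cls], feat)]

    def augment(self, cls, feat, alpha=0.5):
        """Blend a query feature with the class prototype, if one exists."""
        if cls not in self.proto:
            return list(feat)
        return [(1 - alpha) * f + alpha * p
                for f, p in zip(feat, self.proto[cls])]
```

Because the memory is read-only at inference, it adds no trainable parameters, consistent with the abstract's claim.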
Pattern Recognition Letters, Volume 200, Pages 16-22.
Citations: 0
P-RoPE: A polar-based rotary position embedding for polar transformed images in rotation-invariant tasks
IF 3.3 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-11-29 | DOI: 10.1016/j.patrec.2025.11.037
Stavros N. Moutsis , Konstantinos A. Tsintotas , Ioannis Kansizoglou , Antonios Gasteratos
Rotation-invariant frameworks are crucial in many computer vision tasks, such as human action recognition (HAR), especially when applied in real-world scenarios. Since most datasets, including those on fall detection, have been generated in controlled environments with fixed camera angles, heights, and movements, approaches developed to address such tasks tend to fail when individual appearance variations occur. To address this challenge, our study proposes the use of the EVA-02-Ti lightweight vision transformer for processing people’s polar mappings and handling the task of fall detection. In particular, we strive to leverage the transformation’s rotation-invariant characteristic and correctly classify the rotated images. Towards this goal, a polar-based rotary position embedding (P-RoPE), which generates relative positions among polar patches according to the r and θ axes instead of the Cartesian x and y axes, is presented. Replacing the original RoPE with P-RoPE enhances the ViT’s performance, as demonstrated in our experimental protocol, and also outperforms a state-of-the-art approach. An evaluation was conducted on E-FPDS and VFP290k, where training was performed on initial images and testing was performed on the rotated ones. Finally, when assessed on Fashion-MNIST-rot-12k, a standard dataset for rotation-invariant scenarios, P-RoPE again surpasses both the baseline version and another benchmark method.
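The axial idea behind P-RoPE, applying rotary position embeddings along the r and θ axes rather than Cartesian x and y, can be sketched as follows. The pair-wise rotation and frequency base are standard-RoPE-style assumptions, not the paper's exact formulation.

```python
import math

# Sketch of an axial rotary position embedding over polar coordinates:
# the first half of the vector is rotated by angles derived from r, the
# second half by angles derived from theta. Frequencies follow the
# common RoPE convention base**(-i/d); this is illustrative only.

def rope_rotate(vec, pos, base=10000.0):
    """Rotate consecutive pairs of `vec` by angles pos / base**(i/d)."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        angle = pos / (base ** (i / d))
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out

def p_rope(vec, r, theta):
    """Axial variant: first half encodes r, second half encodes theta."""
    h = len(vec) // 2
    return rope_rotate(vec[:h], r) + rope_rotate(vec[h:], theta)
```

As with standard RoPE, the rotations preserve vector norms, so attention scores depend only on relative (r, θ) offsets between patches.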
Pattern Recognition Letters, Volume 200, Pages 23-29.
Citations: 0
The uncertainty advantage: Enhancing large language models’ reliability through chain of uncertainty reasoning
IF 3.3 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-11-28 | DOI: 10.1016/j.patrec.2025.11.040
Zirong Peng , Xiaoming Liu , Guan Yang , Jie Liu , Xueping Peng , Yang Long
The rapid evolution of large language models (LLMs) has significantly advanced the capabilities of natural language processing (NLP), enabling a broad range of applications from text generation to complex problem-solving. However, these models often struggle with verifying the reliability of their outputs for complex tasks. Chain-of-Thought (CoT) reasoning, a technique that asks LLMs to generate step-by-step reasoning paths, attempts to address the challenge by making reasoning steps explicit, yet it falls short when assumptions of process faithfulness are unmet, leading to inaccuracies. This reveals a critical gap: the absence of a mechanism to handle inherent uncertainties in reasoning processes. To bridge this gap, we propose a novel approach, the Chain of Uncertainty Reasoning (CUR), which integrates uncertainty management into LLMs’ reasoning. CUR employs prompt-based techniques to express uncertainty effectively and leverages a structured approach to introduce uncertainty through a small number of samples. This enables the model to self-assess its uncertainty and adapt to different perspectives, thus enhancing the faithfulness of its outputs. Experimental results on the datasets of StrategyQA, HotpotQA, and FEVER demonstrate that our method significantly improves performance compared to baselines, confirming the utility of incorporating uncertainty into LLM reasoning processes. This approach offers a promising direction for enhancing the reliability and trustworthiness of LLMs’ applications in various domains. Our code is publicly available at: https://github.com/PengZirong/ChainofUncertaintyReasoning.
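A hedged sketch of the prompt scaffolding such an approach implies: ask the model to tag each reasoning step with a stated uncertainty level. The template wording is our illustration; the paper's actual prompts (available in the linked repository) will differ.

```python
# Illustrative prompt builder for chain-of-uncertainty reasoning: the
# model is asked to annotate every reasoning step with an uncertainty
# level before committing to a final answer. All wording is assumed.

def build_cur_prompt(question, n_steps=3):
    header = (
        "Answer the question by reasoning step by step. After each "
        "step, state your uncertainty about that step as LOW, MEDIUM, "
        "or HIGH.\n"
    )
    steps = "\n".join(
        f"Step {i + 1}: <reasoning> (uncertainty: <LOW|MEDIUM|HIGH>)"
        for i in range(n_steps)
    )
    return f"{header}\nQuestion: {question}\n{steps}\nFinal answer:"
```

The stated uncertainties can then be parsed back out of the model's response to flag low-confidence reasoning chains.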
Pattern Recognition Letters, Volume 200, Pages 30-36.
Citations: 0
Function-based labels for complementary recommendation: Definition, annotation, and LLM-as-a-Judge
IF 3.3 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-11-28 | DOI: 10.1016/j.patrec.2025.11.042
Chihiro Yamasaki, Kai Sugahara, Yuma Nagi, Kazushi Okamoto
Complementary recommendations enhance the user experience by suggesting items that are frequently purchased together while serving different functions from the query item. Inferring or evaluating whether two items have a complementary relationship requires complementary relationship labels; however, defining these labels is challenging because of the inherent ambiguity of such relationships. Complementary labels based on user historical behavior logs attempt to capture these relationships, but often produce inconsistent and unreliable results. Recent efforts have introduced large language models (LLMs) to infer these relationships. However, these approaches provide a binary classification without a nuanced understanding of complementary relationships. In this study, we address these challenges by introducing Function-Based Labels (FBLs), a novel definition of complementary relationships independent of user purchase logs and the opaque decision processes of LLMs. We constructed a human-annotated FBLs dataset comprising 2759 item pairs and demonstrated that it covered possible item relationships and minimized ambiguity. We then evaluated whether machine learning methods using annotated FBLs could accurately infer labels for unseen item pairs, and whether LLM-generated complementary labels align with human perception. Among machine learning methods, ModernBERT achieved the highest performance with a Macro-F1 of 0.911, demonstrating accuracy and robustness even under limited supervision. For LLMs, GPT-4o-mini achieved high consistency (0.989) and classification accuracy (0.849) under the detailed FBL definition, while requiring only 1/842 the cost and 1/75 the time of human annotation. Overall, our study presents FBLs as a clear definition of complementary relationships, enabling more accurate inferences and automated labeling of complementary recommendations.
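The Macro-F1 metric reported in this abstract (0.911 for ModernBERT) averages per-class F1 scores without class weighting; a minimal pure-Python illustration:

```python
# Macro-F1: compute precision, recall, and F1 per class, then take the
# unweighted mean across classes. This matches the standard definition;
# any evaluation library's macro-averaged F1 computes the same thing.

def macro_f1(y_true, y_pred):
    classes = set(y_true) | set(y_pred)
    f1s = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)
```

Because every class counts equally, macro averaging rewards balanced performance across rare and common label types, which matters for imbalanced relation labels.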
Pattern Recognition Letters, Volume 200, Pages 8-15.
Citations: 0
End-to-end interactive joint model: Clause-phrase multi-task learning for suicidal ideation cause extraction (SICE) in Chinese Weibo text
IF 3.3 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-11-27 | DOI: 10.1016/j.patrec.2025.11.036
Qi Fu , Yuhao Zhang , Dexi Liu , Liyuan Zhang , Wenzhong Peng
Suicide prevention has been a critical research focus for governments, mental health professionals, and social work researchers worldwide. With the increasing number of individuals seeking help through social networks and psychological counseling platforms, timely analysis of the causes of Suicidal Ideation (SI) in help-seeking texts can provide scientific evidence and actionable insights for suicide prevention efforts. Existing approaches face challenges: (i) suicidal ideation cause (SIC) clause extraction is coarse-grained and thus imprecise in localization; (ii) SIC phrase extraction is more precise but inherently harder. To address this, we propose an end-to-end interactive joint model (EIJM) based on a clause-phrase multi-task learning (MTL) framework, where SIC phrase extraction serves as the main task and SIC clause extraction as the auxiliary task. By leveraging joint learning, EIJM enhances extraction accuracy while reducing task difficulty. Experimental results demonstrate that EIJM outperforms the two-stage independent multi-task (2SIM) approach across multiple evaluation metrics. Specifically, in the SIC phrase extraction task, EIJM achieves a 1.1 % improvement in recall over 2SIM without compromising precision. In the SIC clause extraction task, EIJM improves precision, recall, and F1-score by 0.4 %, 0.9 %, and 0.7 %, respectively. Furthermore, in 2SIM, incorporating clause-level representations from the auxiliary task into the main task enhances local matching and fuzzy matching metrics, with the Fuzzy Match method improving the most by 0.9 %.
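A main-task/auxiliary-task objective of the kind this abstract describes is conventionally a main loss plus a down-weighted auxiliary loss; the combination below is a common MTL convention we assume for illustration, not the paper's exact formulation.

```python
# Schematic joint objective for clause-phrase multi-task learning:
# the phrase-extraction (main) loss is combined with a down-weighted
# clause-extraction (auxiliary) loss so both heads train end to end
# against a shared encoder. `aux_weight` is an assumed hyperparameter.

def joint_loss(phrase_loss, clause_loss, aux_weight=0.5):
    """Main-task loss plus weighted auxiliary-task loss."""
    return phrase_loss + aux_weight * clause_loss
```

Setting `aux_weight` below 1.0 keeps the easier clause task from dominating gradients intended primarily for the harder phrase task.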
Pattern Recognition Letters, Volume 200, Pages 1-7.
Citations: 0
Special section: CIARP-24
IF 3.3 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-11-27 | DOI: 10.1016/j.patrec.2025.11.039
Sergio A. Velastin , Ruber Hernández-García
The Iberoamerican Congress on Pattern Recognition (CIARP) is a well-established scientific event, endorsed by the International Association for Pattern Recognition (IAPR), that focuses on all aspects of pattern recognition, computer vision, artificial intelligence, data mining, and related areas. Since 1995, it has provided an important forum for researchers in Iberoamerica and beyond for presenting ongoing research, scientific results, and experiences on mathematical models, computational methods, and their applications in areas such as robotics, industry, health, space exploration, telecommunications, document analysis, and natural language processing. CIARP has helped strengthen regional cooperation and has contributed to the development of emerging research groups across Iberoamerica. The 27th edition was held at Universidad Católica del Maule in Talca, Chile, from November 26-29, 2024, and comprised an engaging four-day program of single-track sessions, tutorials, and invited keynotes. I had the privilege to be its Program Chair. As guest editor of this Special Section, I am pleased to introduce fully extended and peer-reviewed versions of the two papers that were awarded best paper prizes at CIARP-24. In the first one, from Argentina and Uruguay, [1] expand their work to describe a multi-sensor approach for automatic precipitation remote sensing detection using Conditional GANs and Recurrent Networks, of special relevance in places where precipitation events are not very common. They integrate satellite infrared brightness temperature (IR-BT) with lightning temporal signals and argue that their proposed architecture achieves better precision than alternative methods. They suggest that their results have potential applications in cyanobacteria bloom event prediction and in setting social policies for water resource management. This is a good example of how pattern recognition research may have a clear impact.
In the second paper, from Chile, [2] extend their previous work and consider the problem of dealing with Out-of-Distribution (OOD) data in text classification. They propose a new method, BBMOE, which fine-tunes pre-trained models using labeled OOD data with a bimodal Beta mixture distribution regularization that enhances the differentiation between near-OOD and far-OOD data in multi-class text classification. Their results show improvements over the state of the art on various datasets. We thank the authors and the reviewers for their thorough work and hope that you enjoy reading these papers and perhaps consider submitting work to a future CIARP.
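The paper's exact regularizer is not reproduced here, but the "bimodal Beta mixture" it builds on is a standard construction: a weighted sum of two Beta densities whose modes sit at opposite ends of the unit interval. A minimal sketch, with illustrative parameters chosen by us (not from the paper):

```python
import math

def beta_pdf(x, a, b):
    """Beta(a, b) density on (0, 1), using the stdlib Gamma function."""
    coef = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return coef * x ** (a - 1) * (1.0 - x) ** (b - 1)

def bimodal_beta_mixture_pdf(x, w=0.5, a1=2.0, b1=8.0, a2=8.0, b2=2.0):
    """Two-component Beta mixture: one mode near 0, one near 1.
    Parameters here are hypothetical, purely for illustration."""
    return w * beta_pdf(x, a1, b1) + (1.0 - w) * beta_pdf(x, a2, b2)

# Midpoint Riemann sum: the mixture density should integrate to ~1 on (0, 1).
n = 10000
total = sum(bimodal_beta_mixture_pdf((i + 0.5) / n) for i in range(n)) / n
```

The two modes give the mixture a natural way to score "near" versus "far" samples on a bounded confidence-like quantity, which is the intuition behind using it to separate near-OOD from far-OOD data.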
Special section: CIARP-24. Sergio A. Velastin, Ruber Hernández-García. Pattern Recognition Letters, vol. 200, p. 149. Pub Date: 2025-11-27. DOI: 10.1016/j.patrec.2025.11.039
Monocular 3D lane detection with geometry-guided transformation and contextual enhancement
IF 3.3 CAS Tier 3, Computer Science Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2025-11-26 DOI: 10.1016/j.patrec.2025.11.041
Chunying Song, Qiong Wang, Zeren Sun, Huafeng Liu
Monocular 3D lane detection is a critical yet challenging task in autonomous driving, largely due to the lack of depth cues, complex road geometries, and appearance variations in real-world environments. Existing approaches often depend on bird’s-eye-view transformations or rigid geometric assumptions, which may introduce projection artifacts and hinder generalization. In this paper, we present GeoCNet, a BEV-free framework that directly estimates 3D lanes in the perspective domain. The architecture incorporates three key components: a Geometry-Guided Spatial Transformer (GST) for adaptive multi-plane ground modeling, a Perception-Aware Feature Modulation (PFM) module for context-driven feature refinement, and a Structure-Aware Lane Decoder (SALD) that reconstructs lanes as curvature-regularized anchor-aligned sequences. Extensive experiments on the OpenLane dataset demonstrate that GeoCNet achieves competitive performance in overall accuracy and shows clear improvements in challenging conditions such as night scenes and complex intersections. Additional evaluation on the Apollo Synthetic dataset further confirms the robustness and cross-domain generalization of the proposed framework. These results underscore the effectiveness of jointly leveraging geometry and contextual cues for accurate and reliable monocular 3D lane detection. Our code has been released at https://github.com/chunyingsong/GeoCNet.
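The decoder above reconstructs lanes as "curvature-regularized anchor-aligned sequences". The paper's exact loss is not given here, but a common way to regularize curvature on a discretized lane polyline is to penalize squared second differences of the coordinates along the anchor direction. A hedged sketch of that smoothness prior (our assumption, not GeoCNet's verified formulation):

```python
def curvature_penalty(points):
    """Discrete curvature regularizer for a lane polyline: sum of squared
    second differences over each coordinate dimension. Straight lanes
    score zero; kinks and bends are penalized."""
    penalty = 0.0
    for i in range(1, len(points) - 1):
        for d in range(len(points[0])):
            second_diff = points[i - 1][d] - 2.0 * points[i][d] + points[i + 1][d]
            penalty += second_diff ** 2
    return penalty

# A straight lane has zero penalty; a kinked one does not.
straight = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
kinked   = [(0.0, 0.0), (1.0, 1.0), (2.0, 3.0), (3.0, 3.0)]
```

Adding such a term to the training objective biases predicted lanes toward physically plausible, smoothly curving geometry, which matters most in the night-scene and intersection cases the abstract highlights.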
Pattern Recognition Letters, vol. 199, pp. 278-284.
DBIDM: Implementing blind image separation through a dual branch interactive diffusion model
IF 3.3 CAS Tier 3, Computer Science Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2025-11-26 DOI: 10.1016/j.patrec.2025.11.038
Jiaxin Gong, Jindong Xu, Haoqin Sun
In the field of Blind Image Separation (BIS), typical applications include rain/snow removal and reflection/shadow layer separation. However, existing BIS methods generally rely on handcrafted priors or on CNN- or GAN-based variants, which struggle to describe the complex and variable feature distributions of source images in real scenes; this leads to defects such as biased source-separation estimates, texture distortion, and residual artifacts under strong noise, nonlinear mixing, and highly coupled texture details. To address this issue, this paper innovatively introduces diffusion models into BIS and proposes an efficient Dual Branch Interactive Diffusion Model (DBIDM). DBIDM employs a conditional diffusion model to learn the feature distribution of source images and performs an initial reconstruction of their feature structures. Furthermore, since the two source images are mutually coupled with noise, we designed a Wavelet Interactive Decoupling Module (WIDM), integrated into the diffusion denoising process to improve the separation of detailed information in mixed images. Experiments on synthetic datasets containing rain/snow and complex mixed interference demonstrate that the proposed DBIDM achieves breakthrough performance in both image restoration and blind separation tasks. Specifically, in single-source degraded scenarios, DBIDM reaches 35.0023 dB (PSNR) and 0.9549 (SSIM) on the rain removal task, outperforming comparison methods by an average of 1.2570 dB and 0.0262. On the snow removal task, it improves on the second-best method by 0.9272 dB and 0.0289. In complex dual blind separation scenarios, the restored dual-source images significantly surpass other methods in texture fidelity and detail integrity, with gains of 4.1249 dB in PSNR and 0.0926 in SSIM. This effectively addresses the information loss and residual artifacts caused by complex coupling interference.
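The PSNR figures quoted in this abstract follow the standard definition, PSNR = 10 log10(MAX^2 / MSE). As a reminder of how such dB gains are computed (this is the textbook metric, not the authors' evaluation code):

```python
import math

def psnr(ref, test, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two equal-sized images,
    given here as flat lists of pixel intensities."""
    mse = sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)
    if mse == 0:
        return float("inf")  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_val ** 2 / mse)

# Toy example: a constant offset of 1 on an 8-bit range gives MSE = 1,
# hence PSNR = 10 * log10(255^2) ≈ 48.13 dB.
ref = [10, 20, 30, 40]
test = [11, 21, 31, 41]
```

Because the scale is logarithmic, the reported +4.12 dB in the dual-separation setting corresponds to cutting the mean squared error by roughly a factor of 2.6.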
Pattern Recognition Letters, vol. 200, pp. 44-51.
Assessing demographic bias in brain age prediction models using multiple deep learning paradigms
IF 3.3 CAS Tier 3, Computer Science Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2025-11-25 DOI: 10.1016/j.patrec.2025.11.029
Michela Gravina , Giuseppe Pontillo , Zeena Shawa , James H. Cole , Carlo Sansone
Predicting brain age from structural Magnetic Resonance Imaging (MRI) has emerged as a critical task at the intersection of medical imaging and Artificial Intelligence, with Deep Learning (DL) models achieving state-of-the-art performance. However, despite their predictive power, such models remain susceptible to algorithmic bias, especially when applied to populations whose demographic characteristics differ from those seen during training. In this paper, we investigate how demographic factors influence the performance of brain age prediction models. We leverage a large, demographically diverse MRI dataset including 7480 healthy subjects (3599 female and 3881 male) spanning three major racial groups: White, Black, and Asian. To explore the effects of data composition and model architecture on generalization, we design and compare multiple training paradigms, including models trained on a single group and a Multi-Input architecture that explicitly incorporates demographic metadata. Results on an external test set including 3194 subjects (2162 White, 694 Black, and 338 Asian) reveal evidence of demographic bias, with the Multi-Input model achieving the most balanced performance across groups (mean absolute error: 2.94 ± 0.07 for White, 2.91 ± 0.16 for Black, and 3.34 ± 0.17 for Asian subjects).
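The group-wise mean absolute errors reported above are the key fairness signal in this study: a single aggregate MAE can hide a gap between demographic groups. A minimal sketch of the stratified metric (the standard computation, not the authors' code; the toy data below is entirely hypothetical):

```python
def groupwise_mae(y_true, y_pred, groups):
    """Mean absolute error of age predictions, split by demographic group.
    Surfaces performance gaps that an aggregate MAE would average away."""
    errors = {}
    for t, p, g in zip(y_true, y_pred, groups):
        errors.setdefault(g, []).append(abs(t - p))
    return {g: sum(e) / len(e) for g, e in errors.items()}

# Hypothetical toy data, not from the paper.
ages  = [60, 70, 55, 80]
preds = [62, 69, 58, 77]
grp   = ["White", "White", "Black", "Black"]
# groupwise_mae(ages, preds, grp) → {"White": 1.5, "Black": 3.0}
```

Comparing these per-group values, rather than one pooled error, is what lets the study conclude that the Multi-Input model is the most balanced paradigm.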
Pattern Recognition Letters, vol. 199, pp. 246-253.