
Pattern Recognition Letters: latest articles

P-RoPE: A polar-based rotary position embedding for polar transformed images in rotation-invariant tasks
IF 3.3 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-02-01 | Epub Date: 2025-11-29 | DOI: 10.1016/j.patrec.2025.11.037
Stavros N. Moutsis , Konstantinos A. Tsintotas , Ioannis Kansizoglou , Antonios Gasteratos
Rotation-invariant frameworks are crucial in many computer vision tasks, such as human action recognition (HAR), especially when applied in real-world scenarios. Since most datasets, including those on fall detection, have been generated in controlled environments with fixed camera angles, heights, and movements, approaches developed for such tasks tend to fail when individual appearance variations occur. To address this challenge, our study proposes the use of the EVA-02-Ti lightweight vision transformer for processing people’s polar mappings and handling the task of fall detection. In particular, we strive to leverage the transformation’s rotation-invariant characteristic and correctly classify the rotated images. Towards this goal, a polar-based rotary position embedding (P-RoPE), which generates relative positions among polar patches according to the r and θ axes instead of the Cartesian x and y axes, is presented. Replacing the original RoPE with P-RoPE enhances the ViT’s performance, as demonstrated in our experimental protocol, and also outperforms a state-of-the-art approach. An evaluation was conducted on E-FPDS and VFP290k, where training was performed on the initial images and testing on the rotated ones. Finally, when assessed on Fashion-MNIST-rot-12k, a standard dataset for rotation-invariant scenarios, P-RoPE again surpasses both the baseline version and another benchmark method.
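The core idea of P-RoPE can be sketched with a minimal axial rotary embedding over polar patch indices. The sketch below is an illustrative reconstruction, not the authors' implementation: `rope_1d` applies a standard 1-D rotary embedding, and the hypothetical `p_rope` assigns the first half of the channels to the radial index r and the second half to the angular index θ, so that attention scores depend only on relative (r, θ) offsets:

```python
import numpy as np

def rope_1d(x, pos, base=10000.0):
    """Apply a 1-D rotary embedding to the last dim of x at positions pos."""
    d = x.shape[-1]
    assert d % 2 == 0
    freqs = base ** (-np.arange(0, d, 2) / d)      # (d/2,) per-pair frequencies
    angles = pos[:, None] * freqs[None, :]          # (n, d/2) rotation angles
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., 0::2], x[..., 1::2]             # interleaved channel pairs
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin            # 2-D rotation of each pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def p_rope(x, r_idx, theta_idx):
    """Axial rotary embedding over polar patch coordinates (r, theta)
    instead of Cartesian (x, y): the first half of the channels encodes
    the radial index, the second half the angular index (illustrative)."""
    half = x.shape[-1] // 2
    return np.concatenate(
        [rope_1d(x[..., :half], r_idx), rope_1d(x[..., half:], theta_idx)],
        axis=-1,
    )
```

Because each channel pair is rotated, token norms are preserved and the query/key inner product depends only on the position difference along each polar axis, which is the relative-position property RoPE-style embeddings provide.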
Pattern Recognition Letters, vol. 200, pp. 23-29.
Cited by: 0
Adaptive recursive channel selection for robust decoding of motor imagery EEG signal in patients with intracerebral hemorrhage
IF 3.3 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-02-01 | Epub Date: 2025-12-16 | DOI: 10.1016/j.patrec.2025.12.004
Shengjie Li , Jian Shi , Danyang Chen , Zheng Zhu , Feng Hu , Wei Jiang , Kai Shu , Zheng You , Ping Zhang , Zhouping Tang
In the study of electroencephalography (EEG)-based motor imagery (MI) brain-computer interfaces (BCIs), neurorehabilitation technologies hold significant potential for recovery from intracerebral hemorrhage (ICH). However, the clinical practicality of such systems is considerably reduced by the lengthy setup procedures that an excessive number of channels entails, which hinders the rehabilitation process. Accordingly, this study proposes a channel selection method based on an adaptive recursive learning framework, which establishes a comprehensive evaluation metric by combining time-frequency domain features. Experimental results demonstrate that, when using 37.50% fewer channels, the average accuracy of MI classification increased from 65.44% to 69.28% in healthy subjects and from 65.00% to 67.64% in patients with ICH. This study presents the first EEG-based MI BCI channel selection process specifically designed for ICH patients, paving the way for personalized rehabilitation protocols and facilitating the translation of neurotechnology into clinical practice.
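The recursive selection idea can be illustrated as backward elimination driven by a composite time/frequency score per channel. Everything below is a hypothetical sketch: the scoring function (correlation of per-channel variance and mean spectral power with the class label) is a placeholder, not the paper's evaluation metric:

```python
import numpy as np

def channel_score(trials, labels, ch):
    """Hypothetical composite score for one channel: absolute correlation
    between the class label and (a) signal variance (time domain) and
    (b) mean spectral power (frequency domain)."""
    x = trials[:, ch, :]                                   # (n_trials, n_samples)
    var_feat = x.var(axis=1)
    pow_feat = (np.abs(np.fft.rfft(x, axis=1)) ** 2).mean(axis=1)
    def corr(f):
        f = (f - f.mean()) / (f.std() + 1e-12)
        y = (labels - labels.mean()) / (labels.std() + 1e-12)
        return abs(np.mean(f * y))
    return corr(var_feat) + corr(pow_feat)

def recursive_select(trials, labels, keep):
    """Backward elimination: repeatedly drop the lowest-scoring channel
    until `keep` channels remain."""
    selected = list(range(trials.shape[1]))
    while len(selected) > keep:
        scores = {ch: channel_score(trials, labels, ch) for ch in selected}
        selected.remove(min(scores, key=scores.get))
    return sorted(selected)
```

On synthetic trials where one channel's amplitude depends on the label, this procedure retains that informative channel while discarding the uninformative ones.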
Pattern Recognition Letters, vol. 200, pp. 95-101.
Cited by: 0
Causal-Ex: Causal graph-based micro and macro expression spotting
IF 3.3 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-02-01 | Epub Date: 2025-12-05 | DOI: 10.1016/j.patrec.2025.12.002
Pei-Sze Tan, Sailaja Rajanala, Arghya Pal, Raphaël C.-W. Phan, Huey-Fang Ong
Detecting concealed emotions within apparently normal expressions is crucial for identifying potential mental health issues and facilitating timely support and intervention. The task of spotting macro- and micro-expressions involves predicting the emotional timeline within a video by identifying the onset (the beginning), apex (the peak of emotion), and offset (the end of emotion) frames of the displayed emotions. In particular, closely monitoring the key emotion-conveying regions of the face, namely the foundational muscle-movement cues known as facial action units (AUs), greatly aids the clear identification of micro-expressions. One major roadblock is the inadvertent introduction of biases into the training process, which degrades performance regardless of feature quality. Biases are spurious factors that falsely inflate or deflate performance metrics. For instance, neural networks tend to falsely attribute certain AUs in specific facial regions to particular emotion classes, a phenomenon also termed inductive bias. To remove these false attributions, we must identify and mitigate biases that arise from mere correlation between some features and the output class labels. We hence introduce action-unit causal graphs. Unlike the traditional action-unit graph, which connects AUs based solely on spatial adjacency, the causal AU graph is derived from statistical tests and retains edges between AUs only when there is significant evidence that one AU causally influences another. Our model, named Causal-Ex (Causal-based Expression spotting), employs a fast causal inference algorithm to construct a causal graph of facial regions of interest (ROIs). This enables us to select causally relevant facial action units in the ROIs. Our work demonstrates improvement in overall F1-scores compared to state-of-the-art approaches, with 0.388 on CAS(ME)2 and 0.3701 on the SAMM-Long Video dataset. Our code can be found at: https://github.com/noobasuna/causal_ex.git.
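The edge-pruning idea behind the causal AU graph can be illustrated with a statistical independence test. The paper uses a fast causal inference algorithm with conditional tests; the sketch below shows only the first, simplified step of such constraint-based discovery, keeping an edge between two AUs only when a marginal dependence test rejects independence:

```python
import numpy as np
from math import sqrt, erf

def fisher_z_pvalue(r, n):
    """Two-sided p-value for zero correlation via the Fisher z-transform."""
    z = 0.5 * np.log((1 + r) / (1 - r)) * sqrt(n - 3)
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

def au_edges(au_activations, alpha=0.01):
    """Keep an edge between two AUs only when the dependence test is
    significant at level alpha. au_activations: (n_frames, n_aus) array
    of AU intensities (illustrative simplification of the FCI step)."""
    n, d = au_activations.shape
    corr = np.corrcoef(au_activations, rowvar=False)
    edges = []
    for i in range(d):
        for j in range(i + 1, d):
            if fisher_z_pvalue(corr[i, j], n) < alpha:
                edges.append((i, j))
    return edges
```

A full constraint-based algorithm would additionally run conditional tests given subsets of other AUs and orient the surviving edges; this sketch stops at the unconditional skeleton.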
Pattern Recognition Letters, vol. 200, pp. 52-59.
Cited by: 0
Function-based labels for complementary recommendation: Definition, annotation, and LLM-as-a-Judge
IF 3.3 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-02-01 | Epub Date: 2025-11-28 | DOI: 10.1016/j.patrec.2025.11.042
Chihiro Yamasaki, Kai Sugahara, Yuma Nagi, Kazushi Okamoto
Complementary recommendations enhance the user experience by suggesting items that are frequently purchased together while serving different functions from the query item. Inferring or evaluating whether two items have a complementary relationship requires complementary relationship labels; however, defining these labels is challenging because of the inherent ambiguity of such relationships. Complementary labels based on user historical behavior logs attempt to capture these relationships, but often produce inconsistent and unreliable results. Recent efforts have introduced large language models (LLMs) to infer these relationships. However, these approaches provide a binary classification without a nuanced understanding of complementary relationships. In this study, we address these challenges by introducing Function-Based Labels (FBLs), a novel definition of complementary relationships independent of user purchase logs and the opaque decision processes of LLMs. We constructed a human-annotated FBLs dataset comprising 2759 item pairs and demonstrated that it covered possible item relationships and minimized ambiguity. We then evaluated whether machine learning methods using annotated FBLs could accurately infer labels for unseen item pairs, and whether LLM-generated complementary labels align with human perception. Among machine learning methods, ModernBERT achieved the highest performance with a Macro-F1 of 0.911, demonstrating accuracy and robustness even under limited supervision. For LLMs, GPT-4o-mini achieved high consistency (0.989) and classification accuracy (0.849) under the detailed FBL definition, while requiring only 1/842 the cost and 1/75 the time of human annotation. Overall, our study presents FBLs as a clear definition of complementary relationships, enabling more accurate inferences and automated labeling of complementary recommendations.
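For reference, the Macro-F1 reported for the FBL classifiers is the unweighted mean of per-class F1 scores, so every label contributes equally regardless of its frequency. A minimal sketch:

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)
```

Macro averaging is a sensible choice here because FBL classes need not be balanced, and a frequency-weighted score could hide poor performance on rare relationship types.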
Pattern Recognition Letters, vol. 200, pp. 8-15.
Cited by: 0
DBIDM: Implementing blind image separation through a dual branch interactive diffusion model
IF 3.3 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-02-01 | Epub Date: 2025-11-26 | DOI: 10.1016/j.patrec.2025.11.038
Jiaxin Gong, Jindong Xu, Haoqin Sun
In the field of Blind Image Separation (BIS), typical applications include rain/snow removal and reflection/shadow layer separation. However, existing BIS methods generally rely on handcrafted priors or on CNN- or GAN-based variants, which struggle to describe the complex and variable feature distributions of source images in real scenes; under strong noise, nonlinear mixing, and highly coupled texture details, this leads to defects such as biased source-separation estimates, texture distortion, and residual artifacts. To address this issue, this paper introduces diffusion models into BIS and proposes an efficient Dual Branch Interactive Diffusion Model (DBIDM). DBIDM employs a conditional diffusion model to learn the feature distribution of source images and performs an initial reconstruction of their feature structures. Furthermore, since the two source images are mutually coupled with noise, we design a Wavelet Interactive Decoupling Module (WIDM), integrated into the diffusion denoising process to improve the separation of detailed information in mixed images. Experiments on synthetic datasets containing rain/snow and complex mixed interference demonstrate that DBIDM achieves breakthrough performance in both image restoration and blind separation tasks. Specifically, in single-source degraded scenarios, DBIDM reaches optimal levels of 35.0023 dB (PSNR) and 0.9549 (SSIM) in the rain-removal task, outperforming the comparison methods by an average of 1.2570 dB and 0.0262. For the snow-removal task, improvements of 0.9272 dB and 0.0289 over the second-best results are also achieved. In complex dual blind-separation scenarios, the restored dual-source images significantly surpass other methods in texture fidelity and detail integrity, with improvements of 4.1249 dB in PSNR and 0.0926 in SSIM. This effectively addresses the information loss and artifact remnants caused by complex coupling interference.
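The PSNR figures quoted above follow the standard definition; a minimal sketch, assuming images are scaled to [0, 1]:

```python
import numpy as np

def psnr(ref, est, peak=1.0):
    """Peak signal-to-noise ratio in dB between a reference image and
    an estimate; higher is better, infinite for a perfect match."""
    mse = np.mean((np.asarray(ref, dtype=np.float64)
                   - np.asarray(est, dtype=np.float64)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(peak ** 2 / mse)
```

For 8-bit images one would pass `peak=255.0`; the reported SSIM values come from a separate structural-similarity metric not sketched here.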
Pattern Recognition Letters, vol. 200, pp. 44-51.
Cited by: 0
Towards robust and reliable multi-modal 3D segmentation of multiple sclerosis lesions
IF 3.3 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-02-01 | Epub Date: 2025-12-24 | DOI: 10.1016/j.patrec.2025.12.008
Edoardo Coppola , Mattia Savardi , Alberto Signoroni
Accurate 3D segmentation of multiple sclerosis lesions is critical for clinical practice, yet existing approaches face key limitations: many models rely on 2D architectures or partial modality combinations, while others struggle to generalise across scanners and protocols. Although large-scale, multi-site training can improve robustness, its data demands are often prohibitive. To address these challenges, we propose a 3D multi-modal network that simultaneously processes T1-weighted, T2-weighted, and FLAIR scans, leveraging full cross-modal interactions and volumetric context to achieve state-of-the-art performance across four diverse public datasets. To tackle data scarcity, we quantify the minimal fine-tuning effort needed to adapt to individual unseen datasets and reformulate the few-shot learning paradigm at an “instance-per-dataset” level (rather than traditional “instance-per-class”), enabling the quantification of the minimal fine-tuning effort to adapt to multiple unseen sources simultaneously. Finally, we introduce Latent Distance Analysis, a novel label-free reliability estimation technique that anticipates potential distribution shifts and supports any form of test-time adaptation, thereby strengthening efficient robustness and physicians’ trust.
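Latent Distance Analysis is described as label-free reliability estimation from latent features. The sketch below is an illustrative stand-in, not the paper's formulation: it scores a test embedding by its Mahalanobis distance to the training-feature distribution, so an unusually large distance hints at a scanner or protocol shift before any segmentation labels are available:

```python
import numpy as np

def fit_latent_stats(train_feats):
    """Mean and inverse (regularized) covariance of training latent features."""
    mu = train_feats.mean(axis=0)
    cov = np.cov(train_feats, rowvar=False) + 1e-6 * np.eye(train_feats.shape[1])
    return mu, np.linalg.inv(cov)

def latent_distance(feat, mu, cov_inv):
    """Mahalanobis distance of one test embedding to the training
    distribution; larger values flag a likely distribution shift."""
    d = feat - mu
    return float(np.sqrt(d @ cov_inv @ d))
```

In practice such a score could gate test-time adaptation: volumes whose embeddings fall far from the training distribution are flagged for adaptation or for review by a physician.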
Pattern Recognition Letters, vol. 200, pp. 115-122.
Cited by: 0
Bimodal beta mixture distribution for enhanced OOD inner-differentiation in multi-class text classification
IF 3.3 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-02-01 | Epub Date: 2025-11-08 | DOI: 10.1016/j.patrec.2025.11.015
Camilo Maldonado , Carlos Valle , Héctor Allende
Text classification models often struggle with out-of-distribution (OOD) data due to temporal variability and the closed-world assumption. While distinguishing in-distribution from OOD data is well studied, differentiating between valuable near-OOD data (potential new classes) and anomalous far-OOD data remains challenging. We propose BBMOE, a method that fine-tunes pre-trained models using labeled OOD data with a bimodal Beta mixture distribution regularization. This approach leverages the Beta distribution’s bounded support and shape flexibility to enhance near-OOD versus far-OOD differentiation without compromising multi-class classification or OOD detection capabilities. Experiments with RoBERTa demonstrate improved OOD differentiation, and Low-Rank Adaptation (LoRA) reduces training time by about 32% while maintaining performance. We also analyze the relationship between class count and detection performance, and compare the Total Variation distance with the KL divergence. Results using the AUROC and FPR@95 metrics confirm the method’s robustness for real-world text classification applications.
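A bimodal two-component Beta mixture over bounded scores in (0, 1) can be sketched as follows; the mixture weight and shape parameters here are illustrative defaults, not the fitted values from the paper:

```python
import numpy as np
from math import lgamma, exp

def beta_pdf(x, a, b):
    """Beta(a, b) density on (0, 1), via a log-gamma normalizing constant."""
    logc = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(logc) * x ** (a - 1) * (1 - x) ** (b - 1)

def bimodal_beta_mixture(x, w=0.5, a1=2.0, b1=8.0, a2=8.0, b2=2.0):
    """Two-component Beta mixture with one mode near 0 (e.g. far-OOD
    scores) and one near 1 (e.g. near-OOD scores). Because the Beta
    distribution has bounded support, both modes stay inside (0, 1)."""
    return w * beta_pdf(x, a1, b1) + (1 - w) * beta_pdf(x, a2, b2)
```

Fitting such a mixture to held-out scores (for example by EM) would separate the two OOD sub-populations that a single unimodal density cannot distinguish.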
Pattern Recognition Letters, vol. 200, pp. 158–164.
Citations: 0
MUSIC: Multi-coil unified sparsity regularization using inter-slice correlation for arterial spin labeling MRI denoising
IF 3.3 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-02-01 · Epub Date: 2025-12-24 · DOI: 10.1016/j.patrec.2025.12.009
Hangfan Liu, Bo Li, Yiran Li, Manuel Taso, Dylan Tisdall, Yulin Chang, John A Detre, Ze Wang
Arterial spin labeling (ASL) perfusion MRI stands as the sole non-invasive method to quantify regional cerebral blood flow (CBF), a crucial physiological parameter. However, ASL MRI typically suffers from a relatively low signal-to-noise ratio. In this study, we introduce a novel ASL denoising approach termed Multi-coil Unified Sparsity regularization using Inter-slice Correlation (MUSIC). While MRI, including ASL data, is routinely captured using multi-channel coils, existing denoising techniques are tailored for coil-combined data, overlooking inherent multi-channel correlations. MUSIC capitalizes on the fact that multi-channel images are primarily distinguished by coil sensitivity weighting and random noise, resulting in an intrinsic low-rank structure within the stacked multi-channel data matrix. This low rankness can be further enhanced by grouping highly correlated slices. Our approach involves adapting regularization to each slice individually, forming potentially low-rank matrices by stacking vectorized slices selected from different channels based on their Euclidean distance from the current slice under processing. Matrix rank is then approximated using the logarithm-determinant of the covariance matrix. Importantly, MUSIC operates directly on complex data, eliminating the need for separating magnitude and phase or dividing real and imaginary data, thereby minimizing information loss. The degree of low-rank regularization is controlled by the estimated noise level, achieving a balance between noise reduction and texture preservation. Experimental validation on real-world imaging data demonstrates the efficacy of MUSIC in significantly enhancing ASL perfusion quality. By effectively suppressing noise while retaining essential textural information, MUSIC holds promise for improving the utility and accuracy of ASL perfusion MRI, thus advancing neuroimaging research and clinical diagnoses.
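The two building blocks the abstract describes — grouping highly correlated slices by Euclidean distance and approximating matrix rank with a log-determinant of the covariance — can be sketched as follows. The grouping rule, matrix shapes, and the `eps` stabilizer are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

def logdet_rank_surrogate(X, eps=1e-3):
    """Smooth rank surrogate log det(X X^H / n + eps I).

    For a stack of nearly parallel rows the covariance is close to
    low rank, so most eigenvalues collapse toward eps and the
    log-determinant becomes strongly negative; eps keeps it finite.
    Works on complex data directly, as MUSIC requires.
    """
    n = X.shape[1]
    cov = (X @ X.conj().T) / n
    sign, logdet = np.linalg.slogdet(cov + eps * np.eye(cov.shape[0]))
    return logdet

def group_similar_slices(slices, ref_idx, k=4):
    """Stack the k slices closest (in Euclidean distance) to
    slices[ref_idx] as rows of a matrix; correlated rows make the
    stack closer to low rank. Names and the fixed-k rule are
    illustrative choices, not the paper's."""
    flat = slices.reshape(len(slices), -1)
    d = np.linalg.norm(flat - flat[ref_idx], axis=1)
    nearest = np.argsort(d)[:k]
    return flat[nearest]
```

A stack of mutually scaled copies of one slice yields a far lower surrogate value than a stack of independent noise slices, which is the property the slice-adaptive regularization exploits.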
Pattern Recognition Letters, vol. 200, pp. 142–148.
Citations: 0
Enhancing precipitation detection: A multi-sensor approach using conditional GANs and recurrent networks
IF 3.3 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-02-01 · Epub Date: 2025-06-14 · DOI: 10.1016/j.patrec.2025.05.022
Pablo Negri, Daniel Acevedo, Juan Ruiz, Sergio Gonzalez, Luciano Vidal, Alejo Silvarrey, Maria Gabriela Nicora
The advent of automatic precipitation detection with high-frequency data at very low spatial resolution (4 km) makes the satellite infrared brightness temperature (IR-BT) a promising variable. Nevertheless, this approach must cope with the inherent simplicity of the variable, which does not always correlate strongly with convective precipitation, and with the scarcity of rain events in nature, which makes the problem imbalanced. This paper proposes a novel approach to identify rainfall that integrates the IR-BT variable with lightning activity, defined as the number of detected lightning flashes per unit of time and space. The approach uses a recurrent neural network to estimate a binary output within a conditional GAN (cGAN) framework, which improves the training and performance on this imbalanced problem. Inverse Dice loss, an alternative loss function, is employed to improve the convergence and results of our framework, PD-GAN. Tests have shown that integrating sensors within the proposed architecture leads to positive outcomes, including a reduction in false alarms and an enhancement in the overlap of positive events.
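The abstract names an "inverse Dice loss" without defining it. One plausible reading, sketched here purely as an assumption, is the soft Dice loss evaluated on the complement (non-rain) class, alongside the standard soft Dice for reference:

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss: 1 - 2|P∩T| / (|P| + |T|), with eps for stability.
    pred and target are arrays of per-pixel probabilities / labels."""
    inter = np.sum(pred * target)
    return 1.0 - (2.0 * inter + eps) / (np.sum(pred) + np.sum(target) + eps)

def inverse_dice_loss(pred, target, eps=1e-6):
    """One possible 'inverse Dice': the Dice loss computed on the
    complement class (here, non-rain pixels). The paper's exact
    definition may differ; this is an illustrative assumption."""
    return dice_loss(1.0 - pred, 1.0 - target, eps)
```

Under this reading, a perfect prediction drives both losses to zero, while an inverted prediction drives them toward one; combining the two terms gives both the scarce positive class and the dominant negative class a say in the gradient.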
Pattern Recognition Letters, vol. 200, pp. 150–157.
Citations: 0
DECFusion: A lightweight decomposition fusion method for luminance artifact removal in infrared and visible images
IF 3.3 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-02-01 · Epub Date: 2025-12-06 · DOI: 10.1016/j.patrec.2025.11.034
Quanquan Xiao, Haiyan Jin, Haonan Su, Yuanlin Zhang
Infrared and visible image fusion is a current research hotspot in the field of multimodal image fusion, aiming to improve perception and understanding of a scene through effective fusion. However, current deep learning-based fusion methods often fail to account for the difference between visible-light brightness and the thermal information in infrared images, producing brightness artifacts in the fused images that seriously degrade their visual quality. To solve this problem, we propose a lightweight infrared and visible image decomposition fusion method (DECFusion). The method decomposes the luminance information of the visible image and the thermal information of the infrared image into illumination and reflection components through a learnable lightweight network, and adaptively adjusts the illumination component to remove unnecessary luminance interference. In the reconstruction stage, we combine the Retinex theory to reconstruct the image. Experiments show that the fused images generated by our method not only avoid luminance artifacts but are also more lightweight to produce, outperforming current state-of-the-art infrared and visible image fusion methods in the visual quality of the fused images. Our code is available at https://github.com/tianzhiya/DECFusion.
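DECFusion learns the illumination/reflectance split with a lightweight network. A classical, non-learned stand-in for the same Retinex decomposition — a Gaussian-blur illumination estimate, with all parameter choices purely illustrative — can be sketched as:

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    """1-D normalized Gaussian kernel of half-width radius."""
    if radius is None:
        radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def estimate_illumination(img, sigma=15.0):
    """Separable Gaussian blur of a 2-D image; the low-frequency
    result plays the role of the illumination component (classical
    single-scale Retinex assumption, standing in for DECFusion's
    learned decomposition network)."""
    k = gaussian_kernel(sigma)
    pad = len(k) // 2
    padded = np.pad(img, pad, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def retinex_decompose(img, sigma=15.0, eps=1e-6):
    """Split img into illumination L and reflectance R with img ≈ L * R."""
    L = estimate_illumination(img, sigma)
    R = img / (L + eps)
    return L, R
```

In this sketch the illumination component carries the smooth brightness field — the part the paper adaptively adjusts to suppress luminance interference — while the reflectance retains the detail, and the product reconstructs the input up to the `eps` stabilizer.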
Pattern Recognition Letters, vol. 200, pp. 67–73.
Citations: 0