
Latest Articles in Pattern Recognition Letters

AMDC: Attenuation map-guided dual-color space for underwater image color correction
IF 3.3 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-13 | DOI: 10.1016/j.patrec.2026.01.005
Shilong Sun, Baiqiang Yu, Ling Zhou, Junpeng Xu, Wenyi Zhao, Weidong Zhang
Underwater images frequently exhibit color distortions due to wavelength-dependent light attenuation and absorption, further complicated by irregular lighting conditions underwater. Traditional color correction methods primarily target global light attenuation but are less effective in handling local color shifts caused by discontinuous depth variations and artificial illumination. To address this issue, we propose a dual-space adaptive color correction method guided by an attenuation map, referred to as AMDC. Specifically, we first perform global attenuation compensation by leveraging the maximum reference channel of the image. Building on the globally compensated result, we then introduce a dual-space collaborative correction strategy. In RGB space, we perform local adaptive compensation using a weighted sliding window. In CIELab space, we restore color saturation through a zero-symmetric adaptive offset correction approach. To retain the most visually optimal color features, we selectively fuse the a and b channels from the two correction results, producing a locally corrected image. Finally, we utilize the maximum attenuation map of the raw image to guide the fusion of the locally corrected image with the raw image, generating the final color-corrected output. Extensive qualitative and quantitative experiments demonstrate the effectiveness and robustness of our method for underwater image color correction.
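To make "global attenuation compensation using the maximum reference channel" concrete, the minimal sketch below shifts the attenuated channels toward the channel with the highest mean intensity. The abstract does not specify the compensation formula, so the gain used here follows the classic Ancuti-style channel compensation as an illustrative stand-in, not the AMDC procedure itself.

```python
import numpy as np

def global_attenuation_compensation(img):
    """Illustrative global compensation toward the maximum reference channel.

    img: float32 RGB image in [0, 1], shape (H, W, 3).
    The gain formula is an assumption for illustration; AMDC's actual
    compensation is not given in the abstract.
    """
    means = img.reshape(-1, 3).mean(axis=0)      # per-channel mean intensity
    ref = int(np.argmax(means))                  # the "maximum reference channel"
    out = img.copy()
    for c in range(3):
        if c == ref:
            continue
        # Shift the attenuated channel toward the reference channel's statistics,
        # weighted by how far apart the two channel means are.
        gain = means[ref] - means[c]
        out[..., c] = np.clip(
            img[..., c] + gain * img[..., ref] * (1.0 - img[..., c]), 0.0, 1.0
        )
    return out

if __name__ == "__main__":
    demo = np.random.rand(4, 4, 3).astype(np.float32)
    print(global_attenuation_compensation(demo).shape)
```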
{"title":"AMDC: Attenuation map-guided dual-color space for underwater image color correction","authors":"Shilong Sun ,&nbsp;Baiqiang Yu ,&nbsp;Ling Zhou ,&nbsp;Junpeng Xu ,&nbsp;Wenyi Zhao ,&nbsp;Weidong Zhang","doi":"10.1016/j.patrec.2026.01.005","DOIUrl":"10.1016/j.patrec.2026.01.005","url":null,"abstract":"<div><div>Underwater images frequently exhibit color distortions due to wavelength-dependent light attenuation and absorption, further complicated by irregular lighting conditions underwater. Traditional color correction methods primarily target global light attenuation but are less effective in handling local color shifts caused by discontinuous depth variations and artificial illumination. To address this issue, we propose a dual-space adaptive color correction method guided by an attenuation map, referred to as AMDC. Specifically, we first utilize global attenuation compensation by leveraging the maximum reference channel of the image. Building on the globally compensated result, we then introduce a dual-space collaborative correction strategy. In RGB space, we perform local adaptive compensation using a weighted sliding window. In CIELab space, we restore color saturation through a zero-symmetric adaptive offset correction approach. To retain the most visually optimal color features, we selectively fuse the a and b channels from the two correction results, producing a locally corrected image. Finally, we utilize the maximum attenuation map of the raw image to guide the fusion of the locally corrected image with the raw, generating the final color-corrected output. Extensive qualitative and quantitative experiments demonstrate the effectiveness and robustness of our method for underwater image color correction.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"201 ","pages":"Pages 80-86"},"PeriodicalIF":3.3,"publicationDate":"2026-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145979048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Nighttime flare removal via frequency decoupling
IF 3.3 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-13 | DOI: 10.1016/j.patrec.2026.01.012
Minglong Xue, Aoxiang Ning, Jinhong He, Shuaibin Fan, Senming Zhong
Existing methods for nighttime flare removal struggle to effectively decouple flare features from normal image texture features, frequently resulting in the loss of local details after flare removal. Due to the differences in the frequency distribution characteristics of flare and content structural information—where flare and lighting information are primarily concentrated in the low-frequency range, while content structural information is concentrated in the high-frequency range—we propose a frequency decoupling de-flare network (FDDNet). This method effectively decouples flare from content, enabling efficient flare removal. Specifically, the network consists of the Frequency Decoupling Module (FDM) and the Frequency Fusion Module (FFM). The FDM divides the image’s frequency features into low-frequency and high-frequency components by setting masks. It dynamically optimizes its weights to effectively decouple flare from content while maximizing the retention of structural content information. In addition, based on the traditional skip connections, we propose the Frequency Fusion Module. The module separately fuses the amplitude and phase of features from both the encoding and decoding stages, reducing the impact of flare and brightness anomalies on the reconstructed image while repairing local damage caused by flare removal. Extensive experiments show that our method significantly improves the performance of nighttime flare removal.
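The FDM is described as splitting frequency features into low- and high-frequency components with masks. The sketch below shows the basic mechanics on a single grayscale image using a hard circular FFT mask; the fixed radius and hard mask are assumptions for illustration, whereas FDDNet optimizes its mask weights dynamically on learned features.

```python
import numpy as np

def frequency_decouple(img, radius=16):
    """Split a grayscale image into low- and high-frequency components
    with a circular FFT mask. The fixed radius and hard mask are
    illustrative assumptions, not FDDNet's learned masking.
    """
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
    low_mask = (dist <= radius).astype(np.float64)

    low = np.fft.ifft2(np.fft.ifftshift(f * low_mask)).real          # flare/illumination-dominated band
    high = np.fft.ifft2(np.fft.ifftshift(f * (1.0 - low_mask))).real  # structure/detail-dominated band
    return low, high

if __name__ == "__main__":
    img = np.random.rand(64, 64)
    low, high = frequency_decouple(img)
    print(np.allclose(low + high, img))  # the two bands sum back to the input
```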
{"title":"Nighttime flare removal via frequency decoupling","authors":"Minglong Xue ,&nbsp;Aoxiang Ning ,&nbsp;Jinhong He ,&nbsp;Shuaibin Fan ,&nbsp;Senming Zhong","doi":"10.1016/j.patrec.2026.01.012","DOIUrl":"10.1016/j.patrec.2026.01.012","url":null,"abstract":"<div><div>Existing methods for nighttime flare removal struggle to effectively decouple flare features from normal image texture features, frequently resulting in the loss of local details after flare removal. Due to the differences in the frequency distribution characteristics of flare and content structural information—where flare and lighting information are primarily concentrated in the low-frequency range, while content structural information is concentrated in the high-frequency range—we propose a frequency decoupling de-flare network (FDDNet). This method effectively decouples flare from content, enabling efficient flare removal. Specifically, the network consists of the Frequency Decoupling Module (FDM) and the Frequency Fusion Module (FFM). The FDM divides the image’s frequency features into low-frequency and high-frequency components by setting masks. It dynamically optimizes its weights to effectively decouple flare from content while maximizing the retention of structural content information. In addition, based on the traditional skip connections, we propose the Frequency Fusion Module. The module separately fuses the amplitude and phase of features from both the encoding and decoding stages, reducing the impact of flare and brightness anomalies on the reconstructed image while repairing local damage caused by flare removal. Extensive experiments show that our method significantly improves the performance of nighttime flare removal.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"201 ","pages":"Pages 73-79"},"PeriodicalIF":3.3,"publicationDate":"2026-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145979045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Leveraging language to generalize natural images to few-shot medical image segmentation
IF 3.3 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-13 | DOI: 10.1016/j.patrec.2026.01.009
Feifan Song, Yuntian Bo, Shidong Wang, Yang Long, Haofeng Zhang
Cross-domain Few-shot Medical Image Segmentation (CD-FSMIS) typically involves pre-training on a large-scale source domain dataset (e.g., natural image dataset) before transferring to a target domain with limited data for pixel-wise segmentation. However, due to the significant domain gap between natural images and medical images, existing Few-shot Segmentation (FSS) methods suffer from severe performance degradation in cross-domain scenarios. We observe that using only annotated masks as cross-domain cues is insufficient, while rich textual information can effectively establish knowledge relationships between visual instances and language descriptions, mitigating domain shift. To address this, we propose a plug-in Cross-domain Text-guided (CD-TG) module that leverages text-domain alignment to construct a new alignment space for domain generalization. This plug-in module consists of two components, including: (1) Text Generation Unit that utilizes the GPT-4 question-answering system to generate standardized category-level textual descriptions, and (2) Semantic-guided Unit that aligns visual features with textual embeddings while incorporating existing mask information. We integrate this plug-in module into five mainstream FSS methods and evaluate it on four widely used medical image datasets, and the experimental results demonstrate its effectiveness. Code is available at https://github.com/Lilacis/CD_TG.
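As a rough illustration of aligning visual features with a category-level text embedding while incorporating an existing mask, the sketch below scores each spatial location by cosine similarity to a precomputed text vector and blends it with a support mask. The plain cosine scoring and equal-weight blend are assumptions; they are not the paper's Semantic-guided Unit.

```python
import numpy as np

def text_guided_prior(visual_feats, text_emb, support_mask):
    """Illustrative semantic guidance: score each spatial location of a
    feature map by cosine similarity to a category-level text embedding,
    then combine with an existing (downsampled) support mask.

    visual_feats: (C, H, W), text_emb: (C,), support_mask: (H, W) in {0, 1}.
    The averaging fusion is an assumption for illustration only.
    """
    c, h, w = visual_feats.shape
    feats = visual_feats.reshape(c, -1)
    feats = feats / (np.linalg.norm(feats, axis=0, keepdims=True) + 1e-8)
    text = text_emb / (np.linalg.norm(text_emb) + 1e-8)
    sim = (text @ feats).reshape(h, w)        # cosine similarity map in [-1, 1]
    sim = (sim + 1.0) / 2.0                    # rescale to [0, 1]
    return 0.5 * sim + 0.5 * support_mask      # blend text prior with mask prior

if __name__ == "__main__":
    prior = text_guided_prior(np.random.rand(32, 8, 8), np.random.rand(32), np.zeros((8, 8)))
    print(prior.shape, float(prior.min()) >= 0.0)
```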
{"title":"Leveraging language to generalize natural images to few-shot medical image segmentation","authors":"Feifan Song ,&nbsp;Yuntian Bo ,&nbsp;Shidong Wang ,&nbsp;Yang Long ,&nbsp;Haofeng Zhang","doi":"10.1016/j.patrec.2026.01.009","DOIUrl":"10.1016/j.patrec.2026.01.009","url":null,"abstract":"<div><div>Cross-domain Few-shot Medical Image Segmentation (CD-FSMIS) typically involves pre-training on a large-scale source domain dataset (e.g., natural image dataset) before transferring to a target domain with limited data for pixel-wise segmentation. However, due to the significant domain gap between natural images and medical images, existing Few-shot Segmentation (FSS) methods suffer from severe performance degradation in cross-domain scenarios. We observe that using only annotated masks as cross-domain cues is insufficient, while rich textual information can effectively establish knowledge relationships between visual instances and language descriptions, mitigating domain shift. To address this, we propose a plug-in Cross-domain Text-guided (CD-TG) module that leverages text-domain alignment to construct a new alignment space for domain generalization. This plug-in module consists of two components, including: (1) Text Generation Unit that utilizes the GPT-4 question-answering system to generate standardized category-level textual descriptions, and (2) Semantic-guided Unit that aligns visual features with textual embeddings while incorporating existing mask information. We integrate this plug-in module into five mainstream FSS methods and evaluate it on four widely used medical image datasets, and the experimental results demonstrate its effectiveness. Code is available at <span><span>https://github.com/Lilacis/CD_TG</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"201 ","pages":"Pages 66-72"},"PeriodicalIF":3.3,"publicationDate":"2026-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145979047","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Wavelet-based diffusion transformer for image dehazing
IF 3.3 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-10 | DOI: 10.1016/j.patrec.2026.01.016
Cheng Ma, Guojun Liu, Jing Yue
In current image dehazing methods based on diffusion models, few studies explore and leverage the inherent prior knowledge of hazy images. Additionally, the inherent complexity of these models often results in difficulties during training, which in turn leads to poor restoration performance in dense hazy environments. To address these challenges, this paper proposes a dehazing diffusion model based on Haar wavelet priors, aiming to fully exploit the characteristic that haze information is concentrated in the low-frequency region. Specifically, the Haar wavelet transform is first applied to decompose the hazy image, and the diffusion model is used to generate low-frequency information in the image, thereby reconstructing the main colors and content of the dehazed image. Moreover, a high-frequency enhancement module based on Gabor filters is designed to extract high-frequency details through multi-directional Gabor convolution filters, further improving the fine-grained restoration capability of the image. Subsequently, a multi-scale pooling block is adopted to reduce blocky artifacts caused by non-uniform haze conditions, enhancing the visual consistency of the image. Finally, the effectiveness of the proposed method is demonstrated on publicly available datasets, and the model’s generalization ability is tested on real hazy image datasets, as well as its potential for application in other downstream tasks. The code is available at https://github.com/Mccc1003/WDiT_Dehaze-main.
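The Haar decomposition that underlies the prior is standard; the minimal sketch below performs one level of the 2-D Haar transform on a grayscale image, separating the low-frequency approximation (where haze and global color concentrate) from the detail sub-bands. This shows only the decomposition step, not the diffusion model built on top of it.

```python
import numpy as np

def haar_dwt2(img):
    """Single-level 2-D Haar decomposition of a grayscale image with even
    height and width. Returns the low-frequency approximation and the three
    detail sub-bands. A minimal sketch of the decomposition step only.
    """
    a = (img[0::2, :] + img[1::2, :]) / 2.0   # vertical averages
    d = (img[0::2, :] - img[1::2, :]) / 2.0   # vertical differences
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0      # low-low: coarse content / haze
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0      # horizontal details
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0      # vertical details
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0      # diagonal details
    return ll, (lh, hl, hh)

if __name__ == "__main__":
    img = np.random.rand(8, 8)
    ll, details = haar_dwt2(img)
    print(ll.shape, [band.shape for band in details])  # (4, 4) each
```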
{"title":"Wavelet-based diffusion transformer for image dehazing","authors":"Cheng Ma ,&nbsp;Guojun Liu ,&nbsp;Jing Yue","doi":"10.1016/j.patrec.2026.01.016","DOIUrl":"10.1016/j.patrec.2026.01.016","url":null,"abstract":"<div><div>In current image dehazing methods based on diffusion models, few studies explore and leverage the inherent prior knowledge of hazy images. Additionally, the inherent complexity of these models often results in difficulties during training, which in turn lead to poor restoration performance in dense hazy environments. To address these challenges, this paper proposes a dehazing diffusion model based on Haar wavelet priors, aiming to fully exploit the characteristic that haze information is concentrated in the low-frequency region. Specifically, the Haar wavelet transform is first applied to decompose the hazy image, and the diffusion model is used to generate low-frequency information in the image, thereby reconstructing the main colors and content of the dehazed image. Moreover, a high-frequency enhancement module based on Gabor is designed to extract high-frequency details through multi-directional Gabor convolution filters, further improving the fine-grained restoration capability of the image. Subsequently, a multi-scale pooling block is adopted to reduce blocky artifacts caused by non-uniform haze conditions, enhancing the visual consistency of the image. Finally, the effectiveness of the proposed method is demonstrated on publicly available datasets, and the model’s generalization ability is tested on real hazy image datasets, as well as its potential for application in other downstream tasks. The code is available at <span><span>https://github.com/Mccc1003/WDiT_Dehaze-main</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"201 ","pages":"Pages 58-65"},"PeriodicalIF":3.3,"publicationDate":"2026-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145979046","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
MR-DETR: Miss reduction DETR with context frequency attention and adaptive query allocation strategy for small object detection
IF 3.3 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-09 | DOI: 10.1016/j.patrec.2026.01.004
Hailan Shen, Zihan Wang, Shuo Huang, Zailiang Chen
Small object detection is a critical task in computer vision, aiming to accurately detect tiny instances within images. Although DETR-based methods have improved general object detection, they often suffer from missed detections of small objects due to their limited size and indistinct features. Moreover, DETR-based methods employ a fixed number of queries, making it difficult to adapt to the dynamic variations of scenes. In this study, we propose Miss Reduction DETR (MR-DETR), which leverages Context Frequency Attention (CFA) and an Adaptive Query Allocation Strategy (AQAS) to reduce missed detections. First, to better capture fine details of small objects, CFA is designed with two complementary branches known as context and frequency. The former branch employs axial strip convolutions to capture global contextual information, while the latter branch uses a frequency modulation module to emphasize local high-frequency details. Next, AQAS is introduced, which applies feature excitation and compression to the encoder’s output maps, dynamically evaluates object density, and automatically adjusts the number of queries based on a density-to-query mapping, thereby improving adaptability in complex scenes and reducing missed detections. Experimental results demonstrate that MR-DETR achieves state-of-the-art detection performance on the aerial image datasets VisDrone and AI-TOD, which mainly contain small objects.
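The density-to-query idea can be illustrated with a very small heuristic: estimate a scene-density score from an activation map and map it to a query budget. The linear mapping and the query range below are assumptions for illustration; AQAS itself uses feature excitation and compression rather than this simple average.

```python
import numpy as np

def adaptive_query_count(density_map, q_min=100, q_max=900):
    """Illustrative density-to-query mapping: derive a crude scene-density
    score from an encoder activation map and map it linearly to a query
    budget. The mapping and [q_min, q_max] range are assumptions, not AQAS.
    """
    density = float(np.clip(density_map.mean(), 0.0, 1.0))
    return int(round(q_min + density * (q_max - q_min)))

if __name__ == "__main__":
    sparse = np.full((32, 32), 0.05)    # few objects -> few queries
    crowded = np.full((32, 32), 0.85)   # dense scene -> more queries
    print(adaptive_query_count(sparse), adaptive_query_count(crowded))
```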
{"title":"MR-DETR: Miss reduction DETR with context frequency attention and adaptive query allocation strategy for small object detection","authors":"Hailan Shen,&nbsp;Zihan Wang,&nbsp;Shuo Huang,&nbsp;Zailiang Chen","doi":"10.1016/j.patrec.2026.01.004","DOIUrl":"10.1016/j.patrec.2026.01.004","url":null,"abstract":"<div><div>Small object detection is a critical task in computer vision, aiming to accurately detect tiny instances within images. Although DETR-based methods have improved general object detection, they often suffer from missed detections of small objects due to their limited size and indistinct features. Moreover, DETR-based methods employ a fixed number of queries, making it difficult to adapt to the dynamic variations of scenes. In this study, we propose Miss Reduction DETR (MR-DETR), which leverages Context Frequency Attention (CFA) and an Adaptive Query Allocation Strategy (AQAS) to reduce missed detections. First, to better capture fine details of small objects, CFA is designed with two complementary branches known as context and frequency. The former branch employs axial strip convolutions to capture global contextual information, while the latter branch uses a frequency modulation module to emphasize local high-frequency details. Next, AQAS is introducted, which applies feature excitation and compression to the encoder’s output maps, dynamically evaluates object density, and automatically adjusts the number of queries based on a density-to-query mapping, thereby improving adaptability in complex scenes and reducing missed detections. Experimental results demonstrate that MR-DETR achieves state-of-the-art detection performance on the aerial image datasets VisDrone and AI-TOD, which mainly contain small objects.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"201 ","pages":"Pages 52-57"},"PeriodicalIF":3.3,"publicationDate":"2026-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145979043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Signature-in-signature: A fidelity-preserving and usability-ensuring framework for dynamic handwritten signature protection
IF 3.3 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-07 | DOI: 10.1016/j.patrec.2026.01.001
Tianyu Chen, Qi Cui, Zhangjie Fu
Dynamic handwritten signature (DHS) verification is widely used for identity authentication in modern applications, offering a blend of convenience and security. However, traditional verification processes necessitate users to upload multiple signature templates to remote application servers, raising critical risks to the privacy and security of sensitive data. To address these concerns, this paper proposes a robust watermarking framework for DHS data. By embedding unique watermarks as digital signatures into DHS data, the framework ensures effective traceability of DHS, allowing the identification of sources in case of DHS misuse or leakage. Specifically, we introduce a velocity-based loss function that minimizes trajectory distortion during the watermark embedding process, effectively preserving the fidelity of the DHS. In parallel, the training process leverages contrastive learning to ensure that the watermarked DHS remains closer to the original signature in feature space than to other DHS templates. This design guarantees that the usability of the watermarked DHS is unaffected, maintaining the accuracy and reliability of signature verification systems. Extensive experiments conducted on the large-scale dynamic signature dataset demonstrate that the watermarked signatures retain visual integrity and remain imperceptible to human observation. Furthermore, the embedded watermarks exhibit compatibility with a wide range of existing verification methods, ensuring that the framework does not compromise existing verification performance.
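A velocity-based fidelity loss of the kind described can be sketched as penalizing differences between the first-order differences (velocities) of the original and watermarked pen trajectories, so the watermark does not distort pen dynamics. The exact loss is not given in the abstract; the mean squared velocity difference below is an assumption for illustration.

```python
import numpy as np

def velocity_loss(original_traj, watermarked_traj):
    """Illustrative velocity-based fidelity loss for a pen trajectory.

    Each trajectory is an (N, 2) array of (x, y) samples at a fixed rate.
    Penalizing velocity differences discourages the watermark from
    distorting pen dynamics; the paper's exact loss is an assumption here.
    """
    v_orig = np.diff(original_traj, axis=0)    # per-step velocity of the original
    v_wm = np.diff(watermarked_traj, axis=0)   # per-step velocity after embedding
    return float(np.mean(np.sum((v_orig - v_wm) ** 2, axis=1)))

if __name__ == "__main__":
    t = np.linspace(0.0, 1.0, 50)
    traj = np.stack([t, np.sin(2 * np.pi * t)], axis=1)
    perturbed = traj + 1e-3 * np.random.randn(*traj.shape)
    print(velocity_loss(traj, perturbed))
```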
{"title":"Signature-in-signature: A fidelity-preserving and usability-ensuring framework for dynamic handwritten signature protection","authors":"Tianyu Chen,&nbsp;Qi Cui,&nbsp;Zhangjie Fu","doi":"10.1016/j.patrec.2026.01.001","DOIUrl":"10.1016/j.patrec.2026.01.001","url":null,"abstract":"<div><div>Dynamic handwritten signature (DHS) verification is widely used for identity authentication in modern applications, offering a blend of convenience and security. However, traditional verification processes necessitate users to upload multiple signature templates to remote application servers, raising critical risks to the privacy and security of sensitive data. To address these concerns, this paper proposes a robust watermarking framework for DHS data. By embedding unique watermarks as digital signatures into DHS data, the framework ensures effective traceability of DHS, allowing the identification of sources in case of DHS misuse or leakage. Specifically, we introduce a velocity-based loss function that minimizes trajectory distortion during the watermark embedding process, effectively preserving the fidelity of the DHS. In parallel, the training process leverages contrastive learning to ensure that the watermarked DHS remains closer to the original signature in feature space than to other DHS templates. This design guarantees that the usability of the watermarked DHS is unaffected, maintaining the accuracy and reliability of signature verification systems. Extensive experiments conducted on the large-scale dynamic signature dataset demonstrate that the watermarked signatures retain visual integrity and remain imperceptible to human observation. Furthermore, the embedded watermarks exhibit compatibility with a wide range of existing verification methods, ensuring that the framework does not compromise existing verification performance.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"201 ","pages":"Pages 45-51"},"PeriodicalIF":3.3,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145979044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Generalization performance distributions along learning curves
IF 3.3 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-03 | DOI: 10.1016/j.patrec.2026.01.003
O. Taylan Turan, Marco Loog, David M.J. Tax
Learning curves show the expected performance with respect to training set size. They are often used to evaluate and compare models, tune hyper-parameters, and determine how much data is needed for a given level of performance. However, the distributional properties of performance are frequently overlooked on learning curves. Generally, only an average with standard error or standard deviation is used. In this paper, we analyze the distributions of generalization performance on the learning curves. We compile a high-fidelity learning curve database, both with respect to training set size and repetitions of the sampling for a fixed training set size. Our investigation reveals that generalization performance rarely follows a Gaussian distribution for classical classifiers, regardless of dataset balance, loss function, sampling method, or hyper-parameter tuning along learning curves. Furthermore, we show that the choice of statistical summary, the mean versus measures such as quantiles, affects the top model rankings. Our findings highlight the importance of considering different statistical measures and use of non-parametric approaches when evaluating and selecting machine learning models with learning curves.
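The experimental setup behind one point of such a curve can be reproduced in a few lines: repeatedly resample a fixed-size training set, refit a classifier, collect test accuracies, and inspect the resulting distribution. The dataset, model, and the Shapiro-Wilk normality check below are illustrative choices, not the paper's exact protocol.

```python
import numpy as np
from scipy.stats import shapiro
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def accuracy_distribution(n_train=50, n_repeats=200, seed=0):
    """Collect the distribution of test accuracies at one learning-curve
    anchor by resampling a fixed-size training set many times.
    Dataset and model choices are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    X, y = make_classification(n_samples=3000, n_features=20, random_state=seed)
    X_pool, y_pool, X_test, y_test = X[:2000], y[:2000], X[2000:], y[2000:]
    scores = []
    for _ in range(n_repeats):
        idx = rng.choice(len(X_pool), size=n_train, replace=False)
        clf = LogisticRegression(max_iter=1000).fit(X_pool[idx], y_pool[idx])
        scores.append(clf.score(X_test, y_test))
    return np.asarray(scores)

if __name__ == "__main__":
    scores = accuracy_distribution()
    stat, p = shapiro(scores)  # Shapiro-Wilk normality test on the accuracies
    print(f"mean={scores.mean():.3f}  median={np.median(scores):.3f}  shapiro_p={p:.4f}")
```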
{"title":"Generalization performance distributions along learning curves","authors":"O. Taylan Turan ,&nbsp;Marco Loog ,&nbsp;David M.J. Tax","doi":"10.1016/j.patrec.2026.01.003","DOIUrl":"10.1016/j.patrec.2026.01.003","url":null,"abstract":"<div><div>Learning curves show the expected performance with respect to training set size. This is often used to evaluate and compare models, tune hyper-parameters and determine how much data is needed for a specific performance. However, the distributional properties of performance are frequently overlooked on learning curves. Generally, only an average with standard error or standard deviation is used. In this paper, we analyze the distributions of generalization performance on the learning curves. We compile a high-fidelity learning curve database, both with respect to training set size and repetitions of the sampling for a fixed training set size. Our investigation reveals that generalization performance rarely follows a Gaussian distribution for classical classifiers, regardless of dataset balance, loss function, sampling method, or hyper-parameter tuning along learning curves. Furthermore, we show that the choice of statistical summary, mean versus measures like quantiles affect the top model rankings. Our findings highlight the importance of considering different statistical measures and use of non-parametric approaches when evaluating and selecting machine learning models with learning curves.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"201 ","pages":"Pages 29-36"},"PeriodicalIF":3.3,"publicationDate":"2026-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145940475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Hierarchical memory-enhanced networks for student knowledge tracing
IF 3.3 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-03 | DOI: 10.1016/j.patrec.2026.01.002
Huali Yang, Junjie Hu, Tao Huang, Shengze Hu, Wang Gao, Zhuoran Xu, Jing Geng
Accurate recognition of students’ knowledge states is critical for personalized education in the field of intelligent education. Knowledge tracing (KT) has emerged as an important research domain for tracing students’ knowledge states using the analysis of learning trajectory data. However, existing KT methods tend to overlook the hierarchical nature of memory, resulting in incomplete memory transfer. To address this issue, this study proposes a novel hierarchical memory-enhanced knowledge tracing (HMEKT) method that models the hierarchical structure of memory. HMEKT consists of three modules: shallow memory, deep memory, and performance prediction. Specifically, in the shallow memory module, learning and forgetting mechanisms are used to simulate memory growth and decay, capturing the dynamic changes in knowledge states. In the deep memory module, a dynamic memory matrix is used to store the student’s core knowledge system, transferring shallow memory into deep memory through enhancement and reduction gates that control memory transfer. Finally, for predicting student performance, relevant knowledge states are aggregated from the knowledge system matrix for future questions. Experiments on four datasets demonstrate the effectiveness of the model, with a 1.99% AUC gain on Assistment2017 compared to state-of-the-art methods.
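The enhancement/reduction gating can be illustrated with a tiny numeric sketch: a reduction gate erases part of the deep memory matrix, and an enhancement gate writes gated shallow-memory content into it. The gate parameterization below is an assumption in the spirit of DKVMN-style write operations, not the paper's exact formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def transfer_memory(shallow, deep, W_e, W_r):
    """Illustrative gated transfer from a shallow memory vector into a deep
    memory matrix. shallow: (d,), deep: (slots, d), W_e/W_r: (d, d).
    The gate design is an assumption, not HMEKT's exact equations.
    """
    enhance = sigmoid(shallow @ W_e)                # how much new knowledge to write
    reduce_ = sigmoid(shallow @ W_r)                # how much old content to erase
    deep = deep * (1.0 - reduce_[None, :])          # reduction gate erases old memory
    deep = deep + np.outer(np.ones(deep.shape[0]), enhance * shallow)  # enhancement gate writes
    return deep

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, slots = 8, 4
    updated = transfer_memory(rng.standard_normal(d), np.zeros((slots, d)),
                              rng.standard_normal((d, d)), rng.standard_normal((d, d)))
    print(updated.shape)
```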
{"title":"Hierarchical memory-enhanced networks for student knowledge tracing","authors":"Huali Yang ,&nbsp;Junjie Hu ,&nbsp;Tao Huang ,&nbsp;Shengze Hu ,&nbsp;Wang Gao ,&nbsp;Zhuoran Xu ,&nbsp;Jing Geng","doi":"10.1016/j.patrec.2026.01.002","DOIUrl":"10.1016/j.patrec.2026.01.002","url":null,"abstract":"<div><div>Accurate recognition of students’ knowledge states is critical for personalized education in the field of intelligent education. Knowledge tracing (KT) has emerged as an important research domain for tracing students’ knowledge states using the analysis of learning trajectory data. However, existing KT methods tend to overlook the hierarchical nature of memory, resulting in incomplete memory transfer. To address this issue, this study proposes a novel hierarchical memory-enhanced knowledge tracing (HMEKT) method that models the hierarchical structure of memory. HMEKT consists of three modules: shallow memory, deep memory, and performance prediction. Specifically, in the shallow memory module, learning and forgetting mechanisms are used to simulate memory growth and decay, capturing the dynamic changes in knowledge states. In the deep memory module, a dynamic memory matrix is used to store the student’s core knowledge system, transferring shallow memory into deep memory through enhancement and reduction gates that control memory transfer. Finally, for predicting student performance, relevant knowledge states are aggregated from the knowledge system matrix for future questions. Experiments on four datasets demonstrate the effectiveness of the model, with a 1.99% AUC gain on Assistment2017 compared to state-of-the-art methods.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"201 ","pages":"Pages 37-44"},"PeriodicalIF":3.3,"publicationDate":"2026-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145940476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Frequency-selective countnet: Enhancing text-guided object counting with frequency features
IF 3.3 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-27 | DOI: 10.1016/j.patrec.2025.12.014
Cheng Qian, Jiwu Cao, Ying Mao, Ruotian Zhang, Fei Long, Jun Sang
Text-guided object counting aims to estimate the number of objects described by natural language within complex visual scenes. However, existing approaches often struggle to align textual intent with diverse visual patterns, especially when target objects vary in scale, appearance, or context.
To address these limitations, we propose Frequency-Selective CountNet (FSCNet), a novel framework that integrates spatial and frequency-domain features for precise text-guided counting. FSCNet introduces a Triple-Stream Attention Fusion Module (TSAFM) that combines textual, global, and local visual features. Additionally, an Adaptive Frequency Selector (AFS) dynamically emphasizes frequency components by separately modulating the magnitude and phase spectra, preserving geometric consistency during decoding.
Extensive experiments on the FSC-147 and CARPK datasets demonstrate that FSCNet achieves state-of-the-art performance, outperforming previous best methods by 18.34% in MAE and 27.41% in RMSE on FSC-147 (Avg.) and by 5.17%/7.58% on CARPK.
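Modulating the magnitude and phase spectra separately, as the AFS is described as doing, can be sketched on a single 2-D feature map: transform to the frequency domain, scale the magnitude, offset the phase, and transform back. The fixed per-map gates below stand in for the learned, input-conditioned gates of the AFS and are assumptions for illustration.

```python
import numpy as np

def adaptive_frequency_select(feat, mag_gate=1.2, phase_shift=0.0):
    """Illustrative frequency-selective modulation of a 2-D feature map:
    the magnitude spectrum is scaled and the phase spectrum is offset
    separately, then the map is transformed back. Keeping phase_shift
    near zero preserves geometric structure. Fixed gates are assumptions.
    """
    f = np.fft.fft2(feat)
    mag, phase = np.abs(f), np.angle(f)
    f_mod = (mag * mag_gate) * np.exp(1j * (phase + phase_shift))
    return np.fft.ifft2(f_mod).real

if __name__ == "__main__":
    feat = np.random.rand(32, 32)
    out = adaptive_frequency_select(feat)
    print(out.shape)
```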
{"title":"Frequency-selective countnet: Enhancing text-guided object counting with frequency features","authors":"Cheng Qian ,&nbsp;Jiwu Cao ,&nbsp;Ying Mao ,&nbsp;Ruotian Zhang ,&nbsp;Fei Long ,&nbsp;Jun Sang","doi":"10.1016/j.patrec.2025.12.014","DOIUrl":"10.1016/j.patrec.2025.12.014","url":null,"abstract":"<div><div>Text-guided object counting aims to estimate the number of objects described by natural language within complex visual scenes. However, existing approaches often struggle to align textual intent with diverse visual patterns, especially when target objects vary in scale, appearance, or context.</div><div>To address these limitations, we propose Frequency-Selective CountNet (FSCNet), a novel framework that integrates spatial and frequency-domain features for precise text-guided counting. FSCNet introduces a Triple-Stream Attention Fusion Module (TSAFM) that combines textual, global, and local visual features. Additionally, an Adaptive Frequency Selector (AFS) dynamically emphasizes frequency components by separately modulating the magnitude and phase spectra, preserving geometric consistency during decoding.</div><div>Extensive experiments on the FSC-147 and CARPK datasets demonstrate that FSCNet achieves state-of-the-art performance, outperforming previous best methods by 18.34% in MAE and 27.41% in RMSE on FSC-147 (Avg.) and by 5.17%/7.58% on CARPK.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"201 ","pages":"Pages 15-21"},"PeriodicalIF":3.3,"publicationDate":"2025-12-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145940474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
PE-ViT: Parameter-efficient vision transformer with dimension-adaptive experts and economical attention
IF 3.3 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-26 | DOI: 10.1016/j.patrec.2025.12.013
Qun Li, Jiru He, Tiancheng Guo, Xinping Gao, Bir Bhanu
Recent advances in Mixture of Experts (MoE) have improved the representational capacity of Vision Transformer (ViT), but most existing methods remain constrained to token-level routing or homogeneous expert scaling, overlooking the diverse representation requirements across different layers and the parameter redundancy within attention modules. To address these problems, we propose PE-ViT, a novel parameter-efficient architecture that integrates the Dimension-adaptive Mixture of Experts (DMoE) and the Selective and Shared Attention (SSA) mechanisms to improve both computational efficiency and model performance. Specifically, DMoE adaptively allocates expert dimensions through layer-wise representation analysis and incorporates shared experts to enhance parameter utilization, while SSA reduces the parameter overhead of attention by dynamically selecting attention heads and sharing query-key projections. Experimental results demonstrate that PE-ViT consistently outperforms existing MoE methods across multiple benchmark datasets.
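The "economical attention" ingredients, sharing one projection for queries and keys and selecting a subset of heads, can be shown in a compact numpy sketch. The static binary head mask below stands in for PE-ViT's dynamic head selection and is an assumption; standard multi-head attention would keep separate W_q and W_k.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def shared_qk_attention(x, W_qk, W_v, head_mask):
    """Illustrative economical self-attention: queries and keys share one
    projection (W_qk) instead of two, and a binary head_mask switches heads
    on or off. The static mask is an assumption, not PE-ViT's dynamic SSA.

    x: (n_tokens, d), W_qk/W_v: (d, heads, d_head), head_mask: (heads,) in {0, 1}.
    """
    n, d = x.shape
    _, heads, d_head = W_qk.shape
    qk = np.einsum("nd,dhe->hne", x, W_qk)      # shared projection used as both Q and K
    v = np.einsum("nd,dhe->hne", x, W_v)
    attn = softmax(qk @ qk.transpose(0, 2, 1) / np.sqrt(d_head), axis=-1)
    out = attn @ v                               # (heads, n_tokens, d_head)
    out = out * head_mask[:, None, None]         # drop deselected heads
    return out.transpose(1, 0, 2).reshape(n, heads * d_head)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = shared_qk_attention(rng.standard_normal((5, 16)),
                            rng.standard_normal((16, 4, 8)),
                            rng.standard_normal((16, 4, 8)),
                            np.array([1, 1, 0, 1]))
    print(y.shape)  # (5, 32)
```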
{"title":"PE-ViT: Parameter-efficient vision transformer with dimension-adaptive experts and economical attention","authors":"Qun Li ,&nbsp;Jiru He ,&nbsp;Tiancheng Guo ,&nbsp;Xinping Gao ,&nbsp;Bir Bhanu","doi":"10.1016/j.patrec.2025.12.013","DOIUrl":"10.1016/j.patrec.2025.12.013","url":null,"abstract":"<div><div>Recent advances in Mixture of Experts (MoE) have improved the representational capacity of Vision Transformer (ViT), but most existing methods remain constrained to token-level routing or homogeneous expert scaling, overlooking the diverse representation requirements across different layers and the parameter redundancy within attention modules. To address these problems, we propose PE-ViT, a novel parameter-efficient architecture that integrates the Dimension-adaptive Mixture of Experts (DMoE) and the Selective and Shared Attention (SSA) mechanisms to improve both computational efficiency and model performance. Specifically, DMoE adaptively allocates expert dimensions through layer-wise representation analysis and incorporates shared experts to enhance parameter utilization, while SSA reduces the parameter overhead of attention by dynamically selecting attention heads and sharing query-key projections. Experimental results demonstrate that PE-ViT consistently outperforms existing MoE methods across multiple benchmark datasets.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"200 ","pages":"Pages 135-141"},"PeriodicalIF":3.3,"publicationDate":"2025-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145884449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0