
Pattern Recognition Letters: Latest Publications

SAMIRO: Spatial Attention Mutual Information Regularization with a pre-trained model as Oracle for lane detection
IF 3.3 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-01 Epub Date: 2025-11-17 DOI: 10.1016/j.patrec.2025.10.013
Hyunjong Lee , Jangho Lee , Jaekoo Lee
Lane detection is an important topic in future mobility solutions. Real-world environmental challenges such as background clutter, varying illumination, and occlusions pose significant obstacles to effective lane detection, particularly for data-driven approaches that require substantial effort and cost for data collection and annotation. To address these issues, lane detection methods must leverage contextual and global information from surrounding lanes and objects. In this paper, we propose a Spatial Attention Mutual Information Regularization with a pre-trained model as an Oracle, called SAMIRO. SAMIRO enhances lane detection performance by transferring knowledge from a pre-trained model while preserving domain-agnostic spatial information. Leveraging SAMIRO’s plug-and-play characteristic, we integrate it into various state-of-the-art lane detection approaches and conduct extensive experiments on major benchmarks such as CULane, Tusimple, and LLAMAS. The results demonstrate that SAMIRO consistently improves performance across different models and datasets. The code will be made available upon publication.
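The abstract gives no implementation details; as a rough, hypothetical illustration of aligning a student's spatial attention with a pre-trained teacher's, the sketch below uses a KL divergence between softmax-normalized spatial attention maps as a stand-in for the mutual-information term. All names and the surrogate loss are assumptions, not the paper's method.

```python
import numpy as np

def spatial_attention(feat):
    """Collapse a C x H x W feature map into a spatial attention
    distribution: average over channels, then softmax over all
    H*W locations."""
    a = feat.mean(axis=0).ravel()          # (H*W,)
    a = np.exp(a - a.max())                # numerically stable softmax
    return a / a.sum()

def attention_alignment_loss(student_feat, teacher_feat):
    """KL(teacher || student) between spatial attention maps -- a
    hypothetical surrogate for SAMIRO's mutual-information term."""
    p = spatial_attention(teacher_feat)
    q = spatial_attention(student_feat)
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))
```

In an actual training loop this term would be added, with a weighting coefficient, to the detector's task loss.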
Pattern Recognition Letters, Volume 199, Pages 198-204.
Citations: 0
Findings from shared tasks on hate speech detection: Performance patterns for low-resource languages
IF 3.3 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-01 Epub Date: 2025-09-26 DOI: 10.1016/j.patrec.2025.09.004
Koyel Ghosh , Saptarshi Saha , Thomas Mandl , Sandip Modha
In the digital era, social media has emerged as a powerful channel for expressing opinions, but online platforms have also become a breeding ground for hate speech targeting individuals based on color, caste, gender, sexual orientation, and political ideologies. Despite growing interest in automatic hate speech detection, existing research remains predominantly focused on English, underscoring a critical need to extend efforts to under-resourced languages. To bridge this gap, the HASOC (Hate Speech and Offensive Content Identification) shared task has been promoting multilingual hate speech research. In this paper, we present a brief overview of four HASOC shared tasks (Assamese, Bengali, Bodo, and English), their datasets, the participating systems, and their performance across standard evaluation metrics: precision, recall, accuracy, and macro F1 score. In addition, we analyze inter-system agreement using Cohen’s κ and Fleiss’ κ, and investigate item-level difficulty through hardness analyses. Our findings offer valuable insights into the challenges and progress in multilingual hate speech detection, particularly for low-resource languages. This paper also serves as a model for the analysis of other results of large-scale experimentation with text classification systems.
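As a concrete reference point for the agreement analysis mentioned above, Cohen's κ between two systems' label sequences can be computed with the standard library alone (the exact evaluation code used in the shared tasks is not given in the abstract):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement between two label sequences,
    corrected for the agreement expected by chance."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # observed agreement
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # chance agreement from each system's marginal label distribution
    ca, cb = Counter(labels_a), Counter(labels_b)
    p_e = sum((ca[c] / n) * (cb[c] / n) for c in set(ca) | set(cb))
    return (p_o - p_e) / (1 - p_e)
```

Fleiss' κ generalizes the same idea to more than two raters by pooling per-item label counts.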
Pattern Recognition Letters, Volume 199, Pages 303-309.
Citations: 0
Window self-attention and 3D volumetric refinement for large vessel occlusion detection in brain angiography
IF 3.3 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-01 Epub Date: 2025-11-01 DOI: 10.1016/j.patrec.2025.10.019
Ciro Russo , Giulio Russo , Arnau Oliver , Xavier Lladó , Mikel Terceño , Yolanda Silva , Alessandro Bria , Claudio Marrocco
Stroke is one of the leading causes of death and long-term disability worldwide, with large vessel occlusion representing one of the most severe forms due to its association with extensive brain damage and poor prognosis. Rapid and reliable detection of large vessel occlusion in emergency settings is therefore essential to guide timely treatment decisions. Computed tomography angiography is currently the reference imaging modality for this task, as it provides high-resolution visualization of cerebral vessels within minutes. Nevertheless, the small size and variable location of thrombi make their identification difficult, often requiring expert radiological interpretation and being prone to missed detections. In this study, we propose a unified deep learning architecture that integrates GravityNet for slice-wise localization of vessel obstruction with a multi-head self-attention mechanism designed to capture spatial continuity across adjacent slices. A volumetric refinement stage based on three-dimensional non-maximum suppression consolidates overlapping predictions and reduces false positives across the brain volume. Evaluated on a private dataset of computed tomography angiography scans, the proposed method achieves 70.8% sensitivity at one false positive per scan, showing its potential to support automated and time-critical detection of clots in acute stroke workflows.
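The volumetric refinement stage is described only at a high level; the sketch below shows greedy 3D non-maximum suppression over point detections (coordinates and suppression radius in voxels). The point-based formulation is an assumption, the abstract does not specify the detection representation.

```python
def nms_3d(dets, radius):
    """Greedy 3D non-maximum suppression.
    dets: iterable of (x, y, z, score). Keeps detections in descending
    score order, suppressing any later detection within `radius`
    (Euclidean distance) of an already-kept one."""
    dets = sorted(dets, key=lambda d: d[3], reverse=True)
    kept = []
    for x, y, z, s in dets:
        if all((x - kx) ** 2 + (y - ky) ** 2 + (z - kz) ** 2 > radius ** 2
               for kx, ky, kz, _ in kept):
            kept.append((x, y, z, s))
    return kept
```

Consolidating slice-wise predictions this way removes duplicate responses to the same thrombus across adjacent slices.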
Pattern Recognition Letters, Volume 199, Pages 27-33.
Citations: 0
MTW-DETR: A multi-task collaborative optimization model for adverse weather object detection
IF 3.3 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-01 Epub Date: 2025-10-29 DOI: 10.1016/j.patrec.2025.10.018
Bo Peng, Chao Ma, Yifan Chen, Mi Zhu, Ningsheng Liao
Most object detection models are trained under ideal lighting and weather conditions. However, when deployed in adverse weather such as haze, rain, and snow, these models suffer from image quality degradation and target occlusion, leading to deteriorated detection performance. To address these challenges, this paper proposes MTW-DETR, a multi-task collaborative detection model that employs a dual-stream network architecture to achieve joint optimization of image restoration and object detection. The model enhances feature representation for low-quality images through a cross-task feature sharing mechanism and a feature enhancement module. Specifically, within the joint learning framework, we design three key components. First, a restoration subnetwork embedded with a Channel Pixel Attention module achieves fine-grained image restoration and adopts a dynamic feature calibration strategy, thereby improving degraded image quality. Furthermore, a Weight Space Reconstruction Module is integrated into the backbone network to enhance multi-scale feature representation. Finally, a Branch Shift Convolution Module is incorporated in the neck to improve global information extraction and strengthen understanding of the overall image structure. Experimental results demonstrate that on the real haze dataset RTTS, our model achieves 38% AP, a 3.7% improvement over the baseline model RT-DETR. In cross-domain evaluations on synthetic rain and fog datasets, the model shows significant accuracy improvements and exhibits excellent generalization ability across diverse weather scenarios.
Pattern Recognition Letters, Volume 199, Pages 7-12.
Citations: 0
TransStyle: Transformer-based StyleGAN for image inversion and editing
IF 3.3 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-12-01 Epub Date: 2025-09-10 DOI: 10.1016/j.patrec.2025.09.002
Yingchun Guo, Xueqi Lv, Gang Yan, Shu Chen, Shi Di
Image inversion using StyleGAN retrieves latent codes by embedding real images into the GAN’s latent space, enabling attribute editing and high-quality image generation. However, existing methods often struggle with reconstruction reliability and flexible editing, resulting in low-quality outcomes. To address these issues, we propose TransStyle, a new StyleGAN inversion model based on Transformer technology. Our model features a novel encoder structure, PACP (Path Aggregation with Covariance Pooling), for improved feature representation and a feature prediction head that uses covariance pooling. Additionally, we propose a Transformer-based module to enhance interactions with semantic information in the latent space. StyleGAN then uses this enhanced latent code to generate images with high fidelity and strong editability. Experimental results demonstrate that our method achieves at least 5% higher face reconstruction similarity compared to current state-of-the-art techniques, confirming the advantages of TransStyle in image reconstruction and editing quality.
Pattern Recognition Letters, Volume 198, Pages 1-7.
Citations: 0
Which images can be effectively learnt from self-supervised learning?
IF 3.3 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-12-01 Epub Date: 2025-09-13 DOI: 10.1016/j.patrec.2025.09.003
Michalis Lazarou , Sata Atito , Muhammad Awais , Josef Kittler
Self-supervised learning has shown unprecedented success in learning expressive representations that can be used effectively to solve downstream tasks. However, while the impressive results of self-supervised learning are undeniable, there is still a certain mystery regarding how self-supervised learning models learn, what features they learn, and, most importantly, which examples are hard to learn. Contrastive learning is one of the prominent lines of research in self-supervised learning, where a subcategory of methods relies on knowledge distillation between a student network and a teacher network that is an exponential moving average of the student, as initially proposed by the seminal work of DINO. In this work we investigate models trained using this family of self-supervised methods and reveal certain properties about them. Specifically, we propose a novel perspective for understanding which examples and which classes are difficult to learn effectively during training, through the lens of information theory.
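The abstract leaves the information-theoretic quantity unspecified. One common proxy for example difficulty in this setting is the Shannon entropy of a model's predictive distribution, sketched below; treating entropy as the hardness measure here is an assumption, not necessarily the paper's exact criterion.

```python
import numpy as np

def prediction_entropy(probs):
    """Shannon entropy (in nats) of a predictive distribution.
    Higher entropy = less confident prediction, a common proxy for
    a hard-to-learn example."""
    p = np.asarray(probs, dtype=float)
    p = p / p.sum()                        # normalize defensively
    return float(-(p * np.log(p + 1e-12)).sum())
```

Ranking training examples by this score separates confidently learned items (near-zero entropy) from ambiguous ones (entropy near log of the class count).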
Pattern Recognition Letters, Volume 198, Pages 8-13.
Citations: 0
Large margin classifier with graph-based adaptive regularization
IF 3.3 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-12-01 Epub Date: 2025-09-25 DOI: 10.1016/j.patrec.2025.09.008
Vítor M. Hanriot , Turíbio T. Salis , Luiz C.B. Torres , Frederico Coelho , Antonio P. Braga
This paper introduces the use of per-class regularization hyperparameters in Gabriel graph-based binary classifiers. We demonstrate how the quality index used for regularization behaves both in the margin region and in the presence of outliers, and how this added regularization flexibility leads to solutions that effectively eliminate outliers while training the classifier. We also show how it can address class imbalance by generating higher and lower thresholds for the majority and minority classes, respectively. Thus, rather than a single solution based on fixed thresholds, flexible thresholds expand the solution space and can be optimized through hyperparameter tuning algorithms. A Friedman test shows that flexible thresholds are capable of improving Gabriel graph-based classifiers.
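For reference, the Gabriel graph underlying these classifiers admits a simple brute-force construction: points i and j are connected iff no third point lies strictly inside the ball whose diameter is the segment between them. The O(n³) sketch below is adequate for small point sets (the paper's own construction may be more efficient).

```python
import numpy as np
from itertools import combinations

def gabriel_edges(points):
    """Return the edge list of the Gabriel graph of `points`.
    (i, j) is an edge iff no third point k satisfies
    d(i,k)^2 + d(j,k)^2 < d(i,j)^2, i.e. no k lies strictly inside
    the ball with diameter segment i-j."""
    P = np.asarray(points, dtype=float)
    edges = []
    for i, j in combinations(range(len(P)), 2):
        d_ij = np.sum((P[i] - P[j]) ** 2)
        blocked = any(
            np.sum((P[i] - P[k]) ** 2) + np.sum((P[j] - P[k]) ** 2) < d_ij
            for k in range(len(P)) if k not in (i, j))
        if not blocked:
            edges.append((i, j))
    return edges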
介绍了每类正则化超参数在基于Gabriel图的二元分类器中的应用。我们演示了用于正则化的质量指数如何在边缘区域和异常值存在的情况下表现,以及如何结合这种正则化灵活性可以在训练分类器时有效地消除异常值的解决方案。我们还展示了它如何通过分别为多数和少数阶级生成更高和更低的阈值来解决阶级不平衡问题。因此,灵活的阈值扩展了解空间,并可以通过超参数调优算法进行优化,而不是基于固定阈值的单一解。Friedman测试表明,灵活的阈值能够改进基于Gabriel图的分类器。
{"title":"Large margin classifier with graph-based adaptive regularization","authors":"Vítor M. Hanriot ,&nbsp;Turíbio T. Salis ,&nbsp;Luiz C.B. Torres ,&nbsp;Frederico Coelho ,&nbsp;Antonio P. Braga","doi":"10.1016/j.patrec.2025.09.008","DOIUrl":"10.1016/j.patrec.2025.09.008","url":null,"abstract":"<div><div>This paper introduces the use of per-class regularization hyperparameters in Gabriel graph-based binary classifiers. We demonstrate how the quality index used for regularization behaves both in the margin region and in the presence of outliers, and how incorporating this regularization flexibility can lead to solutions that effectively eliminate outliers while training the classifier. We also show how it can address class imbalance by generating higher and lower thresholds for the majority and minority classes, respectively. Thus, rather than having a single solution based on fixed thresholds, flexible thresholds expand the solution space and can be optimized through hyperparameter tuning algorithms. Friedman test shows that flexible thresholds are capable of improving Gabriel graph-based classifiers.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"198 ","pages":"Pages 43-49"},"PeriodicalIF":3.3,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145269729","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
A self-supervised contrastive learning approach for latent fingerprint identification 一种用于潜在指纹识别的自监督对比学习方法
IF 3.3 3区 计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2025-12-01 Epub Date: 2025-09-23 DOI: 10.1016/j.patrec.2025.09.005
Andre Nobrega , Ilan Theodoro , Pascual Figueroa , Alexandre Xavier Falcão
Latent fingerprints are challenging to identify due to low quality, partial impressions, and noise. This paper proposes a self-supervised contrastive learning approach to generate minutiae embeddings, improving fingerprint representation and matching. We first introduce a method to synthesize realistic latent fingerprints from rolled and plain images by applying ridge distortions, contrast shifts, blurring, noise, and document-based backgrounds. The resulting dataset includes reliable minutiae correspondences for effective training. Fingerprints are then represented as orientation-aligned, minutia-centered patches. A Siamese network trained with contrastive learning on these patches produces discriminative embeddings. Matching computes the mean cosine similarity between the embeddings of paired minutiae from candidate references selected by a matcher. Experiments on NIST SD27 and SD302, using a 20,473-print gallery, demonstrate rank-1 identification gains of 4.25 and 1.66 percentage points over prior work. It also consistently outperforms other synthetic latent generation baselines.
潜在指纹由于质量低、部分印痕和噪声而具有挑战性。本文提出了一种自监督对比学习方法来生成细节嵌入,从而改善指纹的表示和匹配。我们首先介绍了一种通过应用脊线扭曲、对比度偏移、模糊、噪声和基于文档的背景,从滚动图像和平面图像合成真实潜在指纹的方法。生成的数据集包括可靠的细节对应,用于有效的训练。然后将指纹表示为方向对齐的、以细节为中心的斑块。在这些斑块上进行对比学习训练的暹罗网络会产生判别嵌入。匹配计算匹配器从候选参考中选择的成对细节的嵌入之间的平均余弦相似度。在NIST SD27和SD302上进行的实验,使用20,473个打印库,证明了与之前的工作相比,排名1的识别增益分别为4.25和1.66个百分点。它也始终优于其他合成潜在代基线。
{"title":"A self-supervised contrastive learning approach for latent fingerprint identification","authors":"Andre Nobrega ,&nbsp;Ilan Theodoro ,&nbsp;Pascual Figueroa ,&nbsp;Alexandre Xavier Falcão","doi":"10.1016/j.patrec.2025.09.005","DOIUrl":"10.1016/j.patrec.2025.09.005","url":null,"abstract":"<div><div>Latent fingerprints are challenging to identify due to low quality, partial impressions, and noise. This paper proposes a self-supervised contrastive learning approach to generate minutiae embeddings, improving fingerprint representation and matching. We first introduce a method to synthesize realistic latent fingerprints from rolled and plain images by applying ridge distortions, contrast shifts, blurring, noise, and document-based backgrounds. The resulting dataset includes reliable minutiae correspondences for effective training. Fingerprints are then represented as orientation-aligned, minutia-centered patches. A Siamese network trained with contrastive learning on these patches produces discriminative embeddings. Matching computes the mean cosine similarity between the embeddings of paired minutiae from candidate references selected by a matcher. Experiments on NIST SD27 and SD302, using a 20,473-print gallery, demonstrate rank-1 identification gains of 4.25 and 1.66 percentage points over prior work. 
It also consistently outperforms other synthetic latent generation baselines.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"198 ","pages":"Pages 125-131"},"PeriodicalIF":3.3,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145466546","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Using traditional and Bayesian neural networks for fast parameter estimation in SAR images 利用传统神经网络和贝叶斯神经网络对SAR图像进行快速参数估计
IF 3.3 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2025-12-01 Epub Date : 2025-08-18 DOI: 10.1016/j.patrec.2025.08.002
Li Fan , Jeová Farias Sales Rocha Neto
Synthetic Aperture Radar (SAR) imagery analysis plays a crucial role in remote sensing applications but presents challenges due to the presence of inherent speckle noise. To address this, a common practice involves employing the GI0 distribution model to obtain roughness information from the data, facilitating subsequent imaging processes such as segmentation and classification. Consequently, there is a demand for rapid and reliable methods to estimate the roughness parameter from SAR data, notably in high-resolution imaging contexts. Existing parameter estimation techniques, however, are often slow and susceptible to errors and failures. In this work, we propose a neural network-based estimation framework that initially learns to predict the underlying parameters of GI0 samples, which then enables it to estimate the roughness of new, unseen data. Our results demonstrate that this neural network-based estimator is faster and more reliable, besides yielding lower estimation error, than conventional estimation methods. We further show that this estimation can be improved by using Bayesian Neural Networks, which additionally enable prediction of estimation uncertainty. Finally, we show that this approach can be generalized to handle image inputs and, even if trained on simulated data, is able to perform real-time pixel-wise roughness estimation for high-resolution real SAR imagery.
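For context on the abstract above: the GI0 model treats an intensity return as the product of an inverse-gamma backscatter component (governed by the roughness parameter alpha) and a unit-mean gamma speckle component. A minimal sampling sketch — this standard factorization and the parameter names are taken from the GI0 literature, not from the paper itself — is:

```python
import numpy as np

def sample_gi0(alpha: float, gamma: float, looks: int, size: int,
               rng: np.random.Generator) -> np.ndarray:
    """Draw GI0-distributed intensity samples.

    Z = X * Y, where X ~ inverse-gamma(-alpha, gamma) models backscatter
    (roughness parameter alpha < -1) and Y ~ gamma(looks, 1/looks) models
    unit-mean multiplicative speckle with `looks` looks.
    """
    assert alpha < -1 and gamma > 0 and looks >= 1
    # Inverse-gamma draw: gamma / Gamma(shape=-alpha, scale=1).
    x = gamma / rng.gamma(shape=-alpha, scale=1.0, size=size)
    # Unit-mean speckle: Gamma(shape=L, scale=1/L).
    y = rng.gamma(shape=looks, scale=1.0 / looks, size=size)
    return x * y

rng = np.random.default_rng(0)
# With gamma = -alpha - 1 the backscatter has unit mean, so E[Z] = 1.
z = sample_gi0(alpha=-4.0, gamma=3.0, looks=4, size=10_000, rng=rng)
```

A network like the one the paper describes would be trained on such simulated samples (or patches), regressing from the data back to the roughness parameter alpha.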
{"title":"Using traditional and Bayesian neural networks for fast parameter estimation in SAR images","authors":"Li Fan ,&nbsp;Jeová Farias Sales Rocha Neto","doi":"10.1016/j.patrec.2025.08.002","DOIUrl":"10.1016/j.patrec.2025.08.002","url":null,"abstract":"<div><div>Synthetic Aperture Radar (SAR) imagery analysis plays a crucial role in remote sensing applications but presents challenges due to the presence of inherent speckle noise. To address this, a common practice involves employing the <span><math><msubsup><mrow><mi>G</mi></mrow><mrow><mi>I</mi></mrow><mrow><mn>0</mn></mrow></msubsup></math></span> distribution model to obtain roughness information from the data, facilitating subsequent imaging processes such as segmentation, and classification. Consequently, there is a demand for rapid and reliable methods to estimate the roughness parameter from SAR data, notably in high-resolution imaging contexts. Existing parameter estimation techniques, however, are often slow and susceptible to errors and failures. In this work, we proposed a neural network-based estimation framework that initially learns to predict the underlying parameters of <span><math><msubsup><mrow><mi>G</mi></mrow><mrow><mi>I</mi></mrow><mrow><mn>0</mn></mrow></msubsup></math></span> samples, which then enables it to estimate the roughness of new, unseen data. Our results demonstrate that this neural network-based estimator is faster and more reliable, beside yielding less estimation error, than conventional estimation methods. We further show that this estimation can be further improved by using Bayesian Neural Networks, which additionally promote estimation uncertainty prediction. 
Finally, we show that this approach can be generalized to handle image inputs and, even if trained on simulated data, is able to perform real-time pixel-wise roughness estimation for high-resolution real SAR imagery.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"198 ","pages":"Pages 140-146"},"PeriodicalIF":3.3,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145466548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Streamlined extended Long Short-Term Memory for video skimming
IF 3.3 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2025-12-01 Epub Date : 2025-08-12 DOI: 10.1016/j.patrec.2025.08.001
Leonardo Vilela Cardoso, Barbara Hellen P. Soraggi, Silvio Jamil F. Guimarães, Zenilton K.G. Patrocínio Jr
Video skimming aims to generate concise yet informative summaries that highlight the most salient aspects of a video. However, conventional methods often struggle with diverse and redundant content due to their limited ability to detect scene transitions and insufficient temporal modeling. To address these challenges, we propose Streamlined Extended Long Short-Term Memory (StreamExLSTM), a supervised architecture derived from a streamlined variant of the extended Long Short-Term Memory (xLSTM) model. The proposed approach introduces two lightweight modules: ssLSTM, which captures short-range temporal dependencies through convolutional and recurrent operations, and smLSTM, which models long-range narrative structure using stacked memory-enhanced LSTMs. This dual-path design enables the model to balance local detail with global coherence while maintaining low complexity. Experimental results demonstrate that StreamExLSTM outperforms recent supervised baselines, achieving an average F-score of 48.8 on SumMe and 61.1 on TVSum. Moreover, when trained on a combined dataset, it reaches an F-score of 83.7 on the TVSum test set, performing comparably to semi-supervised, reinforcement learning, and GAN-based methods. These results validate StreamExLSTM as an effective and lightweight solution for dynamic video summarization.
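The F-scores quoted above follow the usual keyshot evaluation protocol for SumMe and TVSum, where predicted and ground-truth summaries are compared as binary frame-selection masks. A minimal sketch — the mask representation is an illustrative assumption, not the authors' evaluation code — is:

```python
import numpy as np

def keyshot_f_score(pred: np.ndarray, gt: np.ndarray) -> float:
    """F-score between two equal-length binary frame-selection masks."""
    overlap = float(np.sum(pred & gt))
    if overlap == 0.0:
        return 0.0
    precision = overlap / float(np.sum(pred))
    recall = overlap / float(np.sum(gt))
    return 2 * precision * recall / (precision + recall)

pred = np.array([1, 1, 0, 0, 1, 0], dtype=bool)
gt   = np.array([1, 0, 0, 1, 1, 0], dtype=bool)
# overlap = 2, precision = 2/3, recall = 2/3 -> F = 2/3
print(keyshot_f_score(pred, gt))
```

Benchmark scores such as the 48.8 on SumMe are averages of this per-video F-score (in percent) over the test split.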
{"title":"Streamlined extended Long Short-Term Memory for video skimming","authors":"Leonardo Vilela Cardoso,&nbsp;Barbara Hellen P. Soraggi,&nbsp;Silvio Jamil F. Guimarães,&nbsp;Zenilton K.G. Patrocínio Jr","doi":"10.1016/j.patrec.2025.08.001","DOIUrl":"10.1016/j.patrec.2025.08.001","url":null,"abstract":"<div><div>Video skimming aims to generate concise yet informative summaries that highlight the most salient aspects of a video. However, conventional methods often struggle with diverse and redundant content due to their limited ability to detect scene transitions and insufficient temporal modeling. To address these challenges, we propose <strong>Stream</strong>lined <strong>Ex</strong>tended <strong>L</strong>ong <strong>S</strong>hort-<strong>T</strong>erm <strong>M</strong>emory (<strong>StreamExLSTM</strong>), a supervised architecture derived from a streamlined variant of the extended Long Short-Term Memory (xLSTM) model. The proposed approach introduces two lightweight modules: ssLSTM, which captures short-range temporal dependencies through convolutional and recurrent operations, and smLSTM, which models long-range narrative structure using stacked memory-enhanced LSTMs. This dual-path design enables the model to balance local detail with global coherence while maintaining low complexity. Experimental results demonstrate that StreamExLSTM outperforms recent supervised baselines, achieving an average F-score of 48.8 on SumMe and 61.1 on TVSum. Moreover, when trained on a combined dataset, it reaches an F-score of 83.7 on the TVSum test set, performing comparably to semi-supervised, reinforcement learning, and GAN-based methods. 
These results validate StreamExLSTM as an effective and lightweight solution for dynamic video summarization.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"198 ","pages":"Pages 132-139"},"PeriodicalIF":3.3,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145466547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Journal: Pattern Recognition Letters