SAMIRO: Spatial Attention Mutual Information Regularization with a pre-trained model as Oracle for lane detection
Pub Date: 2026-01-01 | Epub Date: 2025-11-17 | DOI: 10.1016/j.patrec.2025.10.013
Hyunjong Lee, Jangho Lee, Jaekoo Lee
Lane detection is an important topic in future mobility solutions. Real-world environmental challenges such as background clutter, varying illumination, and occlusions pose significant obstacles to effective lane detection, particularly for data-driven approaches that require substantial effort and cost for data collection and annotation. To address these issues, lane detection methods must leverage contextual and global information from surrounding lanes and objects. In this paper, we propose a Spatial Attention Mutual Information Regularization with a pre-trained model as an Oracle, called SAMIRO. SAMIRO enhances lane detection performance by transferring knowledge from a pre-trained model while preserving domain-agnostic spatial information. Leveraging SAMIRO's plug-and-play characteristic, we integrate it into various state-of-the-art lane detection approaches and conduct extensive experiments on major benchmarks such as CULane, TuSimple, and LLAMAS. The results demonstrate that SAMIRO consistently improves performance across different models and datasets. The code will be made available upon publication.
{"title":"SAMIRO: Spatial Attention Mutual Information Regularization with a pre-trained model as Oracle for lane detection","authors":"Hyunjong Lee , Jangho Lee , Jaekoo Lee","doi":"10.1016/j.patrec.2025.10.013","DOIUrl":"10.1016/j.patrec.2025.10.013","url":null,"abstract":"<div><div>Lane detection is an important topic in the future mobility solutions. Real-world environmental challenges such as background clutter, varying illumination, and occlusions pose significant obstacles to effective lane detection, particularly when relying on data-driven approaches that require substantial effort and cost for data collection and annotation. To address these issues, lane detection methods must leverage contextual and global information from surrounding lanes and objects. In this paper, we propose a <em>Spatial Attention Mutual Information Regularization with a pre-trained model as an Oracle</em>, called <em>SAMIRO</em>. SAMIRO enhances lane detection performance by transferring knowledge from a pre-trained model while preserving domain-agnostic spatial information. Leveraging SAMIRO’s plug-and-play characteristic, we integrate it into various state-of-the-art lane detection approaches and conduct extensive experiments on major benchmarks such as CULane, Tusimple, and LLAMAS. The results demonstrate that SAMIRO consistently improves performance across different models and datasets. The code will be made available upon publication.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"199 ","pages":"Pages 198-204"},"PeriodicalIF":3.3,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145579803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Findings from shared tasks on hate speech detection: Performance patterns for low-resource languages
Pub Date: 2026-01-01 | Epub Date: 2025-09-26 | DOI: 10.1016/j.patrec.2025.09.004
Koyel Ghosh, Saptarshi Saha, Thomas Mandl, Sandip Modha
In the digital era, social media has emerged as a powerful channel for expressing opinions, but online platforms have also become a breeding ground for hate speech targeting individuals based on color, caste, gender, sexual orientation, and political ideologies. Despite growing interest in automatic hate speech detection, existing research remains predominantly focused on English, underscoring a critical need to extend efforts to under-resourced languages. To bridge this gap, the HASOC (Hate Speech and Offensive Content Identification) shared task has been promoting multilingual hate speech research. In this paper, we present a brief overview of four of these shared tasks (Assamese, Bengali, Bodo, and English), their datasets, the participating systems, and their performance across standard evaluation metrics: precision, recall, accuracy, and macro F1 score. In addition, we analyze inter-system agreement using Cohen's κ and Fleiss' κ, and investigate item-level difficulty through hardness analyses. Our findings offer valuable insights into the challenges and progress in multilingual hate speech detection, particularly for low-resource languages. This paper also serves as a model for the analysis of other results of large-scale experimentation with text classification systems.
{"title":"Findings from shared tasks on hate speech detection: Performance patterns for low-resource languages","authors":"Koyel Ghosh , Saptarshi Saha , Thomas Mandl , Sandip Modha","doi":"10.1016/j.patrec.2025.09.004","DOIUrl":"10.1016/j.patrec.2025.09.004","url":null,"abstract":"<div><div>In the digital era, social media has emerged as a powerful channel for expressing opinions, but online platforms have also become a breeding ground for hate speech targeting individuals based on color, caste, gender, sexual orientation, and political ideologies. Despite growing interest in automatic hate speech detection, existing research remains predominantly focused on English, underscoring a critical need to extend efforts to under-resourced languages. To bridge this gap, the HASOC (Hate Speech and Offensive Content Identification) shared task has been promoting multilingual hate speech research. In this paper, we present a brief overview of these four shared tasks (Assamese, Bengali, Bodo and English), datasets, participating systems, and their performance across standard evaluation metrics—precision, recall, accuracy, and macro F1 score. In addition, we analyze the inter-system agreement using Cohen’s <span><math><mi>κ</mi></math></span> and Fleiss’ <span><math><mi>κ</mi></math></span>, and investigate item-level difficulty through hardness analyses. Our findings offer valuable insights into the challenges and progress in multilingual hate speech detection, particularly for low-resource languages. This paper also serves as a model for the analysis of other results of large-scale experimentation with text classification systems.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"199 ","pages":"Pages 303-309"},"PeriodicalIF":3.3,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145736501","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Window self-attention and 3D volumetric refinement for large vessel occlusion detection in brain angiography
Ciro Russo, Giulio Russo, Arnau Oliver, Xavier Lladó, Mikel Terceño, Yolanda Silva, Alessandro Bria, Claudio Marrocco
Pub Date: 2026-01-01 | DOI: 10.1016/j.patrec.2025.10.019
Stroke is one of the leading causes of death and long-term disability worldwide, with large vessel occlusion representing one of the most severe forms due to its association with extensive brain damage and poor prognosis. Rapid and reliable detection of large vessel occlusion in emergency settings is therefore essential to guide timely treatment decisions. Computed tomography angiography is currently the reference imaging modality for this task, as it provides high-resolution visualization of cerebral vessels within minutes. Nevertheless, the small size and variable location of thrombi make their identification difficult, often requiring expert radiological interpretation and being prone to missed detections. In this study, we propose a unified deep learning architecture that integrates GravityNet for slice-wise localization of vessel obstruction with a multi-head self-attention mechanism designed to capture spatial continuity across adjacent slices. A volumetric refinement stage based on three-dimensional non-maximum suppression consolidates overlapping predictions and reduces false positives across the brain volume. Evaluated on a private dataset of computed tomography angiography scans, the proposed method achieves 70.8% sensitivity at one false positive per scan, showing its potential to support automated and time-critical detection of clots in acute stroke workflows.
{"title":"Window self-attention and 3D volumetric refinement for large vessel occlusion detection in brain angiography","authors":"Ciro Russo , Giulio Russo , Arnau Oliver , Xavier Lladó , Mikel Terceño , Yolanda Silva , Alessandro Bria , Claudio Marrocco","doi":"10.1016/j.patrec.2025.10.019","DOIUrl":"10.1016/j.patrec.2025.10.019","url":null,"abstract":"<div><div>Stroke is one of the leading causes of death and long-term disability worldwide, with large vessel occlusion representing one of the most severe forms due to its association with extensive brain damage and poor prognosis. Rapid and reliable detection of large vessel occlusion in emergency settings is therefore essential to guide timely treatment decisions. Computed tomography angiography is currently the reference imaging modality for this task, as it provides high-resolution visualization of cerebral vessels within minutes. Nevertheless, the small size and variable location of thrombi make their identification difficult, often requiring expert radiological interpretation and being prone to missed detections. In this study, we propose a unified deep learning architecture that integrates GravityNet for slice-wise localization of vessel obstruction with a multi-head self-attention mechanism designed to capture spatial continuity across adjacent slices. A volumetric refinement stage based on three-dimensional non-maximum suppression consolidates overlapping predictions and reduces false positives across the brain volume. Evaluated on a private dataset of computed tomography angiography scans, the proposed method achieves 70.8% sensitivity at one false positive per scan, showing its potential to support automated and time-critical detection of clots in acute stroke workflows.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"199 ","pages":"Pages 27-33"},"PeriodicalIF":3.3,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145468574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MTW-DETR: A multi-task collaborative optimization model for adverse weather object detection
Pub Date: 2026-01-01 | Epub Date: 2025-10-29 | DOI: 10.1016/j.patrec.2025.10.018
Bo Peng, Chao Ma, Yifan Chen, Mi Zhu, Ningsheng Liao
Most object detection models are trained under ideal lighting and weather conditions. However, when deployed in adverse weather such as haze, rain, and snow, these models suffer from image quality degradation and target occlusion, leading to deteriorated detection performance. To address these challenges, this paper proposes MTW-DETR, a multi-task collaborative detection model that employs a dual-stream network architecture to achieve joint optimization of image restoration and object detection. The model enhances feature representation capabilities for low-quality images through a cross-task feature sharing mechanism and a feature enhancement module. Specifically, within the joint learning framework, we design three key components. First, a restoration subnetwork embedded with a Channel Pixel Attention module achieves fine-grained image restoration and adopts a dynamic feature calibration strategy, thereby improving degraded image quality. Furthermore, a Weight Space Reconstruction Module is integrated into the backbone network to enhance multi-scale feature representation capabilities. Finally, a Branch Shift Convolution Module is incorporated into the neck to improve global information extraction and enhance the model's understanding of overall image structure and feature representation. Experimental results demonstrate that on the real haze dataset RTTS, our model achieves 38% AP, representing a 3.7% improvement over the baseline model RT-DETR. In cross-domain evaluations on synthetic rain and fog datasets, the model shows significant accuracy improvements and exhibits excellent generalization ability across diverse weather scenarios.
{"title":"MTW-DETR: A multi-task collaborative optimization model for adverse weather object detection","authors":"Bo Peng, Chao Ma, Yifan Chen, Mi Zhu, Ningsheng Liao","doi":"10.1016/j.patrec.2025.10.018","DOIUrl":"10.1016/j.patrec.2025.10.018","url":null,"abstract":"<div><div>Most object detection models are trained under ideal lighting and weather conditions. However, when deployed in adverse weather conditions such as haze, rain, and snow, these models suffer from image quality degradation and target occlusion problems, leading to deteriorated detection performance. To address these challenges, this paper proposes MTW-DETR, a multi-task collaborative detection model that employs a dual-stream network architecture to achieve joint optimization of image restoration and object detection. The model enhances feature representation capabilities for low-quality images through a cross-task feature sharing mechanism and a feature enhancement module. Specifically, within the joint learning framework, we design three key components. First, a restoration subnetwork embedded with a Channel Pixel Attention module achieves fine-grained image restoration and adopts a dynamic feature calibration strategy, thereby improving degraded image quality. Furthermore, a Weight Space Reconstruction Module is integrated into the backbone network to enhance multi-scale feature representation capabilities. Finally, a Branch Shift Convolution Module is incorporated in the neck to improve global information extraction ability, enhance understanding of the overall image structure and feature representation. Experimental results demonstrate that on the real haze dataset RTTS, our model achieves 38% AP, representing a 3.7% improvement over the baseline model RT-DETR. In cross-domain evaluations on synthetic rain and fog datasets, the model shows significant accuracy improvements and exhibits excellent generalization ability across diverse weather scenarios.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"199 ","pages":"Pages 7-12"},"PeriodicalIF":3.3,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145384598","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
TransStyle: Transformer-based StyleGAN for image inversion and editing
Pub Date: 2025-12-01 | Epub Date: 2025-09-10 | DOI: 10.1016/j.patrec.2025.09.002
Yingchun Guo, Xueqi Lv, Gang Yan, Shu Chen, Shi Di
Image inversion using StyleGAN retrieves latent codes by embedding real images into the GAN’s latent space, enabling attribute editing and high-quality image generation. However, existing methods often struggle with reconstruction reliability and flexible editing, resulting in low-quality outcomes. To address these issues, we propose TransStyle, a new StyleGAN inversion model based on Transformer technology. Our model features a novel encoder structure, PACP (Path Aggregation with Covariance Pooling), for improved feature representation and a feature prediction head that uses covariance pooling. Additionally, we propose a Transformer-based module to enhance interactions with semantic information in the latent space. StyleGAN then uses this enhanced latent code to generate images with high fidelity and strong editability. Experimental results demonstrate that our method achieves at least 5% higher face reconstruction similarity compared to current state-of-the-art techniques, confirming the advantages of TransStyle in image reconstruction and editing quality.
{"title":"TransStyle: Transformer-based StyleGAN for image inversion and editing","authors":"Yingchun Guo, Xueqi Lv, Gang Yan, Shu Chen, Shi Di","doi":"10.1016/j.patrec.2025.09.002","DOIUrl":"10.1016/j.patrec.2025.09.002","url":null,"abstract":"<div><div>Image inversion using StyleGAN retrieves latent codes by embedding real images into the GAN’s latent space, enabling attribute editing and high-quality image generation. However, existing methods often struggle with reconstruction reliability and flexible editing, resulting in low-quality outcomes. To address these issues, we propose TransStyle, a new StyleGAN inversion model based on Transformer technology. Our model features a novel encoder structure, PACP (Path Aggregation with Covariance Pooling), for improved feature representation and a feature prediction head that uses covariance pooling. Additionally, we propose a Transformer-based module to enhance interactions with semantic information in the latent space. StyleGAN then uses this enhanced latent code to generate images with high fidelity and strong editability. Experimental results demonstrate that our method achieves at least 5% higher face reconstruction similarity compared to current state-of-the-art techniques, confirming the advantages of TransStyle in image reconstruction and editing quality.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"198 ","pages":"Pages 1-7"},"PeriodicalIF":3.3,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145108644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Which images can be effectively learnt from self-supervised learning?
Pub Date: 2025-12-01 | Epub Date: 2025-09-13 | DOI: 10.1016/j.patrec.2025.09.003
Michalis Lazarou, Sata Atito, Muhammad Awais, Josef Kittler
Self-supervised learning has shown unprecedented success in learning expressive representations that can be used effectively to solve downstream tasks. However, while the impressive results of self-supervised learning are undeniable, there is still a certain mystery regarding how self-supervised models learn, what features they are learning, and, most importantly, which examples are hard to learn. Contrastive learning is one of the prominent lines of research in self-supervised learning, where a subcategory of methods relies on knowledge distillation between a student network and a teacher network that is an exponential moving average of the student, as initially proposed by the seminal work of DINO. In this work we investigate models trained using this family of self-supervised methods and reveal certain properties about them. Specifically, we propose a novel perspective, through the lens of information theory, for understanding which examples and which classes are difficult to learn effectively during training.
{"title":"Which images can be effectively learnt from self-supervised learning?","authors":"Michalis Lazarou , Sata Atito , Muhammad Awais , Josef Kittler","doi":"10.1016/j.patrec.2025.09.003","DOIUrl":"10.1016/j.patrec.2025.09.003","url":null,"abstract":"<div><div>Self-supervised learning has shown unprecedented success for learning expressive representations that can be used effectively to solve downstream tasks. However, while the impressive results of self-supervised learning are undeniable there is still a certain mystery regarding how self-supervised learning models learn, what features are they learning and most importantly which examples are hard to learn. Contrastive learning is one of the prominent lines of research in self-supervised learning, where a subcategory of methods relies on knowledge-distillation between a student network and a teacher network which is an exponentially moving average of the student, initially proposed by the seminal work of DINO. In this work we investigate models trained using this family of self-supervised methods and reveal certain properties about them. Specifically, we propose a novel perspective on understanding which examples and which classes are difficult to be learnt effectively during training through the lens of information theory.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"198 ","pages":"Pages 8-13"},"PeriodicalIF":3.3,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145108645","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Large margin classifier with graph-based adaptive regularization
Pub Date: 2025-12-01 | Epub Date: 2025-09-25 | DOI: 10.1016/j.patrec.2025.09.008
Vítor M. Hanriot, Turíbio T. Salis, Luiz C.B. Torres, Frederico Coelho, Antonio P. Braga
This paper introduces the use of per-class regularization hyperparameters in Gabriel graph-based binary classifiers. We demonstrate how the quality index used for regularization behaves both in the margin region and in the presence of outliers, and how incorporating this regularization flexibility can lead to solutions that effectively eliminate outliers while training the classifier. We also show how it can address class imbalance by generating higher and lower thresholds for the majority and minority classes, respectively. Thus, rather than a single solution based on fixed thresholds, flexible thresholds expand the solution space and can be optimized through hyperparameter tuning algorithms. The Friedman test shows that flexible thresholds are capable of improving Gabriel graph-based classifiers.
{"title":"Large margin classifier with graph-based adaptive regularization","authors":"Vítor M. Hanriot , Turíbio T. Salis , Luiz C.B. Torres , Frederico Coelho , Antonio P. Braga","doi":"10.1016/j.patrec.2025.09.008","DOIUrl":"10.1016/j.patrec.2025.09.008","url":null,"abstract":"<div><div>This paper introduces the use of per-class regularization hyperparameters in Gabriel graph-based binary classifiers. We demonstrate how the quality index used for regularization behaves both in the margin region and in the presence of outliers, and how incorporating this regularization flexibility can lead to solutions that effectively eliminate outliers while training the classifier. We also show how it can address class imbalance by generating higher and lower thresholds for the majority and minority classes, respectively. Thus, rather than having a single solution based on fixed thresholds, flexible thresholds expand the solution space and can be optimized through hyperparameter tuning algorithms. Friedman test shows that flexible thresholds are capable of improving Gabriel graph-based classifiers.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"198 ","pages":"Pages 43-49"},"PeriodicalIF":3.3,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145269729","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A self-supervised contrastive learning approach for latent fingerprint identification
Pub Date: 2025-12-01 | Epub Date: 2025-09-23 | DOI: 10.1016/j.patrec.2025.09.005
Andre Nobrega, Ilan Theodoro, Pascual Figueroa, Alexandre Xavier Falcão
Latent fingerprints are challenging to identify due to low quality, partial impressions, and noise. This paper proposes a self-supervised contrastive learning approach to generate minutiae embeddings, improving fingerprint representation and matching. We first introduce a method to synthesize realistic latent fingerprints from rolled and plain images by applying ridge distortions, contrast shifts, blurring, noise, and document-based backgrounds. The resulting dataset includes reliable minutiae correspondences for effective training. Fingerprints are then represented as orientation-aligned, minutia-centered patches. A Siamese network trained with contrastive learning on these patches produces discriminative embeddings. Matching computes the mean cosine similarity between the embeddings of paired minutiae from candidate references selected by a matcher. Experiments on NIST SD27 and SD302, using a 20,473-print gallery, demonstrate rank-1 identification gains of 4.25 and 1.66 percentage points over prior work. It also consistently outperforms other synthetic latent generation baselines.
{"title":"A self-supervised contrastive learning approach for latent fingerprint identification","authors":"Andre Nobrega , Ilan Theodoro , Pascual Figueroa , Alexandre Xavier Falcão","doi":"10.1016/j.patrec.2025.09.005","DOIUrl":"10.1016/j.patrec.2025.09.005","url":null,"abstract":"<div><div>Latent fingerprints are challenging to identify due to low quality, partial impressions, and noise. This paper proposes a self-supervised contrastive learning approach to generate minutiae embeddings, improving fingerprint representation and matching. We first introduce a method to synthesize realistic latent fingerprints from rolled and plain images by applying ridge distortions, contrast shifts, blurring, noise, and document-based backgrounds. The resulting dataset includes reliable minutiae correspondences for effective training. Fingerprints are then represented as orientation-aligned, minutia-centered patches. A Siamese network trained with contrastive learning on these patches produces discriminative embeddings. Matching computes the mean cosine similarity between the embeddings of paired minutiae from candidate references selected by a matcher. Experiments on NIST SD27 and SD302, using a 20,473-print gallery, demonstrate rank-1 identification gains of 4.25 and 1.66 percentage points over prior work. It also consistently outperforms other synthetic latent generation baselines.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"198 ","pages":"Pages 125-131"},"PeriodicalIF":3.3,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145466546","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Using traditional and Bayesian neural networks for fast parameter estimation in SAR images
Pub Date: 2025-12-01 | Epub Date: 2025-08-18 | DOI: 10.1016/j.patrec.2025.08.002
Li Fan, Jeová Farias Sales Rocha Neto
Synthetic Aperture Radar (SAR) imagery analysis plays a crucial role in remote sensing applications but presents challenges due to the presence of inherent speckle noise. To address this, a common practice involves employing the $G_I^0$ distribution model to obtain roughness information from the data, facilitating subsequent imaging processes such as segmentation and classification. Consequently, there is a demand for rapid and reliable methods to estimate the roughness parameter from SAR data, notably in high-resolution imaging contexts. Existing parameter estimation techniques, however, are often slow and susceptible to errors and failures. In this work, we propose a neural network-based estimation framework that initially learns to predict the underlying parameters of $G_I^0$ samples, which then enables it to estimate the roughness of new, unseen data. Our results demonstrate that this neural network-based estimator is faster and more reliable, besides yielding lower estimation error, than conventional estimation methods. We further show that this estimation can be further improved by using Bayesian Neural Networks, which additionally provide estimation uncertainty. Finally, we show that this approach can be generalized to handle image inputs and, even if trained on simulated data, is able to perform real-time pixel-wise roughness estimation for high-resolution real SAR imagery.
{"title":"Using traditional and Bayesian neural networks for fast parameter estimation in SAR images","authors":"Li Fan , Jeová Farias Sales Rocha Neto","doi":"10.1016/j.patrec.2025.08.002","DOIUrl":"10.1016/j.patrec.2025.08.002","url":null,"abstract":"<div><div>Synthetic Aperture Radar (SAR) imagery analysis plays a crucial role in remote sensing applications but presents challenges due to the presence of inherent speckle noise. To address this, a common practice involves employing the <span><math><msubsup><mrow><mi>G</mi></mrow><mrow><mi>I</mi></mrow><mrow><mn>0</mn></mrow></msubsup></math></span> distribution model to obtain roughness information from the data, facilitating subsequent imaging processes such as segmentation, and classification. Consequently, there is a demand for rapid and reliable methods to estimate the roughness parameter from SAR data, notably in high-resolution imaging contexts. Existing parameter estimation techniques, however, are often slow and susceptible to errors and failures. In this work, we proposed a neural network-based estimation framework that initially learns to predict the underlying parameters of <span><math><msubsup><mrow><mi>G</mi></mrow><mrow><mi>I</mi></mrow><mrow><mn>0</mn></mrow></msubsup></math></span> samples, which then enables it to estimate the roughness of new, unseen data. Our results demonstrate that this neural network-based estimator is faster and more reliable, beside yielding less estimation error, than conventional estimation methods. We further show that this estimation can be further improved by using Bayesian Neural Networks, which additionally promote estimation uncertainty prediction. Finally, we show that this approach can be generalized to handle image inputs and, even if trained on simulated data, is able to perform real-time pixel-wise roughness estimation for high-resolution real SAR imagery.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"198 ","pages":"Pages 140-146"},"PeriodicalIF":3.3,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145466548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Streamlined extended Long Short-Term Memory for video skimming
Pub Date: 2025-12-01 | Epub Date: 2025-08-12 | DOI: 10.1016/j.patrec.2025.08.001
Leonardo Vilela Cardoso, Barbara Hellen P. Soraggi, Silvio Jamil F. Guimarães, Zenilton K.G. Patrocínio Jr
Video skimming aims to generate concise yet informative summaries that highlight the most salient aspects of a video. However, conventional methods often struggle with diverse and redundant content due to their limited ability to detect scene transitions and insufficient temporal modeling. To address these challenges, we propose Streamlined Extended Long Short-Term Memory (StreamExLSTM), a supervised architecture derived from a streamlined variant of the extended Long Short-Term Memory (xLSTM) model. The proposed approach introduces two lightweight modules: ssLSTM, which captures short-range temporal dependencies through convolutional and recurrent operations, and smLSTM, which models long-range narrative structure using stacked memory-enhanced LSTMs. This dual-path design enables the model to balance local detail with global coherence while maintaining low complexity. Experimental results demonstrate that StreamExLSTM outperforms recent supervised baselines, achieving an average F-score of 48.8 on SumMe and 61.1 on TVSum. Moreover, when trained on a combined dataset, it reaches an F-score of 83.7 on the TVSum test set, performing comparably to semi-supervised, reinforcement learning, and GAN-based methods. These results validate StreamExLSTM as an effective and lightweight solution for dynamic video summarization.
{"title":"Streamlined extended Long Short-Term Memory for video skimming","authors":"Leonardo Vilela Cardoso, Barbara Hellen P. Soraggi, Silvio Jamil F. Guimarães, Zenilton K.G. Patrocínio Jr","doi":"10.1016/j.patrec.2025.08.001","DOIUrl":"10.1016/j.patrec.2025.08.001","url":null,"abstract":"<div><div>Video skimming aims to generate concise yet informative summaries that highlight the most salient aspects of a video. However, conventional methods often struggle with diverse and redundant content due to their limited ability to detect scene transitions and insufficient temporal modeling. To address these challenges, we propose <strong>Stream</strong>lined <strong>Ex</strong>tended <strong>L</strong>ong <strong>S</strong>hort-<strong>T</strong>erm <strong>M</strong>emory (<strong>StreamExLSTM</strong>), a supervised architecture derived from a streamlined variant of the extended Long Short-Term Memory (xLSTM) model. The proposed approach introduces two lightweight modules: ssLSTM, which captures short-range temporal dependencies through convolutional and recurrent operations, and smLSTM, which models long-range narrative structure using stacked memory-enhanced LSTMs. This dual-path design enables the model to balance local detail with global coherence while maintaining low complexity. Experimental results demonstrate that StreamExLSTM outperforms recent supervised baselines, achieving an average F-score of 48.8 on SumMe and 61.1 on TVSum. Moreover, when trained on a combined dataset, it reaches an F-score of 83.7 on the TVSum test set, performing comparably to semi-supervised, reinforcement learning, and GAN-based methods. These results validate StreamExLSTM as an effective and lightweight solution for dynamic video summarization.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"198 ","pages":"Pages 132-139"},"PeriodicalIF":3.3,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145466547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}