Pattern Recognition Letters: latest articles, page 7

Generation of super-resolution for medical image via a self-prior guided Mamba network with edge-aware constraint

IF 3.9; Zone 3, Computer Science; Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Letters

Pub Date : 2024-11-22 DOI: 10.1016/j.patrec.2024.11.020

Zexin Ji , Beiji Zou , Xiaoyan Kui , Hua Li , Pierre Vera , Su Ruan

Existing deep learning-based super-resolution approaches usually rely on convolutional neural network (CNN) or Transformer backbones. CNN-based approaches are unable to model long-range dependencies, whereas Transformer-based approaches incur a heavy computational burden because of their quadratic complexity. Moreover, high-frequency texture details in images generated by existing approaches remain indistinct, which is a major challenge in super-resolution tasks. To overcome these problems, we propose a self-prior guided Mamba network with edge-aware constraint (SEMambaSR) for medical image super-resolution. State Space Models (SSMs), notably Mamba, have recently gained prominence for their ability to model long-range dependencies efficiently and with low complexity. In this paper, we integrate Mamba into a U-Net, allowing the network to extract multi-scale local and global features and generate high-quality super-resolution images. Additionally, we introduce perturbations by randomly adding a brightness window to the input image, enabling the network to mine the self-prior information of the image. We also design an improved 2D-Selective-Scan (ISS2D) module that learns and adaptively fuses multi-directional long-range dependencies in image features to enhance feature representation. An edge-aware constraint is used to learn multi-scale edge information from encoder features for better synthesis of texture boundaries. Our qualitative and quantitative experimental findings indicate superior super-resolution performance over current methods on the IXI and BraTS2021 medical datasets. Specifically, our approach achieved a PSNR of 33.44 dB and an SSIM of 0.9371 on IXI, and a PSNR of 41.99 dB and an SSIM of 0.9846 on BraTS2021, both for 2× upsampling. A downstream brain tumor segmentation task, using a U-Net, also shows the effectiveness of our approach, with a mean Dice Score of 57.06% on the BraTS2021 dataset.
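
For readers unfamiliar with the PSNR figures quoted above, the following is a minimal sketch of how PSNR is commonly computed between a reference image and a reconstructed image. It is not taken from the paper: the function name psnr, the nested-list image format, and the data_range parameter are illustrative assumptions, and the paper's exact evaluation protocol (intensity range, cropping, slice selection) is not stated in the abstract.

```python
import math


def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2D images.

    Images are nested lists of floats; data_range is the assumed maximum
    possible intensity span (1.0 for images scaled to [0, 1]).
    """
    if len(reference) != len(estimate):
        raise ValueError("images must have the same number of rows")
    squared_error = 0.0
    count = 0
    for ref_row, est_row in zip(reference, estimate):
        if len(ref_row) != len(est_row):
            raise ValueError("images must have the same number of columns")
        for r, e in zip(ref_row, est_row):
            squared_error += (r - e) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(data_range ** 2 / mse)


# Toy example: every pixel is off by 0.1, so the MSE is about 0.01.
hr = [[0.2, 0.4], [0.6, 0.8]]
sr = [[0.3, 0.5], [0.7, 0.9]]
toy_psnr = psnr(hr, sr)
```

Higher values mean a smaller mean squared error relative to the intensity range, which is why the BraTS2021 figure (41.99 dB) indicates a closer reconstruction than the IXI figure (33.44 dB) under the same metric.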

Volume 187, Pages 93-99

Citations: 0

Prototypical class-wise test-time adaptation

IF 3.9; Zone 3, Computer Science; Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Letters

Pub Date : 2024-11-22 DOI: 10.1016/j.patrec.2024.10.011

Hojoon Lee , Seunghwan Lee , Inyoung Jung , Sungeun Hong

Test-time adaptation (TTA) refines pre-trained models during deployment, enabling them to effectively manage new, previously unseen data. However, existing TTA methods focus mainly on global domain alignment, which reduces domain-level gaps but often leads to suboptimal performance. This is because they fail to explicitly consider class-wise alignment, resulting in errors when reliable pseudo-labels are unavailable and source domain samples are inaccessible. In this study, we propose a prototypical class-wise test-time adaptation method, which consists of class-wise prototype adaptation and reliable pseudo-labeling. A main challenge in this approach is the lack of direct access to source domain samples. We leverage the class-specific knowledge contained in the weights of the pre-trained model. To construct class prototypes from the unlabeled target domain, we further introduce a methodology to enhance the reliability of pseudo labels. Our method is adaptable to various models and has been extensively validated, consistently outperforming baselines across multiple benchmark datasets.
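
The idea of building class prototypes from unlabeled target data with reliable pseudo-labels can be sketched as follows. This is only an illustration under stated assumptions, not the authors' method: the confidence threshold of 0.9, the use of the maximum softmax probability as the reliability criterion, and the names softmax and class_prototypes are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_prototypes(features, logits, threshold=0.9):
    """Average the features of confidently pseudo-labeled samples per class.

    features[i] is the feature vector of target sample i and logits[i] its
    classifier output. Samples whose top softmax probability is below the
    (assumed) threshold are treated as unreliable and skipped.
    """
    sums = {}
    counts = {}
    for feat, z in zip(features, logits):
        probs = softmax(z)
        label = max(range(len(probs)), key=probs.__getitem__)
        if probs[label] < threshold:
            continue  # pseudo-label judged unreliable
        if label not in sums:
            sums[label] = [0.0] * len(feat)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], feat)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


# Toy target batch: the last sample is ambiguous and is ignored.
toy_features = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
toy_logits = [[5.0, 0.0], [6.0, 0.0], [0.0, 5.0], [0.1, 0.0]]
prototypes = class_prototypes(toy_features, toy_logits)
```

In a source-free setting, prototypes of this kind could be compared against the classifier weight vector of each class, which is one common way to use the class-specific knowledge stored in a pre-trained model; whether the paper does exactly this is not stated in the abstract.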

Citations: 0

Detailed evaluation of a population-wise personalization approach to generate synthetic myocardial infarct images

IF 3.9; Zone 3, Computer Science; Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Letters

Pub Date : 2024-11-22 DOI: 10.1016/j.patrec.2024.11.017

Anastasia Konik , Patrick Clarysse , Nicolas Duchateau

Personalization of biophysical models to real data is essential to achieve realistic simulations or generate relevant synthetic populations. However, some of these models involve randomness, which poses two challenges: they do not allow the standard personalization to each individual’s data and they lack an analytical formulation required for optimization. In previous work, we introduced a population-based personalization strategy which overcomes these challenges and demonstrated its feasibility on simple 2D geometrical models of myocardial infarct. The method consists in matching the distributions of the synthetic and real populations, quantified through the Kullback–Leibler (KL) divergence. Personalization is achieved with a gradient-free algorithm (CMA-ES), which generates sets of candidate solutions represented by their covariance matrix, whose coefficients evolve until the synthetic and real data are matched. However, the robustness of this strategy regarding settings and more complex data was not challenged. In this work, we specifically address these points, with (i) an improved design, (ii) a thorough evaluation on crucial aspects of the personalization process, including hyperparameters and initialization, and (iii) the application to 3D data. Despite some limits of the simple geometrical models used, our method is able to capture the main characteristics of the real data, as demonstrated both on 2D and 3D segmented late Gadolinium images of 123 subjects with acute myocardial infarction.
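
The distribution-matching objective mentioned above relies on the Kullback–Leibler divergence. The following is a minimal sketch of the discrete form, KL(p || q) = sum_i p_i log(p_i / q_i), applied to two histograms. The function name kl_divergence, the histogram representation, and the eps floor are illustrative assumptions; the abstract does not say how the paper estimates the distributions or in which direction the divergence is taken.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions given as histograms.

    Both histograms are normalized to sum to 1. Bins where q is zero are
    floored at eps (an assumption) so that the result stays finite.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    p_total = sum(p)
    q_total = sum(q)
    divergence = 0.0
    for p_count, q_count in zip(p, q):
        p_i = p_count / p_total
        q_i = max(q_count / q_total, eps)
        if p_i > 0:  # terms with p_i = 0 contribute nothing
            divergence += p_i * math.log(p_i / q_i)
    return divergence


# Toy histograms standing in for a real and a synthetic population.
real_hist = [0.5, 0.5]
synthetic_hist = [0.25, 0.75]
toy_kl = kl_divergence(real_hist, synthetic_hist)
```

A gradient-free optimizer such as CMA-ES only needs this scalar value for each candidate parameter set, which is why it suits models that involve randomness and have no analytical formulation.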

Volume 188, Pages 8-14

Citations: 0

Improving ViT interpretability with patch-level mask prediction

IF 3.9; Zone 3, Computer Science; Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Letters

Pub Date : 2024-11-22 DOI: 10.1016/j.patrec.2024.11.018

Junyong Kang , Byeongho Heo , Junsuk Choe

Vision Transformers (ViTs) have demonstrated remarkable performances on various computer vision tasks. Attention scores are often used to explain the decision-making process of ViTs, showing which tokens are more important than others. However, the attention scores have several limitations as an explanation for ViT, such as conflicting with other explainable methods or highlighting unrelated tokens. In order to address this limitation, we propose a novel method for generating a visual explanation map from ViTs. Unlike previous approaches that rely on attention scores, our method leverages ViT features and conducts a single forward pass through our Patch-level Mask prediction (PM) module. Our visual explanation map provides class-dependent and probabilistic interpretation that can identify crucial regions of model decisions. Experimental results demonstrate that our approach outperforms previous techniques in both classification and interpretability aspects. Additionally, it can be applied to the weakly-supervised object localization (WSOL) tasks using pseudo mask labels. Our method requires no extra parameters and necessitates minimal locality supervision, utilizing less than 1% of the ImageNet-1k training dataset.

Volume 187, Pages 73-79

Citations: 0

GANzzle++: Generative approaches for jigsaw puzzle solving as local to global assignment in latent spatial representations

IF 3.9; Zone 3, Computer Science; Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Letters

Pub Date : 2024-11-19 DOI: 10.1016/j.patrec.2024.11.010

Davide Talon , Alessio Del Bue , Stuart James

Jigsaw puzzles are a popular and enjoyable pastime that humans can easily solve, even with many pieces. However, solving a jigsaw is a combinatorial problem, and the space of possible solutions is exponential in the number of pieces, intractable for pairwise solutions. In contrast to the classical pairwise local matching of pieces based on edge heuristics, we estimate an approximate solution image, i.e., a mental image, of the puzzle and exploit it to guide the placement of pieces as a piece-to-global assignment problem. Therefore, from unordered pieces, we consider conditioned generation approaches, including Generative Adversarial Networks (GAN) models, Slot Attention (SA) and Vision Transformers (ViT), to recover the solution image. Given the generated solution representation, we cast the jigsaw solving as a 1-to-1 assignment matching problem using Hungarian attention, which places pieces in corresponding positions in the global solution estimate. Results show that the newly proposed GANzzle-SA and GANzzle-VIT benefit from the early fusion strategy where pieces are jointly compressed and gathered for global structure recovery. A single deep learning model generalizes to puzzles of different sizes and improves the performances by a large margin. Evaluated on PuzzleCelebA and PuzzleWikiArts, our approaches bridge the gap of deep learning strategies with respect to optimization-based classic puzzle solvers.
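
The piece-to-global placement described above is a 1-to-1 assignment problem: each piece must go to exactly one slot of the estimated solution image so that the total mismatch is minimal. The sketch below solves a tiny instance by brute force over permutations. It is a stand-in for illustration only, not the Hungarian attention used in the paper; the cost matrix values and the name best_assignment are hypothetical. For realistic puzzle sizes an exact polynomial-time solver such as scipy.optimize.linear_sum_assignment would be the practical choice.

```python
from itertools import permutations


def best_assignment(cost):
    """Return the piece-to-slot assignment with minimum total cost.

    cost[piece][slot] is the mismatch between a piece and a slot of the
    estimated solution image. Brute force is exponential in the number of
    pieces and is only meant for very small examples.
    """
    n = len(cost)
    best_perm = None
    best_cost = float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[piece][slot] for piece, slot in enumerate(perm))
        if total < best_cost:
            best_perm, best_cost = perm, total
    return list(best_perm), best_cost


# Toy 3-piece puzzle: entry [i][j] is the cost of placing piece i in slot j.
toy_cost = [
    [0.9, 0.1, 0.8],
    [0.2, 0.7, 0.9],
    [0.8, 0.9, 0.3],
]
assignment, total_cost = best_assignment(toy_cost)
```

Here assignment[i] is the slot chosen for piece i. The factorial growth of the permutation loop illustrates the combinatorial nature of the problem noted in the abstract.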

Volume 187, Pages 35-41

Citations: 0

Neuromorphic face analysis: A survey

IF 3.9; Zone 3, Computer Science; Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Letters

Pub Date : 2024-11-19 DOI: 10.1016/j.patrec.2024.11.009

Federico Becattini , Lorenzo Berlincioni , Luca Cultrera , Alberto Del Bimbo

Neuromorphic sensors, also known as event cameras, are a class of imaging devices mimicking the function of biological visual systems. Unlike traditional frame-based cameras, which capture fixed images at discrete intervals, neuromorphic sensors continuously generate events that represent changes in light intensity or motion in the visual field with high temporal resolution and low latency. These properties have proven to be interesting in modeling human faces, both from an effectiveness and a privacy-preserving point of view. Neuromorphic face analysis however is still a raw and unstructured field of research, with several attempts at addressing different tasks with no clear standard or benchmark. This survey paper presents a comprehensive overview of capabilities, challenges and emerging applications in the domain of neuromorphic face analysis, to outline promising directions and open issues. After discussing the fundamental working principles of neuromorphic vision and presenting an in-depth overview of the related research, we explore the current state of available data, data representations, emerging challenges, and limitations that require further investigation. This paper aims to highlight the recent progress in this evolving field to provide researchers an all-encompassing analysis of the state of the art along with its problems and shortcomings.

Volume 187, Pages 42-48

Citations: 0

Multi source-free domain adaptation based on pseudo-label knowledge mining

IF 3.9; Zone 3, Computer Science; Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Letters

Pub Date : 2024-11-19 DOI: 10.1016/j.patrec.2024.11.014

Fang Zhou , Zun Xu , Wei Wei , Lei Zhang

Multi-source-free domain adaptation (MSFDA) methods were proposed to learn from unlabeled target data using a group of source pre-trained models, without directly accessing labeled source-domain data. By transferring knowledge to the target domain through pseudo labels obtained from the source pre-trained models, existing methods have shown potential for cross-domain classification. However, these methods have not directly addressed the negative knowledge transfer caused by incorrect pseudo labels. In this study, we focus on this problem and propose a multi-source-free domain adaptation method based on pseudo-label knowledge mining. Specifically, we first use average entropy weighting to compute pseudo labels for the target data. Then, we assign a confidence level, either high or low, to each target sample. Finally, we generate mixed augmented target samples and conduct different self-training tasks for samples with different confidence levels to alleviate the negative transfer resulting from inaccurate pseudo labels. Experimental results on three datasets demonstrate the effectiveness of our proposed method.
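
One plausible reading of the entropy-weighted pseudo-labeling step is sketched below: each source model is weighted according to the average entropy of its predictions on the target data, so that more certain models contribute more to the fused pseudo label. This is an assumption-laden illustration, not the paper's formula: the exp(-entropy) weighting and the names entropy and entropy_weighted_pseudo_labels are hypothetical, since the abstract does not define the weighting.

```python
import math


def entropy(probs):
    """Shannon entropy in nats of one probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_weighted_pseudo_labels(predictions):
    """Fuse the predictions of several source models into pseudo labels.

    predictions[m][i] is the class-probability vector that source model m
    gives to target sample i. Each model is weighted by exp(-H_m), where H_m
    is its average prediction entropy (an assumed weighting scheme).
    """
    avg_entropy = [
        sum(entropy(p) for p in model) / len(model) for model in predictions
    ]
    raw = [math.exp(-h) for h in avg_entropy]
    weights = [r / sum(raw) for r in raw]

    n_samples = len(predictions[0])
    n_classes = len(predictions[0][0])
    labels = []
    for i in range(n_samples):
        fused = [
            sum(w * model[i][c] for w, model in zip(weights, predictions))
            for c in range(n_classes)
        ]
        labels.append(max(range(n_classes), key=fused.__getitem__))
    return weights, labels


# Toy case: model A is confident, model B is close to uniform.
model_a = [[0.9, 0.1], [0.1, 0.9]]
model_b = [[0.4, 0.6], [0.5, 0.5]]
toy_weights, toy_labels = entropy_weighted_pseudo_labels([model_a, model_b])
```

The maximum of each fused probability vector could then serve as the confidence score used to split target samples into high-confidence and low-confidence groups.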

Citations: 0

Dehazing with all we have

IF 3.9; Zone 3, Computer Science; Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Letters

Pub Date : 2024-11-19 DOI: 10.1016/j.patrec.2024.11.011

Yuelong Li , Zhenwei Liu , Yue Xing , Kunliang Liu , Lei Geng , Qingzeng Song , Jianming Wang

In the recent past, many classical, intuition-driven dehazing and image enhancement approaches were developed and played key roles in a wide range of practical applications. Nowadays, however, the rise of deep neural networks has reshaped the field, and deep learning is widely regarded as the dominant SOTA dehazing framework. Does this imply that those once-prominent intuitive approaches are now entirely outdated and useless? Motivated by this question, we propose a general framework that takes full advantage of both traditional intuitively designed techniques and modern deep data-driven techniques to achieve high-quality image dehazing. It is composed of two main stages: Multiple I-Filters based Knowledge Extraction (MIF-KE) and Multi-Complexity Paths based Knowledge Analysis and Fusion (MCP-KAF). In MIF-KE, diverse intuitive dehazing techniques are explored and evolved into a set of adaptive content-enhancing I-Filters (I for Intuitive) with the assistance of automatic deep bilateral learning. Then, through pixel-wise affine transformation, these filters are applied to the preliminarily enhanced input image to extract critical dehazing knowledge. Subsequently, in the MCP-KAF stage, the collected knowledge is comprehensively analyzed and systematically fused through paths of varying structural complexity to produce a high-quality dehazed image. The effectiveness and generality of the proposed framework have been experimentally verified on three publicly available datasets with diverse haze categories. All source code will be provided soon.

近年来，人们研究出了大量经典的直观产生的去雾和图像增强方法，并在大量的实际应用场景中发挥了关键作用。然而，如今深度神经网络的蓬勃发展已经从根本上颠覆了整个社会，深度学习被广泛认为是SOTA除雾的主要主导框架。在这里，我们想知道这是否意味着那些曾经闪亮的直觉方法已经完全过时和无用了？根据这一思路，我们提出了一个综合利用传统直观设计和现代深度数据驱动系列技术实现高质量图像去雾的总体框架。它主要由两个阶段组成：基于多i- filter的知识提取（MIF-KE）和基于多复杂度路径的知识分析与融合（MCP-KAF）。在MIF-KE中，充分探索了多种直观的除雾技术，并在自动深度双边学习的帮助下，发展成为一堆自适应的内容增强I- filters （I for intuitive）。然后，通过逐像素的仿射变换，对初步增强的输入图像施加这些滤波器，提取关键的去雾知识。随后，在MCP-KAF阶段，对采集到的知识进一步进行综合分析，并通过各种复杂结构路径进行系统融合，得到高质量的去雾图像。在三个不同雾霾类别的公开数据集上实验验证了所提出框架的有效性和通用性。所有源代码将很快提供。

{"title":"Dehazing with all we have","authors":"Yuelong Li , Zhenwei Liu , Yue Xing , Kunliang Liu , Lei Geng , Qingzeng Song , Jianming Wang","doi":"10.1016/j.patrec.2024.11.011","DOIUrl":"10.1016/j.patrec.2024.11.011","url":null,"abstract":"<div><div>In the near past, a large number of classical intuitively originated dehazing and image enhancing approaches have been worked out, and once played key roles in tremendous practical application scenes. Nevertheless, nowadays, the booming of deep neural networks has fundamentally overturned the entire society, and deep learning is widely believed as the main dominant SOTA dehazing framework. Here, we wonder does that imply those once shining intuitive approaches are totally outdated and useless anymore? Following this idea, we propose a general framework that takes full advantage of both traditional intuitively designed and modern deep data driven series of techniques to realize high-quality image dehazing. It is mainly composed of two stages: Multiple I-Filters based Knowledge Extraction (MIF-KE) and Multi-Complexity Paths based Knowledge Analysis and Fusion (MCP-KAF). In MIF-KE, diverse intuitive dehazing techniques are sufficiently explored and evolved to a bunch of adaptive content enhancing I-Filters (I for Intuitive), with the assistance of automatic deep bilateral learning. Then, through pixel-wise affine transformation, these filters are imposed on preliminarily enhanced input image to extract critical dehazing knowledge. Subsequently, in the MCP-KAF stage, the collected knowledge are further comprehensively analyzed and systematically fused through various complexity structure paths to get high-quality dehazed image. The effectiveness and generality of proposed framework have been experimentally verified on three publicly available datasets with diverse haze categories. 
All source code will be provided soon.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"187 ","pages":"Pages 122-129"},"PeriodicalIF":3.9,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142746071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0

Sparse-attention augmented domain adaptation for unsupervised person re-identification

IF 3.9 Zone 3 Computer Science Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Letters

Pub Date : 2024-11-19 DOI: 10.1016/j.patrec.2024.11.013

Wei Zhang , Peijun Ye , Tao Su , Dihu Chen

The domain gap remains a challenging problem for unsupervised domain adaptive person re-identification (UDA re-ID). To address it, we present a novel Sparse self-Attention Augmented Domain Adaptation approach (SAADA model) to improve network performance. We first propose a composite computational primitive (SAAP) that combines sparse self-attention and convolution to enhance domain adaptation at the primitive level. Using SAAP as the core component, we construct an augmented bottleneck block to improve domain adaptation at the bottleneck-block level. Finally, the augmented bottleneck block can be cascaded into the SAADA module. Based on extensive experiments on UDA re-ID benchmarks, we deploy the SAADA module once, after the stage corresponding to the smallest feature map, and the resulting method outperforms some SOTA methods. For example, mAP increases by 5.1% on the difficult Market-1501 to MSMT17 task.

Citations: 0

MACT: Underwater image color correction via Minimally Attenuated Channel Transfer

IF 3.9 Zone 3 Computer Science Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Letters

Pub Date : 2024-11-18 DOI: 10.1016/j.patrec.2024.11.007

Weibo Zhang , Hao Wang , Peng Ren , Weidong Zhang

Underwater images usually suffer from reduced quality because light propagation underwater is affected by scattering and absorption, which severely limits their usefulness in practical applications. To address this problem, this paper proposes a Minimally Attenuated Channel Transfer (MACT) method that corrects color distortion and enhances the visibility of underwater images. In underwater images captured in natural scenes, specific color channels are often severely attenuated. To compensate for the information loss caused by channel attenuation, our color correction method selects the least degraded channel of the degraded image as the reference channel. We then use the reference channel and a color compensation factor obtained by dual-mean difference to perform adaptive color compensation on the color-degraded channels. Finally, we balance the histogram distribution of the compensated color channels with a linear stretching operation. Extensive experimental results on three benchmark datasets demonstrate that our preprocessing method achieves better performance. The project page is available at https://www.researchgate.net/publication/384252681_2024-MACT.
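
The three steps named in the abstract (pick a reference channel, compensate the other channels, then stretch linearly) can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not define the dual-mean difference, so the code below assumes that the least attenuated channel is the one with the highest mean and that the compensation factor is simply the gap between channel means. The names `color_correct` and `linear_stretch`, and the representation of an image as three flat lists of values in [0, 1], are hypothetical choices made for illustration.

```python
from statistics import fmean


def linear_stretch(channel, lo=0.0, hi=1.0):
    """Linearly rescale a channel so its values span [lo, hi]."""
    c_min, c_max = min(channel), max(channel)
    if c_max == c_min:  # flat channel: nothing to stretch
        return list(channel)
    scale = (hi - lo) / (c_max - c_min)
    return [lo + (v - c_min) * scale for v in channel]


def color_correct(image):
    """Sketch of reference-channel color compensation.

    image: dict mapping channel names ('r', 'g', 'b') to flat lists of
    floats in [0, 1]. Returns (reference channel name, corrected image).
    """
    means = {name: fmean(vals) for name, vals in image.items()}

    # Assumption: the least attenuated channel has the highest mean.
    ref_name = max(means, key=means.get)
    ref = image[ref_name]

    corrected = {}
    for name, vals in image.items():
        if name == ref_name:
            compensated = list(vals)
        else:
            # Assumption: the gap between channel means stands in for the
            # paper's dual-mean difference, which the abstract does not define.
            factor = means[ref_name] - means[name]
            compensated = [min(1.0, v + factor * r) for v, r in zip(vals, ref)]
        # Final step from the abstract: balance each channel by linear stretching.
        corrected[name] = linear_stretch(compensated)
    return ref_name, corrected


# Tiny four-pixel example with a strongly attenuated red channel.
sample = {
    "r": [0.05, 0.10, 0.15, 0.10],
    "g": [0.40, 0.60, 0.80, 0.60],
    "b": [0.30, 0.50, 0.70, 0.50],
}
ref_name, corrected = color_correct(sample)
```

In this sketch the compensation term is weighted by the reference channel's pixel values, so brighter regions of the reference channel transfer more signal to the attenuated channels; the paper's actual weighting may differ.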

Volume 187, Pages 28-34

Citations: 0