Pub Date: 2024-10-01. DOI: 10.1016/j.patrec.2024.10.010
Francesco Giuliari, Gianluca Scarpellini, Stefano Fiorini, Stuart James, Pietro Morerio, Yiming Wang, Alessio Del Bue
Positional reasoning is the process of ordering an unsorted set of parts into a consistent structure. To address this problem, we present Positional Diffusion, a plug-and-play graph formulation with Diffusion Probabilistic Models. Using a diffusion process, we add Gaussian noise to the set elements’ positions, mapping them to random positions in a continuous space. Positional Diffusion learns to reverse the noising process and recover the original positions through an Attention-based Graph Neural Network. To evaluate our method, we conduct extensive experiments on three different tasks and seven datasets, comparing our approach against the state-of-the-art methods for visual puzzle solving, sentence ordering, and room arrangement. Our method outperforms long-standing research on puzzle solving by up to +17% compared to the second-best deep learning method, and performs on par with the state-of-the-art methods on sentence ordering and room rearrangement. Our work highlights the suitability of diffusion models for ordering problems and proposes a novel formulation and method for solving various ordering tasks. We release our code at https://github.com/IIT-PAVIS/Positional_Diffusion.
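As a rough illustration of the forward noising step described in the abstract (not the authors’ implementation; the noise schedule, dimensionality, and normalization below are assumptions), the following sketch adds Gaussian noise to a set of 2D element positions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed linear beta schedule; Positional Diffusion's actual schedule may differ.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)  # \bar{alpha}_t

def q_sample(x0, t):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I)."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * noise, noise

# Positions of four puzzle pieces on a 2x2 grid, normalized to [-1, 1] (assumed).
x0 = np.array([[-0.5, -0.5], [0.5, -0.5], [-0.5, 0.5], [0.5, 0.5]])
x_t, eps = q_sample(x0, t=500)
print(x_t)  # noised positions; the attention-based GNN learns to reverse this step
```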
{"title":"Positional diffusion: Graph-based diffusion models for set ordering","authors":"Francesco Giuliari , Gianluca Scarpellini , Stefano Fiorini , Stuart James , Pietro Morerio , Yiming Wang , Alessio Del Bue","doi":"10.1016/j.patrec.2024.10.010","DOIUrl":"10.1016/j.patrec.2024.10.010","url":null,"abstract":"<div><div>Positional reasoning is the process of ordering an unsorted set of parts into a consistent structure. To address this problem, we present <em>Positional Diffusion</em>, a plug-and-play graph formulation with Diffusion Probabilistic Models. Using a diffusion process, we add Gaussian noise to the set elements’ position and map them to a random position in a continuous space. <em>Positional Diffusion</em> learns to reverse the noising process and recover the original positions through an Attention-based Graph Neural Network. To evaluate our method, we conduct extensive experiments on three different tasks and seven datasets, comparing our approach against the state-of-the-art methods for visual puzzle-solving, sentence ordering, and room arrangement, demonstrating that our method outperforms long-lasting research on puzzle solving with up to <span><math><mrow><mo>+</mo><mn>17</mn><mtext>%</mtext></mrow></math></span> compared to the second-best deep learning method, and performs on par against the state-of-the-art methods on sentence ordering and room rearrangement. Our work highlights the suitability of diffusion models for ordering problems and proposes a novel formulation and method for solving various ordering tasks. We release our code at <span><span>https://github.com/IIT-PAVIS/Positional_Diffusion</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 272-278"},"PeriodicalIF":3.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142573397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-10-01. DOI: 10.1016/j.patrec.2024.09.015
Mohammad Junayed Hasan, Kazi Rafat, Fuad Rahman, Nabeel Mohammed, Shafin Rahman
Distinguishing between spontaneous and posed smiles from videos poses a significant challenge in the pattern classification literature. Researchers have developed feature-based and deep learning-based solutions for this problem, and deep learning generally outperforms feature-based methods. However, certain aspects of feature-based methods could improve deep learning methods. For example, previous research has shown that Duchenne Marker (or D-Marker) features from the face play a vital role in spontaneous smiles and can be used to improve deep learning performance. In this study, we propose a deep learning solution that leverages D-Marker features to further improve performance. Our multi-task learning framework, named DeepMarkerNet, integrates a transformer network with facial D-Markers for accurate smile classification. Unlike past methods, our approach simultaneously predicts the class of the smile and the associated facial D-Markers using two different feed-forward neural networks, creating a symbiotic relationship that enriches the learning process. The novelty of our approach lies in incorporating supervisory signals from the pre-calculated D-Markers (rather than using them as input, as in previous works) and harmonizing the loss functions through a weighted average. In this way, training benefits from the D-Markers, but inference does not require computing them. We validate our model’s effectiveness on four well-known smile datasets: UvA-NEMO, BBC, MMI Facial Expression, and SPOS, and achieve state-of-the-art results.
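A minimal sketch of the multi-task idea described above: a smile-classification head and a D-Marker regression head share features, and their losses are combined through a weighted average. The head sizes, loss choices, and the 0.7/0.3 weights are assumptions, not the paper’s configuration:

```python
import torch
import torch.nn as nn

# Hypothetical two-head setup: a shared backbone feature feeds a smile classifier
# and a D-Marker regressor; the weights and loss functions below are assumptions.
class TwoHead(nn.Module):
    def __init__(self, feat_dim=256, n_markers=4):
        super().__init__()
        self.smile_head = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 2))
        self.marker_head = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, n_markers))

    def forward(self, feats):
        return self.smile_head(feats), self.marker_head(feats)

model = TwoHead()
feats = torch.randn(8, 256)                # features from a (not shown) transformer backbone
smile_labels = torch.randint(0, 2, (8,))   # 0 = posed, 1 = spontaneous
marker_targets = torch.randn(8, 4)         # pre-computed D-Marker values (training only)

logits, marker_pred = model(feats)
loss = 0.7 * nn.functional.cross_entropy(logits, smile_labels) \
     + 0.3 * nn.functional.mse_loss(marker_pred, marker_targets)
loss.backward()                            # at inference, only the smile head is used
```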
{"title":"DeepMarkerNet: Leveraging supervision from the Duchenne Marker for spontaneous smile recognition","authors":"Mohammad Junayed Hasan , Kazi Rafat , Fuad Rahman , Nabeel Mohammed , Shafin Rahman","doi":"10.1016/j.patrec.2024.09.015","DOIUrl":"10.1016/j.patrec.2024.09.015","url":null,"abstract":"<div><div>Distinguishing between spontaneous and posed smiles from videos poses a significant challenge in pattern classification literature. Researchers have developed feature-based and deep learning-based solutions for this problem. To this end, deep learning outperforms feature-based methods. However, certain aspects of feature-based methods could improve deep learning methods. For example, previous research has shown that Duchenne Marker (or D-Marker) features from the face play a vital role in spontaneous smiles, which can be useful to improve deep learning performances. In this study, we propose a deep learning solution that leverages D-Marker features to improve performance further. Our multi-task learning framework, named DeepMarkerNet, integrates a transformer network with the utilization of facial D-Markers for accurate smile classification. Unlike past methods, our approach simultaneously predicts the class of the smile and associated facial D-Markers using two different feed-forward neural networks, thus creating a symbiotic relationship that enriches the learning process. The novelty of our approach lies in incorporating supervisory signals from the pre-calculated D-Markers (instead of as input in previous works), harmonizing the loss functions through a weighted average. In this way, our training utilizes the benefits of D-Markers, but the inference does not require computing the D-Marker. We validate our model’s effectiveness on four well-known smile datasets: UvA-NEMO, BBC, MMI facial expression, and SPOS datasets, and achieve state-of-the-art results.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 148-155"},"PeriodicalIF":3.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142421586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-10-01. DOI: 10.1016/j.patrec.2024.10.003
Bingbing Zhang, Ying Zhang, Jianxin Zhang, Qiule Sun, Rong Wang, Qiang Zhang
Vision-Language models (VLMs) have shown promising improvements on various visual tasks. Most existing VLMs employ two separate transformer-based encoders, each dedicated to modeling visual and language features independently. Because the visual and language features are unaligned in the feature space, it is challenging for the multi-modal encoder to learn vision-language interactions. In this paper, we propose a Visual-guided Hierarchical Iterative Fusion (VgHIF) method for VLMs in video action recognition, which acquires more discriminative vision and language representations. VgHIF leverages visual features from different levels of the visual encoder to interact with the language representation. The interaction is processed by an attention mechanism that computes the correlation between visual features and the language representation. VgHIF learns grounded video-text representations and supports many different pre-trained VLMs in a flexible and efficient manner at a tiny computational cost. We conducted experiments on Kinetics-400, Mini-Kinetics-200, HMDB51, and UCF101 using the VLMs CLIP, X-CLIP, and ViFi-CLIP, under both fully supervised and few-shot settings. Compared with the baseline multi-modal model without VgHIF, the proposed method improves Top-1 accuracy to varying degrees, and several groups of results are comparable to state-of-the-art performance, which strongly verifies the effectiveness of the proposed method.
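The interaction step can be pictured as cross-attention between language tokens and visual features taken from one level of the visual encoder. The single-head formulation and all dimensions below are assumptions; the actual method applies such fusion hierarchically and iteratively across multiple levels:

```python
import torch
import torch.nn.functional as F

# Sketch of one fusion stage: language tokens attend to visual features from one
# encoder level. Dimensions and the single-head form are assumptions, not VgHIF's design.
def cross_attention(lang, vis, d=64):
    # lang: (L, d) language tokens, vis: (N, d) visual tokens from one encoder level
    q, k, v = lang, vis, vis
    attn = F.softmax(q @ k.t() / d ** 0.5, dim=-1)   # correlation between modalities
    return lang + attn @ v                            # residual fusion

lang = torch.randn(16, 64)        # language representation (assumed token-level)
vis_level3 = torch.randn(49, 64)  # visual features from one (assumed) encoder level
fused = cross_attention(lang, vis_level3)
print(fused.shape)                # torch.Size([16, 64])
```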
{"title":"Visual-guided hierarchical iterative fusion for multi-modal video action recognition","authors":"Bingbing Zhang , Ying Zhang , Jianxin Zhang , Qiule Sun , Rong Wang , Qiang Zhang","doi":"10.1016/j.patrec.2024.10.003","DOIUrl":"10.1016/j.patrec.2024.10.003","url":null,"abstract":"<div><div>Vision-Language models<!--> <!-->(VLMs) have shown promising improvements on various visual tasks. Most existing VLMs employ two separate transformer-based encoders, each dedicated to modeling visual and language features independently. Because the visual features and language features are unaligned in the feature space, it is challenging for the multi-modal encoder to learn vision-language interactions. In this paper, we propose a <strong>V</strong>isual-<strong>g</strong>uided <strong>H</strong>ierarchical <strong>I</strong>terative <strong>F</strong>usion (VgHIF) method for VLMs in video action recognition, which acquires more discriminative vision and language representation. VgHIF leverages visual features from different levels in visual encoder to interact with language representation. The interaction is processed by the attention mechanism to calculate the correlation between visual features and language representation. VgHIF learns grounded video-text representation and supports many different pre-trained VLMs in a flexible and efficient manner with a tiny computational cost. We conducted experiments on the Kinetics-400 Mini Kinetics 200 HMDB51, and UCF101 using VLMs: CLIP, X-CLIP, and ViFi-CLIP. The experiments were conducted under full supervision and few shot settings, and compared with the baseline multi-modal model without VgHIF, the Top-1 accuracy of the proposed method has been improved to varying degrees, and several groups of results have achieved comparable results with state-of-the-art performance, which strongly verified the effectiveness of the proposed method.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 213-220"},"PeriodicalIF":3.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142535002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-10-01. DOI: 10.1016/j.patrec.2024.09.016
Alexander Kolpakov, Michael Werman
Robust Affine Matching with Grassmannians (RoAM) is a new algorithm to perform affine registration of point clouds. The algorithm is based on minimizing the Frobenius distance between two elements of the Grassmannian. For this purpose, an indefinite relaxation of the Quadratic Assignment Problem (QAP) is used, and several approaches to affine feature matching are studied and compared. Experiments demonstrate that RoAM is more robust to noise and point discrepancy than previous methods.
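A toy sketch of the quantity being minimized: each subspace is represented by its orthogonal projector P = U Uᵀ, and the mismatch between two subspaces is the Frobenius distance between their projectors. This illustrates only the Grassmannian distance, not the QAP relaxation or the full matching pipeline; the point clouds and subspace dimension are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def projector(X, k=2):
    """Orthogonal projector onto the top-k principal subspace of a centered point cloud."""
    Xc = X - X.mean(axis=0)
    U, _, _ = np.linalg.svd(Xc.T @ Xc)
    return U[:, :k] @ U[:, :k].T

# Toy data, not from the paper: an anisotropic cloud, a noisy copy, and an unrelated cloud.
A = rng.standard_normal((100, 3)) * np.array([3.0, 1.0, 0.2])
B = A + 0.01 * rng.standard_normal(A.shape)
C = rng.standard_normal((100, 3))

d_same = np.linalg.norm(projector(A) - projector(B), ord='fro')
d_diff = np.linalg.norm(projector(A) - projector(C), ord='fro')
print(d_same, d_diff)  # the first distance is typically much smaller
```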
{"title":"Robust affine point matching via quadratic assignment on Grassmannians","authors":"Alexander Kolpakov , Michael Werman","doi":"10.1016/j.patrec.2024.09.016","DOIUrl":"10.1016/j.patrec.2024.09.016","url":null,"abstract":"<div><div>Robust Affine Matching with Grassmannians (RoAM) is a new algorithm to perform affine registration of point clouds. The algorithm is based on minimizing the Frobenius distance between two elements of the Grassmannian. For this purpose, an indefinite relaxation of the Quadratic Assignment Problem (QAP) is used, and several approaches to affine feature matching are studied and compared. Experiments demonstrate that RoAM is more robust to noise and point discrepancy than previous methods.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 265-271"},"PeriodicalIF":3.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142573398","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Detection of depressive symptoms from spoken content has emerged as an efficient Artificial Intelligence (AI) tool for diagnosing this serious mental health condition. Since speech is a highly sensitive form of data, privacy-enhancing measures need to be in place for this technology to be useful. A common approach to enhancing speech privacy is adversarial learning, which conceals a speaker’s specific attributes/identity while maintaining performance on the primary task. Although this technique works well for applications such as speech recognition, it is often ineffective for depression detection due to the interplay between certain speaker attributes and depression detection performance. This paper examines that interplay through a systematic study of how obfuscating specific speaker attributes (age, education) through adversarial learning impacts the performance of a depression detection model. We highlight the relevance of two previously unexplored speaker attributes to depression detection, while considering a multimodal (audio-lexical) setting to highlight the relative vulnerabilities of the modalities under obfuscation. Results on a publicly available, clinically validated depression detection dataset show that attempts to disentangle age/education attributes through adversarial learning result in a large drop in depression detection accuracy, especially for the text modality. This calls for a rethink of how privacy mitigation should be achieved for depression detection, and indeed for any human-centric application.
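The adversarial obfuscation branch is commonly implemented with a gradient reversal layer: an attribute classifier is trained on top of the shared encoder, but its gradient is negated so the encoder learns to hide the attribute. The sketch below is a generic illustration of that pattern; the feature sizes, heads, and the age-group attribute are assumptions, not the paper’s architecture:

```python
import torch
import torch.nn as nn

# Generic adversarial obfuscation via gradient reversal; not the paper's model.
class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None   # flip the adversary's gradient w.r.t. the encoder

encoder = nn.Sequential(nn.Linear(40, 128), nn.ReLU())   # e.g. acoustic features -> embedding
depress_head = nn.Linear(128, 2)                         # primary task: depression detection
attr_head = nn.Linear(128, 3)                            # adversary on an attribute (assumed: age group)

x = torch.randn(16, 40)
y_dep = torch.randint(0, 2, (16,))
y_attr = torch.randint(0, 3, (16,))

z = encoder(x)
loss = nn.functional.cross_entropy(depress_head(z), y_dep) \
     + nn.functional.cross_entropy(attr_head(GradReverse.apply(z, 1.0)), y_attr)
loss.backward()   # the reversed gradient pushes the encoder to hide the attribute
```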
{"title":"On the effects of obfuscating speaker attributes in privacy-aware depression detection","authors":"Nujud Aloshban , Anna Esposito , Alessandro Vinciarelli , Tanaya Guha","doi":"10.1016/j.patrec.2024.10.016","DOIUrl":"10.1016/j.patrec.2024.10.016","url":null,"abstract":"<div><div>Detection of depressive symptoms from spoken content has emerged as an efficient Artificial Intelligence (AI) tool for diagnosing this serious mental health condition. Since speech is a highly sensitive form of data, privacy-enhancing measures need to be in place for this technology to be useful. A common approach to enhance speech privacy is by using adversarial learning that involves concealing speaker’s specific attributes/identity while maintaining performance of the primary task. Although this technique works well for applications such as speech recognition, they are often ineffective for depression detection due to the interplay between certain speaker attributes and the performance of depression detection. This paper studies such interplay through a systematic study on how obfuscating specific speaker attributes (age, education) through adversarial learning impact the performance of a depression detection model. We highlight the relevance of two previously unexplored speaker attributes to depression detection, while considering a multimodal (audio-lexical) setting to highlight the relative vulnerabilities of the modalities under obfuscation. Results on a publicly available, clinically validated, depression detection dataset shows that attempts to disentangle age/education attributes through adversarial learning result in a large drop in depression detection accuracy, especially for the text modality. This calls for a revisit to how privacy mitigation should to be achieved for depression detection and any human-centric applications for that matter.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 300-305"},"PeriodicalIF":3.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142657598","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-10-01. DOI: 10.1016/j.patrec.2024.09.014
Shihui Zhang, Zhigang Huang, Sheng Zhan, Ping Li, Zhiguo Cui, Feiyu Li
Few-shot counting (FSC) is the task of counting the number of objects in an image that belong to the same category by using a provided exemplar pattern. By replacing the exemplar, we can effectively count anything, even in cases where we have no prior knowledge of that category’s exemplar. However, due to variations within the same category and the impact of inter-class similarity, it is challenging to achieve accurate intra-class similarity matching using conventional similarity comparison methods. To tackle these issues, we propose a novel few-shot counting method called Multi-stage Exemplar Attention Match Network (MEAMNet), which increases matching accuracy, reduces the impact of noise, and enhances similarity feature matching. Specifically, we propose a multi-stage matching strategy to obtain more stable and effective matching results by acquiring similar features in different feature spaces. In addition, we propose a novel feature matching module called Exemplar Attention Match (EAM). With this module, the intra-class similarity representation at each stage is enhanced to achieve better matching of the key features. Experimental results indicate that our method not only significantly surpasses the state-of-the-art (SOTA) methods in most evaluation metrics on the FSC-147 dataset but also achieves comprehensive superiority on the CARPK dataset. This highlights the outstanding accuracy and stability of our matching performance, as well as its exceptional transferability. We will release the code at https://github.com/hzg0505/MEAMNet.
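At its core, exemplar-based counting correlates an exemplar’s feature with every location of the image feature map to produce a similarity (density-like) map. The single-stage sketch below illustrates only that matching step; MEAMNet’s multi-stage aggregation and EAM module are not reproduced, and all shapes are assumptions:

```python
import torch
import torch.nn.functional as F

# Toy single-stage exemplar-to-image matching; shapes and the count surrogate are assumptions.
img_feat = torch.randn(1, 256, 32, 32)   # backbone feature map of the query image
ex_feat = torch.randn(1, 256, 1, 1)      # pooled feature of one exemplar box

sim = F.conv2d(img_feat, ex_feat)        # correlation / matching map, shape (1, 1, 32, 32)
density = F.relu(sim)
print(density.sum().item())              # a (very rough) count surrogate from the matching map
```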
{"title":"Innovative multi-stage matching for counting anything","authors":"Shihui Zhang , Zhigang Huang , Sheng Zhan , Ping Li , Zhiguo Cui , Feiyu Li","doi":"10.1016/j.patrec.2024.09.014","DOIUrl":"10.1016/j.patrec.2024.09.014","url":null,"abstract":"<div><div>Few-shot counting (FSC) is the task of counting the number of objects in an image that belong to the same category, by using a provided exemplar pattern. By replacing the exemplar, we can effectively count anything, even in cases where we have no prior knowledge of that category’s exemplar. However, due to the variations within the same category and the impact of inter-class similarity, it is challenging to achieve accurate intra-class similarity matching using conventional similarity comparison methods. To tackle these issues, we propose a novel few-shot counting method called Multi-stage Exemplar Attention Match Network (MEAMNet), which increases the accuracy of matching, reduces the impact of noise, and enhances similarity feature matching. Specifically, we propose a multi-stage matching strategy to obtain more stable and effective matching results by acquiring similar feature in different feature spaces. In addition, we propose a novel feature matching module called Exemplar Attention Match (EAM). With this module, the intra-class similarity representation in each stage will be enhanced to achieve a better matching of the key feature. Experimental results indicate that our method not only significantly surpasses the state-of-the-art (SOTA) methods in most evaluation metrics on the FSC-147 dataset but also achieves comprehensive superiority on the CARPK dataset. This highlights the outstanding accuracy and stability of our matching performance, as well as its exceptional transferability. We will release the code at <span><span>https://github.com/hzg0505/MEAMNet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 141-147"},"PeriodicalIF":3.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142421585","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Given the significant advances in visual SLAM (VSLAM), it might be assumed that the localization and mapping problem has been solved. Nevertheless, VSLAM algorithms may exhibit poor performance in unstructured environments. This paper addresses the problem of VSLAM in unstructured planetary-like and agricultural environments. A performance study of state-of-the-art algorithms in these environments was conducted to evaluate their robustness. Quantitative and qualitative results of the study are reported, which expose that the impressive performance of most state-of-the-art VSLAM algorithms does not generally carry over to unstructured planetary-like and agricultural environments. Statistical scene analysis was performed on datasets from well-known structured environments as well as planetary-like and agricultural datasets to identify the visual differences between structured and unstructured environments that cause VSLAM algorithms to fail. In addition, strategies to overcome the limitations of VSLAM algorithms in unstructured planetary-like and agricultural environments are suggested to guide future research on VSLAM in these environments.
{"title":"Evaluation of visual SLAM algorithms in unstructured planetary-like and agricultural environments","authors":"Víctor Romero-Bautista, Leopoldo Altamirano-Robles, Raquel Díaz-Hernández, Saúl Zapotecas-Martínez, Nohemí Sanchez-Medel","doi":"10.1016/j.patrec.2024.09.025","DOIUrl":"10.1016/j.patrec.2024.09.025","url":null,"abstract":"<div><div>Given the significant advance in visual SLAM (VSLAM), it might be assumed that the location and mapping problem has been solved. Nevertheless, VSLAM algorithms may exhibit poor performance in unstructured environments. This paper addresses the problem of VSLAM in unstructured planetary-like and agricultural environments. A performance study of state-of-the-art algorithms in these environments was conducted to evaluate their robustness. Quantitative and qualitative results of the study are reported, which exposes that the impressive performance of most state-of-the-art VSLAM algorithms is not generally reflected in unstructured planetary-like and agricultural environments. Statistical scene analysis was performed on datasets from well-known structured environments as well as planetary-like and agricultural datasets to identify visual differences between structured and unstructured environments, which cause VSLAM algorithms to fail. In addition, strategies to overcome the VSLAM algorithm limitations in unstructured planetary-like and agricultural environments are suggested to guide future research on VSLAM in these environments.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 106-112"},"PeriodicalIF":3.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142421577","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-10-01. DOI: 10.1016/j.patrec.2024.10.015
Nadeem Iqbal Kajla, Malik Muhammad Saad Missen, Mickael Coustaty, Hafiz Muhammad Sanaullah Badar, Maruf Pasha, Faiza Belbachir
Deep learning has revolutionized the fields of pattern recognition and machine learning by exhibiting exceptional efficiency in recognizing patterns. Its success can be seen in a wide range of applications, including speech recognition, natural language processing, video processing, and image classification. It has also been successful in recognizing structural patterns, such as graphs. Graph Neural Networks (GNNs) are models that employ message passing between nodes in a graph to capture its dependencies. These networks maintain a state that approximates graph information at greater depth than traditional neural networks. Although training a GNN can be challenging, recent GNN variants, including Graph Convolutional Neural Networks, Gated Graph Neural Networks, and Graph Attention Networks, have shown promising results on various problems. In this work, we present a GNN-based approach for computing graph similarity and demonstrate its application to a classification problem. Our proposed method converts the similarity of two graphs into a score, and experiments on standard benchmark datasets show that the proposed technique is effective and efficient. Results are summarized using a confusion matrix and the mean squared error metric, demonstrating the accuracy of our proposed technique.
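One plausible reading of the approach, guided by the title (a hedged sketch, not the authors’ method): propagate node features over each graph, summarize the resulting embeddings as a histogram, and map the distance between the two histograms to a similarity score. The propagation rule, bin count, and score mapping are all assumptions:

```python
import numpy as np

# Hypothetical histogram-based graph similarity; every design choice here is assumed.
def node_embeddings(adj, feats, hops=2):
    deg = adj.sum(1, keepdims=True) + 1e-9
    h = feats
    for _ in range(hops):
        h = (adj @ h) / deg                      # mean-aggregation message passing
    return h

def histogram_signature(h, bins=8):
    hist, _ = np.histogram(h, bins=bins, range=(-3, 3), density=True)
    return hist

def similarity_score(adj1, f1, adj2, f2):
    s1 = histogram_signature(node_embeddings(adj1, f1))
    s2 = histogram_signature(node_embeddings(adj2, f2))
    return 1.0 / (1.0 + np.linalg.norm(s1 - s2))  # map histogram distance to a score

rng = np.random.default_rng(0)
A = np.triu((rng.random((6, 6)) > 0.5).astype(float), 1); A = A + A.T   # random graph 1
B = np.triu((rng.random((6, 6)) > 0.5).astype(float), 1); B = B + B.T   # random graph 2
print(similarity_score(A, rng.standard_normal((6, 4)), B, rng.standard_normal((6, 4))))
```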
{"title":"A histogram-based approach to calculate graph similarity using graph neural networks","authors":"Nadeem Iqbal Kajla , Malik Muhammad Saad Missen , Mickael Coustaty , Hafiz Muhammad Sanaullah Badar , Maruf Pasha , Faiza Belbachir","doi":"10.1016/j.patrec.2024.10.015","DOIUrl":"10.1016/j.patrec.2024.10.015","url":null,"abstract":"<div><div>Deep learning has revolutionized the field of pattern recognition and machine learning by exhibiting exceptional efficiency in recognizing patterns. The success of deep learning can be seen in a wide range of applications including speech recognition, natural language processing, video processing, and image classification. Moreover, it has also been successful in recognizing structural patterns, such as graphs. Graph Neural Networks (GNNs) are models that employ message passing between nodes in a graph to capture its dependencies. These networks memorize a state that approximates graph information with greater depth compared to traditional neural networks. Although training a GNN can be challenging, recent advances in GNN variants, including Graph Convolutional Neural Networks, Gated Graph Neural Networks, and Graph Attention Networks, have shown promising results in solving various problems. In this work, we present a GNN-based approach for computing graph similarity and demonstrate its application to a classification problem. Our proposed method converts the similarity of two graphs into a score, and experiments on state-of-the-art datasets show that the proposed technique is effective and efficient. Results are summarized using a confusion matrix and mean square error metric, demonstrating the accuracy of our proposed technique.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 286-291"},"PeriodicalIF":3.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142657604","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Low-light, nighttime hazy, and underwater images captured in harsh environments typically exhibit color deviations and reduced visibility due to light scattering and absorption. In addition, we observe an almost complete loss of information in at least one color channel of these degraded images. To repair the lost information in each channel, we present an image preprocessing strategy called Local Reference Feature Transfer (LRFT), which employs local features to compensate for the color loss automatically. Specifically, we design a dedicated reference image by fusing the detail, salience, and uniform grayscale images of the raw image, which ensures a balanced chromaticity distribution. Subsequently, we employ the local reference feature transfer strategy to migrate the local mean and variance of the reference image to the raw image to obtain a color-corrected image. Extensive evaluation experiments demonstrate that our proposed LRFT method provides good preprocessing performance for the subsequent enhancement of images with different degradation types. The code is publicly available at: https://www.researchgate.net/publication/383528251_2024-LRFT.
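A minimal sketch of the local statistic transfer described above: per channel, the raw image is re-normalized so that its local mean and standard deviation match those of the reference image. The window size and epsilon are assumptions, and construction of the fused reference image itself is omitted:

```python
import numpy as np
from scipy.ndimage import uniform_filter

# Local mean/variance transfer sketch; window size and eps are assumed, not from the paper.
def local_transfer(raw, ref, win=15, eps=1e-6):
    out = np.empty_like(raw, dtype=np.float64)
    for c in range(raw.shape[2]):
        r, f = raw[..., c].astype(np.float64), ref[..., c].astype(np.float64)
        mu_r, mu_f = uniform_filter(r, win), uniform_filter(f, win)
        var_r = uniform_filter(r * r, win) - mu_r ** 2
        var_f = uniform_filter(f * f, win) - mu_f ** 2
        out[..., c] = (r - mu_r) / np.sqrt(np.maximum(var_r, eps)) \
                      * np.sqrt(np.maximum(var_f, eps)) + mu_f
    return np.clip(out, 0, 255).astype(np.uint8)

raw = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)  # stand-in for a degraded image
ref = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)  # stand-in for the fused reference
print(local_transfer(raw, ref).shape)
```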
{"title":"Local Reference Feature Transfer (LRFT): A simple pre-processing step for image enhancement","authors":"Ling Zhou , Weidong Zhang , Yuchao Zheng , Jianping Wang , Wenyi Zhao","doi":"10.1016/j.patrec.2024.10.013","DOIUrl":"10.1016/j.patrec.2024.10.013","url":null,"abstract":"<div><div>Low-light, nighttime haze, and underwater images captured in harsh environments typically exhibit color deviations and reduced visibility due to light scattering and absorption. Additionally, we observe an almost complete loss of information in at least one color channel in these degraded images. To repair the lost information in each channel, we present an image preprocessing strategy called Local Reference Feature Transfer (LRFT), which employs the local feature to compensate for the color loss automatically. Specifically, we design a dedicated reference image by fusing the detail, salience, and uniform grayscale images of the raw image that ensures a balanced chromaticity distribution. Subsequently, we employ the local reference feature transfer strategy to migrate the local mean and variance of the reference image to the raw image to get a color-corrected image. Extensive evaluation experiments demonstrate that our proposed LRFT method has good preprocessing performance for the subsequent enhancement of images of different degradation types. The code is publicly available at: <span><span>https://www.researchgate.net/publication/383528251_2024-LRFT</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 330-336"},"PeriodicalIF":3.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142657602","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-10-01. DOI: 10.1016/j.patrec.2024.09.012
Yiliang Zhang, Yang Lu, Hanzi Wang
Existing deep learning methods often require a large amount of high-quality labeled data, yet the presence of noisy labels in real-world training data seriously affects the generalization ability of a model. Sample selection techniques, the current dominant approach to mitigating the effects of noisy labels, use the consistency between sample predictions and observed labels to select clean samples. However, these methods rely heavily on the accuracy of the sample predictions and inevitably suffer when model predictions are unstable. To address these issues, we propose an uncertainty-aware neighborhood sample selection method. Specifically, it calibrates each sample’s prediction with its neighbors’ predictions and reassigns model attention to the selected samples based on sample uncertainty. By alleviating the influence of prediction bias on sample selection, our proposed method achieves excellent performance in extensive experiments. In particular, we achieve an average improvement of 5% in asymmetric noise scenarios.
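A toy sketch of the two ingredients described above: calibrating each sample’s prediction with the mean prediction of its feature-space neighbors, then using the entropy of the calibrated prediction as an uncertainty that re-weights the selected (prediction-consistent) samples. The mixing weight, neighborhood size, and entropy-based weighting are assumptions, not the paper’s exact formulation:

```python
import numpy as np

# Neighbor-calibrated selection sketch; alpha, k, and the weighting rule are assumed.
rng = np.random.default_rng(0)
n, c = 200, 10
feats = rng.standard_normal((n, 32))                  # feature embeddings
probs = rng.dirichlet(np.ones(c), size=n)             # model predictions (softmax outputs)
labels = rng.integers(0, c, size=n)                   # observed (possibly noisy) labels

d = ((feats[:, None] - feats[None]) ** 2).sum(-1)     # pairwise squared distances
nbrs = np.argsort(d, axis=1)[:, 1:6]                  # 5 nearest neighbors (excluding self)

calib = 0.5 * probs + 0.5 * probs[nbrs].mean(axis=1)  # neighbor-calibrated prediction
uncertainty = -(calib * np.log(calib + 1e-12)).sum(1) # entropy as uncertainty
clean = calib.argmax(1) == labels                     # select samples whose prediction matches the label
weights = np.where(clean, 1.0 / (1.0 + uncertainty), 0.0)  # attention for selected samples
print(clean.sum(), weights[:5])
```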
{"title":"Label-noise learning via uncertainty-aware neighborhood sample selection","authors":"Yiliang Zhang, Yang Lu, Hanzi Wang","doi":"10.1016/j.patrec.2024.09.012","DOIUrl":"10.1016/j.patrec.2024.09.012","url":null,"abstract":"<div><div>Existing deep learning methods often require a large amount of high-quality labeled data. Yet, the presence of noisy labels in the real-world training data seriously affects the generalization ability of the model. Sample selection techniques, the current dominant approach to mitigating the effects of noisy labels on models, use the consistency of sample predictions and observed labels to make clean selections. However, these methods rely heavily on the accuracy of the sample predictions and inevitably suffer when the model predictions are unstable. To address these issues, we propose an uncertainty-aware neighborhood sample selection method. Especially, it calibrates for sample prediction by neighbor prediction and reassigns model attention to the selected samples based on sample uncertainty. By alleviating the influence of prediction bias on sample selection and avoiding the occurrence of prediction bias, our proposed method achieves excellent performance in extensive experiments. In particular, we achieved an average of 5% improvement in asymmetric noise scenarios.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 191-197"},"PeriodicalIF":3.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142433534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}