
Latest publications in IEEE Transactions on Image Processing: A Publication of the IEEE Signal Processing Society

Exploration of Learned Lifting-Based Transform Structures for Fully Scalable and Accessible Wavelet-Like Image Compression
Xinyue Li;Aous Naman;David Taubman
This paper provides a comprehensive study of the features and performance of different ways to incorporate neural networks into lifting-based wavelet-like transforms, within the context of fully scalable and accessible image compression. Specifically, we explore different arrangements of lifting steps, as well as various network architectures for learned lifting operators. Moreover, we examine the impact of the number of learned lifting steps, the number of channels, the number of layers and the support of kernels in each learned lifting operator. To facilitate the study, we investigate two generic training methodologies that are simultaneously appropriate to the wide variety of lifting structures considered. Experimental results ultimately suggest that retaining fixed lifting steps from the base wavelet transform is highly beneficial. Moreover, we demonstrate that employing more learned lifting steps and more layers in each learned lifting operator does not contribute strongly to the compression performance. However, benefits can be obtained by utilizing more channels in each learned lifting operator. Ultimately, the learned wavelet-like transform proposed in this paper achieves over 25% bit-rate savings compared to JPEG 2000, with compact spatial support.
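As background for readers unfamiliar with lifting, the sketch below shows a single learned lifting step on a 1D signal: the signal is split into even and odd samples, a small network predicts the odd samples from the even ones, and another network updates the even samples from the resulting detail band. The operator architectures, channel counts and training methodology are placeholders and do not reproduce the configurations studied in the paper; the structure is invertible by construction, which the final check verifies.

```python
import torch
import torch.nn as nn

class LearnedLiftingStep(nn.Module):
    """One lifting step with learned predict/update operators (1D toy version)."""
    def __init__(self, channels=16, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        # Hypothetical small CNNs standing in for the learned lifting operators.
        self.predict = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size, padding=pad), nn.ReLU(),
            nn.Conv1d(channels, 1, kernel_size, padding=pad))
        self.update = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size, padding=pad), nn.ReLU(),
            nn.Conv1d(channels, 1, kernel_size, padding=pad))

    def forward(self, x):                    # x: (batch, 1, length), length even
        even, odd = x[..., 0::2], x[..., 1::2]
        detail = odd - self.predict(even)    # high-pass (detail) band
        approx = even + self.update(detail)  # low-pass (approximation) band
        return approx, detail

    def inverse(self, approx, detail):       # exact inversion, whatever the operators are
        even = approx - self.update(detail)
        odd = detail + self.predict(even)
        return torch.stack([even, odd], dim=-1).flatten(start_dim=-2)

step = LearnedLiftingStep()
signal = torch.randn(2, 1, 64)
low, high = step(signal)
print(torch.allclose(step.inverse(low, high), signal, atol=1e-5))  # True
```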
{"title":"Exploration of Learned Lifting-Based Transform Structures for Fully Scalable and Accessible Wavelet-Like Image Compression","authors":"Xinyue Li;Aous Naman;David Taubman","doi":"10.1109/TIP.2024.3482877","DOIUrl":"10.1109/TIP.2024.3482877","url":null,"abstract":"This paper provides a comprehensive study on features and performance of different ways to incorporate neural networks into lifting-based wavelet-like transforms, within the context of fully scalable and accessible image compression. Specifically, we explore different arrangements of lifting steps, as well as various network architectures for learned lifting operators. Moreover, we examine the impact of the number of learned lifting steps, the number of channels, the number of layers and the support of kernels in each learned lifting operator. To facilitate the study, we investigate two generic training methodologies that are simultaneously appropriate to a wide variety of lifting structures considered. Experimental results ultimately suggest that retaining fixed lifting steps from the base wavelet transform is highly beneficial. Moreover, we demonstrate that employing more learned lifting steps and more layers in each learned lifting operator do not contribute strongly to the compression performance. However, benefits can be obtained by utilizing more channels in each learned lifting operator. Ultimately, the learned wavelet-like transform proposed in this paper achieves over 25% bit-rate savings compared to JPEG 2000 with compact spatial support.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"6173-6188"},"PeriodicalIF":0.0,"publicationDate":"2024-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142488358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Bi-Directionally Fused Boundary Aware Network for Skin Lesion Segmentation
Feiniu Yuan;Yuhuan Peng;Qinghua Huang;Xuelong Li
It is quite challenging to visually identify skin lesions with irregular shapes, blurred boundaries and large scale variations. Convolutional Neural Networks (CNNs) extract local features with abundant spatial information, while Transformers are powerful at capturing global information but provide insufficient spatial detail. To overcome the difficulties in discriminating small or blurred skin lesions, we propose a Bi-directionally Fused Boundary Aware Network (BiFBA-Net). To utilize complementary features produced by CNNs and Transformers, we design a dual-encoding structure. Different from existing dual-encoders, our method designs a Bi-directional Attention Gate (Bi-AG) with two inputs and two outputs for crosswise feature fusion. Our Bi-AG accepts two kinds of features from the CNN and Transformer encoders, and two attention gates are designed to generate two attention outputs that are sent back to the two encoders. Thus, we achieve adequate exchange of multi-scale information between the CNN and Transformer encoders in a bi-directional, attention-driven way. To faithfully restore feature maps, we propose a boundary-aware progressive decoding structure containing three decoders with six supervised losses. The first decoder is a CNN network for producing more spatial details. The second one is a Partial Decoder (PD) for aggregating high-level features with more semantics. The last one is a Boundary Aware Decoder (BAD) proposed to progressively improve boundary accuracy. Our BAD uses residual structures and Reverse Attention (RA) at different scales to deeply mine structural and spatial details for refining lesion boundaries. Extensive experiments on public datasets show that our BiFBA-Net achieves higher segmentation accuracy and much better boundary perception than the compared methods. It also alleviates both over-segmentation of small lesions and under-segmentation of large ones.
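The sketch below illustrates the general idea of a two-input, two-output attention gate for crosswise CNN-Transformer fusion. The channel sizes, the form of the attention maps and the way outputs are fed back to the encoders are assumptions for illustration, not the paper's Bi-AG design.

```python
import torch
import torch.nn as nn

class BiDirectionalAttentionGate(nn.Module):
    """Toy two-input / two-output attention gate for crosswise feature fusion."""
    def __init__(self, cnn_ch=64, trans_ch=64):
        super().__init__()
        # Each gate produces a spatial attention map from the concatenated features.
        self.att_for_cnn = nn.Sequential(
            nn.Conv2d(cnn_ch + trans_ch, 1, kernel_size=1), nn.Sigmoid())
        self.att_for_trans = nn.Sequential(
            nn.Conv2d(cnn_ch + trans_ch, 1, kernel_size=1), nn.Sigmoid())

    def forward(self, f_cnn, f_trans):        # both (B, C, H, W) at the same scale
        joint = torch.cat([f_cnn, f_trans], dim=1)
        a_cnn = self.att_for_cnn(joint)        # attention sent back to the CNN branch
        a_trans = self.att_for_trans(joint)    # attention sent back to the Transformer branch
        out_cnn = f_cnn + a_cnn * f_trans      # local features enriched with global context
        out_trans = f_trans + a_trans * f_cnn  # global features enriched with spatial detail
        return out_cnn, out_trans

gate = BiDirectionalAttentionGate()
c, t = torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)
oc, ot = gate(c, t)
print(oc.shape, ot.shape)
```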
{"title":"A Bi-Directionally Fused Boundary Aware Network for Skin Lesion Segmentation","authors":"Feiniu Yuan;Yuhuan Peng;Qinghua Huang;Xuelong Li","doi":"10.1109/TIP.2024.3482864","DOIUrl":"10.1109/TIP.2024.3482864","url":null,"abstract":"It is quite challenging to visually identify skin lesions with irregular shapes, blurred boundaries and large scale variances. Convolutional Neural Network (CNN) extracts more local features with abundant spatial information, while Transformer has the powerful ability to capture more global information but with insufficient spatial details. To overcome the difficulties in discriminating small or blurred skin lesions, we propose a Bi-directionally Fused Boundary Aware Network (BiFBA-Net). To utilize complementary features produced by CNNs and Transformers, we design a dual-encoding structure. Different from existing dual-encoders, our method designs a Bi-directional Attention Gate (Bi-AG) with two inputs and two outputs for crosswise feature fusion. Our Bi-AG accepts two kinds of features from CNN and Transformer encoders, and two attention gates are designed to generate two attention outputs that are sent back to the two encoders. Thus, we implement adequate exchanging of multi-scale information between CNN and Transformer encoders in a bi-directional and attention way. To perfectly restore feature maps, we propose a progressive decoding structure with boundary aware, containing three decoders with six supervised losses. The first decoder is a CNN network for producing more spatial details. The second one is a Partial Decoder (PD) for aggregating high-level features with more semantics. The last one is a Boundary Aware Decoder (BAD) proposed to progressively improve boundary accuracy. Our BAD uses residual structure and Reverse Attention (RA) at different scales to deeply mine structural and spatial details for refining lesion boundaries. Extensive experiments on public datasets show that our BiFBA-Net achieves higher segmentation accuracy, and has much better ability of boundary perceptions than compared methods. It also alleviates both over-segmentation of small lesions and under-segmentation of large ones.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"6340-6353"},"PeriodicalIF":0.0,"publicationDate":"2024-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142488442","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
EviPrompt: A Training-Free Evidential Prompt Generation Method for Adapting Segment Anything Model in Medical Images
Yinsong Xu;Jiaqi Tang;Aidong Men;Qingchao Chen
Medical image segmentation is a critical task in clinical applications. Recently, the Segment Anything Model (SAM) has demonstrated potential for natural image segmentation. However, the requirement for expert labour to provide prompts and the domain gap between natural and medical images pose significant obstacles to adapting SAM to medical images. To overcome these challenges, this paper introduces a novel prompt generation method named EviPrompt. The proposed method requires only a single reference image-annotation pair, making it a training-free solution that significantly reduces the need for extensive labelling and computational resources. First, prompts are automatically generated based on the similarity between features of the reference and target images, and evidential learning is introduced to improve reliability. Then, to mitigate the impact of the domain gap, committee voting and inference-guided in-context learning are employed, generating prompts primarily based on human prior knowledge and reducing reliance on extracted semantic information. EviPrompt represents an efficient and robust approach to medical image segmentation. We evaluate it across a broad range of tasks and modalities, confirming its efficacy. The source code is available at https://github.com/SPIresearch/EviPrompt.
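The following is a minimal illustration of similarity-based prompt generation from a single reference image-annotation pair: annotated reference features form a prototype, and the most similar target locations become point prompts. The feature extractor, the prototype construction and the top-k selection are assumptions; the evidential learning, committee voting and in-context inference components described above are not included.

```python
import torch
import torch.nn.functional as F

def similarity_prompts(ref_feat, ref_mask, tgt_feat, num_points=3):
    """Pick point prompts in the target image by feature similarity.

    ref_feat, tgt_feat: (C, H, W) feature maps from any frozen image encoder.
    ref_mask: (H, W) binary annotation of the reference image.
    Returns (num_points, 2) row/col coordinates on the feature grid.
    """
    C, H, W = ref_feat.shape
    ref = F.normalize(ref_feat.reshape(C, -1), dim=0)          # (C, H*W)
    tgt = F.normalize(tgt_feat.reshape(C, -1), dim=0)          # (C, H*W)
    fg = ref[:, ref_mask.reshape(-1) > 0]                      # annotated (foreground) features
    proto = F.normalize(fg.mean(dim=1, keepdim=True), dim=0)   # (C, 1) class prototype
    sim = (proto.T @ tgt).reshape(H, W)                        # cosine similarity map
    idx = torch.topk(sim.reshape(-1), num_points).indices
    rows = torch.div(idx, W, rounding_mode="floor")
    return torch.stack([rows, idx % W], dim=1)                 # most similar locations as prompts

ref_feat, tgt_feat = torch.randn(256, 64, 64), torch.randn(256, 64, 64)
ref_mask = torch.zeros(64, 64)
ref_mask[20:30, 20:30] = 1
print(similarity_prompts(ref_feat, ref_mask, tgt_feat))
```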
{"title":"EviPrompt: A Training-Free Evidential Prompt Generation Method for Adapting Segment Anything Model in Medical Images","authors":"Yinsong Xu;Jiaqi Tang;Aidong Men;Qingchao Chen","doi":"10.1109/TIP.2024.3482175","DOIUrl":"10.1109/TIP.2024.3482175","url":null,"abstract":"Medical image segmentation is a critical task in clinical applications. Recently, the Segment Anything Model (SAM) has demonstrated potential for natural image segmentation. However, the requirement for expert labour to provide prompts, and the domain gap between natural and medical images pose significant obstacles in adapting SAM to medical images. To overcome these challenges, this paper introduces a novel prompt generation method named EviPrompt. The proposed method requires only a single reference image-annotation pair, making it a training-free solution that significantly reduces the need for extensive labelling and computational resources. First, prompts are automatically generated based on the similarity between features of the reference and target images, and evidential learning is introduced to improve reliability. Then, to mitigate the impact of the domain gap, committee voting and inference-guided in-context learning are employed, generating prompts primarily based on human prior knowledge and reducing reliance on extracted semantic information. EviPrompt represents an efficient and robust approach to medical image segmentation. We evaluate it across a broad range of tasks and modalities, confirming its efficacy. The source code is available at \u0000<uri>https://github.com/SPIresearch/EviPrompt</uri>\u0000.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"6204-6215"},"PeriodicalIF":0.0,"publicationDate":"2024-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142487459","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Scalable Training Strategy for Blind Multi-Distribution Noise Removal
Kevin Zhang;Sakshum Kulshrestha;Christopher A. Metzler
Despite recent advances, developing general-purpose universal denoising and artifact-removal networks remains largely an open problem: Given fixed network weights, one inherently trades off specialization at one task (e.g., removing Poisson noise) for performance at another (e.g., removing speckle noise). In addition, training such a network is challenging due to the curse of dimensionality: As one increases the dimensions of the specification-space (i.e., the number of parameters needed to describe the noise distribution), the number of unique specifications one needs to train for grows exponentially. Uniformly sampling this space will result in a network that does well at very challenging problem specifications but poorly at easy problem specifications, where even large errors will have a small effect on the overall mean squared error. In this work we propose training denoising networks using an adaptive-sampling/active-learning strategy. Our work improves upon a recently proposed universal denoiser training strategy by extending these results to higher dimensions and by incorporating a polynomial approximation of the true specification-loss landscape. This approximation allows us to reduce training times by almost two orders of magnitude. We test our method on simulated joint Poisson-Gaussian-Speckle noise and demonstrate that with our proposed training strategy, a single blind, generalist denoiser network can achieve peak signal-to-noise ratios within a uniform bound of specialized denoiser networks across a large range of operating conditions. We also capture a small dataset of images with varying amounts of joint Poisson-Gaussian-Speckle noise and demonstrate that a universal denoiser trained using our adaptive-sampling strategy outperforms uniformly trained baselines.
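The snippet below sketches the adaptive-sampling idea on a toy two-parameter noise-specification grid: specifications where the universal denoiser lags furthest behind a specialized baseline are sampled more often during training. The grid, the gap estimates and the sampling rule are illustrative assumptions and do not include the paper's polynomial approximation of the specification-loss landscape.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-D specification space: (Gaussian sigma, Poisson photon level).
sigmas = np.linspace(0.0, 0.5, 8)
photons = np.linspace(1.0, 100.0, 8)
specs = np.array([(s, p) for s in sigmas for p in photons])

# excess_loss[i]: current universal-denoiser MSE minus the (estimated) specialized
# baseline MSE at spec i.  Random here; in practice it comes from periodic evaluation.
excess_loss = rng.random(len(specs))

def sample_specs(excess_loss, batch=16, temperature=1.0):
    """Active sampling: draw training specifications with probability growing with
    the gap to the specialized baseline, so hard regions get more training effort."""
    w = np.maximum(excess_loss, 0.0) ** temperature
    p = w / w.sum() if w.sum() > 0 else np.full(len(w), 1.0 / len(w))
    idx = rng.choice(len(excess_loss), size=batch, p=p)
    return specs[idx]

print(sample_specs(excess_loss)[:4])
```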
{"title":"A Scalable Training Strategy for Blind Multi-Distribution Noise Removal","authors":"Kevin Zhang;Sakshum Kulshrestha;Christopher A. Metzler","doi":"10.1109/TIP.2024.3482185","DOIUrl":"10.1109/TIP.2024.3482185","url":null,"abstract":"Despite recent advances, developing general-purpose universal denoising and artifact-removal networks remains largely an open problem: Given fixed network weights, one inherently trades-off specialization at one task (e.g., removing Poisson noise) for performance at another (e.g., removing speckle noise). In addition, training such a network is challenging due to the curse of dimensionality: As one increases the dimensions of the specification-space (i.e., the number of parameters needed to describe the noise distribution) the number of unique specifications one needs to train for grows exponentially. Uniformly sampling this space will result in a network that does well at very challenging problem specifications but poorly at easy problem specifications, where even large errors will have a small effect on the overall mean squared error. In this work we propose training denoising networks using an adaptive-sampling/active-learning strategy. Our work improves upon a recently proposed universal denoiser training strategy by extending these results to higher dimensions and by incorporating a polynomial approximation of the true specification-loss landscape. This approximation allows us to reduce training times by almost two orders of magnitude. We test our method on simulated joint Poisson-Gaussian-Speckle noise and demonstrate that with our proposed training strategy, a single blind, generalist denoiser network can achieve peak signal-to-noise ratios within a uniform bound of specialized denoiser networks across a large range of operating conditions. We also capture a small dataset of images with varying amounts of joint Poisson-Gaussian-Speckle noise and demonstrate that a universal denoiser trained using our adaptive-sampling strategy outperforms uniformly trained baselines.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"6216-6226"},"PeriodicalIF":0.0,"publicationDate":"2024-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142487456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Sparse Coding Inspired LSTM and Self-Attention Integration for Medical Image Segmentation
Zexuan Ji;Shunlong Ye;Xiao Ma
Accurate and automatic segmentation of medical images plays an essential role in clinical diagnosis and analysis. It has been established that integrating contextual relationships substantially enhances the representational ability of neural networks. Conventionally, Long Short-Term Memory (LSTM) and Self-Attention (SA) mechanisms have been recognized for their proficiency in capturing global dependencies within data. However, these mechanisms have typically been viewed as distinct modules without a direct linkage. This paper presents the integration of LSTM design with SA sparse coding as a key innovation. It uses linear combinations of LSTM states for SA’s query, key, and value (QKV) matrices to leverage LSTM’s capability for state compression and historical data retention. This approach aims to rectify the shortcomings of conventional sparse coding methods that overlook temporal information, thereby enhancing SA’s ability to perform sparse coding and capture global dependencies. Building upon this premise, we introduce two innovative modules that weave the SA matrix into the LSTM state design in distinct manners, enabling LSTM to more adeptly model global dependencies and meld seamlessly with SA without incurring extra computational demands. Both modules are separately embedded into the U-shaped convolutional neural network architecture for handling both 2D and 3D medical images. Experimental evaluations on downstream medical image segmentation tasks reveal that our proposed modules not only excel on four extensively utilized datasets across various baselines but also enhance prediction accuracy, even on baselines that have already incorporated contextual modules. Code is available at https://github.com/yeshunlong/SALSTM.
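A minimal sketch of the central idea, assuming 1D token sequences: LSTM states are computed over the tokens and then linearly projected into the query, key and value matrices of a standard scaled dot-product attention. Dimensions and the residual connection are placeholders; the paper's two modules and their embedding in a U-shaped network are not reproduced.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LSTMSelfAttention(nn.Module):
    """Self-attention whose Q, K, V are linear combinations of LSTM states."""
    def __init__(self, dim=64):
        super().__init__()
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, tokens):                  # (B, N, dim)
        states, _ = self.lstm(tokens)           # LSTM states compress sequence history
        q, k, v = self.q(states), self.k(states), self.v(states)
        attn = F.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)
        return tokens + attn @ v                # residual global-context update

block = LSTMSelfAttention()
print(block(torch.randn(2, 128, 64)).shape)     # torch.Size([2, 128, 64])
```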
{"title":"Sparse Coding Inspired LSTM and Self-Attention Integration for Medical Image Segmentation","authors":"Zexuan Ji;Shunlong Ye;Xiao Ma","doi":"10.1109/TIP.2024.3482189","DOIUrl":"10.1109/TIP.2024.3482189","url":null,"abstract":"Accurate and automatic segmentation of medical images plays an essential role in clinical diagnosis and analysis. It has been established that integrating contextual relationships substantially enhances the representational ability of neural networks. Conventionally, Long Short-Term Memory (LSTM) and Self-Attention (SA) mechanisms have been recognized for their proficiency in capturing global dependencies within data. However, these mechanisms have typically been viewed as distinct modules without a direct linkage. This paper presents the integration of LSTM design with SA sparse coding as a key innovation. It uses linear combinations of LSTM states for SA’s query, key, and value (QKV) matrices to leverage LSTM’s capability for state compression and historical data retention. This approach aims to rectify the shortcomings of conventional sparse coding methods that overlook temporal information, thereby enhancing SA’s ability to do sparse coding and capture global dependencies. Building upon this premise, we introduce two innovative modules that weave the SA matrix into the LSTM state design in distinct manners, enabling LSTM to more adeptly model global dependencies and meld seamlessly with SA without accruing extra computational demands. Both modules are separately embedded into the U-shaped convolutional neural network architecture for handling both 2D and 3D medical images. Experimental evaluations on downstream medical image segmentation tasks reveal that our proposed modules not only excel on four extensively utilized datasets across various baselines but also enhance prediction accuracy, even on baselines that have already incorporated contextual modules. Code is available at \u0000<uri>https://github.com/yeshunlong/SALSTM</uri>\u0000.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"6098-6113"},"PeriodicalIF":0.0,"publicationDate":"2024-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142487460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Enhancing Sample Utilization in Noise-Robust Deep Metric Learning With Subgroup-Based Positive-Pair Selection
Zhipeng Yu;Qianqian Xu;Yangbangyan Jiang;Yingfei Sun;Qingming Huang
The existence of noisy labels in real-world data negatively impacts the performance of deep learning models. Although much research effort has been devoted to improving robustness towards noisy labels in classification tasks, the problem of noisy labels in deep metric learning (DML) remains under-explored. Existing noisy label learning methods designed for DML mainly discard suspicious noisy samples, resulting in a waste of training data. To address this issue, we propose a noise-robust DML framework with SubGroup-based Positive-pair Selection (SGPS), which constructs reliable positive pairs for noisy samples to enhance sample utilization. Specifically, SGPS first effectively identifies clean and noisy samples by a probability-based clean sample selection strategy. To further utilize the remaining noisy samples, we discover their potential similar samples based on the subgroup information given by a subgroup generation module and then aggregate them into informative positive prototypes for each noisy sample via a positive prototype generation module. Afterward, a new contrastive loss is tailored for the noisy samples with their selected positive pairs. SGPS can be easily integrated into the training process of existing pair-wise DML tasks, like image retrieval and face recognition. Extensive experiments on multiple synthetic and real-world large-scale label noise datasets demonstrate the effectiveness of our proposed method. Without any bells and whistles, our SGPS framework outperforms the state-of-the-art noisy label DML methods.
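The toy function below mimics the overall pipeline under simplifying assumptions: low-loss samples stand in for the probability-based clean-sample selection, and each noisy sample receives a positive prototype averaged over its most similar clean samples, standing in for the subgroup generation and prototype generation modules. The actual selection probabilities, subgroup construction and tailored contrastive loss differ in the paper.

```python
import torch
import torch.nn.functional as F

def select_and_build_prototypes(emb, losses, clean_ratio=0.6, k=5):
    """Toy pipeline: treat low-loss samples as clean, then give each remaining
    (noisy) sample a positive prototype averaged over its k most similar clean samples.

    emb: (N, D) embeddings from the current model; losses: (N,) per-sample losses.
    Returns clean indices, noisy indices, and one L2-normalised prototype per noisy sample.
    """
    n = emb.shape[0]
    n_clean = int(clean_ratio * n)
    order = torch.argsort(losses)
    clean_idx, noisy_idx = order[:n_clean], order[n_clean:]   # loss-based clean selection
    emb = F.normalize(emb, dim=1)
    sim = emb[noisy_idx] @ emb[clean_idx].T                   # (n_noisy, n_clean) similarities
    nn_idx = sim.topk(k, dim=1).indices                       # similar clean "subgroup"
    prototypes = emb[clean_idx][nn_idx].mean(dim=1)           # (n_noisy, D) positive prototypes
    return clean_idx, noisy_idx, F.normalize(prototypes, dim=1)

emb = torch.randn(100, 32)
losses = torch.rand(100)
clean, noisy, proto = select_and_build_prototypes(emb, losses)
print(clean.shape, noisy.shape, proto.shape)
```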
{"title":"Enhancing Sample Utilization in Noise-Robust Deep Metric Learning With Subgroup-Based Positive-Pair Selection","authors":"Zhipeng Yu;Qianqian Xu;Yangbangyan Jiang;Yingfei Sun;Qingming Huang","doi":"10.1109/TIP.2024.3482182","DOIUrl":"10.1109/TIP.2024.3482182","url":null,"abstract":"The existence of noisy labels in real-world data negatively impacts the performance of deep learning models. Although much research effort has been devoted to improving the robustness towards noisy labels in classification tasks, the problem of noisy labels in deep metric learning (DML) remains under-explored. Existing noisy label learning methods designed for DML mainly discard suspicious noisy samples, resulting in a waste of the training data. To address this issue, we propose a noise-robust DML framework with SubGroup-based Positive-pair Selection (SGPS), which constructs reliable positive pairs for noisy samples to enhance the sample utilization. Specifically, SGPS first effectively identifies clean and noisy samples by a probability-based clean sample selectionstrategy. To further utilize the remaining noisy samples, we discover their potential similar samples based on the subgroup information given by a subgroup generation module and then aggregate them into informative positive prototypes for each noisy sample via a positive prototype generation module. Afterward, a new contrastive loss is tailored for the noisy samples with their selected positive pairs. SGPS can be easily integrated into the training process of existing pair-wise DML tasks, like image retrieval and face recognition. Extensive experiments on multiple synthetic and real-world large-scale label noise datasets demonstrate the effectiveness of our proposed method. Without any bells and whistles, our SGPS framework outperforms the state-of-the-art noisy label DML methods.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"6083-6097"},"PeriodicalIF":0.0,"publicationDate":"2024-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142487457","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Latitude-Redundancy-Aware All-Zero Block Detection for Fast 360-Degree Video Coding
Chang Yu;Xiaopeng Fan;Pengjin Chen;Yuxin Ni;Hengyu Man;Debin Zhao
The sphere-to-plane projection of 360-degree video introduces substantial stretched redundant data, which is discarded when reprojected to the 3D sphere for display. Consequently, encoding and transmitting such redundant data is unnecessary. Highly redundant blocks can be referred to as all-zero blocks (AZBs). Detecting these AZBs in advance can reduce computational and transmission resource consumption. However, this cannot be achieved by existing AZB detection techniques because they are unaware of the stretching redundancy. In this paper, we first derive a latitude-adaptive redundancy detection (LARD) approach to adaptively detect coefficients carrying redundancy in transformed blocks by modeling the dependency between the valid frequency range and the stretching degree based on spectrum analysis. Utilizing LARD, a latitude-redundancy-aware AZB detection scheme tailored for fast 360-degree video coding (LRAS) is proposed to accelerate the encoding process. LRAS consists of three sequential stages: latitude-adaptive AZB (L-AZB) detection, latitude-adaptive genuine-AZB (LG-AZB) detection and latitude-adaptive pseudo-AZB (LP-AZB) detection. Specifically, L-AZB refers to the AZBs introduced by projection; LARD is used to detect L-AZBs directly. LG-AZB refers to the AZBs obtained after hard-decision quantization and zeroing of redundant coefficients; a novel latitude-adaptive sum-of-absolute-difference estimation model is built to derive the threshold for LG-AZB detection. LP-AZB refers to the AZBs arising from rate-distortion optimization when redundancy is considered; a latitude-adaptive rate-distortion model is established for LP-AZB detection. Experimental results show that LRAS can achieve average total encoding time reductions of 25.85% and 20.38% under low-delay and random access configurations compared to the original HEVC encoder, with only 0.16% and 0.13% BDBR increases and 0.01 dB BDPSNR loss, respectively. The transform and quantization time savings are 60.13% and 59.94% on average.
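For intuition only: in an equirectangular projection, a row at latitude phi is horizontally stretched by roughly 1/cos(phi), so only about a cos(phi) fraction of the horizontal frequencies in a transformed block carries non-redundant content. The sketch below encodes this rough model as a coefficient mask and a trivial all-zero-block test; the thresholds and models derived in the paper (LARD, the SAD estimation model, the rate-distortion model) are not reproduced.

```python
import numpy as np

def latitude_redundancy_mask(block_size, latitude_rad):
    """Toy latitude-adaptive mask of transform coefficients treated as valid.

    Keeps roughly a cos(latitude) fraction of the horizontal frequencies; this is
    only the intuition behind latitude-adaptive redundancy, not the actual model.
    """
    valid_cols = max(1, int(np.ceil(block_size * np.cos(latitude_rad))))
    mask = np.zeros((block_size, block_size), dtype=bool)
    mask[:, :valid_cols] = True          # keep low horizontal frequencies only
    return mask

def is_candidate_azb(coeff_block, latitude_rad, threshold=1e-3):
    """A block whose valid (non-redundant) coefficients are all tiny can be
    treated as an all-zero block and skipped by the encoder."""
    mask = latitude_redundancy_mask(coeff_block.shape[0], latitude_rad)
    return bool(np.all(np.abs(coeff_block[mask]) < threshold))

block = np.random.randn(8, 8) * 1e-4
print(is_candidate_azb(block, latitude_rad=np.deg2rad(75)))
```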
{"title":"Latitude-Redundancy-Aware All-Zero Block Detection for Fast 360-Degree Video Coding","authors":"Chang Yu;Xiaopeng Fan;Pengjin Chen;Yuxin Ni;Hengyu Man;Debin Zhao","doi":"10.1109/TIP.2024.3482172","DOIUrl":"10.1109/TIP.2024.3482172","url":null,"abstract":"The sphere-to-plane projection of 360-degree video introduces substantial stretched redundant data, which is discarded when reprojected to the 3D sphere for display. Consequently, encoding and transmitting such redundant data is unnecessary. Highly redundant blocks can be referred to as all-zero blocks (AZBs). Detecting these AZBs in advance can reduce computational and transmission resource consumption. However, this cannot be achieved by existing AZB detection techniques due to the unawareness of the stretching redundancy. In this paper, we first derive a latitude-adaptive redundancy detection (LARD) approach to adaptively detect coefficients carrying redundancy in transformed blocks by modeling the dependency between valid frequency range and the stretching degree based on spectrum analysis. Utilizing LARD, a latitude-redundancy-aware AZB detection scheme tailored for fast 360-degree video coding (LRAS) is proposed to accelerate the encoding process. LRAS consists of three sequential stages: latitude-adaptive AZB (L-AZB) detection, latitude-adaptive genuine-AZB (LG-AZB) detection and latitude-adaptive pseudo-AZB (LP-AZB) detection. Specifically, L-AZB refers to the AZB introduced by projection. LARD is used to detect L-AZB directly. LG-AZB refers to the AZB after hard-decision quantization and zeroing redundant coefficients. A novel latitude-adaptive sum of absolute difference estimation model is built to derive the threshold for LG-AZB detection. LP-AZB refers to the AZB in terms of rate-distortion optimization considering redundancy. A latitude-adaptive rate-distortion model is established for LP-AZB detection. Experimental results show that LRAS can achieve an average total encoding time reduction of 25.85% and 20.38% under low-delay and random access configurations compared to the original HEVC encoder, with only 0.16% and 0.13% BDBR increases and 0.01dB BDPSNR loss, respectively. The transform and quantization time savings are 60.13% and 59.94% on average.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"6129-6142"},"PeriodicalIF":0.0,"publicationDate":"2024-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142487458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Toward Blind Flare Removal Using Knowledge-Driven Flare-Level Estimator
Haoyou Deng;Lida Li;Feng Zhang;Zhiqiang Li;Bin Xu;Qingbo Lu;Changxin Gao;Nong Sang
Lens flare is a common phenomenon that occurs when strong light rays reach the camera sensor, mixing the clean scene with various opaque and semi-transparent artifacts. Existing deep learning methods are constrained by the limited number of real image pairs available for training. Though recent synthesis-based approaches are found to be effective, synthesized pairs still deviate from real ones, as the mixing mechanism of flare artifacts and scenes in the wild always depends on a range of undetermined factors, such as lens structure, scratches, etc. In this paper, we present a new, knowledge-driven perspective grounded in the blind nature of the flare removal task. Specifically, we present a simple yet effective flare-level estimator to predict the corruption level of a flare-corrupted image. The estimated flare level can be interpreted as additional information about the gap between corrupted images and their flare-free counterparts, adaptively facilitating the network at both the training and testing stages. Besides, we utilize a flare-level modulator to better integrate the estimates into the network. We also devise a flare-aware block for more accurate flare recognition and reconstruction. Additionally, we collect a new real-world flare dataset for benchmarking, namely WiderFlare. Extensive experiments on three benchmark datasets demonstrate that our method outperforms state-of-the-art methods quantitatively and qualitatively.
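The sketch below shows one plausible realization of the two components named in the abstract: a tiny CNN that regresses a scalar flare level from the input image, and a FiLM-style modulator that scales and shifts restoration features with that level. Both architectures are assumptions for illustration; the paper's estimator, modulator and flare-aware block are not specified here.

```python
import torch
import torch.nn as nn

class FlareLevelEstimator(nn.Module):
    """Tiny CNN that regresses a scalar corruption level from the flare image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 1), nn.Sigmoid())      # flare level in [0, 1]

    def forward(self, img):                      # img: (B, 3, H, W)
        return self.net(img)

class FlareLevelModulator(nn.Module):
    """FiLM-style modulation: scale and shift restoration features with the
    estimated flare level (a stand-in for the paper's modulator design)."""
    def __init__(self, channels=64):
        super().__init__()
        self.to_scale = nn.Linear(1, channels)
        self.to_shift = nn.Linear(1, channels)

    def forward(self, feat, level):              # feat: (B, C, H, W), level: (B, 1)
        s = self.to_scale(level).unsqueeze(-1).unsqueeze(-1)
        b = self.to_shift(level).unsqueeze(-1).unsqueeze(-1)
        return feat * (1 + s) + b

img = torch.randn(2, 3, 128, 128)
feat = torch.randn(2, 64, 32, 32)
level = FlareLevelEstimator()(img)
print(FlareLevelModulator()(feat, level).shape)
```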
{"title":"Toward Blind Flare Removal Using Knowledge-Driven Flare-Level Estimator","authors":"Haoyou Deng;Lida Li;Feng Zhang;Zhiqiang Li;Bin Xu;Qingbo Lu;Changxin Gao;Nong Sang","doi":"10.1109/TIP.2024.3480696","DOIUrl":"10.1109/TIP.2024.3480696","url":null,"abstract":"Lens flare is a common phenomenon when strong light rays arrive at the camera sensor and a clean scene is consequently mixed up with various opaque and semi-transparent artifacts. Existing deep learning methods are always constrained with limited real image pairs for training. Though recent synthesis-based approaches are found effective, synthesized pairs still deviate from the real ones as the mixing mechanism of flare artifacts and scenes in the wild always depends on a line of undetermined factors, such as lens structure, scratches, etc. In this paper, we present a new perspective from the blind nature of the flare removal task in a knowledge-driven manner. Specifically, we present a simple yet effective flare-level estimator to predict the corruption level of a flare-corrupted image. The estimated flare-level can be interpreted as additive information of the gap between corrupted images and their flare-free correspondences to facilitate a network at both training and testing stages adaptively. Besides, we utilize a flare-level modulator to better integrate the estimations into networks. We also devise a flare-aware block for more accurate flare recognition and reconstruction. Additionally, we collect a new real-world flare dataset for benchmarking, namely WiderFlare. Extensive experiments on three benchmark datasets demonstrate that our method outperforms state-of-the-art methods quantitatively and qualitatively.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"6114-6128"},"PeriodicalIF":0.0,"publicationDate":"2024-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142486818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Nonparametric Clustering-Guided Cross-View Contrastive Learning for Partially View-Aligned Representation Learning
Shengsheng Qian;Dizhan Xue;Jun Hu;Huaiwen Zhang;Changsheng Xu
With the increasing availability of multi-view data, multi-view representation learning has emerged as a prominent research area. However, collecting strictly view-aligned data is usually expensive, and learning from both aligned and unaligned data can be more practical. Therefore, Partially View-aligned Representation Learning (PVRL) has recently attracted increasing attention. After aligning multi-view representations based on their semantic similarity, the aligned representations can be utilized to facilitate downstream tasks, such as clustering. However, existing methods may be constrained by the following limitations: 1) They learn semantic relations across views using the known correspondences, which are incomplete, and the existence of false negative pairs (FNP) can significantly impact learning effectiveness; 2) Existing strategies for alleviating the impact of FNP are too intuitive and lack a theoretical explanation of their applicable conditions; 3) They attempt to find FNP based on distance in the common space and fail to explore semantic relations between multi-view data. In this paper, we propose Nonparametric Clustering-guided Cross-view Contrastive Learning (NC3L) for PVRL to address the above issues. Firstly, we propose to estimate the similarity matrix between multi-view data in the marginal cross-view contrastive loss to approximate the similarity matrix of supervised contrastive learning (CL). Secondly, we establish the theoretical foundation for our proposed method by analyzing the error bounds of the loss function and its derivatives between our method and supervised CL. Thirdly, we propose Deep Variational Nonparametric Clustering (DeepVNC), which designs a deep reparameterized variational inference scheme for Dirichlet process Gaussian mixture models to construct cluster-level similarity between multi-view data and discover FNP. Additionally, we propose a reparameterization trick to improve the robustness and performance of our proposed CL method. Extensive experiments on four widely used benchmark datasets show the superiority of our proposed method compared with state-of-the-art methods.
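The function below sketches one simple way to soften a cross-view contrastive loss with an estimated similarity matrix, so that likely false negatives are penalised less: the one-hot alignment target is mixed with a similarity-derived distribution. The mixing weight and the use of a softmax over cosine similarities are assumptions; the paper's marginal loss, its error-bound analysis and the DeepVNC clustering are not reproduced.

```python
import torch
import torch.nn.functional as F

def soft_cross_view_contrastive_loss(z1, z2, temperature=0.1, soft_weight=0.5):
    """Cross-view contrastive loss with an estimated similarity matrix.

    z1, z2: (N, D) representations of the same N samples from two views, where only
    part of the correspondence may be trusted.  The target mixes the nominal
    one-to-one alignment with a similarity-based estimate, so semantically matching
    but unaligned pairs (probable false negatives) are penalised less.
    """
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.T / temperature                         # (N, N) cross-view logits
    identity = torch.eye(z1.shape[0], device=z1.device)      # nominal alignment target
    estimated = F.softmax(logits, dim=1).detach()            # estimated similarity matrix
    target = (1 - soft_weight) * identity + soft_weight * estimated
    return -(target * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

a, b = torch.randn(32, 128), torch.randn(32, 128)
print(soft_cross_view_contrastive_loss(a, b))
```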
{"title":"Nonparametric Clustering-Guided Cross-View Contrastive Learning for Partially View-Aligned Representation Learning","authors":"Shengsheng Qian;Dizhan Xue;Jun Hu;Huaiwen Zhang;Changsheng Xu","doi":"10.1109/TIP.2024.3480701","DOIUrl":"10.1109/TIP.2024.3480701","url":null,"abstract":"With the increasing availability of multi-view data, multi-view representation learning has emerged as a prominent research area. However, collecting strictly view-aligned data is usually expensive, and learning from both aligned and unaligned data can be more practicable. Therefore, Partially View-aligned Representation Learning (PVRL) has recently attracted increasing attention. After aligning multi-view representations based on their semantic similarity, the aligned representations can be utilized to facilitate downstream tasks, such as clustering. However, existing methods may be constrained by the following limitations: 1) They learn semantic relations across views using the known correspondences, which is incomplete and the existence of false negative pairs (FNP) can significantly impact the learning effectiveness; 2) Existing strategies for alleviating the impact of FNP are too intuitive and lack a theoretical explanation of their applicable conditions; 3) They attempt to find FNP based on distance in the common space and fail to explore semantic relations between multi-view data. In this paper, we propose a Nonparametric Clustering-guided Cross-view Contrastive Learning (NC3L) for PVRL, in order to address the above issues. Firstly, we propose to estimate the similarity matrix between multi-view data in the marginal cross-view contrastive loss to approximate the similarity matrix of supervised contrastive learning (CL). Secondly, we establish the theoretical foundation for our proposed method by analyzing the error bounds of the loss function and its derivatives between our method and supervised CL. Thirdly, we propose a Deep Variational Nonparametric Clustering (DeepVNC) by designing a deep reparameterized variational inference for Dirichlet process Gaussian mixture models to construct cluster-level similarity between multi-view data and discover FNP. Additionally, we propose a reparameterization trick to improve the robustness and the performance of our proposed CL method. Extensive experiments on four widely used benchmark datasets show the superiority of our proposed method compared with state-of-the-art methods.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"6158-6172"},"PeriodicalIF":0.0,"publicationDate":"2024-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142486817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Transforming Image Super-Resolution: A ConvFormer-Based Efficient Approach
Gang Wu;Junjun Jiang;Junpeng Jiang;Xianming Liu
Recent progress in single-image super-resolution (SISR) has led to remarkable performance, yet the computational costs of these methods remain a challenge for deployment on resource-constrained devices. In particular, transformer-based methods, which leverage self-attention mechanisms, have led to significant breakthroughs but also introduce substantial computational costs. To tackle this issue, we introduce the Convolutional Transformer layer (ConvFormer) and propose a ConvFormer-based Super-Resolution network (CFSR), offering an effective and efficient solution for lightweight image super-resolution. The proposed method inherits the advantages of both convolution-based and transformer-based approaches. Specifically, CFSR utilizes large kernel convolutions as a feature mixer to replace the self-attention module, efficiently modeling long-range dependencies and extensive receptive fields with minimal computational overhead. Furthermore, we propose an edge-preserving feed-forward network (EFN) designed to achieve local feature aggregation while effectively preserving high-frequency information. Extensive experiments demonstrate that CFSR strikes an optimal balance between computational cost and performance compared to existing lightweight SR methods. When benchmarked against state-of-the-art methods such as ShuffleMixer, the proposed CFSR achieves a gain of 0.39 dB on the Urban100 dataset for the x2 super-resolution task while requiring 26% and 31% fewer parameters and FLOPs, respectively. The code and pre-trained models are available at https://github.com/Aitical/CFSR.
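A minimal sketch of a ConvFormer-style layer, assuming made-up channel and kernel sizes: a large-kernel depthwise convolution plus a pointwise convolution acts as the token mixer in place of self-attention, followed by a plain feed-forward network. The edge-preserving details of the paper's EFN are not included.

```python
import torch
import torch.nn as nn

class ConvFormerLayer(nn.Module):
    """Large-kernel convolution as a self-attention substitute, plus a feed-forward block."""
    def __init__(self, dim=48, kernel_size=13, expansion=2):
        super().__init__()
        self.mixer = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size, padding=kernel_size // 2, groups=dim),
            nn.Conv2d(dim, dim, 1))                      # depthwise + pointwise spatial mixing
        self.ffn = nn.Sequential(
            nn.Conv2d(dim, dim * expansion, 1), nn.GELU(),
            nn.Conv2d(dim * expansion, dim, 1))          # per-position channel mixing
        self.norm1 = nn.GroupNorm(1, dim)                # per-sample normalisation (one group)
        self.norm2 = nn.GroupNorm(1, dim)

    def forward(self, x):                                # x: (B, C, H, W)
        x = x + self.mixer(self.norm1(x))                # long-range spatial mixing
        x = x + self.ffn(self.norm2(x))                  # feed-forward refinement
        return x

layer = ConvFormerLayer()
print(layer(torch.randn(1, 48, 64, 64)).shape)           # torch.Size([1, 48, 64, 64])
```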
{"title":"Transforming Image Super-Resolution: A ConvFormer-Based Efficient Approach","authors":"Gang Wu;Junjun Jiang;Junpeng Jiang;Xianming Liu","doi":"10.1109/TIP.2024.3477350","DOIUrl":"10.1109/TIP.2024.3477350","url":null,"abstract":"Recent progress in single-image super-resolution (SISR) has achieved remarkable performance, yet the computational costs of these methods remain a challenge for deployment on resource-constrained devices. In particular, transformer-based methods, which leverage self-attention mechanisms, have led to significant breakthroughs but also introduce substantial computational costs. To tackle this issue, we introduce the Convolutional Transformer layer (ConvFormer) and propose a ConvFormer-based Super-Resolution network (CFSR), offering an effective and efficient solution for lightweight image super-resolution. The proposed method inherits the advantages of both convolution-based and transformer-based approaches. Specifically, CFSR utilizes large kernel convolutions as a feature mixer to replace the self-attention module, efficiently modeling long-range dependencies and extensive receptive fields with minimal computational overhead. Furthermore, we propose an edge-preserving feed-forward network (EFN) designed to achieve local feature aggregation while effectively preserving high-frequency information. Extensive experiments demonstrate that CFSR strikes an optimal balance between computational cost and performance compared to existing lightweight SR methods. When benchmarked against state-of-the-art methods such as ShuffleMixer, the proposed CFSR achieves a gain of 0.39 dB on the Urban100 dataset for the x2 super-resolution task while requiring 26% and 31% fewer parameters and FLOPs, respectively. The code and pre-trained models are available at \u0000<uri>https://github.com/Aitical/CFSR</uri>\u0000.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"6071-6082"},"PeriodicalIF":0.0,"publicationDate":"2024-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142449611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0