
Latest publications in IEEE Transactions on Image Processing: A Publication of the IEEE Signal Processing Society

EviPrompt: A Training-Free Evidential Prompt Generation Method for Adapting Segment Anything Model in Medical Images
Yinsong Xu;Jiaqi Tang;Aidong Men;Qingchao Chen
Medical image segmentation is a critical task in clinical applications. Recently, the Segment Anything Model (SAM) has demonstrated potential for natural image segmentation. However, the requirement for expert labour to provide prompts, and the domain gap between natural and medical images pose significant obstacles in adapting SAM to medical images. To overcome these challenges, this paper introduces a novel prompt generation method named EviPrompt. The proposed method requires only a single reference image-annotation pair, making it a training-free solution that significantly reduces the need for extensive labelling and computational resources. First, prompts are automatically generated based on the similarity between features of the reference and target images, and evidential learning is introduced to improve reliability. Then, to mitigate the impact of the domain gap, committee voting and inference-guided in-context learning are employed, generating prompts primarily based on human prior knowledge and reducing reliance on extracted semantic information. EviPrompt represents an efficient and robust approach to medical image segmentation. We evaluate it across a broad range of tasks and modalities, confirming its efficacy. The source code is available at https://github.com/SPIresearch/EviPrompt.
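To make the core idea concrete, here is a minimal sketch, assuming features from a frozen encoder such as SAM's image encoder, of generating a point prompt from a single reference image-annotation pair by feature similarity; the evidential-learning and committee-voting components of EviPrompt are not reproduced here.

```python
# A minimal sketch (not the authors' released code) of similarity-based
# point-prompt generation from one reference image-annotation pair.
import torch
import torch.nn.functional as F

def generate_point_prompt(ref_feat, ref_mask, tgt_feat):
    """ref_feat, tgt_feat: (C, H, W) encoder features; ref_mask: (H, W) binary."""
    C, H, W = ref_feat.shape
    # Mean foreground feature of the reference image acts as a prototype.
    fg = ref_feat[:, ref_mask.bool()].mean(dim=1)                    # (C,)
    # Cosine similarity between the prototype and every target location.
    sim = F.cosine_similarity(
        tgt_feat.reshape(C, -1), fg[:, None], dim=0
    ).reshape(H, W)
    # The most similar location serves as a positive point prompt for SAM.
    y, x = divmod(int(sim.argmax()), W)
    return (x, y), sim

ref_feat, tgt_feat = torch.randn(256, 64, 64), torch.randn(256, 64, 64)
ref_mask = torch.zeros(64, 64); ref_mask[20:40, 20:40] = 1
point, sim_map = generate_point_prompt(ref_feat, ref_mask, tgt_feat)
```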
{"title":"EviPrompt: A Training-Free Evidential Prompt Generation Method for Adapting Segment Anything Model in Medical Images","authors":"Yinsong Xu;Jiaqi Tang;Aidong Men;Qingchao Chen","doi":"10.1109/TIP.2024.3482175","DOIUrl":"10.1109/TIP.2024.3482175","url":null,"abstract":"Medical image segmentation is a critical task in clinical applications. Recently, the Segment Anything Model (SAM) has demonstrated potential for natural image segmentation. However, the requirement for expert labour to provide prompts, and the domain gap between natural and medical images pose significant obstacles in adapting SAM to medical images. To overcome these challenges, this paper introduces a novel prompt generation method named EviPrompt. The proposed method requires only a single reference image-annotation pair, making it a training-free solution that significantly reduces the need for extensive labelling and computational resources. First, prompts are automatically generated based on the similarity between features of the reference and target images, and evidential learning is introduced to improve reliability. Then, to mitigate the impact of the domain gap, committee voting and inference-guided in-context learning are employed, generating prompts primarily based on human prior knowledge and reducing reliance on extracted semantic information. EviPrompt represents an efficient and robust approach to medical image segmentation. We evaluate it across a broad range of tasks and modalities, confirming its efficacy. The source code is available at \u0000<uri>https://github.com/SPIresearch/EviPrompt</uri>\u0000.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"6204-6215"},"PeriodicalIF":0.0,"publicationDate":"2024-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142487459","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
A Scalable Training Strategy for Blind Multi-Distribution Noise Removal
Kevin Zhang;Sakshum Kulshrestha;Christopher A. Metzler
Despite recent advances, developing general-purpose universal denoising and artifact-removal networks remains largely an open problem: Given fixed network weights, one inherently trades off specialization at one task (e.g., removing Poisson noise) for performance at another (e.g., removing speckle noise). In addition, training such a network is challenging due to the curse of dimensionality: As one increases the dimensions of the specification space (i.e., the number of parameters needed to describe the noise distribution), the number of unique specifications one needs to train for grows exponentially. Uniformly sampling this space will result in a network that does well at very challenging problem specifications but poorly at easy problem specifications, where even large errors will have a small effect on the overall mean squared error. In this work we propose training denoising networks using an adaptive-sampling/active-learning strategy. Our work improves upon a recently proposed universal denoiser training strategy by extending these results to higher dimensions and by incorporating a polynomial approximation of the true specification-loss landscape. This approximation allows us to reduce training times by almost two orders of magnitude. We test our method on simulated joint Poisson-Gaussian-Speckle noise and demonstrate that with our proposed training strategy, a single blind, generalist denoiser network can achieve peak signal-to-noise ratios within a uniform bound of specialized denoiser networks across a large range of operating conditions. We also capture a small dataset of images with varying amounts of joint Poisson-Gaussian-Speckle noise and demonstrate that a universal denoiser trained using our adaptive-sampling strategy outperforms uniformly trained baselines.
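A minimal sketch of the adaptive-sampling idea follows, in one specification dimension with a synthetic loss landscape; the paper's actual landscape, dimensionality, and polynomial degree are not specified here.

```python
# Fit a polynomial surrogate to the observed specification -> loss landscape,
# then draw the next batch of noise levels with probability proportional to
# the predicted loss (assumptions mine, not the paper's exact procedure).
import numpy as np

rng = np.random.default_rng(0)
specs = rng.uniform(0.0, 1.0, size=200)                    # e.g., noise std levels
losses = 0.3 + 2.0 * specs**2 + rng.normal(0, 0.05, 200)   # stand-in for MSE

coeffs = np.polyfit(specs, losses, deg=4)                  # polynomial surrogate
grid = np.linspace(0.0, 1.0, 512)
pred = np.clip(np.polyval(coeffs, grid), 1e-6, None)       # predicted loss >= 0

p = pred / pred.sum()                                      # sampling distribution
next_specs = rng.choice(grid, size=64, p=p)                # harder specs sampled more
```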
{"title":"A Scalable Training Strategy for Blind Multi-Distribution Noise Removal","authors":"Kevin Zhang;Sakshum Kulshrestha;Christopher A. Metzler","doi":"10.1109/TIP.2024.3482185","DOIUrl":"10.1109/TIP.2024.3482185","url":null,"abstract":"Despite recent advances, developing general-purpose universal denoising and artifact-removal networks remains largely an open problem: Given fixed network weights, one inherently trades-off specialization at one task (e.g., removing Poisson noise) for performance at another (e.g., removing speckle noise). In addition, training such a network is challenging due to the curse of dimensionality: As one increases the dimensions of the specification-space (i.e., the number of parameters needed to describe the noise distribution) the number of unique specifications one needs to train for grows exponentially. Uniformly sampling this space will result in a network that does well at very challenging problem specifications but poorly at easy problem specifications, where even large errors will have a small effect on the overall mean squared error. In this work we propose training denoising networks using an adaptive-sampling/active-learning strategy. Our work improves upon a recently proposed universal denoiser training strategy by extending these results to higher dimensions and by incorporating a polynomial approximation of the true specification-loss landscape. This approximation allows us to reduce training times by almost two orders of magnitude. We test our method on simulated joint Poisson-Gaussian-Speckle noise and demonstrate that with our proposed training strategy, a single blind, generalist denoiser network can achieve peak signal-to-noise ratios within a uniform bound of specialized denoiser networks across a large range of operating conditions. We also capture a small dataset of images with varying amounts of joint Poisson-Gaussian-Speckle noise and demonstrate that a universal denoiser trained using our adaptive-sampling strategy outperforms uniformly trained baselines.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"6216-6226"},"PeriodicalIF":0.0,"publicationDate":"2024-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142487456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Sparse Coding Inspired LSTM and Self-Attention Integration for Medical Image Segmentation
Zexuan Ji;Shunlong Ye;Xiao Ma
Accurate and automatic segmentation of medical images plays an essential role in clinical diagnosis and analysis. It has been established that integrating contextual relationships substantially enhances the representational ability of neural networks. Conventionally, Long Short-Term Memory (LSTM) and Self-Attention (SA) mechanisms have been recognized for their proficiency in capturing global dependencies within data. However, these mechanisms have typically been viewed as distinct modules without a direct linkage. This paper presents the integration of LSTM design with SA sparse coding as a key innovation. It uses linear combinations of LSTM states for SA’s query, key, and value (QKV) matrices to leverage LSTM’s capability for state compression and historical data retention. This approach aims to rectify the shortcomings of conventional sparse coding methods that overlook temporal information, thereby enhancing SA’s ability to do sparse coding and capture global dependencies. Building upon this premise, we introduce two innovative modules that weave the SA matrix into the LSTM state design in distinct manners, enabling LSTM to more adeptly model global dependencies and meld seamlessly with SA without accruing extra computational demands. Both modules are separately embedded into the U-shaped convolutional neural network architecture for handling both 2D and 3D medical images. Experimental evaluations on downstream medical image segmentation tasks reveal that our proposed modules not only excel on four extensively utilized datasets across various baselines but also enhance prediction accuracy, even on baselines that have already incorporated contextual modules. Code is available at https://github.com/yeshunlong/SALSTM.
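As a rough illustration, assuming token-sequence inputs and a single attention head, the following sketch derives Q, K, and V from linear combinations of LSTM hidden states rather than from the raw tokens, so attention inherits the LSTM's compressed history of the sequence.

```python
# A minimal sketch (interface and sizes are my assumptions, not the paper's
# modules) of computing self-attention QKV from LSTM states.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LSTMAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.to_qkv = nn.Linear(dim, 3 * dim)   # linear combos of LSTM states

    def forward(self, tokens):                  # tokens: (B, N, dim)
        states, _ = self.lstm(tokens)           # (B, N, dim) hidden states
        q, k, v = self.to_qkv(states).chunk(3, dim=-1)
        attn = F.softmax(q @ k.transpose(1, 2) / q.shape[-1] ** 0.5, dim=-1)
        return attn @ v                         # (B, N, dim)

x = torch.randn(2, 196, 64)                     # e.g., 14x14 feature tokens
out = LSTMAttention(64)(x)
```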
{"title":"Sparse Coding Inspired LSTM and Self-Attention Integration for Medical Image Segmentation","authors":"Zexuan Ji;Shunlong Ye;Xiao Ma","doi":"10.1109/TIP.2024.3482189","DOIUrl":"10.1109/TIP.2024.3482189","url":null,"abstract":"Accurate and automatic segmentation of medical images plays an essential role in clinical diagnosis and analysis. It has been established that integrating contextual relationships substantially enhances the representational ability of neural networks. Conventionally, Long Short-Term Memory (LSTM) and Self-Attention (SA) mechanisms have been recognized for their proficiency in capturing global dependencies within data. However, these mechanisms have typically been viewed as distinct modules without a direct linkage. This paper presents the integration of LSTM design with SA sparse coding as a key innovation. It uses linear combinations of LSTM states for SA’s query, key, and value (QKV) matrices to leverage LSTM’s capability for state compression and historical data retention. This approach aims to rectify the shortcomings of conventional sparse coding methods that overlook temporal information, thereby enhancing SA’s ability to do sparse coding and capture global dependencies. Building upon this premise, we introduce two innovative modules that weave the SA matrix into the LSTM state design in distinct manners, enabling LSTM to more adeptly model global dependencies and meld seamlessly with SA without accruing extra computational demands. Both modules are separately embedded into the U-shaped convolutional neural network architecture for handling both 2D and 3D medical images. Experimental evaluations on downstream medical image segmentation tasks reveal that our proposed modules not only excel on four extensively utilized datasets across various baselines but also enhance prediction accuracy, even on baselines that have already incorporated contextual modules. Code is available at \u0000<uri>https://github.com/yeshunlong/SALSTM</uri>\u0000.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"6098-6113"},"PeriodicalIF":0.0,"publicationDate":"2024-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142487460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Enhancing Sample Utilization in Noise-Robust Deep Metric Learning With Subgroup-Based Positive-Pair Selection
Zhipeng Yu;Qianqian Xu;Yangbangyan Jiang;Yingfei Sun;Qingming Huang
The existence of noisy labels in real-world data negatively impacts the performance of deep learning models. Although much research effort has been devoted to improving the robustness towards noisy labels in classification tasks, the problem of noisy labels in deep metric learning (DML) remains under-explored. Existing noisy label learning methods designed for DML mainly discard suspicious noisy samples, resulting in a waste of the training data. To address this issue, we propose a noise-robust DML framework with SubGroup-based Positive-pair Selection (SGPS), which constructs reliable positive pairs for noisy samples to enhance the sample utilization. Specifically, SGPS first effectively identifies clean and noisy samples by a probability-based clean sample selection strategy. To further utilize the remaining noisy samples, we discover their potential similar samples based on the subgroup information given by a subgroup generation module and then aggregate them into informative positive prototypes for each noisy sample via a positive prototype generation module. Afterward, a new contrastive loss is tailored for the noisy samples with their selected positive pairs. SGPS can be easily integrated into the training process of existing pair-wise DML tasks, like image retrieval and face recognition. Extensive experiments on multiple synthetic and real-world large-scale label noise datasets demonstrate the effectiveness of our proposed method. Without any bells and whistles, our SGPS framework outperforms the state-of-the-art noisy label DML methods.
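One common realization of a probability-based clean-sample selection step fits a two-component Gaussian mixture to per-sample losses; the sketch below uses that approach as a stand-in, while the paper's exact criterion, subgroup generation, and prototype construction are not reproduced.

```python
# A minimal sketch of probability-based clean/noisy splitting via a GMM over
# per-sample losses (a common realization of such strategies, used here as an
# illustrative assumption).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
losses = np.concatenate([rng.normal(0.2, 0.05, 900),     # clean: low loss
                         rng.normal(1.0, 0.20, 100)])    # noisy: high loss

gmm = GaussianMixture(n_components=2, random_state=0).fit(losses.reshape(-1, 1))
clean_comp = int(np.argmin(gmm.means_.ravel()))          # low-mean component
p_clean = gmm.predict_proba(losses.reshape(-1, 1))[:, clean_comp]
is_clean = p_clean > 0.5
# Samples flagged noisy are then re-paired with subgroup prototypes rather
# than discarded, which is the sample-utilization point of the method.
```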
{"title":"Enhancing Sample Utilization in Noise-Robust Deep Metric Learning With Subgroup-Based Positive-Pair Selection","authors":"Zhipeng Yu;Qianqian Xu;Yangbangyan Jiang;Yingfei Sun;Qingming Huang","doi":"10.1109/TIP.2024.3482182","DOIUrl":"10.1109/TIP.2024.3482182","url":null,"abstract":"The existence of noisy labels in real-world data negatively impacts the performance of deep learning models. Although much research effort has been devoted to improving the robustness towards noisy labels in classification tasks, the problem of noisy labels in deep metric learning (DML) remains under-explored. Existing noisy label learning methods designed for DML mainly discard suspicious noisy samples, resulting in a waste of the training data. To address this issue, we propose a noise-robust DML framework with SubGroup-based Positive-pair Selection (SGPS), which constructs reliable positive pairs for noisy samples to enhance the sample utilization. Specifically, SGPS first effectively identifies clean and noisy samples by a probability-based clean sample selectionstrategy. To further utilize the remaining noisy samples, we discover their potential similar samples based on the subgroup information given by a subgroup generation module and then aggregate them into informative positive prototypes for each noisy sample via a positive prototype generation module. Afterward, a new contrastive loss is tailored for the noisy samples with their selected positive pairs. SGPS can be easily integrated into the training process of existing pair-wise DML tasks, like image retrieval and face recognition. Extensive experiments on multiple synthetic and real-world large-scale label noise datasets demonstrate the effectiveness of our proposed method. Without any bells and whistles, our SGPS framework outperforms the state-of-the-art noisy label DML methods.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"6083-6097"},"PeriodicalIF":0.0,"publicationDate":"2024-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142487457","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Latitude-Redundancy-Aware All-Zero Block Detection for Fast 360-Degree Video Coding
Chang Yu;Xiaopeng Fan;Pengjin Chen;Yuxin Ni;Hengyu Man;Debin Zhao
The sphere-to-plane projection of 360-degree video introduces substantial stretched redundant data, which is discarded when reprojected to the 3D sphere for display. Consequently, encoding and transmitting such redundant data is unnecessary. Highly redundant blocks can be referred to as all-zero blocks (AZBs). Detecting these AZBs in advance can reduce computational and transmission resource consumption. However, this cannot be achieved by existing AZB detection techniques, as they are unaware of the stretching redundancy. In this paper, we first derive a latitude-adaptive redundancy detection (LARD) approach to adaptively detect coefficients carrying redundancy in transformed blocks by modeling the dependency between the valid frequency range and the stretching degree based on spectrum analysis. Utilizing LARD, a latitude-redundancy-aware AZB detection scheme tailored for fast 360-degree video coding (LRAS) is proposed to accelerate the encoding process. LRAS consists of three sequential stages: latitude-adaptive AZB (L-AZB) detection, latitude-adaptive genuine-AZB (LG-AZB) detection, and latitude-adaptive pseudo-AZB (LP-AZB) detection. Specifically, L-AZB refers to the AZB introduced by projection; LARD is used to detect L-AZB directly. LG-AZB refers to the AZB after hard-decision quantization and zeroing of redundant coefficients; a novel latitude-adaptive sum-of-absolute-difference estimation model is built to derive the threshold for LG-AZB detection. LP-AZB refers to the AZB in terms of rate-distortion optimization considering redundancy; a latitude-adaptive rate-distortion model is established for LP-AZB detection. Experimental results show that LRAS achieves average total encoding time reductions of 25.85% and 20.38% under low-delay and random-access configurations, respectively, compared to the original HEVC encoder, with only 0.16% and 0.13% BDBR increases and 0.01 dB BDPSNR loss. The transform and quantization time savings are 60.13% and 59.94% on average.
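For intuition, a simplified latitude-aware all-zero-block test might look as follows, assuming the valid horizontal frequency range shrinks roughly with cos(latitude) under ERP; the paper's derived thresholds and rate-distortion models are not reproduced here.

```python
# A minimal sketch (the cutoff rule is my simplification): under ERP the
# horizontal sampling rate scales roughly with cos(latitude), so coefficients
# beyond a latitude-dependent cutoff carry stretched redundancy and can be
# ignored when deciding whether a block will quantize to all zeros.
import numpy as np
from scipy.fft import dctn

def is_latitude_azb(block, latitude_rad, qstep):
    """block: (N, N) residual block; latitude_rad: row latitude in radians."""
    n = block.shape[0]
    coef = dctn(block, norm="ortho")
    cutoff = max(1, int(round(n * np.cos(latitude_rad))))  # valid horiz. range
    valid = coef[:, :cutoff]                               # keep low horiz. freqs
    return bool(np.all(np.abs(valid) < qstep))             # all quantize to zero?

block = np.random.default_rng(0).normal(0, 0.5, (8, 8))
print(is_latitude_azb(block, latitude_rad=np.deg2rad(80), qstep=4.0))
```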
{"title":"Latitude-Redundancy-Aware All-Zero Block Detection for Fast 360-Degree Video Coding","authors":"Chang Yu;Xiaopeng Fan;Pengjin Chen;Yuxin Ni;Hengyu Man;Debin Zhao","doi":"10.1109/TIP.2024.3482172","DOIUrl":"10.1109/TIP.2024.3482172","url":null,"abstract":"The sphere-to-plane projection of 360-degree video introduces substantial stretched redundant data, which is discarded when reprojected to the 3D sphere for display. Consequently, encoding and transmitting such redundant data is unnecessary. Highly redundant blocks can be referred to as all-zero blocks (AZBs). Detecting these AZBs in advance can reduce computational and transmission resource consumption. However, this cannot be achieved by existing AZB detection techniques due to the unawareness of the stretching redundancy. In this paper, we first derive a latitude-adaptive redundancy detection (LARD) approach to adaptively detect coefficients carrying redundancy in transformed blocks by modeling the dependency between valid frequency range and the stretching degree based on spectrum analysis. Utilizing LARD, a latitude-redundancy-aware AZB detection scheme tailored for fast 360-degree video coding (LRAS) is proposed to accelerate the encoding process. LRAS consists of three sequential stages: latitude-adaptive AZB (L-AZB) detection, latitude-adaptive genuine-AZB (LG-AZB) detection and latitude-adaptive pseudo-AZB (LP-AZB) detection. Specifically, L-AZB refers to the AZB introduced by projection. LARD is used to detect L-AZB directly. LG-AZB refers to the AZB after hard-decision quantization and zeroing redundant coefficients. A novel latitude-adaptive sum of absolute difference estimation model is built to derive the threshold for LG-AZB detection. LP-AZB refers to the AZB in terms of rate-distortion optimization considering redundancy. A latitude-adaptive rate-distortion model is established for LP-AZB detection. Experimental results show that LRAS can achieve an average total encoding time reduction of 25.85% and 20.38% under low-delay and random access configurations compared to the original HEVC encoder, with only 0.16% and 0.13% BDBR increases and 0.01dB BDPSNR loss, respectively. The transform and quantization time savings are 60.13% and 59.94% on average.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"6129-6142"},"PeriodicalIF":0.0,"publicationDate":"2024-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142487458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Toward Blind Flare Removal Using Knowledge-Driven Flare-Level Estimator
Haoyou Deng;Lida Li;Feng Zhang;Zhiqiang Li;Bin Xu;Qingbo Lu;Changxin Gao;Nong Sang
Lens flare is a common phenomenon when strong light rays arrive at the camera sensor, and a clean scene is consequently mixed up with various opaque and semi-transparent artifacts. Existing deep learning methods are constrained by the limited real image pairs available for training. Though recent synthesis-based approaches have been found effective, synthesized pairs still deviate from real ones, as the mixing mechanism of flare artifacts and scenes in the wild depends on a host of undetermined factors, such as lens structure, scratches, etc. In this paper, we present a new, knowledge-driven perspective grounded in the blind nature of the flare removal task. Specifically, we present a simple yet effective flare-level estimator to predict the corruption level of a flare-corrupted image. The estimated flare level can be interpreted as additive information about the gap between corrupted images and their flare-free counterparts, helping the network adapt at both training and testing stages. Besides, we utilize a flare-level modulator to better integrate the estimations into networks. We also devise a flare-aware block for more accurate flare recognition and reconstruction. Additionally, we collect a new real-world flare dataset for benchmarking, namely WiderFlare. Extensive experiments on three benchmark datasets demonstrate that our method outperforms state-of-the-art methods quantitatively and qualitatively.
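A minimal sketch of the estimator-plus-modulator pattern follows; the module names, shapes, and FiLM-style modulation are illustrative assumptions rather than the paper's architecture.

```python
# A flare-level estimator whose scalar output modulates restoration features,
# FiLM-style, so the restorer can adapt to the estimated corruption level.
import torch
import torch.nn as nn

class FlareLevelEstimator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 1))

    def forward(self, img):                     # img: (B, 3, H, W)
        return torch.sigmoid(self.net(img))     # flare level in (0, 1)

class FlareLevelModulator(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.to_scale_shift = nn.Linear(1, 2 * channels)

    def forward(self, feat, level):             # feat: (B, C, H, W), level: (B, 1)
        scale, shift = self.to_scale_shift(level).chunk(2, dim=-1)
        return feat * (1 + scale[..., None, None]) + shift[..., None, None]

img, feat = torch.randn(2, 3, 64, 64), torch.randn(2, 32, 16, 16)
level = FlareLevelEstimator()(img)
modulated = FlareLevelModulator(32)(feat, level)
```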
{"title":"Toward Blind Flare Removal Using Knowledge-Driven Flare-Level Estimator","authors":"Haoyou Deng;Lida Li;Feng Zhang;Zhiqiang Li;Bin Xu;Qingbo Lu;Changxin Gao;Nong Sang","doi":"10.1109/TIP.2024.3480696","DOIUrl":"10.1109/TIP.2024.3480696","url":null,"abstract":"Lens flare is a common phenomenon when strong light rays arrive at the camera sensor and a clean scene is consequently mixed up with various opaque and semi-transparent artifacts. Existing deep learning methods are always constrained with limited real image pairs for training. Though recent synthesis-based approaches are found effective, synthesized pairs still deviate from the real ones as the mixing mechanism of flare artifacts and scenes in the wild always depends on a line of undetermined factors, such as lens structure, scratches, etc. In this paper, we present a new perspective from the blind nature of the flare removal task in a knowledge-driven manner. Specifically, we present a simple yet effective flare-level estimator to predict the corruption level of a flare-corrupted image. The estimated flare-level can be interpreted as additive information of the gap between corrupted images and their flare-free correspondences to facilitate a network at both training and testing stages adaptively. Besides, we utilize a flare-level modulator to better integrate the estimations into networks. We also devise a flare-aware block for more accurate flare recognition and reconstruction. Additionally, we collect a new real-world flare dataset for benchmarking, namely WiderFlare. Extensive experiments on three benchmark datasets demonstrate that our method outperforms state-of-the-art methods quantitatively and qualitatively.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"6114-6128"},"PeriodicalIF":0.0,"publicationDate":"2024-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142486818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Nonparametric Clustering-Guided Cross-View Contrastive Learning for Partially View-Aligned Representation Learning
Shengsheng Qian;Dizhan Xue;Jun Hu;Huaiwen Zhang;Changsheng Xu
With the increasing availability of multi-view data, multi-view representation learning has emerged as a prominent research area. However, collecting strictly view-aligned data is usually expensive, and learning from both aligned and unaligned data is more practical. Therefore, Partially View-aligned Representation Learning (PVRL) has recently attracted increasing attention. After aligning multi-view representations based on their semantic similarity, the aligned representations can be utilized to facilitate downstream tasks, such as clustering. However, existing methods may be constrained by the following limitations: 1) They learn semantic relations across views using the known correspondences, which are incomplete, and the existence of false negative pairs (FNP) can significantly impact the learning effectiveness; 2) Existing strategies for alleviating the impact of FNP are too intuitive and lack a theoretical explanation of their applicable conditions; 3) They attempt to find FNP based on distance in the common space and fail to explore semantic relations between multi-view data. In this paper, we propose a Nonparametric Clustering-guided Cross-view Contrastive Learning (NC3L) method for PVRL to address the above issues. Firstly, we propose to estimate the similarity matrix between multi-view data in the marginal cross-view contrastive loss to approximate the similarity matrix of supervised contrastive learning (CL). Secondly, we establish the theoretical foundation for our proposed method by analyzing the error bounds of the loss function and its derivatives between our method and supervised CL. Thirdly, we propose Deep Variational Nonparametric Clustering (DeepVNC), designing a deep reparameterized variational inference for Dirichlet process Gaussian mixture models to construct cluster-level similarity between multi-view data and discover FNP. Additionally, we propose a reparameterization trick to improve the robustness and performance of our proposed CL method. Extensive experiments on four widely used benchmark datasets show the superiority of our proposed method compared with state-of-the-art methods.
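As a rough sketch of the first idea, a cross-view contrastive loss can replace the one-hot positive target with an estimated similarity matrix, so unaligned but semantically similar pairs act as soft positives; the formulation below is illustrative, not the paper's exact loss.

```python
# A soft cross-view InfoNCE-style objective weighted by an estimated
# similarity matrix S (my formulation, used here for illustration).
import torch
import torch.nn.functional as F

def soft_cross_view_loss(z1, z2, S, tau=0.1):
    """z1, z2: (N, d) view embeddings; S: (N, N) row-normalized similarity."""
    logits = F.normalize(z1, dim=1) @ F.normalize(z2, dim=1).T / tau
    return -(S * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

z1, z2 = torch.randn(32, 128), torch.randn(32, 128)
# With fully known correspondences, S would be the identity; here, a soft
# estimate stands in for the clustering-guided similarity:
S = torch.softmax(F.normalize(z1, dim=1) @ F.normalize(z2, dim=1).T / 0.5, dim=1)
loss = soft_cross_view_loss(z1, z2, S.detach())
```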
{"title":"Nonparametric Clustering-Guided Cross-View Contrastive Learning for Partially View-Aligned Representation Learning","authors":"Shengsheng Qian;Dizhan Xue;Jun Hu;Huaiwen Zhang;Changsheng Xu","doi":"10.1109/TIP.2024.3480701","DOIUrl":"10.1109/TIP.2024.3480701","url":null,"abstract":"With the increasing availability of multi-view data, multi-view representation learning has emerged as a prominent research area. However, collecting strictly view-aligned data is usually expensive, and learning from both aligned and unaligned data can be more practicable. Therefore, Partially View-aligned Representation Learning (PVRL) has recently attracted increasing attention. After aligning multi-view representations based on their semantic similarity, the aligned representations can be utilized to facilitate downstream tasks, such as clustering. However, existing methods may be constrained by the following limitations: 1) They learn semantic relations across views using the known correspondences, which is incomplete and the existence of false negative pairs (FNP) can significantly impact the learning effectiveness; 2) Existing strategies for alleviating the impact of FNP are too intuitive and lack a theoretical explanation of their applicable conditions; 3) They attempt to find FNP based on distance in the common space and fail to explore semantic relations between multi-view data. In this paper, we propose a Nonparametric Clustering-guided Cross-view Contrastive Learning (NC3L) for PVRL, in order to address the above issues. Firstly, we propose to estimate the similarity matrix between multi-view data in the marginal cross-view contrastive loss to approximate the similarity matrix of supervised contrastive learning (CL). Secondly, we establish the theoretical foundation for our proposed method by analyzing the error bounds of the loss function and its derivatives between our method and supervised CL. Thirdly, we propose a Deep Variational Nonparametric Clustering (DeepVNC) by designing a deep reparameterized variational inference for Dirichlet process Gaussian mixture models to construct cluster-level similarity between multi-view data and discover FNP. Additionally, we propose a reparameterization trick to improve the robustness and the performance of our proposed CL method. Extensive experiments on four widely used benchmark datasets show the superiority of our proposed method compared with state-of-the-art methods.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"6158-6172"},"PeriodicalIF":0.0,"publicationDate":"2024-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142486817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Transforming Image Super-Resolution: A ConvFormer-Based Efficient Approach
Gang Wu;Junjun Jiang;Junpeng Jiang;Xianming Liu
Recent progress in single-image super-resolution (SISR) has achieved remarkable performance, yet the computational costs of these methods remain a challenge for deployment on resource-constrained devices. In particular, transformer-based methods, which leverage self-attention mechanisms, have led to significant breakthroughs but also introduce substantial computational costs. To tackle this issue, we introduce the Convolutional Transformer layer (ConvFormer) and propose a ConvFormer-based Super-Resolution network (CFSR), offering an effective and efficient solution for lightweight image super-resolution. The proposed method inherits the advantages of both convolution-based and transformer-based approaches. Specifically, CFSR utilizes large kernel convolutions as a feature mixer to replace the self-attention module, efficiently modeling long-range dependencies and extensive receptive fields with minimal computational overhead. Furthermore, we propose an edge-preserving feed-forward network (EFN) designed to achieve local feature aggregation while effectively preserving high-frequency information. Extensive experiments demonstrate that CFSR strikes an optimal balance between computational cost and performance compared to existing lightweight SR methods. When benchmarked against state-of-the-art methods such as ShuffleMixer, the proposed CFSR achieves a gain of 0.39 dB on the Urban100 dataset for the x2 super-resolution task while requiring 26% and 31% fewer parameters and FLOPs, respectively. The code and pre-trained models are available at https://github.com/Aitical/CFSR.
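The mixer idea can be sketched in a few lines: a depthwise large-kernel convolution performs spatial mixing in place of self-attention, followed by a pointwise convolution for channel mixing. Layer sizes below are assumptions, not the published CFSR configuration.

```python
# A minimal large-kernel convolutional mixer block: wide receptive field at
# linear cost in the number of pixels, replacing the self-attention module.
import torch
import torch.nn as nn

class LargeKernelMixer(nn.Module):
    def __init__(self, dim, kernel_size=13):
        super().__init__()
        self.mixer = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size, padding=kernel_size // 2,
                      groups=dim),              # depthwise: spatial mixing
            nn.Conv2d(dim, dim, 1))             # pointwise: channel mixing
        self.norm = nn.GroupNorm(1, dim)

    def forward(self, x):                       # x: (B, C, H, W)
        return x + self.mixer(self.norm(x))     # residual mixing block

x = torch.randn(1, 48, 64, 64)
y = LargeKernelMixer(48)(x)
```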
{"title":"Transforming Image Super-Resolution: A ConvFormer-Based Efficient Approach","authors":"Gang Wu;Junjun Jiang;Junpeng Jiang;Xianming Liu","doi":"10.1109/TIP.2024.3477350","DOIUrl":"10.1109/TIP.2024.3477350","url":null,"abstract":"Recent progress in single-image super-resolution (SISR) has achieved remarkable performance, yet the computational costs of these methods remain a challenge for deployment on resource-constrained devices. In particular, transformer-based methods, which leverage self-attention mechanisms, have led to significant breakthroughs but also introduce substantial computational costs. To tackle this issue, we introduce the Convolutional Transformer layer (ConvFormer) and propose a ConvFormer-based Super-Resolution network (CFSR), offering an effective and efficient solution for lightweight image super-resolution. The proposed method inherits the advantages of both convolution-based and transformer-based approaches. Specifically, CFSR utilizes large kernel convolutions as a feature mixer to replace the self-attention module, efficiently modeling long-range dependencies and extensive receptive fields with minimal computational overhead. Furthermore, we propose an edge-preserving feed-forward network (EFN) designed to achieve local feature aggregation while effectively preserving high-frequency information. Extensive experiments demonstrate that CFSR strikes an optimal balance between computational cost and performance compared to existing lightweight SR methods. When benchmarked against state-of-the-art methods such as ShuffleMixer, the proposed CFSR achieves a gain of 0.39 dB on the Urban100 dataset for the x2 super-resolution task while requiring 26% and 31% fewer parameters and FLOPs, respectively. The code and pre-trained models are available at \u0000<uri>https://github.com/Aitical/CFSR</uri>\u0000.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"6071-6082"},"PeriodicalIF":0.0,"publicationDate":"2024-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142449611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
NTK-Guided Few-Shot Class Incremental Learning
Jingren Liu;Zhong Ji;Yanwei Pang;Yunlong Yu
The proliferation of Few-Shot Class Incremental Learning (FSCIL) methodologies has highlighted the critical challenge of maintaining robust anti-amnesia capabilities in FSCIL learners. In this paper, we present a novel conceptualization of anti-amnesia in terms of mathematical generalization, leveraging the Neural Tangent Kernel (NTK) perspective. Our method focuses on two key aspects: ensuring optimal NTK convergence and minimizing NTK-related generalization loss, which serve as the theoretical foundation for cross-task generalization. To achieve global NTK convergence, we introduce a principled meta-learning mechanism that guides optimization within an expanded network architecture. Concurrently, to reduce the NTK-related generalization loss, we systematically optimize its constituent factors. Specifically, we initiate self-supervised pre-training on the base session to enhance NTK-related generalization potential. These self-supervised weights are then carefully refined through curricular alignment, followed by the application of dual NTK regularization tailored specifically for both convolutional and linear layers. Through the combined effects of these measures, our network acquires robust NTK properties, ensuring optimal convergence and stability of the NTK matrix and minimizing the NTK-related generalization loss, significantly enhancing its theoretical generalization. On popular FSCIL benchmark datasets, our NTK-FSCIL surpasses contemporary state-of-the-art approaches, elevating end-session accuracy by 2.9% to 9.3%.
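For reference, the empirical NTK that such analyses build on is the Gram matrix of per-sample parameter gradients; a toy computation on a small scalar-output network (not the paper's model) looks like this.

```python
# Empirical NTK: Theta(x_i, x_j) = <grad_w f(x_i), grad_w f(x_j)>, computed
# by explicit per-sample gradients of a toy scalar-output network.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))
xs = torch.randn(8, 16)

def flat_grad(x):
    out = net(x.unsqueeze(0)).squeeze()                    # scalar output
    grads = torch.autograd.grad(out, list(net.parameters()))
    return torch.cat([g.reshape(-1) for g in grads])

G = torch.stack([flat_grad(x) for x in xs])                # (8, num_params)
ntk = G @ G.T                                              # (8, 8) NTK matrix
# Well-conditioned, stable NTK eigenvalues are the kind of property the
# paper's training measures aim to promote for cross-task generalization.
```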
{"title":"NTK-Guided Few-Shot Class Incremental Learning","authors":"Jingren Liu;Zhong Ji;Yanwei Pang;Yunlong Yu","doi":"10.1109/TIP.2024.3478854","DOIUrl":"10.1109/TIP.2024.3478854","url":null,"abstract":"The proliferation of Few-Shot Class Incremental Learning (FSCIL) methodologies has highlighted the critical challenge of maintaining robust anti-amnesia capabilities in FSCIL learners. In this paper, we present a novel conceptualization of anti-amnesia in terms of mathematical generalization, leveraging the Neural Tangent Kernel (NTK) perspective. Our method focuses on two key aspects: ensuring optimal NTK convergence and minimizing NTK-related generalization loss, which serve as the theoretical foundation for cross-task generalization. To achieve global NTK convergence, we introduce a principled meta-learning mechanism that guides optimization within an expanded network architecture. Concurrently, to reduce the NTK-related generalization loss, we systematically optimize its constituent factors. Specifically, we initiate self-supervised pre-training on the base session to enhance NTK-related generalization potential. These self-supervised weights are then carefully refined through curricular alignment, followed by the application of dual NTK regularization tailored specifically for both convolutional and linear layers. Through the combined effects of these measures, our network acquires robust NTK properties, ensuring optimal convergence and stability of the NTK matrix and minimizing the NTK-related generalization loss, significantly enhancing its theoretical generalization. On popular FSCIL benchmark datasets, our NTK-FSCIL surpasses contemporary state-of-the-art approaches, elevating end-session accuracy by 2.9% to 9.3%.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"6029-6044"},"PeriodicalIF":0.0,"publicationDate":"2024-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142448771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Learning Content-Weighted Pseudocylindrical Representation for 360° Image Compression
Mu Li;Youneng Bao;Xiaohang Sui;Jinxing Li;Guangming Lu;Yong Xu
Learned 360° image compression methods using equirectangular projection (ERP) often confront a non-uniform sampling issue, inherent to sphere-to-rectangle projection. While uniformly or nearly uniformly sampling representations, along with their corresponding convolution operations, have been proposed to mitigate this issue, these methods often concentrate solely on uniform sampling rates, thus neglecting the content of the image. In this paper, we argue that different contents within 360° images have varying significance and advocate for the adoption of a content-adaptive parametric representation in 360° image compression, which takes into account both the content and sampling rate. We first introduce the parametric pseudocylindrical representation and corresponding convolution operation, upon which we build a learned 360° image codec. Then, we model the hyperparameter of the representation as the output of a network, derived from the image’s content and its spherical coordinates. We treat the optimization of hyperparameters for different 360° images as distinct compression tasks and propose a meta-learning algorithm to jointly optimize the codec and the metaknowledge, i.e., the hyperparameter estimation network. A significant challenge is the lack of a direct derivative from the compression loss to the hyperparameter network. To address this, we present a novel method to relax the rate-distortion loss as a function of the hyperparameters, enabling gradient-based optimization of the metaknowledge. Experimental results on omnidirectional images demonstrate that our method achieves state-of-the-art performance and superior visual quality.
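A minimal sketch of the pseudocylindrical idea: each ERP row is resampled to a latitude-dependent width so oversampled polar rows are stored more compactly. The fixed cos(latitude) width rule below is a stand-in; the paper instead learns content-dependent hyperparameters per image.

```python
# Resample each ERP row to a latitude-dependent width (sinusoidal-like
# pseudocylindrical layout); the parametric form here is my assumption.
import numpy as np

def to_pseudocylindrical(erp):
    """erp: (H, W) single-channel ERP image; returns a list of resampled rows."""
    H, W = erp.shape
    lats = (np.arange(H) + 0.5) / H * np.pi - np.pi / 2    # row latitudes
    rows = []
    for r, lat in enumerate(lats):
        w = max(1, int(round(W * np.cos(lat))))            # narrower near poles
        xs_new = np.linspace(0, W - 1, w)
        rows.append(np.interp(xs_new, np.arange(W), erp[r]))
    return rows

erp = np.random.default_rng(0).random((64, 128))
pseudo = to_pseudocylindrical(erp)    # polar rows are much narrower than W
```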
{"title":"Learning Content-Weighted Pseudocylindrical Representation for 360° Image Compression","authors":"Mu Li;Youneng Bao;Xiaohang Sui;Jinxing Li;Guangming Lu;Yong Xu","doi":"10.1109/TIP.2024.3477356","DOIUrl":"10.1109/TIP.2024.3477356","url":null,"abstract":"Learned 360° image compression methods using equirectangular projection (ERP) often confront a non-uniform sampling issue, inherent to sphere-to-rectangle projection. While uniformly or nearly uniformly sampling representations, along with their corresponding convolution operations, have been proposed to mitigate this issue, these methods often concentrate solely on uniform sampling rates, thus neglecting the content of the image. In this paper, we urge that different contents within 360° images have varying significance and advocate for the adoption of a content-adaptive parametric representation in 360° image compression, which takes into account both the content and sampling rate. We first introduce the parametric pseudocylindrical representation and corresponding convolution operation, upon which we build a learned 360° image codec. Then, we model the hyperparameter of the representation as the output of a network, derived from the image’s content and its spherical coordinates. We treat the optimization of hyperparameters for different 360° images as distinct compression tasks and propose a meta-learning algorithm to jointly optimize the codec and the metaknowledge, i.e., the hyperparameter estimation network. A significant challenge is the lack of a direct derivative from the compression loss to the hyperparameter network. To address this, we present a novel method to relax the rate-distortion loss as a function of the hyperparameters, enabling gradient-based optimization of the metaknowledge. Experimental results on omnidirectional images demonstrate that our method achieves state-of-the-art performance and superior visual quality.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"5975-5988"},"PeriodicalIF":0.0,"publicationDate":"2024-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142448772","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0