
Latest articles in IEEE transactions on image processing (a publication of the IEEE Signal Processing Society)

Blind Video Quality Prediction by Uncovering Human Video Perceptual Representation.
Liang Liao, Kangmin Xu, Haoning Wu, Chaofeng Chen, Wenxiu Sun, Qiong Yan, C-C Jay Kuo, Weisi Lin

Blind video quality assessment (VQA) has become an increasingly demanding problem in automatically assessing the quality of ever-growing in-the-wild videos. Although efforts have been made to measure temporal distortions, which are the core factor distinguishing VQA from image quality assessment (IQA), the lack of modeling of how the human visual system (HVS) relates to the temporal quality of videos hinders the precise mapping of predicted temporal scores to human perception. Inspired by the recent discovery of the temporal straightness law of natural videos in the HVS, this paper models the complex temporal distortions of in-the-wild videos in a simple and uniform representation by describing the geometric properties of videos in the visual perceptual domain. A novel videolet, a perceptual representation embedding of a few consecutive frames, is designed as the basic quality measurement unit to quantify temporal distortions by measuring the angular and linear displacements from the straightness law. By combining the predicted scores of all videolets, a perceptual temporal quality evaluator (PTQE) is formed to measure the temporal quality of the entire video. Experimental results demonstrate that the perceptual representation in the HVS is an efficient way of predicting subjective temporal quality. Moreover, when combined with spatial quality metrics, PTQE achieves top performance on popular in-the-wild video datasets. More importantly, PTQE requires no additional information beyond the video being assessed, making it applicable to any dataset without parameter tuning. Additionally, the generalizability of PTQE is evaluated on video frame interpolation tasks, demonstrating its potential to benefit temporal-related enhancement tasks.
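
As a rough illustration of the straightness-based measurement, the sketch below treats a videolet as a short trajectory of per-frame perceptual embeddings and computes its angular and linear displacements. The embedding source, the videolet length, and the toy score are assumptions for illustration, not the paper's actual PTQE.

```python
import numpy as np

def videolet_displacements(embeddings):
    """Angular and linear displacements of a videolet's perceptual trajectory.

    embeddings: (T, D) array of per-frame perceptual features (assumed to come
    from some perceptual front end; not the paper's exact representation).
    Returns the mean turning angle (radians) and mean step length; a perfectly
    straight trajectory has zero mean angle.
    """
    steps = np.diff(embeddings, axis=0)                    # frame-to-frame displacement vectors
    lengths = np.linalg.norm(steps, axis=1) + 1e-12
    unit = steps / lengths[:, None]
    cosines = np.clip(np.sum(unit[:-1] * unit[1:], axis=1), -1.0, 1.0)
    angles = np.arccos(cosines)                            # deviation from the straightness law
    return angles.mean(), lengths.mean()

# toy usage: a straight trajectory vs. a temporally jittered one
t = np.linspace(0.0, 1.0, 8)[:, None]
straight = np.hstack([t, 2.0 * t])
jittered = straight + 0.05 * np.random.RandomState(0).randn(*straight.shape)
print(videolet_displacements(straight))   # mean angle ~ 0
print(videolet_displacements(jittered))   # larger mean angle => more temporal distortion
```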

{"title":"Blind Video Quality Prediction by Uncovering Human Video Perceptual Representation.","authors":"Liang Liao, Kangmin Xu, Haoning Wu, Chaofeng Chen, Wenxiu Sun, Qiong Yan, C-C Jay Kuo, Weisi Lin","doi":"10.1109/TIP.2024.3445738","DOIUrl":"https://doi.org/10.1109/TIP.2024.3445738","url":null,"abstract":"<p><p>Blind video quality assessment (VQA) has become an increasingly demanding problem in automatically assessing the quality of ever-growing in-the-wild videos. Although efforts have been made to measure temporal distortions, the core to distinguish between VQA and image quality assessment (IQA), the lack of modeling of how the human visual system (HVS) relates to the temporal quality of videos hinders the precise mapping of predicted temporal scores to the human perception. Inspired by the recent discovery of the temporal straightness law of natural videos in the HVS, this paper intends to model the complex temporal distortions of in-the-wild videos in a simple and uniform representation by describing the geometric properties of videos in the visual perceptual domain. A novel videolet, with perceptual representation embedding of a few consecutive frames, is designed as the basic quality measurement unit to quantify temporal distortions by measuring the angular and linear displacements from the straightness law. By combining the predicted score on each videolet, a perceptually temporal quality evaluator (PTQE) is formed to measure the temporal quality of the entire video. Experimental results demonstrate that the perceptual representation in the HVS is an efficient way of predicting subjective temporal quality. Moreover, when combined with spatial quality metrics, PTQE achieves top performance over popular in-the-wild video datasets. More importantly, PTQE requires no additional information beyond the video being assessed, making it applicable to any dataset without parameter tuning. Additionally, the generalizability of PTQE is evaluated on video frame interpolation tasks, demonstrating its potential to benefit temporal-related enhancement tasks.</p>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142142132","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
UVaT: Uncertainty Incorporated View-aware Transformer for Robust Multi-view Classification.
Yapeng Li, Yong Luo, Bo Du

Existing multi-view classification algorithms usually assume that all examples have observations on all views and that the data in different views are clean. However, in real-world applications, we are often provided with data that have missing representations or contain noise on some views (i.e., missing or noisy views). This may lead to significant performance degradation, and thus many algorithms have been proposed to address the incomplete-view or noisy-view issues. However, most existing algorithms deal with the two issues separately, and hence may fail when both missing and noisy views exist. They are also usually not flexible in that the view or feature significance cannot be adaptively identified. Besides, the view-missing patterns may vary between the training and test phases, and such a difference is often ignored. To remedy these drawbacks, we propose a novel multi-view classification framework that is simultaneously robust to both incomplete and noisy views. This is achieved by integrating early fusion and late fusion in a single framework. Specifically, in our early fusion module, we propose a view-aware transformer to mask the missing views and adaptively explore the relationships between views and target tasks to deal with missing views. Considering that view-missing patterns may change from the training to the test phase, we also design single-view classification and category-consistency constraints to reduce the dependence of our model on view-missing patterns. In our late fusion module, we quantify the view uncertainty in an ensemble way to estimate the noise level of that view. Then the uncertainty and prediction logits of different views are integrated to make our model robust to noisy views. The framework is trained in an end-to-end manner. Experimental results on diverse datasets demonstrate the robustness and effectiveness of our model for both incomplete and noisy views. Codes are available at https://github.com/li-yapeng/UVaT.
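
To make the late-fusion idea concrete, here is a minimal sketch of uncertainty-weighted fusion of per-view logits. How UVaT actually estimates uncertainty (in an ensemble way) and its exact weighting rule are not reproduced here; every name below is a placeholder.

```python
import numpy as np

def fuse_view_logits(view_logits, view_uncertainty):
    """Weight each view's class logits by how trustworthy (low-noise) it looks.

    view_logits:      (V, C) per-view logits.
    view_uncertainty: (V,) estimated noise level in [0, 1]; noisier views get less weight.
    """
    weights = 1.0 - np.asarray(view_uncertainty, dtype=float)
    weights = weights / (weights.sum() + 1e-12)
    return (weights[:, None] * np.asarray(view_logits)).sum(axis=0)   # (C,) fused logits

logits = np.array([[2.0, 0.1, -1.0],   # a clean view, confident about class 0
                   [0.3, 0.2, 0.1]])   # a noisy view, nearly uninformative
print(fuse_view_logits(logits, view_uncertainty=[0.1, 0.8]))
```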

{"title":"UVaT: Uncertainty Incorporated View-aware Transformer for Robust Multi-view Classification.","authors":"Yapeng Li, Yong Luo, Bo Du","doi":"10.1109/TIP.2024.3451931","DOIUrl":"https://doi.org/10.1109/TIP.2024.3451931","url":null,"abstract":"<p><p>Existing multi-view classification algorithms usually assume that all examples have observations on all views, and the data in different views are clean. However, in real-world applications, we are often provided with data that have missing representations or contain noise on some views (i.e., missing or noise views). This may lead to significant performance degeneration, and thus many algorithms are proposed to address the incomplete view or noisy view issues. However, most of existing algorithms deal with the two issues separately, and hence may fail when both missing and noisy views exist. They are also usually not flexible in that the view or feature significance cannot be adaptively identified. Besides, the view missing patterns may vary in the training and test phases, and such difference is often ignored. To remedy these drawbacks, we propose a novel multi-view classification framework that is simultaneously robust to both incomplete and noisy views. This is achieved by integrating early fusion and late fusion in a single framework. Specifically, in our early fusion module, we propose a view-aware transformer to mask the missing views and adaptively explore the relationships between views and target tasks to deal with missing views. Considering that view missing patterns may change from the training to the test phase, we also design single-view classification and category-consistency constraints to reduce the dependence of our model on view-missing patterns. In our late fusion module, we quantify the view uncertainty in an ensemble way to estimate the noise level of that view. Then the uncertainty and prediction logits of different views are integrated to make our model robust to noisy views. The framework is trained in an end-to-end manner. Experimental results on diverse datasets demonstrate the robustness and effectiveness of our model for both incomplete and noisy views. Codes are available at https://github.com/li-yapeng/UVaT.</p>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142142174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
M2GCNet: Multi-Modal Graph Convolution Network for Precise Brain Tumor Segmentation Across Multiple MRI Sequences
Tongxue Zhou
Accurate segmentation of brain tumors across multiple MRI sequences is essential for diagnosis, treatment planning, and clinical decision-making. In this paper, I propose a cutting-edge framework, named multi-modal graph convolution network (M2GCNet), to explore the relationships across different MR modalities, and address the challenge of brain tumor segmentation. The core of M2GCNet is the multi-modal graph convolution module (M2GCM), a pivotal component that represents MR modalities as graphs, with nodes corresponding to image pixels and edges capturing latent relationships between pixels. This graph-based representation enables the effective utilization of both local and global contextual information. Notably, M2GCM comprises two important modules: the spatial-wise graph convolution module (SGCM), adept at capturing extensive spatial dependencies among distinct regions within an image, and the channel-wise graph convolution module (CGCM), dedicated to modelling intricate contextual dependencies among different channels within the image. Additionally, acknowledging the intrinsic correlation present among different MR modalities, a multi-modal correlation loss function is introduced. This novel loss function aims to capture specific nonlinear relationships between correlated modality pairs, enhancing the model’s ability to achieve accurate segmentation results. The experimental evaluation on two brain tumor datasets demonstrates the superiority of the proposed M2GCNet over other state-of-the-art segmentation methods. Furthermore, the proposed method paves the way for improved tumor diagnosis, multi-modal information fusion, and a deeper understanding of brain tumor pathology.
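
The following sketch conveys the general flavour of a spatial-wise graph convolution over a feature map, with pixels as nodes and a similarity-based affinity matrix as edges. The dense affinity and the single linear projection are simplifying assumptions and not the actual SGCM/CGCM design.

```python
import torch
import torch.nn.functional as F

def spatial_graph_conv(feat, weight):
    """Toy spatial-wise graph convolution: every pixel is a node, edges come
    from feature similarity, and neighbours are aggregated before a linear map.

    feat:   (B, C, H, W) feature map.
    weight: (C, C) learnable projection (assumed; the real module is richer).
    """
    b, c, h, w = feat.shape
    nodes = feat.flatten(2).transpose(1, 2)                                      # (B, N, C), N = H*W
    affinity = torch.softmax(nodes @ nodes.transpose(1, 2) / c ** 0.5, dim=-1)   # (B, N, N) edges
    out = F.relu(affinity @ nodes @ weight)                                      # aggregate, then project
    return out.transpose(1, 2).reshape(b, c, h, w)

feat = torch.randn(1, 16, 8, 8)
proj = torch.randn(16, 16) * 0.1
print(spatial_graph_conv(feat, proj).shape)   # torch.Size([1, 16, 8, 8])
```
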
{"title":"M2GCNet: Multi-Modal Graph Convolution Network for Precise Brain Tumor Segmentation Across Multiple MRI Sequences","authors":"Tongxue Zhou","doi":"10.1109/TIP.2024.3451936","DOIUrl":"10.1109/TIP.2024.3451936","url":null,"abstract":"Accurate segmentation of brain tumors across multiple MRI sequences is essential for diagnosis, treatment planning, and clinical decision-making. In this paper, I propose a cutting-edge framework, named multi-modal graph convolution network (M2GCNet), to explore the relationships across different MR modalities, and address the challenge of brain tumor segmentation. The core of M2GCNet is the multi-modal graph convolution module (M2GCM), a pivotal component that represents MR modalities as graphs, with nodes corresponding to image pixels and edges capturing latent relationships between pixels. This graph-based representation enables the effective utilization of both local and global contextual information. Notably, M2GCM comprises two important modules: the spatial-wise graph convolution module (SGCM), adept at capturing extensive spatial dependencies among distinct regions within an image, and the channel-wise graph convolution module (CGCM), dedicated to modelling intricate contextual dependencies among different channels within the image. Additionally, acknowledging the intrinsic correlation present among different MR modalities, a multi-modal correlation loss function is introduced. This novel loss function aims to capture specific nonlinear relationships between correlated modality pairs, enhancing the model’s ability to achieve accurate segmentation results. The experimental evaluation on two brain tumor datasets demonstrates the superiority of the proposed M2GCNet over other state-of-the-art segmentation methods. Furthermore, the proposed method paves the way for improved tumor diagnosis, multi-modal information fusion, and a deeper understanding of brain tumor pathology.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142142168","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Privacy-Preserving Autoencoder for Collaborative Object Detection
Bardia Azizian, Ivan V. Bajić
Privacy is a crucial concern in collaborative machine vision where a part of a Deep Neural Network (DNN) model runs on the edge, and the rest is executed on the cloud. In such applications, the machine vision model does not need the exact visual content to perform its task. Taking advantage of this potential, private information could be removed from the data insofar as it does not significantly impair the accuracy of the machine vision system. In this paper, we present an autoencoder-style network integrated within an object detection pipeline, which generates a latent representation of the input image that preserves task-relevant information while removing private information. Our approach employs an adversarial training strategy that not only removes private information from the bottleneck of the autoencoder but also promotes improved compression efficiency for feature channels coded by conventional codecs like VVC-Intra. We assess the proposed system using a realistic evaluation framework for privacy, directly measuring face and license plate recognition accuracy. Experimental results show that our proposed method is able to reduce the bitrate significantly at the same object detection accuracy compared to coding the input images directly, while keeping the face and license plate recognition accuracy on the images recovered from the bottleneck features low, implying strong privacy protection. Our code is available at https://github.com/bardia-az/ppa-code.
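
One common way to realise this kind of adversarial training is a gradient-reversal layer between the bottleneck and a privacy "attacker" head; the sketch below illustrates that mechanism only. The attacker, labels, and loss scaling are placeholders rather than the paper's actual setup.

```python
import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass, negated (scaled) gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def privacy_adversarial_loss(bottleneck, private_labels, attacker, lam=1.0):
    # Minimising this loss trains the attacker to read private attributes from the
    # bottleneck, while the reversed gradient pushes the encoder to erase them.
    reversed_feat = GradReverse.apply(bottleneck, lam)
    return F.cross_entropy(attacker(reversed_feat), private_labels)

attacker = torch.nn.Linear(64, 2)            # e.g. a hypothetical "private attribute present?" head
z = torch.randn(8, 64, requires_grad=True)   # stand-in for the autoencoder bottleneck
loss = privacy_adversarial_loss(z, torch.randint(0, 2, (8,)), attacker)
loss.backward()
print(loss.item(), z.grad.shape)
```
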
{"title":"Privacy-Preserving Autoencoder for Collaborative Object Detection","authors":"Bardia Azizian;Ivan V. Bajić","doi":"10.1109/TIP.2024.3451938","DOIUrl":"10.1109/TIP.2024.3451938","url":null,"abstract":"Privacy is a crucial concern in collaborative machine vision where a part of a Deep Neural Network (DNN) model runs on the edge, and the rest is executed on the cloud. In such applications, the machine vision model does not need the exact visual content to perform its task. Taking advantage of this potential, private information could be removed from the data insofar as it does not significantly impair the accuracy of the machine vision system. In this paper, we present an autoencoder-style network integrated within an object detection pipeline, which generates a latent representation of the input image that preserves task-relevant information while removing private information. Our approach employs an adversarial training strategy that not only removes private information from the bottleneck of the autoencoder but also promotes improved compression efficiency for feature channels coded by conventional codecs like VVC-Intra. We assess the proposed system using a realistic evaluation framework for privacy, directly measuring face and license plate recognition accuracy. Experimental results show that our proposed method is able to reduce the bitrate significantly at the same object detection accuracy compared to coding the input images directly, while keeping the face and license plate recognition accuracy on the images recovered from the bottleneck features low, implying strong privacy protection. Our code is available at \u0000<uri>https://github.com/bardia-az/ppa-code</uri>\u0000.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142142169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Style Consistency Unsupervised Domain Adaptation Medical Image Segmentation
Lang Chen, Yun Bian, Jianbin Zeng, Qingquan Meng, Weifang Zhu, Fei Shi, Chengwei Shao, Xinjian Chen, Dehui Xiang
Unsupervised domain adaptation medical image segmentation aims to segment unlabeled target-domain images with labeled source-domain images. However, different medical imaging modalities lead to a large domain shift between their images, so that models well trained on one imaging modality often fail to segment images from another imaging modality. In this paper, to mitigate the domain shift between the source domain and the target domain, a style consistency unsupervised domain adaptation image segmentation method is proposed. First, a local phase-enhanced style fusion method is designed to mitigate domain shift and produce locally enhanced organs of interest. Second, a phase consistency discriminator is constructed to distinguish the phase consistency of domain-invariant features between the source domain and the target domain, so as to enhance the disentanglement of the domain-invariant and style encoders and the removal of domain-specific features from the domain-invariant encoder. Third, a style consistency estimation method is proposed to obtain inconsistency maps from intermediate synthesized target-domain images with different styles to measure the difficult regions, mitigate the domain shift between synthesized target-domain images and real target-domain images, and improve the integrity of the organs of interest. Fourth, style consistency entropy is defined for target-domain images to further improve the integrity of the organ of interest by concentrating on the inconsistent regions. Comprehensive experiments have been performed on an in-house dataset and a publicly available dataset. The experimental results demonstrate the superiority of our framework over state-of-the-art methods.
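
As a loose analogue of style fusion that preserves phase (structure) while mixing amplitude (appearance), the sketch below blends Fourier amplitudes between a source and a target slice. This is an FDA-style simplification under assumed inputs, not the paper's local phase-enhanced style fusion.

```python
import numpy as np

def phase_preserving_style_fusion(source, target, alpha=0.5):
    """Mix the Fourier amplitude (style) of two images while keeping the
    source phase (anatomy/structure). Inputs are same-sized 2-D arrays."""
    fs, ft = np.fft.fft2(source), np.fft.fft2(target)
    amplitude = (1.0 - alpha) * np.abs(fs) + alpha * np.abs(ft)   # blended style
    phase = np.angle(fs)                                          # source structure kept intact
    fused = np.fft.ifft2(amplitude * np.exp(1j * phase))
    return np.real(fused)

rng = np.random.RandomState(0)
source_slice = rng.rand(64, 64)    # placeholder for a source-modality slice
target_slice = rng.rand(64, 64)    # placeholder for a target-modality slice
print(phase_preserving_style_fusion(source_slice, target_slice).shape)   # (64, 64)
```
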
{"title":"Style Consistency Unsupervised Domain Adaptation Medical Image Segmentation","authors":"Lang Chen;Yun Bian;Jianbin Zeng;Qingquan Meng;Weifang Zhu;Fei Shi;Chengwei Shao;Xinjian Chen;Dehui Xiang","doi":"10.1109/TIP.2024.3451934","DOIUrl":"10.1109/TIP.2024.3451934","url":null,"abstract":"Unsupervised domain adaptation medical image segmentation is aimed to segment unlabeled target domain images with labeled source domain images. However, different medical imaging modalities lead to large domain shift between their images, in which well-trained models from one imaging modality often fail to segment images from anothor imaging modality. In this paper, to mitigate domain shift between source domain and target domain, a style consistency unsupervised domain adaptation image segmentation method is proposed. First, a local phase-enhanced style fusion method is designed to mitigate domain shift and produce locally enhanced organs of interest. Second, a phase consistency discriminator is constructed to distinguish the phase consistency of domain-invariant features between source domain and target domain, so as to enhance the disentanglement of the domain-invariant and style encoders and removal of domain-specific features from the domain-invariant encoder. Third, a style consistency estimation method is proposed to obtain inconsistency maps from intermediate synthesized target domain images with different styles to measure the difficult regions, mitigate domain shift between synthesized target domain images and real target domain images, and improve the integrity of interested organs. Fourth, style consistency entropy is defined for target domain images to further improve the integrity of the interested organ by the concentration on the inconsistent regions. Comprehensive experiments have been performed with an in-house dataset and a publicly available dataset. The experimental results have demonstrated the superiority of our framework over state-of-the-art methods.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142142173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Reference-Based Multi-Stage Progressive Restoration for Multi-Degraded Images
Yi Zhang, Qixue Yang, Damon M. Chandler, Xuanqin Mou
Image restoration (IR) via deep learning has been vigorously studied in recent years. However, due to the ill-posed nature of the problem, it is challenging to recover high-quality image details from a single distorted input, especially when images are corrupted by multiple distortions. In this paper, we propose a multi-stage IR approach for the progressive restoration of multi-degraded images by transferring similar edges/textures from a reference image. Our method, called a Reference-based Image Restoration Transformer (Ref-IRT), operates via three main stages. In the first stage, a cascaded U-Transformer network is employed to perform the preliminary recovery of the image. The proposed network consists of two U-Transformer architectures connected by feature fusion of the encoders and decoders, and the residual image is estimated by each U-Transformer in an easy-to-hard and coarse-to-fine fashion to gradually recover the high-quality image. The second and third stages perform texture transfer from a reference image to the preliminarily recovered target image to further enhance the restoration performance. To this end, a quality-degradation-restoration method is proposed for more accurate content/texture matching between the reference and target images, and a texture transfer/reconstruction network is employed to map the transferred features to the high-quality image. Experimental results on three benchmark datasets demonstrate the effectiveness of our model compared with other state-of-the-art multi-degraded IR methods. Our code and dataset are available at https://vinelab.jp/refmdir/.
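
To give a feel for the reference-based texture transfer in the later stages, the sketch below does brute-force patch matching between reference and target features and copies the best-matching reference patches back. The real pipeline adds quality-degradation-restoration matching and a learned reconstruction network, so this shows only the matching idea.

```python
import torch
import torch.nn.functional as F

def transfer_reference_textures(target_feat, ref_feat, patch=3):
    """Copy, for every target patch, the most similar reference patch (cosine similarity),
    then fold the matched patches back with overlap averaging.
    target_feat, ref_feat: (1, C, H, W) feature maps of the same size."""
    unfold = torch.nn.Unfold(kernel_size=patch, padding=patch // 2)
    tgt = F.normalize(unfold(target_feat), dim=1)         # (1, C*p*p, N) normalized target patches
    ref_patches = unfold(ref_feat)                        # (1, C*p*p, N) raw reference patches
    ref = F.normalize(ref_patches, dim=1)
    similarity = tgt.transpose(1, 2) @ ref                # (1, N, N) patch-to-patch scores
    best = similarity.argmax(dim=-1)                      # index of best reference patch per target patch
    matched = ref_patches[0, :, best[0]].unsqueeze(0)     # gather matched patches
    fold = torch.nn.Fold(target_feat.shape[-2:], kernel_size=patch, padding=patch // 2)
    overlap = fold(torch.ones_like(matched))
    return fold(matched) / overlap                        # (1, C, H, W) transferred texture map

tgt = torch.randn(1, 8, 16, 16)
ref = torch.randn(1, 8, 16, 16)
print(transfer_reference_textures(tgt, ref).shape)        # torch.Size([1, 8, 16, 16])
```
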
{"title":"Reference-Based Multi-Stage Progressive Restoration for Multi-Degraded Images","authors":"Yi Zhang;Qixue Yang;Damon M. Chandler;Xuanqin Mou","doi":"10.1109/TIP.2024.3451939","DOIUrl":"10.1109/TIP.2024.3451939","url":null,"abstract":"Image restoration (IR) via deep learning has been vigorously studied in recent years. However, due to the ill-posed nature of the problem, it is challenging to recover the high-quality image details from a single distorted input especially when images are corrupted by multiple distortions. In this paper, we propose a multi-stage IR approach for progressive restoration of multi-degraded images via transferring similar edges/textures from the reference image. Our method, called a Reference-based Image Restoration Transformer (Ref-IRT), operates via three main stages. In the first stage, a cascaded U-Transformer network is employed to perform the preliminary recovery of the image. The proposed network consists of two U-Transformer architectures connected by feature fusion of the encoders and decoders, and the residual image is estimated by each U-Transformer in an easy-to-hard and coarse-to-fine fashion to gradually recover the high-quality image. The second and third stages perform texture transfer from a reference image to the preliminarily-recovered target image to further enhance the restoration performance. To this end, a quality-degradation-restoration method is proposed for more accurate content/texture matching between the reference and target images, and a texture transfer/reconstruction network is employed to map the transferred features to the high-quality image. Experimental results tested on three benchmark datasets demonstrate the effectiveness of our model as compared with other state-of-the-art multi-degraded IR methods. Our code and dataset are available at \u0000<uri>https://vinelab.jp/refmdir/</uri>\u0000.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142142170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Learning Prompt-Enhanced Context Features for Weakly-Supervised Video Anomaly Detection
Yujiang Pu, Xiaoyu Wu, Lulu Yang, Shengjin Wang
Weakly supervised video anomaly detection aims to locate abnormal activities in untrimmed videos without the need for frame-level supervision. Prior work has utilized graph convolution networks or self-attention mechanisms alongside multiple instance learning (MIL)-based classification loss to model temporal relations and learn discriminative features. However, these approaches are limited in two aspects: 1) Multi-branch parallel architectures, while capturing multi-scale temporal dependencies, inevitably lead to increased parameter and computational costs. 2) The binarized MIL constraint only ensures inter-class separability while neglecting the fine-grained discriminability within anomalous classes. To this end, we introduce a novel WS-VAD framework that focuses on efficient temporal modeling and discriminability within anomalous classes. We first construct a Temporal Context Aggregation (TCA) module that simultaneously captures local-global dependencies by reusing an attention matrix along with adaptive context fusion. In addition, we propose a Prompt-Enhanced Learning (PEL) module that incorporates semantic priors using knowledge-based prompts to boost the discrimination of visual features while ensuring separability across anomaly subclasses. The proposed components have been validated through extensive experiments, which demonstrate superior performance on three challenging datasets, UCF-Crime, XD-Violence and ShanghaiTech, with fewer parameters and reduced computational effort. Notably, our method can significantly improve the detection accuracy for certain anomaly subclasses and reduce the false alarm rate. Our code is available at: https://github.com/yujiangpu20/PEL4VAD.
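
A highly simplified sketch of the local-global idea behind TCA follows: one similarity matrix is computed over snippet features and reused for a windowed (local) and an unrestricted (global) aggregation. The window size and the fusion by averaging are assumptions, not the published module.

```python
import torch

def temporal_context_aggregation(feats, window=5):
    """Reuse one attention matrix for local (windowed) and global context aggregation.
    feats: (T, D) snippet-level features of an untrimmed video."""
    t, d = feats.shape
    attn = feats @ feats.t() / d ** 0.5                     # (T, T), computed once
    pos = torch.arange(t)
    local_mask = (pos[:, None] - pos[None, :]).abs() <= window // 2
    local = torch.softmax(attn.masked_fill(~local_mask, float('-inf')), dim=-1) @ feats
    global_ = torch.softmax(attn, dim=-1) @ feats
    return 0.5 * (local + global_)                          # fused context features, (T, D)

print(temporal_context_aggregation(torch.randn(32, 64)).shape)   # torch.Size([32, 64])
```
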
{"title":"Learning Prompt-Enhanced Context Features for Weakly-Supervised Video Anomaly Detection","authors":"Yujiang Pu;Xiaoyu Wu;Lulu Yang;Shengjin Wang","doi":"10.1109/TIP.2024.3451935","DOIUrl":"10.1109/TIP.2024.3451935","url":null,"abstract":"Weakly supervised video anomaly detection aims to locate abnormal activities in untrimmed videos without the need for frame-level supervision. Prior work has utilized graph convolution networks or self-attention mechanisms alongside multiple instance learning (MIL)-based classification loss to model temporal relations and learn discriminative features. However, these approaches are limited in two aspects: 1) Multi-branch parallel architectures, while capturing multi-scale temporal dependencies, inevitably lead to increased parameter and computational costs. 2) The binarized MIL constraint only ensures the interclass separability while neglecting the fine-grained discriminability within anomalous classes. To this end, we introduce a novel WS-VAD framework that focuses on efficient temporal modeling and anomaly innerclass discriminability. We first construct a Temporal Context Aggregation (TCA) module that simultaneously captures local-global dependencies by reusing an attention matrix along with adaptive context fusion. In addition, we propose a Prompt-Enhanced Learning (PEL) module that incorporates semantic priors using knowledge-based prompts to boost the discrimination of visual features while ensuring separability across anomaly subclasses. The proposed components have been validated through extensive experiments, which demonstrate superior performance on three challenging datasets, UCF-Crime, XD-Violence and ShanghaiTech, with fewer parameters and reduced computational effort. Notably, our method can significantly improve the detection accuracy for certain anomaly subclasses and reduced the false alarm rate. Our code is available at: \u0000<uri>https://github.com/yujiangpu20/PEL4VAD</uri>\u0000.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142142167","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Structural Relation Modeling of 3D Point Clouds
Yu Zheng, Jiwen Lu, Yueqi Duan, Jie Zhou
In this paper, we propose an effective plug-and-play module called the structural relation network (SRN) to model structural dependencies in 3D point clouds for feature representation. Existing network architectures such as PointNet++ and RS-CNN capture local structures individually and ignore the inner interactions between different sub-clouds. Motivated by the fact that structural relation modeling plays a critical role in how humans understand 3D objects, our SRN exploits local information by modeling structural relations in 3D spaces. For a given sub-cloud of point sets, SRN first extracts its geometrical and locational relations with the other sub-clouds and maps them into the embedding space, then aggregates both relational features with the other sub-clouds. As the variation of semantics embedded in different sub-clouds is ignored by SRN, we further extend SRN to enable dynamic message passing between different sub-clouds. We propose a graph-based structural relation network (GSRN) where sub-clouds and their pairwise relations are modeled as nodes and edges respectively, so that the node features are updated by the messages along the edges. Since the node features might not be well preserved when acquiring the global representation, we propose a Combined Entropy Readout (CER) function to adaptively aggregate them into the holistic representation, so that GSRN simultaneously models the local-local and local-global region-wise interactions. The proposed SRN and GSRN modules are simple, interpretable, and do not require any additional supervision signals, and they can be easily plugged into existing networks. Experimental results on the benchmark datasets (ScanObjectNN, ModelNet40, ShapeNet Part, S3DIS, ScanNet and SUN-RGBD) indicate promising boosts on the tasks of 3D point cloud classification, segmentation and object detection.
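
The sketch below conveys the basic locational-relation idea: for every pair of sub-cloud centroids it builds an offset-and-distance relation vector and mean-pools the relations per sub-cloud. SRN additionally learns geometrical relations and embeds them with networks, so this is only a toy stand-in.

```python
import numpy as np

def subcloud_relation_codes(centroids):
    """Pairwise locational relations between sub-clouds, pooled per sub-cloud.
    centroids: (K, 3) sub-cloud centres; returns a (K, 4) relation code per sub-cloud."""
    offsets = centroids[:, None, :] - centroids[None, :, :]        # (K, K, 3) relative positions
    distances = np.linalg.norm(offsets, axis=-1, keepdims=True)    # (K, K, 1) pairwise distances
    relations = np.concatenate([offsets, distances], axis=-1)      # (K, K, 4) edge features
    return relations.mean(axis=1)                                  # aggregate messages over edges

centroids = np.random.RandomState(0).rand(6, 3)   # six sub-cloud centres of a toy object
print(subcloud_relation_codes(centroids).shape)   # (6, 4)
```
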
{"title":"Structural Relation Modeling of 3D Point Clouds","authors":"Yu Zheng;Jiwen Lu;Yueqi Duan;Jie Zhou","doi":"10.1109/TIP.2024.3451940","DOIUrl":"10.1109/TIP.2024.3451940","url":null,"abstract":"In this paper, we propose an effective plug-and-play module called structural relation network (SRN) to model structural dependencies in 3D point clouds for feature representation. Existing network architectures such as PointNet++ and RS-CNN capture local structures individually and ignore the inner interactions between different sub-clouds. Motivated by the fact that structural relation modeling plays critical roles for humans to understand 3D objects, our SRN exploits local information by modeling structural relations in 3D spaces. For a given sub-cloud of point sets, SRN firstly extracts its geometrical and locational relations with the other sub-clouds and maps them into the embedding space, then aggregates both relational features with the other sub-clouds. As the variation of semantics embedded in different sub-clouds is ignored by SRN, we further extend SRN to enable dynamic message passing between different sub-clouds. We propose a graph-based structural relation network (GSRN) where sub-clouds and their pairwise relations are modeled as nodes and edges respectively, so that the node features are updated by the messages along the edges. Since the node features might not be well preserved when acquiring the global representation, we propose a Combined Entropy Readout (CER) function to adaptively aggregate them into the holistic representation, so that GSRN simultaneously models the local-local and local-global region-wise interaction. The proposed SRN and GSRN modules are simple, interpretable, and do not require any additional supervision signals, which can be easily equipped with the existing networks. Experimental results on the benchmark datasets (ScanObjectNN, ModelNet40, ShapeNet Part, S3DIS, ScanNet and SUN-RGBD) indicate promising boosts on the tasks of 3D point cloud classification, segmentation and object detection.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142142172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Contrastive Open-set Active Learning based Sample Selection for Image Classification.
Zizheng Yan, Delian Ruan, Yushuang Wu, Junshi Huang, Zhenhua Chai, Xiaoguang Han, Shuguang Cui, Guanbin Li

In this paper, we address a complex but practical scenario in Active Learning (AL) known as open-set AL, where the unlabeled data consists of both in-distribution (ID) and out-of-distribution (OOD) samples. Standard AL methods will fail in this scenario as OOD samples are highly likely to be regarded as uncertain samples, leading to their selection and wasting of the budget. Existing methods focus on selecting the highly likely ID samples, which tend to be easy and less informative. To this end, we introduce two criteria, namely contrastive confidence and historical divergence, which measure the possibility of being ID and the hardness of a sample, respectively. By balancing the two proposed criteria, highly informative ID samples can be selected as much as possible. Furthermore, unlike previous methods that require additional neural networks to detect the OOD samples, we propose a contrastive clustering framework that endows the classifier with the ability to identify the OOD samples and further enhances the network's representation learning. The experimental results demonstrate that the proposed method achieves state-of-the-art performance on several benchmark datasets.
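
As a toy illustration of balancing the two criteria, the snippet below scores each unlabeled sample by a weighted sum of its contrastive confidence (likelihood of being ID) and historical divergence (hardness) and queries the top-scoring ones. The linear weighting and the pre-computed scores are assumptions for illustration, not the paper's exact rule.

```python
import numpy as np

def select_queries(id_confidence, historical_divergence, budget, trade_off=0.5):
    """Pick unlabeled samples that look in-distribution *and* hard.
    Both criteria are assumed to be pre-computed scores in [0, 1]."""
    score = (trade_off * np.asarray(id_confidence)
             + (1.0 - trade_off) * np.asarray(historical_divergence))
    return np.argsort(-score)[:budget]            # indices to send to the annotator

id_conf = np.array([0.9, 0.2, 0.8, 0.6])      # high => likely in-distribution
divergence = np.array([0.7, 0.9, 0.1, 0.8])   # high => predictions changed a lot across epochs
print(select_queries(id_conf, divergence, budget=2))   # e.g. [0 3]
```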

{"title":"Contrastive Open-set Active Learning based Sample Selection for Image Classification.","authors":"Zizheng Yan, Delian Ruan, Yushuang Wu, Junshi Huang, Zhenhua Chai, Xiaoguang Han, Shuguang Cui, Guanbin Li","doi":"10.1109/TIP.2024.3451928","DOIUrl":"https://doi.org/10.1109/TIP.2024.3451928","url":null,"abstract":"<p><p>In this paper, we address a complex but practical scenario in Active Learning (AL) known as open-set AL, where the unlabeled data consists of both in-distribution (ID) and out-of-distribution (OOD) samples. Standard AL methods will fail in this scenario as OOD samples are highly likely to be regarded as uncertain samples, leading to their selection and wasting of the budget. Existing methods focus on selecting the highly likely ID samples, which tend to be easy and less informative. To this end, we introduce two criteria, namely contrastive confidence and historical divergence, which measure the possibility of being ID and the hardness of a sample, respectively. By balancing the two proposed criteria, highly informative ID samples can be selected as much as possible. Furthermore, unlike previous methods that require additional neural networks to detect the OOD samples, we propose a contrastive clustering framework that endows the classifier with the ability to identify the OOD samples and further enhances the network's representation learning. The experimental results demonstrate that the proposed method achieves state-of-the-art performance on several benchmark datasets.</p>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142142133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Balanced Destruction-Reconstruction Dynamics for Memory-Replay Class Incremental Learning
Yuhang Zhou, Jiangchao Yao, Feng Hong, Ya Zhang, Yanfeng Wang
Class incremental learning (CIL) aims to incrementally update a trained model with the new classes of samples (plasticity) while retaining previously learned ability (stability). To address the most challenging issue in this goal, i.e., catastrophic forgetting, the mainstream paradigm is memory-replay CIL, which consolidates old knowledge by replaying a small number of old classes of samples saved in the memory. Despite effectiveness, the inherent destruction-reconstruction dynamics in memory-replay CIL are an intrinsic limitation: if the old knowledge is severely destructed, it will be quite hard to reconstruct the lossless counterpart. Our theoretical analysis shows that the destruction of old knowledge can be effectively alleviated by balancing the contribution of samples from the current phase and those saved in the memory. Motivated by this theoretical finding, we propose a novel Balanced Destruction-Reconstruction module (BDR) for memory-replay CIL, which can achieve better knowledge reconstruction by reducing the degree of maximal destruction of old knowledge. Specifically, to achieve a better balance between old knowledge and new classes, the proposed BDR module takes into account two factors: the variance in training status across different classes and the quantity imbalance of samples from the current phase and memory. By dynamically manipulating the gradient during training based on these factors, BDR can effectively alleviate knowledge destruction and improve knowledge reconstruction. Extensive experiments on a range of CIL benchmarks have shown that as a lightweight plug-and-play module, BDR can significantly improve the performance of existing state-of-the-art methods with good generalization. Our code is publicly available here.
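
The sketch below shows one simple way to balance the gradient contribution of current-phase and memory samples by inverse-frequency loss weighting. BDR's actual rule also accounts for per-class training-status variance and operates on gradients directly, so this is only the balancing intuition.

```python
import torch
import torch.nn.functional as F

def balanced_replay_loss(logits, labels, from_memory):
    """Scale each sample's loss inversely to its group size (new-phase vs. memory),
    so the few replayed exemplars are not drowned out by the new classes."""
    ce = F.cross_entropy(logits, labels, reduction='none')
    mem_count = int(from_memory.sum().item())
    new_count = from_memory.numel() - mem_count
    w_mem = 1.0 / max(mem_count, 1)
    w_new = 1.0 / max(new_count, 1)
    weights = torch.where(from_memory, torch.full_like(ce, w_mem), torch.full_like(ce, w_new))
    return (weights * ce).sum() / weights.sum()

logits = torch.randn(6, 10)
labels = torch.randint(0, 10, (6,))
from_memory = torch.tensor([True, True, False, False, False, False])
print(balanced_replay_loss(logits, labels, from_memory).item())
```
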
{"title":"Balanced Destruction-Reconstruction Dynamics for Memory-Replay Class Incremental Learning","authors":"Yuhang Zhou;Jiangchao Yao;Feng Hong;Ya Zhang;Yanfeng Wang","doi":"10.1109/TIP.2024.3451932","DOIUrl":"10.1109/TIP.2024.3451932","url":null,"abstract":"Class incremental learning (CIL) aims to incrementally update a trained model with the new classes of samples (plasticity) while retaining previously learned ability (stability). To address the most challenging issue in this goal, i.e., catastrophic forgetting, the mainstream paradigm is memory-replay CIL, which consolidates old knowledge by replaying a small number of old classes of samples saved in the memory. Despite effectiveness, the inherent destruction-reconstruction dynamics in memory-replay CIL are an intrinsic limitation: if the old knowledge is severely destructed, it will be quite hard to reconstruct the lossless counterpart. Our theoretical analysis shows that the destruction of old knowledge can be effectively alleviated by balancing the contribution of samples from the current phase and those saved in the memory. Motivated by this theoretical finding, we propose a novel Balanced Destruction-Reconstruction module (BDR) for memory-replay CIL, which can achieve better knowledge reconstruction by reducing the degree of maximal destruction of old knowledge. Specifically, to achieve a better balance between old knowledge and new classes, the proposed BDR module takes into account two factors: the variance in training status across different classes and the quantity imbalance of samples from the current phase and memory. By dynamically manipulating the gradient during training based on these factors, BDR can effectively alleviate knowledge destruction and improve knowledge reconstruction. Extensive experiments on a range of CIL benchmarks have shown that as a lightweight plug-and-play module, BDR can significantly improve the performance of existing state-of-the-art methods with good generalization. Our code is publicly available here.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142142131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0