In recent years, deep learning has driven significant progress in facial expression recognition (FER). However, existing models still face challenges in computational efficiency and generalization when handling diverse emotional expressions and complex environmental variations. Recently, large-scale vision-language pre-training models such as CLIP have achieved remarkable success in multi-modal learning, and their rich visual and textual representations offer valuable guidance for downstream tasks. Transferring this knowledge to build efficient and accurate FER systems has therefore emerged as a key research direction. To this end, this paper proposes a novel model, termed Knowledge Distillation and Retrieval-Augmented Generation (KDRAG), which combines knowledge distillation and retrieval-augmented generation (RAG) to improve the efficiency and accuracy of FER. Through knowledge distillation, the teacher model (ViT-L/14) transfers its rich knowledge to a smaller student model (ViT-B/32); an additional linear projection layer maps the teacher model's output features to the student model's feature dimension for feature alignment. Moreover, a RAG mechanism enhances the student model's emotional understanding by retrieving text descriptions related to the input image. The framework further combines a soft loss (derived from the teacher model's knowledge) with a hard loss (derived from the ground-truth labels) to improve generalization. Extensive experiments on multiple datasets demonstrate that KDRAG achieves significant improvements in accuracy and computational efficiency, offering new insights for real-time FER systems.
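The distillation objective and feature alignment described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature dimensions (768 for the ViT-L/14 teacher, 512 for the ViT-B/32 student), the temperature `T`, and the weighting `alpha` are illustrative assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    # numerically stable softmax
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical feature alignment: a linear projection mapping the
# teacher's 768-d output features into the student's 512-d space.
rng = np.random.default_rng(0)
W_proj = rng.normal(scale=0.02, size=(768, 512))

def align_teacher_features(teacher_feats):
    # teacher_feats: (batch, 768) -> (batch, 512)
    return teacher_feats @ W_proj

def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Combined distillation objective (sketch).

    soft loss: KL(teacher_T || student_T), softened by temperature T
               and scaled by T^2 (standard KD practice);
    hard loss: cross-entropy against the ground-truth labels.
    """
    p_t = softmax(teacher_logits / T)
    p_s = softmax(student_logits / T)
    soft = (p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))).sum(-1).mean() * T * T
    p = softmax(student_logits)
    hard = -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()
    return alpha * soft + (1 - alpha) * hard
```

When the student matches the teacher exactly, the soft (KL) term vanishes and only the hard cross-entropy term remains; mismatched predictions are penalized by both terms.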