
EURASIP Journal on Image and Video Processing: Latest Publications

Fast CU size decision and intra-prediction mode decision method for H.266/VVC
IF 2.4, Tier 4 (Computer Science), Pub Date: 2024-03-18, DOI: 10.1186/s13640-024-00622-7

Abstract

H.266/Versatile Video Coding (VVC) is the most recent video coding standard developed by the Joint Video Experts Team (JVET). It introduces the quad-tree with nested multi-type tree (QTMT) architecture, which improves the compression performance of H.266/VVC. Moreover, H.266/VVC contains a greater number of intra-prediction modes than H.265/High Efficiency Video Coding (HEVC), totalling 67. However, these features greatly increase the coding computational complexity. To cope with these issues, a fast intra-coding unit (CU) size decision method and a fast intra-prediction mode decision method are proposed in this paper. Specifically, trained Support Vector Machine (SVM) classifier models are utilized to determine the CU partition mode in the fast CU size decision scheme. Furthermore, the number of intra-prediction modes added to the rate-distortion optimization (RDO) mode set is reduced in the fast intra-prediction mode decision scheme, based on an improved search step. Simulation results illustrate that the proposed overall algorithm reduces encoding runtime by 55.24% with negligible BDBR loss.
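
The core idea of SVM-gated CU partitioning can be sketched as follows. This is a minimal illustration, not the paper's trained model: instead of exhaustively testing every QTMT split via RDO, a linear decision function over cheap block statistics predicts "split" or "no split". The features (mean, variance) and the weights are invented for illustration.

```python
# Sketch of an SVM-style early-termination rule for CU partitioning.
# A trained linear SVM reduces to evaluating sign(w . x + b) over block
# features; the weights and bias below are illustrative stand-ins.

def block_features(block):
    """Return (mean, variance) of a 2D list of pixel values."""
    pixels = [p for row in block for p in row]
    n = len(pixels)
    mean = sum(pixels) / n
    var = sum((p - mean) ** 2 for p in pixels) / n
    return mean, var

def svm_predict_split(block, weights=(0.0, 0.05), bias=-1.0):
    """Linear SVM decision: score > 0 means 'split this CU further'."""
    mean, var = block_features(block)
    score = weights[0] * mean + weights[1] * var + bias
    return score > 0

# Smooth blocks are left whole; textured blocks are partitioned further.
flat = [[100] * 8 for _ in range(8)]
textured = [[(i * 37 + j * 91) % 255 for j in range(8)] for i in range(8)]
print(svm_predict_split(flat), svm_predict_split(textured))
```

The payoff is that a dot product replaces a full RDO evaluation of every candidate split, which is where the runtime saving comes from.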

Citations: 0
Assessment framework for deepfake detection in real-world situations
IF 2.4, Tier 4 (Computer Science), Pub Date: 2024-02-13, DOI: 10.1186/s13640-024-00621-8
Yuhang Lu, Touradj Ebrahimi

Detecting digital face manipulation in images and video has attracted extensive attention due to the potential risk it poses to public trust. To counteract the malicious usage of such techniques, deep learning-based deepfake detection methods have been employed and have exhibited remarkable performance. However, the performance of such detectors is often assessed on benchmarks that hardly reflect real-world situations. For example, the impact of various image and video processing operations and typical workflow distortions on detection accuracy has not been systematically measured. In this paper, a more reliable assessment framework is proposed to evaluate the performance of learning-based deepfake detectors in more realistic settings. To the best of our knowledge, it is the first systematic assessment approach for deepfake detectors that not only reports their general performance under real-world conditions but also quantitatively measures their robustness toward different processing operations. To demonstrate the effectiveness and usage of the framework, extensive experiments and detailed analyses of four popular deepfake detection methods are presented. In addition, a stochastic degradation-based data augmentation method driven by realistic processing operations is designed, which significantly improves the robustness of deepfake detectors.
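
The stochastic degradation idea can be sketched in a few lines. This is our reading of the concept, not the paper's exact pipeline: each training image is passed through one randomly selected realistic processing operation, so the detector also sees noisy, blurred, or coarsely quantized variants. The specific operations and parameters below are illustrative.

```python
import random

# Illustrative stochastic degradation-based augmentation: pick one realistic
# processing operation at random per training sample.

def add_noise(img, rng, sigma=10):
    # additive Gaussian noise, clamped to the 8-bit range
    return [[min(255, max(0, p + rng.gauss(0, sigma))) for p in row] for row in img]

def box_blur(img, rng):
    # 3x3 mean filter with border clipping (rng unused, kept for a uniform signature)
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            neigh = [img[y][x] for y in range(max(0, i - 1), min(h, i + 2))
                               for x in range(max(0, j - 1), min(w, j + 2))]
            out[i][j] = sum(neigh) / len(neigh)
    return out

def quantize(img, rng):
    # coarse quantization as a crude stand-in for recompression artifacts
    step = rng.choice([8, 16, 32])
    return [[(p // step) * step for p in row] for row in img]

def stochastic_degrade(img, seed=None):
    rng = random.Random(seed)
    op = rng.choice([add_noise, box_blur, quantize])
    return op(img, rng)
```

Applying such an operation per sample during training exposes the detector to the distortions it will meet in deployment, which is the source of the robustness gain the abstract reports.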

Citations: 0
Edge-aware nonlinear diffusion-driven regularization model for despeckling synthetic aperture radar images
IF 2.4, Tier 4 (Computer Science), Pub Date: 2024-01-11, DOI: 10.1186/s13640-023-00617-w
Anthony Bua, Goodluck Kapyela, Libe Massawe, Baraka Maiseli

Speckle noise corrupts synthetic aperture radar (SAR) images and limits their applications in sensitive scientific and engineering fields. This challenge has attracted several scholars because of the wide demand for SAR images in forestry, oceanography, geology, glaciology, and topography. Despite significant efforts to address the challenge, an open research question remains: how to simultaneously suppress speckle noise and restore semantic features in SAR images. Therefore, this work establishes a diffusion-driven nonlinear method with edge-awareness capabilities to restore corrupted SAR images while protecting critical image features, such as contours and textures. The proposed method incorporates two terms that promote effective noise removal: (1) a high-order diffusion kernel; and (2) a fractional regularization term that is sensitive to speckle noise. These terms have been carefully designed to ensure that the restored SAR images contain stronger edges and well-preserved textures. Empirical results show that the proposed model produces content-rich images with higher subjective and objective quality scores. Furthermore, our model generates images without the noticeable staircase and block artifacts commonly produced by the classical Perona–Malik and total variation models.
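
For readers unfamiliar with diffusion-driven restoration, the classical building block the paper extends can be sketched as a Perona–Malik style step: each pixel exchanges intensity with its neighbors, scaled by an edge-stopping function that is near 1 in flat areas and near 0 across strong edges. This is the baseline mechanism only; the paper's high-order kernel and fractional regularization term are not reproduced here.

```python
# Minimal edge-aware nonlinear diffusion sketch (Perona-Malik style).

def g(grad, K=20.0):
    """Edge-stopping function: ~1 in flat regions, ~0 across strong edges."""
    return 1.0 / (1.0 + (grad / K) ** 2)

def diffuse(img, steps=10, lam=0.2):
    """Run explicit 4-neighbor diffusion steps on a 2D list of floats."""
    h, w = len(img), len(img[0])
    img = [row[:] for row in img]
    for _ in range(steps):
        nxt = [row[:] for row in img]
        for i in range(h):
            for j in range(w):
                flow = 0.0
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < h and 0 <= nj < w:
                        d = img[ni][nj] - img[i][j]
                        flow += g(abs(d)) * d  # smooth flat areas, preserve edges
                nxt[i][j] = img[i][j] + lam * flow
        img = nxt
    return img

# A speckle spike on a flat background is attenuated toward its surroundings.
noisy = [[100.0] * 5 for _ in range(5)]
noisy[2][2] = 140.0
clean = diffuse(noisy, steps=10)
```

Because `g` shrinks the flow across large intensity differences, a genuine edge diffuses far more slowly than an isolated speckle spike, which is the "edge-aware" behavior the abstract refers to.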

Citations: 0
Multimodal few-shot classification without attribute embedding
IF 2.4, Tier 4 (Computer Science), Pub Date: 2024-01-10, DOI: 10.1186/s13640-024-00620-9
Jun Qing Chang, Deepu Rajan, Nicholas Vun

Multimodal few-shot learning aims to exploit the complementary information inherent in multiple modalities for vision tasks in low-data scenarios. Most current research focuses on finding a suitable embedding space for the various modalities. While solutions based on embedding provide state-of-the-art results, they reduce the interpretability of the model. Separate visualization approaches enable the models to become more transparent. In this paper, a multimodal few-shot learning framework that is inherently interpretable is presented. This is achieved by using the textual modality in the form of attributes without embedding them, which enables the model to directly explain which attributes caused it to classify an image into a particular class. The model consists of a variational autoencoder that learns the visual latent representation, combined with a semantic latent representation learnt by a standard autoencoder, which computes a semantic loss between the latent representation and a binary attribute vector. A decoder reconstructs the original image from the concatenated latent vectors. The proposed model outperforms other multimodal methods when all test classes are used, e.g., 50 classes in a 50-way 1-shot setting, and is comparable for a smaller number of ways. Since raw text attributes are used, the datasets for evaluation are CUB, SUN, and AWA2. The effectiveness of the interpretability provided by the model is evaluated by analyzing how well it has learnt to identify the attributes.
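
The semantic loss between a latent vector and a binary attribute vector can be illustrated with a standard binary cross-entropy, which is one natural reading of such a loss (the paper's exact formulation may differ). Each latent dimension is squashed to a probability and compared against one named 0/1 attribute, which is what keeps the representation interpretable.

```python
import math

# Sketch of a semantic loss: mean binary cross-entropy between sigmoid(latent)
# and a class's binary attribute vector. Attribute names and values invented.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def semantic_loss(latent, attributes, eps=1e-7):
    """Lower when each latent dimension agrees with its 0/1 attribute."""
    total = 0.0
    for z, a in zip(latent, attributes):
        p = min(1 - eps, max(eps, sigmoid(z)))
        total += -(a * math.log(p) + (1 - a) * math.log(1 - p))
    return total / len(latent)

# A latent that agrees with the attributes scores a much lower loss.
attrs = [1, 0, 1, 1, 0]            # e.g. has_wings, has_fur, has_beak, ...
good = [4.0, -4.0, 4.0, 4.0, -4.0]
bad = [-4.0, 4.0, -4.0, -4.0, 4.0]
print(semantic_loss(good, attrs) < semantic_loss(bad, attrs))
```

Because no embedding layer sits between latent dimensions and attributes, each dimension can be read off directly to explain a classification.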

Citations: 0
Secure image transmission through LTE wireless communications systems
IF 2.4, Tier 4 (Computer Science), Pub Date: 2024-01-10, DOI: 10.1186/s13640-024-00619-2
Farouk Abduh Kamil Al-Fahaidy, Radwan AL-Bouthigy, Mohammad Yahya H. Al-Shamri, Safwan Abdulkareem

Secure transmission of images over wireless communications systems can be achieved using RSA, one of the best-known and most efficient cryptographic algorithms, together with OFDMA, a preferred signal processing choice in wireless communications. This paper investigates the performance of OFDMA systems for the wireless transmission of RSA-encrypted images. Specifically, OFDMA systems based on different signal processing techniques, namely discrete sine transforms (DST) and discrete cosine transforms (DCT), as well as the conventional discrete Fourier transforms (DFT), are tested for the wireless transmission of gray-scale images with and without RSA encryption. The image is first encrypted with the RSA algorithm. The encrypted image is then modulated with DFT-based, DCT-based, and DST-based OFDMA systems, and the modulated images are transmitted over a wireless multipath fading channel. The reverse operations are carried out at the receiver, together with frequency-domain equalization to overcome the channel effect. An exhaustive set of scenarios is evaluated to investigate the performance of the different OFDMA systems in terms of PSNR and MSE under different subcarrier mapping and modulation techniques. The results demonstrate the ability of the different OFDMA systems to securely transmit images over wireless channels, with the DCT-OFDMA system showing superiority over the DST-OFDMA and the conventional DFT-OFDMA systems.
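
The transform at the heart of a DCT-based OFDMA modulator can be sketched with an orthonormal DCT-II/DCT-III pair: the forward transform maps (encrypted) samples onto subcarriers, and the inverse recovers them exactly on a clean channel. Encryption, subcarrier mapping, the fading channel, and equalization from the paper are all omitted here; this only illustrates the lossless transform round trip.

```python
import math

# Orthonormal DCT-II (forward) and DCT-III (inverse) on a 1D sample block.

def dct(x):
    N = len(x)
    out = []
    for k in range(N):
        a = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        out.append(a * sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                           for n in range(N)))
    return out

def idct(X):
    N = len(X)
    out = []
    for n in range(N):
        s = 0.0
        for k in range(N):
            a = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
            s += a * X[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
        out.append(s)
    return out

samples = [10.0, 200.0, 55.0, 0.0, 128.0, 33.0, 240.0, 17.0]
recovered = idct(dct(samples))
```

Because the transform pair is orthonormal, any degradation observed at the receiver in the paper's experiments is attributable to the channel and equalization, not the modulation itself.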

Citations: 0
An optimized capsule neural networks for tomato leaf disease classification
IF 2.4, Tier 4 (Computer Science), Pub Date: 2024-01-08, DOI: 10.1186/s13640-023-00618-9
Lobna M. Abouelmagd, Mahmoud Y. Shams, Hanaa Salem Marie, Aboul Ella Hassanien

Plant diseases have a significant impact on leaves, with each disease exhibiting specific spots characterized by unique colors and locations. Therefore, it is crucial to develop a method for detecting these diseases based on spot shape, color, and location within the leaves. While Convolutional Neural Networks (CNNs) have been widely used in deep learning applications, they have limitations in capturing relative spatial and orientation relationships. This paper presents a computer vision methodology that utilizes an optimized capsule neural network (CapsNet) to detect and classify ten tomato leaf diseases using standard dataset images. To mitigate overfitting, data augmentation and preprocessing techniques were employed during the training phase. CapsNet was chosen over CNNs due to its superior ability to capture spatial positioning within the image. The proposed CapsNet approach achieved an accuracy of 96.39% with minimal loss, using the Adam optimizer with a learning rate of 0.00001. By comparing the results with existing state-of-the-art approaches, the study demonstrates the effectiveness of CapsNet in accurately identifying and classifying tomato leaf diseases based on spot shape, color, and location. The findings highlight the potential of CapsNet as an alternative to CNNs for improving disease detection and classification in plant pathology research.
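
A defining piece of CapsNet that underlies the spatial-awareness claim is the "squash" nonlinearity: it shrinks a capsule's output vector to length below 1 while preserving its direction, so the vector's length can act as the probability that an entity (here, a disease spot) is present and its orientation encodes pose. The sketch below shows only this standard function, not the paper's full optimized network.

```python
import math

# The capsule "squash" nonlinearity from capsule networks:
# squash(v) = (|v|^2 / (1 + |v|^2)) * v / |v|

def squash(v, eps=1e-9):
    norm_sq = sum(x * x for x in v)
    norm = math.sqrt(norm_sq) + eps
    scale = norm_sq / (1.0 + norm_sq)
    return [scale * x / norm for x in v]

long_vec = squash([6.0, 8.0])     # |v| = 10  -> output length ~ 0.99
short_vec = squash([0.06, 0.08])  # |v| = 0.1 -> output length ~ 0.01
```

Long input vectors map to lengths near 1 ("entity confidently present") and short ones to lengths near 0, while the direction, and hence the encoded pose, is untouched.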

Citations: 0
Multi-layer features template update object tracking algorithm based on SiamFC++
IF 2.4, Tier 4 (Computer Science), Pub Date: 2024-01-04, DOI: 10.1186/s13640-023-00616-x
Xiaofeng Lu, Xuan Wang, Zhengyang Wang, Xinhong Hei

SiamFC++ only extracts the object features of the first frame as the tracking template, and uses only the highest-level feature maps in both the classification branch and the regression branch, so that the respective characteristics of the two branches are not fully utilized. In view of this, the present paper proposes an object tracking algorithm based on SiamFC++ that uses multi-layer features of the Siamese network to update the template. First, an FPN is used to extract feature maps from different layers of the backbone for the classification branch and the regression branch. Second, 3D convolution is used to update the tracking template of the object tracking algorithm. Next, a template update judgment condition based on mutual information is proposed. Finally, AlexNet is used as the backbone and GOT-10k as the training set. Compared with SiamFC++, our algorithm obtains improved results on the OTB100, VOT2016, VOT2018, and GOT-10k data sets, and the tracking process runs in real time.
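
The shape of a gated template update can be sketched as follows. Both ingredients here are deliberate simplifications of the paper's design: an exponential moving average stands in for the 3D-convolution fusion, and a plain normalized-correlation threshold stands in for the mutual-information judgment condition.

```python
# Simplified gated template update for a Siamese tracker.

def correlation(a, b):
    """Normalized inner product of two flattened feature templates."""
    num = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return num / (na * nb)

def maybe_update(template, new_feat, alpha=0.3, threshold=0.6):
    """Blend in the new frame's features only if they still resemble the template."""
    if correlation(template, new_feat) < threshold:
        return template  # likely occlusion or drift: keep the old template
    return [(1 - alpha) * t + alpha * f for t, f in zip(template, new_feat)]
```

The gate is what prevents the template from being polluted during occlusion: frames that no longer resemble the object are simply not blended in.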

Citations: 0
Subjective performance evaluation of bitrate allocation strategies for MPEG and JPEG Pleno point cloud compression
IF 2.4, Tier 4 (Computer Science), Pub Date: 2024-01-01, Epub Date: 2024-06-11, DOI: 10.1186/s13640-024-00629-0
Davi Lazzarotto, Michela Testolina, Touradj Ebrahimi

The recent rise in interest in point clouds as an imaging modality has motivated standardization groups such as JPEG and MPEG to launch activities aiming at developing compression standards for point clouds. Lossy compression usually introduces visual artifacts that negatively impact the perceived quality of media, which can only be reliably measured through subjective visual quality assessment experiments. While MPEG standards have been subjectively evaluated in previous studies on multiple occasions, no work has yet assessed the performance of the recent JPEG Pleno standard in comparison to them. In this study, a comprehensive performance evaluation of JPEG and MPEG standards for point cloud compression is conducted. The impact of different configuration parameters on the performance of the codecs is first analyzed with the help of objective quality metrics. The results from this analysis are used to define three rate allocation strategies for each codec, which are employed to compress a set of point clouds at four target rates. The set of distorted point clouds is then subjectively evaluated following two subjective quality assessment protocols. Finally, the obtained results are used to compare the performance of these compression standards and draw insights about best coding practices.
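
One plausible shape for a fixed-target rate allocation strategy of the kind the study defines is: given a codec's measured rate points, pick for each target bitrate the configuration whose rate is highest while staying within a small overshoot budget. The rate values, config names, and overshoot margin below are invented for illustration, not taken from the paper.

```python
# Hypothetical rate allocation: choose a codec config per target bitrate.

def allocate(rd_points, target_bpp, overshoot=1.1):
    """rd_points: list of (bits_per_point, config_name) for one codec."""
    candidates = [(r, c) for r, c in rd_points if r <= target_bpp * overshoot]
    if not candidates:
        return min(rd_points)[1]  # nothing fits: fall back to the cheapest config
    return max(candidates)[1]     # highest rate still within budget

points = [(0.1, "r1"), (0.35, "r2"), (1.2, "r3"), (3.0, "r4")]
print([allocate(points, t) for t in (0.15, 0.5, 1.5, 4.0)])
```

Fixing such a rule per codec makes the stimuli comparable across codecs at each target rate, which is a prerequisite for a fair subjective comparison.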

引用次数: 0
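The evaluation above leans on objective quality metrics for point clouds; a common geometry metric is the symmetric point-to-point (D1) PSNR. The sketch below is illustrative only — the peak definition (reference bounding-box diagonal) and the brute-force nearest-neighbour search are assumptions for this toy, not the JPEG/MPEG reference tool:

```python
import numpy as np

def d1_psnr(ref: np.ndarray, deg: np.ndarray) -> float:
    """Symmetric point-to-point (D1) PSNR between two point clouds.

    ref, deg: (N, 3) arrays of XYZ coordinates. The peak is taken here as
    the diagonal of the reference bounding box, a common choice for
    geometry PSNR. Brute-force nearest-neighbour search via broadcasting:
    suitable for small clouds only.
    """
    def one_way_mse(a: np.ndarray, b: np.ndarray) -> float:
        # Squared distance from each point in a to its nearest neighbour in b.
        d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
        return float(d2.min(axis=1).mean())

    # Symmetric error: take the worse of the two one-way measurements.
    mse = max(one_way_mse(ref, deg), one_way_mse(deg, ref))
    if mse == 0.0:
        return float("inf")
    peak = np.linalg.norm(ref.max(axis=0) - ref.min(axis=0))
    return 10.0 * np.log10(peak ** 2 / mse)
```

A higher D1 PSNR means the decoded cloud is geometrically closer to the reference; rate allocation strategies such as those defined in the study trade this kind of quality score off against bitrate.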
Robust steganography in practical communication: a comparative study
Q4 Computer Science Pub Date : 2023-10-31 DOI: 10.1186/s13640-023-00615-y
Tong Qiao, Shengwang Xu, Shuai Wang, Xiaoshuai Wu, Bo Liu, Ning Zheng, Ming Xu, Binmin Pan
Abstract Steganography enables covert communication over a public channel. Modern adaptive steganography currently plays a dominant role due to its high undetectability. However, its effectiveness is challenged when applied in practical communication, such as over social networks. Several robust steganographic methods have been proposed, yet no comparative study between them exists. Thus, we propose a framework that generalizes the current typical steganographic methods resisting compression attacks, and empirically analyze their advantages and disadvantages based on four baseline indicators: capacity, imperceptibility, undetectability, and robustness. More importantly, the robustness of the methods is compared in real applications such as Facebook, Twitter, and WeChat, which has not been comprehensively addressed in this community. In particular, methods that modify the sign of DCT coefficients perform best on social media applications.
Citations: 0
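The abstract singles out methods that modify the sign of DCT coefficients as the most robust on social media. The following toy illustrates that idea only — it is not any of the surveyed schemes, and the block size, coefficient position, and strength parameter are arbitrary assumptions: one bit is carried in the sign of a single mid-frequency DCT coefficient, whose magnitude is pushed away from zero so the sign survives mild distortion.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix (rows = frequencies)."""
    k = np.arange(n)
    C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0] /= np.sqrt(2)
    return C * np.sqrt(2 / n)

def embed_bit(block: np.ndarray, bit: int,
              pos: tuple = (2, 3), strength: float = 8.0) -> np.ndarray:
    """Force the sign of one mid-frequency DCT coefficient to carry a bit
    (positive -> 1, negative -> 0). A real scheme must also survive
    rounding and JPEG requantization, which this toy ignores."""
    C = dct_matrix(block.shape[0])
    coeffs = C @ block @ C.T
    # Keep the magnitude away from zero so the sign is not flipped by noise.
    mag = max(abs(coeffs[pos]), strength)
    coeffs[pos] = mag if bit else -mag
    return C.T @ coeffs @ C  # inverse of an orthonormal transform

def extract_bit(block: np.ndarray, pos: tuple = (2, 3)) -> int:
    C = dct_matrix(block.shape[0])
    coeffs = C @ block @ C.T
    return int(coeffs[pos] > 0)
```

Sign-based embedding is attractive for lossy channels because quantization tends to shrink coefficient magnitudes while leaving their signs intact, which matches the robustness ranking reported in the abstract.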
Multi-attention-based approach for deepfake face and expression swap detection and localization
IF 2.4 Q4 Computer Science Pub Date : 2023-08-18 DOI: 10.1186/s13640-023-00614-z
Saima Waseem, S. Abu-Bakar, Z. Omar, Bilal Ashfaq Ahmed, Saba Baloch, Adel Hafeezallah
Citations: 1