
EURASIP Journal on Image and Video Processing: Latest Publications

Fast CU size decision and intra-prediction mode decision method for H.266/VVC
IF 2.4, Tier 4 (Computer Science), Pub Date: 2024-03-18, DOI: 10.1186/s13640-024-00622-7

Abstract

H.266/Versatile Video Coding (VVC) is the most recent video coding standard developed by the Joint Video Experts Team (JVET). It introduces the quad-tree with nested multi-type tree (QTMT) architecture, which improves the compression performance of H.266/VVC. Moreover, H.266/VVC contains a greater number of intra-prediction modes than H.265/High Efficiency Video Coding (HEVC), totalling 67. However, these features greatly increase the coding computational complexity. To cope with these issues, a fast intra-coding unit (CU) size decision method and a fast intra-prediction mode decision method are proposed in this paper. Specifically, trained Support Vector Machine (SVM) classifier models are utilized to determine the CU partition mode in the fast CU size decision scheme. Furthermore, the number of intra-prediction modes added to the rate-distortion optimization (RDO) mode set is reduced in the fast intra-prediction mode decision scheme, based on an improved search step. Simulation results illustrate that the proposed overall algorithm reduces encoding runtime by 55.24% with negligible BDBR loss.
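
The core idea of SVM-gated CU partitioning can be sketched as follows. This is a minimal illustration, not the paper's trained model: instead of exhaustively testing every QTMT split via RDO, a linear decision function over cheap block statistics predicts "split" or "no split". The features (mean, variance) and the weights are invented for illustration.

```python
# Sketch of an SVM-style early-termination rule for CU partitioning.
# A trained linear SVM reduces to evaluating sign(w . x + b) over block
# features; the weights and bias below are illustrative stand-ins.

def block_features(block):
    """Return (mean, variance) of a 2D list of pixel values."""
    pixels = [p for row in block for p in row]
    n = len(pixels)
    mean = sum(pixels) / n
    var = sum((p - mean) ** 2 for p in pixels) / n
    return mean, var

def svm_predict_split(block, weights=(0.0, 0.05), bias=-1.0):
    """Linear SVM decision: score > 0 means 'split this CU further'."""
    mean, var = block_features(block)
    score = weights[0] * mean + weights[1] * var + bias
    return score > 0

# Smooth blocks are left whole; textured blocks are partitioned further.
flat = [[100] * 8 for _ in range(8)]
textured = [[(i * 37 + j * 91) % 255 for j in range(8)] for i in range(8)]
print(svm_predict_split(flat), svm_predict_split(textured))
```

The payoff is that a dot product replaces a full RDO evaluation of every candidate split, which is where the runtime saving comes from.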

Citations: 0
Assessment framework for deepfake detection in real-world situations
IF 2.4, Tier 4 (Computer Science), Pub Date: 2024-02-13, DOI: 10.1186/s13640-024-00621-8
Yuhang Lu, Touradj Ebrahimi

Detecting digital face manipulation in images and video has attracted extensive attention due to the potential risk it poses to public trust. To counteract the malicious usage of such techniques, deep learning-based deepfake detection methods have been employed and have exhibited remarkable performance. However, the performance of such detectors is often assessed on benchmarks that hardly reflect real-world situations. For example, the impact of various image and video processing operations and typical workflow distortions on detection accuracy has not been systematically measured. In this paper, a more reliable assessment framework is proposed to evaluate the performance of learning-based deepfake detectors in more realistic settings. To the best of our knowledge, it is the first systematic assessment approach for deepfake detectors that not only reports their general performance under real-world conditions but also quantitatively measures their robustness toward different processing operations. To demonstrate the effectiveness and usage of the framework, extensive experiments and detailed analyses of four popular deepfake detection methods are presented. In addition, a stochastic degradation-based data augmentation method driven by realistic processing operations is designed, which significantly improves the robustness of deepfake detectors.
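
The stochastic degradation idea can be sketched in a few lines. This is our reading of the concept, not the paper's exact pipeline: each training image is passed through one randomly selected realistic processing operation, so the detector also sees noisy, blurred, or coarsely quantized variants. The specific operations and parameters below are illustrative.

```python
import random

# Illustrative stochastic degradation-based augmentation: pick one realistic
# processing operation at random per training sample.

def add_noise(img, rng, sigma=10):
    # additive Gaussian noise, clamped to the 8-bit range
    return [[min(255, max(0, p + rng.gauss(0, sigma))) for p in row] for row in img]

def box_blur(img, rng):
    # 3x3 mean filter with border clipping (rng unused, kept for a uniform signature)
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            neigh = [img[y][x] for y in range(max(0, i - 1), min(h, i + 2))
                               for x in range(max(0, j - 1), min(w, j + 2))]
            out[i][j] = sum(neigh) / len(neigh)
    return out

def quantize(img, rng):
    # coarse quantization as a crude stand-in for recompression artifacts
    step = rng.choice([8, 16, 32])
    return [[(p // step) * step for p in row] for row in img]

def stochastic_degrade(img, seed=None):
    rng = random.Random(seed)
    op = rng.choice([add_noise, box_blur, quantize])
    return op(img, rng)
```

Applying such an operation per sample during training exposes the detector to the distortions it will meet in deployment, which is the source of the robustness gain the abstract reports.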

Citations: 0
Edge-aware nonlinear diffusion-driven regularization model for despeckling synthetic aperture radar images
IF 2.4, Tier 4 (Computer Science), Pub Date: 2024-01-11, DOI: 10.1186/s13640-023-00617-w
Anthony Bua, Goodluck Kapyela, Libe Massawe, Baraka Maiseli

Speckle noise corrupts synthetic aperture radar (SAR) images and limits their applications in sensitive scientific and engineering fields. This challenge has attracted several scholars because of the wide demand for SAR images in forestry, oceanography, geology, glaciology, and topography. Despite significant efforts to address the challenge, an open research question remains: how to simultaneously suppress speckle noise and restore semantic features in SAR images. Therefore, this work establishes a diffusion-driven nonlinear method with edge-awareness capabilities to restore corrupted SAR images while protecting critical image features, such as contours and textures. The proposed method incorporates two terms that promote effective noise removal: (1) a high-order diffusion kernel; and (2) a fractional regularization term that is sensitive to speckle noise. These terms have been carefully designed to ensure that the restored SAR images contain stronger edges and well-preserved textures. Empirical results show that the proposed model produces content-rich images with higher subjective and objective quality scores. Furthermore, our model generates images without the noticeable staircase and block artifacts commonly produced by the classical Perona–Malik and total variation models.
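
For readers unfamiliar with diffusion-driven restoration, the classical building block the paper extends can be sketched as a Perona–Malik style step: each pixel exchanges intensity with its neighbors, scaled by an edge-stopping function that is near 1 in flat areas and near 0 across strong edges. This is the baseline mechanism only; the paper's high-order kernel and fractional regularization term are not reproduced here.

```python
# Minimal edge-aware nonlinear diffusion sketch (Perona-Malik style).

def g(grad, K=20.0):
    """Edge-stopping function: ~1 in flat regions, ~0 across strong edges."""
    return 1.0 / (1.0 + (grad / K) ** 2)

def diffuse(img, steps=10, lam=0.2):
    """Run explicit 4-neighbor diffusion steps on a 2D list of floats."""
    h, w = len(img), len(img[0])
    img = [row[:] for row in img]
    for _ in range(steps):
        nxt = [row[:] for row in img]
        for i in range(h):
            for j in range(w):
                flow = 0.0
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < h and 0 <= nj < w:
                        d = img[ni][nj] - img[i][j]
                        flow += g(abs(d)) * d  # smooth flat areas, preserve edges
                nxt[i][j] = img[i][j] + lam * flow
        img = nxt
    return img

# A speckle spike on a flat background is attenuated toward its surroundings.
noisy = [[100.0] * 5 for _ in range(5)]
noisy[2][2] = 140.0
clean = diffuse(noisy, steps=10)
```

Because `g` shrinks the flow across large intensity differences, a genuine edge diffuses far more slowly than an isolated speckle spike, which is the "edge-aware" behavior the abstract refers to.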

Citations: 0
Multimodal few-shot classification without attribute embedding
IF 2.4, Tier 4 (Computer Science), Pub Date: 2024-01-10, DOI: 10.1186/s13640-024-00620-9
Jun Qing Chang, Deepu Rajan, Nicholas Vun

Multimodal few-shot learning aims to exploit the complementary information inherent in multiple modalities for vision tasks in low-data scenarios. Most current research focuses on finding a suitable embedding space for the various modalities. While solutions based on embedding provide state-of-the-art results, they reduce the interpretability of the model. Separate visualization approaches enable the models to become more transparent. In this paper, a multimodal few-shot learning framework that is inherently interpretable is presented. This is achieved by using the textual modality in the form of attributes without embedding them, which enables the model to directly explain which attributes caused it to classify an image into a particular class. The model consists of a variational autoencoder that learns the visual latent representation, combined with a semantic latent representation learnt by a standard autoencoder, which computes a semantic loss between the latent representation and a binary attribute vector. A decoder reconstructs the original image from the concatenated latent vectors. The proposed model outperforms other multimodal methods when all test classes are used, e.g., 50 classes in a 50-way 1-shot setting, and is comparable for a smaller number of ways. Since raw text attributes are used, the datasets for evaluation are CUB, SUN, and AWA2. The effectiveness of the interpretability provided by the model is evaluated by analyzing how well it has learnt to identify the attributes.
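
The semantic loss between a latent vector and a binary attribute vector can be illustrated with a standard binary cross-entropy, which is one natural reading of such a loss (the paper's exact formulation may differ). Each latent dimension is squashed to a probability and compared against one named 0/1 attribute, which is what keeps the representation interpretable.

```python
import math

# Sketch of a semantic loss: mean binary cross-entropy between sigmoid(latent)
# and a class's binary attribute vector. Attribute names and values invented.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def semantic_loss(latent, attributes, eps=1e-7):
    """Lower when each latent dimension agrees with its 0/1 attribute."""
    total = 0.0
    for z, a in zip(latent, attributes):
        p = min(1 - eps, max(eps, sigmoid(z)))
        total += -(a * math.log(p) + (1 - a) * math.log(1 - p))
    return total / len(latent)

# A latent that agrees with the attributes scores a much lower loss.
attrs = [1, 0, 1, 1, 0]            # e.g. has_wings, has_fur, has_beak, ...
good = [4.0, -4.0, 4.0, 4.0, -4.0]
bad = [-4.0, 4.0, -4.0, -4.0, 4.0]
print(semantic_loss(good, attrs) < semantic_loss(bad, attrs))
```

Because no embedding layer sits between latent dimensions and attributes, each dimension can be read off directly to explain a classification.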

Citations: 0
Secure image transmission through LTE wireless communications systems
IF 2.4, Tier 4 (Computer Science), Pub Date: 2024-01-10, DOI: 10.1186/s13640-024-00619-2
Farouk Abduh Kamil Al-Fahaidy, Radwan AL-Bouthigy, Mohammad Yahya H. Al-Shamri, Safwan Abdulkareem

Secure transmission of images over wireless communications systems can be achieved using RSA, one of the best-known and most efficient cryptographic algorithms, together with OFDMA, a preferred signal processing choice in wireless communications. This paper investigates the performance of OFDMA systems for the wireless transmission of RSA-encrypted images. Specifically, OFDMA systems based on different signal processing techniques, namely discrete sine transforms (DST) and discrete cosine transforms (DCT), as well as the conventional discrete Fourier transforms (DFT), are tested for the wireless transmission of gray-scale images with and without RSA encryption. The image is first encrypted with the RSA algorithm. The encrypted image is then modulated with DFT-based, DCT-based, and DST-based OFDMA systems, and the modulated images are transmitted over a wireless multipath fading channel. The reverse operations are carried out at the receiver, together with frequency-domain equalization to overcome the channel effect. An exhaustive set of scenarios is evaluated to investigate the performance of the different OFDMA systems in terms of PSNR and MSE under different subcarrier mapping and modulation techniques. The results demonstrate the ability of the different OFDMA systems to securely transmit images over wireless channels, with the DCT-OFDMA system showing superiority over the DST-OFDMA and the conventional DFT-OFDMA systems.
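
The transform at the heart of a DCT-based OFDMA modulator can be sketched with an orthonormal DCT-II/DCT-III pair: the forward transform maps (encrypted) samples onto subcarriers, and the inverse recovers them exactly on a clean channel. Encryption, subcarrier mapping, the fading channel, and equalization from the paper are all omitted here; this only illustrates the lossless transform round trip.

```python
import math

# Orthonormal DCT-II (forward) and DCT-III (inverse) on a 1D sample block.

def dct(x):
    N = len(x)
    out = []
    for k in range(N):
        a = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        out.append(a * sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                           for n in range(N)))
    return out

def idct(X):
    N = len(X)
    out = []
    for n in range(N):
        s = 0.0
        for k in range(N):
            a = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
            s += a * X[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
        out.append(s)
    return out

samples = [10.0, 200.0, 55.0, 0.0, 128.0, 33.0, 240.0, 17.0]
recovered = idct(dct(samples))
```

Because the transform pair is orthonormal, any degradation observed at the receiver in the paper's experiments is attributable to the channel and equalization, not the modulation itself.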

Citations: 0
An optimized capsule neural networks for tomato leaf disease classification
IF 2.4, Tier 4 (Computer Science), Pub Date: 2024-01-08, DOI: 10.1186/s13640-023-00618-9
Lobna M. Abouelmagd, Mahmoud Y. Shams, Hanaa Salem Marie, Aboul Ella Hassanien

Plant diseases have a significant impact on leaves, with each disease exhibiting specific spots characterized by unique colors and locations. Therefore, it is crucial to develop a method for detecting these diseases based on spot shape, color, and location within the leaves. While Convolutional Neural Networks (CNNs) have been widely used in deep learning applications, they have limitations in capturing relative spatial and orientation relationships. This paper presents a computer vision methodology that utilizes an optimized capsule neural network (CapsNet) to detect and classify ten tomato leaf diseases using standard dataset images. To mitigate overfitting, data augmentation and preprocessing techniques were employed during the training phase. CapsNet was chosen over CNNs due to its superior ability to capture spatial positioning within the image. The proposed CapsNet approach achieved an accuracy of 96.39% with minimal loss, using the Adam optimizer with a learning rate of 0.00001. By comparing the results with existing state-of-the-art approaches, the study demonstrates the effectiveness of CapsNet in accurately identifying and classifying tomato leaf diseases based on spot shape, color, and location. The findings highlight the potential of CapsNet as an alternative to CNNs for improving disease detection and classification in plant pathology research.
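
A defining piece of CapsNet that underlies the spatial-awareness claim is the "squash" nonlinearity: it shrinks a capsule's output vector to length below 1 while preserving its direction, so the vector's length can act as the probability that an entity (here, a disease spot) is present and its orientation encodes pose. The sketch below shows only this standard function, not the paper's full optimized network.

```python
import math

# The capsule "squash" nonlinearity from capsule networks:
# squash(v) = (|v|^2 / (1 + |v|^2)) * v / |v|

def squash(v, eps=1e-9):
    norm_sq = sum(x * x for x in v)
    norm = math.sqrt(norm_sq) + eps
    scale = norm_sq / (1.0 + norm_sq)
    return [scale * x / norm for x in v]

long_vec = squash([6.0, 8.0])     # |v| = 10  -> output length ~ 0.99
short_vec = squash([0.06, 0.08])  # |v| = 0.1 -> output length ~ 0.01
```

Long input vectors map to lengths near 1 ("entity confidently present") and short ones to lengths near 0, while the direction, and hence the encoded pose, is untouched.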

Citations: 0
Multi-layer features template update object tracking algorithm based on SiamFC++
IF 2.4, Tier 4 (Computer Science), Pub Date: 2024-01-04, DOI: 10.1186/s13640-023-00616-x
Xiaofeng Lu, Xuan Wang, Zhengyang Wang, Xinhong Hei

SiamFC++ only extracts the object features of the first frame as the tracking template, and uses only the highest-level feature maps in both the classification branch and the regression branch, so that the respective characteristics of the two branches are not fully utilized. In view of this, the present paper proposes an object tracking algorithm based on SiamFC++ that uses multi-layer features of the Siamese network to update the template. First, an FPN is used to extract feature maps from different layers of the backbone for the classification branch and the regression branch. Second, 3D convolution is used to update the tracking template of the object tracking algorithm. Next, a template update judgment condition based on mutual information is proposed. Finally, AlexNet is used as the backbone and GOT-10k as the training set. Compared with SiamFC++, our algorithm obtains improved results on the OTB100, VOT2016, VOT2018, and GOT-10k data sets, and the tracking process runs in real time.
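
The shape of a gated template update can be sketched as follows. Both ingredients here are deliberate simplifications of the paper's design: an exponential moving average stands in for the 3D-convolution fusion, and a plain normalized-correlation threshold stands in for the mutual-information judgment condition.

```python
# Simplified gated template update for a Siamese tracker.

def correlation(a, b):
    """Normalized inner product of two flattened feature templates."""
    num = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return num / (na * nb)

def maybe_update(template, new_feat, alpha=0.3, threshold=0.6):
    """Blend in the new frame's features only if they still resemble the template."""
    if correlation(template, new_feat) < threshold:
        return template  # likely occlusion or drift: keep the old template
    return [(1 - alpha) * t + alpha * f for t, f in zip(template, new_feat)]
```

The gate is what prevents the template from being polluted during occlusion: frames that no longer resemble the object are simply not blended in.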

Citations: 0
Subjective performance evaluation of bitrate allocation strategies for MPEG and JPEG Pleno point cloud compression
IF 2.4, Tier 4 (Computer Science), Pub Date: 2024-01-01, Epub Date: 2024-06-11, DOI: 10.1186/s13640-024-00629-0
Davi Lazzarotto, Michela Testolina, Touradj Ebrahimi

The recent rise in interest in point clouds as an imaging modality has motivated standardization groups such as JPEG and MPEG to launch activities aiming at developing compression standards for point clouds. Lossy compression usually introduces visual artifacts that negatively impact the perceived quality of media, which can only be reliably measured through subjective visual quality assessment experiments. While MPEG standards have been subjectively evaluated in previous studies on multiple occasions, no work has yet assessed the performance of the recent JPEG Pleno standard in comparison to them. In this study, a comprehensive performance evaluation of JPEG and MPEG standards for point cloud compression is conducted. The impact of different configuration parameters on the performance of the codecs is first analyzed with the help of objective quality metrics. The results from this analysis are used to define three rate allocation strategies for each codec, which are employed to compress a set of point clouds at four target rates. The set of distorted point clouds is then subjectively evaluated following two subjective quality assessment protocols. Finally, the obtained results are used to compare the performance of these compression standards and draw insights about best coding practices.
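
One plausible shape for a fixed-target rate allocation strategy of the kind the study defines is: given a codec's measured rate points, pick for each target bitrate the configuration whose rate is highest while staying within a small overshoot budget. The rate values, config names, and overshoot margin below are invented for illustration, not taken from the paper.

```python
# Hypothetical rate allocation: choose a codec config per target bitrate.

def allocate(rd_points, target_bpp, overshoot=1.1):
    """rd_points: list of (bits_per_point, config_name) for one codec."""
    candidates = [(r, c) for r, c in rd_points if r <= target_bpp * overshoot]
    if not candidates:
        return min(rd_points)[1]  # nothing fits: fall back to the cheapest config
    return max(candidates)[1]     # highest rate still within budget

points = [(0.1, "r1"), (0.35, "r2"), (1.2, "r3"), (3.0, "r4")]
print([allocate(points, t) for t in (0.15, 0.5, 1.5, 4.0)])
```

Fixing such a rule per codec makes the stimuli comparable across codecs at each target rate, which is a prerequisite for a fair subjective comparison.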

引用次数: 0
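The evaluation above leans on objective quality metrics for point clouds; a common geometry metric is the symmetric point-to-point (D1) PSNR. The sketch below is illustrative only — the peak definition (reference bounding-box diagonal) and the brute-force nearest-neighbour search are assumptions for this toy, not the JPEG/MPEG reference tool:

```python
import numpy as np

def d1_psnr(ref: np.ndarray, deg: np.ndarray) -> float:
    """Symmetric point-to-point (D1) PSNR between two point clouds.

    ref, deg: (N, 3) arrays of XYZ coordinates. The peak is taken here as
    the diagonal of the reference bounding box, a common choice for
    geometry PSNR. Brute-force nearest-neighbour search via broadcasting:
    suitable for small clouds only.
    """
    def one_way_mse(a: np.ndarray, b: np.ndarray) -> float:
        # Squared distance from each point in a to its nearest neighbour in b.
        d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
        return float(d2.min(axis=1).mean())

    # Symmetric error: take the worse of the two one-way measurements.
    mse = max(one_way_mse(ref, deg), one_way_mse(deg, ref))
    if mse == 0.0:
        return float("inf")
    peak = np.linalg.norm(ref.max(axis=0) - ref.min(axis=0))
    return 10.0 * np.log10(peak ** 2 / mse)
```

A higher D1 PSNR means the decoded cloud is geometrically closer to the reference; rate allocation strategies such as those defined in the study trade this kind of quality score off against bitrate.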
Robust steganography in practical communication: a comparative study
Q4 Computer Science Pub Date : 2023-10-31 DOI: 10.1186/s13640-023-00615-y
Tong Qiao, Shengwang Xu, Shuai Wang, Xiaoshuai Wu, Bo Liu, Ning Zheng, Ming Xu, Binmin Pan
Abstract Steganography enables covert communication over a public channel. Modern adaptive steganography currently plays a dominant role due to its high undetectability. However, its effectiveness is challenged when applied in practical communication, such as over social networks. Several robust steganographic methods have been proposed, yet no comparative study between them exists. Thus, we propose a framework that generalizes the current typical steganographic methods resisting compression attacks, and empirically analyze their advantages and disadvantages based on four baseline indicators: capacity, imperceptibility, undetectability, and robustness. More importantly, the robustness of the methods is compared in real applications such as Facebook, Twitter, and WeChat, which has not been comprehensively addressed in this community. In particular, methods that modify the sign of DCT coefficients perform best on social media applications.
Citations: 0
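The abstract singles out methods that modify the sign of DCT coefficients as the most robust on social media. The following toy illustrates that idea only — it is not any of the surveyed schemes, and the block size, coefficient position, and strength parameter are arbitrary assumptions: one bit is carried in the sign of a single mid-frequency DCT coefficient, whose magnitude is pushed away from zero so the sign survives mild distortion.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix (rows = frequencies)."""
    k = np.arange(n)
    C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0] /= np.sqrt(2)
    return C * np.sqrt(2 / n)

def embed_bit(block: np.ndarray, bit: int,
              pos: tuple = (2, 3), strength: float = 8.0) -> np.ndarray:
    """Force the sign of one mid-frequency DCT coefficient to carry a bit
    (positive -> 1, negative -> 0). A real scheme must also survive
    rounding and JPEG requantization, which this toy ignores."""
    C = dct_matrix(block.shape[0])
    coeffs = C @ block @ C.T
    # Keep the magnitude away from zero so the sign is not flipped by noise.
    mag = max(abs(coeffs[pos]), strength)
    coeffs[pos] = mag if bit else -mag
    return C.T @ coeffs @ C  # inverse of an orthonormal transform

def extract_bit(block: np.ndarray, pos: tuple = (2, 3)) -> int:
    C = dct_matrix(block.shape[0])
    coeffs = C @ block @ C.T
    return int(coeffs[pos] > 0)
```

Sign-based embedding is attractive for lossy channels because quantization tends to shrink coefficient magnitudes while leaving their signs intact, which matches the robustness ranking reported in the abstract.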
Multi-attention-based approach for deepfake face and expression swap detection and localization
IF 2.4 Q4 Computer Science Pub Date : 2023-08-18 DOI: 10.1186/s13640-023-00614-z
Saima Waseem, S. Abu-Bakar, Z. Omar, Bilal Ashfaq Ahmed, Saba Baloch, Adel Hafeezallah
Citations: 1