
Eurasip Journal on Image and Video Processing: Latest Publications

Advanced fine-tuning procedures to enhance DNN robustness in visual coding for machines
IF 2.4 | CAS Q4, Computer Science | Pub Date: 2024-09-18 | DOI: 10.1186/s13640-024-00650-3
Alban Marie, Karol Desnos, Alexandre Mercat, Luce Morin, Jarno Vanne, Lu Zhang

Video Coding for Machines (VCM) is gaining momentum in applications like autonomous driving, industry manufacturing, and surveillance, where the robustness of machine learning algorithms against coding artifacts is one of the key success factors. This work complements the MPEG/JVET standardization efforts in improving the resilience of deep neural network (DNN)-based machine models against such coding artifacts by proposing the following three advanced fine-tuning procedures for their training: (1) the progressive increase of the distortion strength as the training proceeds; (2) the incorporation of a regularization term in the original loss function to minimize the distance between predictions on compressed and original content; and (3) a joint training procedure that combines the proposed two approaches. These proposals were evaluated against a conventional fine-tuning anchor on two different machine tasks and datasets: image classification on ImageNet and semantic segmentation on Cityscapes. Our joint training procedure is shown to reduce the training time in both cases and still obtain a 2.4% coding gain in image classification and 7.4% in semantic segmentation, whereas a slight increase in training time can bring up to 9.4% better coding efficiency for the segmentation. All these coding gains are obtained without any additional inference or encoding time. As these advanced fine-tuning procedures are standard-compliant, they offer the potential to have a significant impact on visual coding for machine applications.
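As a concrete illustration of procedures (1)–(3), the sketch below shows one possible fine-tuning step in PyTorch. It is a minimal sketch under stated assumptions: `compress(images, strength)` is a hypothetical stand-in for the codec that introduces coding artifacts, and the linear distortion ramp, the MSE distance between predictions, and the weight `lambda_reg` are illustrative choices rather than the authors' exact configuration.

```python
import torch
import torch.nn.functional as F

def joint_finetune_step(model, optimizer, images, labels, epoch, max_epochs,
                        compress, lambda_reg=0.5):
    """One fine-tuning step combining the three procedures described above.

    `compress(images, strength)` is a hypothetical stand-in for the codec;
    `strength` in [0, 1] controls the distortion level.
    """
    # (1) Progressive distortion: ramp the coding strength up with the epoch.
    strength = min(1.0, epoch / max_epochs)
    degraded = compress(images, strength)

    # Predictions on original and compressed content.
    logits_orig = model(images)
    logits_comp = model(degraded)

    # Task loss on the compressed content.
    task_loss = F.cross_entropy(logits_comp, labels)

    # (2) Regularizer: keep predictions on compressed content close to
    # predictions on the original content.
    reg_loss = F.mse_loss(logits_comp, logits_orig.detach())

    # (3) Joint objective combining both terms.
    loss = task_loss + lambda_reg * reg_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```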

Citations: 0
A novel multiscale cGAN approach for enhanced salient object detection in single haze images
IF 2.4 | CAS Q4, Computer Science | Pub Date: 2024-09-15 | DOI: 10.1186/s13640-024-00648-x
Gayathri Dhara, Ravi Kant Kumar

In computer vision, image dehazing is a low-level task that employs algorithms to analyze and remove haze from images, resulting in haze-free visuals. The aim of Salient Object Detection (SOD) is to locate the most visually prominent areas in images. However, most SOD techniques applied to visible images struggle in complex scenarios characterized by similarities between the foreground and background, cluttered backgrounds, adverse weather conditions, and low lighting. Identifying objects in hazy images is challenging due to the degradation of visibility caused by atmospheric conditions, leading to diminished visibility and reduced contrast. This paper introduces an innovative approach called Dehaze-SOD, a unique integrated model that addresses two vital tasks: dehazing and salient object detection. The key novelty of Dehaze-SOD lies in its dual functionality, seamlessly integrating dehazing and salient object identification into a unified framework. This is achieved using a conditional Generative Adversarial Network (cGAN) comprising two distinct subnetworks: one for image dehazing and another for salient object detection. The first module, designed with residual blocks, Dark Channel Prior (DCP), total variation, and the multiscale Retinex algorithm, processes the input hazy images. The second module employs an enhanced EfficientNet architecture with added attention mechanisms and pixel-wise refinement to further improve the dehazing process. The outputs from these subnetworks are combined to produce dehazed images, which are then fed into our proposed encoder–decoder framework for salient object detection. The cGAN is trained with two modules working together: the generator aims to produce haze-free images, whereas the discriminator distinguishes between the generated haze-free images and real haze-free images. Dehaze-SOD demonstrates superior performance compared to state-of-the-art dehazing methods in terms of color fidelity, visibility enhancement, and haze removal. The proposed method effectively produces high-quality, haze-free images from various hazy inputs and accurately detects salient objects within them. This makes Dehaze-SOD a promising tool for improving salient object detection in challenging hazy conditions. The effectiveness of our approach has been validated using benchmark evaluation metrics such as mean absolute error (MAE), peak signal-to-noise ratio (PSNR), and structural similarity index measure (SSIM).
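Among the components listed for the first module, the Dark Channel Prior is a standard, easily sketched piece. The NumPy sketch below follows the classical formulation (per-pixel channel minimum followed by a local minimum filter, then a coarse transmission estimate); the patch size and omega are the common defaults from the DCP literature, not necessarily the values used in Dehaze-SOD.

```python
import numpy as np

def dark_channel(img, patch=15):
    """Classical Dark Channel Prior: per-pixel minimum over RGB, then a
    minimum filter over a local patch; `img` is HxWx3 with values in [0, 1]."""
    min_rgb = img.min(axis=2)
    pad = patch // 2
    padded = np.pad(min_rgb, pad, mode="edge")
    h, w = min_rgb.shape
    out = np.empty_like(min_rgb)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out

def estimate_transmission(img, atmosphere, omega=0.95, patch=15):
    """Coarse transmission map t(x) = 1 - omega * dark_channel(I / A),
    where `atmosphere` is the estimated atmospheric light per channel."""
    normalized = img / np.maximum(atmosphere, 1e-6)
    return 1.0 - omega * dark_channel(normalized, patch)
```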

Citations: 0
Optimization of parameters for image denoising algorithm pertaining to generalized Caputo-Fabrizio fractional operator
IF 2.4 | CAS Q4, Computer Science | Pub Date: 2024-09-13 | DOI: 10.1186/s13640-024-00632-5
S. Gaur, A. M. Khan, D. L. Suthar

The aim of the present paper is to optimize the values of different parameters related to an image denoising algorithm involving the Caputo-Fabrizio fractional integral operator of non-singular type with the Mittag-Leffler function in generalized form. The algorithm aims to find the coefficients of a kernel that removes noise from images. The optimization of the kernel coefficients is done on the basis of different numerical parameters, such as Mean Square Error (MSE), Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index Measure (SSIM), and Image Enhancement Factor (IEF). The performance of the proposed algorithm is investigated through the above-mentioned numerical parameters and visual perception, in comparison with other prevailing algorithms. Experimental results demonstrate that the proposed optimized kernel based on the generalized fractional operator performs favorably compared to state-of-the-art methods. The uniqueness of the paper is to highlight the optimized values of the performance parameters for different values of the fractional order. The novelty of the presented work lies in the development of a kernel utilizing coefficients from a fractional integral operator, specifically involving the Mittag-Leffler function in a more generalized form.
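To make the parameter-optimization step concrete, here is a minimal sketch of selecting the fractional order by maximizing PSNR on a reference image pair. It assumes a hypothetical `build_kernel(alpha)` that stands in for constructing the denoising kernel from the generalized fractional-operator coefficients; the grid of orders and the use of PSNR alone (rather than MSE, SSIM, and IEF together) are simplifications.

```python
import numpy as np
from scipy.signal import convolve2d

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((reference.astype(float) - estimate.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def best_fractional_order(noisy, clean, build_kernel, orders):
    """Pick the fractional order whose kernel maximizes PSNR on a reference pair.

    `build_kernel(alpha)` is a hypothetical stand-in for building the denoising
    kernel from the fractional-operator coefficients at order `alpha`.
    """
    scores = {}
    for alpha in orders:
        kernel = build_kernel(alpha)
        denoised = convolve2d(noisy, kernel, mode="same", boundary="symm")
        scores[alpha] = psnr(clean, denoised)
    return max(scores, key=scores.get), scores
```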

Citations: 0
Utility-based performance evaluation of biometric sample quality measures
IF 2.4 | CAS Q4, Computer Science | Pub Date: 2024-09-09 | DOI: 10.1186/s13640-024-00644-1
Olaf Henniger, Biying Fu, Alexander Kurz

The quality score of a biometric sample is intended to predict the sample’s degree of utility for biometric recognition. Different authors have proposed different definitions of utility. A harmonized definition of utility would be useful to facilitate the comparison of biometric sample quality assessment algorithms. In this article, we compare different definitions of utility and apply them to both face image and fingerprint image data sets containing multiple samples per biometric instance and covering a wide range of potential quality issues. The results differ only slightly. We show that discarding samples with low utility scores results in rapidly declining false non-match rates. The obtained utility scores can be used as target labels for training biometric sample quality assessment algorithms and as a baseline when summarizing utility-prediction performance in a single plot or even in a single figure of merit.
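The reported effect of discarding low-utility samples can be illustrated with an error-versus-reject style computation. The sketch below is a minimal version under simplifying assumptions: one quality (utility) score per mated comparison and a fixed decision threshold; the variable names and rejection fractions are illustrative.

```python
import numpy as np

def fnmr_vs_reject(mated_scores, quality_scores, threshold, reject_fractions):
    """False non-match rate after discarding the lowest-quality samples.

    mated_scores:   comparison scores of mated (same-subject) pairs
    quality_scores: quality/utility score attached to each comparison
    threshold:      decision threshold of the comparison subsystem
    """
    mated_scores = np.asarray(mated_scores, dtype=float)
    order = np.argsort(quality_scores)          # worst quality first
    sorted_scores = mated_scores[order]
    results = {}
    for frac in reject_fractions:
        kept = sorted_scores[int(frac * len(sorted_scores)):]
        results[frac] = float(np.mean(kept < threshold)) if len(kept) else 0.0
    return results

# Example: FNMR should drop as more low-quality samples are discarded.
# fnmr_vs_reject(scores, qualities, threshold=0.4,
#                reject_fractions=[0.0, 0.05, 0.10, 0.20])
```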

Citations: 0
Beyond the visible: thermal data for facial soft biometric estimation
IF 2.4 | CAS Q4, Computer Science | Pub Date: 2024-09-06 | DOI: 10.1186/s13640-024-00640-5
Nelida Mirabet-Herranz, Jean-Luc Dugelay

In recent years, the estimation of biometric parameters from facial visuals, including images and videos, has emerged as a prominent area of research. However, the robustness of deep learning-based models is challenged, particularly in the presence of changing illumination conditions. To overcome these limitations and unlock new opportunities, thermal imagery has arisen as a viable alternative. Nevertheless, the limited availability of datasets containing thermal data and the small number of annotations on them limit the exploration of this spectrum. Motivated by this gap, this paper introduces the Label-EURECOM Visible and Thermal (LVT) Face Dataset for face biometrics. This pioneering dataset includes paired visible and thermal images and videos from 52 subjects, along with metadata for 22 soft biometric and health parameters. Given the small number of existing datasets in this domain, the LVT Face Dataset aims to facilitate further research and advancements in the use of thermal imagery for diverse eHealth applications and soft biometric estimation. Moreover, we present the first comparative study of visible and thermal spectra as input images for soft biometric estimation, namely gender, age, and weight, from face images on our collected dataset.

Citations: 0
Contactless hand biometrics for forensics: review and performance benchmark
IF 2.4 | CAS Q4, Computer Science | Pub Date: 2024-09-05 | DOI: 10.1186/s13640-024-00642-3
Lazaro Janier Gonzalez-Soler, Kacper Marek Zyla, Christian Rathgeb, Daniel Fischer

Contactless hand biometrics has emerged as an alternative to traditional biometric characteristics, e.g., fingerprint or face, as it possesses distinctive properties that are of interest in forensic investigations. As a result, several hand-based recognition techniques have been proposed with the aim of identifying both wanted criminals and missing victims. The great success of deep neural networks and their application in a variety of computer vision and pattern recognition tasks has led to hand-based algorithms achieving high identification performance on controlled images with few variations in, e.g., background context and hand gestures. This article provides a comprehensive review of the scientific literature focused on contactless hand biometrics together with an in-depth analysis of the identification performance of freely available deep learning-based hand recognition systems under various scenarios. Based on the performance benchmark, the relevant technical considerations and trade-offs of state-of-the-art methods are discussed, as well as further topics related to this research field.

Citations: 0
Face image de-identification based on feature embedding
IF 2.4 | CAS Q4, Computer Science | Pub Date: 2024-09-02 | DOI: 10.1186/s13640-024-00646-z
Goki Hanawa, Koichi Ito, Takafumi Aoki

A large number of images are available on the Internet with the growth of social networking services, and many of them are face photos or contain faces. It is necessary to protect the privacy of face images and prevent their malicious use by applying face image de-identification techniques that make face recognition difficult, thereby preventing the collection of specific face images through face recognition. In this paper, we propose a face image de-identification method that generates a de-identified image from an input face image by embedding facial features extracted from the face image of another person into the input face image. We develop a novel framework for embedding facial features into a face image, together with loss functions based on images and features, to de-identify a face image while preserving its appearance. Through a set of experiments using public face image datasets, we demonstrate that the proposed method exhibits higher de-identification performance against unknown face recognition models than conventional methods while preserving the appearance of the input face images.
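The abstract describes loss functions based on both images and features; a minimal sketch of one such combination is shown below. It assumes a fixed, generic `face_encoder` for the feature terms and uses an L1 appearance term with cosine-similarity feature terms; these particular terms and weights are illustrative and not the framework defined in the paper.

```python
import torch
import torch.nn.functional as F

def deid_loss(generated, source_img, source_feat, donor_feat, face_encoder,
              w_app=1.0, w_id=1.0):
    """Illustrative combination of an appearance term and a feature term.

    face_encoder: any fixed face-feature extractor (an assumption; the paper
    defines its own embedding framework and loss functions).
    """
    # Appearance loss: the de-identified image should still look like the input.
    appearance = F.l1_loss(generated, source_img)

    # Feature loss: move the generated face's embedding toward the donor's
    # features and away from the source identity.
    gen_feat = face_encoder(generated)
    to_donor = 1.0 - F.cosine_similarity(gen_feat, donor_feat, dim=-1).mean()
    from_src = F.cosine_similarity(gen_feat, source_feat, dim=-1).mean()

    return w_app * appearance + w_id * (to_donor + from_src)
```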

Citations: 0
Comprehensive multiparametric analysis of human deepfake speech recognition
IF 2.4 | CAS Q4, Computer Science | Pub Date: 2024-08-30 | DOI: 10.1186/s13640-024-00641-4
Kamil Malinka, Anton Firc, Milan Šalko, Daniel Prudký, Karolína Radačovská, Petr Hanáček

In this paper, we undertake a novel two-pronged investigation into the human recognition of deepfake speech, addressing critical gaps in existing research. First, we pioneer an evaluation of the impact of prior information on deepfake recognition, setting our work apart by simulating real-world attack scenarios where individuals are not informed in advance of deepfake exposure. This approach simulates the unpredictability of real-world deepfake attacks, providing unprecedented insights into human vulnerability under realistic conditions. Second, we introduce a novel metric to evaluate the quality of deepfake audio. This metric facilitates a deeper exploration into how the quality of deepfake speech influences human detection accuracy. By examining both the effect of prior knowledge about deepfakes and the role of deepfake speech quality, our research reveals the importance of these factors, contributes to understanding human vulnerability to deepfakes, and suggests measures to enhance human detection skills.

Citations: 0
A method for image–text matching based on semantic filtering and adaptive adjustment
IF 2.4 | CAS Q4, Computer Science | Pub Date: 2024-08-29 | DOI: 10.1186/s13640-024-00639-y
Ran Jin, Tengda Hou, Tao Jin, Jie Yuan, Chenjie Du

As image–text matching (a critical task in the field of computer vision) links cross-modal data, it has captured extensive attention. Most of the existing methods for matching images and texts explore local similarity levels between images and sentences to align images with texts. Even though this fine-grained approach yields remarkable gains, how to further mine the deep semantics between data pairs and focus on the essential semantics in the data remains an open question. In this work, a new semantic filtering and adaptive approach (FAAR) is proposed to address this problem. To be specific, the filtered attention (FA) module selectively focuses on typical alignments, with the interference of meaningless comparisons eliminated. Next, the adaptive regulator (AR) further adjusts the attention weights of key segments for the filtered regions and words. The superiority of the proposed method was validated by a number of qualitative experiments and analyses on the Flickr30K and MSCOCO data sets.
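A rough sketch of the filtering idea behind the FA module is given below: cross-modal attention in which region-word similarities below a threshold are suppressed before normalization. The threshold, temperature, and tensor shapes are assumptions for illustration; the paper's exact FA and AR formulations may differ.

```python
import torch
import torch.nn.functional as F

def filtered_attention(region_feats, word_feats, keep_threshold=0.0, temperature=9.0):
    """Cross-modal attention that drops weak region-word alignments.

    region_feats: (num_regions, dim) image-region features
    word_feats:   (num_words, dim)   word features
    The thresholding step illustrates keeping only typical alignments while
    eliminating meaningless comparisons.
    """
    regions = F.normalize(region_feats, dim=-1)
    words = F.normalize(word_feats, dim=-1)
    sim = words @ regions.t()                     # (num_words, num_regions)

    # Filter: suppress alignments whose similarity is not informative.
    sim = torch.where(sim > keep_threshold, sim, torch.full_like(sim, -1e4))

    attn = F.softmax(temperature * sim, dim=-1)   # attend over regions per word
    attended = attn @ region_feats                # (num_words, dim)
    return attended, attn
```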

Citations: 0
Research on facial expression recognition algorithm based on improved MobileNetV3
IF 2.4 | CAS Q4, Computer Science | Pub Date: 2024-08-22 | DOI: 10.1186/s13640-024-00638-z
Bin Jiang, Nanxing Li, Xiaomei Cui, Qiuwen Zhang, Huanlong Zhang, Zuhe Li, Weihua Liu

To address the problems that face images are easily affected by occlusion in uncontrolled environments and that the complex structure of traditional convolutional neural networks leads to low expression recognition rates, slow convergence, and long training times, an improved lightweight convolutional neural network is proposed for facial expression recognition. First, dilated convolution is introduced into the shortcut connection of the inverted residual structure in the MobileNetV3 network to expand the receptive field of the convolution kernel and reduce the loss of expression features. Then, the channel attention mechanism SENet in the network is replaced by SimAM, a parameter-free two-dimensional (channel and spatial) attention mechanism, to reduce the number of network parameters. Finally, in the normalization operation, the Batch Normalization of the backbone network is replaced with Group Normalization, which is stable across various batch sizes, to reduce errors caused by processing small batches of data. Experimental results on the RaFD, FER2013, and FER2013Plus facial expression data sets show that the network reduces training time while maintaining accuracy, improves convergence speed, and exhibits good convergence behavior.
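Both replacements described above are drop-in modules and easy to sketch. The SimAM block below follows the commonly published parameter-free formulation, and the final comment shows the Batch Normalization to Group Normalization swap; the group count is an illustrative choice, not a value taken from the paper.

```python
import torch
import torch.nn as nn

class SimAM(nn.Module):
    """Parameter-free SimAM attention, as referenced above."""

    def __init__(self, e_lambda=1e-4):
        super().__init__()
        self.e_lambda = e_lambda

    def forward(self, x):
        # x: (batch, channels, height, width)
        n = x.shape[2] * x.shape[3] - 1
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)
        v = d.sum(dim=(2, 3), keepdim=True) / n
        energy_inv = d / (4 * (v + self.e_lambda)) + 0.5
        return x * torch.sigmoid(energy_inv)

# Swapping Batch Normalization for Group Normalization, as described, could be:
# norm = nn.GroupNorm(num_groups=8, num_channels=channels)
```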

Citations: 0