eDifFIQA: Towards Efficient Face Image Quality Assessment Based on Denoising Diffusion Probabilistic Models

IEEE Transactions on Biometrics, Behavior, and Identity Science. Pub Date: 2024-03-12. DOI: 10.1109/TBIOM.2024.3376236

Žiga Babnik; Peter Peer; Vitomir Štruc

Volume 6, Issue 4, pp. 458-474. Full text: https://ieeexplore.ieee.org/document/10468647/


Abstract

State-of-the-art Face Recognition (FR) models perform well in constrained scenarios but frequently fail in difficult real-world scenarios, where no quality guarantees can be made for face samples. For this reason, FR systems often use Face Image Quality Assessment (FIQA) techniques to provide quality estimates of captured face samples. The FR system can use these quality estimates to reject low-quality samples, in turn improving the performance of the system and reducing the number of critical false-match errors. However, despite steady improvements, ensuring a good trade-off between the performance and computational complexity of FIQA methods across diverse face samples remains challenging. In this paper, we present DifFIQA, a powerful unsupervised approach for quality assessment based on the popular denoising diffusion probabilistic models (DDPMs), and its extended variant, eDifFIQA. The main idea of the base DifFIQA approach is to use the forward and backward processes of DDPMs to perturb facial images and to quantify the impact of these perturbations on the corresponding image embeddings for quality prediction. Because of the iterative nature of DDPMs, the base DifFIQA approach is extremely computationally expensive. With eDifFIQA, we improve on both the performance and the computational complexity of the base DifFIQA approach by employing label-optimized knowledge distillation. In this process, quality information inferred by DifFIQA is distilled into a quality-regression model. During distillation, we use an additional source of quality information hidden in the relative position of the embedding to further improve the predictive capabilities of the underlying regression model. By choosing different feature-extraction backbone models as the basis for the quality-regression eDifFIQA model, we can control the trade-off between the predictive capabilities and the computational complexity of the final model. We evaluate three eDifFIQA variants of varying sizes in comprehensive experiments on 7 diverse datasets containing static images and on a separate video-based dataset, with 4 target CNN-based FR models and 2 target Transformer-based FR models, and against 10 state-of-the-art FIQA techniques, as well as against the initial DifFIQA baseline and a simple regression-based predictor, DifFIQA(R), distilled from DifFIQA without any additional optimization. The results show that the proposed label-optimized knowledge distillation improves on the performance and computational complexity of the base DifFIQA approach and achieves state-of-the-art performance in several distinct experimental scenarios. Furthermore, we show that the distilled model can be used directly for face recognition and leads to highly competitive results.
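The perturbation idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it only assumes that quality is estimated from how much an embedding moves when the image is noised (forward process) and then restored (backward process). The names `perturbation_quality`, `toy_embed`, and `toy_denoise` are hypothetical, the embedder is a random linear projection rather than an FR model, and the "denoiser" is a simple clipping stand-in for a trained DDPM. Averaging the cosine similarities of both the noisy and the restored image is likewise an assumption made for illustration.

```python
import numpy as np


def forward_diffuse(x0, alpha_bar, rng):
    """Closed-form DDPM forward step: x_t = sqrt(a) * x0 + sqrt(1 - a) * noise."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def perturbation_quality(image, embed, denoise, alpha_bar=0.9, repeats=8, seed=0):
    """Hypothetical quality score: mean similarity of perturbed embeddings
    to the embedding of the unperturbed image (higher = more stable)."""
    rng = np.random.default_rng(seed)
    reference = embed(image)
    sims = []
    for _ in range(repeats):
        noisy = forward_diffuse(image, alpha_bar, rng)  # forward process
        restored = denoise(noisy)                       # backward-process stand-in
        sims.append(cosine_similarity(reference, embed(noisy)))
        sims.append(cosine_similarity(reference, embed(restored)))
    return float(np.mean(sims))


def reject_low_quality(scores, threshold):
    """Boolean mask of samples an FR system would keep (score >= threshold)."""
    return np.asarray(scores) >= threshold


# Toy stand-ins (assumptions): a fixed random projection as the "FR embedder"
# and clipping to [0, 1] as the "denoiser".
_proj = np.random.default_rng(42).standard_normal((16, 64))


def toy_embed(image):
    return _proj @ image.ravel()


def toy_denoise(noisy):
    return np.clip(noisy, 0.0, 1.0)


image = np.random.default_rng(1).random((8, 8))
score = perturbation_quality(image, toy_embed, toy_denoise)
print("quality score:", score)
print("keep mask:", reject_low_quality([0.2, 0.7, 0.95], threshold=0.5))
```

The repeated noising loop also hints at why the base approach is costly: each score requires several diffusion passes and embedding computations, which is the cost that distillation into a single regression model is meant to avoid.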


