Self-supervised random mask attention GAN in tackling pose-invariant face recognition

Pattern Recognition | IF 7.5 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-02 | DOI: 10.1016/j.patcog.2024.111112
Jiashu Liao, Tanaya Guha, Victor Sanchez
Citations: 0

Abstract

Pose Invariant Face Recognition (PIFR) has significantly advanced with Generative Adversarial Networks (GANs), which rotate face images acquired at any angle to a frontal view for enhanced recognition. However, such frontalization methods typically need ground-truth frontal-view images, often collected under strict laboratory conditions, making it challenging and costly to acquire the necessary training data. Additionally, traditional self-supervised PIFR methods rely on external rendering models for training, further complicating the overall training process. To tackle these two issues, we propose a new framework called Mask Rotate. Our framework introduces a novel training approach that requires no paired ground truth data for the face image frontalization task. Moreover, it eliminates the need for an external rendering model during training. Specifically, our framework simplifies the face image frontalization task by transforming it into a face image completion task. During the inference or testing stage, it employs a reliable pre-trained rendering model to obtain a frontal-view face image, which may have several regions with missing texture due to pose variations and occlusion. Our framework then uses a novel self-supervised Random Mask Attention Generative Adversarial Network (RMAGAN) to fill in these missing regions by considering them as randomly masked regions. Furthermore, our proposed Mask Rotate framework uses a reliable post-processing model designed to improve the visual quality of the face images after frontalization. In comprehensive experiments, the Mask Rotate framework eliminates the requirement for complex computations during training and achieves strong results, both qualitative and quantitative, compared to the state-of-the-art.
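The core self-supervised idea in the abstract, treating frontalization as image completion by hiding random regions of an unpaired face image and training a network to restore them, can be illustrated with a short sketch. This is not the authors' implementation; the function names, patch counts, and mask shapes below are illustrative assumptions, and a real RMAGAN pipeline would feed such masked/target pairs to a generator and discriminator.

```python
import numpy as np

def random_mask(h, w, num_patches=4, max_frac=0.25, rng=None):
    """Binary mask (1 = visible, 0 = hidden) with random rectangular holes,
    standing in for the texture regions lost to pose variation and occlusion."""
    rng = rng if rng is not None else np.random.default_rng()
    mask = np.ones((h, w), dtype=np.float32)
    for _ in range(num_patches):
        ph = int(rng.integers(1, max(2, int(h * max_frac))))
        pw = int(rng.integers(1, max(2, int(w * max_frac))))
        y = int(rng.integers(0, h - ph + 1))
        x = int(rng.integers(0, w - pw + 1))
        mask[y:y + ph, x:x + pw] = 0.0
    return mask

def make_training_pair(image, rng=None):
    """Self-supervised pair: the masked image is the network input and the
    original (unpaired, non-frontal) image is its own reconstruction target,
    so no ground-truth frontal view is ever needed during training."""
    mask = random_mask(image.shape[0], image.shape[1], rng=rng)
    masked = image * mask[..., None]
    return masked, mask, image
```

At inference, per the abstract, the holes would instead come from a pre-trained rendering model's frontalized output, and the trained completion network fills them in.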
Source journal: Pattern Recognition
Category: Engineering & Technology - Engineering: Electrical & Electronic
CiteScore: 14.40
Self-citation rate: 16.20%
Articles published per year: 683
Review time: 5.6 months
Journal introduction: The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.
Latest articles from this journal:
- Learning accurate and enriched features for stereo image super-resolution
- Semi-supervised multi-view feature selection with adaptive similarity fusion and learning
- DyConfidMatch: Dynamic thresholding and re-sampling for 3D semi-supervised learning
- CAST: An innovative framework for Cross-dimensional Attention Structure in Transformers
- Embedded feature selection for robust probability learning machines