Unmasking deepfakes: Eye blink pattern analysis using a hybrid LSTM and MLP-CNN model

IF 4.2 | Zone 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Image and Vision Computing | Pub Date: 2025-02-01 | DOI: 10.1016/j.imavis.2024.105370

Ruchika Sharma, Rudresh Dwivedi

Image and Vision Computing, vol. 154, Article 105370. Journal Article. https://www.sciencedirect.com/science/article/pii/S026288562400475X

Abstract

Recent advances in computer vision have produced robust tools for creating convincing deepfakes. The spread of such fake media can harm social communities, tarnish the reputation of individuals or groups, manipulate public sentiment, and skew opinions about the affected entities. Recent research identifies Convolutional Neural Networks (CNNs) as a viable solution for detecting deepfakes. However, existing techniques struggle to accurately capture the differences between frames in the collected media streams. To address this limitation, our work proposes a new deepfake detection approach based on a hybrid of a Multi-layer Perceptron Convolutional Neural Network (MLP-CNN) and a Long Short-Term Memory (LSTM) network. Our model first applies Contrast Limited Adaptive Histogram Equalization (CLAHE) (Musa et al., 2018) to enhance image contrast, and then applies the Viola-Jones Algorithm (VJA) (Paul et al., 2018) to the preprocessed image to detect the face. The extracted features, including Improved Eye Blinking Pattern Detection (IEBPD), Active Shape Model (ASM), face-attribute, and eye-attribute features, together with the age and gender of the corresponding image, are fed to the hybrid deepfake detection model, which comprises two classifiers: the MLP-CNN and the LSTM. The model is trained on these features to detect deepfake images. We evaluate the proposed hybrid model on two datasets: the World Leader Dataset (WLDR) and the DeepfakeTIMIT dataset. The experimental results indicate that it outperforms existing approaches such as DeepVision, DNN (Deep Neural Network), CNN (Convolutional Neural Network), RNN (Recurrent Neural Network), DeepMaxout, DBN (Deep Belief Network), and Bi-GRU (Bi-Directional Gated Recurrent Unit).
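The abstract does not say how the IEBPD feature is computed. As a rough illustration of what an eye-blink pattern feature can look like, the sketch below uses the eye aspect ratio (EAR), a widely used blink measure, as a stand-in; it is not the authors' method. The six-landmark eye layout, the function names `eye_aspect_ratio` and `count_blinks`, the threshold of 0.2, and the two-frame minimum are all assumptions made for this example.

```python
from math import dist  # available in Python 3.8+


def eye_aspect_ratio(eye):
    """Return the eye aspect ratio for six (x, y) eye landmarks.

    Assumed landmark order: p1 and p4 are the eye corners, p2 and p3 lie on
    the upper lid, p6 and p5 lie on the lower lid (p2 above p6, p3 above p5).
    The ratio shrinks toward 0 as the eyelids close.
    """
    p1, p2, p3, p4, p5, p6 = eye
    vertical = dist(p2, p6) + dist(p3, p5)   # two lid-to-lid distances
    horizontal = dist(p1, p4)                # corner-to-corner distance
    return vertical / (2.0 * horizontal)


def count_blinks(ear_series, threshold=0.2, min_frames=2):
    """Count blinks in a per-frame EAR sequence.

    A blink is a run of at least `min_frames` consecutive frames with EAR
    below `threshold`, followed by a frame at or above it. A closed run that
    is still in progress when the sequence ends is not counted.
    Both parameter defaults are hypothetical and would need tuning.
    """
    blinks = 0
    closed_run = 0
    for ear in ear_series:
        if ear < threshold:
            closed_run += 1
        else:
            if closed_run >= min_frames:
                blinks += 1
            closed_run = 0
    return blinks


if __name__ == "__main__":
    # Synthetic open-eye landmarks, for illustration only.
    open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
    print(eye_aspect_ratio(open_eye))
    print(count_blinks([0.3, 0.3, 0.1, 0.1, 0.3, 0.1, 0.3, 0.15, 0.15, 0.15, 0.3]))
```

A per-video blink count or blink rate derived this way is one plausible input to a temporal classifier such as an LSTM, since synthesized faces may show irregular blinking over time. How the paper combines its features with the MLP-CNN and LSTM is not specified in the abstract.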

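Source journal

Image and Vision Computing (Engineering & Technology; Engineering: Electrical & Electronic)

CiteScore: 8.50

Self-citation rate: 8.50%

Articles published: 143

Review time: 7.8 months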

Journal description: The primary aim of Image and Vision Computing is to provide an effective medium of interchange for the results of high-quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real-world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.