Human performance in detecting deepfakes: A systematic review and meta-analysis of 56 papers

Computers in Human Behavior Reports, Volume 16, Article 100538. Published 2024-12-01. DOI: 10.1016/j.chbr.2024.100538

Alexander Diel, Tania Lalgi, Isabel Carolin Schröter, Karl F. MacDorman, Martin Teufel, Alexander Bäuerle

Abstract

Deepfakes are AI-generated media designed to look real, often with the intent to deceive. Deepfakes threaten public and personal safety by facilitating disinformation, propaganda, and identity theft. Though research has been conducted on human performance in deepfake detection, the results have not yet been synthesized. This systematic review and meta-analysis investigates human deepfake detection accuracy. Searches in PubMed, ScienceGov, JSTOR, Google Scholar, and paper references, conducted in June and October 2024, identified empirical studies measuring human detection of high-quality deepfakes. After pooling accuracy, odds-ratio, and sensitivity (d') effect sizes (k = 137 effects) from 56 papers involving 86,155 participants, we analyzed 1) overall deepfake detection performance, 2) performance across stimulus types (audio, image, text, and video), and 3) the effects of detection-improvement strategies. Overall deepfake detection rates (sensitivity) were not significantly above chance because 95% confidence intervals crossed 50%. Total deepfake detection accuracy was 55.54% (95% CI [48.87, 62.10], k = 67). For audio, accuracy was 62.08% [38.23, 83.18], k = 8; for images, 53.16% [42.12, 64.64], k = 18; for text, 52.00% [37.42, 65.88], k = 15; and for video, 57.31% [47.80, 66.57], k = 26. Odds ratios were 0.64 [0.52, 0.79], k = 62, indicating 39% detection accuracy, below chance (audio 45%, image 35%, text 40%, video 40%). Moreover, d' values show no significant difference from chance. However, strategies like feedback training, AI support, and deepfake caricaturization improved detection performance above chance levels (65.14% [55.21, 74.46], k = 15), especially for video stimuli.
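
The abstract reports results on three scales (percent accuracy, odds ratios, and d') and judges each against chance by checking whether its 95% confidence interval includes the chance value. The short Python sketch below makes that arithmetic explicit. It assumes the odds-ratio-to-accuracy conversion p = OR / (1 + OR), which reproduces the reported 39% from OR = 0.64, although the paper's exact procedure may differ. It uses the standard signal-detection definition d' = z(hit rate) - z(false-alarm rate), where d' = 0 means chance. The function names are hypothetical, and the hit and false-alarm rates in the tests are illustrative only, not data from the paper.

```python
from statistics import NormalDist


def odds_ratio_to_accuracy(odds_ratio: float) -> float:
    """Convert odds of a correct response into a proportion correct.

    Assumption: p = OR / (1 + OR), so OR = 1 corresponds to 50% (chance).
    """
    if odds_ratio <= 0:
        raise ValueError("odds ratio must be positive")
    return odds_ratio / (1.0 + odds_ratio)


def ci_excludes_chance(lower: float, upper: float, chance: float = 50.0) -> bool:
    """Return True if the whole confidence interval lies on one side of chance."""
    return not (lower <= chance <= upper)


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Standard signal-detection sensitivity: d' = z(H) - z(FA); 0 means chance."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)


if __name__ == "__main__":
    # Pooled odds ratio and its 95% CI bounds, as reported in the abstract.
    for label, value in [("pooled", 0.64), ("lower", 0.52), ("upper", 0.79)]:
        print(f"{label}: OR {value:.2f} -> {odds_ratio_to_accuracy(value):.1%}")

    # Accuracy CIs from the abstract (percent); chance is 50%.
    print(ci_excludes_chance(48.87, 62.10))  # overall accuracy: False, CI includes 50%
    print(ci_excludes_chance(55.21, 74.46))  # with improvement strategies: True

    # On the odds-ratio scale chance is 1.0; [0.52, 0.79] lies entirely below it.
    print(ci_excludes_chance(0.52, 0.79, chance=1.0))  # True
```

Under the assumed conversion, the odds-ratio interval [0.52, 0.79] maps to roughly 34% to 44% accuracy. That range excludes 50%, which is consistent with the abstract's statement that odds-ratio-based accuracy was below chance, while the overall accuracy interval of [48.87, 62.10] includes 50% and is therefore not significantly above chance.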
