Human performance in detecting deepfakes: A systematic review and meta-analysis of 56 papers

Computers in Human Behavior Reports (IF 4.9; JCR Q1, Psychology, Experimental). Pub Date: 2024-12-01. DOI: 10.1016/j.chbr.2024.100538
Alexander Diel, Tania Lalgi, Isabel Carolin Schröter, Karl F. MacDorman, Martin Teufel, Alexander Bäuerle
Volume 16, Article 100538. Full text: https://www.sciencedirect.com/science/article/pii/S2451958824001714

Abstract

Deepfakes are AI-generated media designed to look real, often with the intent to deceive. Deepfakes threaten public and personal safety by facilitating disinformation, propaganda, and identity theft. Though research has been conducted on human performance in deepfake detection, the results have not yet been synthesized. This systematic review and meta-analysis investigates human deepfake detection accuracy. Searches in PubMed, ScienceGov, JSTOR, Google Scholar, and paper references, conducted in June and October 2024, identified empirical studies measuring human detection of high-quality deepfakes. After pooling accuracy, odds-ratio, and sensitivity (d') effect sizes (k = 137 effects) from 56 papers involving 86,155 participants, we analyzed 1) overall deepfake detection performance, 2) performance across stimulus types (audio, image, text, and video), and 3) the effects of detection-improvement strategies. Overall deepfake detection rates (sensitivity) were not significantly above chance because 95% confidence intervals crossed 50%. Total deepfake detection accuracy was 55.54% (95% CI [48.87, 62.10], k = 67). For audio, accuracy was 62.08% [38.23, 83.18], k = 8; for images, 53.16% [42.12, 64.64], k = 18; for text, 52.00% [37.42, 65.88], k = 15; and for video, 57.31% [47.80, 66.57], k = 26. Odds ratios were 0.64 [0.52, 0.79], k = 62, indicating 39% detection accuracy, below chance (audio 45%, image 35%, text 40%, video 40%). Moreover, d' values show no significant difference from chance. However, strategies like feedback training, AI support, and deepfake caricaturization improved detection performance above chance levels (65.14% [55.21, 74.46], k = 15), especially for video stimuli.
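The arithmetic behind two statements in the abstract can be sketched in a few lines of Python. The abstract does not say how the 39% figure was derived from the pooled odds ratio of 0.64. The sketch below assumes the usual odds-to-probability conversion, p = odds / (1 + odds), which reproduces that figure. The "not significantly above chance" check is likewise simplified to asking whether the reported 95% interval contains 50%. The function names and the `REPORTED_ACCURACY` table are illustrative, not taken from the paper.

```python
# Illustrative sketch only: the conversion formula is an assumption that
# happens to reproduce the abstract's 39% figure; it is not stated in the source.

def odds_to_probability(odds: float) -> float:
    """Convert odds of a correct response into a probability in [0, 1)."""
    if odds < 0:
        raise ValueError("odds must be non-negative")
    return odds / (1.0 + odds)


def interval_contains_chance(ci_low: float, ci_high: float, chance: float = 50.0) -> bool:
    """Return True when the interval includes the chance level (in percent)."""
    return ci_low <= chance <= ci_high


# Pooled accuracy values copied from the abstract: (estimate, CI low, CI high), in percent.
REPORTED_ACCURACY = {
    "overall": (55.54, 48.87, 62.10),
    "audio": (62.08, 38.23, 83.18),
    "image": (53.16, 42.12, 64.64),
    "text": (52.00, 37.42, 65.88),
    "video": (57.31, 47.80, 66.57),
    "with improvement strategies": (65.14, 55.21, 74.46),
}

if __name__ == "__main__":
    # Odds ratio and its interval, as reported in the abstract.
    for label, odds in (("estimate", 0.64), ("CI low", 0.52), ("CI high", 0.79)):
        print(f"odds {label}: {odds:.2f} -> {100 * odds_to_probability(odds):.1f}%")

    for condition, (estimate, low, high) in REPORTED_ACCURACY.items():
        verdict = "includes 50%" if interval_contains_chance(low, high) else "excludes 50%"
        print(f"{condition}: {estimate:.2f}% [{low:.2f}, {high:.2f}] {verdict}")
```

Under these assumptions, only the interval for the detection-improvement strategies excludes 50%, which matches the abstract's conclusion that unaided detection was not reliably above chance while assisted detection was.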