Alexander Diel , Tania Lalgi , Isabel Carolin Schröter , Karl F. MacDorman , Martin Teufel , Alexander Bäuerle
Title: Human performance in detecting deepfakes: A systematic review and meta-analysis of 56 papers
DOI: 10.1016/j.chbr.2024.100538
Journal: Computers in Human Behavior Reports, Vol. 16, Article 100538 (JCR Q1, Psychology, Experimental; IF 4.9)
Published: 2024-12-01
URL: https://www.sciencedirect.com/science/article/pii/S2451958824001714
Citations: 0
Abstract
Deepfakes are AI-generated media designed to look real, often with the intent to deceive. Deepfakes threaten public and personal safety by facilitating disinformation, propaganda, and identity theft. Though research has been conducted on human performance in deepfake detection, the results have not yet been synthesized. This systematic review and meta-analysis investigates human deepfake detection accuracy. Searches in PubMed, ScienceGov, JSTOR, Google Scholar, and paper references, conducted in June and October 2024, identified empirical studies measuring human detection of high-quality deepfakes. After pooling accuracy, odds-ratio, and sensitivity (d') effect sizes (k = 137 effects) from 56 papers involving 86,155 participants, we analyzed 1) overall deepfake detection performance, 2) performance across stimulus types (audio, image, text, and video), and 3) the effects of detection-improvement strategies. Overall deepfake detection rates (sensitivity) were not significantly above chance because 95% confidence intervals crossed 50%. Total deepfake detection accuracy was 55.54% (95% CI [48.87, 62.10], k = 67). For audio, accuracy was 62.08% [38.23, 83.18], k = 8; for images, 53.16% [42.12, 64.64], k = 18; for text, 52.00% [37.42, 65.88], k = 15; and for video, 57.31% [47.80, 66.57], k = 26. Odds ratios were 0.64 [0.52, 0.79], k = 62, indicating 39% detection accuracy, below chance (audio 45%, image 35%, text 40%, video 40%). Moreover, d' values showed no significant difference from chance. However, strategies such as feedback training, AI support, and deepfake caricaturization improved detection performance above chance levels (65.14% [55.21, 74.46], k = 15), especially for video stimuli.
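The abstract's two key statistics can be made concrete with a small amount of arithmetic: a pooled odds ratio converts to a probability via p = OR / (1 + OR), which is how an odds ratio of 0.64 corresponds to the reported 39% detection accuracy; and sensitivity d' is the standard signal-detection measure d' = z(hit rate) − z(false-alarm rate). A minimal sketch of both (the hit and false-alarm rates in the d' example are hypothetical illustrations, not values from the paper):

```python
from statistics import NormalDist


def odds_to_prob(odds: float) -> float:
    """Convert an odds ratio to the corresponding probability: p = odds / (1 + odds)."""
    return odds / (1 + odds)


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Signal-detection sensitivity: d' = z(H) - z(FA), using the inverse normal CDF."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


# Pooled odds ratio of 0.64 reported in the abstract:
print(f"{odds_to_prob(0.64):.1%}")  # ~39%, matching the reported below-chance accuracy

# Hypothetical illustration: 60% hits (real deepfakes flagged) vs. 45% false alarms
# (genuine media flagged) yields a small positive d', i.e. weak sensitivity.
print(round(d_prime(0.60, 0.45), 2))
```

A d' of 0 means detection is at chance regardless of response bias, which is why the paper reports d' alongside raw accuracy: accuracy alone conflates sensitivity with a participant's tendency to answer "fake".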