Detection of Deepfake Environmental Audio
Hafsa Ouajdi, Oussama Hadder, Modan Tailleur, Mathieu Lagrange, Laurie M. Heller
arXiv - CS - Sound, 2024-03-26 (arxiv-2403.17529)
With the ever-rising quality of deep generative models, it is increasingly
important to be able to discern whether the audio data at hand have been
recorded or synthesized. Although the detection of fake speech signals has been
studied extensively, this is not the case for the detection of fake
environmental audio. We propose a simple and efficient pipeline for detecting fake environmental
sounds based on the CLAP audio embedding. We evaluate this detector using audio
data from the 2023 DCASE challenge task on Foley sound synthesis. Our experiments show that fake sounds generated by 44 state-of-the-art
synthesizers can be detected with an average accuracy of 98%. We show that
an audio embedding learned on environmental audio outperforms a standard
VGGish embedding, increasing detection performance by 10%. Informal listening
to false negatives (fake sounds the detector classified as real) reveals
audible artifacts that the detector missed, such as distortion and implausible
background noise.
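The abstract does not specify which classifier sits on top of the CLAP embedding or exactly how the per-synthesizer average is computed. The following is a minimal sketch under stated assumptions. Each clip is taken to be already mapped to a fixed-length embedding vector, and the CLAP or VGGish extraction step is not shown. A plain logistic-regression classifier separates real clips (label 0) from fake ones (label 1). The reported figure is treated as the mean of per-synthesizer accuracies, where each synthesizer's test set holds real clips plus that synthesizer's fakes. The toy vectors, the function names (`train_logreg`, `predict`, `mean_per_system_accuracy`, `toy_embedding`), the system names and all hyperparameters are hypothetical.

```python
import math
import random
from collections import defaultdict


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.5, epochs=200):
    """Fit a logistic-regression real/fake detector by batch gradient descent.

    X: list of fixed-length embedding vectors (assumed precomputed).
    y: list of labels, 0 = recorded (real), 1 = synthesized (fake).
    """
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss with respect to the logit is p - y.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(dim):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    # 1 = flagged as fake, 0 = judged real (decision threshold at p = 0.5).
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0.0 else 0


def mean_per_system_accuracy(records):
    """Macro-average accuracy over synthesizers (averaging scheme assumed).

    records: iterable of (system_id, true_label, predicted_label).
    """
    correct, total = defaultdict(int), defaultdict(int)
    for system, true, pred in records:
        total[system] += 1
        correct[system] += int(true == pred)
    per_system = {s: correct[s] / total[s] for s in total}
    return sum(per_system.values()) / len(per_system), per_system


def toy_embedding(rng, center, dim=4):
    # Hypothetical stand-in for an audio embedding: Gaussian noise around a center.
    return [center + rng.gauss(0.0, 0.3) for _ in range(dim)]


rng = random.Random(0)
X = [toy_embedding(rng, -1.0) for _ in range(50)] + [toy_embedding(rng, 1.0) for _ in range(50)]
y = [0] * 50 + [1] * 50
w, b = train_logreg(X, y)

# Evaluate against two hypothetical synthesizers; "sys_b" produces fakes closer to real.
real_test = [toy_embedding(rng, -1.0) for _ in range(20)]
records = []
for system, center in [("sys_a", 1.0), ("sys_b", 0.5)]:
    fakes = [toy_embedding(rng, center) for _ in range(20)]
    labelled = [(x, 0) for x in real_test] + [(x, 1) for x in fakes]
    for x, label in labelled:
        records.append((system, label, predict(w, b, x)))

mean_acc, per_system = mean_per_system_accuracy(records)
print(f"mean accuracy over systems: {mean_acc:.3f}", per_system)
```

In this setup, comparing embeddings such as CLAP and VGGish would amount to swapping the feature extractor while keeping the classifier and evaluation fixed. The logistic regression here is only a stand-in and is not presented as the authors' classifier.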