Towards Detection of Synthetic Utterances in Romanian Language Speech Forensics
Gheorghe Pop, D. Burileanu
2021 International Conference on Speech Technology and Human-Computer Dialogue (SpeD), published 2021-10-13
DOI: 10.1109/sped53181.2021.9587393 (https://doi.org/10.1109/sped53181.2021.9587393)
Citations: 1
Abstract
The past decade has seen a huge wave of interest in the synthesis of human image and speech. Besides the enormous impact of synthetic voice on communication between humans and machines, the production of so-called “fake media” has come into the focus of the forensic audio and video communities. A wide variety of techniques is now available for producing synthetic speech, from traditional concatenative speech production to speech and speaker models with millions of parameters. Recent work in the field of artificial intelligence (AI) has shown that some synthetic speech generators are capable of fooling even state-of-the-art automatic speaker verification systems. AI seems to hold the key both to successful speaker spoofing attacks and to their countermeasures. As a first step in this direction, this paper describes a data-centric method for detecting the use of synthetically generated spoken digits in the Romanian language.