Towards Detection of Synthetic Utterances in Romanian Language Speech Forensics

2021 International Conference on Speech Technology and Human-Computer Dialogue (SpeD). Pub Date: 2021-10-13. DOI: 10.1109/sped53181.2021.9587393

Gheorghe Pop, D. Burileanu

Citations: 1

Abstract

The past decade has seen a surge of interest in the synthesis of human image and speech. Besides the enormous impact of synthetic voice on communication between humans and machines, the production of so-called “fake media” has come into the focus of the forensic audio and video communities. A wide variety of techniques are now available to produce synthetic speech, from traditional concatenative synthesis to speech and speaker models with millions of parameters. Recent work in the field of artificial intelligence (AI) has shown that some synthetic speech generators are capable of fooling even state-of-the-art automatic speaker verification systems. AI seems to hold the key to successful speaker spoofing attacks, but also to their countermeasures. As a first step in this direction, this paper describes a data-centric method for detecting synthetically generated spoken digits in the Romanian language.
