Detection of AI-Synthesized Speech Using Cepstral & Bispectral Statistics

2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)
Pub Date: 2020-09-03
DOI: 10.1109/MIPR51284.2021.00076

A. Singh, Priyanka Singh

Citations: 15

Abstract

Digital technology has turned once-unimaginable applications into reality. Having a handful of tools for easy editing and manipulation is exciting, but it also raises alarming concerns, because manipulated audio can propagate as speech clones, duplicates, or deepfakes. Validating the authenticity of speech is one of the primary problems of digital audio forensics. We propose an approach that distinguishes human speech from AI-synthesized speech using bispectral and cepstral analysis. Higher-order statistics show less correlation in human speech than in synthesized speech. In addition, cepstral analysis reveals a durable power component in human speech that is missing from synthesized speech. We integrate both analyses and propose a model to detect AI-synthesized speech.
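
The abstract does not say how the two feature families are computed. The block below is a minimal sketch of one common way to estimate them, not the authors' implementation. It assumes a mono signal held in a NumPy array. The function names (`real_cepstrum`, `bicoherence`, `speech_features`), the segment length `nfft`, and the choice of summary statistics are all illustrative assumptions. The bicoherence here is the segment-averaged, normalized bispectrum, whose magnitude lies in [0, 1] and grows when frequency components are phase-coupled.

```python
import numpy as np


def real_cepstrum(frame):
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    frame = np.asarray(frame, dtype=float)
    spectrum = np.fft.fft(frame)
    # Small constant avoids log(0) on empty spectral bins.
    log_mag = np.log(np.abs(spectrum) + 1e-12)
    return np.fft.ifft(log_mag).real


def bicoherence(signal, nfft=128):
    """Segment-averaged bicoherence magnitude over bins 0..nfft/2-1.

    Returns an (nfft/2, nfft/2) array; entry [k1, k2] measures phase
    coupling between bins k1, k2 and k1 + k2.
    """
    signal = np.asarray(signal, dtype=float)
    n_seg = len(signal) // nfft
    if n_seg < 1:
        raise ValueError("signal is shorter than one segment")
    segs = signal[: n_seg * nfft].reshape(n_seg, nfft)
    # Remove each segment's mean, then taper with a Hann window.
    segs = (segs - segs.mean(axis=1, keepdims=True)) * np.hanning(nfft)
    X = np.fft.fft(segs, axis=1)

    half = nfft // 2
    k = np.arange(half)
    k_sum = k[:, None] + k[None, :]       # k1 + k2, always < nfft
    pair = X[:, :half, None] * X[:, None, :half]   # X(k1) * X(k2)
    X_sum = X[:, k_sum]                             # X(k1 + k2)

    num = np.abs(np.mean(pair * np.conj(X_sum), axis=0))
    den = np.sqrt(
        np.mean(np.abs(pair) ** 2, axis=0)
        * np.mean(np.abs(X_sum) ** 2, axis=0)
    ) + 1e-12
    return num / den


def speech_features(signal, nfft=128):
    """Hypothetical feature vector combining both analyses."""
    signal = np.asarray(signal, dtype=float)
    bic = bicoherence(signal, nfft)
    cep = real_cepstrum(signal[:nfft] * np.hanning(nfft))
    return {
        "bic_mean": float(bic.mean()),
        "bic_var": float(bic.var()),
        # Largest cepstral coefficient, excluding quefrency 0.
        "cep_peak": float(np.max(np.abs(cep[1 : nfft // 2]))),
    }
```

Features like these could then be passed to any standard classifier. Which statistics and which classifier the paper actually uses is not stated in the abstract.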
