Deletion and insertion tampering detection for speech authentication based on fluctuating super vector of electrical network frequency

Speech Communication. IF 3. Zone 3 (Computer Science). JCR Q2 (Acoustics). Pub Date: 2024-03-01. Epub Date: 2024-02-12. DOI: 10.1016/j.specom.2024.103046

Chunyan Zeng, Shuai Kong, Zhifeng Wang, Shixiong Feng, Nan Zhao, Juan Wang

Speech Communication, volume 158, Article 103046 (Journal Article). https://www.sciencedirect.com/science/article/pii/S0167639324000189


Abstract

Current methods for detecting deletion and insertion tampering in digital speech mainly extract phase and frequency features of the Electrical Network Frequency (ENF). These approaches have several problems: speech samples of different durations are hard to align, ENF features are sparse, and few tampered speech samples are available for training. Together these lead to low detection accuracy. This paper therefore proposes a detection method for deletion and insertion tampering of digital speech based on the ENF Fluctuation Super-vector (ENF-FSV) and deep feature representation learning. Extracting the parameters of curves fitted to the ENF phase and frequency achieves feature alignment and dimensionality reduction, so the alignment and sparsity problems are avoided while the fluctuation information of phase and frequency is retained. To address the shortage of tampered speech for training, an ENF Universal Background Model (ENF-UBM) is built from a large number of untampered speech samples, and its mean vector is updated to extract the ENF-FSV. Because ENF features are a shallow representation that does not highlight important features, we construct an end-to-end deep neural network in which an attention mechanism strengthens the focus on abrupt fluctuation information, enhancing the representational power of the ENF-FSV features. The deep ENF-FSV features extracted by the Residual Network (ResNet) module are then fed to the designed classification network for tampering detection. Experimental results show that the proposed method achieves higher accuracy and better robustness than state-of-the-art methods on the Carioca, New Spanish, and ENF High-sampling Group (ENF-HG) databases.
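The abstract says that a universal background model is trained on untampered speech and that its mean vector is updated to obtain the super-vector, but it does not state the update rule. The sketch below is only an illustration of one common way to do this, assuming the classic GMM-UBM recipe with relevance-MAP adaptation of the means. The function name `map_mean_supervector`, the diagonal-covariance mixture, the relevance factor of 16, and the toy UBM parameters are all assumptions made for the example and are not taken from the paper.

```python
import numpy as np

def map_mean_supervector(feats, weights, means, variances, relevance=16.0):
    """MAP-adapt the means of a diagonal-covariance GMM (standing in for the
    UBM) to one utterance and stack them into a supervector.

    feats:     (T, D) feature vectors of one utterance (hypothetical features)
    weights:   (K,)   UBM mixture weights
    means:     (K, D) UBM component means
    variances: (K, D) UBM diagonal variances
    """
    # Log-likelihood of every frame under every component: shape (T, K).
    diff = feats[:, None, :] - means[None, :, :]
    log_prob = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    log_post = np.log(weights) + log_prob
    # Normalise to posteriors; subtracting the row maximum keeps exp() stable.
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)

    n_k = post.sum(axis=0)                       # soft counts, (K,)
    first = post.T @ feats                       # first-order statistics, (K, D)
    e_k = first / np.maximum(n_k, 1e-10)[:, None]
    alpha = (n_k / (n_k + relevance))[:, None]   # data-dependent mixing weight
    adapted = alpha * e_k + (1.0 - alpha) * means
    return adapted.reshape(-1)                   # supervector of length K * D

# Toy usage: a 2-component UBM over 2-dimensional features (made-up values).
rng = np.random.default_rng(0)
weights = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0], [10.0, 10.0]])
variances = np.ones((2, 2))
feats = rng.normal(loc=1.0, scale=0.5, size=(40, 2))
sv = map_mean_supervector(feats, weights, means, variances)
print(sv.shape)
```

Because every utterance is mapped onto the same K components, the resulting vector has a fixed length regardless of utterance duration, which is the property the abstract relies on for alignment. Components that receive almost no frames stay at their UBM means.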

Source journal: Speech Communication (Engineering Technology, Computer Science: Interdisciplinary Applications)
CiteScore: 6.80
Self-citation rate: 6.20%
Review time: 19.2 weeks

About the journal: Speech Communication is an interdisciplinary journal whose primary objective is to fulfil the need for the rapid dissemination and thorough discussion of basic and applied research results. The journal's primary objectives are: • to present a forum for the advancement of human and human-machine speech communication science; • to stimulate cross-fertilization between different fields of this domain; • to contribute towards the rapid and wide diffusion of scientifically sound contributions in this domain.