Deletion and insertion tampering detection for speech authentication based on fluctuating super vector of electrical network frequency

IF 2.4, Zone 3 (Computer Science), JCR Q2 (ACOUSTICS), Speech Communication, Pub Date: 2024-02-12, DOI: 10.1016/j.specom.2024.103046
Chunyan Zeng, Shuai Kong, Zhifeng Wang, Shixiong Feng, Nan Zhao, Juan Wang

Abstract

Current methods for detecting deletion and insertion tampering in digital speech mainly rely on extracting phase and frequency features of the Electrical Network Frequency (ENF). However, existing approaches suffer from several problems, including the alignment problem for speech samples of different durations, the sparsity of ENF features, and the small number of tampered speech samples available for training, all of which lead to low detection accuracy. This paper therefore proposes a method for detecting deletion and insertion tampering in digital speech based on the ENF Fluctuation Super-vector (ENF-FSV) and deep feature representation learning. Extracting the parameters of curves fitted to the ENF phase and frequency achieves feature alignment and dimensionality reduction, avoiding the alignment and sparsity problems while capturing the fluctuation information of phase and frequency. To address the shortage of tampered speech samples for training, an ENF Universal Background Model (ENF-UBM) is built from a large number of untampered speech samples, and its mean vector is updated to extract the ENF-FSV. Because shallow ENF representations do not highlight important features, we construct an end-to-end deep neural network that uses an attention mechanism to focus on abrupt fluctuation information, enhancing the representational power of the ENF-FSV features; the deep ENF-FSV features extracted by the Residual Network (ResNet) module are then fed to the designed classification network for tampering detection. Experimental results show that the proposed method achieves higher accuracy and better robustness on the Carioca, New Spanish, and ENF High-sampling Group (ENF-HG) databases than state-of-the-art methods.
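The abstract says that the ENF-UBM mean vector is updated to extract the ENF-FSV, but it does not give the update rule. As a rough illustration only, the sketch below uses the classic GMM-UBM relevance-MAP mean adaptation, which is an assumption and may differ from the authors' procedure. The function names, the diagonal-covariance choice, and the `relevance` value of 16 are all hypothetical. It shows how recordings with different numbers of feature frames each map to a supervector of the same length.

```python
import numpy as np


def log_gaussian_diag(X, means, variances):
    """Log density of every row of X under every diagonal-covariance Gaussian.

    X: (n, d), means: (k, d), variances: (k, d). Returns an (n, k) array.
    """
    diff = X[:, None, :] - means[None, :, :]                      # (n, k, d)
    log_det = np.sum(np.log(2.0 * np.pi * variances), axis=1)     # (k,)
    mahalanobis = np.sum(diff ** 2 / variances[None, :, :], axis=2)
    return -0.5 * (log_det[None, :] + mahalanobis)


def map_adapt_supervector(X, weights, means, variances, relevance=16.0):
    """Adapt the UBM means to one recording and stack them into a supervector.

    Assumed rule (relevance MAP, means only):
        adapted_k = (sum_t post[t, k] * x_t + relevance * mean_k)
                    / (sum_t post[t, k] + relevance)
    """
    # Posterior responsibility of each UBM component for each frame.
    log_post = np.log(weights)[None, :] + log_gaussian_diag(X, means, variances)
    log_post -= log_post.max(axis=1, keepdims=True)   # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)           # (n, k)

    n_k = post.sum(axis=0)                            # soft frame counts, (k,)
    first_order = post.T @ X                          # weighted sums, (k, d)

    adapted = (first_order + relevance * means) / (n_k + relevance)[:, None]
    return adapted.reshape(-1)                        # length k * d


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical 2-component UBM over 2-dimensional features.
    weights = np.array([0.5, 0.5])
    means = np.array([[0.0, 0.0], [10.0, 10.0]])
    variances = np.ones((2, 2))
    for n_frames in (5, 50):
        frames = rng.normal(size=(n_frames, 2))
        print(n_frames, map_adapt_supervector(frames, weights, means, variances).shape)
```

Components that receive almost no frames stay close to the UBM means, so in this sketch a recording is characterised by how far its observed features pull each mean away from the background model.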

Source journal: Speech Communication (Engineering and Technology, Computer Science: Interdisciplinary Applications)
CiteScore: 6.80
Self-citation rate: 6.20%
Publication volume: 94
Review time: 19.2 weeks
Journal description: Speech Communication is an interdisciplinary journal whose primary objective is to fulfil the need for the rapid dissemination and thorough discussion of basic and applied research results. The journal's primary objectives are: • to present a forum for the advancement of human and human-machine speech communication science; • to stimulate cross-fertilization between different fields of this domain; • to contribute towards the rapid and wide diffusion of scientifically sound contributions in this domain.