Voice Activity Detection Using Novel Teager Energy Based Band Spectral Entropy

Raveesh Hegde, R. Muralishankar
Published in: 2019 International Conference on Communication and Electronics Systems (ICCES)
Publication date: 2019-07-01
DOI: 10.1109/ICCES45898.2019.9002565 (https://doi.org/10.1109/ICCES45898.2019.9002565)
Citations: 2

Abstract

Many features have been proposed in the literature for voice activity detection (VAD). Shen et al. [20] first used a spectral entropy-based feature to detect regions of speech spurts under noisy conditions. However, VAD employing this feature was unreliable when the noise level greatly exceeded the speech level. To improve the performance of spectral entropy-based VAD under low signal-to-noise ratios (SNRs), the spectrum of a signal over a frame is divided into sub-bands and spectral entropy is computed over these bands. These band entropies are then weighted and summed to obtain the overall entropy, with the weights found empirically from the amount of noise in each band. This approach was named banded spectral entropy (BSE) [21]. In [24], a deviation threshold computed from an approximate ramp line and the sorted spectral coefficients of the band is adopted to decide useful/useless bands. In this paper, we propose a novel Teager Energy Band Spectral Entropy (TE_BSE) feature for VAD. Here, we enhance the spectral peaks using the Teager energy of each frequency-transformed speech frame. The spectrum is then divided into sub-bands and the entropy is computed over each band. The entropies of the useful bands are summed to obtain the TE_BSE feature. We identify useful/useless bands following [24]. We then present the performance of the proposed VAD in terms of probability of detection ($P_D$), probability of false alarm ($P_{FA}$), and probability of error under different noises and SNRs. Finally, VAD results on a real-world sample show that the proposed VAD outperforms the statistics-based VAD of Sohn et al. [8], with an improved $P_D$ that does not come at the cost of an increase in $P_{FA}$.
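The feature pipeline described in the abstract (Teager energy on the spectrum, sub-band split, per-band entropy, sum over useful bands) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the frame length, window, number of bands, or normalisation, so the Hamming window, the equal-width split into `n_bands`, the per-band normalisation, and the absolute value taken after the Teager operator are all assumptions. The useful/useless band decision of [24] is not reproduced; the hypothetical `useful` mask stands in for it and defaults to keeping every band.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager energy: psi[k] = x[k]^2 - x[k-1] * x[k+1].

    Edge samples are copied from their neighbours (an assumption).
    Requires at least three samples.
    """
    x = np.asarray(x, dtype=float)
    psi = np.empty_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    psi[0], psi[-1] = psi[1], psi[-2]
    return psi


def band_entropy(band, eps=1e-12):
    """Shannon entropy of one band, normalised to sum to 1 within the band."""
    band = np.asarray(band, dtype=float)
    total = band.sum()
    if total <= eps:
        return 0.0  # empty or silent band contributes nothing
    p = band / total
    return float(-np.sum(p * np.log(p + eps)))


def te_bse(frame, n_bands=8, useful=None):
    """Sketch of a TE_BSE-style feature for a single speech frame.

    `useful` is a hypothetical boolean mask selecting the bands to sum;
    the selection rule of [24] is NOT implemented here.
    """
    frame = np.asarray(frame, dtype=float)
    # Frequency transform of the windowed frame (window choice is an assumption).
    magnitude = np.abs(np.fft.rfft(frame * np.hamming(len(frame))))
    # Teager energy across frequency bins; abs() keeps values non-negative
    # so they can be treated as a distribution (an assumption).
    enhanced = np.abs(teager_energy(magnitude))
    bands = np.array_split(enhanced, n_bands)
    if useful is None:
        useful = np.ones(n_bands, dtype=bool)
    return float(sum(band_entropy(b) for b, keep in zip(bands, useful) if keep))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise_frame = rng.standard_normal(256)
    print("TE_BSE of a noise frame:", te_bse(noise_frame))
```

A detector built on this would compare the per-frame feature against a threshold to label frames as speech or non-speech; the abstract does not say how that threshold is chosen, so it is left out of the sketch.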