A speech feature based on Bark frequency warping: the non-uniform linear prediction (NLP) cepstrum

Yoon Kim, J.O. Smith
Proceedings of the 1999 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics. WASPAA'99 (Cat. No.99TH8452)
Publication date: 1999-10-17
DOI: 10.1109/ASPAA.1999.810867 (https://doi.org/10.1109/ASPAA.1999.810867)
Citations: 10

Abstract

We propose a new method of obtaining features from speech signals for robust analysis and recognition: the non-uniform linear prediction (NLP) cepstrum. The objective is to derive a representation that suppresses speaker-dependent characteristics while preserving the linguistic quality of speech segments. The analysis is based on two principles. First, Bark frequency warping is performed on the LP spectrum to emulate the auditory spectrum. While widely used methods such as mel-frequency and PLP analysis use the FFT spectrum as their basis for warping, NLP analysis uses the LP-based vocal-tract spectrum with glottal effects removed. Second, all-pole modeling (LP) is used both before and after the warping: the pre-warp LP first obtains the vocal-tract spectrum, and the post-warp LP yields a smoothed, two-peak model of the warped spectrum. Experiments tested the effectiveness of the proposed feature in two tasks: identification/discrimination of vowels uttered by multiple speakers using linear discriminant analysis (LDA), and frame-based vowel recognition with a statistical model. In both cases, NLP analysis was shown to be an effective tool for speaker-independent speech analysis and recognition.
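
The two-stage analysis described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the Hamming window, the model orders (12 before warping, 4 after, chosen so that two complex pole pairs can form two peaks), the bin count, and the Bark approximation 6·asinh(f/600) borrowed from PLP analysis are all assumptions made for illustration. Warping is approximated here by sampling the LP spectrum at frequencies spaced uniformly in Bark, and the removal of glottal effects is not modeled.

```python
import numpy as np

def hz_to_bark(f):
    # Bark approximation used in PLP analysis (an assumption here).
    return 6.0 * np.arcsinh(np.asarray(f, dtype=float) / 600.0)

def bark_to_hz(z):
    # Exact inverse of hz_to_bark.
    return 600.0 * np.sinh(np.asarray(z, dtype=float) / 6.0)

def levinson(r, order):
    # Levinson-Durbin recursion: autocorrelation r -> A(z) = 1 + sum a_k z^-k.
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def lp_power_spectrum(a, gain, freqs_hz, fs):
    # All-pole power spectrum gain / |A(e^{jw})|^2 at arbitrary frequencies.
    w = 2.0 * np.pi * np.asarray(freqs_hz, dtype=float) / fs
    A = np.exp(-1j * np.outer(w, np.arange(len(a)))) @ a
    return gain / np.abs(A) ** 2

def lp_to_cepstrum(a, n_ceps):
    # Standard recursion for the cepstrum of 1/A(z); returns c_1..c_n_ceps.
    p = len(a) - 1
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = sum(k * c[k] * a[n - k] for k in range(max(1, n - p), n))
        c[n] = -(a[n] if n <= p else 0.0) - acc / n
    return c[1:]

def nlp_cepstrum(frame, fs, pre_order=12, post_order=4, n_bins=129, n_ceps=8):
    # Pre-warp LP: autocorrelation method on the windowed frame.
    x = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    full = np.correlate(x, x, mode="full")
    r = full[len(x) - 1:len(x) + pre_order]
    a_pre, g_pre = levinson(r, pre_order)
    # Bark warping: evaluate the LP spectrum on a grid uniform in Bark.
    z = np.linspace(0.0, hz_to_bark(fs / 2.0), n_bins)
    warped = lp_power_spectrum(a_pre, g_pre, bark_to_hz(z), fs)
    # Post-warp LP: autocorrelation is the inverse DFT of the power spectrum.
    r_w = np.fft.irfft(warped)
    a_post, _ = levinson(r_w, post_order)
    return lp_to_cepstrum(a_post, n_ceps)
```

Calling `nlp_cepstrum(frame, fs)` on a non-silent frame (for example 25 ms of samples at 16 kHz) returns a vector of `n_ceps` coefficients. A silent frame has zero energy and would need to be guarded against before the recursion.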