Biomimetic multi-resolution analysis for robust speaker recognition.

IF 1.7; Zone 3 (Computer Science); Q2 ACOUSTICS; EURASIP Journal on Audio, Speech, and Music Processing; Pub Date: 2012-01-01; Epub Date: 2012-09-07; DOI: 10.1186/1687-4722-2012-22
Sridhar Krishna Nemala, Dmitry N Zotkin, Ramani Duraiswami, Mounya Elhilali
Citations: 5

Abstract

Humans exhibit a remarkable ability to reliably classify sound sources in the environment, even in the presence of high levels of noise. In contrast, most engineering systems suffer a drastic drop in performance when speech signals are corrupted by channel or background distortions. Our brains are equipped with elaborate machinery for speech analysis and feature extraction, which holds great lessons for improving the performance of automatic speech processing systems under adverse conditions. The work presented here explores a biologically motivated, multi-resolution representation of speaker information, obtained through an intricate yet computationally efficient analysis of the information-rich spectro-temporal attributes of the speech signal. We evaluate the proposed features on a speaker verification task using NIST SRE 2010 data. The biomimetic approach yields significant robustness in the presence of non-stationary noise and reverberation, offering a new framework for deriving reliable features for speaker recognition and speech processing.
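The abstract names the idea of a multi-resolution analysis of spectro-temporal attributes without giving its details. As a rough illustration only, and not the authors' model, the sketch below filters a time-frequency representation with a bank of band-pass filters in the 2D modulation domain, one channel per (scale, rate) pair, where scale is spectral modulation in cycles/octave and rate is temporal modulation in Hz. It assumes a spectrogram on a log-frequency axis with `chan_per_oct` channels per octave and `frame_rate` frames per second. The function name `modulation_filterbank`, the Gaussian constant-Q filter shapes, and the rate/scale grids are hypothetical choices; the filters are symmetric in sign, so they ignore sweep direction, which a cortical-style model would normally keep.

```python
import numpy as np


def modulation_filterbank(spec, rates, scales, frame_rate, chan_per_oct):
    """Hypothetical multi-resolution spectro-temporal filter bank (illustrative only).

    spec: 2D array (n_freq, n_time), assumed to be a log-frequency spectrogram.
    Returns an array of shape (len(scales), len(rates), n_freq, n_time) holding
    the magnitude of the spectrogram filtered at each (scale, rate) resolution.
    """
    n_f, n_t = spec.shape
    # Modulation-frequency axes of the 2D FFT: cycles/octave along frequency, Hz along time.
    omega = np.fft.fftfreq(n_f, d=1.0 / chan_per_oct)
    w = np.fft.fftfreq(n_t, d=1.0 / frame_rate)
    S = np.fft.fft2(spec)
    out = np.empty((len(scales), len(rates), n_f, n_t))
    for i, s in enumerate(scales):
        # Gaussian band-pass centred on |omega| = s with bandwidth proportional to s (constant Q).
        hs = np.exp(-0.5 * ((np.abs(omega) - s) / (0.5 * s)) ** 2)
        hs[omega == 0] = 0.0  # reject purely temporal (zero spectral modulation) energy
        for j, r in enumerate(rates):
            hr = np.exp(-0.5 * ((np.abs(w) - r) / (0.5 * r)) ** 2)
            hr[w == 0] = 0.0  # reject purely spectral (zero temporal modulation) energy
            H = np.outer(hs, hr)  # separable 2D modulation filter, same shape as S
            out[i, j] = np.abs(np.fft.ifft2(S * H))
    return out


# Demo: a single spectro-temporal ripple at 1 cycle/octave and 8 Hz.
n_f, n_t, cpo, fr = 64, 200, 8, 100
f = np.arange(n_f)[:, None]
t = np.arange(n_t)[None, :]
ripple = np.cos(2 * np.pi * (1.0 * f / cpo + 8.0 * t / fr))
rates = [2, 4, 8, 16, 32]
scales = [0.25, 0.5, 1, 2, 4]
feats = modulation_filterbank(ripple, rates, scales, frame_rate=fr, chan_per_oct=cpo)
energy = (feats ** 2).sum(axis=(2, 3))  # total energy per (scale, rate) channel
best = np.unravel_index(np.argmax(energy), energy.shape)
print(scales[best[0]], rates[best[1]])  # expected: 1 8
```

The ripple's energy lands in the channel tuned to its own scale and rate, which is the sense in which each channel analyses the signal at one resolution. Pooling `feats` over time (for example `feats.mean(axis=-1)`) would give a fixed-size vector per utterance, but how such a representation would feed a speaker verification back end is left open here.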
