Towards multidimensional attentive voice tracking—estimating voice state from auditory glimpses with regression neural networks and Monte Carlo sampling

IF 1.7 3区 计算机科学 Q2 ACOUSTICS Eurasip Journal on Audio Speech and Music Processing Pub Date : 2024-05-22 DOI:10.1186/s13636-024-00350-w
Joanna Luberadzka, Hendrik Kayser, Jörg Lücke, Volker Hohmann
{"title":"Towards multidimensional attentive voice tracking—estimating voice state from auditory glimpses with regression neural networks and Monte Carlo sampling","authors":"Joanna Luberadzka, Hendrik Kayser, Jörg Lücke, Volker Hohmann","doi":"10.1186/s13636-024-00350-w","DOIUrl":null,"url":null,"abstract":"Selective attention is a crucial ability of the auditory system. Computationally, following an auditory object can be illustrated as tracking its acoustic properties, e.g., pitch, timbre, or location in space. The difficulty is related to the fact that in a complex auditory scene, the information about the tracked object is not available in a clean form. The more cluttered the sound mixture, the more time and frequency regions where the object of interest is masked by other sound sources. How does the auditory system recognize and follow acoustic objects based on this fragmentary information? Numerous studies highlight the crucial role of top-down processing in this task. Having in mind both auditory modeling and signal processing applications, we investigated how computational methods with and without top-down processing deal with increasing sparsity of the auditory features in the task of estimating instantaneous voice states, defined as a combination of three parameters: fundamental frequency F0 and formant frequencies F1 and F2. We found that the benefit from top-down processing grows with increasing sparseness of the auditory data.","PeriodicalId":49202,"journal":{"name":"Eurasip Journal on Audio Speech and Music Processing","volume":"33 1","pages":""},"PeriodicalIF":1.7000,"publicationDate":"2024-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Eurasip Journal on Audio Speech and Music Processing","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1186/s13636-024-00350-w","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ACOUSTICS","Score":null,"Total":0}
引用次数: 0

Abstract

Selective attention is a crucial ability of the auditory system. Computationally, following an auditory object can be illustrated as tracking its acoustic properties, e.g., pitch, timbre, or location in space. The difficulty is related to the fact that in a complex auditory scene, the information about the tracked object is not available in a clean form. The more cluttered the sound mixture, the more time and frequency regions where the object of interest is masked by other sound sources. How does the auditory system recognize and follow acoustic objects based on this fragmentary information? Numerous studies highlight the crucial role of top-down processing in this task. Having in mind both auditory modeling and signal processing applications, we investigated how computational methods with and without top-down processing deal with increasing sparsity of the auditory features in the task of estimating instantaneous voice states, defined as a combination of three parameters: fundamental frequency F0 and formant frequencies F1 and F2. We found that the benefit from top-down processing grows with increasing sparseness of the auditory data.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
实现多维专注语音跟踪--利用回归神经网络和蒙特卡洛抽样从听觉瞥见中估计语音状态
选择性注意是听觉系统的一项重要能力。从计算角度看,跟踪一个听觉对象可以理解为跟踪其声学特性,如音高、音色或在空间中的位置。困难在于,在复杂的听觉场景中,被跟踪对象的信息并不是以简洁的形式存在的。声音混合物越杂乱,感兴趣对象被其他声源掩盖的时间和频率区域就越多。听觉系统如何根据这些零散的信息识别和跟踪声音对象呢?大量研究强调了自上而下的处理过程在这项任务中的关键作用。考虑到听觉建模和信号处理应用,我们研究了在估计瞬时语音状态(定义为三个参数的组合:基频 F0 和声母频率 F1 和 F2)的任务中,采用和不采用自上而下处理的计算方法如何处理日益稀疏的听觉特征。我们发现,随着听觉数据的稀疏程度增加,自上而下处理的优势也在增加。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Eurasip Journal on Audio Speech and Music Processing
Eurasip Journal on Audio Speech and Music Processing ACOUSTICS-ENGINEERING, ELECTRICAL & ELECTRONIC
CiteScore
4.10
自引率
4.20%
发文量
0
审稿时长
12 months
期刊介绍: The aim of “EURASIP Journal on Audio, Speech, and Music Processing” is to bring together researchers, scientists and engineers working on the theory and applications of the processing of various audio signals, with a specific focus on speech and music. EURASIP Journal on Audio, Speech, and Music Processing will be an interdisciplinary journal for the dissemination of all basic and applied aspects of speech communication and audio processes.
期刊最新文献
Compression of room impulse responses for compact storage and fast low-latency convolution Guest editorial: AI for computational audition—sound and music processing Physics-constrained adaptive kernel interpolation for region-to-region acoustic transfer function: a Bayesian approach Physics-informed neural network for volumetric sound field reconstruction of speech signals Optimal sensor placement for the spatial reconstruction of sound fields
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1