Joanna Luberadzka, Hendrik Kayser, Jörg Lücke, Volker Hohmann
EURASIP Journal on Audio, Speech, and Music Processing. Published 2024-05-22. DOI: 10.1186/s13636-024-00350-w (https://doi.org/10.1186/s13636-024-00350-w)
Citations: 0
Towards multidimensional attentive voice tracking—estimating voice state from auditory glimpses with regression neural networks and Monte Carlo sampling
Selective attention is a crucial ability of the auditory system. Computationally, following an auditory object can be illustrated as tracking its acoustic properties, e.g., pitch, timbre, or location in space. The difficulty is related to the fact that in a complex auditory scene, the information about the tracked object is not available in a clean form. The more cluttered the sound mixture, the more time and frequency regions where the object of interest is masked by other sound sources. How does the auditory system recognize and follow acoustic objects based on this fragmentary information? Numerous studies highlight the crucial role of top-down processing in this task. Having in mind both auditory modeling and signal processing applications, we investigated how computational methods with and without top-down processing deal with increasing sparsity of the auditory features in the task of estimating instantaneous voice states, defined as a combination of three parameters: fundamental frequency F0 and formant frequencies F1 and F2. We found that the benefit from top-down processing grows with increasing sparseness of the auditory data.
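To make the task in the abstract concrete, the following is a minimal sketch of Monte Carlo (particle-filter) tracking of a three-dimensional voice state [F0, F1, F2] from sparse observations. It is not the authors' implementation: the function name `track_voice_state`, the random-walk dynamics, the Gaussian observation model, the parameter ranges, and all noise levels are assumptions chosen only for illustration. Masked time-frequency regions are mimicked by setting observation entries to NaN, and the random-walk prediction step stands in for top-down continuity knowledge.

```python
import numpy as np

def track_voice_state(observations, n_particles=2000,
                      step_std=(5.0, 20.0, 40.0),
                      obs_std=(10.0, 40.0, 80.0), seed=0):
    """Bootstrap particle filter over a hypothetical state [F0, F1, F2] in Hz.

    observations: array of shape (T, 3); NaN marks a masked (unobserved) entry.
    Returns an array of shape (T, 3) with posterior-mean estimates.
    """
    rng = np.random.default_rng(seed)
    obs = np.asarray(observations, dtype=float)
    step_std = np.asarray(step_std, dtype=float)
    obs_std = np.asarray(obs_std, dtype=float)
    # Broad uniform prior over assumed plausible ranges (not from the paper).
    low = np.array([80.0, 200.0, 800.0])
    high = np.array([400.0, 1000.0, 2500.0])
    particles = rng.uniform(low, high, size=(n_particles, 3))
    estimates = np.empty_like(obs)
    for t, y in enumerate(obs):
        # Predict: random-walk dynamics encode the assumption of slow change.
        particles = particles + rng.normal(0.0, step_std, size=particles.shape)
        # Update: Gaussian log-likelihood over the observed dimensions only.
        seen = ~np.isnan(y)
        log_w = np.zeros(n_particles)
        if seen.any():
            z = (particles[:, seen] - y[seen]) / obs_std[seen]
            log_w = -0.5 * np.sum(z ** 2, axis=1)
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        estimates[t] = w @ particles
        # Resample (multinomial) to counter weight degeneracy.
        idx = rng.choice(n_particles, size=n_particles, p=w)
        particles = particles[idx]
    return estimates

# Synthetic usage example: slowly varying state, noisy and 70% masked.
data_rng = np.random.default_rng(1)
T = 100
truth = np.stack([120 + 20 * np.sin(np.linspace(0, 3, T)),
                  500 + 100 * np.sin(np.linspace(0, 2, T)),
                  1500 + 200 * np.cos(np.linspace(0, 2, T))], axis=1)
noisy = truth + data_rng.normal(0.0, [10.0, 40.0, 80.0], size=truth.shape)
noisy[data_rng.random(truth.shape) < 0.7] = np.nan
est = track_voice_state(noisy)
f0_error = np.mean(np.abs(est[T // 2:, 0] - truth[T // 2:, 0]))
print(f"Mean absolute F0 error over the second half: {f0_error:.1f} Hz")
```

When an entry is masked, the filter relies entirely on the prediction step for that dimension, which is one simple way to picture why prior knowledge could matter more as the observations become sparser. A method without such a prior would have nothing to fall back on in fully masked frames.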
About the journal:
The aim of “EURASIP Journal on Audio, Speech, and Music Processing” is to bring together researchers, scientists and engineers working on the theory and applications of the processing of various audio signals, with a specific focus on speech and music. EURASIP Journal on Audio, Speech, and Music Processing will be an interdisciplinary journal for the dissemination of all basic and applied aspects of speech communication and audio processes.