Relationship Between Speakers' Physiological Structure and Acoustic Speech Signals: Data-Driven Study Based on Frequency-Wise Attentional Neural Network
Kai Li, Xugang Lu, M. Akagi, J. Dang, Sheng Li, M. Unoki
Published in: 2022 30th European Signal Processing Conference (EUSIPCO), 2022-08-29
DOI: 10.23919/eusipco55093.2022.9909649 (https://doi.org/10.23919/eusipco55093.2022.9909649)
Citations: 1
Abstract
Quantitatively characterizing the relationship between speakers' physiological structure and acoustic speech signals, taking into account the properties of resonance and antiresonance, can help extract effective speaker discriminative information (SDI) from speech signals. The conventional quantification method, based on the F-ratio, considers the power of acoustic speech in each frequency band independently. We propose a novel frequency-wise attentional neural network to learn the nonlinear combined effect of the frequency components on speaker identity. The learned results indicate that the antiresonance frequency induced by the nasal cavity is another essential factor for speaker discrimination, one that the F-ratio method could not reveal. To further evaluate our findings, we designed a non-uniform subband processing strategy based on the learned results for speaker feature extraction and conducted automatic speaker verification (ASV) experiments. The ASV results confirmed that further emphasizing the spectral structure around the antiresonance frequency region can enhance speaker discrimination.
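
The per-band F-ratio baseline mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common textbook definition (variance of the per-speaker mean power divided by the average within-speaker variance, computed separately for each band); the paper's exact formulation may differ. The function name `f_ratio`, the array layout, and the synthetic data are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def f_ratio(band_power, speaker_ids):
    """Per-band F-ratio under an assumed textbook definition.

    band_power:  array of shape (n_frames, n_bands), e.g. log subband power.
    speaker_ids: array of shape (n_frames,), one speaker label per frame.
    Returns an array of shape (n_bands,).
    """
    band_power = np.asarray(band_power, dtype=float)
    speaker_ids = np.asarray(speaker_ids)
    speakers = np.unique(speaker_ids)

    # Mean power per speaker and band, shape (n_speakers, n_bands).
    means = np.stack([band_power[speaker_ids == s].mean(axis=0) for s in speakers])
    # Between-speaker variance: spread of the speaker means around their average.
    between = ((means - means.mean(axis=0)) ** 2).mean(axis=0)
    # Within-speaker variance: frame-level variance, averaged over speakers.
    within = np.mean(
        [band_power[speaker_ids == s].var(axis=0) for s in speakers], axis=0
    )
    # Small constant guards against division by zero in silent bands.
    return between / (within + 1e-12)


# Synthetic example: 3 speakers, 200 frames each, 4 frequency bands.
rng = np.random.default_rng(0)
speaker_ids = np.repeat(np.arange(3), 200)
frames = rng.normal(0.0, 1.0, size=(600, 4))
# Make only band 2 depend on the speaker.
frames[:, 2] += 5.0 * speaker_ids

scores = f_ratio(frames, speaker_ids)
print(scores)
```

Because every band is scored in isolation, this measure cannot capture a speaker cue that only appears as a joint pattern across several bands. That is the limitation the abstract attributes to the F-ratio method and the motivation it gives for learning a combined, nonlinear weighting over frequency components.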