Phase characteristics of vocal tract filter can distinguish speakers

Frontiers in Applied Mathematics and Statistics (IF 1.3, Q3, Mathematics, Interdisciplinary Applications) | Pub Date: 2023-12-08 | DOI: 10.3389/fams.2023.1274846
Masahiro Okada, Hiroshi Ito
Citations: 0

Abstract

Speaker recognition has conventionally relied on individual variations in the power spectrograms of speech, which reflect the resonance phenomena in the speaker's vocal tract filter. In recent years, phase-based features have also been used for speaker recognition. However, these features are hand-crafted rather than the raw phase itself, which leaves the role of the raw phase difficult to interpret. This study used phase spectrograms, which are calculated by subtracting the time-frequency-domain phase of the electroglottograph signal from that of the speech signal. The phase spectrograms represent the unmodified phase characteristics of the vocal tract filter.

The phase spectrograms were obtained from five Japanese participants. The portions of the phase spectrograms corresponding to vowels, called phase spectra, were then extracted and circular-averaged for each vowel. Speakers were identified based on the degree of similarity between the averaged spectra.

The accuracy of discriminating speakers using the averaged phase spectra was high, even though only phase information, without power, was used. In particular, the averaged phase spectra showed different shapes for different speakers, so the similarity was lower for spectrum pairs from different speakers. The speakers could therefore be distinguished using phase spectra alone.

This predominance of phase spectra suggests that the phase characteristics of the vocal tract filter reflect the individuality of speakers.
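The processing chain described in the abstract (phase subtraction, circular averaging, similarity-based identification) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the transform parameters or the similarity measure, so the naive single-frame DFT, the mean-cosine similarity, and all function names here are hypothetical choices made for the example.

```python
import cmath
import math


def dft(frame):
    """Naive DFT of one real-valued frame (non-negative frequency bins only).

    Stand-in for one column of a time-frequency transform; the paper's
    actual transform settings are not given in the abstract.
    """
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        for k in range(n // 2 + 1)
    ]


def phase_difference(speech_spec, egg_spec):
    """Per-bin phase of speech minus phase of the EGG signal, wrapped to [-pi, pi].

    Multiplying by the complex conjugate subtracts the phases without
    an explicit unwrapping step.
    """
    return [cmath.phase(s * e.conjugate()) for s, e in zip(speech_spec, egg_spec)]


def circular_mean(phase_spectra):
    """Circular average across frames, computed bin by bin.

    Each phase is mapped to a unit vector; the angle of the vector sum
    is the circular mean, which behaves correctly near the +/-pi boundary.
    """
    n_bins = len(phase_spectra[0])
    return [
        cmath.phase(sum(cmath.exp(1j * frame[k]) for frame in phase_spectra))
        for k in range(n_bins)
    ]


def phase_similarity(a, b):
    """Assumed similarity: mean cosine of the per-bin phase difference.

    Equals 1.0 for identical spectra; the paper's own measure may differ.
    """
    return sum(math.cos(x - y) for x, y in zip(a, b)) / len(a)


def identify(query, templates):
    """Return the label of the template most similar to the query spectrum.

    `templates` maps a speaker label to that speaker's averaged phase spectrum.
    """
    return max(templates, key=lambda name: phase_similarity(query, templates[name]))
```

Circular averaging matters here because phase is periodic: two phases of 3.0 and -3.0 rad have an arithmetic mean of 0, whereas their circular mean lies at the ±π boundary, which is where both values actually sit.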