Tracking formants in spectrograms and its application in speaker verification

J. Leu, Liang-tsair Geeng, C. Pu, Jyh-Bin Shiau
2012 IEEE International Carnahan Conference on Security Technology (ICCST), published 2012-12-31
DOI: 10.1109/CCST.2012.6393541 (https://doi.org/10.1109/CCST.2012.6393541)
Citations: 0

Abstract

Formants are the most visible features in spectrograms, and they also hold the most valuable speech information. Traditionally, formant tracks are found by first locating formant points in individual frames and then joining the formant points in neighboring frames to form tracks. In this paper we present a formant tracking approach based on image processing techniques. We first find the running directions of the formants in a spectrogram. We then smooth the spectrogram along those directions to produce formants that are more continuous and stable, and perform ridge detection to find formant track candidates. After removing tracks that are too short or too weak, we fit the remaining tracks with 2nd-degree polynomial curves to extract formants that are both smooth and continuous. Besides extracting thin formant tracks, we also extract formant tracks with width. These thick formants indicate not only the locations of the formants but also their widths. Using the voices of 70 people, we conducted experiments to test the effectiveness of the thin and thick formants in speaker verification. Using only one sentence (6 to 10 words, 3 seconds in length) for comparison, the thin and thick formants achieve 88.3% and 93.8% accuracy in speaker verification, respectively. When the number of sentences for comparison is increased to seven, the accuracy improves to 93.8% and 98.7%, respectively.
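
As a rough illustration of the later steps described in the abstract (finding ridge candidates, discarding tracks that are too short or too weak, and fitting 2nd-degree polynomial curves), the following Python sketch uses NumPy. The function names, the track representation, and the thresholds are assumptions made for illustration only; the abstract does not specify them. The local-maximum test is a simple stand-in for ridge detection and is not claimed to be the authors' method.

```python
import numpy as np


def ridge_mask(spec):
    """Mark bins that are local maxima along the frequency axis (axis 0).

    A simple stand-in for ridge detection; spec is a 2-D array indexed
    as [frequency_bin, frame].
    """
    spec = np.asarray(spec, dtype=float)
    mask = np.zeros(spec.shape, dtype=bool)
    # A bin is a ridge point if it exceeds the bin below and is not
    # smaller than the bin above; the first and last rows stay False.
    mask[1:-1] = (spec[1:-1] > spec[:-2]) & (spec[1:-1] >= spec[2:])
    return mask


def fit_formant_tracks(tracks, min_length=5, min_strength=0.1):
    """Drop short or weak candidate tracks, then fit a quadratic to each.

    tracks: iterable of (times, freqs, strengths) sequences, one per
    candidate track. min_length and min_strength are hypothetical
    thresholds, not values taken from the paper.
    Returns a list of (times, smoothed_freqs) pairs.
    """
    fitted = []
    for times, freqs, strengths in tracks:
        times = np.asarray(times, dtype=float)
        freqs = np.asarray(freqs, dtype=float)
        strengths = np.asarray(strengths, dtype=float)
        if times.size < min_length:
            continue  # too short
        if strengths.mean() < min_strength:
            continue  # too weak
        # Least-squares fit of a 2nd-degree polynomial freq(t).
        coeffs = np.polyfit(times, freqs, deg=2)
        fitted.append((times, np.polyval(coeffs, times)))
    return fitted
```

In this sketch, a candidate track is kept only if it has at least `min_length` points and its mean strength reaches `min_strength`; the quadratic fit then replaces the raw frequencies with a smooth curve evaluated at the same frame times.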