谱图跟踪共振峰及其在说话人验证中的应用

2012 IEEE International Carnahan Conference on Security Technology (ICCST) Pub Date : 2012-12-31 DOI:10.1109/CCST.2012.6393541

J. Leu, Liang-tsair Geeng, C. Pu, Jyh-Bin Shiau

{"title":"谱图跟踪共振峰及其在说话人验证中的应用","authors":"J. Leu, Liang-tsair Geeng, C. Pu, Jyh-Bin Shiau","doi":"10.1109/CCST.2012.6393541","DOIUrl":null,"url":null,"abstract":"Formants are the most visible features in spectrograms and they also hold the most valuable speech information. Traditionally, formant tracks are found by first finding formant points in individual frames, then the formants points in neighboring frames are joined together to form tracks. In this paper we present a formant tracking approach based on image processing techniques. Our approach is to first find the running directions of the formants in a spectrogram. Then we perform smoothing on the spectrogram along the directions of the formants to produce formants that are more continuous and stable. Then we perform ridge detection to find formant track candidates in the spectrogram. After removing tracks that are too short or too weak, we fit the remaining tracks with 2nd degree polynomial curves to extract formants that are both smooth and continuous. Besides extracting thin formant tracks, we also extracted formant tracks with width. These thick formants are able to indication not only the locations of the formants but also the width of the formants. Using the voices of 70 people, we conducted experiments to test the effectiveness of the thin formants and the thick formants when they are used in speaker verification. Using only one sentence (6 to 10 words, 3 seconds in length) for comparison, the thin formants and the thick formants are able to achieve 88.3% and 93.8% of accuracy in speaker verification, respectively. When the number of sentences for comparison increased to seven, the accuracy rate improved to 93.8% and 98.7%, respectively.","PeriodicalId":405531,"journal":{"name":"2012 IEEE International Carnahan Conference on Security Technology (ICCST)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Tracking formants in spectrograms and its application in speaker verification\",\"authors\":\"J. Leu, Liang-tsair Geeng, C. Pu, Jyh-Bin Shiau\",\"doi\":\"10.1109/CCST.2012.6393541\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Formants are the most visible features in spectrograms and they also hold the most valuable speech information. Traditionally, formant tracks are found by first finding formant points in individual frames, then the formants points in neighboring frames are joined together to form tracks. In this paper we present a formant tracking approach based on image processing techniques. Our approach is to first find the running directions of the formants in a spectrogram. Then we perform smoothing on the spectrogram along the directions of the formants to produce formants that are more continuous and stable. Then we perform ridge detection to find formant track candidates in the spectrogram. After removing tracks that are too short or too weak, we fit the remaining tracks with 2nd degree polynomial curves to extract formants that are both smooth and continuous. Besides extracting thin formant tracks, we also extracted formant tracks with width. These thick formants are able to indication not only the locations of the formants but also the width of the formants. Using the voices of 70 people, we conducted experiments to test the effectiveness of the thin formants and the thick formants when they are used in speaker verification. Using only one sentence (6 to 10 words, 3 seconds in length) for comparison, the thin formants and the thick formants are able to achieve 88.3% and 93.8% of accuracy in speaker verification, respectively. When the number of sentences for comparison increased to seven, the accuracy rate improved to 93.8% and 98.7%, respectively.\",\"PeriodicalId\":405531,\"journal\":{\"name\":\"2012 IEEE International Carnahan Conference on Security Technology (ICCST)\",\"volume\":\"8 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-12-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2012 IEEE International Carnahan Conference on Security Technology (ICCST)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CCST.2012.6393541\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 IEEE International Carnahan Conference on Security Technology (ICCST)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CCST.2012.6393541","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

共振峰是声谱图中最明显的特征，也是最有价值的语音信息。传统的方法是先在单个帧中找到共振峰点，然后将相邻帧中的共振峰点连接在一起形成轨迹。本文提出了一种基于图像处理技术的峰形跟踪方法。我们的方法是首先在谱图中找到共振峰的运行方向。然后沿着共振峰的方向对谱图进行平滑处理，得到更连续、更稳定的共振峰。然后进行脊检测，在谱图中寻找形成峰轨迹候选点。在去除太短或太弱的轨道后，我们用2次多项式曲线拟合剩余的轨道，以提取既光滑又连续的共振峰。除了提取薄的形成峰轨迹外，还提取了宽的形成峰轨迹。这些厚的共振峰不仅能够指示共振峰的位置，而且还能指示共振峰的宽度。我们利用70个人的声音进行了实验，测试了薄共振峰和厚共振峰用于说话人验证时的有效性。仅用一个句子(6 ~ 10个单词，长度为3秒)进行比较，薄共振峰和厚共振峰在说话人验证中的准确率分别达到了88.3%和93.8%。当对比句数增加到7句时，准确率分别提高到93.8%和98.7%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Tracking formants in spectrograms and its application in speaker verification

Formants are the most visible features in spectrograms and they also hold the most valuable speech information. Traditionally, formant tracks are found by first finding formant points in individual frames, then the formants points in neighboring frames are joined together to form tracks. In this paper we present a formant tracking approach based on image processing techniques. Our approach is to first find the running directions of the formants in a spectrogram. Then we perform smoothing on the spectrogram along the directions of the formants to produce formants that are more continuous and stable. Then we perform ridge detection to find formant track candidates in the spectrogram. After removing tracks that are too short or too weak, we fit the remaining tracks with 2nd degree polynomial curves to extract formants that are both smooth and continuous. Besides extracting thin formant tracks, we also extracted formant tracks with width. These thick formants are able to indication not only the locations of the formants but also the width of the formants. Using the voices of 70 people, we conducted experiments to test the effectiveness of the thin formants and the thick formants when they are used in speaker verification. Using only one sentence (6 to 10 words, 3 seconds in length) for comparison, the thin formants and the thick formants are able to achieve 88.3% and 93.8% of accuracy in speaker verification, respectively. When the number of sentences for comparison increased to seven, the accuracy rate improved to 93.8% and 98.7%, respectively.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2012 IEEE International Carnahan Conference on Security Technology (ICCST)

自引率

0.00%

发文量