使用PLP倒谱特征进行语音分割

2013 International Conference on Asian Language Processing Pub Date : 2013-08-17 DOI:10.1109/IALP.2013.47

Bhavik B. Vachhani, H. Patil

{"title":"使用PLP倒谱特征进行语音分割","authors":"Bhavik B. Vachhani, H. Patil","doi":"10.1109/IALP.2013.47","DOIUrl":null,"url":null,"abstract":"Phonetic segmentation can find its potential application for Text-to-Speech (TTS) synthesis and Automatic Speech Recognition (ASR) systems. In this paper, we propose use of Perceptual Linear Prediction Cepstral Coefficients (PLPCC) feature for phonetic segmentation task. To detect phonetic boundaries, we used spectral transition measure (STM). Using proposed approach, we achieve 85 % (i.e., 3 % better than state-of-the art Mel-frequency Cepstral Coefficients (MFCC) for 20 ms agreement duration) accuracy and 15 % over-segmentation rate (i.e., 8 % less than MFCC) for automatic boundary detection of 2, 34, 925 phone boundaries corresponding 630 speakers of entire TIMIT database.","PeriodicalId":413833,"journal":{"name":"2013 International Conference on Asian Language Processing","volume":"270 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"Use of PLP Cepstral Features for Phonetic Segmentation\",\"authors\":\"Bhavik B. Vachhani, H. Patil\",\"doi\":\"10.1109/IALP.2013.47\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Phonetic segmentation can find its potential application for Text-to-Speech (TTS) synthesis and Automatic Speech Recognition (ASR) systems. In this paper, we propose use of Perceptual Linear Prediction Cepstral Coefficients (PLPCC) feature for phonetic segmentation task. To detect phonetic boundaries, we used spectral transition measure (STM). Using proposed approach, we achieve 85 % (i.e., 3 % better than state-of-the art Mel-frequency Cepstral Coefficients (MFCC) for 20 ms agreement duration) accuracy and 15 % over-segmentation rate (i.e., 8 % less than MFCC) for automatic boundary detection of 2, 34, 925 phone boundaries corresponding 630 speakers of entire TIMIT database.\",\"PeriodicalId\":413833,\"journal\":{\"name\":\"2013 International Conference on Asian Language Processing\",\"volume\":\"270 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-08-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2013 International Conference on Asian Language Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IALP.2013.47\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 International Conference on Asian Language Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IALP.2013.47","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 7

摘要

语音切分在文本到语音(TTS)合成和自动语音识别(ASR)系统中具有潜在的应用前景。在本文中，我们提出使用感知线性预测倒谱系数(PLPCC)特征来完成语音分割任务。为了检测语音边界，我们使用了频谱转移度量(STM)。使用该方法，我们实现了85%(即比最先进的Mel-frequency Cepstral Coefficients (MFCC)在20 ms协议持续时间内提高3%)的准确率和15%的过分割率(即比MFCC低8%)，用于整个TIMIT数据库中对应630个扬声器的2,34,925个电话边界的自动边界检测。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Use of PLP Cepstral Features for Phonetic Segmentation

Phonetic segmentation can find its potential application for Text-to-Speech (TTS) synthesis and Automatic Speech Recognition (ASR) systems. In this paper, we propose use of Perceptual Linear Prediction Cepstral Coefficients (PLPCC) feature for phonetic segmentation task. To detect phonetic boundaries, we used spectral transition measure (STM). Using proposed approach, we achieve 85 % (i.e., 3 % better than state-of-the art Mel-frequency Cepstral Coefficients (MFCC) for 20 ms agreement duration) accuracy and 15 % over-segmentation rate (i.e., 8 % less than MFCC) for automatic boundary detection of 2, 34, 925 phone boundaries corresponding 630 speakers of entire TIMIT database.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2013 International Conference on Asian Language Processing

自引率

0.00%

发文量