Authors: Wenping Hu, Yao Qian, F. Soong
Published in: 2012 8th International Symposium on Chinese Spoken Language Processing, 2012-12-01
DOI: https://doi.org/10.1109/ISCSLP.2012.6423504
Pitch accent detection and prediction with DCT features and CRF model
Automatic detection and prediction of pitch accent, which determines whether a word contains a prominent syllable and identifies the corresponding pitch accent pattern, is crucial for expressive Text-To-Speech (TTS) synthesis. Training a model to detect and predict pitch accent usually requires a large amount of training data manually annotated by phonetically trained language experts, which is both time-consuming and costly. In this paper, we propose a semi-automatic algorithm for pitch accent modeling, in which the existence of accentuation in the training data is labeled at the word level by native speakers (i.e., not phonetically trained language experts) and the type of each pitch accent is detected automatically from its vector-quantized DCT coefficient patterns. A cascaded, two-stage approach, which separates predicting the existence of a pitch accent from determining its type, is proposed to process unrestricted text input with models trained as Conditional Random Fields (CRFs). The evaluation results show that the new approach outperforms the conventional single-stage approach.
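The type-detection step and the cascade can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a pitch contour is a list of F0 values for one accented word, that the contour shape is summarized by the first few DCT-II coefficients with the DC term dropped, and that vector quantization means picking the nearest entry in a small codebook. The function names, the two-entry codebook, the "rise"/"fall" labels, and the choice of three coefficients are all hypothetical; the abstract specifies none of them. The list of accent-existence flags stands in for stage one (native-speaker labels in training, CRF output at prediction time).

```python
import math


def dct_ii(contour, num_coeffs):
    """Return the first num_coeffs orthonormal DCT-II coefficients of a contour."""
    n = len(contour)
    coeffs = []
    for k in range(num_coeffs):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(
            x * math.cos(math.pi * (i + 0.5) * k / n)
            for i, x in enumerate(contour)
        )
        coeffs.append(scale * total)
    return coeffs


def nearest_codeword(vector, codebook):
    """Return the label of the codeword closest to vector (squared Euclidean)."""
    return min(
        codebook,
        key=lambda label: sum(
            (v - c) ** 2 for v, c in zip(vector, codebook[label])
        ),
    )


def accent_type(contour, codebook, num_coeffs=3):
    """Quantize the contour shape; the DC term is dropped to ignore pitch level."""
    shape = dct_ii(contour, num_coeffs)[1:]
    return nearest_codeword(shape, codebook)


def label_words(contours, is_accented, codebook):
    """Cascade: assign an accent type only to words flagged as accented."""
    return [
        accent_type(contour, codebook) if flag else "none"
        for contour, flag in zip(contours, is_accented)
    ]


# Hypothetical two-entry codebook built from prototype contours.
codebook = {
    "rise": dct_ii([0, 1, 2, 3], 3)[1:],
    "fall": dct_ii([3, 2, 1, 0], 3)[1:],
}

# Two words: the first is flagged as accented, the second is not.
labels = label_words(
    [[100, 110, 120, 130], [130, 120, 110, 100]],
    [True, False],
    codebook,
)
```

In a real system the codebook would be learned from data (for example by clustering DCT vectors), and the existence flags at prediction time would come from a trained sequence model rather than a fixed list.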