Analysing the performance of speaker identification task using different short term and long term features
P. Suba, B. Bharathi
2014 IEEE International Conference on Advanced Communications, Control and Computing Technologies, 2014-05-08
DOI: 10.1109/ICACCCT.2014.7019342 (https://doi.org/10.1109/ICACCCT.2014.7019342)
Automatic Speaker Recognition (ASR) aims to identify a particular speaker from his or her speech. The goal is to have a machine automatically recognize a person, or authenticate a person's claimed identity, through his or her speech. This paper addresses the speaker identification task using different short-term and long-term features. Short-term features are extracted frame by frame and represent the characteristics of the speech signal with reduced redundancy. In the training phase, various short-term features, such as Mel Frequency Cepstral Coefficients (MFCC), Linear Predictive Cepstral Coefficients (LPCC), and Perceptual Linear Predictive (PLP) coefficients, are extracted and modeled using Gaussian Mixture Models (GMM). Long-term features such as prosody are used to characterize speaking behavior; they are often obtained from portions of the speech signal longer than one frame. Long-term features are likewise extracted from the speech signal and trained using GMMs. The short-term and long-term features are extracted separately, and combinations of them are also extracted and modeled using GMMs to obtain the target speaker models. In the testing phase, features are extracted from the given test speech signal at different durations. The extracted features are scored against the trained speaker models and the decisions are obtained. Finally, the overall performance is examined according to the combination of short-term and long-term features.
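The testing phase described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes diagonal-covariance GMMs whose parameters are already known (training, for example by expectation-maximization, is omitted), it uses made-up 2-dimensional feature vectors in place of real MFCC, LPCC, PLP, or prosodic features, and it assumes the decision rule is the highest average per-frame log-likelihood. The names `identify`, `speaker_models`, `spk_a`, and `spk_b` are hypothetical.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_frame_loglik(x, gmm):
    """Log-likelihood of one frame under a GMM.

    gmm is a list of (weight, mean, var) tuples, one per mixture component.
    """
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)
    # log-sum-exp keeps the sum over components numerically stable
    return top + math.log(sum(math.exp(t - top) for t in terms))

def avg_loglik(frames, gmm):
    """Average per-frame log-likelihood of a test utterance under one model."""
    return sum(gmm_frame_loglik(x, gmm) for x in frames) / len(frames)

def identify(frames, speaker_models):
    """Score the test frames against every speaker model and pick the best."""
    scores = {name: avg_loglik(frames, gmm) for name, gmm in speaker_models.items()}
    return max(scores, key=scores.get), scores

# Hypothetical, hand-picked model parameters (assumption: normally these
# would come from the training phase, one GMM per enrolled speaker).
speaker_models = {
    "spk_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "spk_b": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [6.0, 6.0], [1.0, 1.0])],
}

# Toy test utterance: three feature frames lying close to spk_a's components.
test_frames = [[0.2, -0.1], [0.9, 1.1], [0.5, 0.4]]

best, scores = identify(test_frames, speaker_models)
print(best)  # spk_a
```

Under these assumptions, combining short-term and long-term features could amount to concatenating the feature vectors before modeling, or to summing the scores of separately trained models; the abstract does not say which approach is used.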