Analysing the performance of speaker identification task using different short term and long term features

P. Suba, B. Bharathi
DOI: 10.1109/ICACCCT.2014.7019342 (https://doi.org/10.1109/ICACCCT.2014.7019342)
Published in: 2014 IEEE International Conference on Advanced Communications, Control and Computing Technologies
Publication date: 2014-05-08
Citations: 6

Abstract

Automatic Speaker Recognition (ASR) aims to determine who is speaking from the speech signal. The goal is to have a machine automatically recognize a person, or authenticate a person's claimed identity, from his or her speech. This paper studies the speaker identification task using different short-term and long-term features. Short-term features are extracted frame by frame and represent the characteristics of the speech signal with reduced redundancy. In the training phase, short-term features such as Mel Frequency Cepstral Coefficients (MFCC), Linear Predictive Cepstral Coefficients (LPCC), and Perceptual Linear Predictive (PLP) features are extracted and modeled using Gaussian Mixture Models (GMM). Long-term features such as prosody are used to capture speaking behavior; they are obtained from portions of the speech signal longer than one frame and are likewise trained using GMMs. The short-term and long-term features are modeled separately, and their combinations are also modeled using GMMs, to obtain the target speaker models. In the testing phase, features are extracted from test speech signals of different durations and scored against the trained speaker models to obtain the identification decision. Finally, the overall performance is examined for the combinations of short-term and long-term features.
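The testing step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several points are assumptions made only for illustration: the GMMs use diagonal covariances, the "combination" of features is taken to be simple feature-level concatenation (an utterance-level long-term vector appended to every short-term frame), and the model parameters and feature values are hand-picked toy numbers rather than trained or measured ones. Training (typically done with expectation-maximization) is omitted. The names `fuse`, `identify`, `spk_a`, and `spk_b` are hypothetical.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at one frame x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, gmm):
    """Log p(x | gmm) for one frame; gmm is a list of (weight, mean, var)."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp trick for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def average_score(frames, gmm):
    """Average per-frame log-likelihood of an utterance under one speaker GMM."""
    return sum(gmm_log_likelihood(x, gmm) for x in frames) / len(frames)


def identify(frames, speaker_models):
    """Return the best-scoring speaker and the score of every speaker."""
    scores = {name: average_score(frames, gmm)
              for name, gmm in speaker_models.items()}
    return max(scores, key=scores.get), scores


def fuse(short_term_frames, long_term_vector):
    """Assumed fusion scheme: append the long-term vector to every frame."""
    return [list(f) + list(long_term_vector) for f in short_term_frames]


# Toy models (hypothetical): two mixture components per speaker over
# [2 short-term dimensions, 1 long-term dimension such as mean pitch in Hz].
speaker_models = {
    "spk_a": [(0.5, [0.0, 0.0, 120.0], [1.0, 1.0, 100.0]),
              (0.5, [2.0, 2.0, 120.0], [1.0, 1.0, 100.0])],
    "spk_b": [(0.5, [5.0, 5.0, 210.0], [1.0, 1.0, 100.0]),
              (0.5, [7.0, 7.0, 210.0], [1.0, 1.0, 100.0])],
}

# Toy test utterance: three short-term frames plus one long-term value.
test_frames = fuse([[0.1, -0.2], [1.9, 2.2], [0.4, 0.3]], [125.0])
best, scores = identify(test_frames, speaker_models)
print(best, scores)
```

Because the toy frames lie close to the component means of `spk_a`, that model receives the higher average log-likelihood and is selected. Averaging over frames keeps scores comparable across test utterances of different durations, which matters when, as in the abstract, test signals of varying length are evaluated.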