Supervised and unsupervised clustering of the speaker space for connectionist speech recognition
Y. Konig, N. Morgan
1993 IEEE International Conference on Acoustics, Speech, and Signal Processing, published 1993-04-27
DOI: 10.1109/ICASSP.1993.319176 (https://doi.org/10.1109/ICASSP.1993.319176)
Citations: 6
Abstract
A challenging problem for a speaker-independent continuous speech recognition system is achieving good performance with a new speaker when the only available source of information about that speaker is the utterance to be recognized. The authors propose a first step toward a solution, based on clustering of the speaker space. The study had two steps. The first was a search for a set of features with which to cluster speakers. In the second, using the chosen features, two kinds of clustering were investigated: supervised clustering with two clusters (male and female speakers) and unsupervised clustering with two, three, and five clusters. The cluster information was integrated into the connectionist speech recognition system by means of the speaker cluster neural network (SCNN). The SCNN attempts to share the speaker-independent parameters while modeling the cluster-dependent parameters. The results show that the supervised clusters give the best performance, yielding an overall improvement in recognition performance.
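The unsupervised step described above can be illustrated with a minimal sketch. The abstract names neither the clustering algorithm nor the chosen features, so everything here is an assumption: plain k-means (Lloyd's algorithm) stands in for the authors' method, the per-speaker feature vectors are synthetic, and the names `kmeans`, `assign_cluster`, and `speaker_features` are hypothetical. The sketch only shows how speakers might be grouped into two, three, and five clusters, and how a new speaker could be assigned to a cluster from a single utterance.

```python
import numpy as np


def nearest_centroid(features, centroids):
    """Index of the closest centroid (squared Euclidean) for each row."""
    diffs = features[:, None, :] - centroids[None, :, :]
    return (diffs ** 2).sum(axis=2).argmin(axis=1)


def kmeans(features, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in, not the paper's method)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct speakers chosen at random.
    centroids = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(n_iter):
        labels = nearest_centroid(features, centroids)
        new_centroids = centroids.copy()
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Recompute labels so they always match the returned centroids.
    return centroids, nearest_centroid(features, centroids)


def assign_cluster(utterance_features, centroids):
    """Assign a new speaker to a cluster from one utterance's feature vector."""
    return int(nearest_centroid(np.atleast_2d(utterance_features), centroids)[0])


# Synthetic stand-in data: 40 hypothetical speakers, 4 features each.
rng = np.random.default_rng(1)
speaker_features = np.vstack([
    rng.normal(loc=-2.0, scale=0.5, size=(20, 4)),
    rng.normal(loc=2.0, scale=0.5, size=(20, 4)),
])

# Cluster counts taken from the abstract: two, three, and five.
results = {k: kmeans(speaker_features, k) for k in (2, 3, 5)}
```

In a cluster-aware recognizer, the index returned by `assign_cluster` would select which cluster-dependent parameters to use for the incoming utterance; how the SCNN does this is not specified in the abstract.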