Tree-structured speaker clustering for fast speaker adaptation
T. Kosaka, S. Sagayama
Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing, 1994-04-19
DOI: 10.1109/ICASSP.1994.389309
The paper proposes a tree-structured speaker clustering algorithm and discusses its application to fast speaker adaptation. By tracing the clustering tree from top to bottom, adaptation is performed step-by-step from global to local individuality of speech. This adaptation method employs successive branch selection in the speaker clustering tree rather than parameter training and hence achieves fast adaptation using only a small amount of training data. This speaker adaptation method was applied to a hidden Markov network (HMnet) and evaluated in Japanese phoneme and phrase recognition experiments, in which it significantly outperformed speaker-independent recognition methods. In the phrase recognition experiments, the method reduced the error rate by 26.6% using three phrase utterances (approximately 2.7 seconds).
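
The top-to-bottom branch selection described above can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not state the selection criterion or the stopping rule, so the sketch assumes that the child cluster with the highest log-likelihood on the adaptation data is chosen at each level, and that descent stops when no child scores better than the current node. A one-dimensional Gaussian stands in for each cluster's HMnet, and all names (`ClusterNode`, `select_cluster`) and numbers are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class ClusterNode:
    """One speaker cluster in the tree.

    Assumption: a toy 1-D Gaussian replaces the cluster's acoustic model
    (the paper uses an HMnet; this is only a stand-in for scoring).
    """
    name: str
    mean: float
    var: float
    children: List["ClusterNode"] = field(default_factory=list)

    def log_likelihood(self, frames: Sequence[float]) -> float:
        # Sum of per-frame Gaussian log-densities.
        return sum(
            -0.5 * (math.log(2 * math.pi * self.var)
                    + (x - self.mean) ** 2 / self.var)
            for x in frames
        )


def select_cluster(root: ClusterNode,
                   frames: Sequence[float]) -> Tuple[ClusterNode, List[str]]:
    """Descend from the root, picking one branch per level.

    No model parameters are re-estimated; adaptation is only a choice
    among pre-trained cluster models.
    """
    node = root
    path = [node.name]
    while node.children:
        best = max(node.children, key=lambda c: c.log_likelihood(frames))
        # Assumed stopping rule: stay put if no child fits the data better.
        if best.log_likelihood(frames) <= node.log_likelihood(frames):
            break
        node = best
        path.append(node.name)
    return node, path


# Hypothetical two-level tree: global model at the root, finer clusters below.
tree = ClusterNode("root", 0.0, 4.0, [
    ClusterNode("A", -2.0, 1.0, [
        ClusterNode("A1", -3.0, 0.5),
        ClusterNode("A2", -1.0, 0.5),
    ]),
    ClusterNode("B", 2.0, 1.0, [
        ClusterNode("B1", 1.0, 0.5),
        ClusterNode("B2", 3.0, 0.5),
    ]),
])

# A handful of adaptation frames (made-up values).
adaptation_frames = [2.8, 3.1, 3.3]
selected, path = select_cluster(tree, adaptation_frames)
print(" -> ".join(path))
```

Because each step only compares a few candidate models on the available frames, the cost of adaptation in this sketch grows with the tree depth rather than with the number of model parameters, which is consistent with the abstract's point that branch selection needs far less data than parameter training.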