Agglomerative vs. tree-based clustering for the definition of multilingual set of triphones
B. Imperl, Z. Kacic, B. Horvat, A. Zgank
2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100), 2000-06-05
DOI: 10.1109/ICASSP.2000.861809 (https://doi.org/10.1109/ICASSP.2000.861809)
This paper addresses multilingual acoustic modelling for the design of multilingual speech recognisers. Two approaches to defining a multilingual set of triphones are investigated, one bottom-up and one top-down. A new clustering algorithm for defining the multilingual set of triphones is proposed. The agglomerative (bottom-up) clustering algorithm is based on a distance measure for triphones, defined as a weighted sum of explicit estimates of context similarity at the monophone level. Monophone similarity is estimated with a method based on the algorithm of Houtgast. The second type of system uses tree-based (top-down) clustering with a common decision tree. The experiments were carried out on the SpeechDat II databases (Slovenian, Spanish and German 1000 FDB SpeechDat II). They show that the agglomerative clustering algorithm yields a significant reduction in the number of triphones with only minor degradation of word accuracy.
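The bottom-up idea can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration, not taken from the paper: the similarity table `SIMILARITY` holds placeholder values rather than Houtgast-based estimates, the weights `(0.25, 0.5, 0.25)` are arbitrary, turning similarity into distance as one minus the weighted sum is only one possible reading of the abstract, and the complete-linkage merge rule with a stopping threshold is a generic agglomerative scheme rather than the authors' algorithm.

```python
from itertools import combinations

# Hypothetical monophone similarities in [0, 1]. Placeholder values only;
# the paper estimates these with a method based on Houtgast's algorithm.
SIMILARITY = {
    frozenset(("b", "p")): 0.8,
    frozenset(("a", "e")): 0.6,
}


def mono_sim(p, q):
    """Similarity of two monophones: 1.0 if identical, else table lookup (default 0.0)."""
    if p == q:
        return 1.0
    return SIMILARITY.get(frozenset((p, q)), 0.0)


def triphone_distance(t1, t2, weights=(0.25, 0.5, 0.25)):
    """Assumed distance: 1 minus a weighted sum of left/centre/right similarities."""
    sims = [mono_sim(a, b) for a, b in zip(t1, t2)]
    return 1.0 - sum(w * s for w, s in zip(weights, sims))


def cluster_distance(c1, c2):
    """Complete linkage: the largest pairwise distance between two clusters."""
    return max(triphone_distance(a, b) for a in c1 for b in c2)


def agglomerate(triphones, threshold):
    """Merge the closest pair of clusters until none is closer than the threshold."""
    clusters = [[t] for t in triphones]
    while len(clusters) > 1:
        (i, j), dist = min(
            (
                ((i, j), cluster_distance(clusters[i], clusters[j]))
                for i, j in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        if dist > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters


# Triphones written as (left context, centre phone, right context).
triphones = [("a", "b", "a"), ("a", "p", "a"), ("i", "s", "i")]
result = agglomerate(triphones, threshold=0.2)
```

With these placeholder values, the first two triphones differ only in a centre phone that the table rates as similar, so they fall into one cluster, while the third stays separate. Lowering the threshold keeps more distinct triphones; raising it shrinks the set further, which is the trade-off against word accuracy that the abstract reports.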