Improved prediction of the accent gap between speakers of English for individual-based clustering of World Englishes
Fumiya Shiozawa, D. Saito, N. Minematsu
2016 IEEE Spoken Language Technology Workshop (SLT), 2016-12-01
DOI: 10.1109/SLT.2016.7846255 (https://doi.org/10.1109/SLT.2016.7846255)
The term “World Englishes” describes the current state of English, one of whose main characteristics is a large diversity of pronunciation, referred to as accents. In our previous studies, we developed several techniques for effective clustering and visualization of this diversity. For this purpose, the accent gap between two speakers has to be quantified independently of extra-linguistic factors such as age and gender. To this end, a representation of speech called speech structure, which is theoretically invariant to these factors, was used to represent pronunciation. In the current study, we attempt to improve accent gap prediction by controlling the degree of invariance. Two techniques are tested: DNN-based model-free estimation of divergence and multi-stream speech structures. In the former, instead of estimating the separability between two speech events under model assumptions, DNN-based class posteriors are used for the estimation. In the latter, one speech structure is derived for each sub-space of the acoustic features, which realizes constrained invariance. The proposals are evaluated in terms of the correlation between reference accent gaps and predicted gaps. Experiments show that the correlation improves from 0.718 to 0.730.
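The ideas in the abstract can be illustrated with a rough sketch. It is not the authors' implementation, and the abstract does not give the distance measure, the feature streams, or the gap predictor. As assumptions for illustration only, each speech event is modeled as a diagonal-covariance Gaussian, separability is measured with the Bhattacharyya distance, and the gap between two speakers is taken as the Euclidean distance between their structure vectors. All function names are hypothetical, and the DNN-based posterior estimation is not shown.

```python
import numpy as np

def bhattacharyya_diag(mu1, var1, mu2, var2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians (assumed measure)."""
    var = 0.5 * (var1 + var2)
    mean_term = 0.125 * np.sum((mu1 - mu2) ** 2 / var)
    cov_term = 0.5 * np.sum(np.log(var) - 0.5 * (np.log(var1) + np.log(var2)))
    return mean_term + cov_term

def speech_structure(means, variances):
    """Vector of pairwise distances between all speech events of one speaker.

    means, variances: arrays of shape (n_events, n_dims).
    """
    n = len(means)
    return np.array([
        bhattacharyya_diag(means[i], variances[i], means[j], variances[j])
        for i in range(n) for j in range(i + 1, n)
    ])

def multi_stream_structure(means, variances, streams):
    """One structure per feature sub-space (stream), concatenated into one vector."""
    return np.concatenate([
        speech_structure(means[:, s], variances[:, s]) for s in streams
    ])

def structure_gap(struct_a, struct_b):
    """Hypothetical stand-in for an accent gap: Euclidean distance between structures."""
    return float(np.linalg.norm(struct_a - struct_b))

def pearson(x, y):
    """Pearson correlation between reference gaps and predicted gaps."""
    return float(np.corrcoef(x, y)[0, 1])

# Usage with synthetic data: two speakers, 5 speech events, 6 feature dimensions.
rng = np.random.default_rng(0)
streams = [slice(0, 3), slice(3, 6)]  # assumed split into two sub-spaces
means_a = rng.normal(size=(5, 6))
vars_a = rng.uniform(0.5, 2.0, size=(5, 6))
means_b = rng.normal(size=(5, 6))
vars_b = rng.uniform(0.5, 2.0, size=(5, 6))

gap = structure_gap(
    multi_stream_structure(means_a, vars_a, streams),
    multi_stream_structure(means_b, vars_b, streams),
)
print("structure gap between the two synthetic speakers:", gap)
```

Under these assumptions, the structure depends only on contrasts between events, so a per-dimension scaling and shift applied to all events of a speaker leaves it unchanged. That is a simplified analogue of the invariance to extra-linguistic factors described above. Splitting the features into streams restricts each structure to one sub-space, which is one way to read "constrained invariance".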