Structure-constrained distribution matching using quadratic programming and its application to pronunciation evaluation
Y. Qiao, Masayuki Suzuki, N. Minematsu, K. Hirose
The First Asian Conference on Pattern Recognition, 2011-11-01
DOI: 10.1109/ACPR.2011.6166673
In previous work, we proposed a structural representation of speech that is robust to speaker differences because it is invariant to transformations. There, we compared two speech structures by calculating the distance between two structural vectors, each composed of the lengths of a structure's edges. However, this distance cannot yield matching scores directly related to individual events (nodes) of the two structures. Instead of comparing structural vectors directly, this paper uses the structures as constraints for optimal pattern matching. We derive the objective functions and constraint functions for the optimization. Assuming Gaussian distributions with shared covariance matrices, we show that this optimization problem reduces to a quadratically constrained quadratic programming (QCQP) problem. To mitigate the problem of overly strong invariance, we use a subspace decomposition method and perform the optimization in each subspace. We evaluate the proposed method on the task of assessing the quality of students' English pronunciation. Experimental results show that the proposed method achieves higher correlations with teachers' manual scores than the methods it is compared against.
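The structural vector and its transformation invariance can be illustrated with a minimal sketch. The abstract does not state which distance defines the edge lengths, so this example assumes the Bhattacharyya distance between Gaussians, which for a shared covariance matrix reduces to one eighth of the squared Mahalanobis distance between the means. The function name `structure_vector`, the toy data, and the modeling of a speaker difference as an invertible affine map are all hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np


def structure_vector(means, cov):
    """Return the edge lengths between every pair of Gaussian events.

    Assumption: all events share the covariance `cov`, so the
    Bhattacharyya distance is (1/8) * d^T cov^{-1} d for mean difference d.
    """
    precision = np.linalg.inv(cov)
    n_events = len(means)
    edges = []
    for i in range(n_events):
        for j in range(i + 1, n_events):
            d = means[i] - means[j]
            edges.append(0.125 * (d @ precision @ d))
    return np.array(edges)


# Toy data (hypothetical): 4 events (nodes) in a 3-dimensional feature space.
rng = np.random.default_rng(0)
means = rng.normal(size=(4, 3))
cov = np.eye(3) + 0.1 * np.ones((3, 3))  # symmetric positive definite

# A fixed invertible affine map x -> A x + b standing in for a speaker change.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.5, 0.5],
              [0.3, 0.0, 1.0]])
b = np.array([0.5, -1.0, 2.0])

means_transformed = means @ A.T + b
cov_transformed = A @ cov @ A.T

original = structure_vector(means, cov)
transformed = structure_vector(means_transformed, cov_transformed)

# Distance between the two structural vectors; near zero if invariant.
print(np.linalg.norm(original - transformed))
```

Under these assumptions the two structural vectors should agree up to floating-point error, which is the invariance the abstract refers to. It also shows why a single vector distance gives no per-node score: each entry belongs to an edge, not to an individual event.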