XDF-REPA: A Densely Labeled Dataset toward Refined Pronunciation Assessment for English Learning
Yun Gao, Zhigang Ou, Jianfeng Cheng, Yong Ruan, Xiangdong Wang, Yueliang Qian
2019 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA), October 2019
DOI: 10.1109/O-COCOSDA46868.2019.9041154
Currently, most computer-assisted pronunciation training (CAPT) systems focus on overall scoring or mispronunciation detection. In this paper, we address refined pronunciation assessment (RPA), which aims to give L2 learners more fine-grained information about their pronunciation. The main obstacle is the lack of densely labeled data, so we present the XDF-REPA dataset, which is freely available to the public. The dataset contains 19,213 English word utterances by 18 Chinese adults. Of these, 4,200 audio clips from 9 speakers are densely labeled by 3 linguists: every phoneme is annotated with the intended phoneme, the actually uttered phoneme, and a phoneme score, and each word also receives an overall score. To reduce disagreement between annotators, we define scoring rules that combine subjective and objective criteria. To demonstrate how the dataset can be used and to provide a baseline for other researchers, we develop and describe a prototype RPA system that uses a DNN-HMM acoustic model and a variant of Goodness of Pronunciation (GOP) to produce all the corrective feedback that RPA requires. Experimental results show that error-detection accuracy ranges from 80.1% to 85.1%, and actually-uttered-phoneme recognition accuracy ranges from 70.9% to 80.8%, depending on the subset and the linguist.
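The abstract names a DNN-HMM acoustic model and "a variant of GOP" but does not give the formula, so the following is only a minimal sketch of how the dense labels could be used to compute the two reported metrics. It assumes one common posterior-based GOP form: for an intended phone p aligned to frames o_1..o_T, GOP(p) = (1/T) * sum_t [log P(p|o_t) - max_q log P(q|o_t)]. It further assumes that the actually uttered phone is the one with the highest mean log posterior over the segment, that a phone is flagged as mispronounced when its GOP falls below a tuned threshold, and that accuracies are measured against a single linguist's labels. Segment boundaries are taken as given (e.g. from forced alignment). The phone set, the PhonemeLabel field names, the threshold, and the toy posteriors are all hypothetical; this is not the authors' implementation.

```python
import math
from dataclasses import dataclass

# Hypothetical phone inventory; the system's real phone set is not stated in the abstract.
PHONES = ["ae", "b", "k", "t"]


@dataclass
class PhonemeLabel:
    """One densely labeled phoneme (hypothetical field names)."""
    intended: str  # canonical phoneme the learner was supposed to say
    uttered: str   # phoneme the linguist actually heard
    score: float   # linguist's phoneme score (carried along, unused in this sketch)


def gop(posteriors, intended):
    """Posterior-ratio GOP for one aligned segment.

    posteriors: list of frames, each a list of phone posteriors ordered as PHONES.
    Returns the frame-averaged log ratio between the intended phone's posterior
    and the best posterior in that frame; 0.0 is the best possible value.
    """
    k = PHONES.index(intended)
    total = 0.0
    for frame in posteriors:
        total += math.log(frame[k]) - math.log(max(frame))
    return total / len(posteriors)


def recognize(posteriors):
    """Guess the actually uttered phone: highest mean log posterior over the segment."""
    mean_logs = [
        sum(math.log(frame[j]) for frame in posteriors) / len(posteriors)
        for j in range(len(PHONES))
    ]
    return PHONES[max(range(len(PHONES)), key=lambda j: mean_logs[j])]


def evaluate(segments, labels, threshold=-1.0):
    """Return (error-detection accuracy, uttered-phoneme recognition accuracy).

    A phone is flagged when its GOP is below `threshold` (hypothetical value,
    to be tuned on held-out data). The reference counts a phone as mispronounced
    when the linguist's uttered phone differs from the intended one.
    """
    detect_hits = recog_hits = 0
    for post, lab in zip(segments, labels):
        flagged = gop(post, lab.intended) < threshold
        actually_wrong = lab.uttered != lab.intended
        detect_hits += flagged == actually_wrong
        recog_hits += recognize(post) == lab.uttered
    n = len(labels)
    return detect_hits / n, recog_hits / n


# Toy word "bat" /b ae t/ where the learner said /b ae k/ (made-up posteriors).
segments = [
    [[0.05, 0.85, 0.05, 0.05], [0.10, 0.80, 0.05, 0.05]],  # /b/
    [[0.90, 0.04, 0.03, 0.03]],                            # /ae/
    [[0.05, 0.05, 0.70, 0.20], [0.05, 0.05, 0.75, 0.15]],  # /t/ realized as /k/
]
labels = [
    PhonemeLabel("b", "b", 100.0),
    PhonemeLabel("ae", "ae", 95.0),
    PhonemeLabel("t", "k", 30.0),
]
print(evaluate(segments, labels))  # (1.0, 1.0)
```

Because the paper reports results "for different subsets and linguists", a natural way to reuse such a function would be to run it once per linguist's label set and report the range, which is how a spread like 80.1% to 85.1% could arise. The threshold trades false alarms against missed errors and would normally be tuned rather than fixed.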