voisTUTOR corpus: A speech corpus of Indian L2 English learners for pronunciation assessment
Chiranjeevi Yarra, Aparna Srinivasan, Chandana Srinivasa, Ritu Aggarwal, P. Ghosh
2019 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA), October 2019. DOI: 10.1109/O-COCOSDA46868.2019.9041162
This paper describes the voisTUTOR corpus, a pronunciation assessment corpus of Indian second-language (L2) learners of English. The corpus consists of 26,529 utterances totalling approximately 14 hours. The data were recorded from 16 Indian L2 learners whose native languages are Kannada, Telugu, Tamil, Malayalam, Hindi and Gujarati. A total of 1,676 unique stimuli were used for the recording, ranging from single words to multi-word stimuli comprising simple, complex and compound sentences. Every utterance is rated for overall quality on a scale of 0 to 10. In addition to this overall rating, and unlike existing corpora, each utterance carries a binary decision (0 or 1) indicating its quality on each of seven factors on which overall pronunciation typically depends: 1) intelligibility, 2) phoneme quality, 3) phoneme mispronunciation, 4) syllable stress quality, 5) intonation quality, 6) correctness of pauses and 7) mother tongue influence. A spoken English expert provided the ratings and binary decisions for all utterances. The corpus also contains recordings of all the stimuli by a male and a female spoken English expert. The factor-dependent binary decisions and the expert recordings make voisTUTOR unique among existing corpora; to the best of our knowledge, no comparable corpus exists for pronunciation assessment of learners with Indian native languages.
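To make the annotation scheme concrete, the sketch below models one learner utterance's labels as a small record: an overall rating on the 0 to 10 scale plus one binary decision for each of the seven factors listed above. This is a hypothetical illustration, not the corpus's actual file format. The class name `UtteranceAnnotation`, the field names, the snake_case factor keys, the helper `factor_vector`, and the assumption that the overall rating is an integer are all illustrative choices. The abstract also does not say whether 1 denotes good or poor quality for a given factor, so the code does not interpret the values.

```python
from dataclasses import dataclass
from typing import Dict, List

# The seven factors named in the abstract, in the order given there.
# The snake_case keys are hypothetical labels, not official corpus identifiers.
FACTORS = (
    "intelligibility",
    "phoneme_quality",
    "phoneme_mispronunciation",
    "syllable_stress_quality",
    "intonation_quality",
    "pause_correctness",
    "mother_tongue_influence",
)


@dataclass(frozen=True)
class UtteranceAnnotation:
    """Hypothetical label record for one learner utterance (illustrative only)."""

    utterance_id: str
    overall: int             # overall quality rating; assumed integer in 0..10
    factors: Dict[str, int]  # factor name -> binary decision (0 or 1)

    def __post_init__(self) -> None:
        # Enforce the 0-to-10 overall scale described in the abstract.
        if not 0 <= self.overall <= 10:
            raise ValueError(f"overall rating {self.overall} is outside 0-10")
        # Require exactly one decision per listed factor.
        if set(self.factors) != set(FACTORS):
            raise ValueError("factors must cover exactly the seven listed factors")
        # Each factor decision is binary.
        for name, value in self.factors.items():
            if value not in (0, 1):
                raise ValueError(f"factor {name!r} must be 0 or 1, got {value}")


def factor_vector(ann: UtteranceAnnotation) -> List[int]:
    """Return the seven binary decisions as a list in the FACTORS order."""
    return [ann.factors[name] for name in FACTORS]
```

For example, `factor_vector(UtteranceAnnotation("utt_0001", 7, dict(zip(FACTORS, [1, 1, 0, 1, 1, 1, 0])))))` yields a fixed-order 7-element list. A fixed order like this is convenient when the factor decisions are used as multi-label targets alongside the overall score.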