Pannakorn Chao-angthong, A. Suchato, P. Punyabukkana
Northern Thai Dialect Text to Speech
2017 14th International Joint Conference on Computer Science and Software Engineering (JCSSE), pp. 1-6
Published: 2017-07-12
DOI: 10.1109/JCSSE.2017.8025905 (https://doi.org/10.1109/JCSSE.2017.8025905)
Citations: 0
Abstract
Each dialect of the Thai language has a distinct identity associated with its accent. Although the dialects share a common standard-language origin, visitors to each region inevitably converse with native speakers of other dialects. The need to communicate with people who understand only the Northern Thai Dialect (NTD) led us to develop a Northern Thai Dialect Text-to-Speech system (NTD-TTS). The idea follows the same concept as a translation program: given text input in the Central Thai Dialect (CTD), the TTS system translates it and synthesizes speech output in NTD. The system is built on a TTS software structure in which two components were modified: the grapheme-to-phoneme (G2P) conversion and the speech models. The NTD-G2P conversion combines rule-based and dictionary-based approaches. It was evaluated on 100 sentences randomly selected from ORCHID and achieved a conversion accuracy of 83.19% at the syllable level; it was then used to build the NTD corpus. Sentences were then selected to train the NTD speech model. The selected sentences cover 95.32% in the first percentile of the phoneme distribution in the NTD corpus. After the speech models were connected to the TTS system, the whole system was evaluated by native speakers using the Mean Opinion Score (MOS) and syllable-level comprehension. The MOS evaluations indicated that the accent, naturalness, and intelligibility of the synthetic speech ranged from “acceptable” to “good”. On the test set, NTD native listeners gave MOS values of 3.73 for accent, 3.68 for naturalness, and 3.63 for intelligibility, with a comprehension rate of 97.16%.
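The combination of dictionary-based and rule-based G2P conversion, and the syllable-level accuracy used to score it, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `g2p` and `syllable_accuracy`, the toy lexicon, and the toy rule are hypothetical, the syllable and phoneme strings are placeholders rather than real Thai data, and the sketch assumes that predicted and reference syllables are aligned one-to-one, which the abstract does not specify.

```python
from typing import Callable, Dict, List


def g2p(syllables: List[str],
        lexicon: Dict[str, str],
        rule: Callable[[str], str]) -> List[str]:
    """Convert each syllable: dictionary lookup first, rule fallback otherwise."""
    return [lexicon[s] if s in lexicon else rule(s) for s in syllables]


def syllable_accuracy(predicted: List[str], reference: List[str]) -> float:
    """Percentage of syllables whose predicted phoneme string matches the reference.

    Assumes both lists are already aligned syllable by syllable.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        return 0.0
    correct = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * correct / len(reference)


# Placeholder data (hypothetical, not real Thai syllables or phonemes).
toy_lexicon = {"s1": "P1", "s2": "P2"}


def toy_rule(syllable: str) -> str:
    # Stand-in for a hand-written conversion rule.
    return syllable.upper()


predicted = g2p(["s1", "s2", "s3", "s4"], toy_lexicon, toy_rule)
reference = ["P1", "P2", "S3", "X4"]
accuracy = syllable_accuracy(predicted, reference)
print(predicted, accuracy)
```

Putting the dictionary first lets listed exceptions override the general rules, which is a common arrangement for hybrid G2P systems; whether the paper orders the two approaches this way is not stated in the abstract.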