A Great Reduction of WER by Syllable Toneme Prediction for Thai Grapheme to Phoneme Conversion

2019 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA) Pub Date : 2019-10-01 DOI:10.1109/O-COCOSDA46868.2019.9041212

S. Saychum, A. Rugchatjaroen, C. Wutiwiwatchai



Abstract

Toneme prediction has been one of the greatest difficulties in Thai grapheme-to-phoneme (G2P) conversion. This paper presents an improvement in the prediction of linguistic features based on tone rules. These rules always have exceptions, for example the tones of loan words and transliterated words, which are usually adopted from the original language. This paper does not address the transliteration problem; it aims to show the success of a method that uses an automatic toneme predictor, based on the tone rules of Thai pronunciation, in developing a machine learning model. The proposed method attaches the predictor to the final stage of grapheme-to-phoneme conversion. This work also explores end-to-end prediction using a Long Short-Term Memory (LSTM) network that takes its input sequence from the National Electronics and Computer Technology Center's Pseudo-Syllable segmentation and alignment tool. An evaluation was conducted to show the success of the proposed system and to compare its results with our traditional end-to-end sequence-to-sequence G2P. The comparison shows that sequence-to-sequence modeling obtains the lowest Word Error Rate (WER), 1.6%, and that the proposed system works well on a small 2018 device (Raspberry Pi).
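
The abstract refers to the tone rules of Thai pronunciation without listing them. As a rough illustration only, the textbook rules can be written as a small lookup over four syllable features: the class of the initial consonant, whether the syllable is live or dead, vowel length, and the written tone mark. This is not the authors' predictor. The function name `predict_tone`, the romanized mark names, and the feature encoding are all assumptions made for this sketch, and it ignores the loan-word and transliteration exceptions the abstract mentions.

```python
from typing import Optional

# The five Thai tones, as plain strings for this sketch.
MID, LOW, FALLING, HIGH, RISING = "mid", "low", "falling", "high", "rising"

# Tone chosen by an explicit tone mark, keyed by (mark, initial consonant class).
# Mark names are romanized (hypothetical encoding): mai ek, mai tho,
# mai tri, mai chattawa. The last two are listed for mid-class initials only.
MARKED_TONE = {
    ("mai_ek", "mid"): LOW,
    ("mai_ek", "high"): LOW,
    ("mai_ek", "low"): FALLING,
    ("mai_tho", "mid"): FALLING,
    ("mai_tho", "high"): FALLING,
    ("mai_tho", "low"): HIGH,
    ("mai_tri", "mid"): HIGH,
    ("mai_chattawa", "mid"): RISING,
}


def predict_tone(consonant_class: str, live: bool, long_vowel: bool,
                 tone_mark: Optional[str] = None) -> str:
    """Return the tone the textbook rules assign to one syllable.

    consonant_class: "mid", "high" or "low" (class of the initial consonant).
    live: True for a live syllable (long open vowel or sonorant ending),
          False for a dead one (short open vowel or stop ending).
    long_vowel: only consulted for unmarked dead syllables with a low-class initial.
    tone_mark: one of the romanized mark names above, or None if unmarked.
    """
    if tone_mark is not None:
        key = (tone_mark, consonant_class)
        if key not in MARKED_TONE:
            raise ValueError(f"combination not covered by this sketch: {key}")
        return MARKED_TONE[key]

    # Unmarked syllables: the tone follows from consonant class and syllable type.
    if consonant_class == "mid":
        return MID if live else LOW
    if consonant_class == "high":
        return RISING if live else LOW
    if consonant_class == "low":
        if live:
            return MID
        return FALLING if long_vowel else HIGH
    raise ValueError(f"unknown consonant class: {consonant_class!r}")


if __name__ == "__main__":
    # High-class initial, live syllable, no mark (e.g. the word for "leg").
    print(predict_tone("high", live=True, long_vowel=True))
    # Low-class initial, dead syllable with a long vowel (e.g. the word for "much").
    print(predict_tone("low", live=False, long_vowel=True))
    # Low-class initial with mai tho (e.g. the word for "water").
    print(predict_tone("low", live=True, long_vowel=False, tone_mark="mai_tho"))
```

A rule table like this takes hand-derived syllable features as input; segmenting Thai text into syllables and extracting those features is a separate problem that the sketch does not attempt.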


