文本到语音的非标准越南语单词检测与规范化

2022 14th International Conference on Knowledge and Systems Engineering (KSE) Pub Date : 2022-09-07 DOI:10.1109/KSE56063.2022.9953791

Huu-Tien Dang, Thi-Hai-Yen Vuong, X. Phan

{"title":"文本到语音的非标准越南语单词检测与规范化","authors":"Huu-Tien Dang, Thi-Hai-Yen Vuong, X. Phan","doi":"10.1109/KSE56063.2022.9953791","DOIUrl":null,"url":null,"abstract":"Converting written texts into their spoken forms is an essential problem in any text-to-speech (TTS) systems. However, building an effective text normalization solution for a real-world TTS system face two main challenges: (1) the semantic ambiguity of non-standard words (NSWs), e.g., numbers, dates, ranges, scores, abbreviations, and (2) transforming NSWs into pronounceable syllables, such as URL, email address, hashtag, and contact name. In this paper, we propose a new two-phase normalization approach to deal with these challenges. First, a model-based tagger is designed to detect NSWs. Then, depending on NSW types, a rule-based normalizer expands those NSWs into their final verbal forms. We conducted three empirical experiments for NSW detection using Conditional Random Fields (CRFs), BiLSTM-CNN-CRF, and BERT-BiGRU-CRF models on a manually annotated dataset including 5819 sentences extracted from Vietnamese news articles. In the second phase, we propose a forward lexicon-based maximum matching algorithm to split down the hashtag, email, URL, and contact name. The experimental results of the tagging phase show that the average F1 scores of the BiLSTM-CNN-CRF and CRF models are above 90.00%, reaching the highest F1 of 95.00% with the BERT-BiGRU-CRF model. Overall, our approach has low sentence error rates, at 8.15% with CRF and 7.11% with BiLSTM-CNNCRF taggers, and only 6.67% with BERT-BiGRU-CRF tagger.","PeriodicalId":330865,"journal":{"name":"2022 14th International Conference on Knowledge and Systems Engineering (KSE)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Non-Standard Vietnamese Word Detection and Normalization for Text–to–Speech\",\"authors\":\"Huu-Tien Dang, Thi-Hai-Yen Vuong, X. Phan\",\"doi\":\"10.1109/KSE56063.2022.9953791\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Converting written texts into their spoken forms is an essential problem in any text-to-speech (TTS) systems. However, building an effective text normalization solution for a real-world TTS system face two main challenges: (1) the semantic ambiguity of non-standard words (NSWs), e.g., numbers, dates, ranges, scores, abbreviations, and (2) transforming NSWs into pronounceable syllables, such as URL, email address, hashtag, and contact name. In this paper, we propose a new two-phase normalization approach to deal with these challenges. First, a model-based tagger is designed to detect NSWs. Then, depending on NSW types, a rule-based normalizer expands those NSWs into their final verbal forms. We conducted three empirical experiments for NSW detection using Conditional Random Fields (CRFs), BiLSTM-CNN-CRF, and BERT-BiGRU-CRF models on a manually annotated dataset including 5819 sentences extracted from Vietnamese news articles. In the second phase, we propose a forward lexicon-based maximum matching algorithm to split down the hashtag, email, URL, and contact name. The experimental results of the tagging phase show that the average F1 scores of the BiLSTM-CNN-CRF and CRF models are above 90.00%, reaching the highest F1 of 95.00% with the BERT-BiGRU-CRF model. Overall, our approach has low sentence error rates, at 8.15% with CRF and 7.11% with BiLSTM-CNNCRF taggers, and only 6.67% with BERT-BiGRU-CRF tagger.\",\"PeriodicalId\":330865,\"journal\":{\"name\":\"2022 14th International Conference on Knowledge and Systems Engineering (KSE)\",\"volume\":\"23 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-09-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 14th International Conference on Knowledge and Systems Engineering (KSE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/KSE56063.2022.9953791\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 14th International Conference on Knowledge and Systems Engineering (KSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/KSE56063.2022.9953791","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

将书面文本转换为口语形式是任何文本到语音(TTS)系统的基本问题。然而，为现实世界的TTS系统构建有效的文本规范化解决方案面临两个主要挑战:(1)非标准单词(nsw)的语义歧义，如数字、日期、范围、分数、缩写;(2)将非标准单词(nsw)转换为可发音的音节，如URL、电子邮件地址、hashtag和联系人姓名。在本文中，我们提出了一种新的两阶段规范化方法来应对这些挑战。首先，设计了一个基于模型的标注器来检测nsw。然后，根据NSW类型，基于规则的规范化器将这些NSW扩展为最终的口头形式。本文采用条件随机场(CRFs)、BiLSTM-CNN-CRF和BERT-BiGRU-CRF模型，在越南新闻文章中提取的5819个句子的人工标注数据集上进行了三个NSW检测的实证实验。在第二阶段，我们提出了一种基于正向词典的最大匹配算法来拆分标签、电子邮件、URL和联系人姓名。标记阶段的实验结果表明，BiLSTM-CNN-CRF和CRF模型的平均F1分数都在90.00%以上，其中BERT-BiGRU-CRF模型的F1分数最高，达到95.00%。总体而言，我们的方法具有较低的句子错误率，使用CRF和BiLSTM-CNNCRF标注器的错误率分别为8.15%和7.11%，使用BERT-BiGRU-CRF标注器的错误率只有6.67%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Non-Standard Vietnamese Word Detection and Normalization for Text–to–Speech

Converting written texts into their spoken forms is an essential problem in any text-to-speech (TTS) systems. However, building an effective text normalization solution for a real-world TTS system face two main challenges: (1) the semantic ambiguity of non-standard words (NSWs), e.g., numbers, dates, ranges, scores, abbreviations, and (2) transforming NSWs into pronounceable syllables, such as URL, email address, hashtag, and contact name. In this paper, we propose a new two-phase normalization approach to deal with these challenges. First, a model-based tagger is designed to detect NSWs. Then, depending on NSW types, a rule-based normalizer expands those NSWs into their final verbal forms. We conducted three empirical experiments for NSW detection using Conditional Random Fields (CRFs), BiLSTM-CNN-CRF, and BERT-BiGRU-CRF models on a manually annotated dataset including 5819 sentences extracted from Vietnamese news articles. In the second phase, we propose a forward lexicon-based maximum matching algorithm to split down the hashtag, email, URL, and contact name. The experimental results of the tagging phase show that the average F1 scores of the BiLSTM-CNN-CRF and CRF models are above 90.00%, reaching the highest F1 of 95.00% with the BERT-BiGRU-CRF model. Overall, our approach has low sentence error rates, at 8.15% with CRF and 7.11% with BiLSTM-CNNCRF taggers, and only 6.67% with BERT-BiGRU-CRF tagger.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2022 14th International Conference on Knowledge and Systems Engineering (KSE)

自引率

0.00%

发文量

期刊最新文献

DWEN: A novel method for accurate estimation of cell type compositions from bulk data samples Polygenic risk scores adaptation for Height in a Vietnamese population Sentiment Classification for Beauty-fashion Reviews An Automated Stub Method for Unit Testing C/C++ Projects Knowledge-based Problem Solving and Reasoning methods