文本到语音的非标准越南语单词检测与规范化

Huu-Tien Dang, Thi-Hai-Yen Vuong, X. Phan
{"title":"文本到语音的非标准越南语单词检测与规范化","authors":"Huu-Tien Dang, Thi-Hai-Yen Vuong, X. Phan","doi":"10.1109/KSE56063.2022.9953791","DOIUrl":null,"url":null,"abstract":"Converting written texts into their spoken forms is an essential problem in any text-to-speech (TTS) systems. However, building an effective text normalization solution for a real-world TTS system face two main challenges: (1) the semantic ambiguity of non-standard words (NSWs), e.g., numbers, dates, ranges, scores, abbreviations, and (2) transforming NSWs into pronounceable syllables, such as URL, email address, hashtag, and contact name. In this paper, we propose a new two-phase normalization approach to deal with these challenges. First, a model-based tagger is designed to detect NSWs. Then, depending on NSW types, a rule-based normalizer expands those NSWs into their final verbal forms. We conducted three empirical experiments for NSW detection using Conditional Random Fields (CRFs), BiLSTM-CNN-CRF, and BERT-BiGRU-CRF models on a manually annotated dataset including 5819 sentences extracted from Vietnamese news articles. In the second phase, we propose a forward lexicon-based maximum matching algorithm to split down the hashtag, email, URL, and contact name. The experimental results of the tagging phase show that the average F1 scores of the BiLSTM-CNN-CRF and CRF models are above 90.00%, reaching the highest F1 of 95.00% with the BERT-BiGRU-CRF model. Overall, our approach has low sentence error rates, at 8.15% with CRF and 7.11% with BiLSTM-CNNCRF taggers, and only 6.67% with BERT-BiGRU-CRF tagger.","PeriodicalId":330865,"journal":{"name":"2022 14th International Conference on Knowledge and Systems Engineering (KSE)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Non-Standard Vietnamese Word Detection and Normalization for Text–to–Speech\",\"authors\":\"Huu-Tien Dang, Thi-Hai-Yen Vuong, X. Phan\",\"doi\":\"10.1109/KSE56063.2022.9953791\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Converting written texts into their spoken forms is an essential problem in any text-to-speech (TTS) systems. However, building an effective text normalization solution for a real-world TTS system face two main challenges: (1) the semantic ambiguity of non-standard words (NSWs), e.g., numbers, dates, ranges, scores, abbreviations, and (2) transforming NSWs into pronounceable syllables, such as URL, email address, hashtag, and contact name. In this paper, we propose a new two-phase normalization approach to deal with these challenges. First, a model-based tagger is designed to detect NSWs. Then, depending on NSW types, a rule-based normalizer expands those NSWs into their final verbal forms. We conducted three empirical experiments for NSW detection using Conditional Random Fields (CRFs), BiLSTM-CNN-CRF, and BERT-BiGRU-CRF models on a manually annotated dataset including 5819 sentences extracted from Vietnamese news articles. In the second phase, we propose a forward lexicon-based maximum matching algorithm to split down the hashtag, email, URL, and contact name. The experimental results of the tagging phase show that the average F1 scores of the BiLSTM-CNN-CRF and CRF models are above 90.00%, reaching the highest F1 of 95.00% with the BERT-BiGRU-CRF model. Overall, our approach has low sentence error rates, at 8.15% with CRF and 7.11% with BiLSTM-CNNCRF taggers, and only 6.67% with BERT-BiGRU-CRF tagger.\",\"PeriodicalId\":330865,\"journal\":{\"name\":\"2022 14th International Conference on Knowledge and Systems Engineering (KSE)\",\"volume\":\"23 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-09-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 14th International Conference on Knowledge and Systems Engineering (KSE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/KSE56063.2022.9953791\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 14th International Conference on Knowledge and Systems Engineering (KSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/KSE56063.2022.9953791","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

摘要

将书面文本转换为口语形式是任何文本到语音(TTS)系统的基本问题。然而,为现实世界的TTS系统构建有效的文本规范化解决方案面临两个主要挑战:(1)非标准单词(nsw)的语义歧义,如数字、日期、范围、分数、缩写;(2)将非标准单词(nsw)转换为可发音的音节,如URL、电子邮件地址、hashtag和联系人姓名。在本文中,我们提出了一种新的两阶段规范化方法来应对这些挑战。首先,设计了一个基于模型的标注器来检测nsw。然后,根据NSW类型,基于规则的规范化器将这些NSW扩展为最终的口头形式。本文采用条件随机场(CRFs)、BiLSTM-CNN-CRF和BERT-BiGRU-CRF模型,在越南新闻文章中提取的5819个句子的人工标注数据集上进行了三个NSW检测的实证实验。在第二阶段,我们提出了一种基于正向词典的最大匹配算法来拆分标签、电子邮件、URL和联系人姓名。标记阶段的实验结果表明,BiLSTM-CNN-CRF和CRF模型的平均F1分数都在90.00%以上,其中BERT-BiGRU-CRF模型的F1分数最高,达到95.00%。总体而言,我们的方法具有较低的句子错误率,使用CRF和BiLSTM-CNNCRF标注器的错误率分别为8.15%和7.11%,使用BERT-BiGRU-CRF标注器的错误率只有6.67%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Non-Standard Vietnamese Word Detection and Normalization for Text–to–Speech
Converting written texts into their spoken forms is an essential problem in any text-to-speech (TTS) systems. However, building an effective text normalization solution for a real-world TTS system face two main challenges: (1) the semantic ambiguity of non-standard words (NSWs), e.g., numbers, dates, ranges, scores, abbreviations, and (2) transforming NSWs into pronounceable syllables, such as URL, email address, hashtag, and contact name. In this paper, we propose a new two-phase normalization approach to deal with these challenges. First, a model-based tagger is designed to detect NSWs. Then, depending on NSW types, a rule-based normalizer expands those NSWs into their final verbal forms. We conducted three empirical experiments for NSW detection using Conditional Random Fields (CRFs), BiLSTM-CNN-CRF, and BERT-BiGRU-CRF models on a manually annotated dataset including 5819 sentences extracted from Vietnamese news articles. In the second phase, we propose a forward lexicon-based maximum matching algorithm to split down the hashtag, email, URL, and contact name. The experimental results of the tagging phase show that the average F1 scores of the BiLSTM-CNN-CRF and CRF models are above 90.00%, reaching the highest F1 of 95.00% with the BERT-BiGRU-CRF model. Overall, our approach has low sentence error rates, at 8.15% with CRF and 7.11% with BiLSTM-CNNCRF taggers, and only 6.67% with BERT-BiGRU-CRF tagger.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
DWEN: A novel method for accurate estimation of cell type compositions from bulk data samples Polygenic risk scores adaptation for Height in a Vietnamese population Sentiment Classification for Beauty-fashion Reviews An Automated Stub Method for Unit Testing C/C++ Projects Knowledge-based Problem Solving and Reasoning methods
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1