Using LSTM neural networks for cross-lingual phonetic speech segmentation with an iterative correction procedure

IF 1.8 4区 计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Computational Intelligence Pub Date : 2023-09-19 DOI:10.1111/coin.12602
Zdeněk Hanzlíček, Jindřich Matoušek, Jakub Vít
{"title":"Using LSTM neural networks for cross-lingual phonetic speech segmentation with an iterative correction procedure","authors":"Zdeněk Hanzlíček,&nbsp;Jindřich Matoušek,&nbsp;Jakub Vít","doi":"10.1111/coin.12602","DOIUrl":null,"url":null,"abstract":"<p>This article describes experiments on speech segmentation using long short-term memory recurrent neural networks. The main part of the paper deals with multi-lingual and cross-lingual segmentation, that is, it is performed on a language different from the one on which the model was trained. The experimental data involves large Czech, English, German, and Russian speech corpora designated for speech synthesis. For optimal multi-lingual modeling, a compact phonetic alphabet was proposed by sharing and clustering phones of particular languages. Many experiments were performed exploring various experimental conditions and data combinations. We proposed a simple procedure that iteratively adapts the inaccurate default model to the new voice/language. The segmentation accuracy was evaluated by comparison with reference segmentation created by a well-tuned hidden Markov model-based framework with additional manual corrections. The resulting segmentation was also employed in a unit selection text-to-speech system. The generated speech quality was compared with the reference segmentation by a preference listening test.</p>","PeriodicalId":55228,"journal":{"name":"Computational Intelligence","volume":null,"pages":null},"PeriodicalIF":1.8000,"publicationDate":"2023-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/coin.12602","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computational Intelligence","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1111/coin.12602","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

This article describes experiments on speech segmentation using long short-term memory recurrent neural networks. The main part of the paper deals with multi-lingual and cross-lingual segmentation, that is, it is performed on a language different from the one on which the model was trained. The experimental data involves large Czech, English, German, and Russian speech corpora designated for speech synthesis. For optimal multi-lingual modeling, a compact phonetic alphabet was proposed by sharing and clustering phones of particular languages. Many experiments were performed exploring various experimental conditions and data combinations. We proposed a simple procedure that iteratively adapts the inaccurate default model to the new voice/language. The segmentation accuracy was evaluated by comparison with reference segmentation created by a well-tuned hidden Markov model-based framework with additional manual corrections. The resulting segmentation was also employed in a unit selection text-to-speech system. The generated speech quality was compared with the reference segmentation by a preference listening test.

Abstract Image

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
利用 LSTM 神经网络进行跨语言语音分割,并采用迭代修正程序
本文介绍了使用长短期记忆递归神经网络进行语音分割的实验。论文的主要部分涉及多语言和跨语言分段,即在不同于训练模型的语言上进行分段。实验数据包括用于语音合成的大型捷克语、英语、德语和俄语语音库。为了优化多语言建模,通过共享和聚类特定语言的音素,提出了一个紧凑的音素字母表。我们进行了许多实验,探索各种实验条件和数据组合。我们提出了一个简单的程序,通过迭代使不准确的默认模型适应新的语音/语言。通过与基于隐马尔可夫模型的框架所创建的参考分段进行比较,并进行额外的人工修正,评估了分段的准确性。在单元选择文本到语音系统中也采用了由此产生的分段。通过偏好听力测试,将生成的语音质量与参考分段进行了比较。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Computational Intelligence
Computational Intelligence 工程技术-计算机:人工智能
CiteScore
6.90
自引率
3.60%
发文量
65
审稿时长
>12 weeks
期刊介绍: This leading international journal promotes and stimulates research in the field of artificial intelligence (AI). Covering a wide range of issues - from the tools and languages of AI to its philosophical implications - Computational Intelligence provides a vigorous forum for the publication of both experimental and theoretical research, as well as surveys and impact studies. The journal is designed to meet the needs of a wide range of AI workers in academic and industrial research.
期刊最新文献
Comprehensive analysis of feature-algorithm interactions for fall detection across age groups via machine learning An Efficient and Robust 3D Medical Image Classification Approach Based on 3D CNN, Time-Distributed 2D CNN-BLSTM Models, and mRMR Feature Selection Modified local Granger causality analysis based on Peter-Clark algorithm for multivariate time series prediction on IoT data A Benchmark Proposal for Non-Generative Fair Adversarial Learning Strategies Using a Fairness-Utility Trade-off Metric Synthetic Image Generation Using Deep Learning: A Systematic Literature Review
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1