{"title":"基于DTW算法的音素边界标记","authors":"J. Rafalko","doi":"10.23919/SPA.2018.8563359","DOIUrl":null,"url":null,"abstract":"The paper presents an approach to marking the boundaries of allophones in the speech signal based on the Dynamic Time Warping (DTW) algorithm. Setting and marking of allophones boundaries in continuous speech is a difficult issue due to the mutual influence of adjacent phonemes on each other. It is this neighbourhood on the one hand that creates variants of phonemes that is allophones, and on the other hand it affects that the border between allophones is in some cases very difficult to determine. Nowadays, this task is carried out manually in cooperation with specialists in the field of phonetics. The presented approach allows to build a system that is able to automate this process. The aim of the work currently carried out by the author is a method that facilitates the training material processing for the needs of the development of multimodal speech recognition systems. For this purpose, the difficult problem of marking boundaries of allophones is solved in this report based on the Polish dictionary in the context of the creation of allophone bases for speech synthesis. This is done in this way due to the simplified possibility of organizing critical listening and subjective evaluation of received allophones by a large group of Polish native speakers (73 people). Strengthening the method will allow it to be used for the extraction of allophones for the needs of developed system of automatic transcription of English speech and for its notation according to the IPA standard. The analysed continuous speech is combined in the DTW algorithm with a synthesized speech signal. The comparison of both signals is performed not in the time domain as in the classical DTW, but in the frequency domain. This allows for a statement that the phonetic content of both signals is compared. The paper describes the process of marking the boundaries of allophones for the Polish language, however after appropriate modifications, this approach can be used to determine the allophones boundaries in other languages, especially for English.","PeriodicalId":265587,"journal":{"name":"2018 Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Marking the Allophones Boundaries Based on the DTW Algorithm\",\"authors\":\"J. Rafalko\",\"doi\":\"10.23919/SPA.2018.8563359\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The paper presents an approach to marking the boundaries of allophones in the speech signal based on the Dynamic Time Warping (DTW) algorithm. Setting and marking of allophones boundaries in continuous speech is a difficult issue due to the mutual influence of adjacent phonemes on each other. It is this neighbourhood on the one hand that creates variants of phonemes that is allophones, and on the other hand it affects that the border between allophones is in some cases very difficult to determine. Nowadays, this task is carried out manually in cooperation with specialists in the field of phonetics. The presented approach allows to build a system that is able to automate this process. The aim of the work currently carried out by the author is a method that facilitates the training material processing for the needs of the development of multimodal speech recognition systems. For this purpose, the difficult problem of marking boundaries of allophones is solved in this report based on the Polish dictionary in the context of the creation of allophone bases for speech synthesis. This is done in this way due to the simplified possibility of organizing critical listening and subjective evaluation of received allophones by a large group of Polish native speakers (73 people). Strengthening the method will allow it to be used for the extraction of allophones for the needs of developed system of automatic transcription of English speech and for its notation according to the IPA standard. The analysed continuous speech is combined in the DTW algorithm with a synthesized speech signal. The comparison of both signals is performed not in the time domain as in the classical DTW, but in the frequency domain. This allows for a statement that the phonetic content of both signals is compared. The paper describes the process of marking the boundaries of allophones for the Polish language, however after appropriate modifications, this approach can be used to determine the allophones boundaries in other languages, especially for English.\",\"PeriodicalId\":265587,\"journal\":{\"name\":\"2018 Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA)\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.23919/SPA.2018.8563359\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.23919/SPA.2018.8563359","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

摘要

提出了一种基于动态时间翘曲(DTW)算法的语音信号中音素边界标记方法。由于相邻音素之间的相互影响,连续语音中音素边界的设置和标记是一个难题。就是这个邻域,一方面创造了音素的变体,即音素的变体,另一方面,它影响了音素之间的边界在某些情况下很难确定。如今,这项任务是与语音学领域的专家合作手工完成的。提出的方法允许构建一个能够自动化此过程的系统。作者目前开展的工作的目的是为开发多模态语音识别系统的需要提供一种促进训练材料处理的方法。为此,本报告以波兰语词典为基础,在创建语音合成的音素基的背景下,解决了音素边界标注的难题。这样做的原因是,通过一大批波兰语母语者(73人)对收到的音素进行组织批判性听力和主观评价的可能性简化了。对该方法进行强化,可用于已开发的英语语音自动转录系统的音素提取和按国际音标标准标注语音。分析后的连续语音在DTW算法中与合成语音信号相结合。这两个信号的比较不是在时域进行的,而是在频域进行的。这样就可以比较两个信号的语音内容。本文描述了波兰语的音素边界标记过程,但经过适当的修改,这种方法可以用于确定其他语言的音素边界,特别是英语。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Marking the Allophones Boundaries Based on the DTW Algorithm
The paper presents an approach to marking the boundaries of allophones in the speech signal based on the Dynamic Time Warping (DTW) algorithm. Setting and marking of allophones boundaries in continuous speech is a difficult issue due to the mutual influence of adjacent phonemes on each other. It is this neighbourhood on the one hand that creates variants of phonemes that is allophones, and on the other hand it affects that the border between allophones is in some cases very difficult to determine. Nowadays, this task is carried out manually in cooperation with specialists in the field of phonetics. The presented approach allows to build a system that is able to automate this process. The aim of the work currently carried out by the author is a method that facilitates the training material processing for the needs of the development of multimodal speech recognition systems. For this purpose, the difficult problem of marking boundaries of allophones is solved in this report based on the Polish dictionary in the context of the creation of allophone bases for speech synthesis. This is done in this way due to the simplified possibility of organizing critical listening and subjective evaluation of received allophones by a large group of Polish native speakers (73 people). Strengthening the method will allow it to be used for the extraction of allophones for the needs of developed system of automatic transcription of English speech and for its notation according to the IPA standard. The analysed continuous speech is combined in the DTW algorithm with a synthesized speech signal. The comparison of both signals is performed not in the time domain as in the classical DTW, but in the frequency domain. This allows for a statement that the phonetic content of both signals is compared. The paper describes the process of marking the boundaries of allophones for the Polish language, however after appropriate modifications, this approach can be used to determine the allophones boundaries in other languages, especially for English.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Vehicle detector training with labels derived from background subtraction algorithms in video surveillance Automatic 3D segmentation of MRI data for detection of head and neck cancerous lymph nodes Centerline-Radius Polygonal-Mesh Modeling of Bifurcated Blood Vessels in 3D Images using Conformal Mapping Active elimination of tonal components in acoustic signals An adaptive transmission algorithm for an inertial motion capture system in the aspect of energy saving
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1