A training procedure for a segment-based-network approach to isolated word recognition

F. Soong
ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing, 1987-04-06. DOI: 10.1109/ICASSP.1987.1169579
Citations: 2

Abstract

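In this paper, we propose a complete training procedure for creating a subword-based network and test it in an isolated word recognition experiment. We first hand-segment one training token per word into contiguous subword segments with the aid of an interactive program that can display and play back various acoustic features of an utterance. The subword segmental units adopted in this paper consist of four sound classes: stationary sounds, fast transitional sounds, slow transitional sounds, and consonant clusters and others. The hand-segmented token is used to initialize a subword-based word network, which is then refined using more training tokens. The refinement is carried out with a two-level dynamic programming (DP) procedure. At the first level, the word level, an endpoint-relaxed DP algorithm is used to remove possible endpointing errors and to mark tentative segment boundaries. Between the marked segment boundaries, another endpoint-relaxed DP algorithm is employed at the segment level to refine the segments extracted at the word level. This training procedure generates a segment-based word network consisting of serial and parallel branches. Serial branches are generated from acoustically similar segments aligned at the segment level, while parallel branches are created to accommodate different acoustic manifestations of the same sound class in different phonetic contexts or different pronunciations. A speaker-dependent isolated word recognition experiment was carried out. On a four-speaker (2 male, 2 female) English alphabet database, the segment-based network gives improved performance compared with a conventional word-template-based approach. The word error rate is reduced from 11.2% for the word-based recognizer to 7.7% for the network-based recognizer; correspondingly, the number of misrecognized words is reduced from 116 to 80 out of 1040 recognition trials.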
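The abstract names the two-level, endpoint-relaxed DP procedure but gives none of its details. The Python sketch below illustrates the general idea under stated assumptions: Euclidean frame distance, a template-driven local constraint that advances the test index by 0, 1, or 2 frames, and a fixed relaxation window `relax` at both ends of the token. The names `relaxed_dtw`, `map_boundaries`, and `refine_segment` are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]


def frame_dist(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames (assumed local metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def relaxed_dtw(template: List[Frame], test: List[Frame],
                relax: int) -> Tuple[float, List[int]]:
    """Endpoint-relaxed DP alignment of a template against a test token.

    Every template frame i is aligned to exactly one test frame warp[i].
    The path may start in the first relax+1 test frames and end in the
    last relax+1, so small leading/trailing endpointing errors are absorbed.
    Assumed local constraint: from template frame i-1 to i, the test index
    advances by 0, 1 or 2 frames. Returns (distance per template frame, warp).
    """
    n, m = len(template), len(test)
    cost = [[math.inf] * m for _ in range(n)]
    back = [[-1] * m for _ in range(n)]

    # Relaxed start: template frame 0 may align to any of the first relax+1 test frames.
    for j in range(min(relax + 1, m)):
        cost[0][j] = frame_dist(template[0], test[j])

    for i in range(1, n):
        for j in range(m):
            best, arg = math.inf, -1
            for step in (0, 1, 2):  # test-index advance allowed by the local constraint
                k = j - step
                if k >= 0 and cost[i - 1][k] < best:
                    best, arg = cost[i - 1][k], k
            if arg >= 0:
                cost[i][j] = best + frame_dist(template[i], test[j])
                back[i][j] = arg

    # Relaxed end: choose the cheapest finishing frame among the last relax+1 test frames.
    end = min(range(max(0, m - relax - 1), m), key=lambda j: cost[n - 1][j])
    if math.isinf(cost[n - 1][end]):
        raise ValueError("no admissible path; widen relax or check token lengths")

    # Backtrack to recover which test frame each template frame was aligned to.
    warp = [0] * n
    warp[n - 1] = end
    for i in range(n - 1, 0, -1):
        warp[i - 1] = back[i][warp[i]]
    return cost[n - 1][end] / n, warp


def map_boundaries(warp: List[int], template_bounds: List[int]) -> List[int]:
    """Word level: project hand-marked template segment starts onto the test token."""
    return [warp[b] for b in template_bounds]


def refine_segment(template_seg: List[Frame], test: List[Frame],
                   start: int, stop: int, relax: int) -> Tuple[float, int, int]:
    """Segment level: re-align one template segment inside test[start:stop].

    Returns (distance, first test frame, last test frame) in token coordinates.
    """
    dist, warp = relaxed_dtw(template_seg, test[start:stop], relax)
    return dist, warp[0] + start, warp[-1] + start


if __name__ == "__main__":
    template = [[0.0], [1.0], [2.0], [3.0]]            # hand-segmented reference token
    test = [[9.0], [0.0], [1.0], [2.0], [3.0], [9.0]]  # one noise frame at each end
    dist, warp = relaxed_dtw(template, test, relax=1)
    print(dist, warp, map_boundaries(warp, [0, 2]))
    print(refine_segment(template[2:4], test, 3, 6, relax=1))
```

Because each template frame maps to exactly one test frame, the warp path projects the hand-marked boundaries directly onto a new token, which is the role the word-level pass plays in marking tentative boundaries. In the toy run, the noise frames at both ends are skipped and the template boundaries at frames 0 and 2 land on test frames 1 and 3. A practical aligner would also bound the path slope (for example, by forbidding consecutive zero steps) and choose `relax` from the expected endpointing error; the abstract specifies neither.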