Isolated word recognition using high-order statistics and time-delay neural networks

M. Ashouri
Proceedings of the IEEE Signal Processing Workshop on Higher-Order Statistics, published 21 July 1997. DOI: 10.1109/HOST.1997.613487 (https://doi.org/10.1109/HOST.1997.613487)
Citations: 3

Abstract

In this paper, two isolated-word recognition methods based on high-order statistics and a time-delay neural network (TDNN) are studied for the recognition of spoken Farsi digits. The speech recognition system consists of four modules: a preprocessor, an endpoint detector, a feature extractor and a classifier. The first method estimates the autoregressive (AR) parameters of speech from third- and fourth-order cumulants using the high-order Yule-Walker, W-slice and 1-D slice approaches. In the second method, statistical features are extracted from the estimated high-order probability density function (pdf) of thresholded amplitude features; for each pdf estimate, the mean, variance, third-order moment and entropy are computed. Each frame, approximately 15 ms long, is described by 16 features. The TDNN has 16 nodes in its input layer, two hidden layers and 10 nodes in its output layer, and its learning rule, based on backpropagation, has been modified to reduce training time. Computer simulations on 10 Farsi digits spoken by different speakers show that the first method achieves a better recognition rate, while the second requires less computation.
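The abstract does not say how the endpoint detector works. Purely as an illustration, a common short-time-energy approach marks the first and last frames whose energy reaches a fraction of the peak frame energy. The 8 kHz sampling rate, the `rel_threshold` value and the helper name `detect_endpoints` are assumptions (only the ~15 ms frame length comes from the abstract); this is not the paper's detector.

```python
import numpy as np


def detect_endpoints(signal, sample_rate=8000, frame_ms=15.0, rel_threshold=0.1):
    """Return (start, end) sample indices of the active region, or None.

    Hypothetical energy-based detector: a frame is 'active' when its energy is at
    least rel_threshold times the largest frame energy in the utterance.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))  # 120 samples at 8 kHz
    n_frames = len(signal) // frame_len
    if n_frames == 0:
        return None
    # Drop the incomplete tail and split into non-overlapping frames.
    frames = np.asarray(signal[: n_frames * frame_len], dtype=float).reshape(n_frames, frame_len)
    energy = np.sum(frames ** 2, axis=1)
    if energy.max() == 0.0:
        return None  # silence only
    active = np.flatnonzero(energy >= rel_threshold * energy.max())
    start = int(active[0]) * frame_len
    end = int(active[-1] + 1) * frame_len  # exclusive end index
    return start, end


# Usage: one second of silence with a 0.3 s "word" in the middle.
sig = np.zeros(8000)
sig[2400:4800] = 1.0
print(detect_endpoints(sig))  # (2400, 4800)
```

Working on whole frames keeps the detector cheap, at the cost of locating endpoints only to within one 15 ms frame.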
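To illustrate the idea behind the first method: if x(n) + a_1 x(n-1) + ... + a_p x(n-p) = w(n), where w(n) is zero-mean, i.i.d. and skewed, then multiplying by x(n-τ)x(n-ρ) and taking expectations gives Σ_{k=0..p} a_k c3(k-τ, k-ρ) = 0 for τ ≥ 1, ρ ≥ 0 (with a_0 = 1), where c3(m1, m2) = E[x(n)x(n+m1)x(n+m2)]. The sketch below stacks these equations over several (τ, ρ) pairs and solves them by least squares, in the spirit of a high-order Yule-Walker estimator. It is a simplified stand-in under stated assumptions: the lag set, the use of third-order cumulants only, and the names `c3` and `ar_from_third_order_cumulants` are hypothetical, and the paper's fourth-order, W-slice and 1-D slice variants are not reproduced.

```python
import numpy as np


def c3(x, m1, m2):
    """Biased sample estimate of c3(m1, m2) = E[x(n) x(n+m1) x(n+m2)] (x zero-mean)."""
    n_total = len(x)
    lo = max(0, -m1, -m2)                          # need n + m1 >= 0 and n + m2 >= 0
    hi = min(n_total, n_total - m1, n_total - m2)  # need n + m1 < N and n + m2 < N
    if hi <= lo:
        return 0.0
    n = np.arange(lo, hi)
    return float(np.sum(x[n] * x[n + m1] * x[n + m2]) / n_total)


def ar_from_third_order_cumulants(x, p, extra_lags=2):
    """Least-squares AR(p) estimate from third-order cumulant Yule-Walker equations.

    Returns [a_1, ..., a_p] for the model x(n) + sum_k a_k x(n-k) = w(n).
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    rows, rhs = [], []
    for tau in range(1, p + extra_lags + 1):
        for rho in range(0, p + 1):
            # sum_{k=1..p} a_k c3(k - tau, k - rho) = -c3(-tau, -rho)  (a_0 = 1 moved right)
            rows.append([c3(x, k - tau, k - rho) for k in range(1, p + 1)])
            rhs.append(-c3(x, -tau, -rho))
    a, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return a


# Usage: AR(2) process x(n) = 0.5 x(n-1) - 0.3 x(n-2) + w(n), i.e. a = [-0.5, 0.3],
# driven by zero-mean exponential (skewed, non-Gaussian) noise.
rng = np.random.default_rng(0)
N = 200_000
w = rng.exponential(1.0, N) - 1.0
x = np.zeros(N)
for n in range(N):
    x[n] = w[n]
    if n >= 1:
        x[n] += 0.5 * x[n - 1]
    if n >= 2:
        x[n] -= 0.3 * x[n - 2]
print(ar_from_third_order_cumulants(x, p=2))  # expected to be close to [-0.5, 0.3]
```

Because third-order cumulants of Gaussian noise vanish, equations of this kind are, in principle, insensitive to additive Gaussian noise, which is one usual motivation for cumulant-based AR estimation; the price is higher estimator variance, hence the long test signal.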
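For the second method, the abstract states only that the mean, variance, third-order moment and entropy are computed for each pdf estimate and that each ~15 ms frame yields 16 features. The sketch below assumes a histogram pdf estimate of peak-normalized absolute amplitudes and four amplitude thresholds, so that 4 thresholds × 4 statistics = 16 features; the thresholds, bin count, normalization, 8 kHz sampling rate and helper names are hypothetical, not taken from the paper.

```python
import numpy as np


def pdf_statistics(samples, bins=16, value_range=(0.0, 1.0)):
    """Mean, variance, third central moment and entropy of a histogram pdf estimate."""
    counts, edges = np.histogram(samples, bins=bins, range=value_range)
    total = counts.sum()
    if total == 0:
        return np.zeros(4)
    p = counts / total                        # probability mass per bin
    centers = 0.5 * (edges[:-1] + edges[1:])  # represent each bin by its center
    mean = np.sum(p * centers)
    var = np.sum(p * (centers - mean) ** 2)
    third = np.sum(p * (centers - mean) ** 3)
    nz = p[p > 0]
    entropy = -np.sum(nz * np.log(nz))        # in nats
    return np.array([mean, var, third, entropy])


def frame_features(frame, thresholds=(0.0, 0.1, 0.2, 0.4), bins=16):
    """16 features for one frame: 4 statistics for each of 4 hypothetical thresholds."""
    amp = np.abs(np.asarray(frame, dtype=float))
    peak = amp.max()
    if peak == 0.0:
        return np.zeros(4 * len(thresholds))
    amp = amp / peak  # normalize so amplitudes lie in [0, 1]
    # Keep only samples at or above each threshold, then summarize their pdf.
    feats = [pdf_statistics(amp[amp >= t], bins=bins) for t in thresholds]
    return np.concatenate(feats)


# Usage: one 15 ms frame at an assumed 8 kHz sampling rate (120 samples).
rng = np.random.default_rng(1)
frame = rng.standard_normal(120)
print(frame_features(frame).shape)  # (16,)
```

Only a histogram and a few weighted sums are needed per frame, which is consistent with the abstract's observation that the second method requires less computation than cumulant-based AR estimation.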
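The classifier can be pictured along the lines of the classic TDNN of Waibel et al.: each hidden unit sees a short window of consecutive frames through weights shared across time (a 1-D convolution over frames), and each output node integrates its class evidence over the utterance. The sketch below implements only a forward pass with the 16 input and 10 output nodes from the abstract; the hidden-layer sizes (8 and 10 units), delay windows (3 and 5 frames), sigmoid activations, time-averaged outputs and random weights are assumptions, and the paper's modified backpropagation rule is not reproduced.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def tdnn_layer(x, weights, bias):
    """Time-delay layer.

    x: (T, D_in) sequence of frame vectors; weights: (K, D_in, D_out) for delays
    0..K-1; bias: (D_out,). Output y[t] = sigmoid(sum_k x[t + k] @ weights[k] + bias)
    for t = 0..T-K, i.e. the same weights are applied at every time shift.
    """
    K = weights.shape[0]
    T = x.shape[0]
    z = sum(x[k : T - K + 1 + k] @ weights[k] for k in range(K)) + bias
    return sigmoid(z)


def tdnn_forward(frames, params):
    """16 inputs -> hidden 1 -> hidden 2 -> 10 outputs (time-averaged hidden 2)."""
    h1 = tdnn_layer(frames, params["W1"], params["b1"])
    h2 = tdnn_layer(h1, params["W2"], params["b2"])
    return h2.mean(axis=0)  # integrate each class unit over time


rng = np.random.default_rng(2)
params = {  # hypothetical layer sizes and untrained random weights
    "W1": rng.normal(0.0, 0.1, (3, 16, 8)), "b1": np.zeros(8),
    "W2": rng.normal(0.0, 0.1, (5, 8, 10)), "b2": np.zeros(10),
}
frames = rng.standard_normal((40, 16))  # 40 frames of 16 features each
scores = tdnn_forward(frames, params)
predicted_digit = int(np.argmax(scores))
print(scores.shape)  # (10,)
```

Sharing weights across time shifts is what lets the network respond to the same acoustic pattern regardless of where it occurs in the utterance, while keeping the number of trainable parameters small.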