A Novel Spec-CNN-CTC Model for End-to-End Speech Recognition

Jing Xue, Jun Zhang
Published in: 2021 13th International Conference on Machine Learning and Computing
Publication date: 2021-02-26
DOI: 10.1145/3457682.3457703 (https://doi.org/10.1145/3457682.3457703)
Citations: 2

Abstract

This paper applies a data augmentation approach to an end-to-end phone recognition system built on deep neural networks. The augmentation improves phone recognition performance and alleviates overfitting during training. It also helps address the scarcity of public datasets annotated at the phone level. We propose a CNN-CTC structure as the baseline model. The model combines Convolutional Neural Networks (CNNs) with the Connectionist Temporal Classification (CTC) objective function; because it is an end-to-end structure, there is no need to force-align each frame of audio. The SpecAugment approach operates directly on audio features such as the log Mel-spectrogram. In our experiments, the Spec-CNN-CTC system achieves a phone error rate (PER) of 16.11% on the TIMIT corpus with no prior linguistic information. This outperforms the previous Acoustic-State-Transition Model (ASTM) by 27.63%, the DNN-HMM with MFCC + IFCC features by 16.8%, the RNN-CRF model by 17.3%, and the DBM-DNN model by 22.62%.
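To make the idea of augmenting the log Mel-spectrogram directly more concrete, here is a minimal sketch of SpecAugment-style frequency and time masking. It is an illustration only, not the authors' implementation: the function name spec_augment, the mask counts and widths, and the choice to fill masked cells with the global mean are all assumptions, and the time-warping step of the original SpecAugment recipe is omitted. Nested lists are used so the sketch needs no third-party packages.

```python
import random


def spec_augment(log_mel, num_freq_masks=2, max_freq_width=8,
                 num_time_masks=2, max_time_width=20, rng=None):
    """Mask random frequency bands and time spans of a log Mel-spectrogram.

    log_mel is a list of rows: one row per Mel bin, one column per frame.
    The input is left untouched; a masked copy is returned.
    All default parameter values are hypothetical, not taken from the paper.
    """
    rng = rng or random.Random()
    n_mels, n_frames = len(log_mel), len(log_mel[0])
    out = [list(row) for row in log_mel]
    # Assumption: masked cells are filled with the global mean of the input.
    fill = sum(map(sum, log_mel)) / (n_mels * n_frames)

    # Frequency masking: overwrite `width` consecutive Mel bins.
    for _ in range(num_freq_masks):
        width = rng.randint(0, min(max_freq_width, n_mels))
        start = rng.randint(0, n_mels - width)
        for m in range(start, start + width):
            out[m] = [fill] * n_frames

    # Time masking: overwrite `width` consecutive frames in every Mel bin.
    for _ in range(num_time_masks):
        width = rng.randint(0, min(max_time_width, n_frames))
        start = rng.randint(0, n_frames - width)
        for row in out:
            row[start:start + width] = [fill] * width

    return out
```

In a training loop, a function like this would typically be applied to each utterance's features on the fly, so the network sees a differently masked version of the same utterance in every epoch. That is one plausible reading of how the augmentation counters overfitting on a small phone-annotated corpus.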