{"title":"基于注意的音素识别方法研究","authors":"Yupei Zhang","doi":"10.1145/3581807.3581866","DOIUrl":null,"url":null,"abstract":"A phoneme is the smallest sound unit of a language. Every language has its corresponding phonemes. Phoneme recognition can be used in speech-based applications such as auto speech recognition and lip sync. This paper proposes an end-to-end deep learning model called Connectionist Temporal Classification (CTC) and attention-based seq2seq network that consists of one bi-GRU layer in the encoder and one GRU layer in the decoder, for recognizing the phonemes in speech. Experiments on the TIMIT dataset demonstrate its advantages on some other seq2seq networks, with over 50% improvements after applying the attention mechanism.","PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"27 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Research on Phoneme Recognition using Attention-based Methods\",\"authors\":\"Yupei Zhang\",\"doi\":\"10.1145/3581807.3581866\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A phoneme is the smallest sound unit of a language. Every language has its corresponding phonemes. Phoneme recognition can be used in speech-based applications such as auto speech recognition and lip sync. This paper proposes an end-to-end deep learning model called Connectionist Temporal Classification (CTC) and attention-based seq2seq network that consists of one bi-GRU layer in the encoder and one GRU layer in the decoder, for recognizing the phonemes in speech. Experiments on the TIMIT dataset demonstrate its advantages on some other seq2seq networks, with over 50% improvements after applying the attention mechanism.\",\"PeriodicalId\":292813,\"journal\":{\"name\":\"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition\",\"volume\":\"27 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-11-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3581807.3581866\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3581807.3581866","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

音素是语言中最小的声音单位。每种语言都有相应的音素。音素识别可以用于基于语音的应用程序,如自动语音识别和口型同步。本文提出了一个端到端的深度学习模型,称为连接时间分类(CTC)和基于注意力的seq2seq网络,该网络由编码器中的一个双GRU层和解码器中的一个GRU层组成,用于识别语音中的音素。在TIMIT数据集上的实验证明了它在其他一些seq2seq网络上的优势,在应用注意机制后,提高了50%以上。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Research on Phoneme Recognition using Attention-based Methods
A phoneme is the smallest sound unit of a language. Every language has its corresponding phonemes. Phoneme recognition can be used in speech-based applications such as auto speech recognition and lip sync. This paper proposes an end-to-end deep learning model called Connectionist Temporal Classification (CTC) and attention-based seq2seq network that consists of one bi-GRU layer in the encoder and one GRU layer in the decoder, for recognizing the phonemes in speech. Experiments on the TIMIT dataset demonstrate its advantages on some other seq2seq networks, with over 50% improvements after applying the attention mechanism.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Multi-Scale Channel Attention for Chinese Scene Text Recognition Vehicle Re-identification Based on Multi-Scale Attention Feature Fusion Comparative Study on EEG Feature Recognition based on Deep Belief Network VA-TransUNet: A U-shaped Medical Image Segmentation Network with Visual Attention Traffic Flow Forecasting Research Based on Delay Reconstruction and GRU-SVR
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1