Acoustic model training using self-attention for low-resource speech recognition

Journal of the Acoustical Society of Korea (IF 0.3, Q4, Acoustics)
Published: 2020-09-01
DOI: 10.7776/ASK.2020.39.5.483

Hosung Kim

Citations: 0

Abstract

This paper proposes acoustic model training using self-attention for low-resource speech recognition. In low-resource speech recognition, the acoustic model has difficulty distinguishing certain similar-sounding phones, for example the plosives /d/ and /t/, the plosives /g/ and /k/, and the affricates /z/ and /ch/. In the proposed training, self-attention generates attention weights from the deep neural network model, and in this study these weights are used to reduce errors between similarly pronounced phones. When the proposed method was applied to a Time Delay Neural Network-Output-gate Projected Gated Recurrent Unit (TDNN-OPGRU)-based acoustic model, it achieved a word error rate (WER) of 5.98 %, an absolute improvement of 0.74 percentage points over the TDNN-OPGRU baseline.
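
The abstract does not say how the attention weights are computed or where they enter the TDNN-OPGRU model. As a rough illustration only, the sketch below assumes the standard scaled dot-product self-attention used in Transformers, applied to a sequence of T frame-level hidden vectors. The helper names and the projection matrices w_q, w_k and w_v are hypothetical; in a real system the projections would be learned during training. The returned T x T weight matrix shows how strongly each frame draws on every other frame, the kind of context that might help separate similar phones such as /d/ and /t/.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    """Swap rows and columns."""
    return [list(row) for row in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(
    x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix
) -> Tuple[Matrix, Matrix]:
    """Scaled dot-product self-attention over T frames (illustrative sketch).

    x:   T x d frame-level hidden vectors (assumed to come from a DNN layer).
    w_q, w_k, w_v: hypothetical d x d_k projection matrices.
    Returns (outputs, weights): outputs is T x d_k, weights is T x T,
    and each row of weights sums to 1.
    """
    q = matmul(x, w_q)  # queries
    k = matmul(x, w_k)  # keys
    v = matmul(x, w_v)  # values
    d_k = len(w_k[0])
    scale = 1.0 / math.sqrt(d_k)
    # Frame-to-frame similarity scores, scaled by 1/sqrt(d_k).
    scores = matmul(q, transpose(k))
    weights = [softmax([s * scale for s in row]) for row in scores]
    # Each output frame is a weighted average of the value vectors.
    return matmul(weights, v), weights
```

For example, with identity projections and two identical frames, every attention weight is 0.5 and each output equals the shared frame. Separately, the abstract's figures imply a baseline WER of 5.98 + 0.74 = 6.72 %, so the 0.74-point gain corresponds to a relative WER reduction of about 11 % (0.74 / 6.72 ≈ 0.110).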



