End-To-End Silent Speech Recognition with Acoustic Sensing

2021 IEEE Spoken Language Technology Workshop (SLT); Pub Date: 2020-11-23; DOI: 10.1109/SLT48900.2021.9383622

Jian Luo, Jianzong Wang, Ning Cheng, Guilin Jiang, Jing Xiao

Citations: 3

Abstract

Silent speech interfaces (SSIs) have been an exciting area of recent interest. In this paper, we present a non-invasive silent speech interface that uses inaudible acoustic signals to capture people’s lip movements when they speak. We use the smartphone's speaker to emit the signals and its microphone to listen to their reflections. Phase features extracted from these reflections are fed into deep learning networks to recognize speech. We also propose an end-to-end recognition framework that combines a CNN with an attention-based encoder-decoder network. Evaluation on a limited vocabulary (54 sentences) yields word error rates of 8.4% in speaker-independent and environment-independent settings, and 8.1% for unseen-sentence testing.
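The word error rates quoted in the abstract follow the usual definition: the number of word-level substitutions, deletions, and insertions needed to turn the recognized sentence into the reference, divided by the number of reference words. The sketch below illustrates that standard metric for a single sentence pair. It is not the authors' evaluation code; the function name `word_error_rate` and the example sentences are hypothetical, and a corpus-level figure is normally obtained by summing errors and reference words over all test sentences rather than averaging per-sentence rates.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration only; whitespace tokenization
    is an assumption, not something the abstract specifies.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up sentences: one substitution in a four-word reference.
    print(word_error_rate("turn on the light", "turn off the light"))
```

Because insertions count as errors, the rate can exceed 1.0 when the hypothesis is much longer than the reference.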


