Title: Neural Chinese silent speech recognition with facial electromyography
Authors: Liang Xie, Yakun Zhang, Hao Yuan, Meishan Zhang, Xingyu Zhang, Changyan Zheng, Ye Yan, Erwei Yin
Journal: Speech Communication, volume 171, Article 103230 (published 2025-04-15)
DOI: 10.1016/j.specom.2025.103230
URL: https://www.sciencedirect.com/science/article/pii/S0167639325000457
Journal metrics: impact factor 3.0; JCR Q2 (Acoustics); category: Computer Science, region 3
Citation count: 0
Abstract
The majority of work in speech recognition is based on audible speech and has already achieved great success. However, in several special scenarios, the voice might be unavailable. Recently, Gaddy and Klein (2020) presented an initial study of silent speech analysis, aiming to voice silent speech from facial electromyography (EMG) signals. In this work, we present the first study of neural silent speech recognition in Chinese, which goes one step further by converting silent facial EMG signals directly into text. We build a benchmark dataset and then introduce a neural end-to-end model for the task. The model is further optimized with two auxiliary tasks for better feature learning. In addition, we propose a systematic data augmentation strategy to improve model performance. Experimental results show that our best model achieves a character error rate of 38.0% on a sentence-level silent speech recognition task. We also provide in-depth analysis to gain a comprehensive understanding of the task and the various models proposed. Although our model achieves initial results, a gap remains relative to the ideal level, warranting further attention and research.
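For readers unfamiliar with the metric, the sketch below shows how a character error rate is commonly computed: the character-level Levenshtein (edit) distance between reference and hypothesis, divided by the number of reference characters. The abstract does not state the exact scoring procedure, so this standard corpus-level definition is an assumption, and the function names `edit_distance` and `character_error_rate` are illustrative only.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(refs, hyps) -> float:
    """Corpus-level CER: total edits divided by total reference characters.

    Assumption: the standard definition; the paper may score differently.
    """
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("reference text is empty")
    return total_edits / total_chars
```

Under this definition, a CER of 38.0% would correspond to roughly 38 character edits per 100 reference characters. Treating each Chinese character as one token avoids the word segmentation step that a word error rate would require.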
About the journal:
Speech Communication is an interdisciplinary journal whose primary objective is to fulfil the need for the rapid dissemination and thorough discussion of basic and applied research results.
The journal's primary objectives are:
• to present a forum for the advancement of human and human-machine speech communication science;
• to stimulate cross-fertilization between different fields of this domain;
• to contribute towards the rapid and wide diffusion of scientifically sound contributions in this domain.