mSilent: Towards General Corpus Silent Speech Recognition Using COTS mmWave Radar

Shangcui Zeng, Hao Wan, Shuyu Shi, Wei Wang
{"title":"mSilent: Towards General Corpus Silent Speech Recognition Using COTS mmWave Radar","authors":"Shangcui Zeng, Hao Wan, Shuyu Shi, Wei Wang","doi":"10.1145/3580838","DOIUrl":null,"url":null,"abstract":"Silent speech recognition (SSR) allows users to speak to the device without making a sound, avoiding being overheard or disturbing others. Compared to the video-based approach, wireless signal-based SSR can work when the user is wearing a mask and has fewer privacy concerns. However, previous wireless-based systems are still far from well-studied, e.g. they are only evaluated in corpus with highly limited size, making them only feasible for interaction with dozens of deterministic commands. In this paper, we present mSilent, a millimeter-wave (mmWave) based SSR system that can work in the general corpus containing thousands of daily conversation sentences. With the strong recognition capability, mSilent not only supports the more complex interaction with assistants, but also enables more general applications in daily life such as communication and input. To extract fine-grained articulatory features, we build a signal processing pipeline that uses a clustering-selection algorithm to separate articulatory gestures and generates a multi-scale detrended spectrogram (MSDS). To handle the complexity of the general corpus, we design an end-to-end deep neural network that consists of a multi-branch convolutional front-end and a Transformer-based sequence-to-sequence back-end. We collect a general corpus dataset of 1,000 daily conversation sentences that contains 21K samples of bi-modality data (mmWave and video). Our evaluation shows that mSilent achieves a 9.5% average word error rate (WER) at a distance of 1.5m, which is comparable to the performance of the state-of-the-art video-based approach. We also explore deploying mSilent in two typical scenarios of text entry and in-car assistant, and the less than 6% average WER demonstrates the potential of mSilent in general daily applications. CCS Concepts: • Human-centered computing → Ubiquitous and mobile computing systems and tools ;","PeriodicalId":20463,"journal":{"name":"Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.","volume":"3 1","pages":"39:1-39:28"},"PeriodicalIF":0.0000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3580838","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Silent speech recognition (SSR) allows users to speak to a device without making a sound, avoiding being overheard or disturbing others. Compared to video-based approaches, wireless-signal-based SSR works even when the user is wearing a mask and raises fewer privacy concerns. However, previous wireless-based systems remain far from well-studied; for example, they have only been evaluated on corpora of highly limited size, making them feasible only for interaction with dozens of predetermined commands. In this paper, we present mSilent, a millimeter-wave (mmWave) based SSR system that works on a general corpus containing thousands of daily conversation sentences. With this strong recognition capability, mSilent not only supports more complex interactions with assistants, but also enables more general daily-life applications such as communication and text input. To extract fine-grained articulatory features, we build a signal processing pipeline that uses a clustering-selection algorithm to separate articulatory gestures and generates a multi-scale detrended spectrogram (MSDS). To handle the complexity of the general corpus, we design an end-to-end deep neural network consisting of a multi-branch convolutional front-end and a Transformer-based sequence-to-sequence back-end. We collect a general corpus dataset of 1,000 daily conversation sentences comprising 21K samples of bi-modality data (mmWave and video). Our evaluation shows that mSilent achieves a 9.5% average word error rate (WER) at a distance of 1.5 m, comparable to the performance of the state-of-the-art video-based approach. We also explore deploying mSilent in two typical scenarios, text entry and in-car assistance, where the sub-6% average WER demonstrates mSilent's potential in general daily applications.

CCS Concepts: • Human-centered computing → Ubiquitous and mobile computing systems and tools.
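The abstract names the multi-scale detrended spectrogram (MSDS) but does not define it; the sketch below is only one plausible reading, assuming the pipeline detrends the complex radar baseband from one range bin and stacks STFT magnitudes computed at several window scales. The function name, window lengths, and detrending choice are all hypothetical, not the authors' implementation.

```python
import numpy as np
from scipy.signal import detrend, stft

def multi_scale_detrended_spectrogram(iq, fs, window_lengths=(64, 128, 256)):
    """Hypothetical MSDS sketch (not the paper's implementation).

    iq: 1-D complex baseband samples from one radar range bin.
    fs: sample rate in Hz.
    Returns one magnitude spectrogram per time-frequency scale.
    """
    # Detrend real and imaginary parts separately to suppress slow
    # drift and static-clutter leakage before the STFT.
    sig = detrend(iq.real, type="linear") + 1j * detrend(iq.imag, type="linear")
    spectrograms = []
    for win in window_lengths:
        # Complex input => keep the two-sided (positive/negative Doppler) spectrum.
        _, _, Z = stft(sig, fs=fs, nperseg=win, noverlap=win // 2,
                       return_onesided=False)
        spectrograms.append(np.abs(Z))
    return spectrograms
```

Shorter windows trade frequency resolution for time resolution, so stacking several scales lets the downstream network see both fast lip/tongue transitions and slower jaw motion.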
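The network details live in the paper body rather than the abstract; the PyTorch sketch below only mirrors the stated shape: parallel convolutional branches, one per MSDS scale, feeding a Transformer encoder-decoder that emits token logits. Every layer size, the vocabulary, and the class name are assumptions for illustration.

```python
import torch
import torch.nn as nn

class MSilentSketch(nn.Module):
    """Illustrative stand-in for the described architecture, not the
    authors' model: one conv branch per spectrogram scale, then a
    Transformer encoder-decoder emitting token logits."""

    def __init__(self, n_scales=3, n_freq=128, d_model=256, vocab_size=32):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(1, 32, kernel_size=3, padding=1),
                nn.ReLU(),
                # Pool frequency to a common size; keep the time axis.
                nn.AdaptiveAvgPool2d((n_freq // 4, None)),
            )
            for _ in range(n_scales)
        ])
        self.proj = nn.Linear(n_scales * 32 * (n_freq // 4), d_model)
        self.seq2seq = nn.Transformer(d_model=d_model, batch_first=True)
        self.embed = nn.Embedding(vocab_size, d_model)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, scales, tgt_tokens):
        # scales: list of (batch, 1, n_freq, n_frames) tensors, assumed
        # time-aligned across scales for simplicity.
        feats = [branch(x) for branch, x in zip(self.branches, scales)]
        x = torch.cat(feats, dim=1)                     # merge branch channels
        b, c, f, t = x.shape
        x = x.permute(0, 3, 1, 2).reshape(b, t, c * f)  # (batch, time, feat)
        src = self.proj(x)
        tgt = self.embed(tgt_tokens)                    # teacher forcing
        mask = nn.Transformer.generate_square_subsequent_mask(
            tgt_tokens.size(1)).to(tgt.device)          # causal decoder mask
        dec = self.seq2seq(src, tgt, tgt_mask=mask)
        return self.out(dec)                            # per-token logits
```

At inference the decoder would run autoregressively (greedy or beam search) rather than with teacher forcing.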
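The reported metric, word error rate, is standard and independent of this paper: the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. A self-contained implementation:

```python
def word_error_rate(ref, hyp):
    """Standard WER: word-level edit distance over reference length."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(r)][len(h)] / len(r)
```

For example, reference "turn on the light" against hypothesis "turn the lights" incurs one deletion and one substitution over four reference words, a WER of 0.5.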