Lipwatch: Enabling Silent Speech Recognition on Smartwatches using Acoustic Sensing
Qian Zhang, Yubin Lan, Kaiyi Guo, Dong Wang
Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, published 2024-05-13
DOI: 10.1145/3659614 (https://doi.org/10.1145/3659614)
Citations: 1
Abstract
Silent Speech Interfaces (SSI) on mobile devices offer a privacy-friendly alternative to conventional voice input. Previous research has focused primarily on smartphones. In this paper, we introduce Lipwatch, a system that uses acoustic sensing to enable SSI on smartwatches. Lipwatch uses inaudible waves emitted by the watch's speaker to capture lip movements and analyzes the resulting echoes to recognize silent speech. Unlike acoustic-sensing SSI on smartphones, Lipwatch is designed around the scenarios and requirements specific to smartwatches. First, we design a wake-up-free mechanism that lets users interact without a wake-up phrase or button press. The mechanism uses the smartwatch's inertial sensors to detect gestures and combines them with acoustic signals that detect lip movements to decide whether SSI should be activated. Second, we design a flexible silent speech recognition mechanism that extends limited-vocabulary recognition to understand a broader range of user commands, including commands not present in the training dataset, so users need not adhere strictly to predefined commands. We evaluate Lipwatch with 15 participants on a set of the 80 most common interaction commands on smartwatches. The system achieves a Word Error Rate (WER) of 13.7% in user-independent testing. Even when users utter commands containing words absent from the training set, Lipwatch still achieves 88.7% top-3 accuracy. We implement a real-time version of Lipwatch on a commercial smartwatch, and a user study indicates that Lipwatch is a practical and promising option for enabling SSI on smartwatches.
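The abstract reports results with two standard metrics: Word Error Rate (WER) and top-3 accuracy. The sketch below shows how these metrics are conventionally computed. WER is the word-level Levenshtein distance (substitutions + deletions + insertions) divided by the number of reference words. Top-k accuracy is the fraction of samples whose true label appears among the k highest-ranked candidates. This is not the authors' evaluation code. The function names, the plain whitespace tokenization, and the choice to pool WER over the whole corpus are assumptions, and the paper's own scripts may normalize text or aggregate differently.

```python
from typing import Iterable, Sequence, Tuple


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of word substitutions, deletions and insertions
    that turn `ref` into `hyp` (word-level Levenshtein distance)."""
    # prev[j] = distance between the reference prefix processed so far and hyp[:j];
    # before any reference word is consumed, that is j insertions.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)  # i deletions against an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,                         # delete reference word r
                curr[j - 1] + 1,                     # insert hypothesis word h
                prev[j - 1] + (0 if r == h else 1),  # substitute, or match at no cost
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Corpus-level WER: total edit operations divided by total reference words.
    Assumption: plain whitespace tokenization, no further text normalization."""
    errors = 0
    ref_words = 0
    for reference, hypothesis in pairs:
        ref = reference.split()
        errors += word_edit_distance(ref, hypothesis.split())
        ref_words += len(ref)
    if ref_words == 0:
        raise ValueError("references contain no words")
    return errors / ref_words


def top_k_accuracy(ranked: Sequence[Sequence[str]], labels: Sequence[str], k: int = 3) -> float:
    """Fraction of samples whose true label is among the first k ranked candidates."""
    if not labels or len(ranked) != len(labels):
        raise ValueError("ranked and labels must be non-empty and of equal length")
    hits = sum(label in candidates[:k] for candidates, label in zip(ranked, labels))
    return hits / len(labels)


if __name__ == "__main__":
    # Illustrative, made-up command pairs (not from the Lipwatch dataset).
    pairs = [("call mom", "call tom"), ("set a timer", "set timer")]
    print(corpus_wer(pairs))  # 1 substitution + 1 deletion over 5 reference words -> 0.4
```

Pooling errors over all utterances weights each word equally. Averaging per-utterance WERs would instead give short commands more influence. The abstract does not state which aggregation the 13.7% figure uses.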