Feasibility and accuracy of hotword detection using vibration energy harvester
Sara Khalifa, Mahbub Hassan, A. Seneviratne
2016 IEEE 17th International Symposium on A World of Wireless, Mobile and Multimedia Networks (WoWMoM), published 2016-06-21
DOI: 10.1109/WoWMoM.2016.7523555 (https://doi.org/10.1109/WoWMoM.2016.7523555)
Citations: 10
Abstract
Vibration energy harvesting (VEH) is a promising source of renewable energy that can be used to extend the battery life of next-generation mobile devices. In this paper, we study the feasibility and accuracy of VEH for detecting hotwords, such as “OK Google”, which popular voice control applications use to distinguish user commands from other conversation. The idea of using the power signals of VEH to detect hotwords is based on the fact that the human voice creates vibrations in the air, which could potentially be picked up by the VEH hardware inside a mobile device. Using an off-the-shelf VEH product, we conduct a comprehensive experimental study involving 8 subjects. We analyse two possible usage scenarios for the VEH hardware. In the first (indirect) scenario, the user is not required to talk directly to the device; instead, the VEH is expected to pick up the ambient vibrations caused by user-generated sound waves. In the second (direct) scenario, the user is expected to direct their voice at the VEH and talk to it from a close distance. For both usage scenarios, we evaluate two types of hotword detection: speaker-independent and speaker-dependent. We find that VEH can detect hotwords more accurately in the direct scenario than in the indirect one. For the direct scenario, our results show that a simple Decision Tree classifier can detect hotwords from VEH signals with accuracies of 73% and 85% for speaker-independent and speaker-dependent detection, respectively. Finally, we show that these accuracies are comparable to those achievable with an accelerometer sampled at 200 Hz.
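The abstract says only that a simple Decision Tree classifier was trained on VEH power signals. To illustrate what such a pipeline could look like, here is a minimal sketch that summarises each harvested-voltage trace with a few time-domain features and classifies it with a small CART-style tree (Gini impurity, midpoint thresholds). Everything specific here is an assumption for illustration, not the authors' method: the feature set (standard deviation, peak, mean squared voltage), the depth limit, the helper names (`extract_features`, `build_tree`, `predict`, `synth`), and the synthetic sine-wave "traces", which are not data from the paper.

```python
# Hypothetical sketch: classify hotword vs. other speech from a VEH voltage
# trace with hand-picked features and a small CART-style decision tree.
# Features, depth and the synthetic data below are illustrative assumptions.
import math
import statistics
from collections import Counter


def extract_features(trace):
    """Summarise one utterance-length voltage trace (assumed feature set)."""
    std = statistics.pstdev(trace)                  # spread of the signal
    peak = max(abs(v) for v in trace)               # largest excursion
    power = sum(v * v for v in trace) / len(trace)  # mean squared voltage
    return [std, peak, power]


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return the (feature, threshold) pair that most reduces weighted Gini, or None."""
    best, best_score, n = None, gini(y), len(y)
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between adjacent distinct values
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X, y, depth=0, max_depth=3):
    """Grow the tree recursively; a leaf is the majority label (a str)."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    split = best_split(X, y)
    if split is None:  # no split strictly improves purity
        return majority
    f, t = split
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth),
            build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth))


def predict(tree, x):
    """Follow internal nodes (tuples) down to a leaf label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree


# Usage with made-up sine-wave "traces" (not data from the paper).
def synth(amplitude, phase, n=200):
    return [amplitude * math.sin(0.3 * i + phase) for i in range(n)]


X = [extract_features(synth(a, p))
     for a, p in [(1.0, 0.0), (1.2, 0.5), (0.2, 0.0), (0.3, 1.0)]]
y = ["hotword", "hotword", "other", "other"]
tree = build_tree(X, y)
print(predict(tree, extract_features(synth(1.1, 0.2))))  # hotword
```

The tree is hand-rolled only so the sketch runs with the standard library alone; in practice a library implementation such as scikit-learn's `DecisionTreeClassifier` would be the usual choice, and the speaker-independent versus speaker-dependent distinction would come from how training and test recordings are split across subjects.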