VoShield: Voice Liveness Detection with Sound Field Dynamics
Qiang Yang, Kaiyan Cui, Yuanqing Zheng
IEEE INFOCOM 2023 - IEEE Conference on Computer Communications, published 2023-05-17
https://doi.org/10.1109/INFOCOM53939.2023.10229038
Voice assistants are widely integrated into smart devices, letting users complete daily tasks and even critical operations such as online transactions with voice commands. If an attacker replays a secretly recorded voice command through a loudspeaker to compromise a user's voice assistant, the consequences can be serious, including information leakage and property loss. Unfortunately, most voice liveness detection approaches against replay attacks rely on detecting lip motions or subtle physiological features in speech, and they work only within a very short range. In this paper, we propose VoShield to check whether a voice command comes from a genuine user or a loudspeaker impostor. VoShield measures sound field dynamics, a feature that changes quickly as a human mouth opens and closes. In contrast, the feature remains rather stable for a loudspeaker because its size is fixed. This feature enables VoShield to substantially extend the working distance and remain resilient to changes in user location. In addition, sound field dynamics are extracted from the difference between multiple microphone channels, which makes the feature robust to voice volume. To evaluate VoShield, we conducted comprehensive experiments with various settings in different working scenarios. The results show that VoShield achieves a detection accuracy of 98.2% and an Equal Error Rate of 2.0%, making it a promising complement to current voice authentication systems for smart devices.
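The abstract does not specify how the inter-channel difference is computed, so the following is only a toy sketch of why a difference between microphone channels can be insensitive to voice volume. It assumes a two-microphone setup and uses a per-frame log-energy ratio as a stand-in feature. The function names `channel_level_difference` and `dynamics_score`, the 25 ms frame length, and the synthetic gain model are all illustrative assumptions, not VoShield's actual pipeline.

```python
import numpy as np


def channel_level_difference(ch_a, ch_b, frame_len=400, eps=1e-12):
    """Per-frame log-energy difference (dB) between two microphone channels.

    Hypothetical stand-in feature: scaling both channels by the same gain
    (a louder or quieter voice) cancels out in the ratio.
    """
    n_frames = min(len(ch_a), len(ch_b)) // frame_len
    a = np.asarray(ch_a, dtype=float)[: n_frames * frame_len]
    b = np.asarray(ch_b, dtype=float)[: n_frames * frame_len]
    a = a.reshape(n_frames, frame_len)
    b = b.reshape(n_frames, frame_len)
    energy_a = np.sum(a ** 2, axis=1) + eps
    energy_b = np.sum(b ** 2, axis=1) + eps
    return 10.0 * np.log10(energy_a / energy_b)


def dynamics_score(level_diff):
    """How much the inter-channel difference fluctuates over time (dB)."""
    return float(np.std(level_diff))


# Synthetic illustration (assumed model, not measured data):
# a "live" source whose radiation toward microphone B varies over time,
# and a "loudspeaker" source whose radiation pattern is fixed.
rng = np.random.default_rng(0)
fs = 16000
source = rng.standard_normal(fs)          # 1 s of noise-like "speech"
t = np.arange(fs) / fs

live_gain = 0.5 + 0.3 * np.sin(2 * np.pi * 4 * t)   # time-varying
loud_gain = 0.6                                      # constant

live_diff = channel_level_difference(source, live_gain * source)
loud_diff = channel_level_difference(source, loud_gain * source)

print("live dynamics score:", dynamics_score(live_diff))
print("loudspeaker dynamics score:", dynamics_score(loud_diff))

# Raising the volume on both channels leaves the feature unchanged.
louder_diff = channel_level_difference(5 * source, 5 * live_gain * source)
print("volume-invariant:", np.allclose(louder_diff, live_diff, atol=1e-6))
```

In this toy model the time-varying source yields a fluctuating level difference while the fixed source yields a nearly constant one, and a common gain applied to both channels cancels in the ratio. A real system would need to handle room reverberation, noise, and microphone geometry, none of which is modeled here.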