Paralinguistic Privacy Protection at the Edge

IF 2.8 · Zone 4, Computer Science · Q2, Computer Science, Information Systems · ACM Transactions on Privacy and Security · Pub Date: 2022-11-03 · DOI: https://dl.acm.org/doi/10.1145/3570161

Ranya Aloufi, Hamed Haddadi, David Boyle

Citations: 0

Abstract

Voice user interfaces and digital assistants are rapidly entering our lives and becoming singular touch points spanning our devices. These always-on services capture our audio data and transmit it to powerful cloud services for further processing and subsequent actions. The voices and raw audio signals collected through these devices contain a host of sensitive paralinguistic information that is transmitted to service providers whether the trigger was deliberate or false. Because emotional patterns and sensitive attributes such as identity, gender, and well-being are easily inferred using deep acoustic models, using these services exposes us to a new generation of privacy risks. One approach to mitigating the risk of paralinguistic-based privacy breaches is to combine cloud-based processing with privacy-preserving, on-device learning and filtering of paralinguistic information before voice data is transmitted.

In this paper, we introduce EDGY, a configurable, lightweight, disentangled representation learning framework that transforms and filters high-dimensional voice data to identify and contain sensitive attributes at the edge before offloading to the cloud. We evaluate EDGY’s on-device performance and explore optimization techniques, including model quantization and knowledge distillation, to enable private, accurate, and efficient representation learning on resource-constrained devices. Our results show that EDGY runs in tens of milliseconds, with a 0.2% relative improvement in ‘zero-shot’ ABX score or minimal performance penalties of approximately 5.95% word error rate (WER) in learning linguistic representations from raw voice signals, using a CPU and a single-core ARM processor without specialized hardware.
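The abstract reports a penalty in word error rate (WER). As a reminder of what that metric measures, here is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The function name `word_error_rate` and the example sentences are hypothetical, chosen for illustration only. The abstract does not describe the evaluation code, so this should not be read as the authors' implementation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level Levenshtein distance / number of reference words.

    Hypothetical helper for illustration; not taken from the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up transcripts: one deleted word ("the") and one substitution
    # ("lights" -> "light") against a five-word reference.
    wer = word_error_rate("turn on the kitchen lights", "turn on kitchen light")
    print(f"WER = {wer:.2%}")
```

Because WER is normalized by the reference length, it can exceed 100% when the hypothesis contains many inserted words.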

Source journal

ACM Transactions on Privacy and Security (Computer Science: General Computer Science)

CiteScore: 5.20

Self-citation rate: 0.00%

Journal description: ACM Transactions on Privacy and Security (TOPS), formerly known as TISSEC, publishes high-quality research results in the fields of information and system security and privacy. Studies addressing all aspects of these fields are welcome, ranging from technologies, to systems and applications, to the crafting of policies.
