A Benchmark for Multi-speaker Anonymization
Authors: Xiaoxiao Miao, Ruijie Tao, Chang Zeng, Xin Wang
Venue: arXiv - CS - Sound
Publication date: 2024-07-08
DOI: arxiv-2407.05608 (https://doi.org/arxiv-2407.05608)
Citations: 0
Abstract
Privacy-preserving voice protection approaches primarily suppress
privacy-related information derived from paralinguistic attributes while
preserving the linguistic content. Existing solutions focus on single-speaker
scenarios, which limits their practicality for real-world multi-speaker
applications. In this paper, we present an initial attempt to
provide a multi-speaker anonymization benchmark by defining the task and
evaluation protocol, proposing benchmarking solutions, and discussing the
privacy leakage of overlapping conversations. Specifically, ideal multi-speaker
anonymization should preserve the number of speakers and the turn-taking
structure of the conversation, ensuring accurate context conveyance while
maintaining privacy. To achieve this, a cascaded system uses speaker
diarization to aggregate the speech of each speaker and speaker anonymization
to conceal speaker privacy and preserve speech content. Additionally, we
propose two conversation-level speaker vector anonymization methods to further
improve utility. Both methods aim to make the original and corresponding
pseudo-speaker identities of each speaker unlinkable while preserving or even
improving the distinguishability among pseudo-speakers in a conversation. The
first method minimizes the differential similarity across speaker pairs in the
original and anonymized conversations to maintain original speaker
relationships in the anonymized version. The other method minimizes the
aggregated similarity across anonymized speakers to achieve better
differentiation between speakers. Experiments conducted on both non-overlapping
simulated and real-world datasets demonstrate the effectiveness of the
multi-speaker anonymization system with the proposed speaker anonymizers.
Additionally, we analyze overlapping speech with regard to privacy leakage and
provide potential solutions.
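
The two conversation-level objectives described above can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption: the use of cosine similarity, the averaging over unordered speaker pairs, and the function names (`differential_similarity`, `aggregated_similarity`, `self_similarity`) are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two speaker vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def differential_similarity(orig, anon):
    """Mean absolute gap between pairwise similarities before and after anonymization.

    A low value means the relationships among original speakers are kept
    among their pseudo-speakers. Assumes at least two speakers.
    """
    pairs = list(combinations(range(len(orig)), 2))
    gaps = [abs(cosine(orig[i], orig[j]) - cosine(anon[i], anon[j])) for i, j in pairs]
    return sum(gaps) / len(pairs)


def aggregated_similarity(anon):
    """Sum of pairwise similarities among pseudo-speakers.

    A low value means the pseudo-speakers are easier to tell apart.
    """
    return sum(cosine(anon[i], anon[j]) for i, j in combinations(range(len(anon)), 2))


def self_similarity(orig, anon):
    """Similarity between each speaker and their own pseudo-speaker.

    Low values indicate that original and pseudo identities are hard to link.
    """
    return [cosine(o, a) for o, a in zip(orig, anon)]


# Toy conversation with two speakers and 2-dimensional speaker vectors.
original = [[1.0, 0.0], [0.0, 1.0]]
rotated = [[0.0, 1.0], [-1.0, 0.0]]    # each vector rotated by 90 degrees
collapsed = [[1.0, 0.0], [1.0, 0.0]]   # both speakers mapped to one pseudo-speaker

print(differential_similarity(original, rotated))
print(aggregated_similarity(rotated))
print(self_similarity(original, rotated))
print(aggregated_similarity(collapsed))
```

In this toy case the rotated vectors keep the pairwise relationship of the original speakers while each pseudo-speaker is orthogonal to its source, whereas the collapsed vectors make the two speakers indistinguishable. An actual anonymizer would search for pseudo-speaker vectors that optimize one of these quantities; the sketch only evaluates them.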