Learning Privacy-Preserving Embeddings for Image Data to Be Published
Authors: Chu-Chen Li, Cheng-Te Li, Shou-De Lin
Journal: ACM Transactions on Intelligent Systems and Technology (Impact Factor 7.2; JCR Q1, Computer Science, Artificial Intelligence)
Publication date: 2023-09-08
Publication type: Journal Article
DOI: 10.1145/3623404 (https://doi.org/10.1145/3623404)
Citations: 0
Abstract
Deep learning excels at learning feature representations that yield strong performance in various application domains. Recent advances have shown that private attributes of users and patients (e.g., identity, gender, and race) can be accurately inferred from image data. To reduce the risk of privacy leakage, data owners can release embeddings rather than the original images. In this paper, we aim to learn privacy-preserving embeddings from image data. The resulting embeddings are required to maintain data utility (e.g., preserve performance on a main task such as disease prediction) while preventing the private attributes of data instances from being accurately inferred. In addition, we want the embeddings to be hard to use for reconstructing the original images. We propose a hybrid method based on multi-task learning to reach this goal. The key idea is two-fold. The first part is to learn a feature encoder that benefits the main task and fools the sensitive task at the same time, via iterative training and feature disentanglement. The second is to incorporate the learning of adversarial examples to mislead the sensitive-attribute classifier. Experiments conducted on the Multi-Attribute Facial Landmark (MAFL) and NIH Chest X-rays datasets demonstrate the effectiveness of our hybrid method. Further analyses also show the usefulness of each model component, the difficulty of data reconstruction, and the performance impact of task correlation.
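The abstract does not give the paper's loss functions, so the following is only a minimal sketch of a generic adversarial multi-task objective of the kind it describes: the encoder is rewarded for a low main-task loss and a high sensitive-task loss, while the sensitive classifier is trained separately in alternating steps. The function names, the weighting parameter `lam`, and the use of cross-entropy are all assumptions made for illustration, not the authors' formulation.

```python
import math


def cross_entropy(probs, label):
    """Negative log-likelihood of the true label under predicted class probabilities."""
    return -math.log(max(probs[label], 1e-12))


def encoder_objective(main_probs, main_label, sens_probs, sens_label, lam=1.0):
    """Hypothetical encoder loss (to be minimized).

    Low when the main task is predicted well AND the sensitive-attribute
    classifier does poorly. `lam` (assumed) trades utility against privacy.
    """
    return cross_entropy(main_probs, main_label) - lam * cross_entropy(sens_probs, sens_label)


def adversary_objective(sens_probs, sens_label):
    """Hypothetical loss for the sensitive classifier, minimized in the alternate step."""
    return cross_entropy(sens_probs, sens_label)


# Same main-task prediction, two different embeddings:
# one lets the sensitive classifier be confident, the other leaves it at chance.
leaky = encoder_objective([0.9, 0.1], 0, [0.99, 0.01], 0)
private = encoder_objective([0.9, 0.1], 0, [0.5, 0.5], 0)
```

Under this sketch, `private` is lower than `leaky`, so minimizing the encoder objective favors embeddings from which the sensitive attribute cannot be confidently predicted while the main-task prediction stays unchanged.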
About the journal:
ACM Transactions on Intelligent Systems and Technology is a scholarly journal that publishes the highest quality papers on intelligent systems, applicable algorithms and technology with a multi-disciplinary perspective. An intelligent system is one that uses artificial intelligence (AI) techniques to offer important services (e.g., as a component of a larger system) to allow integrated systems to perceive, reason, learn, and act intelligently in the real world.
ACM TIST publishes six issues a year. Each issue has 8-11 regular papers, with around 20 published journal pages or 10,000 words per paper. Additional references, proofs, graphs, or detailed experimental results can be submitted as a separate appendix, while excessively lengthy papers will be rejected automatically. Authors can include online-only appendices for additional content of their published papers and are encouraged to share their code and/or data with other readers.