{"title":"隐私保护数据分析的乘权机制","authors":"Moritz Hardt, G. Rothblum","doi":"10.1109/FOCS.2010.85","DOIUrl":null,"url":null,"abstract":"We consider statistical data analysis in the interactive setting. In this setting a trusted curator maintains a database of sensitive information about individual participants, and releases privacy-preserving answers to queries as they arrive. Our primary contribution is a new differentially private multiplicative weights mechanism for answering a large number of interactive counting (or linear) queries that arrive online and may be adaptively chosen. This is the first mechanism with worst-case accuracy guarantees that can answer large numbers of interactive queries and is {\\em efficient} (in terms of the runtime's dependence on the data universe size). The error is asymptotically \\emph{optimal} in its dependence on the number of participants, and depends only logarithmically on the number of queries being answered. The running time is nearly {\\em linear} in the size of the data universe. As a further contribution, when we relax the utility requirement and require accuracy only for databases drawn from a rich class of databases, we obtain exponential improvements in running time. Even in this relaxed setting we continue to guarantee privacy for {\\em any} input database. Only the utility requirement is relaxed. Specifically, we show that when the input database is drawn from a {\\em smooth} distribution — a distribution that does not place too much weight on any single data item — accuracy remains as above, and the running time becomes {\\em poly-logarithmic} in the data universe size. The main technical contributions are the application of multiplicative weights techniques to the differential privacy setting, a new privacy analysis for the interactive setting, and a technique for reducing data dimensionality for databases drawn from smooth distributions.","PeriodicalId":228365,"journal":{"name":"2010 IEEE 51st Annual Symposium on Foundations of Computer Science","volume":"17 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"411","resultStr":"{\"title\":\"A Multiplicative Weights Mechanism for Privacy-Preserving Data Analysis\",\"authors\":\"Moritz Hardt, G. Rothblum\",\"doi\":\"10.1109/FOCS.2010.85\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We consider statistical data analysis in the interactive setting. In this setting a trusted curator maintains a database of sensitive information about individual participants, and releases privacy-preserving answers to queries as they arrive. Our primary contribution is a new differentially private multiplicative weights mechanism for answering a large number of interactive counting (or linear) queries that arrive online and may be adaptively chosen. This is the first mechanism with worst-case accuracy guarantees that can answer large numbers of interactive queries and is {\\\\em efficient} (in terms of the runtime's dependence on the data universe size). The error is asymptotically \\\\emph{optimal} in its dependence on the number of participants, and depends only logarithmically on the number of queries being answered. The running time is nearly {\\\\em linear} in the size of the data universe. As a further contribution, when we relax the utility requirement and require accuracy only for databases drawn from a rich class of databases, we obtain exponential improvements in running time. Even in this relaxed setting we continue to guarantee privacy for {\\\\em any} input database. Only the utility requirement is relaxed. Specifically, we show that when the input database is drawn from a {\\\\em smooth} distribution — a distribution that does not place too much weight on any single data item — accuracy remains as above, and the running time becomes {\\\\em poly-logarithmic} in the data universe size. The main technical contributions are the application of multiplicative weights techniques to the differential privacy setting, a new privacy analysis for the interactive setting, and a technique for reducing data dimensionality for databases drawn from smooth distributions.\",\"PeriodicalId\":228365,\"journal\":{\"name\":\"2010 IEEE 51st Annual Symposium on Foundations of Computer Science\",\"volume\":\"17 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-10-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"411\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2010 IEEE 51st Annual Symposium on Foundations of Computer Science\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/FOCS.2010.85\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 IEEE 51st Annual Symposium on Foundations of Computer Science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FOCS.2010.85","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A Multiplicative Weights Mechanism for Privacy-Preserving Data Analysis
We consider statistical data analysis in the interactive setting. In this setting a trusted curator maintains a database of sensitive information about individual participants, and releases privacy-preserving answers to queries as they arrive. Our primary contribution is a new differentially private multiplicative weights mechanism for answering a large number of interactive counting (or linear) queries that arrive online and may be adaptively chosen. This is the first mechanism with worst-case accuracy guarantees that can answer large numbers of interactive queries and is {\em efficient} (in terms of the runtime's dependence on the data universe size). The error is asymptotically \emph{optimal} in its dependence on the number of participants, and depends only logarithmically on the number of queries being answered. The running time is nearly {\em linear} in the size of the data universe. As a further contribution, when we relax the utility requirement and require accuracy only for databases drawn from a rich class of databases, we obtain exponential improvements in running time. Even in this relaxed setting we continue to guarantee privacy for {\em any} input database. Only the utility requirement is relaxed. Specifically, we show that when the input database is drawn from a {\em smooth} distribution — a distribution that does not place too much weight on any single data item — accuracy remains as above, and the running time becomes {\em poly-logarithmic} in the data universe size. The main technical contributions are the application of multiplicative weights techniques to the differential privacy setting, a new privacy analysis for the interactive setting, and a technique for reducing data dimensionality for databases drawn from smooth distributions.