Locally Private Histograms in All Privacy Regimes
Clément L. Canonne, Abigail Gentle
arXiv - CS - Discrete Mathematics, 2024-08-09
arxiv-2408.04888 (https://doi.org/arxiv-2408.04888)

Frequency estimation, a.k.a. histograms, is a workhorse of data analysis, and
as such has been thoroughly studied under differential privacy. In
particular, computing histograms in the local model of privacy has been the
focus of a fruitful recent line of work, and various algorithms have been
proposed, achieving the order-optimal $\ell_\infty$ error in the high-privacy
(small $\varepsilon$) regime while balancing other considerations such as time-
and communication-efficiency. However, to the best of our knowledge, the
picture is much less clear when it comes to the medium- or low-privacy regime
(large $\varepsilon$), despite its increased relevance in practice. In this
paper, we investigate locally private histograms, and the closely related
distribution learning task, in this medium-to-low privacy regime, and establish
near-tight (and somewhat unexpected) bounds on the $\ell_\infty$ error
achievable. Our theoretical findings emerge from a novel analysis, which
appears to improve bounds across the board for the locally private histogram
problem. We back our theoretical findings with an empirical comparison of
existing algorithms in all privacy regimes, to assess their typical performance
and behaviour beyond the worst-case setting.
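
To make the problem setting concrete, here is a minimal sketch of one standard baseline for locally private frequency estimation, k-ary randomized response, together with the $\ell_\infty$ error that the abstract refers to. It is an illustration only and is not taken from the paper: the function names (`krr_report`, `krr_histogram`, `linf_error`) and the parameter choices are assumptions made for this example, and it requires a domain size k >= 2 and eps > 0.

```python
import math
import random
from collections import Counter


def krr_report(x, k, eps, rng):
    """Randomize one user's value x in {0, ..., k-1} (k-ary randomized response).

    The true value is kept with probability e^eps / (e^eps + k - 1); otherwise
    one of the k - 1 other values is reported uniformly at random. The ratio of
    the two report probabilities is e^eps, which gives eps-local privacy.
    """
    p_keep = math.exp(eps) / (math.exp(eps) + k - 1)
    if rng.random() < p_keep:
        return x
    y = rng.randrange(k - 1)          # uniform over the k - 1 other values
    return y if y < x else y + 1      # skip over x itself


def krr_histogram(reports, k, eps):
    """Unbiased frequency estimates from the randomized reports."""
    n = len(reports)
    p = math.exp(eps) / (math.exp(eps) + k - 1)   # P[report = v | value = v]
    q = 1.0 / (math.exp(eps) + k - 1)             # P[report = v | value != v]
    counts = Counter(reports)
    # E[counts[v] / n] = q + f_v * (p - q), so invert that affine map.
    return [(counts[v] / n - q) / (p - q) for v in range(k)]


def linf_error(estimate, truth):
    """Largest absolute error over all histogram bins."""
    return max(abs(a - b) for a, b in zip(estimate, truth))


# Hypothetical experiment: n users, uniform data over k values.
rng = random.Random(0)
k, eps, n = 8, 2.0, 20000
data = [rng.randrange(k) for _ in range(n)]
true_freq = [data.count(v) / n for v in range(k)]
reports = [krr_report(x, k, eps, rng) for x in data]
estimate = krr_histogram(reports, k, eps)
error = linf_error(estimate, true_freq)
```

Re-running the last block with different values of `eps` gives a rough feel for how the $\ell_\infty$ error of this particular baseline changes between the high-privacy and low-privacy regimes; it says nothing about the bounds established in the paper.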