Pub Date: 2022-01-01 | DOI: 10.1007/978-3-031-13945-1_2
S. Anantharaman, S. Frittella, Benjamin Nguyen
{"title":"Privacy Analysis with a Distributed Transition System and a Data-Wise Metric","authors":"S. Anantharaman, S. Frittella, Benjamin Nguyen","doi":"10.1007/978-3-031-13945-1_2","DOIUrl":"https://doi.org/10.1007/978-3-031-13945-1_2","url":null,"abstract":"","PeriodicalId":91946,"journal":{"name":"Privacy in statistical databases. PSD (Conference : 2004- )","volume":"1 2 1","pages":"15-30"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83474908","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-01-01 | DOI: 10.1007/978-3-031-13945-1_12
Benet Manzanares-Salor, David Sánchez, Pierre Lison
{"title":"Automatic Evaluation of Disclosure Risks of Text Anonymization Methods","authors":"Benet Manzanares-Salor, David Sánchez, Pierre Lison","doi":"10.1007/978-3-031-13945-1_12","DOIUrl":"https://doi.org/10.1007/978-3-031-13945-1_12","url":null,"abstract":"","PeriodicalId":91946,"journal":{"name":"Privacy in statistical databases. PSD (Conference : 2004- )","volume":"23 1","pages":"157-171"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88989766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2020-09-23 | DOI: 10.1007/978-3-030-57521-2_23
Jeremy Seeman, A. Slavkovic, M. Reimherr
{"title":"Private Posterior Inference Consistent with Public Information: A Case Study in Small Area Estimation from Synthetic Census Data","authors":"Jeremy Seeman, A. Slavkovic, M. Reimherr","doi":"10.1007/978-3-030-57521-2_23","DOIUrl":"https://doi.org/10.1007/978-3-030-57521-2_23","url":null,"abstract":"","PeriodicalId":91946,"journal":{"name":"Privacy in statistical databases. PSD (Conference : 2004- )","volume":"8 1","pages":"323-336"},"PeriodicalIF":0.0,"publicationDate":"2020-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88102382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2020-09-23 | DOI: 10.1007/978-3-030-57521-2_15
Jiurui Tang, Jerome P. Reiter, R. Steorts
{"title":"Bayesian Modeling for Simultaneous Regression and Record Linkage","authors":"Jiurui Tang, Jerome P. Reiter, R. Steorts","doi":"10.1007/978-3-030-57521-2_15","DOIUrl":"https://doi.org/10.1007/978-3-030-57521-2_15","url":null,"abstract":"","PeriodicalId":91946,"journal":{"name":"Privacy in statistical databases. PSD (Conference : 2004- )","volume":"64 1","pages":"209-223"},"PeriodicalIF":0.0,"publicationDate":"2020-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88943312","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2020-09-15 | DOI: 10.1007/978-3-030-57521-2_20
Satkartar K. Kinney, Charlotte Looby, Feng Yu
{"title":"Advantages of Imputation vs. Data Swapping for Statistical Disclosure Control","authors":"Satkartar K. Kinney, Charlotte Looby, Feng Yu","doi":"10.1007/978-3-030-57521-2_20","DOIUrl":"https://doi.org/10.1007/978-3-030-57521-2_20","url":null,"abstract":"","PeriodicalId":91946,"journal":{"name":"Privacy in statistical databases. PSD (Conference : 2004- )","volume":"19 1","pages":"281-296"},"PeriodicalIF":0.0,"publicationDate":"2020-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82892549","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2020-09-01 | Epub Date: 2020-09-16 | DOI: 10.1007/978-3-030-57521-2_12
Goran Lesaja, Ionut Iacob, Anna Oganian
In this paper, we consider a Controlled Tabular Adjustment (CTA) model for statistical disclosure limitation of tabular data. The goal of the CTA model is to find the closest safe (masked) table to the original table that contains sensitive information. The measure of closeness is usually measured using ℓ1 or ℓ2 norm. However, in the norm-based CTA model, there is no control of how well the statistical properties of the data in the original table are preserved in the masked table. Hence, we propose a different criterion of "closeness" between the masked and original table which attempts to minimally change certain statistics used in the analysis of the table. The Chi-square statistic is among the most utilized measures for the analysis of data in two-dimensional tables. Hence, we propose a Chi-square CTA model which minimizes the objective function that depends on the difference of the Chi-square statistics of the original and masked table. The model is non-linear and non-convex and therefore harder to solve which prompted us to also consider a modification of this model which can be transformed into a linear programming model that can be solved more efficiently. We present numerical results for the two-dimensional table illustrating our novel approach and providing a comparison with norm-based CTA models.
{"title":"On Different Formulations of a Continuous CTA Model.","authors":"Goran Lesaja, Ionut Iacob, Anna Oganian","doi":"10.1007/978-3-030-57521-2_12","DOIUrl":"https://doi.org/10.1007/978-3-030-57521-2_12","url":null,"abstract":"In this paper, we consider a Controlled Tabular Adjustment (CTA) model for statistical disclosure limitation of tabular data. The goal of the CTA model is to find the closest safe (masked) table to the original table that contains sensitive information. The measure of closeness is usually measured using ℓ1 or ℓ2 norm. However, in the norm-based CTA model, there is no control of how well the statistical properties of the data in the original table are preserved in the masked table. Hence, we propose a different criterion of \"closeness\" between the masked and original table which attempts to minimally change certain statistics used in the analysis of the table. The Chi-square statistic is among the most utilized measures for the analysis of data in two-dimensional tables. Hence, we propose a Chi-square CTA model which minimizes the objective function that depends on the difference of the Chi-square statistics of the original and masked table. The model is non-linear and non-convex and therefore harder to solve which prompted us to also consider a modification of this model which can be transformed into a linear programming model that can be solved more efficiently. We present numerical results for the two-dimensional table illustrating our novel approach and providing a comparison with norm-based CTA models.","PeriodicalId":91946,"journal":{"name":"Privacy in statistical databases. PSD (Conference : 2004- )","volume":"12276 ","pages":"166-179"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8057307/pdf/nihms-1676971.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"38822223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
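The Chi-square criterion described in the abstract above can be illustrated with a small sketch. It only evaluates the Pearson Chi-square statistic of a two-dimensional table and the gap between an original and a masked table; it does not solve the CTA optimization. The function name `chi_square`, the example tables, and the use of an absolute difference as the gap are assumptions made for illustration, since the abstract does not give the exact form of the objective function.

```python
def chi_square(table):
    """Pearson Chi-square statistic of a two-dimensional table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical tables: the masked table keeps the same row and column totals.
original = [[10, 20], [30, 40]]
masked = [[11, 19], [29, 41]]

# Assumed closeness measure: absolute difference of the two statistics.
chi_gap = abs(chi_square(original) - chi_square(masked))
print(round(chi_gap, 6))
```

A norm-based comparison would instead sum the cell-wise differences between the two tables, which says nothing directly about how far the Chi-square statistic has moved.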
Pub Date: 2020-09-01 | Epub Date: 2020-09-16 | DOI: 10.1007/978-3-030-57521-2_10
Anna Oganian, Ionut Iacob, Goran Lesaja
One of the most challenging problems for national statistical agencies is how to release to the public microdata sets with a large number of attributes while keeping the disclosure risk of sensitive information of data subjects under control. When statistical agencies alter microdata in order to limit the disclosure risk, they need to take into account relationships between the variables to produce a good quality public data set. Hence, Statistical Disclosure Limitation (SDL) methods should not be univariate (treating each variable independently of others), but preferably multivariate, that is, handling several variables at the same time. Statistical agencies are often concerned about disclosure risk associated with the extreme values of numerical variables. Thus, such observations are often top or bottom-coded in the public use files. Top-coding consists of the substitution of extreme observations of the numerical variable by a threshold, for example, by the 99th percentile of the corresponding variable. Bottom coding is defined similarly but applies to the values in the lower tail of the distribution. We argue that a univariate form of top/bottom-coding may not offer adequate protection for some subpopulations which are different in terms of a top-coded variable from other subpopulations or the whole population. In this paper, we propose a multivariate form of top-coding based on clustering the variables into groups according to some metric of closeness between the variables and then forming the rules for the multivariate top-codes using techniques of Association Rule Mining within the clusters of variables obtained on the previous step. Bottom-coding procedures can be defined in a similar way. We illustrate our method on a genuine multivariate data set of realistic size.
{"title":"Multivariate Top-Coding for Statistical Disclosure Limitation.","authors":"Anna Oganian, Ionut Iacob, Goran Lesaja","doi":"10.1007/978-3-030-57521-2_10","DOIUrl":"10.1007/978-3-030-57521-2_10","url":null,"abstract":"One of the most challenging problems for national statistical agencies is how to release to the public microdata sets with a large number of attributes while keeping the disclosure risk of sensitive information of data subjects under control. When statistical agencies alter microdata in order to limit the disclosure risk, they need to take into account relationships between the variables to produce a good quality public data set. Hence, Statistical Disclosure Limitation (SDL) methods should not be univariate (treating each variable independently of others), but preferably multivariate, that is, handling several variables at the same time. Statistical agencies are often concerned about disclosure risk associated with the extreme values of numerical variables. Thus, such observations are often top or bottom-coded in the public use files. Top-coding consists of the substitution of extreme observations of the numerical variable by a threshold, for example, by the 99th percentile of the corresponding variable. Bottom coding is defined similarly but applies to the values in the lower tail of the distribution. We argue that a univariate form of top/bottom-coding may not offer adequate protection for some subpopulations which are different in terms of a top-coded variable from other subpopulations or the whole population. In this paper, we propose a multivariate form of top-coding based on clustering the variables into groups according to some metric of closeness between the variables and then forming the rules for the multivariate top-codes using techniques of Association Rule Mining within the clusters of variables obtained on the previous step. Bottom-coding procedures can be defined in a similar way. We illustrate our method on a genuine multivariate data set of realistic size.","PeriodicalId":91946,"journal":{"name":"Privacy in statistical databases. PSD (Conference : 2004- )","volume":"12276 ","pages":"136-148"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8057308/pdf/nihms-1676966.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"38822222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
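The univariate top-coding that the abstract above takes as its starting point can be sketched as follows. This is not the multivariate method the paper proposes; it only shows the basic substitution of extreme values by a percentile threshold. The function name `top_code`, the nearest-rank percentile rule, and the sample values are assumptions made for illustration.

```python
import math


def top_code(values, percentile=99.0):
    """Replace values above the given percentile with the percentile itself.

    Uses the nearest-rank percentile definition (an assumption; other
    definitions interpolate between neighbouring observations).
    """
    ordered = sorted(values)
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    threshold = ordered[rank - 1]
    return [min(v, threshold) for v in values], threshold


# Hypothetical income-like variable with one extreme observation.
incomes = [20, 35, 40, 55, 60, 75, 80, 95, 120, 900]
coded, threshold = top_code(incomes, percentile=90)
print(threshold, coded)
```

Bottom-coding is the mirror image: take a low percentile as the threshold and replace values below it using `max` instead of `min`.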
Pub Date: 2020-07-14 | DOI: 10.1007/978-3-030-57521-2_14
Douwe Hut, J. Goseling, M. V. Lieshout, P. D. Wolf, E. D. Jonge
{"title":"Statistical Disclosure Control When Publishing on Thematic Maps","authors":"Douwe Hut, J. Goseling, M. V. Lieshout, P. D. Wolf, E. D. Jonge","doi":"10.1007/978-3-030-57521-2_14","DOIUrl":"https://doi.org/10.1007/978-3-030-57521-2_14","url":null,"abstract":"","PeriodicalId":91946,"journal":{"name":"Privacy in statistical databases. PSD (Conference : 2004- )","volume":"4 1","pages":"195-205"},"PeriodicalIF":0.0,"publicationDate":"2020-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81336038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}