"On query self-submission in peer-to-peer user-private information retrieval" — K. Stokes, M. Bras-Amorós (2011). doi:10.1145/1971690.1971697

User-private information retrieval (UPIR), sometimes called anonymous keyword search, is the art of retrieving information without telling the information holder who you are. This article discusses a UPIR protocol, known as P2P UPIR, in which the users form a peer-to-peer network over which they collaborate in protecting each other's privacy. We explain why P2P UPIR may have a flaw in how it protects the privacy of the client with respect to the server, and we discuss two alternative variations of the protocol, one of which proves to resolve the flaw. The aim of this article is hence to propose a modification of the P2P UPIR protocol. We also justify why projective planes remain the optimal configurations for P2P UPIR under the modified protocol.
"Improving security by using a database management system for integrated statistical data analysis" — Vadym Khatsanovskyy, Jan-Eric Litton, R. Fomkin (2011). doi:10.1145/1971690.1971699

International research collaborations access and integrate data collected in different countries. For various reasons, e.g., legislation, data owners need to control who has access to their data and how the data are analyzed. Analysis is performed in statistical software, which usually runs on top of a data management system, e.g., a database management system (DBMS). Access to data is therefore controlled by the DBMS, while statistical analyses are controlled by a separate system. To improve security, we propose a novel architecture for executing statistical analyses on data stored in a DBMS: the statistical software is called from within the DBMS, so that both data retrieval and statistical analysis are controlled by a single system, the DBMS. We implemented a prototype that executes analysis programs by calling the statistical software SAS from the relational DBMS IBM DB2 over data stored in a DB2 database. This paper describes the proposed architecture and the implemented prototype.
"PCTA: privacy-constrained clustering-based transaction data anonymization" — A. Gkoulalas-Divanis, G. Loukides (2011). doi:10.1145/1971690.1971695

Transaction data about individuals are increasingly collected to support a plethora of applications, ranging from marketing to biomedical studies. Publishing these data is required by many organizations, but it may result in privacy breaches if an attacker exploits potentially identifying information to link individuals to their records in the published data. Algorithms that prevent this threat by transforming transaction data prior to their release have been proposed recently, but they incur significant information loss because they cannot accommodate the range of different privacy requirements that data owners often have. To address this issue, we propose a novel clustering-based framework for anonymizing transaction data. Our framework provides the basis for designing algorithms that explore a larger solution space than existing methods, which allows publishing data with less information loss, and that can satisfy a wide range of privacy requirements. Based on this framework, we develop PCTA, a generalization-based algorithm that constructs anonymizations incurring little information loss under many different privacy requirements. Experiments with benchmark datasets verify that PCTA significantly outperforms the current state-of-the-art algorithms in terms of data utility, while being comparable in terms of efficiency.
"Capturing P3P semantics using an enforceable lattice-based structure" — Kambiz Ghazinour, K. Barker (2011). doi:10.1145/1971690.1971694

With the increasing amount of data collected by service providers, privacy concerns grow for data owners who must provide private data to receive services. Legislative acts require service providers to protect the privacy of their customers. Privacy policy frameworks such as P3P assist service providers by describing their privacy policies to customers (e.g., publishing the privacy policy on a website). Unfortunately, providing the policies alone does not guarantee that they are actually enforced. Furthermore, a privacy-preserving model should consider the privacy preferences of both the data provider and the data collector. This paper discusses the challenges of capturing privacy predicates in a lattice structure. A case study is presented to show the applicability of the lattice approach to a specific domain, and we present a comprehensive study of applying a lattice-based approach to P3P. We show that capturing the privacy elements of P3P in a lattice facilitates managing and enforcing the policies expressed in P3P and accommodates the customization of privacy practices and preferences of data and service providers. We also propose that the outcome of this approach can be used in lattice-based privacy-aware access control models [8].
"A privacy preserving efficient protocol for semantic similarity join using long string attributes" — Bilal Hawashin, F. Fotouhi, T. Truta (2011). doi:10.1145/1971690.1971696

During a similarity join, one or more sources may not allow sharing their whole data with the other sources; in this case, a privacy-preserving similarity join is required. We showed in our previous work [4] that using long attributes, such as paper abstracts, movie summaries, product descriptions, and user feedback, can improve similarity join accuracy under supervised learning. However, the existing secure protocols for similarity join cannot be used to join tables on these long attributes. Moreover, the majority of existing privacy-preserving protocols do not consider semantic similarities during the join. In this paper, we introduce a secure, efficient protocol for semantically joining tables when the join attributes are long string attributes. Furthermore, instead of using machine learning methods, which are not always applicable, we use similarity thresholds to decide matched pairs. Results show that our protocol can efficiently join tables on long attributes by considering the semantic relationships among the long string values, thereby improving overall secure similarity join performance.
"Privacy issues with sharing reputation across virtual communities" — Nurit Gal-Oz, Tal Grinshpoun, E. Gudes (2011). doi:10.1145/1971690.1971693

This paper outlines the privacy concerns that arise in the Cross-Community Reputation (CCR) model for sharing reputation knowledge across communities. These concerns are discussed and modeled, and a policy-based approach for coping with them is presented.
"Rational enforcement of digital oblivion" — J. Domingo-Ferrer (2011). doi:10.1145/1971690.1971692

Digital storage in the information society allows perfect and unlimited remembering. Yet the right of an individual to enforce oblivion for pieces of information about her is part of her fundamental right to privacy. We propose a solution to digital forgetting based on anonymously fingerprinting expiration dates. In our solution, people who learn information about an individual are rationally interested in helping the individual enforce her oblivion policy. Thanks to this rational involvement, even content-spreading services like Facebook or YouTube would be interested in fingerprinting downloads, thereby effectively enforcing the right of content owners to cancel content.
"A probabilistic look ahead of anonymization: keynote talk" — Y. Saygin (2011). doi:10.1145/1971690.1971691

Data anonymization is an expensive process, and sometimes the utility of the anonymized data may not justify the cost of anonymization. For example, in a distributed setting where the data reside at different sites and need to be anonymized without a trusted server, Secure Multiparty Computation (SMC) protocols need to be employed. However, the cost of SMC protocols can be prohibitive, and the parties may therefore want to look ahead of anonymization to decide whether it is worth running the expensive SMC protocols. In this work, we describe a fast probabilistic look ahead of k-anonymization of horizontally partitioned data. The look ahead returns an upper bound on the probability that k-anonymity will be achieved at a certain utility, where utility is quantified by commonly used metrics from the anonymization literature. The look ahead exploits prior information such as total data size, attribute distributions, or attribute correlations, all of which require only simple SMC operations to compute. More specifically, given only statistics on the private dataset, we show how to calculate the probability that a mapping of values to generalizations will make the private dataset k-anonymous.
"Micro-aggregation-based heuristics for p-sensitive k-anonymity: one step beyond" — A. Solanas, F. Sebé, J. Domingo-Ferrer (2008). doi:10.1145/1379287.1379300

Micro-data protection is a hot topic in the field of Statistical Disclosure Control (SDC), which gained special interest after the disclosure of the search queries of some 658,000 users of the AOL search engine in August 2006. Many algorithms, methods and properties have been proposed to deal with micro-data disclosure. p-Sensitive k-anonymity has recently been defined as a sophistication of k-anonymity: this property requires that there be at least p different values for each confidential attribute within the records sharing a combination of key attributes. As with k-anonymity, the algorithm originally proposed to achieve p-sensitive k-anonymity was based on generalisations and suppressions; on numerical data sets this causes several data utility problems, namely turning numerical key attributes into categorical ones, injecting new categories, injecting missing data, and so on. In this article, we recall the foundational concepts of micro-aggregation, k-anonymity and p-sensitive k-anonymity. We show that k-anonymity and p-sensitive k-anonymity can be achieved in numerical data sets by means of micro-aggregation heuristics properly adapted to the task. In addition, we present and evaluate two micro-aggregation-based heuristics for p-sensitive k-anonymity which overcome most of the drawbacks of the generalisation and suppression method.
"A Bayesian approach for on-line max and min auditing" — G. Canfora, B. Cavallo (2008). doi:10.1145/1379287.1379292

In this paper we consider the on-line max and min query auditing problem: given a private association between fields in a data set, a sequence of max and min queries that have already been posed about the data together with their answers, and a new query, deny the answer if private information would be inferred, and give the true answer otherwise. We give a probabilistic definition of privacy and demonstrate that max and min queries, without the "no duplicates" assumption, can be audited by means of a Bayesian network. Moreover, we show how our auditing approach is able to manage user prior knowledge.