The stochastic block model (SBM) and degree-corrected block model (DCBM) are network models often selected as the fundamental setting in which to analyze the theoretical properties of community detection methods. We consider the problem of spectral clustering of SBM and DCBM networks under a local form of edge differential privacy. Using a randomized response privacy mechanism called the edge-flip mechanism, we develop theoretical guarantees for differentially private community detection, demonstrating conditions under which this strong privacy guarantee can be upheld while achieving spectral clustering convergence rates that match the known rates without privacy. We prove the strongest theoretical results are achievable for dense networks (those with node degree linear in the number of nodes), while weak consistency is achievable under mild sparsity (node degree greater than ). We empirically demonstrate our results on a number of network examples.
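To make the edge-flip mechanism concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the standard randomized-response parameterization, in which each undirected dyad is flipped independently with probability 1 / (1 + e^ε). The helper names (`sample_sbm`, `edge_flip`, `debias`, `two_block_spectral`), the two-block dense SBM, and the sign-of-second-eigenvector clustering rule are all illustrative assumptions chosen for brevity.

```python
import numpy as np


def sample_sbm(labels, p_in, p_out, rng):
    """Sample a symmetric, hollow adjacency matrix from a simple SBM (illustrative)."""
    n = labels.size
    same_block = labels[:, None] == labels[None, :]
    probs = np.where(same_block, p_in, p_out)
    iu = np.triu_indices(n, k=1)
    adj = np.zeros((n, n), dtype=int)
    adj[iu] = rng.random(iu[0].size) < probs[iu]
    return adj + adj.T


def edge_flip(adj, epsilon, rng):
    """Flip each dyad independently with probability 1 / (1 + e^epsilon).

    Assumption: this is the usual randomized-response form of the mechanism.
    """
    n = adj.shape[0]
    p_flip = 1.0 / (1.0 + np.exp(epsilon))
    iu = np.triu_indices(n, k=1)
    flips = rng.random(iu[0].size) < p_flip
    noisy = np.zeros_like(adj)
    noisy[iu] = np.where(flips, 1 - adj[iu], adj[iu])
    return noisy + noisy.T, p_flip


def debias(noisy, p_flip):
    """Invert E[noisy] = (1 - 2p) * A + p on off-diagonal entries."""
    off_diag = 1.0 - np.eye(noisy.shape[0])
    return (noisy - p_flip * off_diag) / (1.0 - 2.0 * p_flip)


def two_block_spectral(matrix):
    """Split nodes by the sign of the eigenvector of the second-largest eigenvalue."""
    _, vecs = np.linalg.eigh(matrix)  # eigenvalues in ascending order
    return (vecs[:, -2] > 0).astype(int)


def demo(seed=0):
    rng = np.random.default_rng(seed)
    labels = np.repeat([0, 1], 100)
    adj = sample_sbm(labels, p_in=0.6, p_out=0.1, rng=rng)  # dense regime
    noisy, p_flip = edge_flip(adj, epsilon=2.0, rng=rng)
    pred = two_block_spectral(debias(noisy, p_flip))
    agree = np.mean(pred == labels)
    return max(agree, 1.0 - agree)  # accuracy up to label swap
```

Calling `demo()` returns the fraction of nodes whose community is recovered from the privatized network. The de-biasing step matters mainly for sparse graphs, where the flipped non-edges would otherwise dominate the true edges.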
Privacy protection is an important requirement in many statistical studies. A recently proposed data collection method, triple matrix-masking, retains exact summary statistics without exposing the raw data at any point in the process. In this paper, we provide a theoretical formulation and proofs showing that a modified version of the procedure is strong collection obfuscating: no party in the data collection process is able to gain knowledge of the individual-level data, even when given some partially masked data in addition to the publicly published data. This provides a theoretical foundation for using such a procedure to collect masked data that allows exact statistical inference for linear models while preserving a well-defined notion of privacy protection for each individual participant in the study. This paper fits into a line of work tackling the problem of how to create useful synthetic data without a trustworthy data aggregator. We achieve this by splitting the trust between two parties, the "masking service provider" and the "data collector."
A major obstacle that hinders medical and social research is the lack of reliable data due to people's reluctance to reveal private information to strangers. Fortunately, statistical inference always targets a well-defined population rather than a particular individual subject and, in many current applications, data can be collected using web-based systems or mobile devices. These two characteristics enable us to develop a data collection method, called triple matrix-masking (TM2), which offers strong privacy protection through an immediate matrix transformation, so that even the researchers cannot see the data, and then uses further matrix transformations to guarantee that the data remain analyzable by standard statistical methods. The entities involved in the proposed process are a masking service provider, who receives the initially masked data and applies another mask, and the data collectors, who partially decrypt the now doubly masked data and then apply a third mask before releasing the data to the public. A critical feature of the method is that the keys used to generate the matrices are held separately. This ensures that nobody sees the actual data, yet because of the specially designed transformations, statistical inference on parameters of interest can be conducted with the same results as if the original data were used. Hence the TM2 method hides sensitive data with no efficiency loss for statistical inference on binary and normal data, which improves on Warner's randomized response technique. In addition, we add several features to the proposed procedure: an error-checking mechanism is built into the data collection process to make sure that the masked data used for analysis are an appropriate transformation of the original data, and a partial masking technique is introduced to grant data users access to non-sensitive personal information while sensitive information remains hidden.
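The three-stage flow described above can be sketched numerically. This is a hedged illustration, not the published protocol. It assumes that the first mask is a right-multiplication by an invertible matrix `B` whose key is held by the data collector, that the second and third masks are left-multiplications by random orthogonal matrices `A1` and `A2`, and that "partially decrypt" means removing `B`. All names, and the assignment of keys to parties, are hypothetical. The sketch checks that the released matrix preserves the Gram matrix and therefore the least-squares estimates.

```python
import numpy as np


def random_orthogonal(n, rng):
    """Draw an n x n orthogonal matrix from the QR factorization of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))  # sign fix keeps q orthogonal


def triple_mask(data, rng):
    """Illustrative three-mask pipeline; returns the publicly released matrix."""
    n, p = data.shape

    # Mask 1 (participant side): right-multiply by an invertible key B.
    # Assumption: B is generated by the data collector and never shared
    # with the masking service provider.
    key_b = rng.standard_normal((p, p)) + p * np.eye(p)
    sent_to_provider = data @ key_b

    # Mask 2 (masking service provider): left-multiply by orthogonal A1.
    a1 = random_orthogonal(n, rng)
    doubly_masked = a1 @ sent_to_provider

    # Data collector: remove B (the partial decryption), then apply mask 3.
    partially_decrypted = doubly_masked @ np.linalg.inv(key_b)
    a2 = random_orthogonal(n, rng)
    return a2 @ partially_decrypted  # equals A2 @ A1 @ data


def ols(design_and_response):
    """Least-squares fit of the last column on all preceding columns."""
    design = design_and_response[:, :-1]
    response = design_and_response[:, -1]
    return np.linalg.lstsq(design, response, rcond=None)[0]


rng = np.random.default_rng(0)
n_obs = 50
covariates = rng.standard_normal((n_obs, 2))
outcome = 1.0 + covariates @ np.array([2.0, -1.0]) + rng.standard_normal(n_obs)
# The intercept column is masked along with everything else.
raw = np.column_stack([np.ones(n_obs), covariates, outcome])

released = triple_mask(raw, rng)
same_gram = np.allclose(released.T @ released, raw.T @ raw)
same_fit = np.allclose(ols(released), ols(raw))
```

Because the composite left mask `A2 @ A1` is orthogonal, `released.T @ released` equals `raw.T @ raw`, so any statistic that depends on the data only through this Gram matrix is unchanged, while the individual rows of `released` no longer resemble the raw records.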